Risk and resilience are the top topics in McKinsey supply chain survey

Four years on from the start of the pandemic, companies are accelerating efforts to diversify and localize their supply networks.


Companies around the globe are accelerating their efforts to diversify and localize their supply networks, as the topics of risk and resilience still dominate the supply chain agenda four years on from the start of the pandemic, according to a report from consulting firm McKinsey & Co.

Under those pressures, supply chains are seeing a profound revolution, with a dramatic increase in the adoption of advanced techniques for planning, execution, and risk management, McKinsey said in its “2023 Supply Chain Pulse Survey.” Data for this year’s survey were collected from 101 respondents, who represented a range of industry sectors on six continents. Collected from the middle of April to the middle of May 2023, the survey included questions on four major areas of supply chain management: network design, planning, digitization, and risk management.

The problems those organizations are trying to solve are clear: almost every supply chain manager in the survey said they had experienced significant issues over the previous 12 months. Some 44% reported major challenges arising from their supply chain footprint that required them to make changes during the year. And almost half (49%) said that supply chain disruptions had caused major planning challenges.

Two approaches have quickly emerged as the most common solutions in the past two years. Companies say they have improved resilience through physical changes to their supply chains by increasing their inventory buffers (78%) and by pursuing dual-sourcing strategies for critical raw materials (78%).

But this year’s survey adds a third option: twice as many companies as last year reported using “nearshoring” strategies, McKinsey found. In all, two-thirds of respondents said they were obtaining more inputs from suppliers located closer to their production sites over the past 12 months (led by the automotive and consumer goods industries). Likewise, almost two-thirds (64%) of respondents said they are transitioning from global to regional supply chains, up from 44% last year. Regionalization takes time, however. Once an organization commits to a new footprint strategy, it can be two years or longer before changes happen on the ground, especially if the strategy requires implementation of new manufacturing sites.

Looking into 2024, one major question will be what happens to the world’s bulging buffer stocks. Companies originally began to ramp up their inventories in response to pandemic-era supply chain disruptions, leading some observers to declare a transition from just-in-time supply chains to a just-in-case model. But that future is unclear, with survey results showing that inventories still remain high, and respondents are divided about their future direction. Roughly equal numbers of respondents believe that stocks will continue to rise, remain at today’s levels, or fall back to precrisis levels.

Answers to those questions could come from the digital-planning revolution that has been brewing in the supply chain field since well before 2020. The pandemic dramatically accelerated the adoption of those new technologies, including three main ingredients: end-to-end visibility, high-quality master data, and effective scenario planning. Companies are also continuing to use cross-functional integrated business planning (IBP) processes, and increasing their use of advanced planning and scheduling (APS) systems that match supply and demand in complex networks.

However, one hurdle that could slow supply chain technology adoption is the long-standing barrier of access to talent. Echoing last year’s results, only 8% of respondents say they have enough in-house talent to support their digitization ambitions. Yet to fill the gap, companies are backing away from running internal reskilling programs in the supply chain function and turning increasingly to external hiring instead.

All that turmoil has gotten the attention of executives in the boardroom, McKinsey said. After the large-scale disruptions of recent years, supply chain risk has moved from being a niche topic to a top-three item on the senior-management agenda. With the ongoing war in Europe and heightened geopolitical tensions around the world, supply chain risks remain real in many industries. Yet structural and organizational issues may be hampering companies’ ability to make effective decisions based on their growing understanding of supply chain risks. Responsibility for risk management remains fragmented, with many companies operating multiple risk teams within the supply chain function or bundling risk management into the portfolios of teams that are already busy with other topics.

Facing all those challenges, a key task in the coming year for supply chain leaders will be maintaining their hard-won seat at the top table and continually educating the board on the importance of risk and resilience. That’s because in the absence of an immediate supply chain crisis, top-management focus tends to shift onto other issues. Yet supply chains remain vulnerable to a wide range of disruptions, from geopolitical tensions to natural disasters and climate change.


McKinsey Publishes Report on the Growth of Value-Based Care and Risk Models in Healthcare

The management consulting firm McKinsey & Company issued a report on March 24, 2022, reviewing the future of healthcare delivery. The key findings are detailed below:

  • Value-based plans, including risk-bearing models run by managed services organizations and accountable care organizations, are expected to grow to 22% of insured lives by 2025, up from 15% in 2021. This growth represents an increase in coverage to around 65 million Americans.
  • Care is becoming more patient-centric and technology-driven. Patients expect to be able to schedule an appointment, review records, and refill medications online. New medical technologies will continue to reduce the patient contact needed to provide care and reduce care delivery costs. 
  • Care is becoming more integrated. Interoperability, data access, and price transparency are driving more informed decision making for patients, providers, and payers. 



Resilience quest is reshaping supply chain, says McKinsey

"Four years on from the start of the COVID-19 pandemic, risk and resilience still dominate the supply chain agenda," says report co-author and McKinsey Partner Knut Alicke.

The quest for resilience is driving the localising and diversifying of supply chains globally, as well as the deployment of technology-based advances in value chain planning, execution and risk management, a report from McKinsey concludes. But it also paints a picture of boardrooms oblivious to the risks inherent in modern supply chain management. 

“The race for resilience is changing the way global supply chains look and transforming the way they are run,” says report co-author and McKinsey Partner Knut Alicke in a LinkedIn post.

“Four years on from the start of the COVID-19 pandemic, risk and resilience still dominate the supply chain agenda,” he says. “Our latest annual survey of supply chain leaders shows that companies are accelerating their efforts to diversify and localise their supply networks.”

Alicke adds: “It also reveals a profound revolution in the way those supply chains are operated, with a dramatic increase in the adoption of advanced techniques for supply chain planning, execution, and risk management.”

But the report also charts how, at most companies, the links between supply chain risk and board-level decision making are fragile. Fewer than half say supply chain risks are regularly reported at board level, and only one in ten say they have a budget allocation to support risk management issues.

Many respondents also lack confidence that their most senior leaders are "sufficiently engaged with the challenges posed by supply chain risk", with a paltry one in five feeling their supervisory boards have a deep understanding of the topic.

However, the report – Tech and regionalisation bolster supply chains, but complacency looms – does reveal a dramatic increase in what McKinsey calls “resilience actions”. 

Two-thirds of respondents say that over the past 12 months they have “obtained more inputs” from suppliers located closer to their production sites – double the share of companies who reported using nearshoring strategies last year, says McKinsey. 

Automakers and consumer goods 'heavy nearshoring sectors'

The biggest reported increases were seen in the automotive and consumer goods industries, where the incidence of nearshoring strategies rose by 60%.

Related to this rise in nearshoring, the shift from global to regional supply networks continues to gain momentum. Almost two-thirds (64%) of respondents say they are regionalising their supply chains, up from 44% last year. 

Although half of respondents say their company is dependent on inputs from another region, 89% of those also say they are looking to reduce that dependency over time. 

The report says the push for independent regional supply networks is most prominent in two regions: Europe and Southeast Asia. Its findings echo the prevailing trend in supply chain management of resilience shaping the manufacture and distribution of goods worldwide.

Moving on to inventory levels, McKinsey’s survey suggests these remain high but that businesses are divided about their future intentions, with roughly equal numbers believing stocks will either continue to rise, remain at today’s levels, or fall back to pre-crisis levels.

Inventory findings 'surprising', says McKinsey

Around a quarter of respondents revealed they have aggressive inventory-reduction goals, expecting stocks to drop even below pre-crisis levels.

“That finding surprised us,” says Alicke, also co-author of the book From Source to Sold. “It suggests either that these organisations historically held more inventory than they needed or that they do not expect significant supply disruptions soon.”

Regarding the use of technology as a driver of resilience, the number of respondents who say they have “mastered the visibility challenge” has jumped from 37% last year to 79% in 2023 – with attention firmly fixed on improving supply chain planning processes.

McKinsey says this quest for better planning typically involves “revisiting basics”, including integrated business planning processes. 

But many (79%), it says, are also adopting better planning tools, including advanced planning and scheduling (APS) systems, to match supply and demand in complex networks.

Supply chain tech 'requires too much intervention'

However, the report also reveals that two in five APS users (41%) say the technology requires too many manual interventions, while more than a third (37%) say their systems are being used too narrowly.

At these companies, says McKinsey, “significant planning decisions are likely still being made using spreadsheets and other time-consuming, error-prone approaches”. 

On a more positive note, respondents report “significant evolution” in the development of their supply chain risk management capabilities.

A total of 71% say they now have such capabilities in-house, while 93% are “assessing the effect of supply chain risk in quantitative terms”. 

Alicke’s report co-authors are:

  • Partner, Tacy Napolillo Foster
  • Senior Supply Chain Expert, Vera Trautwein
  • Engagement Manager & Supply Chain Expert, Katharina Hauck


47 case interview examples (from McKinsey, BCG, Bain, etc.)

Case interview examples - McKinsey, BCG, Bain, etc.

One of the best ways to prepare for case interviews at firms like McKinsey, BCG, or Bain is by studying case interview examples.

There are a lot of free sample cases out there, but it's really hard to know where to start. So in this article, we have listed all the best free case examples available, in one place.

The list of resources below includes interactive case interview samples provided by consulting firms, video case interview demonstrations, case books, and materials developed by the team here at IGotAnOffer. Let's get to the list.

  • McKinsey examples
  • BCG examples
  • Bain examples
  • Deloitte examples
  • Other firms' examples
  • Case books from consulting clubs
  • Case interview preparation

Click here to practise 1-on-1 with MBB ex-interviewers

1. McKinsey case interview examples

  • Beautify case interview (McKinsey website)
  • Diconsa case interview (McKinsey website)
  • Electro-light case interview (McKinsey website)
  • GlobaPharm case interview (McKinsey website)
  • National Education case interview (McKinsey website)
  • Talbot Trucks case interview (McKinsey website)
  • Shops Corporation case interview (McKinsey website)
  • Conservation Forever case interview (McKinsey website)
  • McKinsey case interview guide (by IGotAnOffer)
  • McKinsey live case interview extract (by IGotAnOffer) - See below

2. BCG case interview examples

  • Foods Inc and GenCo case samples  (BCG website)
  • Chateau Boomerang written case interview  (BCG website)
  • BCG case interview guide (by IGotAnOffer)
  • Written cases guide (by IGotAnOffer)
  • BCG live case interview with notes (by IGotAnOffer)
  • BCG mock case interview with ex-BCG associate director - Public sector case (by IGotAnOffer)
  • BCG mock case interview: Revenue problem case (by IGotAnOffer) - See below

3. Bain case interview examples

  • CoffeeCo practice case (Bain website)
  • FashionCo practice case (Bain website)
  • Associate Consultant mock interview video (Bain website)
  • Consultant mock interview video (Bain website)
  • Written case interview tips (Bain website)
  • Bain case interview guide   (by IGotAnOffer)
  • Digital transformation case with ex-Bain consultant
  • Bain case mock interview with ex-Bain manager (below)

4. Deloitte case interview examples

  • Engagement Strategy practice case (Deloitte website)
  • Recreation Unlimited practice case (Deloitte website)
  • Strategic Vision practice case (Deloitte website)
  • Retail Strategy practice case  (Deloitte website)
  • Finance Strategy practice case  (Deloitte website)
  • Talent Management practice case (Deloitte website)
  • Enterprise Resource Management practice case (Deloitte website)
  • Footloose written case  (by Deloitte)
  • Deloitte case interview guide (by IGotAnOffer)

5. Accenture case interview examples

  • Case interview workbook (by Accenture)
  • Accenture case interview guide (by IGotAnOffer)

6. OC&C case interview examples

  • Leisure Club case example (by OC&C)
  • Imported Spirits case example (by OC&C)

7. Oliver Wyman case interview examples

  • Wumbleworld case sample (Oliver Wyman website)
  • Aqualine case sample (Oliver Wyman website)
  • Oliver Wyman case interview guide (by IGotAnOffer)

8. A.T. Kearney case interview examples

  • Promotion planning case question (A.T. Kearney website)
  • Consulting case book and examples (by A.T. Kearney)
  • AT Kearney case interview guide (by IGotAnOffer)

9. Strategy& / PWC case interview examples

  • Presentation overview with sample questions (by Strategy& / PWC)
  • Strategy& / PWC case interview guide (by IGotAnOffer)

10. L.E.K. Consulting case interview examples

  • Case interview example video walkthrough   (L.E.K. website)
  • Market sizing case example video walkthrough  (L.E.K. website)

11. Roland Berger case interview examples

  • Transit oriented development case webinar part 1  (Roland Berger website)
  • Transit oriented development case webinar part 2   (Roland Berger website)
  • 3D printed hip implants case webinar part 1   (Roland Berger website)
  • 3D printed hip implants case webinar part 2   (Roland Berger website)
  • Roland Berger case interview guide   (by IGotAnOffer)

12. Capital One case interview examples

  • Case interview example video walkthrough  (Capital One website)
  • Capital One case interview guide (by IGotAnOffer)

13. Consulting clubs case interview examples

  • Berkeley case book (2006)
  • Columbia case book (2006)
  • Darden case book (2012)
  • Darden case book (2018)
  • Duke case book (2010)
  • Duke case book (2014)
  • ESADE case book (2011)
  • Goizueta case book (2006)
  • Illinois case book (2015)
  • LBS case book (2006)
  • MIT case book (2001)
  • Notre Dame case book (2017)
  • Ross case book (2010)
  • Wharton case book (2010)

Practice with experts

Using case interview examples is a key part of your interview preparation, but it isn’t enough.

At some point you’ll want to practise with friends or family who can give some useful feedback. However, if you really want the best possible preparation for your case interview, you'll also want to work with ex-consultants who have experience running interviews at McKinsey, Bain, BCG, etc.

If you know anyone who fits that description, fantastic! But for most of us, it's tough to find the right connections to make this happen. And it might also be difficult to practice multiple hours with that person unless you know them really well.

Here's the good news. We've already made the connections for you. We’ve created a coaching service where you can do mock case interviews 1-on-1 with ex-interviewers from MBB firms. Start scheduling sessions today!

The IGotAnOffer team


The promise and the reality of gen AI agents in the enterprise

The evolution of generative AI (gen AI) has opened the door to great opportunities across organizations, particularly regarding gen AI agents—AI-powered software entities that plan and perform tasks or aid humans by delivering specific services on their behalf. So far, adoption at scale across businesses has faced difficulties because of data quality, employee distrust, and cost of implementation. In addition, capabilities have raced ahead of leaders’ capacity to imagine how these agents could be used to transform work.

However, as gen AI technologies progress and the next-generation agents emerge, we expect more use cases to be unlocked, deployment costs to decrease, long-tail use cases to become economically viable, and more at-scale automation to take place across a wider range of enterprise processes, employee experiences, and customer interfaces. This evolution will demand investing in strong AI trust and risk management practices and policies as well as platforms for managing and monitoring agent-based systems.

In this interview, McKinsey Digital’s Barr Seitz speaks with senior partners Jorge Amar and Lari Hämäläinen and partner Nicolai von Bismarck to explore the evolution of gen AI agents, how companies can and should implement the technology, and where the pools of value lie for the enterprise as a whole. They particularly explore what these developments mean for customer service. An edited transcript of the conversation follows.

Barr Seitz: What exactly is a gen AI agent?


Lari Hämäläinen: When we talk about gen AI agents, we mean software entities that can orchestrate complex workflows, coordinate activities among multiple agents, apply logic, and evaluate answers. These agents can help automate processes in organizations or augment workers and customers as they perform processes. This is valuable because it will not only help humans do their jobs better but also fully digitalize underlying processes and services.

For example, in customer services, recent developments in short- and long-term memory structures enable these agents to personalize interactions with external customers and internal users, and help human agents learn. All of this means that gen AI agents are getting much closer to becoming true virtual workers that can both augment and automate enterprise services in all areas of the business, from HR to finance to customer service. That means we’re well on our way to automating a wide range of tasks in many service functions while also improving service quality.

Barr Seitz: Where do you see the greatest value from gen AI agents?


Jorge Amar: We have estimated that gen AI enterprise use cases could yield $2.6 trillion to $4.4 trillion annually in value across more than 60 use cases (“The economic potential of generative AI: The next productivity frontier,” McKinsey, June 14, 2023). But how much of this value is realized as business growth and productivity will depend on how quickly enterprises can reimagine and truly transform work in priority domains—that is, user journeys, processes across an entire chain of activities, or a function.

Gen-AI-enabled agents hold the promise of accelerating the automation of a very long tail of workflows that would otherwise require inordinate amounts of resources to implement. And the potential extends even beyond these use cases: 60 to 70 percent of the work hours in today’s global economy could theoretically be automated by applying a wide variety of existing technology capabilities, including generative AI, but doing so will require a lot in terms of solutions development and enterprise adoption.

Consider customer service. Currently, the value of gen AI agents in the customer service environment is going to come either from a volume reduction or a reduction in average handling times. For example, in work we published earlier this year, we looked at 5,000 customer service agents using gen AI and found that issue resolution increased by 14 percent an hour, while time spent handling issues went down 9 percent (“The economic potential of generative AI: The next productivity frontier,” McKinsey, June 14, 2023).

About QuantumBlack, AI by McKinsey

QuantumBlack, McKinsey’s AI arm, helps companies transform using the power of technology, technical expertise, and industry experts. With thousands of practitioners at QuantumBlack (data engineers, data scientists, product managers, designers, and software engineers) and McKinsey (industry and domain experts), we are working to solve the world’s most important AI challenges. QuantumBlack Labs is our center of technology development and client innovation, which has been driving cutting-edge advancements and developments in AI through locations across the globe.

The other area for value is agent training. Typically, we see that it takes somewhere between six and nine months for a new agent to perform on par with more tenured peers. With this technology, we see that time come down to three months, in some cases, because new agents have at their disposal a vast library of interventions and scripts that have worked in other situations.

Over time, as gen AI agents become more proficient, I expect to see them improve customer satisfaction and generate revenue. By supporting human agents and working autonomously, for example, gen AI agents will be critical not just in helping customers with their immediate questions but also beyond, be that selling new services or addressing broader needs. As companies add more gen AI agents, costs are likely to come down, and this will open up a wider array of customer experience options for companies, such as offering more high-touch interactions with human agents as a premium service.

Barr Seitz: What are the opportunities you are already seeing with gen AI agents?

Jorge Amar: Customer care will be one of the first but definitely not the only function with at-scale AI agents. Over the past year, we have seen a lot of successful pilots with gen AI agents helping to improve customer service functions. For example, you could have a customer service agent who is on the phone with a customer and receives help in real time from a dedicated gen AI agent that is, for instance, recommending the best knowledge article to refer to or what the best next steps are for the conversation. The gen AI agent can also give coaching on behavioral elements, such as tone, empathy, and courtesy.

It used to be the case that dedicating an agent to an individual customer at each point of their sales journey was cost-prohibitive. But, as Lari noted, with the latest developments in gen AI agents, now you can do it.


Nicolai von Bismarck: It’s worth emphasizing that gen AI agents not only automate processes but also support human agents. One thing that gen AI agents are especially good at, for example, is helping customer service representatives get personalized coaching, not only from a hard-skill perspective but also in soft skills like understanding the context of what is being said. We estimate that applying generative AI to customer care functions could increase productivity by 30 to 45 percent (“The economic potential of generative AI: The next productivity frontier,” McKinsey, June 14, 2023).

Jorge Amar: Yes, and in other cases, gen AI agents assist the customer directly. A digital sales assistant can assist the customer at every point in their decision journey by, for example, retrieving information or providing product specs or cost comparisons—and then remembering the context if the customer visits, leaves, and returns. As those capabilities grow, we can expect these gen AI agents to generate revenue through upselling.

[For more on how companies are using gen AI agents, see the sidebar, “A closer look at gen AI agents: The Lenovo experience.”]

Barr Seitz: Can you clarify why people should believe that gen AI agents are a real opportunity and not just another false technology promise?

A closer look at gen AI agents: The Lenovo experience

Three leaders at Lenovo—Solutions and Services Group chief technology officer Arthur Hu, COO and head of strategy Linda Yao, and Digital Workplace Solutions general manager Raghav Raghunathan—discuss with McKinsey senior partner Lari Hämäläinen and McKinsey Digital’s Barr Seitz how the company uses generative AI (gen AI) agents.

Barr Seitz: What existing gen AI agent applications has Lenovo been running and what sort of impact have you seen from them?


Arthur Hu: We’ve focused on two main areas. One is software engineering. It’s the low-hanging fruit to help our people enhance speed and quality of code production. Our people are already getting 10 percent improvements, and we’re seeing that increase to 15 percent as teams get better at using gen AI agents.

The second one is about support. We have hundreds of millions of interactions with our customers across online, chat, voice, and email. We’re applying LLM [large language model]-enhanced bots to address customer issues across the entire customer journey and are seeing some great improvements already. We believe it’s possible to address as much as 70 to 80 percent of all customer interactions without needing to pull in a human.


Linda Yao: With our gen AI agents helping support customer service, we’re seeing double-digit productivity gains on call handling time. And we’re seeing incredible gains in other places too. We’re finding that marketing teams, for example, are cutting the time it takes to create a great pitch book by 90 percent and also saving on agency fees.

Barr Seitz: How are you getting ready for a world of gen AI agents?

Linda Yao: I was working with our marketing and sales training teams just this morning as part of a program to develop a learning curriculum for our organization, our partners, and our key customers. We’re figuring out what learning should be at all levels of the business and for different roles.

Arthur Hu: On the tech side, employees need to understand what gen AI agents are and how they can help. It’s critical to be able to build trust or they’ll resist adopting it. In many ways, this is a demystification exercise.


Raghav Raghunathan: We see gen AI as a way to level the playing field in new areas. You don’t need a huge talent base now to compete. We’re investing in tools and workflows to allow us to deliver services with much lower labor intensity and better outcomes.

Barr Seitz: What sort of learning programs are you developing to upskill your people?

Linda Yao: The learning paths for managers, for example, focus on building up their technical acumen, understanding how to change their KPIs because team outputs are changing quickly. At the executive level, it’s about helping leaders develop a strong understanding of the tech so they can determine what’s a good use case to invest in, and which one isn’t.

Arthur Hu: We’ve found that as our software engineers learn how to work with gen AI agents, they go from basically just chatting with them for code snippets to developing much broader thinking and focus. They start to think about changing the software workflow, such as working with gen AI agents on ideation and other parts of the value chain.

Raghav Raghunathan: Gen AI provides an experiential learning capability that’s much more effective. They can prepare sales people for customer interactions or guide them during sales calls. This approach is having a much greater impact than previous learning approaches. It gives them a safe space to learn. They can practice their pitches ahead of time and learn through feedback in live situations.

Barr Seitz: How do you see the future of gen AI agents evolving?

Linda Yao: In our use cases to date, we’ve refined gen AI agents so they act as a good assistant. As we start improving the technology, gen AI agents will become more like deputies that human agents can deploy to do tasks. We’re hoping to see productivity improvements, but we expect this to be a big improvement for the employee experience. These are tasks people don’t want to do.

Arthur Hu: There are lots of opportunities, but one area we’re exploring is how to use gen AI to capture discussions and interactions, and feed the insights and outputs into our development pipeline. There are dozens of points in the customer interaction journey, which means we have tons of data to mine to understand complex intent and even autogenerate new knowledge to address issues.

Jorge Amar: These are still early days, of course, but the kinds of capabilities we’re seeing from gen AI agents are simply unprecedented. Unlike past technologies, for example, gen AI not only can theoretically handle the hundreds of millions of interactions between employees and customers across various channels but also can generate much higher-quality interactions, such as delivering personalized content. And we know that personalized service is a key driver of better customer service. There is a big opportunity here because we found in a survey of customer care executives we ran that less than 10 percent of respondents in North America reported greater-than-expected satisfaction with their customer service performance (“Where is customer care in 2024?,” McKinsey, March 12, 2024).

Lari Hämäläinen: Let me take the technology view. This is the first time where we have a technology that is fitted to the way humans interact and can be deployed at enterprise scale. Take, for example, the IVR [interactive voice response] experiences we’ve all suffered through on calls. That’s not how humans interact. Humans interact in an unstructured way, often with unspoken intent. And if you think about LLMs [large language models], they were basically created from their inception to handle unstructured data and interactions. In a sense, all the technologies we applied so far to places like customer service worked on the premise that the customer is calling with a very structured set of thoughts that fit predefined conceptions.

Barr Seitz: How has the gen AI agent landscape changed in the past 12 months?

Lari Hämäläinen: The development of gen AI has been extremely fast. In the early days of LLMs, some of their shortcomings, like hallucinations and relatively high processing costs, meant that models were used to generate pretty basic outputs, like providing expertise to humans or generating images. More complex options weren’t viable. For example, consider that in the case of an LLM with just 80 percent accuracy applied to a task with ten related steps, the cumulative accuracy rate would be just 11 percent.
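As a quick back-of-the-envelope check of that figure (assuming, purely for illustration, that the ten steps succeed or fail independently):

```python
# Cumulative accuracy of a 10-step chain when each step is 80% accurate.
step_accuracy = 0.80
steps = 10
end_to_end = step_accuracy ** steps
print(f"{end_to_end:.3f}")  # prints 0.107, i.e. roughly 11 percent
```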

Today, LLMs can be applied to a wider variety of use cases and more complex workflows because of multiple recent innovations. These include advances in the LLMs themselves in terms of their accuracy and capabilities, innovations in short- and long-term memory structures, developments in logic structures and answer evaluation, and frameworks to apply agents and models to complex workflows. LLMs can evaluate and correct “wrong” answers so that you can have much higher accuracy. With an experienced human in the loop to handle cases that are identified as tricky, then the joint human-plus-machine outcome can generate great quality and great productivity.

Finally, it’s worth mentioning that a lot of gen AI applications beyond chat have been custom-built in the past year by bringing different components together. What we are now seeing is the standardization and industrialization of frameworks to become closer to “packaged software.” This will speed up implementation and improve cost efficiency, making real-world applications even more viable, including addressing the long-tail use cases in enterprises.

Barr Seitz: What sorts of hurdles are you seeing in adopting the gen AI agent technology for customer service?

Nicolai von Bismarck: One big hurdle we’re seeing is building trust across the organization in gen AI agents. At one bank, for example, they knew they needed to cut down on wrong answers to build trust. So they created an architecture that checks for hallucinations. Only when the check confirms that the answer is correct is it released. And if the answer isn’t right, the chatbot would say that it cannot answer this question and try to rephrase it. The customer is then able to either get an answer to their question quickly or decide that they want to talk to a live agent. That’s really valuable, as we find that customers across all age groups — even Gen Z — still prefer live phone conversations for customer help and support.
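The bank's architecture is not described in detail, but the gating pattern Nicolai outlines can be sketched roughly as below. `generate_draft_answer` and `passes_hallucination_check` are hypothetical stand-ins for an LLM call and a verification step (for example, checking a draft against retrieved policy documents or a second "judge" model); they are not part of any real vendor API.

```python
# Rough sketch of the "verify before release" pattern described above.
# Both helper functions are hypothetical placeholders, not a real API.

def generate_draft_answer(question: str, attempt: int) -> str:
    # Placeholder for an LLM call; a retry might rephrase the prompt.
    return f"Draft answer to '{question}' (attempt {attempt})"

def passes_hallucination_check(question: str, answer: str) -> bool:
    # Placeholder for a grounding check against source documents or a
    # second model; returns False here so the fallback path is visible.
    return False

def answer_or_escalate(question: str, max_attempts: int = 2) -> str:
    """Release an answer only if it passes the check; otherwise hand off."""
    for attempt in range(1, max_attempts + 1):
        draft = generate_draft_answer(question, attempt)
        if passes_hallucination_check(question, draft):
            return draft  # verified answer is released to the customer
    # No verified answer: say so and offer a live agent instead.
    return ("I'm not able to answer that reliably. "
            "Would you like to speak with a live agent?")

print(answer_or_escalate("What is the fee for a late payment?"))
```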

Jorge Amar: We are seeing very promising results, but these are in controlled environments with a small group of customers or agents. To scale these results, change management will be critical. That’s a big hurdle for organizations. It’s much broader than simply rolling out a new set of tools. Companies are going to need to rewire how functions work so they can get the full value from gen AI agents.

Take data, which needs to be in the right format and place for gen AI technologies to use it effectively. Almost 20 percent of organizations, in fact, see data as the biggest challenge to capturing value with gen AI (“The state of AI in 2023: Generative AI’s breakout year,” McKinsey, August 1, 2023). One example of this kind of issue could be a chatbot sourcing outdated information, like a policy that was used during COVID-19, in delivering an answer. The content might be right, but it’s hopelessly out of date. Companies are going to need to invest in cleaning and organizing their data.

In addition, companies need a real commitment to building AI trust and governance capabilities. These are the principles, policies, processes, and platforms that assure companies are not just compliant with fast-evolving regulations—as seen in the recent EU AI law and similar actions in many countries—but also able to keep the kinds of commitments that they make to customers and employees in terms of fairness and lack of bias. This will also require new learning, new levels of collaboration with legal and risk teams, and new technology to manage and monitor systems at scale.

Change needs to happen in other areas as well. Businesses will need to build extensive and tailored learning curricula for all levels of the customer service function—from managers who will need to create new KPIs and performance management protocols to frontline agents who will need to understand different ways to engage with both customers and gen AI agents.

The technology will need to evolve to be more flexible and develop a stronger life cycle capability to support gen AI tools, what we’d call MLOps [machine learning operations] or, increasingly, gen AI Ops [gen AI operations]. The operating model will need to support small teams working iteratively on new service capabilities. And adoption will require sustained effort and new incentives so that people learn to trust the tools and realize the benefits. This is particularly true with more tenured agents, who believe their own skills cannot be augmented or improved on with gen AI agents. For customer operations alone, we’re talking about a broad effort here, but with more than $400 billion of potential value from gen AI at stake, it’s worth it (“The economic potential of generative AI: The next productivity frontier,” McKinsey, June 14, 2023).

Barr Seitz: Staying with customer service, how will gen AI agents help enterprises?

Jorge Amar: This is a great question, because we believe the immediate impact comes from augmenting the work that humans do even as broader automation happens. My belief is that gen AI agents can and will transform various corporate services and workflows. It will help us automate a lot of tasks that were not adding value while creating a better experience for both employees and customers. For example, corporate service centers will become more productive and have better outcomes and deliver better experiences.

In fact, we’re seeing this new technology help reduce employee attrition. As gen AI becomes more pervasive, we may see an emergence of more specialization in service work. Some companies and functions will lead adoption and become fully automated, and some may differentiate by building more high-touch interactions.

Nicolai von Bismarck: As an example, we’re seeing this idea in practice at one German company, which is implementing an AI-based learning and coaching engine. And it’s already seeing a significant improvement in the employee experience as measured while it’s rolling this out, both from a supervisor and employee perspective, because the employees feel that they’re finally getting feedback that is relevant to them. They’re feeling valued, they’re progressing in their careers, and they’re also learning new skills. For instance, instead of taking just retention calls, they can now take sales calls. This experience is providing more variety in the work that people do and less dull repetition.

Lari Hämäläinen: Let me take a broader view. We had earlier modeled a midpoint scenario in which 50 percent of today’s work activities could be automated, with that midpoint occurring around 2055. But the technology is evolving so much more quickly than anyone had expected—just look at the capabilities of some LLMs that are approaching, and even surpassing, in certain cases, average human levels of proficiency. The innovations in gen AI have helped accelerate that midpoint scenario by about a decade. And it’s going to keep getting faster, so we can expect the adoption timeline to shrink even further. That’s a crucial development that every executive needs to understand.

Jorge Amar is a senior partner in McKinsey’s Miami office, Lari Hämäläinen is a senior partner in the Seattle office, and Nicolai von Bismarck is a partner in the Boston office. Barr Seitz is director of global publishing for McKinsey Digital and is based in the New York office.



  • Open access
  • Published: 20 May 2024

Predictive modelling and identification of key risk factors for stroke using machine learning

  • Ahmad Hassan   ORCID: orcid.org/0000-0001-6515-712X 1 ,
  • Saima Gulzar Ahmad   ORCID: orcid.org/0000-0002-8820-0570 1 ,
  • Ehsan Ullah Munir   ORCID: orcid.org/0000-0001-7838-0291 1 ,
  • Imtiaz Ali Khan   ORCID: orcid.org/0000-0001-7624-1319 2 &
  • Naeem Ramzan   ORCID: orcid.org/0000-0002-5088-1462 3  

Scientific Reports volume 14, Article number: 11498 (2024)


Strokes are a leading global cause of mortality, underscoring the need for early detection and prevention strategies. However, addressing hidden risk factors and achieving accurate prediction become particularly challenging in the presence of imbalanced and missing data. This study encompasses three imputation techniques to deal with missing data. To tackle data imbalance, it employs the synthetic minority oversampling technique (SMOTE). The study begins with a baseline model and subsequently employs an extensive range of advanced models. This study thoroughly evaluates the performance of these models by employing k-fold cross-validation on various imbalanced and balanced datasets. The findings reveal that age, body mass index (BMI), average glucose level, heart disease, hypertension, and marital status are the most influential features in predicting strokes. Furthermore, a Dense Stacking Ensemble (DSE) model is built upon previous advanced models after fine-tuning, with the best-performing model as a meta-classifier. The DSE model demonstrated over 96% accuracy across diverse datasets, with an AUC score of 83.94% on the imbalanced imputed dataset and 98.92% on the balanced one. This research underscores the remarkable performance of the DSE model compared with previous research on the same dataset. It highlights the model's potential for early stroke detection to improve patient outcomes.
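The exact base learners, tuning, and meta-classifier of the Dense Stacking Ensemble are not reproduced here; the sketch below only illustrates, with a synthetic stand-in dataset and illustrative model choices, how the ingredients named in the abstract (imputation, SMOTE oversampling, a stacking ensemble with a meta-classifier, and k-fold cross-validation) can be wired together with scikit-learn and imbalanced-learn.

```python
# Minimal sketch of the pipeline described in the abstract: impute missing
# values, oversample the minority class with SMOTE, and evaluate a stacking
# ensemble with k-fold cross-validation. Base learners and meta-classifier
# here are illustrative choices, not the paper's exact DSE configuration.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # keeps SMOTE inside each CV fold
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the stroke dataset: imbalanced, with missing values.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan  # ~5% missing entries

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-classifier
)
pipeline = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),
    ("smote", SMOTE(random_state=42)),   # applied to training folds only
    ("model", stack),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting SMOTE inside the pipeline matters: oversampling then happens only on the training portion of each fold, so the reported AUC is not inflated by synthetic copies of validation samples.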


Introduction

Stroke, a devastating medical condition, is a leading cause of mortality worldwide. It occurs when the blood supply to the brain is interrupted or reduced, impairing brain functions 1 . As per the World Stroke Organization (WSO), one in four individuals over the age of 25 faces the possibility of experiencing a stroke during their lifetime 2 . Globally, stroke is the second most common cause of death and the third most prevalent cause of impairment in adults, making it a major factor in both death and disability 3 . The significant burden this condition places on people, families, and healthcare systems highlights the need for precise and timely prediction techniques to enhance patient outcomes 4 .

In the field of medicine, machine learning has become a powerful technology with the potential to transform stroke prevention and prediction 5 , 6 , 7 . Machine learning models use large datasets and sophisticated algorithms to identify hidden risk factors, forecast outcomes, and offer tailored strategies for treatment 8 . Stroke prediction is a vital area of research in the medical field; however, several problems and issues still need to be resolved 9 , 10 . The accuracy of predictive models is one of the main issues: although machine learning models have shown potential in stroke prediction, factors such as data quality, the choice of features, and the choice of algorithm can impact how well the models perform 11 . To ensure these models' dependability and efficacy in predicting strokes, it is crucial to assess and validate these factors carefully 12 . Another critical concern is the handling of missing data. Prediction model performance can be severely impacted by incomplete data, producing erroneous or biased outcomes 13 . Appropriate data imputation approaches are needed to handle missing data and increase the precision of prediction models 14 .
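The three imputation techniques the study compares are not named in this excerpt; the snippet below simply sketches three commonly used options in scikit-learn (mean imputation, k-nearest-neighbour imputation, and iterative model-based imputation) on a toy array with missing entries.

```python
# Illustrative comparison of three common imputation strategies; the
# techniques actually used in the study may differ.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

X = np.array([[29.0,  85.0],
              [31.4, np.nan],   # missing value in the second column
              [np.nan, 210.0],  # missing value in the first column
              [27.1, 140.0],
              [35.6,  95.0]])

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("knn", KNNImputer(n_neighbors=2)),
                      ("iterative", IterativeImputer(random_state=0))]:
    print(name)
    print(imputer.fit_transform(X))
```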

Data imbalance is also a concern in stroke prediction 15 . Because stroke cases are rare, stroke datasets frequently contain imbalanced classes, with most instances being non-stroke cases 16 . This imbalance can result in biased models that favour the majority class and ignore the minority class, resulting in low predictive accuracy for stroke cases. To address this issue and increase the effectiveness of prediction models, several oversampling and undersampling methods are employed, the most popular of which is SMOTE 17 , 18 . Furthermore, due to ethical considerations, it is challenging to obtain stroke prediction datasets, especially with regard to patient privacy 19 . Predictive models that employ sensitive health data must follow strong privacy standards and protect patients' rights and autonomy 20 , 21 , 22 .
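As a minimal illustration of how SMOTE rebalances a class distribution like the one described (mostly non-stroke cases), assuming a synthetic stand-in for the real data:

```python
# SMOTE oversamples the minority class with interpolated synthetic
# examples until the class counts are balanced.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.95, 0.05], random_state=0)
print("before:", Counter(y))                      # heavily imbalanced
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))                  # roughly 1:1
```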

Overall, stroke prediction is a complex area of study that demands careful evaluation of numerous challenges and concerns 23 . However, by developing innovative approaches and employing rigorous evaluation methods, the potential of machine learning in stroke prediction can be fully realized 24 . These approaches and methods can improve patient outcomes and lower the societal and individual burden of stroke 25 . Addressing stroke prediction difficulties such as accuracy, missing data, data imbalance, and interpretability is critical to reaching the full potential of machine learning in this domain 26 .

The alarming statistics and various issues highlight the urgent need for effective stroke prevention and prediction strategies. This research endeavour delves into the realm of advanced machine learning models to predict strokes and identify key risk factors. By harnessing the power of these models, it aims to enhance early detection, minimize the impact of strokes, and ultimately improve patient outcomes. This study presents a comprehensive analysis of various machine-learning models for stroke prediction. It makes several contributions and provides previously unknown insights, including:

Exploring various data imputation techniques and addressing data imbalance issues in order to enhance the accuracy and robustness of stroke prediction models.

Identifying crucial features for stroke prediction and uncovering previously unknown risk factors, giving a comprehensive understanding of stroke risk assessment.

Creating an augmented dataset incorporating important key risk factor features using the imputed datasets, enhancing the effectiveness of stroke prediction models.

Assessing the effectiveness of advanced machine learning models across different datasets and creating a robust Dense Stacking Ensemble model for stroke prediction.

The key contribution is showcasing the enhanced predictive capabilities of the model in accurately identifying and testing strokes, surpassing the performance of prior studies that utilized the same dataset.

These contributions collectively enhance the overall understanding of stroke prediction and of the key contributing factors for stroke. They highlight the potential of machine learning models in accurately identifying individuals at risk of strokes. The literature review can be found in Section “Literature review”. Section “Dataset and preprocessing” examines the dataset used, the challenges that arise during data preparation, and the preprocessing strategies employed. Section “Data modelling” provides an overview of the main research workflow, outlines the approach to its execution, and examines the models used for data modelling. The machine learning algorithms used for forecasting are discussed in Section “Machine learning algorithms”. Section “Results” presents the prediction results obtained using the different machine learning models and approaches, along with a discussion subsection. Finally, Section “Conclusion” concludes the findings and offers recommendations for future research endeavours.

Literature review

The field of stroke prediction has attracted numerous contributions from various authors over an extended period, using a variety of datasets. This paper focuses on recent contributions that utilize the same dataset as the present study, since these also serve as the basis for evaluation. Several machine learning models, including Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression, have been used to predict stroke. The authors of one study also propose a Minimal Genetic Folding (MGF) model 27 for predicting the probability of stroke, achieving an accuracy of 83.2%. The MGF classifier is the most accurate, surpassing the area under the curve (AUC) scores of the other specified kernels. The research supports the notion that a general MGF kernel could differentiate between various stages of stroke recovery, but more research is needed. The study's potential limitations include the oversampling method, which might have affected how well the MGF classifier performed.

Another study proposes a strategy for predicting stroke using a Logistic Regression algorithm, employing preprocessing techniques such as SMOTE, feature selection, and outlier handling to enhance the model's performance 28 . By analyzing factors such as blood pressure, body mass, heart conditions, age, previous smoking status, prior history of stroke, and glucose levels, the authors achieve an accuracy of 86% in stroke prediction, outperforming other LR-based models. The research emphasizes the capacity of machine learning methods to reduce the adverse impacts of stroke and enable early detection. Multiple physiological attributes are also used with various machine learning techniques to forecast strokes, including Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbours, Support Vector Machine, and Naive Bayes 29 . The findings indicate that Naïve Bayes achieves the highest accuracy, reaching around 82%, suggesting that machine learning models can aid early stroke identification in the future.

In another work, the proposed model for predicting strokes achieves 94% accuracy and outperforms other algorithms, including Naive Bayes, Logistic Regression, Support Vector Machine, and Decision Tree 30 . The authors also use Ensembled Naive Bayes and Ensembled Decision Tree. Overall, the article's contributions lie in developing an integrated learning model and reorganizing the fixed structure of the developed algorithm.

The researchers employ machine learning algorithms for predicting stroke and evaluate their performance based on F1 score, recall, accuracy, and precision 31 . Preprocessing steps include handling missing values, one hot encoding, and feature scaling. The authors use three classifiers, Support Vector Machines, Decision Trees, and Logistic Regression, to train on the dataset and compare their results. The study emphasizes the value of early stroke prediction, and the paper's contribution lies in preparing the dataset using machine learning algorithms. The proposed model achieves an accuracy of 95.49% and can be used for early stroke prediction in real-world applications.

In another study, the authors put forth a predictive model for stroke detection using five different algorithms, i.e. K-Nearest Neighbours, Decision Tree, Random Forest, Support Vector Machine, and Logistic Regression 32 . A comparative analysis of the five models reveals that Random Forest has the highest accuracy of 95.5%. The authors conclude that Random Forest is the model with the highest accuracy and the fewest false negatives, and they use Tkinter to construct a Graphical User Interface (GUI) to make the application more convenient to use. They suggest that more medical attributes should be considered in future work to improve the model's performance. In a further study, four distinct models are utilized, including Logistic Regression, Voting classifier, Decision Tree, and Random Forest 33 . Random Forest performs best, achieving the highest classification accuracy of 96%. The future scope of that research involves using more extensive datasets and different machine learning methods, such as AdaBoost, SVM, and bagging, to further enhance prediction reliability. The authors suggest that machine learning can help patients receive early stroke treatment and enhance their quality of life.

In another article 34 , the authors explore the performance of Logistic Regression and Random Forest algorithms in predicting strokes using a preprocessed stroke dataset. The Random Forest algorithm outperforms Logistic Regression in terms of accuracy. The study also discusses the bias and variance of the models and their impact on the results. Although the work has some limitations, such as using only two models, it provides valuable insight into stroke prediction research. A recent study suggests an ensemble RXLM model to predict stroke using Random Forest, XGBoost, and LightGBM 35 . The dataset is pre-processed using the KNN imputer technique, one-hot encoding, and SMOTE. The researchers fine-tune the hyperparameters of the ML algorithms by employing a random search technique to achieve optimal parameter values. The accuracy of the suggested ensemble RXLM model is 96.34%.

Authors in another study propose a machine learning model with K-Nearest Neighbours, Decision Tree, and Logistic Regression 36 . Exploratory data analysis is applied for preprocessing, and the SMOTE technique is used to balance the dataset. Finally, a cloud-based mobile app is developed, which can gather user data for analysis and accurately warn a person of the likelihood of a stroke with an accuracy of 96%. Future work will focus on analyzing the dataset using deep learning methods to enhance accuracy. Another group of authors explores ten machine-learning models to predict strokes 37 . The employed models include Gaussian Naive Bayes, Bernoulli Naive Bayes, Gradient Boosting, Stochastic Gradient Descent, K-nearest neighbours (KNN), support vector machine (SVM), Decision Tree, Random Forest, Logistic Regression, and MLP (Multi-Layer Perceptron). The study emphasizes the necessity of early stroke diagnosis to lessen its effects; with a 94% accuracy rate, the KNN algorithm outperforms the other models.

The literature review explores various machine learning models for stroke prediction, including Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression. The studies also propose new models, highlighting the importance of early detection and achieving accuracy rates ranging from 82 to 96%. However, limitations such as limited model selection, feature selection/engineering, and dataset size are identified. This research paper addresses these deficiencies by conducting a comprehensive analysis of advanced machine learning models, identifying key risk factors along with their importance, and evaluating performance on a larger augmented dataset.

Dataset and preprocessing

The stroke prediction dataset was created by McKinsey & Company, and Kaggle is the source of the data used in this study 38 , 39 . The dataset is in comma separated values (CSV) format and includes demographic and health-related information about individuals and whether or not they have had a stroke. The dataset originally comprised a total of 29,072 records, of which only 30% is publicly accessible while the remaining 70% is designated as private 40 . The source of the dataset is stated as confidential. The data originates from medical records associated with 5110 individuals residing in Bangladesh. The dataset underwent preprocessing procedures, which involved modifications to the original dataset sourced from Electronic Health Records (EHR) managed by McKinsey & Company 41 . The data has some missing values, and there is an imbalance between the number of people who have had a stroke and those who have not. The aim is to address these issues using different data imputation techniques and oversampling methods.

Exploratory analysis

The dataset used for stroke prediction consists of 5110 observations, each containing 12 attributes. Out of these attributes, 10 are considered relevant for the prediction task. These attributes provide valuable patient information, including their identification number, age, gender, hypertension, marital status, occupation, residence type, presence of heart disease, average glucose level, BMI, smoking habits, and stroke status.

A detailed examination of stroke occurrences concerning different features is presented in Fig.  1 , with sub-figures. In sub-figure (Fig.  1 a), it is visible that there is a slight increase in the number of strokes among females when compared to males. Moving on to sub-figure (Fig.  1 b), a rising trend in stroke cases is observed as individuals age, with the highest incidence observed around the age of 80. Sub-figure (Fig.  1 c) reveals that individuals with heart disease are more vulnerable to experiencing strokes. Marital status is explored in sub-figure (Fig.  1 d), which suggests that married individuals may have a slightly higher incidence of strokes than unmarried individuals. The comparison between stroke occurrences in urban and rural areas is depicted in sub-figure (Fig.  1 e), indicating no significant difference between these groups regarding stroke risk. In sub-figure (Fig.  1 f), the relationship between average glucose levels and stroke risk is illustrated. It shows that individuals with average glucose levels falling within 60–120 and 190–230 are at an increased risk of experiencing strokes. Hypertension is emphasized in sub-figure (Fig.  1 g). It demonstrates a higher incidence of strokes among individuals diagnosed with hypertension.

figure 1

Distribution of features concerning stroke occurrence. ( a ) through ( j ) present diverse aspects of stroke occurrences, revealing nuanced patterns. ( a ) and ( b ) demonstrate gender and age-related trends. ( c ) associates strokes with heart disease, while ( d ) suggests marital status correlations. ( e ) explores urban–rural disparities. ( f ) and ( g ) show links to average glucose levels and hypertension. ( h ) relates BMI levels to stroke incidence. ( i ) emphasizes the role of smoking history, and ( j ) explores potential occupational influences on stroke likelihood.

The relationship between BMI and stroke occurrence is examined in sub-figure (Fig.  1 h). It reveals that individuals with a BMI ranging from 20 to 40 are more prone to strokes. Smoking habits are examined in sub-figure (Fig.  1 i), where it is observed that former or never smokers are more likely to suffer from strokes than current smokers. This finding highlights the importance of considering smoking history when assessing an individual's stroke risk. Lastly, shifting the focus to occupation, sub-figure (Fig.  1 j) indicates that individuals working in private or self-employed sectors may have a greater likelihood of experiencing strokes compared to those in other occupations. This observation may be attributed to various factors such as stress levels, working conditions, and lifestyle differences among different occupational groups. Overall, the comprehensive analysis of stroke occurrences concerning different features provides valuable insights into the dataset and aids in understanding the factors contributing to stroke risk.

The dataset used in this research contains three numerical features: average glucose level, BMI, and age, while the remaining features are categorical. To assess the presence of outliers in the numerical features, box plots have been constructed and displayed in Fig.  2 . The plots illustrate a notable presence of outliers in the average glucose level (Fig.  2 b) and BMI metrics (Fig.  2 c), emphasizing the need for meticulous data preprocessing. As depicted in Fig.  3 , the distributions of individual numerical attributes diverge notably between those with and without a stroke. The discernible non-uniformity in these distributions underscores the importance of these features as promising indicators for stroke prediction.

figure 2

Box plots of numerical features to detect outliers. ( a ), ( b ) and ( c ) present the box plots for age, BMI, and average glucose level to assess the presence of outliers.

figure 3

Distribution of numerical attributes with stroke and each other. ( a ), ( b ) and ( c ) present the distribution plots for age, BMI, and average glucose level against each other based on stroke occurrence.

A more in-depth exploration of these numerical attributes holds the promise of unravelling their influence on stroke prediction, offering invaluable insights to enhance the accuracy and efficacy of the predictive models. To preprocess the data, outliers are handled using the robust scaler method, and standard scaling is applied for consistent feature ranges. One-hot encoding is also utilized to convert categorical variables into binary values. Figure  4 illustrates the correlation between the encoded features, offering valuable insights into the relationships between variables. This analysis helps to uncover significant associations and dependencies among the features, enhancing the understanding of the underlying patterns and dynamics within the dataset.

figure 4

Features correlation heatmap for the dataset. Color intensity indicates the strength and direction of correlations, aiding in the identification of potential patterns and dependencies in the data.
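As a rough illustration of this preprocessing stage, the sketch below chains robust and standard scaling for the numerical columns and one-hot encodes the categorical ones with scikit-learn. The column names and the exact combination of the two scalers are assumptions based on the description above, not the authors' code.

```python
# Hypothetical preprocessing sketch for the stroke dataframe `df`.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler, StandardScaler

numeric_cols = ["age", "bmi", "avg_glucose_level"]            # assumed names
categorical_cols = ["gender", "ever_married", "work_type",
                    "Residence_type", "smoking_status"]

numeric_pipe = Pipeline([
    ("robust", RobustScaler()),      # dampens the influence of outliers
    ("standard", StandardScaler()),  # gives features a consistent range
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# X = preprocess.fit_transform(df.drop(columns=["stroke"]))
# y = df["stroke"].values
```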

Missing data

The dataset is now inspected for missing values before proceeding with the stroke prediction analysis. The sparsity/nullity matrix for the dataset is shown in Fig.  5 . It can be observed that the BMI feature has some missing values that need to be handled before proceeding with the analysis. As illustrated in Table 1 , the dataset overview provides insights into the total number of stroke cases and the count of entries with missing BMI values. The table shows that out of the 5110 total cases, there are 201 cases with missing BMI values. Consequently, if rows with missing values were dropped, the data loss would be 3.93%, almost 4% of the dataset, which could lead to the loss of valuable information.

figure 5

Sparsity matrix for the dataset. The empty spaces found in the corresponding column signify the presence of missing data values for the specific feature.

The naive response to missing values would be removing all those rows. But to avoid data loss through list-wise deletion, this study will use imputation techniques to fill in the missing values. Imputation is a method of replacing missing data with an appropriate approximation based on available information. However, if not chosen carefully, imputation can introduce assumptions or biases. Therefore, this study explores well-established techniques specifically chosen for their ability to mitigate these potential issues. Later, the performance of these imputation techniques will be compared with the standard approach of dropping incomplete observations. This study explores three different imputation techniques to maximize data utility while maintaining data integrity and minimizing bias.

Mean imputation

A commonly employed technique in data preprocessing involves replacing missing values in a dataset with the mean value of the respective variables within the feature column. This approach helps maintain data integrity and ensures the resulting dataset is complete and ready for analysis. Imputing missing values with the mean minimizes the impact of incomplete data on subsequent analyses. Additionally, this method enables us to use as much available information as possible, contributing to more accurate and robust results.
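A minimal sketch of this step with scikit-learn is shown below; the BMI column name is an assumption.

```python
# Replace missing BMI values with the column mean.
from sklearn.impute import SimpleImputer

mean_imputer = SimpleImputer(strategy="mean")
df[["bmi"]] = mean_imputer.fit_transform(df[["bmi"]])  # df: stroke dataframe
```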

Multivariate imputation using chained equations (MICE)

It is an advanced approach that surpasses single imputations. It employs multiple imputations, allowing a more robust estimation of missing values. The process involves a sequential regression technique, where each variable's missing values are estimated using information from other variables that have complete data. MICE significantly improves the accuracy and reliability of imputations, providing a comprehensive solution for handling missing values in datasets.
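A hedged sketch using scikit-learn's IterativeImputer, which implements the chained-equations idea, is given below; the iteration count and random seed are assumptions, as is restricting the imputation to the numerical columns.

```python
# MICE-style imputation: each incomplete feature is regressed on the others
# in a round-robin fashion until the imputed values stabilise.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

mice = IterativeImputer(max_iter=10, random_state=0)
numeric_cols = ["age", "bmi", "avg_glucose_level"]          # assumed names
df[numeric_cols] = mice.fit_transform(df[numeric_cols])
```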

Age group-based BMI imputation

To enhance the analysis, the individuals are classified into four age groups: 0–20, 21–40, 41–60, and 61–80. The strategy of imputing the mean BMI for each respective age group is employed to address missing values. This approach allows us to account for missing data while maintaining the integrity of the analysis. Additionally, this division into age groups enables a more nuanced understanding of the relationship between age and BMI, contributing to the overall accuracy and reliability of the findings.
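The pandas sketch below illustrates this strategy: each missing BMI is filled with the mean BMI of the record's age bin. The bin edges follow the four groups listed above; everything else is illustrative.

```python
# Impute missing BMI values with the mean BMI of the matching age group.
import pandas as pd

bins = [0, 20, 40, 60, 80]                    # 0-20, 21-40, 41-60, 61-80
labels = ["0-20", "21-40", "41-60", "61-80"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels,
                         include_lowest=True)

df["bmi"] = df.groupby("age_group")["bmi"].transform(
    lambda s: s.fillna(s.mean()))
df = df.drop(columns=["age_group"])           # drop the helper column
```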

Data imbalance

After addressing the missing data, the focus shifts to the data imbalance problem. There are far more non-stroke instances than stroke instances, making stroke cases the minority class. This imbalance poses a challenge in developing accurate predictive models and warrants specialized techniques to handle it effectively. An overview of the class label populations in the dataset is presented in Table 2 , revealing the presence of a minority class. Data imbalance can adversely affect model performance because the minority class is underrepresented. To mitigate the imbalance, oversampling and undersampling techniques are commonly employed. Undersampling, which reduces the number of instances from the majority class, is not considered feasible here, as it may hinder the model's capacity to learn patterns associated with stroke cases and compromise its predictive accuracy. Oversampling is deemed feasible because it raises the representation of the minority class so that the predictive model can learn it well; however, if not implemented carefully, it may introduce the risk of overfitting. This study addresses this potential pitfall by investigating the efficacy of the models in predicting strokes using both balanced and imbalanced datasets, facilitating a rigorous evaluation of the predictive models' performance under diverse data conditions. To address the class imbalance issue, SMOTE is assessed as a potential remedy.

Synthetic minority oversampling technique

SMOTE is a widely recognized oversampling method employed to increase the representation of minority samples in a dataset. To understand how it operates, consider a scenario where the training dataset consists of 's' samples and 'f' features. To enhance the representation of the minority class, a sample from the minority class is selected and its k nearest neighbours in the feature space are identified. A new synthetic data point is then generated by combining the original data point with one of its nearest neighbours; this is achieved by scaling the vector connecting the two points by a random number 'x' ranging from 0 to 1. Incorporating this synthetic data point into the existing dataset generates a fresh, augmented data point and helps address the class imbalance. Repeating this process balances the classes and improves the overall representation of the minority class 42 .
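The snippet below shows the usual imbalanced-learn call for SMOTE, with a comment restating the interpolation rule described above; k_neighbors=5 is the library default rather than a value reported in the paper.

```python
# SMOTE: for a minority sample x_i, pick one of its k nearest minority
# neighbours x_n and create x_new = x_i + x * (x_n - x_i), with x ~ U(0, 1).
from imblearn.over_sampling import SMOTE

smote = SMOTE(k_neighbors=5, random_state=42)
X_balanced, y_balanced = smote.fit_resample(X_train, y_train)
```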

Data modelling

A robust data modelling approach is essential to effectively analyze and predict stroke occurrences, encompassing raw data's systematic transformation and organization into a structured framework. Figure  6 depicts the data modelling pipeline utilized in this research, showcasing the various stages and methodologies employed. This pipeline enhances the data analysis and prediction approach's accuracy and efficiency.

figure 6

Pipeline for data modelling. Stroke prediction data modeling pipeline integrates techniques for missing and imbalanced data. Prediction models, from TabNet to NGBoost, undergo rigorous evaluation and testing, culminating in a Dense Stacking Ensemble (DSE) for enhanced and robust prediction results.

This research employs various techniques to handle missing and imbalanced data for stroke prediction. Mean, MICE, and age group-wise BMI mean imputation methods are used to handle missing values. To tackle the data imbalance issue, SMOTE is used to increase the representation of the minority class by generating synthetic samples. Additionally, outliers are handled using the robust scaler method and standard scaling is applied to ensure consistent feature ranges. Categorical variables are transformed into binary values through one-hot encoding. The predictive models encompass a baseline model followed by advanced models including TabNet, Logistic Regression with AGD (LR-AGD), Neural Network, Random Forest, Gradient Boosting, CatBoost, LightGBM, XGBoost, Balanced Bagging, and NGBoost. The dataset is divided into training and testing data using a 70:30 ratio. The models are evaluated using k-fold cross-validation on both balanced and imbalanced imputed datasets and on the augmented dataset, generating multiple analyses for each model. The trained models are also tested on the testing data to assess their generalization performance, and the performance of each model is compared to the standard approach of dropping incomplete observations from the original dataset. Finally, a Dense Stacking Ensemble (DSE) model is built on top of the fine-tuned models, with the best-performing model acting as the meta-classifier, and all models, including the DSE model, are ranked and analyzed using various performance metrics.
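To make the evaluation protocol concrete, the sketch below wires up the 70:30 hold-out split and per-model cross-validation described above; the stratified split, random seed, and accuracy scoring are assumptions.

```python
# Hypothetical evaluation loop: 70:30 split, k-fold scoring on the training
# part, then a final check on the held-out test part for every model.
from sklearn.model_selection import cross_val_score, train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

for name, model in models.items():   # models: {name: estimator}, built elsewhere
    cv_acc = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
    model.fit(X_train, y_train)
    print(f"{name}: k-fold mean={cv_acc.mean():.4f}, "
          f"test={model.score(X_test, y_test):.4f}")
```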

Augmentation of dataset

The primary objective is to identify the key factors contributing to stroke prediction. To accomplish this, the authors highlight essential features in Fig.  7 that demonstrate a positive impact on stroke prediction and their corresponding importance factors. This analysis provides valuable insights into the factors that play a vital role in accurately predicting strokes. This study incorporates information from the previous three imputed datasets for dataset augmentation, resulting in a larger dataset with 10,421 distinct instances.

figure 7

Features with positive importance factor for stroke prediction. Features of significant positive importance for stroke prediction include age, BMI, average glucose level, heart disease, hypertension, and marital status (ever-married) respectively.

The augmented dataset includes age, BMI, average glucose level, heart disease, hypertension, ever-married, and stroke label features. Interestingly, the findings align with another previously conducted comprehensive study that used the same dataset 43 , where the critical features identified for stroke prediction using the same dataset were the same, except for the inclusion of the "ever married" feature. It is noticed that the ever-married feature has a high frequency of stroke occurrences among those individuals who were or are married.
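A hedged reading of this augmentation step is sketched below: the three imputed datasets are reduced to the selected features, concatenated, and deduplicated. The dataframe names are placeholders, and the resulting row count depends on the underlying data, so this is only illustrative.

```python
# Build the augmented dataset from the three imputed datasets, keeping only
# the important features plus the stroke label.
import pandas as pd

key_features = ["age", "bmi", "avg_glucose_level", "heart_disease",
                "hypertension", "ever_married", "stroke"]

augmented = pd.concat(
    [df_mean[key_features], df_mice[key_features], df_agegroup[key_features]],
    ignore_index=True,
).drop_duplicates()   # keep distinct instances only
```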

K-fold cross validation

To assess how well the models perform, the dataset must be divided into training, validation, and testing data. Since no separate "unlabeled" test dataset is available, this study adopts a tenfold (K = 10) cross-validation method. The training dataset is divided into K parts; during each iteration, one part is designated as the validation dataset while the remaining parts are used for training the model. This process is repeated K times, allowing for a comprehensive evaluation of the model's performance. Performance metrics are recorded for each validation set, and after all iterations the metrics are averaged across all K folds, ensuring that each fold serves as a validation dataset exactly once 44 .
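A short sketch of this procedure with explicit folds is shown below; stratifying the folds so that the class ratio is preserved is an assumption, and X_train/y_train are taken to be NumPy arrays.

```python
# Tenfold cross-validation: every fold serves exactly once as validation data.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for train_idx, valid_idx in skf.split(X_train, y_train):
    fold_model = clone(model).fit(X_train[train_idx], y_train[train_idx])
    preds = fold_model.predict(X_train[valid_idx])
    fold_scores.append(accuracy_score(y_train[valid_idx], preds))
print("k-fold mean accuracy:", np.mean(fold_scores))
```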

Machine learning algorithms

The main objective is to develop an accurate and robust predictive model for stroke prediction. The authors begin by using a baseline model to establish a reference point for model performance. They then investigate different advanced classification models to determine the accuracy of these models in predicting stroke. After thorough fine-tuning, the authors construct a robust DSE model that leverages the best-performing model as a meta-classifier. The upcoming subsections will delve into the classification models, which are also used as base models, as well as the architecture of the DSE model.

Baseline model

This study employs Logistic Regression as the baseline model for stroke prediction. Logistic Regression is a statistical technique widely used for binary classification tasks. It estimates the probability of a binary outcome based on input features using the logistic function given in Eq. ( 1 ):

\(\hat{y} = \sigma \left( z \right) = \frac{1}{1 + e^{-z}}\) (1)

where z represents the linear combination of input features weighted by the corresponding coefficients. This model serves as an initial benchmark for evaluating the performance of more advanced classification models.

Advanced classification models

In the quest for creating a strong stroke prediction model, a variety of advanced contemporary classification models are carefully examined without fine-tuning and put to use. These models serve a dual purpose: first, they undergo rigorous evaluation for predictive accuracy, and second, they constitute the core elements of the DSE model, which employs a layered and efficient approach to predicting strokes.

TabNet is a supervised machine learning algorithm that operates on tabular data and employs a neural network architecture with attention-based feature selection and sequential decision steps. It is designed to handle structured data and can effectively capture complex relationships between input features to make accurate predictions for stroke classification.

In Eqs. ( 2 ) and ( 3 ), let X be the input feature matrix, y be the binary target variable (0 or 1) representing the stroke, and \(\hat{y}\) be the predicted stroke probabilities. The TabNet algorithm aims to find the optimal parameters \(\theta\) that minimize the loss function L .
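For reference, a minimal invocation of the open-source pytorch-tabnet implementation is shown below; the hyperparameters are library defaults rather than values reported in the paper, and the inputs are assumed to be NumPy arrays.

```python
# TabNet on the tabular stroke features via the pytorch-tabnet package.
from pytorch_tabnet.tab_model import TabNetClassifier

tabnet = TabNetClassifier()            # default architecture and attention settings
tabnet.fit(X_train, y_train,
           eval_set=[(X_valid, y_valid)],
           max_epochs=100, patience=20)
stroke_proba = tabnet.predict_proba(X_test)[:, 1]
```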

Logistic regression with AGD

Logistic Regression models the relationship between input variables and binary output. It utilizes the logistic function to estimate the probability of the outcome, making it suitable for binary classification tasks like stroke prediction. In this study, the logistic regression model is trained efficiently using the accelerated gradient descent (AGD) optimization technique. The model is limited to 100 maximum iterations during training.

In logistic regression with AGD, the model estimates the probability of stroke \(\hat{y}\) given the input features X using the logistic function, where \(\beta\) represents the model's coefficients, as given in Eq. ( 4 ):

\(\hat{y} = \sigma \left( X\beta \right) = \frac{1}{1 + e^{-X\beta}}\) (4)
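Since the exact AGD update is not spelled out, the following is only a plausible sketch of logistic regression trained with Nesterov-style accelerated gradient descent, capped at the stated 100 iterations; the learning rate and momentum schedule are assumptions.

```python
# Logistic regression fitted with Nesterov-accelerated gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lr_agd(X, y, lr=0.1, max_iter=100):
    n, d = X.shape
    beta = np.zeros(d)         # current coefficients
    lookahead = beta.copy()    # extrapolated point used for the gradient
    t = 1.0                    # momentum scalar
    for _ in range(max_iter):  # training capped at 100 iterations
        grad = X.T @ (sigmoid(X @ lookahead) - y) / n
        beta_next = lookahead - lr * grad
        t_next = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        lookahead = beta_next + ((t - 1) / t_next) * (beta_next - beta)
        beta, t = beta_next, t_next
    return beta

# Predicted probabilities: sigmoid(X_test @ fit_lr_agd(X_train, y_train))
```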

Neural network

Neural network is a powerful machine learning model that consists of interconnected nodes or "neurons" organized in layers. It is capable of learning complex patterns from data and making non-linear predictions. The neural network used in this research has five hidden layers with 24, 36, 48, 36, and 24 neurons, respectively. It is trained to recognize significant features related to stroke prediction and make accurate decisions based on them.

In Eq. ( 5 ), the neural network involves a series of calculations with weight matrices ( W ), biases ( b ), and activation functions ( \(\sigma\) ) in each layer. The output layer uses the sigmoid activation function to obtain the predicted stroke probabilities ( \(\widehat{y}\) ). The \({W}_{out}\) and \({b}_{out}\) are the weight matrix and bias vector of the output layer, respectively.

where in Eq. ( 6 ), the neural network aims to minimize the loss function L during training.
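The stated layer sizes are enough to sketch the network; the snippet below does so in Keras, with the hidden activation, optimizer, and loss chosen as plausible defaults rather than values taken from the paper.

```python
# Feed-forward network with five hidden layers of 24, 36, 48, 36 and 24 units.
import tensorflow as tf

def build_stroke_net(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(24, activation="relu"),
        tf.keras.layers.Dense(36, activation="relu"),
        tf.keras.layers.Dense(48, activation="relu"),
        tf.keras.layers.Dense(36, activation="relu"),
        tf.keras.layers.Dense(24, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # stroke probability
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```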

Random forest

The Random Forest model builds a collection of decision trees and combines their predictions to make final predictions. It utilizes random feature selection and bootstrapping to create diverse tree models. The number of estimators is denoted as N RF  = 100.

Gradient boosting

The Gradient Boosting model is a powerful predictive model that utilizes a combination of weak prediction models, commonly decision trees, to generate accurate predictions. This model iteratively enhances its predictions by fitting new models to the residuals of previous models. In this study, N GB  = 100 estimators are employed to optimize the performance of the Gradient Boosting model.

CatBoost

The Category Boosting (CatBoost) algorithm is specifically designed for categorical data. It utilizes gradient boosting and implements novel techniques to handle categorical variables effectively. The number of estimators for this model is N CB  = 100.

LightGBM

Light Gradient Boosting Machine is a gradient boosting framework that aims to provide high efficiency and speed. It uses a histogram-based approach for gradient boosting and incorporates features like leaf-wise tree growth and data parallelism. N LGBM  = 100 estimators are used.

XGBoost

Extreme Gradient Boosting is a gradient boosting algorithm known for its scalability and performance. It combines multiple weak prediction models and employs regularization techniques to prevent overfitting. For this model, N XGB  = 100 estimators are used.

Balanced bagging

Balanced Bagging is an ensemble learning algorithm that combines multiple classifiers by training them on different subsets of the original dataset. It explicitly addresses class imbalance issues by using sampling techniques to balance the class distribution. Five Random Forest Classifiers as base estimators are used.

NGBoost

Natural Gradient Boosting is a gradient boosting algorithm that focuses on probabilistic prediction and uncertainty estimation. It utilizes natural gradients and incorporates Bayesian methods for improved model calibration. For this model, N NGB  = 100 estimators and a learning rate of 0.01 are used.
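For orientation, the sketch below instantiates the models described in this section with the stated settings (100 estimators each, five Random Forest base estimators for Balanced Bagging, and a 0.01 learning rate for NGBoost); every argument not mentioned in the text is left at its library default and should be treated as an assumption. Note that recent imbalanced-learn versions use the estimator keyword rather than base_estimator.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from imblearn.ensemble import BalancedBaggingClassifier
from ngboost import NGBClassifier

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "CatBoost": CatBoostClassifier(iterations=100, verbose=0),
    "LightGBM": LGBMClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(n_estimators=100),
    # Five Random Forest classifiers as base estimators, as described above.
    "Balanced Bagging": BalancedBaggingClassifier(
        estimator=RandomForestClassifier(), n_estimators=5),
    "NGBoost": NGBClassifier(n_estimators=100, learning_rate=0.01),
}
```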

Dense stacking ensemble model

The cornerstone of the robust stroke prediction system is the DSE model, whose high-level architecture is visually depicted in Fig.  8 . The DSE model is meticulously crafted to optimize predictive accuracy and robustness. The DSE architecture integrates a range of fine-tuned classification models, each playing a vital role as a base model.

figure 8

Architecture of the dense stacking ensemble model. The dense stacking ensemble integrates fine-tuned base models using three distinct approaches, each incorporating the best-performing model as a meta-classifier.

Within this model, three distinct approaches are employed, each utilizing the best-performing model as a meta-classifier. These approaches are strategically integrated as base models in the final DSE model to enhance predictive accuracy and reliability, and they are explained in the following subsections.

Voting ensemble

The Voting ensemble approach operates by collecting predictions from multiple base models and making a collective prediction based on the most popular choice. This approach leverages the best-performing model as the meta-classifier, which means it gives more weight to the predictions of this model. By combining the insights from various models, the Voting ensemble aims to maximize the overall predictive power of the DSE model. It's like having a panel of experts vote on the most likely outcome, with the best expert's opinion carrying the most weight 45 .

Blending ensemble

In the Blending ensemble approach, a meta-classifier is trained using the predictions made by the base models. The best-performing model takes on the role of the meta-classifier in this approach. This model is skilled at blending the predictions of other models in a way that optimizes their collective predictive strength. The Blending ensemble essentially learns how to combine the different model outputs best, capitalizing on the unique strengths of each model to enhance the overall predictive accuracy of the DSE model 46 .

Fusion ensemble

The harmonious integration of both the base models and the best-performing model characterizes the Fusion ensemble approach. It doesn't just use the best-performing model as the meta-classifier; it collaboratively combines the strengths and insights of all models in a synergistic manner. This approach creates a final predictive model that benefits from the diverse perspectives and capabilities of the base models, thus producing a more robust and accurate prediction within the DSE framework 47 . It's like bringing together a team of experts to solve a complex problem, with each expert contributing their unique insights and skills.
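Because the DSE is described here only at a conceptual level, the sketch below merely illustrates the general pattern with scikit-learn building blocks: several base models feed a meta-classifier (assumed to be the best-performing Random Forest), alongside a simple soft-voting combination. It is not the authors' exact Voting/Blending/Fusion implementation.

```python
# Illustrative stacking and voting combinations of fine-tuned base models.
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

base_models = [
    ("lr", LogisticRegression(max_iter=100)),
    ("xgb", XGBClassifier(n_estimators=100)),
    ("lgbm", LGBMClassifier(n_estimators=100)),
]

# Voting-style combination: soft vote over the base-model probabilities.
voting = VotingClassifier(estimators=base_models, voting="soft")

# Stacking-style combination: out-of-fold base predictions train the
# meta-classifier, mirroring the role of the best-performing model.
dse_like = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100),
    cv=5,
)
```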

Results

In this section, a comprehensive analysis of the results, along with various score plots, is provided. The datasets are generated using Mean, MICE, and Age Group-based imputation techniques to address missing values. The analysis also encompasses the score plots for the original dataset, in which missing values are handled through list-wise deletion, and the results for the augmented dataset are presented in a similar manner. The results are organized into three subsections: the first presents the baseline model, the second covers the advanced classification models, with a particular emphasis on comparing their effectiveness against the baseline, and the third reports the DSE model results.

To gauge the effectiveness of the models, a diverse set of metrics is employed. These metrics encompass the confusion matrix, which includes true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values for actual and predicted data. The definition of these confusion matrix parameters can be found in Table 3 . Additional metrics, such as accuracy, precision, recall, F1 score, and AUC, are utilized to provide a comprehensive evaluation of classifier performance 48 , 49 . Table 4 explains these metrics briefly. The findings highlight the effectiveness of the different imputation techniques in handling missing data and showcase their impact on model accuracy. The analysis of the augmented dataset demonstrates the improvement achieved by incorporating essential features. Overall, this section contributes to a comprehensive understanding of the various models' performance and offers insights for future research and model development.
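As a small illustration of how these metrics can be derived from a model's test-set predictions, the sketch below uses scikit-learn's metric functions; the variable names are placeholders.

```python
# Compute the confusion-matrix counts and the headline metrics for one model.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # y_score: predicted probabilities
    }
```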

Results of baseline model

The performance of the baseline model varied notably between the imbalanced and balanced datasets, as shown in Fig.  9 . On the imbalanced dataset, the highest F1 score achieved was only 23.09%, which is considerably lower than the F1 score of 87.04% observed on the balanced datasets. This discrepancy is consistent across all other metrics, including accuracy, precision, recall, and AUC. In contrast, the baseline model exhibited notably higher performance on the balanced dataset, both in terms of the original data and the imputed datasets. For instance, on the original dataset, the baseline model attained the highest accuracy of 87.12%, precision of 87.65%, recall of 86.44%, F1 score of 87.04%, and AUC of 87.12%. This level of performance was closely mirrored in the imputed datasets, with the MICE-imputed dataset showing an accuracy of 85.54%, precision of 85.95%, recall of 85.44%, F1 score of 85.46%, and AUC of 85.54%. These findings suggest that either dropping rows with missing values or employing imputation techniques can significantly enhance the usability of the dataset, likely due to the increased availability of data for training the models.

figure 9

Baseline model performance across various datasets. The performance on imbalanced datasets is substantially lower than on the balanced datasets.

Results of advanced classification models

In this section, we present the outcomes of our rigorous evaluation of advanced classification models. These models have been extensively assessed to provide insights into their predictive performance for stroke occurrences.

Imbalanced and balanced imputed datasets results

MICE imputation yields slightly better results than Mean and Age Group-based imputation among the imbalanced imputed datasets. The top-performing model across all imputed datasets is LR-AGD, followed by NGBoost and Balanced Bagging. LR-AGD achieves a k-fold mean accuracy of 94.94%. The precision, recall, and F1 score are 95.20%, 94.94%, and 92.50%, respectively. Figure  10 shows the k-fold mean accuracy for all models on the imbalanced imputed datasets.

figure 10

Models k-fold mean accuracy on imbalanced imputed datasets. The imputed imbalanced datasets are created using three different techniques, namely mean, MICE, and age group-based imputation.

The Age Group-based balanced imputed dataset performs slightly better than the Mean and MICE imputed balanced datasets. XGBoost is the top-performing model, followed closely by LightGBM and Random Forest. The k-folds mean accuracy for the top-performing model is 93.48%, with precision, recall, and F1 score of 94.03%, 93.77%, and 93.76%, respectively. Figure  11 presents the k-fold mean accuracy of all models on balanced imputed datasets.

figure 11

Models k-fold mean accuracy on balanced imputed datasets. The imputed balanced datasets are created using three different techniques, namely mean, MICE, and age group-based imputation.

On the imbalanced MICE imputed testing dataset, LR-AGD achieves an accuracy of 96.28%, precision of 100%, recall of 14.93%, and F1 score of 25.97%. Moreover, for XGBoost on balanced Age Group-Based imputed testing dataset, the testing accuracy, precision, recall, and F1 score are 96.37%, 96.62%, 96.09%, and 96.35% respectively. The confusion matrices for the LR-AGD and XGBoost model on the respective testing datasets are displayed in Fig.  12 . The matrix provides a visual representation of the model's performance. It allows for a comprehensive analysis of the model's accuracy and the distribution of correct and incorrect predictions across different classes.

figure 12

Confusion matrices of LR-AGD and XGBoost on imputed datasets. Confusion matrix illustrating the performance of LR-AGD (right) and XGBoost (left) models in stroke case classification on imbalanced and balanced imputed datasets, respectively.

The LR-AGD model consistently displayed the highest precision for the imbalanced datasets, with a value of 95.19%. It showcased its ability to predict positive instances accurately. On the other hand, when considering the balanced datasets, the XGBoost, LightGBM, and Random Forest models emerged as the top performers in terms of precision. The XGBoost model demonstrated the highest precision, ranging from 95.56% to 95.68%, followed closely by the LightGBM and Random Forest models with 95.53% and 94.96%, respectively. The k-fold mean precision of models on all imputed datasets is displayed in Fig.  13 .

figure 13

Models k-fold mean precision on all imputed datasets. The imputed datasets are created using three different techniques, namely mean, MICE, and age group-based imputation, and are divided into two categories: imbalanced and balanced.

NGBoost, Balanced Bagging, and LR-AGD models consistently show high recall values, ranging from 94.91% to 94.94%, across different imputation techniques for the imbalanced datasets. In the case of balanced datasets, XGBoost, LightGBM, and Random Forest models exhibit higher recall values of 95.90%, 95.53%, and 94.84%, respectively. These results indicate that the models are generally effective in capturing actual positive instances and correctly identifying them. The k-fold mean recall of models on all imputed datasets is displayed in Fig.  14 .

figure 14

Models k-fold mean recall on all imputed datasets. The imputed datasets are created using three different techniques, namely mean, MICE, and age group-based imputation, and are divided into two categories: imbalanced and balanced.

Among the models evaluated, surprisingly, the Random Forest model achieved the highest F1 score of 94.72% on the imbalanced MICE imputed dataset and consistently performed well across all imputation techniques. XGBoost and LightGBM also performed well. When considering the balanced datasets, XGBoost, LightGBM, and Random Forest models exhibited the highest F1 scores, with values of 95.90%, 95.53% and 94.82%, respectively. Overall, these results indicate that XGBoost, LightGBM, and Random Forest models are promising models for stroke prediction, showcasing their ability to achieve accurate classifications across different dataset characteristics. The k-fold mean F1 score of models on all imputed datasets is displayed in Fig.  15 .

figure 15

Models k-fold mean F1 score on all imputed datasets. The imputed datasets are created using three different techniques, namely mean, MICE, and age group-based imputation, and are divided into two categories: imbalanced and balanced.

Imbalanced and balanced original datasets results

Among the models evaluated on the original dataset, which is created by removing the rows with missing values, LR-AGD emerges as the top performer, closely followed by NGBoost and Balanced Bagging. LR-AGD achieves a k-fold mean accuracy of 95.46%, while NGBoost and Balanced Bagging achieve 95.46% and 95.43%, respectively. LR-AGD exhibits a precision, recall, and F1 score of 94.03%, 93.77%, and 93.76%, respectively. These exceptional results underscore the robustness and efficacy of LR-AGD in handling complex data scenarios. The k-fold mean accuracy for all models on the imbalanced and balanced datasets is displayed in Fig.  16 . For the balanced dataset with removed missing value rows, XGBoost emerges as the top-performing model, achieving a k-fold mean accuracy of 96.14%. It also exhibits high precision, recall, and F1 score, all at 96.14%. LightGBM and Random Forest follow closely behind as the second and third best models.

figure 16

Models k-fold mean accuracy on original datasets. The dataset is categorized into two groups: imbalanced, reflecting its initial state, and balanced, achieved after employing the oversampling technique.

On the imbalanced original testing dataset, LR-AGD achieves a testing accuracy of 96.81%, precision of 87.50%, recall of 13.21%, and F1 score of 22.95%. When tested on the balanced original dataset, XGBoost maintains its superior performance with a testing accuracy of 98.33%, an F1 score of 98.86%, a precision of 98.95%, and a recall of 98.77%. The confusion matrices for the LR-AGD and XGBoost models on the respective testing datasets are shown in Fig.  17 .

figure 17

Confusion matrices of LR-AGD and XGBoost on original datasets. Confusion matrix illustrating the performance of LR-AGD (right) and XGBoost (left) models in stroke case classification on imbalanced and balanced original datasets, respectively.

On the imbalanced original dataset, LR-AGD consistently achieved the highest precision and recall scores, with values of 93.42% and 95.46%, respectively. Regarding precision, the second and third best models are Neural Network and Balanced Bagging. NGBoost and Balanced Bagging models also showed impressive recall, the same as the LR-AGD. When considering the F1 score, which provides a balanced measure of precision and recall, all models achieved similar scores of around 93.3% except Neural Network. On the other hand, on the balanced datasets, XGBoost, LightGBM, and Random Forest consistently outperformed the other models regarding precision, recall, and F1 score, with values of 96.14%, 95.78%, and 95.44%, respectively.

The above-mentioned results highlight the effectiveness of LR-AGD, XGBoost, LightGBM, and Random Forest in accurately classifying strokes across imbalanced and balanced datasets, depending on the balance of the dataset, emphasizing their potential for stroke prediction applications. Furthermore, the k-fold mean precision, recall, and F1 score of these models on both imbalanced and balanced original datasets are visually depicted in Fig.  18 .

figure 18

Models k-fold mean performance metrics on original datasets. The dataset is categorized into two groups: imbalanced, reflecting its initial state, and balanced, achieved after employing the oversampling technique.

Imbalanced and balanced augmented datasets results

The Random Forest model demonstrates exceptional performance on the imbalanced augmented dataset, achieving a k-fold mean accuracy of 97.41%. It also exhibits high precision, recall, and F1 score, with values of 97.23%, 97.41%, and 97.14%, respectively. The remaining models display mean accuracies ranging from 95 to 96%. Figure  19 depicts the k-fold mean accuracy of all models on the augmented dataset, both imbalanced and balanced. The top-performing model on the balanced augmented dataset is also Random Forest, achieving a k-fold mean accuracy of 99.45% with a precision of 99.46% and a recall and F1 score of 99.45%. It is closely followed by Balanced Bagging and XGBoost, with mean accuracies of 99.07% and 97.94%, respectively.

figure 19

Models k-fold mean accuracy on augmented datasets. The dataset is categorized into two groups: imbalanced and balanced, the latter achieved after employing the oversampling technique.

When evaluated on the imbalanced testing dataset, the Random Forest model maintains its strong performance with a testing accuracy of 97.57%, precision of 85.94%, recall of 65.48%, and F1 score of 74.32%. Random Forest also demonstrates impressive performance in testing on balanced dataset, with an accuracy of 99.61%, precision of 99.23%, recall of 100%, and an F1 score of 99.61%. The confusion matrices of the Random Forest model on imbalanced and balanced augmented testing datasets are depicted in Fig.  20 .

figure 20

Confusion matrices of random forest on augmented datasets. Confusion matrix illustrating the performance of Random Forest model in stroke case classification on imbalanced (left) and balanced (right) augmented datasets, respectively.

The Random Forest model consistently demonstrates strong performance across all metrics on imbalanced and balanced augmented datasets. It achieves high precision scores of 97.23% on imbalanced data and 99.46% on balanced data, indicating its ability to identify true positive instances accurately. The Random Forest model also exhibits impressive recall scores of 97.41% on imbalanced data and 99.45% on balanced data, highlighting its capability to capture a high proportion of actual positive instances. In terms of F1 score, Random Forest Model achieves a balanced performance with scores of 97.14% on imbalanced data and 99.45% on balanced data. This indicates a harmonious balance between precision and recall, emphasizing its effectiveness in stroke prediction. The Balanced Bagging and XGBoost models also deliver competitive results across all metrics, showcasing their potential for accurate classification on balanced and imbalanced datasets. The k-fold mean precision, recall, and F1 score of models on all augmented datasets are shown in Fig.  21 .

figure 21

Models k-fold mean performance metrics on augmented datasets. The dataset is categorized into two groups: imbalanced and balanced, the latter achieved after employing the oversampling technique.

Results of dense stacking ensemble model

In the analysis of base model results, it becomes apparent that the MICE-imputed datasets produce marginally superior outcomes. Notably, the Random Forest model stands out as a top performer. Within the DSE model, the Random Forest model assumes the role of the meta-classifier, while the remaining models serve as base models, highlighting the synergy derived from their collective strengths. The DSE model showcased remarkable performance on MICE-imputed datasets. For the imbalanced MICE-imputed datasets, the model yielded an accuracy of 96.13%, precision of 93.26%, recall of 96.18%, and an F1 score of 94.88%. Similarly, on balanced MICE-imputed datasets, the DSE model achieved an accuracy of 96.59%, with precision of 95.25%, recall of 96.27%, and F1 score of 95.79%. These results, also visualized in Fig.  22 , highlight the robust performance of the DSE model when tested on imputed datasets.

figure 22

K-fold mean performance metrics of the proposed DSE model. The MICE imputed dataset is categorized into two groups: imbalanced and balanced, the latter achieved after employing the oversampling technique.

The analysis of the AUC metric for the proposed DSE model reveals compelling insights into its predictive performance across different datasets. On the imbalanced dataset with MICE imputation, the DSE model achieves an AUC of 83.94%, showcasing its ability to discern between positive and negative instances despite the data's skewed distribution. Conversely, on the balanced dataset, the DSE model excels even further, attaining an impressive AUC of 98.92%. This substantial increase in AUC on the balanced dataset underscores the model's enhanced discriminatory power and robustness when trained on a more representative and balanced data distribution. The significant performance improvement achieved by the DSE model on the balanced dataset compared to the imbalanced one is visually represented in Fig.  23 .

figure 23

AUC results of the proposed DSE model. ( a ) represents the AUC results on imbalanced MICE imputed dataset and ( b ) on the balanced one.

Furthermore, on the imbalanced testing dataset, the model shows a testing accuracy of 99.15%, a precision of 84.93%, a recall of 98.88%, and an F1 score of 90.51%. For the balanced testing dataset, the model gives a testing accuracy of 97.19%, precision of 96.83%, recall of 97.38%, and an F1 score of 97.10%. The confusion matrices of the DSE model on imbalanced and balanced MICE-imputed testing datasets are depicted in Fig.  24 .

figure 24

Confusion matrices of the proposed DSE model. Proposed DSE model’s confusion matrices on MICE imputed balanced and imbalanced datasets.

The analysis of feature importance is conducted to ascertain the influential factors in stroke prediction using the proposed DSE model. The analysis of feature importance revealed distinct patterns between the imbalanced and balanced datasets, as visualized in Fig.  25 . In both datasets, the top three features influencing stroke prediction were average glucose level, age, and BMI. However, notable differences were observed in their relative importance. In the imbalanced dataset, these top three features were relatively close in importance, with average glucose level slightly more influential than age and BMI. Conversely, in the balanced dataset, age emerged as the most important feature by a significant margin, followed by average glucose level and BMI. Additionally, the imbalanced dataset highlighted hypertension and heart disease as the 4th and 5th most important features, while the balanced dataset indicated that marital status (yes and no) played a more significant role in prediction. Interestingly, features such as work type (never worked and children) and gender (other) showed minimal contribution in both datasets, underscoring their limited impact on stroke prediction outcomes.

figure 25

Feature importance comparison for the proposed DSE model. Feature importance graphs for imbalanced and balanced MICE-imputed datasets are displayed in ( a ) and ( b ) respectively.

While LR-AGD and XGBoost deliver highly accurate results, they both exhibit limitations. LR-AGD performs well when the data is imbalanced, but its performance decreases significantly when the dataset is balanced, behaving differently and yielding lower accuracy. Conversely, XGBoost performs exceptionally well on balanced datasets but struggles with imbalanced ones. Linear models excel on simpler, easily separable, smaller datasets, while non-linear models perform better on complex data with equally represented classes and intricate relationships among variables. This highlights the importance of creating an appropriate model to handle such versatile dataset characteristics and yield optimal performance in stroke prediction. It is crucial to consider the balance between precision and recall to make informed decisions regarding model selection. Additionally, further research and development are needed to address the limitations of LR-AGD and XGBoost and enhance their performance across various dataset scenarios.

However, when an augmented dataset is created, incorporating crucial factors that significantly contribute to stroke prediction, Random Forest emerges as the superior model. It consistently outperforms other models on both imbalanced and balanced augmented datasets, with mean accuracies of 97.409% and 99.068%, respectively. Random Forest also gives consistent results, with around 95% accuracy, on the non-augmented datasets. Ultimately, Random Forest is therefore used as the meta-classifier in the DSE model. Tables 5 and 6 provide a comprehensive summary of the mean accuracy of the advanced classification models and the DSE model across all imbalanced and balanced datasets, highlighting the superior performance of the DSE model. The DSE model achieves far superior results when the other models are incorporated within it as base models and Random Forest acts as the meta-classifier. The DSE achieves accuracies above 96% across all types of datasets, making it the most feasible and robust model for stroke prediction on diverse datasets.

Additionally, Table 7 compares stroke prediction results from recent studies that utilized the same dataset. This comparative analysis provides valuable insights into the top-performing DSE machine learning model's performance on imbalanced and balanced datasets, showcasing its respective accuracies. The table serves as a comprehensive reference for understanding the effectiveness of these models in stroke prediction. The study 27 shows that the Minimal Genetic Folding (MGF) model achieves an accuracy of 83.2% on the balanced dataset. Another study 28 utilizes Logistic Regression and achieves an accuracy of 86.00%. Naive Bayes 29 achieves an accuracy of 82.00%. Random Forest 30 achieves an impressive accuracy of 94.46% on the imbalanced dataset, while Support Vector Machine 31 reaches an accuracy of 95.49%. Additionally, Random Forest is studied 32 , 33 , 34 , 36 with accuracies ranging from 95.50% to 96.00%. The proposed RXLM 35 model achieves an accuracy of 96.34% on the balanced dataset. The K-nearest Neighbours 37 model achieves an accuracy of 94.00% on the balanced dataset. In this study, the proposed DSE model achieves an impressive accuracy of 96.13% on the imbalanced imputed dataset and 96.59% on the balanced dataset. The notable distinction in the DSE model's performance can be attributed to its unique ability to harness the strengths of multiple base models through ensemble techniques. By employing a strategic combination of Voting, Blending, and Fusion ensembles, the DSE model maximizes predictive accuracy by leveraging the diverse perspectives and capabilities of each individual model. This sophisticated integration of ensemble methods enables the DSE model to outperform standalone models, as seen in previous studies. Overall, Table 7 provides a comprehensive overview of stroke prediction results, showcasing the performance of previously used models on imbalanced and balanced datasets along with the performance of the proposed DSE model.

In the domain of practical application, the DSE model exhibits a seamless integration into a real-life scenario, as demonstrated in Fig.  26 . Users, whether they be individuals concerned about their health or medical professionals, can effortlessly input vital signs and demographic information through a user-friendly mobile or web application. This data is then securely transmitted to a cloud server where the pre-trained DSE model is deployed. The model processes the input information swiftly, with an average prediction time of 0.095 s per subject, showcasing its efficiency. Upon completion of the prediction, results are promptly relayed back to the user through the same cloud server, accessible via the mobile or web app. Crucially, if a subject is predicted to be at risk of a stroke, the system offers the option for immediate online consultation with a healthcare professional or assistance in locating a physical medical service through a third-party service. This innovative approach not only underscores the model's applicability in real-world scenarios but also highlights its potential to contribute significantly to proactive healthcare management.

figure 26

Integration of proposed DSE model in real-life scenario. The DSE model quickly processes vital signs on a user-friendly app, offering timely stroke risk predictions via a cloud server.

Conclusion

Given the substantial global impact of strokes on mortality rates, there is an urgent need for robust and generalizable early prediction methods. While stroke prediction models are pivotal in pinpointing high-risk individuals, they face obstacles such as missing data and data imbalance. This study aims to create an improved predictive model for stroke prediction and evaluate its performance across various imbalanced and balanced datasets. The comprehensive analysis of advanced machine learning models for stroke prediction presented in this research paper sheds light on the efficacy of different techniques and models in handling missing data and data imbalance. The study reveals that the most significant factors for stroke prediction are age, BMI, average glucose level, heart disease, hypertension, and ever-married status. Subsequently, an augmented dataset is created to incorporate these essential features, with the goal of enhancing the accuracy of stroke prediction models. The study uses an extensive range of advanced models such as TabNet, Logistic Regression, Neural Network, Random Forest, Gradient Boosting, CatBoost, LightGBM, XGBoost, Balanced Bagging, and NGBoost. The performance evaluation of predictive models is done by employing fivefold cross-validation.

The MICE imputation technique performs slightly better than the two alternative methods. LR-AGD excels on imbalanced data with the highest accuracy of 96.46%, and XGBoost performs well on balanced datasets with the highest accuracy of 96.14%; however, their effectiveness is limited by dataset characteristics. In contrast, Random Forest delivers consistent and generalizable results, with accuracy around 95% on all non-augmented datasets. This becomes particularly evident on the augmented dataset, where it gives the highest accuracy, above 97%. After thorough evaluation, a more robust Dense Stacking Ensemble (DSE) model is constructed. The Random Forest model acts as the meta-classifier within the DSE model, with the other models serving as base models after fine-tuning. The DSE model exhibits robust performance across both imbalanced and balanced MICE-imputed datasets. On the imbalanced dataset, the model achieves an accuracy of 96.13%, precision of 93.26%, recall of 96.18%, and an F1 score of 94.88%. On the balanced dataset, it achieves an accuracy of 96.59%, precision of 95.25%, recall of 96.27%, and an F1 score of 95.79%. In terms of AUC, the DSE model reaches 83.94% on the imbalanced dataset and an impressive 98.92% on the balanced dataset. These AUC scores demonstrate the model's ability to distinguish between positive and negative instances. In conclusion, the DSE model consistently delivers robust and stable results for stroke prediction across diverse datasets.
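The DSE construction described above can be approximated with scikit-learn's StackingClassifier, as in the hedged sketch below: Random Forest serves as the meta-classifier over out-of-fold predictions from the base models. The choice of base models, their hyperparameters, and the synthetic data are placeholders for illustration, not the exact configuration reported in the paper.

```python
# Illustrative sketch of a stacking ensemble with Random Forest as the
# meta-classifier, approximating the DSE architecture described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier  # assumes the xgboost package is installed

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
]

dse = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=300, random_state=42),
    stack_method="predict_proba",  # base-model probabilities feed the meta-classifier
    cv=5,                          # out-of-fold predictions avoid leakage
)

# Synthetic, imbalanced stand-in for the stroke data.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
print("Mean CV accuracy:", cross_val_score(dse, X, y, cv=5, scoring="accuracy").mean())
```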

In future work, the validation scope can be expanded with larger datasets that encompass more features. It is also intended to explore diverse data formats, including images and hybrid datasets. More extensive and diverse datasets will provide valuable insights and help generalize the findings to a broader population. By conducting external validation studies on these diverse and independent datasets, the authors aim to evaluate and validate the performance of the proposed DSE model. The proposed DSE model can integrate seamlessly into daily life through mobile or web applications, allowing users to input health data effortlessly. With swift processing and prediction times, results can be relayed promptly, supporting timely healthcare intervention in cases of predicted stroke risk.

Data availability

This study uses the McKinsey & Company stroke prediction dataset for healthcare analytics. The dataset is publicly available on the Analytics Vidhya and Kaggle websites at: https://datahack.analyticsvidhya.com/contest/mckinsey-analytics-online-hackathon or https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset

References

1. Bersano, A. & Gatti, L. Pathophysiology and treatment of stroke: Present status and future perspectives. Int. J. Mol. Sci. 24, 14848 (2023).
2. Feigin, V. L. et al. World Stroke Organization (WSO): Global stroke fact sheet 2022. Int. J. Stroke 17, 18–29 (2022).
3. Feigin, V. L. et al. Global, regional, and national burden of stroke and its risk factors, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. Lancet Neurol. 20, 795–820 (2021).
4. Katan, M. & Luft, A. Global burden of stroke. Semin. Neurol. 38, 208–211 (2018).
5. Pitchai, R. et al. An artificial intelligence-based bio-medical stroke prediction and analytical system using a machine learning approach. Comput. Intell. Neurosci. 2022, 1–9 (2022).
6. Amann, J. Machine learning in stroke medicine: Opportunities and challenges for risk prediction and prevention. Adv. Neuroethics https://doi.org/10.1007/978-3-030-74188-4_5 (2021).
7. Moshawrab, M., Adda, M., Bouzouane, A., Ibrahim, H. & Raad, A. Reviewing multimodal machine learning and its use in cardiovascular diseases detection. Electronics 12, 1558 (2023).
8. Javaid, M., Haleem, A., Pratap Singh, R., Suman, R. & Rab, S. Significance of machine learning in healthcare: Features, pillars and applications. Int. J. Intell. Netw. 3, 58–73 (2022).
9. MacEachern, S. J. & Forkert, N. D. Machine learning for precision medicine. Genome 64, 416–425 (2021).
10. Bonkhoff, A. K. & Grefkes, C. Precision medicine in stroke: Towards personalized outcome predictions using artificial intelligence. Brain 145, 457–475 (2021).
11. Sarker, I. H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2, 160 (2021).
12. Yu, J. et al. AI-based stroke disease prediction system using real-time electromyography signals. Appl. Sci. 10, 6791 (2020).
13. Nijman, S. et al. Missing data is poorly handled and reported in prediction model studies using machine learning: A literature review. J. Clin. Epidemiol. 142, 218–229 (2022).
14. Kumar, Y., Koul, A., Singla, R. & Ijaz, M. F. Artificial intelligence in disease diagnosis: A systematic literature review, synthesizing framework and future research agenda. J. Ambient Intell. Hum. Comput. 14, 8459–8486 (2022).
15. Kokkotis, C. et al. An explainable machine learning pipeline for stroke prediction on imbalanced data. Diagnostics 12, 2392 (2022).
16. Sirsat, M. S., Fermé, E. & Câmara, J. Machine learning for brain stroke: A review. J. Stroke Cerebrovasc. Dis. 29, 105162 (2020).
17. Wongvorachan, T., He, S. & Bulut, O. A comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining. Information 14, 54 (2023).
18. Sowjanya, A. M. & Mrudula, O. Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms. Appl. Nanosci. 13, 1829–1840 (2022).
19. Bernat, J. L. & Lukovits, T. G. Ethical issues in stroke management. Neurol. Clin. Pract. 11, 3–5 (2021).
20. Murdoch, B. Privacy and artificial intelligence: Challenges for protecting health information in a new era. BMC Med. Ethics https://doi.org/10.1186/s12910-021-00687-3 (2021).
21. Martin, C. et al. The ethical considerations including inclusion and biases, data protection, and proper implementation among AI in radiology and potential implications. Intell. Based Med. 6, 100073 (2022).
22. Wu, Y. & Fang, Y. Stroke prediction with machine learning methods among older Chinese. Int. J. Environ. Res. Public Health 17, 1828 (2020).
23. Kaur, M., Sakhare, S. R., Wanjale, K. & Akter, F. Early stroke prediction methods for prevention of strokes. Behav. Neurol. 2022, 1–9 (2022).
24. Alanazi, E. M., Abdou, A. & Luo, J. Predicting risk of stroke from lab tests using machine learning algorithms: Development and evaluation of prediction models. JMIR Format. Res. 5, e23440 (2021).
25. Monteiro, M. et al. Using machine learning to improve the prediction of functional outcome in ischemic stroke patients. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 1953–1959 (2018).
26. Shobayo, O., Zachariah, O., Odusami, M. O. & Ogunleye, B. Prediction of stroke disease with demographic and behavioural data using random forest algorithm. Analytics 2, 604–617 (2023).
27. Mezher, M. A. Genetic folding (GF) algorithm with minimal kernel operators to predict stroke patients. Appl. Artif. Intell. https://doi.org/10.1080/08839514.2022.2151179 (2022).
28. Guhdar, M., Ismail Melhum, A. & Luqman Ibrahim, A. Optimizing accuracy of stroke prediction using logistic regression. J. Technol. Inform. (JoTI) 4, 41–47 (2023).
29. Sailasya, G. & Kumari, G. L. Analyzing the performance of stroke prediction using ML classification algorithms. Int. J. Adv. Comput. Sci. Appl. https://doi.org/10.14569/IJACSA.2021.0120662 (2021).
30. Paul, D., Gain, G. & Orang, S. Advanced random forest ensemble for stroke prediction. Int. J. Adv. Res. Comput. Commun. Eng. https://doi.org/10.17148/IJARCCE.2022.11343 (2022).
31. Geethanjali, T. M., Divyashree, M. D., Monisha, S. K. & Sahana, M. K. Stroke prediction using machine learning. Int. J. Emerg. Technol. Innov. Res. 8, 710–717 (2021).
32. Harshitha, K. V., Harshitha, P., Gupta, G., Vaishak, P. & Prajna, K. B. Stroke prediction using machine learning algorithms. Int. J. Innov. Res. Eng. Manag. https://doi.org/10.21276/ijirem.2021.8.4.2 (2021).
33. Tazin, T. et al. Stroke disease detection and prediction using robust learning approaches. J. Healthc. Eng. 2021, 1–12 (2021).
34. Chen, Z. Stroke risk prediction based on machine learning algorithms. Highlights Sci. Eng. Technol. 38, 932–941 (2023).
35. Alruily, M., El-Ghany, S. A., Mostafa, A. M., Ezz, M. & El-Aziz, A. A. A-tuning ensemble machine learning technique for cerebral stroke prediction. Appl. Sci. 13, 5047 (2023).
36. Islam, Md. M. et al. Stroke prediction analysis using machine learning classifiers and feature technique. Int. J. Electron. Commun. Syst. 1, 57–62 (2021).
37. Uma, S. K. & Rakshith, S. R. Stroke analysis using 10 ML comparison. Int. J. Res. Appl. Sci. Eng. Technol. 10, 3857–3862 (2022).
38. Fedesoriano. Stroke prediction dataset. Kaggle https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset/data (2021).
39. Mattas, P. S. Brain stroke prediction using machine learning. Int. J. Res. Publ. Rev. 3, 711–722 (2022).
40. Pathan, M. S., Jianbiao, Z., John, D., Nag, A. & Dev, S. Identifying stroke indicators using rough sets. IEEE Access 8, 210318–210327 (2020).
41. Emon, M. U. et al. Performance analysis of machine learning approaches in stroke prediction. In 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA) (2020).
42. Hassan, A. & Yousaf, N. Bankruptcy prediction using diverse machine learning algorithms. In 2022 International Conference on Frontiers of Information Technology (FIT) (2022).
43. Dev, S. et al. A predictive analytics approach for stroke prediction using machine learning and neural networks. Healthc. Anal. 2, 100032 (2022).
44. Nguyen, L. P. et al. The utilization of machine learning algorithms for assisting physicians in the diagnosis of diabetes. Diagnostics 13, 2087 (2023).
45. Mahajan, P., Uddin, S., Hajati, F. & Moni, M. A. Ensemble learning for disease prediction: A review. Healthcare 11, 1808 (2023).
46. Jagan, S. et al. A meta-classification model for optimized Zbot malware prediction using learning algorithms. Mathematics 11, 2840 (2023).
47. Zhen, M. et al. Application of a fusion model based on machine learning in visibility prediction. Remote Sens. 15, 1450 (2023).
48. Yuan, Q., Chen, K., Yu, Y., Le, N. Q. & Chua, M. C. Prediction of anticancer peptides based on an ensemble model of deep learning and machine learning using ordinal positional encoding. Brief. Bioinform. https://doi.org/10.1093/bib/bbac630 (2023).
49. Le, N.-Q.-K. & Ou, Y.-Y. Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins. BMC Bioinform. https://doi.org/10.1186/s12859-016-1369-y (2016).


Acknowledgements

The authors express gratitude to the anonymous reviewers for their insightful comments and suggestions, contributing to the enhancement of this paper.

This research was partially supported by the SAFE-RH project under Grant No. ERASMUS + CBHE-619483 EPP-1-2020-1-UK-EPPKA2-CBHE.

Author information

Authors and affiliations

Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan

Ahmad Hassan, Saima Gulzar Ahmad & Ehsan Ullah Munir

Department of Computer Science, Cardiff School of Technologies, Llandaff Campus, Western Avenue, Cardiff, CF5 2YB, UK

Imtiaz Ali Khan

School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley, PA1 2BE, UK

Naeem Ramzan


Contributions

Ahmad Hassan wrote the main manuscript text, Saima Gulzar Ahmad and Ehsan Ullah Munir developed figures, and Naeem Ramzan contributed to the results section. Imtiaz Ali Khan conducted additional simulations and included supplementary results. All authors reviewed and enhanced the manuscript write-up. The final manuscript was read and approved by all authors.

Corresponding author

Correspondence to Naeem Ramzan .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Hassan, A., Gulzar Ahmad, S., Ullah Munir, E. et al. Predictive modelling and identification of key risk factors for stroke using machine learning. Sci. Rep. 14, 11498 (2024). https://doi.org/10.1038/s41598-024-61665-4


Received: 26 February 2024

Accepted: 08 May 2024

Published: 20 May 2024

DOI: https://doi.org/10.1038/s41598-024-61665-4



Case Interview Examples (2024): A Collection from McKinsey and Others


Last Updated on January 11, 2024

Whenever you prepare for case interviews, you have to practice as realistically as possible and mimic the real case study interview at McKinsey, BCG, Bain, and others. One way to do this and make your preparation more effective is to practice real cases provided by the firms you apply to.

It will help you to understand what the differences are across firms, how they structure and approach their cases, what dimensions are important to them, and what solutions they consider to be strong.

Below is a steadily expanding selection of real case interview examples provided by different management consulting firms.

Before wasting your money on case interview collection books that use generic cases, use original cases first. Additionally, use professional case coaches, who interviewed for the top firms, to mimic the real interview experience and get real, actionable feedback to improve.

Please be aware that cases are just one part of a typical consulting interview. It is equally important to prepare for behavioral and fit interview questions.

McKinsey Case Interview Examples

  • Loravia – Transforming a national education system
  • SuperSoda – Electro-light product launch
  • GlobaPharm – Pharma R&D
  • Bill & Melinda Gates Foundation – Diconsa financial services offering
  • Beautify – Customer approach
  • Shops – DEI strategy
  • Talbot Trucks – Electric truck development
  • Conservation Forever – Nature conservation

We have written a detailed article on the McKinsey application process, the McKinsey interview timeline, the typical McKinsey case interview, and the McKinsey Personal Experience interview here. You can expect similar cases regardless of your position (e.g. in a McKinsey phone case interview or interviewing for a McKinsey internship as well as a full-time BA, Associate, or Engagement Manager role).

Boston Consulting Group (BCG) Case Interview Examples

  • Consumer Goods – Climate strategy
  • Banking – Client satisfaction
  • Consumer Goods – IT strategy
  • Chateau Boomerang – Written case

Bain & Company Case Interview Examples

  • NextGen Tech
  • FashionCo.

Ace the case interview with our dedicated preparation packages.


Deloitte Case Interview Examples

  • Federal Agency – Engagement strategy
  • Federal Benefits Provider – Strategic vision
  • Apparel – Declining market share
  • Federal Finance Agency – Architecture strategy
  • MedX – Smart pill bottle
  • Federal Healthy Agency – Finance strategy
  • LeadAuto – Market expansion
  • Federal Bureau – Talent management

Strategy& Case Interview Examples

  • Strategy& tips and examples (case examples included)

Accenture Case Interview Examples

  • Accenture interview tips and examples (case examples included)

Kearney Case Interview Examples

  • Promotional planning

Roland Berger Case Interview Examples

  • Transit-oriented development Part 1
  • Transit-oriented development Part 2
  • 3D printed hip implants Part 1
  • 3D printed hip implants Part 2

Oliver Wyman Case Interview Examples

  • Wumbleworld – theme park
  • Aqualine – boats

LEK Case Interview Examples

  • Video case interview example (currently unavailable)
  • Market sizing video example
  • Brainteaser (scroll to the bottom of the page)

Simon Kucher Case Interview Examples

  • Smart phone pricing

OC&C Case Interview Examples

  • Imported whiskey in an emerging market – business strategy
  • Leisure clubs – data interpretation

Capital One Case Interview Examples

  • How to crack case interviews with Capital One (includes case examples)

Bridgespan Case Interview Examples

  • Robinson Philanthropy – Strategy
  • Reach for the Stars – Student success
  • Home Nurses for New Families – Expansion strategy
  • Venture Philanthropy – Charity

Consulting Clubs Case Interview Books

Contact us at [email protected] for a collection of consulting club case interview books (from Harvard, ESADE, LBS, Columbia, etc.).

How We Help You Ace Your Case Interviews

We have specialized in placing people from all walks of life with different backgrounds into top consulting firms both as generalist hires as well as specialized hires and experts. As former McKinsey consultants and interview experts, we help you by

  • tailoring your resume and cover letter to meet consulting firms’ highest standards
  • showing you how to pass the different online assessments and tests for McKinsey, BCG, and Bain
  • showing you how to ace McKinsey interviews and the PEI with our video academy
  • coaching you in our 1-on-1 sessions to become an excellent case solver and impress with your fit answers (90% success rate after 5 sessions)
  • preparing your math to be bulletproof for every case interview
  • helping you structure creative and complex case interviews
  • teaching you how to interpret charts and exhibits like a consultant
  • providing you with cheat sheets and overviews for 27 industries.

Reach out to us if you have any questions! We are happy to help and offer a tailored program to help you break into consulting.




Florian spent 5 years with McKinsey as a senior consultant. He is an experienced consulting interviewer and problem-solving coach, having interviewed 100s of candidates in real and mock interviews. He started StrategyCase.com to make top-tier consulting firms more accessible for top talent, using tailored and up-to-date know-how about their recruiting. He ranks as the most successful consulting case and fit interview coach, generating more than 500 offers with MBB, tier-2 firms, Big 4 consulting divisions, in-house consultancies, and boutique firms through direct coaching of his clients over the last 3.5 years. His books “The 1%: Conquer Your Consulting Case Interview” and “Consulting Career Secrets” are available via Amazon.


The McKinsey Solve Assessment - 2023 Guide with Redrock Case Update


McKinsey’s Solve assessment has been making candidates sweat ever since it was initially trialled at the firm’s London office back in 2017 - and things have gotten even more difficult since the new version, launched in Spring 2023, added the Redrock case study.

Since its initial roll-out, the Solve assessment is definitely the most idiosyncratic, but also the most advanced, of the screening tests used by the MBB firms.

It can be hard to understand how an ecology-themed video game can tell McKinsey whether you’ll make a good management consultant, let alone know how to prepare yourself to do well in that game. When you consider that McKinsey are potentially cutting 70%+ of the applicant pool based on this single test, you can hardly blame applicants for being worried.

Matters are definitely not helped by the dearth of reliable information about what could very well be - with a top-tier consulting job on the line - the most important test you will take over your entire career. This was already true with the version of Solve that had been around for a few years, let alone the new one.

What information is available online is then often contradictory. In particular, there is a huge amount of disagreement as to whether it is actually possible for you to meaningfully prepare for the Solve assessment - before you’ve even considered how to go about that preparation. There is also a lot of confusion and inaccuracy around the new Redrock Case - largely as it is so recent as an addition and individual test takers tend to misremember details.

McKinsey Solve assessment screenshot of undersea ecosystem

Luckily, we at MCC have been interviewing test takers both before and after the Redrock Case rollout and following up to see which strategies and approaches actually work to push individuals through to interview.

Here, we’ll explain that it is indeed possible to prepare effectively for both versions of Solve and give you some ideas for how you can get started. Understanding how the Solve assessment works, what it tests you for and how is critical for all but the most hurried preparations.

This article makes for a great introduction to the Solve assessment. However, if you are going to be facing this aptitude test yourself and want full information and advice for preparation, then you should ideally get our full PDF guide:

MCC McKinsey Solve Guide

What is the McKinsey Solve assessment?


In simple terms, the McKinsey Solve assessment is a set of ecology-themed video games. In these games, you must do things like build food chains, protect endangered species, manage predator and prey populations and potentially diagnose diseases within animal populations or identify natural disasters.

Usually, you will be given around 70 minutes to complete two separate games, spending about the same amount of time on each.

Until recently, these games had uniformly been Ecosystem Building and Plant Defence. However, since Spring 2023, McKinsey has been rolling out a new version across certain geographies. This replaces the Plant Defence game with the new Redrock Case Study. Some other games have also been run as tests.

We’ll run through a little more on all these games below to give you an idea of what you’ll be up against for both versions and possible new iterations.

In the past, candidates had to show up to a McKinsey office and take what was then the Digital Assessment or PSG on a company computer. However, candidates are now able to take the re-branded Solve assessment at home on their own computers.

Test-takers are allowed to leverage any assistance they like (you aren’t spied on through your webcam as you would be with some other online tests), and it is common to have a calculator or even another computer there to make use of.

Certainly, we strongly advise every candidate to have at least a pen, paper and calculator on their desk when they take the Solve assessment.

Common Question: Is the Solve assessment the same thing as the PSG?

In short, yes - “Solve” is just the newer name for the McKinsey Problem Solving Game.

We want to clear up any potential confusion right at the beginning. You will hear this same screening test called a few different things in different places. The Solve moniker itself is a relatively recent re-branding by McKinsey. Previously, the same test was known as either the Problem Solving Game (usually abbreviated to PSG) or the Digital Assessment. You will also often see that same test referred to as the Imbellus test or game, after the firm that created the first version.

You will still see all these names used across various sites and forums - and even within some older articles and blog posts here on MyConsultingCoach. McKinsey has also been a little inconsistent on what they call their own assessment internally. Candidates can often become confused when trying to do their research, but you can rest assured that all these names refer to the same screening test - though folk might be referring to either the legacy or Redrock versions.

Why does the assessment exist?

Screenshot of an Island from the McKinsey Solve assessment

As with Bain, BCG and other major management consulting firms, McKinsey receives far, far more applications for each position than they can ever hope to interview. Compounding this issue, case interviews are expensive and inconvenient for firms like McKinsey to conduct. Having a consultant spend a day interviewing just a few candidates means disrupting a whole engagement and potentially having to fly that consultant back to their home office from wherever their current project is located. This problem is even worse for second-round interviews given by partners.

Thus, McKinsey need to cut down their applicant pool as far as possible, so as to shrink the number of case interviews they need to give without losing the candidates they actually want to hire. Of course, they want to then accomplish this as cheaply and conveniently as possible.

The Problem Solving Test (invariably shortened to PST) had been used by McKinsey for many years. However, it had a number of problems that were becoming more pronounced over time, and it was fundamentally in need of replacement. Some of these were deficiencies with the test itself, though many were more concerned with how the test fitted with the changing nature of the consulting industry.

The Solve assessment has been developed and iterated by the specialist firm Imbellus (now owned by gaming giant Roblox) to replace the long-standing PST in this screening role and offers solutions to those problems with its predecessor.

We could easily write a whole article on what McKinsey aimed to gain from the change, but the following few points cover most of the main ideas:

  • New Challenges: In particular, the changing demands of the consulting industry mean that McKinsey is increasingly seeking a new kind of hire. Previously, candidates were largely coming out of MBAs or similar business-focussed backgrounds, and so the PST’s quickfire business questions in a familiar GMAT-style format were perfectly sufficient to select consultants for what were fairly non-technical generalist consulting roles. However, clients have been bringing ever more technical projects to firms like McKinsey. This has led to the increasing internal segmentation of consulting firms - to create specialist digital divisions , for example. Even in generalist consulting, there is also now an increasing recognition of the utility of individuals with real depth of knowledge coming out of either industry or non-business academic routes like PhD programmes (you can read more about getting into consulting without an MBA here ). This feeds through to change what constitutes a good aptitude test for McKinsey. Without the once-crucial MBA, McKinsey can’t assume the same kind of detailed business knowledge. They also ideally want a single test that can be given to all kinds of prospective specialist consultants as well as generalists.
  • Fairness and the Modern Context: The covid pandemic necessitated at-home aptitude testing. However, even before that, there were pressures for a move to a largely remote recruitment process. Online testing - versus real-life papers, sat on location - dramatically reduces the amount of travel required of candidates. This allows McKinsey to cast a wider net, providing more opportunities to those living away from hub cities, whilst also hugely reducing the carbon footprint associated with the McKinsey selection process.
  • Gaming the System: More pragmatically, the Solve assessment promises to simply do the job of selecting the right candidates better than the PST. All things being equal, just increasing the candidate pool with an online test should lead to better quality candidates emerging to top the cohort. However, the Solve assessment also promises to do a better job at ranking those candidates in line with their actual abilities. A large part of this is that it is a much harder test to “game” than the PST was, where highly effective prep resources were available and readily allowed a bad candidate with good preparation to do better than a good candidate. The fact that game parameters change for every individual test-taker also cuts down the risk of some candidates having an unfair advantage by receiving details of the tests being used from those who have already taken them. The recent move towards the Redrock version also helps McKinsey stay ahead of those developing prep resources for the legacy Solve assessment.
  • Cost Cutting: A major advantage of scrapping the old pen-and-paper PST is that the formidable task of thinning down McKinsey’s applicant pool can be largely automated. No test rooms and invigilation staff need to be organised and no human effort is required to devise, transport, catalogue and mark papers. This is especially impactful when we consider that the Solve assessment’s advanced “process scoring” function allows the kind of nuanced filtering of candidates that would usually require something like an essay-based exam to accomplish, rather than the multiple-choice PST. Imbellus has provided this without the huge time and effort from expert human markers that would usually be required - so McKinsey has gained ability whilst eliminating cost and inconvenience.

How is the Solve Assessment used by McKinsey?

McKinsey's own account of how the Solve assessment is used in selection can be seen in the following video:

Whilst some offices initially stuck with the old PST, the legacy Solve assessment was soon rolled out globally and is given universally to candidates for roles at pretty well every level of the hierarchy. Certainly, if you are a recent grad from a Bachelor’s, MBA, PhD or similar, or a standard experienced hire, you can expect to be asked to complete the Solve assessment.

Likewise, we can expect the new Redrock Case Study version to be rolled out globally - though at this point it seems you might be given either (especially as McKinsey has been having significant technical problems with this new online case study) and so should be ready for both.

At present, it seems that only those applying for very senior positions or perhaps those with particularly strong referrals and/or connections are allowed to skip the test. Even this will be office-dependent.

As noted above, one of the advantages of the Solve assessment is that it can be given to all of McKinsey’s hires. Thus, you can expect to run into the same games whether you are applying as a generalist consultant or to a specialist consulting role - with McKinsey Digital, for example.

The takeaway here is that, if you are applying to McKinsey for any kind of consulting role, you should be fully prepared to sit the Solve Assessment!

Where does the Solve assessment fit into the recruitment process?

You can expect to receive an invitation to take the Solve assessment shortly after submitting your resume.

Flow chart showing the different stages of the McKinsey recruitment process

It seems that an initial screen of resumes is made, but that most individuals who apply are invited to take the Solve assessment.

Any initial screen is not used to make a significant cut of the candidate pool, but likely serves mostly to weed out fraudulent applications from fake individuals (such as those wishing to access the Solve assessment more than once so they can practice...) and perhaps to eliminate a few individuals who are clearly far from having the required academic or professional background or who have made a total mess of their resumes.

Your email invitation will generally give you either one or two weeks to complete the test, though our clients have seen some variation here - with one individual being given as little as three days.

Certainly, you should plan to be ready to sit the Solve assessment within one week of submitting your resume!

Once you have completed the test, McKinsey explain on their site that they look at both your test scores and resume (in more detail this time) to determine who will be invited to in-person case interviews. This will only be around 30% of the candidates who applied - possibly even less.

One thing to note here is that you shouldn’t expect a good resume to make up for bad test scores and vice versa. We have spoken to excellent candidates whose academic and professional achievements were not enough to make up for poor Solve performance. Similarly, we don’t know of anyone invited to interview who hadn’t put together an excellent resume.

Bluntly, you need great Solve scores and a great resume to be advanced to interview.

Your first port of call to craft the best possible resume to land your invitation to interview is our excellent free consulting resume guide.


What does the Solve assessment test for?

Chart from Imbellus showing how they test for different related cognitive traits

Whilst information on the Solve assessment can be hard to come by, Imbellus and McKinsey have at least been explicit on what traits the test was designed to look for. These are:

Diagram showing the five cognitive traits the Solve Assessment examines

  • Critical Thinking : making judgements based on the objective analysis of information
  • Decision Making : choosing the best course of action, especially under time pressure or with incomplete information
  • Metacognition : deploying appropriate strategies to tackle problems efficiently
  • Situational Awareness : the ability to interpret and subsequently predict an environment
  • Systems Thinking : understanding the complex causal relationships between the elements of a system

Equally important to understanding the raw fact of the particular skillset being sought out, though, is understanding the very idiosyncratic ways in which the Solve assessment tests for these traits. Let's dive deeper:

Process Scores

Perhaps the key difference between the Solve assessment and any other test you’ve taken before is Imbellus’s innovation around “process scores”.

To explain, when you work through each of the games, the software examines the solutions you generate to the various problems you are faced with. How well you do here is measured by your “product score”.

However, scoring does not end there. Rather, Imbellus’s software also constantly monitors and assesses the method you used to arrive at that solution. The quality of the method you used is then captured in your “process score”.

To make things more concrete here, if you are playing the Ecosystem Building game, you will not only be judged on whether the ecosystem you put together is self-sustaining. You will also be judged on the way you have worked in figuring out that ecosystem - presumably, on how efficient and organised you were. The program tracks all your mouse clicks and other actions and will thus be able to capture things like how you navigate around the various groups of species, how you place the different options you select, whether you change your mind before you submit the solution and so on.

You can find more detail on these advanced aspects of the Solve assessment and the innovative work behind it in the presentation by Imbellus founder Rebecca Kantar in the first section of the following video:

Compared to other tests, this is far more like the level of assessment you face from an essay-based exam, where the full progression of your argument towards a conclusion is marked - or a maths exam, where you are scored on your working as well as the final answer (with, of course, the major advantage that there is no highly qualified person required to mark papers).

Clearly, the upshot of all this is that you will want to be very careful how you approach the Solve assessment so you generally think before you act and show yourself in a very rational, rigorous, ordered light.

We have some advice for how to help look after your process score in our PDF Guide to the McKinsey Solve Assessment.

A Different Test for Every Candidate

Another remarkable and seriously innovative aspect of the Solve assessment is that no two candidates receive exactly the same test.

Imbellus automatically varies the parameters of the different games to be different for each individual test-taker so that each will be given a meaningfully different game to everyone else’s.

Within a game, this might mean a different terrain setting, having a different number of species or different types of species to work with or more or fewer restrictions on which species will eat which others.

Consequently, even if your buddy takes the assessment for the same level role at the same office just the day before you do, whatever specific strategy they used in their games might very well not work for you.

This is an intentional feature designed to prevent test takers from sharing information with one another and thus advantaging some over others. At the extreme, this feature would also be a robust obstacle to any kind of serious cheating.

If cheating seems far-fetched to some readers, remember how competitive the race to land jobs at top consulting firms is. We at MCC have previously been made aware of individuals purporting to sell the answers to consulting screening tests on the black market. If cheating is possible, there is always a risk it can happen.

To manage to give every candidate a different test and still be able to generate a reliable ranking of those candidates across a fundamental skillset, without that test being very lengthy, is a considerable achievement from Imbellus. At a high level, this would seem to be approximately equivalent to reliably extracting a faint signal from a very noisy background on the first attempt almost every time.

Taking this level of trouble - and presumed additional expense - shows how seriously McKinsey take the task of ensuring reliable, fair selection by trying to eliminate anything like cheating, or even just normal information flow between candidates, that might have happened with something like the PST.

(Note that we are yet to confirm this also happens with the new Redrock Case Study, but it seems to be set up to allow for easy changes to be made to the numerical values describing the case, so we assume there will be the same kind of variation.)

What does it all mean for you?

Understanding what you are being tested for is obviously crucial in preparing yourself for any kind of assessment. For the Solve assessment, this is especially true the longer the time you have to prepare.

Over longer preps, an understanding of exactly the kind of traits being examined allows you to select skill-building activities that should actually show transference in boosting your test performance.

Of course, this raises the obvious question…

Can I Prepare for the McKinsey Solve Assessment?


In short, yes you can - and you should!

As noted previously, there has previously been a lot of disagreement over whether it is really possible to prep for the Solve assessment in a way that actually makes a difference.

Especially regarding the legacy version, there has been a widespread idea that the Solve assessment functions as something like an IQ test, so that preparation beyond very basic familiarisation to ensure you don’t panic on test day will not do anything to reliably boost your scores (nobody is going to build up to scoring an IQ of 200 just by doing practice tests, for example).

This rationale says that the best you can do is familiarise yourself with what you are up against to calm your nerves and avoid misunderstanding instructions on test day. However, this school of thought says there will be minimal benefit from practice and/or skill building.

The utility of preparation has become a little clearer with the addition of the Redrock Case Study to the new version of Solve. Its heavily quantitative nature, strong time pressure and structure closely resembling a traditional business case make for a clearer route to improvement.

However, as we explain in more detail in our PDF guide to the Solve assessment, the idea that any aspect of either version of Solve can't be prepared for is based on some fundamental misunderstandings about what kind of cognitive traits are being tested. Briefly put, the five key skills the Solve assessment explicitly examines are what are known as higher-order thinking skills.

Crucially, these are abilities that can be meaningfully built over time.

McKinsey and Imbellus have generally advised that you shouldn’t prepare. However, this is not the same as saying that there is no benefit in doing so. McKinsey benefits from ensuring as even a playing field as possible. To have the Solve test rank candidates based purely on their pre-existing ability, they would ideally wish for a completely unprepared population.


There has been a bit of variation over the years in the games included in the Solve assessment/PSG and in the specific form those games take. Imbellus and McKinsey have experimented with whole new configurations as well as making smaller, iterative tweaks over time. That being said, the new 2023 Redrock case is by far the largest change to Solve since the assessment's genesis back in 2017.

Given that innovation seems to continue (especially with the lengthy feedback forms some candidates are asked to complete after sitting the newest iteration), there is always the chance you might be the first to receive something new.

However, our surveys of, and interviews with, those taking the Solve assessment - both before and after recent changes - mean we can give you a good idea of what to expect if you are presented with either the legacy or the Redrock version of Solve.

We provide much more detailed explanation of each of the games in our Solve Assessment PDF Guide - including guidance on optimal scenarios to maximise your performance. Here, though, we can give a quick overview of each scenario:

Ecosystem Building

Screenshot showing the species data from the ecosystem building game

In this scenario, you are asked to assemble a self-sustaining ecosystem in either an aquatic, alpine or jungle environment (though do not be surprised if environments are added, as this should be relatively easy to do without changing the underlying mechanics).

The game requires you to select a location for your ecosystem. Several different options are given, all with different prevailing conditions. You then have to select a number of different plant and animal species to populate a functioning food chain within that location.

In previous versions of the game, you would have had to fit as many different species as possible into a functioning food chain. However, recent iterations of the Solve assessment require a fixed number of eight species to be selected.

Species selection isn’t a free-for-all. You must ensure that all the species you select are compatible with one another - that the predator species you select are able to eat the prey you have selected for them etc. All the species must also be able to survive in the conditions prevailing at the location you have selected.

So far, this sounds pretty easy. However, the complexity arises from the strict rules around the manner and order in which the different species eat one another. We run through these in detail in our guide, with tips for getting your food chain right. However, the upshot is that you are going to have to spend some significant time checking your initial food chain - and then likely iterating it and replacing one or more species when it turns out that the food chain does not adhere to the eating rules.

Once you have decided on your food chain, you simply submit it and are moved on to the next game. In the past, test takers were apparently shown whether their solution was correct or not, but this is no longer the case.

Test-takers generally report that this game is the easier of the two, whether it is paired with the Plant Defence game in the legacy Solve or the Redrock Case Study in the new version. Candidates will not usually struggle to assemble a functioning ecosystem and do not find themselves under enormous time pressure. Thus, we can assume that process scores will be the main differentiator between individuals for this component of the Solve assessment.

For ideas on how to optimise your process score for this game, you can see our PDF Solve guide.

Plant Defence

Screenshot showing the plant defence game in progress

As mentioned, this game has been replaced with the Redrock Case Study in the newer iteration of the Solve assessment, rolled out from Spring 2023. However, you might still be asked to sit the legacy version, with this game, when applying to certain offices - so you should be ready for it!

This scenario tasks you with protecting an endangered plant species from invasive species trying to destroy it.

The game set-up is much like a traditional board game, with play taking place over a square area of terrain divided into a grid of the order of 10x10 squares.

Your plant is located in a square near the middle of the grid and groups of invaders - shown as rats, foxes or similar - enter from the edges of the grid before making a beeline towards your plant.

Your job then is to eliminate the invaders before they get to your plant. You do this by placing defences along their path. These can be terrain features, such as mountains or forests, that either force the invaders to slow down their advance or change their path to move around an obstacle. To actually destroy the invaders though, you use animal defenders, like snakes or eagles, that are able to deplete the groups of invaders as they pass by their area of influence.

Complication here comes from a few features of the game. In particular:

  • You are restricted in terms of both the numbers of different kinds of defenders you can use and where you are allowed to place them. Thus, you might only have a couple of mountains to place and only be allowed to place these in squares adjacent to existing mountains.
  • The main complication is the fact that gameplay is not dynamic but rather proceeds in quite a restricted turnwise manner. By this, we mean that you cannot place or move around your defences continuously as the invaders advance inwards. Rather, turns alternate between you and invaders and you are expected to plan your use of defences in blocks of five turns at once, with only minimal allowance for you to make changes on the fly as the game develops.

The plant defence game is split into three mini-games. Each mini-game is further split into three blocks of five turns. On the final turn, the game does not stop, but continues to run, with the invaders in effect taking more and more turns whilst you are not able to place any more defences or change anything about your set-up.

More and more groups of invaders pour in, and your plant will eventually be destroyed. The test with this “endgame” is simply how many turns your defences can stand up to the surge of invaders before they are overwhelmed.

As opposed to the Ecosystem Building scenario, there are stark differences in immediate candidate performance - and thus product score - in this game. Some test takers’ defences will barely make it to the end of the standard 15 turns, whilst others will survive 50+ turns of endgame before they are overwhelmed.

In this context, as opposed to the Ecosystem Building game typically preceding it, it seems likely that product score will be the primary differentiator between candidates.

We have a full discussion of strategies to optimise your defence placement - and thus boost your product score - in our Solve guide.

Redrock Case Study


This is the replacement for the Plant Defence game in the newest iteration of Solve.

One important point to note is that, where the Solve assessment contains this case study, you have a strict, separate time limit of 35 minutes for each half of the assessment. You cannot finish one game early and use the extra time in the other, as you could in the legacy Solve assessment.

McKinsey has had significant issues with this case study, with test takers noting several major problems. In particular:

  • Glitches/crashes - Many test takers have had the Redrock Case crash on them. Usually, this is just momentary and the assessment returns to where it was in a second or two. If this happens to you, try to just keep calm and carry on. However, there are reports online of some candidates having the whole Solve assessment crash and being locked out as a result. If this happens, you should contact HR.
  • Confusing interface - Candidates have routinely noted that Redrock's controls are confusing and seem poorly designed compared to the older Ecosystem Building game preceding it. This means that they can often lose time figuring out how to interact with the case.
  • Confusing language - Related to the above is that the English used is often rather convoluted and sometimes poorly phrased. This can be challenging even for native English speakers but is even worse for those sitting Solve in their second language. It can make the initial instructions difficult to understand - compounding the previous interface problem. It can also make questions difficult, requiring a few readings to comprehend.
  • Insufficient time - Clearly, McKinsey intended for Redrock to be time pressured. However, time is so scarce that pretty well nobody is getting through all the questions. This is plainly sub-optimal for McKinsey - as well as being stressful and disheartening for candidates. We would expect changes to be made to address this issue in future.

McKinsey are clearly aware of these issues, as they have been asking some test takers to complete substantial feedback surveys after sitting Redrock. Be aware, then, that this raises the likelihood of changes to the Redrock Case Study in the near term - meaning you should be ready to tackle something new.

For the time being, though, we can take you through the fundamentals of the current version of the Redrock Case Study. For more detail, see our freshly updated PDF Guide.

The Scenario

Whilst changes to the details are likely in future, the current Redrock Case Study is set on the Island of Redrock. This island is a nature reserve with populations of several species, including wolves and elk. Redrock's wolves are split into four packs, associated with four geographical locales. These packs prey on the elk and depend upon them for food, so there is a dynamic relationship between the population numbers of the two species. Your job is to ensure ecological balance by optimising the numbers of wolves in the four packs, such that both wolves and elk can sustainably coexist.

The Questions

The Redrock case study's questions were initially split into three sections, but a fourth was later added. These sections break down as follows:

  • Investigation - Here, you have access to the full description of the case, with all the data on the various animal populations. Your task is to efficiently extract all the most salient data points and drag-and-drop them to your "Research Journal" workspace area. This is important, as you subsequently lose access to all the information you don't save at this stage.
  • Analysis - You must answer three numerical questions using information you saved in the Investigation section. This can include you dragging and dropping values to and from an in-game calculator.
  • Report - Formerly the final section, you must complete a pre-written report on the wolf populations, including calculating numerical values to fill in gaps and using an in-game interface to make a chart to illustrate your findings. You will leverage information saved in the Investigation section, as well as answers calculated in the Analysis section.
  • Case Questions - This section adds a further ten individual case questions. These are thematically similar to the preceding case, but are otherwise separate, not relying on any information from the previous sections. The ten questions are highly quantitative and extremely time pressured. Pretty well nobody finishes them before being timed out.

This is a very brief summary - more detail is available in our PDF Guide.

Other Games - Disease and Disaster Identification

Screenshot of a wolf and beaver in a forest habitat from the Solve assessment

There have been accounts of some test-takers being given a third game as part of their Solve assessment. At the time of writing, these third games have always been clearly introduced as non-scored beta tests for Imbellus to try out potential new additions to the assessment. However, the fact that these have been tested means there is presumably a good chance we’ll see them as scored additions in future.

Notably, these alternative scenarios are generally variations on a fairly consistent theme and tend to share a good deal of the character of the Ecosystem Building game. Usually, candidates will be given a whole slew of information on how an animal population has changed over time. They will then have to wade through that information to figure out either which kind of natural disaster or which disease has been damaging that population - the commonality with the Ecosystem Building game being in the challenge of dealing with large volumes of information and figuring out which small fraction of it is actually relevant.


How to effectively prep for the Solve assessment


We discuss how to prep for the Solve assessment in full detail in our PDF guide. Here, though, we can give you a few initial pointers to get you started. In particular, there are some great ways to simulate the different games as well as build up the skills the Solve assessment tests for.

Playing video games is great prep for the legacy Solve assessment in particular, but remains highly relevant to the new Redrock version.

Contrary to what McKinsey and Imbellus have said - and rather unfortunately for those of us with other hobbies - test-takers have consistently told us that the Problem Solving Game, and now the Solve assessment, pretty robustly favours those with strong video gaming experience.

If you listened when your parents told you video games were a waste of time and really don’t have any experience, then putting in some hours on pretty much anything will be useful. However, the closer the games you play are to the Solve scenarios, the better. We give some great recommendations on specific games and what to look for more generally in our Solve guide - including one free-to-play game that our clients have found hugely useful as prep for the plant defence game!

PST-Style Questions

The inclusion of the Redrock case studies in the new version of Solve really represents a return to something like a modernised PST. Along with the similar new BCG Casey assessment, this seems to be the direction of travel for consulting recruitment in general.

Luckily, this means that you can leverage the wealth of existing PST-style resources to your advantage in preparation.

Our PST article - which links to some free PST questions and our full PST prep resources - is a great place to start. However, better than old-fashioned PDF question sets are the digital PST-style questions embedded in our Case Academy course. Conducted online with a strict timer running, these are a much closer approximation of the Solve assessment itself. These questions are a subset of the Case Academy course, but are also available separately in our Course Exercises package.

Quick Mathematics With a Calculator

Again, specifically for the Redrock assessment, you will be expected to solve math problems very quickly. The conceptual level of mathematics required is not particularly high, but you need to know what you are doing and get through it fast using a calculator (and/or Excel, if you are already comfortable with that program).
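To give a sense of the level involved, the snippet below runs through the kind of percentage-change and simple projection arithmetic that crops up - with invented numbers, since the actual test values obviously vary from candidate to candidate.

```python
# Invented numbers - purely an illustration of the kind of quick arithmetic involved.

elk_last_year = 1840
elk_this_year = 1564

# Year-on-year percentage change
pct_change = (elk_this_year - elk_last_year) / elk_last_year * 100
print(f"Elk population change: {pct_change:.1f}%")            # -15.0%

# If the same rate of decline continued for two more years
projected = elk_this_year * (1 + pct_change / 100) ** 2
print(f"Projected population in two years: {projected:.0f}")  # ~1130
```

On test day you would, of course, be doing this on a calculator or in Excel rather than in code - the point is simply that being able to set the calculation up instantly is what the time pressure rewards.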

Our article on consulting math is a great place to start to understand what is expected of you throughout the recruiting process, with our consulting math package (a subset of our Case Academy course) providing more in-depth lessons and practice material.

Learn to Solve Case Studies

With the Redrock Case Study clearly being an ecology-themed analogue to a standard business case study, it's pretty obvious that getting good at case studies will be useful.

More broadly, the Solve assessment as a whole is developed and calibrated to be predictive of case interview performance, so you can expect that improving your case-solving ability will indirectly lift your performance across the board.

Of course, this overlaps with your prep for McKinsey's case interviews. For more on how to get started there, see the final section of this article.

Learning About Optimal Strategies for the Games

The first thing to do is to familiarise yourself with the common game scenarios from the Solve assessment and how you can best approach them to help boost your chances of success.

Now, one thing to understand is that, since the parameters of the games change for each test-taker, there might not be a single definitive optimal strategy for every possible iteration of a particular game. As such, you shouldn't rely on memorising one approach and hoping it matches what you get on test day.

Instead, it is far better to understand why a strategy is sensible in some circumstances and to recognise when the version of the game you personally receive calls for a different approach.

In this article, we have given you a useful overview of the games currently included in the Solve assessment. However, a full discussion with suggested strategies is provided in our comprehensive Solve guide.

With the limited space available here, this is only a very brief sketch of a subset of the ways you can prep.

As noted, what will help with all of these and more is reading the very extensive prep guidance in our full PDF guide to the Solve assessment...

MCC Solve Assessment PDF Guide

Preparing for the Solve assessment doesn’t have to be a matter of stumbling around on your own. Whilst prep isn’t quite as straightforward as it would be for a more conventional test, there is still a lot you can do to increase your chances.

This article is a good start to get you up to speed. From here, though, our new, updated PDF guide to the McKinsey Solve assessment is your next stop to optimise your Solve preparation.

This guide is based on our own survey work and interviews with real test-takers, as well as follow-ups on how the advice in the previous guide worked out in reality.

The MyConsultingCoach Solve guide is designed to be no-nonsense and straight to the point. It tells you what you need to know up front, and - for those of you who have already received an invitation to interview and don't have much time - crucial sections are clearly marked to read first, with specific advice for prepping in a hurry.

For those of you starting early with more time to spare, there is also a fully detailed, more nuanced discussion of what the test is looking for and how you can design a longer-term prep plan to build up the skills you need - and how this can fit into your wider case interview prep.

Throughout, there is no fluff to bulk out the page count. The market is awash with guides running to huge page counts, their appendices stuffed full of irrelevant material to pad out the document. By contrast, we realise your time is better spent actually preparing than ploughing through a novel.

If this sounds right for you, you can purchase our PDF Solve guide here:

The Next Step - Case Interviews


So, you pour in the hours to generate an amazing resume and cover letter.

You prepare diligently for the Solve assessment. You go through our PDF guide, implementing all the suggestions. Accordingly, you pour more hours into gaming, skill building and practising with PST-style questions.

You feel great on the day itself and ace the test, building a perfect ecosystem and keeping your plant alive for 50+ turns, or nailing the calculations for the Redrock questions.

Your product and process scores are right at the top of your cohort and those plus your resume and cover letter are enough to convince McKinsey to invite you for a first-round case interview. Excellent!

Now the real work begins…

Arduous as application writing and Solve prep might have seemed, preparing for McKinsey case interviews will easily be an order of magnitude more difficult.

McKinsey tells candidates not to prepare for Solve, and it is quite possible that someone might pass that assessment without having done any work in advance. However, McKinsey explicitly expects candidates to have rigorously prepared for case interviews, and it is vanishingly unlikely that an unprepared interviewee could pass even first-round interviews.

The volume of specific business knowledge and case-solving principles, as well as the sheer complexity of the cases you will be given, mean that there is no way around knuckling down, learning what you need to know and practising on repeat.

McKinsey have internal mentoring programmes for promising individuals, and we have even heard of HR staff there explicitly telling candidates to secure private coaching before their interviews (indeed, it seems MCC has been directly recommended by at least one HR representative).

All this means that, if you want to get through your interviews and actually land that McKinsey offer, you are going to need to take things seriously, put in the time and learn how to properly solve case studies.

Unfortunately, not all case cracking methods are created equal. There are some older-but-still-well-known systems out there largely trading on brand recognition, but with dubious efficacy - especially in a world where interviewers know all about the frameworks they teach and how to select cases that don’t fit them.

The method we teach throws out generic frameworks altogether and shows you how to solve cases the way a real management consultant approaches a real engagement. Usefully, our method is based specifically on the way McKinsey train incoming consultants.

The time you put into learning our approach to case cracking won't just be time down the drain memorising some cribs to be forgotten after the interview. Instead, the methods we teach should still be useful when you start the job itself, giving you a head start on becoming a top-performing consultant!

You can start reading about the MCC method for case cracking here. To step your learning up a notch, you can move on to our Case Academy course.

To put things into practice in some mock interviews with real McKinsey consultants, take a look at our coaching packages.

And, if all this (rightfully) seems pretty daunting and you’d like to have an experienced consultant guide you through your whole prep from start to finish, you can apply for our comprehensive mentoring programme here.

Looking for an all-inclusive, peace-of-mind programme?

Candidates who sign up to our free services are 3 times more likely to land a job in one of their target firms. How?

  • We teach how to solve cases like consultants, not through frameworks
  • Our Meeting Board lets you practice with peers on 100+ realistic, interactive cases.
  • Our AI mentor creates a personalised study roadmap to give you direction.
  • All the advice you need on resume, cover letter and networking.

We believe in fostering talent - that’s why all of the above is free.


From Lego to McKinsey, Distracted Managing Can Kill Companies

Increasing social and political pressures are making it harder for CEOs and companies to focus on what they do best.


If the best management minds agree on anything, it is the importance of corporate focus. Peter Drucker repeatedly argued that deciding what not to do is as important as deciding what to do. The CEO of Apple Inc., Tim Cook, once said that “we believe in the simple, not the complex. We believe in saying no to thousands of projects so that we can focus on the few.” Tom Peters, who recently announced his retirement from writing after decades searching for excellence, advised companies to “stick to their knitting.” At an executive breakfast held on May 9 at the Ivy Club in Covent Garden by the Global Peter Drucker Forum and featuring management thinkers as well as former and current executives, the most popular answers to the question of what ails the corporate world were “lack of focus” or excessive complexity.

Lack of focus not only diverts companies from providing their core products and services. It also contributes to managerial overload and complexity. People who have nothing to do with the company’s core business build empires. CEOs are distracted by an expanding list of demands on their time. What was once transparent becomes opaque and what was once straightforward becomes labyrinthine.


Food & Function

Fatty acids composition in erythrocytes and coronary artery disease risk: a case-control study in China

Background & aims: There is limited and conflicting evidence about the association of erythrocyte fatty acids with coronary artery disease (CAD), particularly in China where the CAD rates are high. Our study aimed to explore the associations between erythrocyte fatty acids composition and CAD risk in Chinese adults. Methods: Erythrocyte fatty acids of 314 CAD patients and 314 matched controls were measured by gas chromatography. Multivariable conditional logistic regression and restricted cubic spline models were conducted to explore the odds ratio with 95% confidence interval (OR, 95% CI) and potential associations between erythrocyte fatty acids and CAD risk. Principal component analysis (PCA) was used to analyze further the potential role of various erythrocyte fatty acid patterns in relation to CAD risk. Results: Significant inverse associations were observed between high levels of erythrocyte total n-3 polyunsaturated fatty acids (n-3 PUFA) [ORT3-T1 = 0.18 (0.12, 0.28)], monounsaturated fatty acids (MUFA) [ORT3-T1 = 0.21 (0.13, 0.32)], and the risk of CAD. Conversely, levels of saturated fatty acid (SFA) and n-6 polyunsaturated fatty acids (n-6 PUFA) were positively associated with CAD risk [ORT3-T1 = 3.33 (2.18, 5.13), ORT3-T1 = 1.61 (1.06, 2.43)]. No significant association was observed between CAD risk and total trans fatty acids. Additionally, the PCA identifies four new fatty acid patterns (FAPs). The risk of CAD was significantly positively associated with FAP1 and FAP2, while negatively correlated with FAP3 and FAP4. Conclusion: The different types of erythrocyte fatty acids may significantly alter susceptibility to CAD. Elevated levels of n-3-PUFA and MUFA are considered as protective biomarkers against CAD, while SFA and n-6 PUFA may be associated with higher CAD risk in Chinese adults. The risk of CAD was positively associated with FAP1 and FAP2, and negatively associated with FAP3 and FAP4. Combinations of erythrocyte fatty acids may be more important markers of CAD development than individual fatty acids or their subgroups.


Y. Wang, G. Wu, F. Xiao, H. Ying, L. Yu, Y. Chen, Q. Shehzad, L. Xu, H. Zhang, Q. Jin and X. Wang, Food Funct., 2024, Accepted Manuscript, DOI: 10.1039/D4FO00016A



