
Design and Implementation Data Warehouse in Insurance Company

Ryan Ari Setyawan 1 , Eko Prasetyo 2 and Abba Suganda Girsang 2

Published under licence by IOP Publishing Ltd in Journal of Physics: Conference Series, Volume 1175, 1st International Conference on Advance and Scientific Innovation, 23–24 April 2018, Medan, Indonesia. Citation: Ryan Ari Setyawan et al 2019 J. Phys.: Conf. Ser. 1175 012072. DOI: 10.1088/1742-6596/1175/1/012072

Author e-mails

[email protected]

Author affiliations

1 Department of Informatics Engineering, Janabadra University, Yogyakarta, Indonesia 55231.

2 Computer Science Department, Binus Graduate Program-Master of Computer Science, Bina Nusantara University, Jakarta, Indonesia.

Insurance companies generate rich data from their business processes, including sales data. Sales data can be analyzed to assess whether a company is in good condition. The purpose of this research is to develop a technique for analyzing such company data. The method used in this paper is to design a data warehouse, following the nine-step methodology for data warehouse design and using Pentaho as the tool for ETL (Extract, Transform, Load), OLAP analysis, and reporting. The results of this research show that implementing a data warehouse makes data analysis and reporting more effective.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Harnessing the potential of data in insurance

In September 2016, AIG and Hamilton Insurance Group announced a joint venture with hedge fund Two Sigma to form Attune, a data and technology platform to serve the $80 billion U.S. small and midsize commercial insurance market. Through Attune, the companies are seeking to transform the small commercial segment by harnessing data, artificial intelligence capabilities and advanced modeling techniques. Attune will partner with brokers, agents and other intermediaries to streamline the pricing, selection and underwriting of insurance for small business owners.


Insurers have historically collected a wealth of data, but they have been slower to monetize this asset—by creating new business lines or models to capture the value of data and analytics. As more insurance consumers move online to interact, compare products and prices, and make purchases, the volume of available data is increasing exponentially. Even more significantly, powerful new analytics technology enables insurers to use that data in ways they had not previously considered. However, many insurers face organizational challenges to becoming data-driven companies. Others are waiting for business opportunities to emerge before enhancing their analytics capabilities. As a consequence, insurers have lagged behind other industries in their investment in and adoption of analytics.

As first movers among insurers create new business models and seek to harness the potential of their data, those that wait will be at a significant competitive disadvantage. To become a data-driven insurance organization, firms must rethink their approach to building and managing data and analytics assets and develop distinctive go-to-market capabilities that allow them to offer clients data-centric solutions.

New technology, new opportunities

The explosion in available customer data (both personal and commercial), the growth in analytics technologies and the rapidly declining cost of computing power and data storage are prompting companies to invest in data analytics as a means to innovate. Forward-thinking leaders across industries are pursuing opportunities to create data-driven businesses in core and adjacent markets. (For more on how analytics is shaping a range of sectors, see "The age of analytics: Competing in a data-driven world," McKinsey Global Institute, December 2016.) United Healthcare's subsidiary, Optum, monetizes its proprietary consumer data and offers technology, consulting, and other services to providers, payors, government agencies, and life science organizations. Caterpillar's investment in Uptake, a predictive maintenance platform, allows Caterpillar to tap a quintillion bytes of data to help customers make real-time maintenance decisions that can dramatically reduce the costs of ownership and operations.

Such examples have spurred early movers in the insurance industry to employ analytics across functions such as marketing and distribution, underwriting and claims. In addition to traditional data aggregators such as Acxiom, Epsilon, and Experian, carriers are using new online data sources such as Argus, Datalogic, DemystData, and specialty providers such as Judy Diamond and ATTOM Data Solutions to create 360-degree views of consumers and channels and identify new opportunities in several ways.


  • Enhancing existing business models. Carriers are using data analytics to radically redefine their role by providing agencies with the tools to integrate data-driven decision making into areas such as cross-selling and reducing customer churn. These analytics tools spotlight the highest-value clients and high-potential leads so agents can invest resources more efficiently, predict customer churn more accurately to help improve retention, and generate broker-peer comparison analytics to identify additional sales opportunities.
  • Strengthening channel relationships. Carriers are using data analytics to strengthen broker relationships. AXA's EB360 platform, for example, offers a suite of analytics-powered tools to help brokers track the status of applications, manage compensation and commissions, and monitor progress on business goals. The tools, which are optimized to minimize data entry and enable rapid quoting, help brokers manage their business more effectively, and thus strengthen the broker-carrier relationship.
  • Changing relationships with consumers. Insurers are fundamentally changing their relationship with consumers through the use of real-time monitoring and visualization. Consumers who agree to let insurance companies track their habits can learn more about themselves, while insurers can use the data to influence behavior and reduce risks. In auto insurance, for example, telematics are being used to monitor consumer driving habits in real time. By harnessing the resulting insights, insurers can offer usage-based policies and determine claims liability easily and accurately.
  • Redesigning products. The Climate Corporation is using data and analytics to redefine the crop insurance market. The company uses data on weather patterns, soil characteristics and other key crop attributes at the field level to reduce farmers’ risks by designing policies that protect farmers from losses due to weather and other adverse events.
  • Creating new business models. Sonnet, Canadian insurer Economical's entrance into the direct channel, relies on a "data hub" to allow consumers to efficiently obtain online quotes and bind coverage for homeowner's and auto insurance. The data hub quickly aggregates information from numerous databases to streamline the buying experience. At most insurers, consumers must answer more than 20 questions to get an auto insurance quote; Sonnet requires fewer than 10. The approach appeals to tech-savvy consumers with relatively straightforward insurance needs, while those seeking more assistance with their insurance decisions can purchase through Economical's broker partners.
  • Establishing new adjacent businesses. A large commercial insurer has formed partnerships to offer policyholders just-in-time solutions such as the maintenance of heating, ventilation and air-conditioning equipment in commercial buildings. The solutions are based on monitoring and diagnosing vibration and sound patterns to detect declining performance and predict failures, which reduces the total cost of ownership.

Increasingly, carriers are creating entirely new business models and disruptive offerings that generate non-risk, fee-based income. These “data as a business” models allow insurers to take advantage of their vast data pools and existing investments in data and analytics to offer unique data-driven insights to partners and end customers.

The data-driven insurer: A journey in five phases

The arguments for harnessing the power of data and analytics are convincing. However, the question often asked by insurance executives is, “Where and how do we start?” Insurers should follow a five-phase approach to design, launch and successfully manage a data analytics business (exhibit).

Phase 1—Define aspiration and set vision

The first step in shaping a “data as a business” strategy is for an organization’s senior leaders to define a compelling aspiration for the new business. Given the enormous economic potential the data hold, the aspiration should be bold and include business-backed, strategic use cases. A rallying cry for the organization could be, “Through launching a new data business, we expect to radically redefine the homeowner’s insurance market and double our market cap in three years.” Creating a yardstick to measure progress will ensure that the organization is thinking boldly enough. A high-level business and economic model based on the aspiration should also be developed during this phase. Throughout the following four phases, these elements will be pressure-tested and adjusted as appropriate.

With the aspiration set, senior leaders must determine the most appropriate course to mobilize the organization to pursue it. This task includes appointing and visibly backing a leader to drive the next four phases.

Phase 2—Evaluate assets, capabilities, and value-creation opportunities

With the aspiration and strategic use case as the foundation, insurers must next determine which of their assets they can harness to build the capabilities required to achieve the strategic use case. This process includes not only understanding the types and value of existing data but also building the analytical and business capabilities needed to transform raw data into valuable insights for partners and customers.

Understanding the ecosystem of analytics partners is critical to generating impact from analytics. Carriers should identify best-in-class companies that deliver impact through data, analytics, and insights across the industry value chain. Equally important is finding “white spaces” in the market where no solutions exist. The key here is to understand how to create value in these opportunity areas and which analytical capabilities matter most to the solution. For example, partners to evaluate can include those with:

  • Proprietary data sets, machine-learning models, and approaches to continuously improving the model
  • Flexible data and analytics infrastructure to execute high-priority use cases
  • Tools that allow customers to efficiently access, understand, and act on data and insights
  • Go-to-market capabilities including commercializing and pricing high-impact analytical solutions
  • Capabilities to quickly make improvements to meet customers’ changing needs and stay ahead of competitors

At the conclusion of Phase 2, management should align on the data assets and capabilities that best fit the strategic use case, the gaps that will need to be addressed and the high-level business case. To rally stakeholders, leaders need a compelling reason for the changes they intend to make, and must clearly describe the impact that analytics can have on the organization, its clients, and employees. The key challenge is to ensure that these assets link clearly to business value; without this connection, the resulting assets will have no real impact.

Phase 3—Define specific use cases and business model

Monetization business models

Organizations are exploring a number of business models to monetize data and deliver against business-backed use cases. These models range from providing raw data to providing insight-based consulting solutions and services to customers and channel partners. Across industries, five categories of business models are emerging (Exhibit A).

In insurance, some early movers are aspiring to develop “utilities” by taking advantage of their size and access to a disproportionate share of data to create solutions that improve industry economics and firms’ ability to serve clients. For example, Aon’s Global Risk Insight Platform (GRIP) contains a proprietary database of insurance industry placement data, a source of insights across carriers, industries and products from individual transactions to global trends. This collection of data enables Aon to benchmark similar risks worldwide, including pricing information, to help clients evaluate performance and anticipate shifts in the market.

Insurance executives must consider many factors when exploring potential business models, such as the use cases, the ultimate customer value proposition, the specific business problems being addressed, and the profit formula (for example, how much profits depend on quickly achieving scale). In data-centric business models, a key factor is data quality and how much processing will be required to make the information usable. In general, moving from the data provider model toward the others requires more processing of the underlying raw data, and hence higher levels of investment.

The good news is that the more companies refine the data, the more value is created for end customers and the higher the resulting pricing power. To get started, management teams should evaluate potential models based on the data that can be monetized with little additional processing (for example, data provider) versus putting all the organization’s efforts behind a longer-term play (for example, full solution provider).

The next step is defining specific use cases and associated customer value propositions—in other words, the building blocks of the aspiration and strategic use case. Where applicable, management should consider further refining the list of potential use cases through market research with potential partners and customers. This exercise gives the team a clearer understanding of potential demand for each use case and the primary monetization mechanisms (for example, price premiums for current products or incremental subscription fees for new data products). As building the necessary capabilities might also require partnerships or acquisitions, insurers should conduct the external assessment with an eye toward this possibility. This assessment can help to identify potential business models (see sidebar, “Monetization business models”).


Phase 4—Conduct pilots

The team proves the value of the new business by piloting two or three minimum viable products (MVPs). Carriers should take an iterative, test-and-learn approach to pilots, with each lasting no more than 8 to 10 weeks. It is important to keep the scale of these pilots manageable and not attempt to perfect final offerings. Also, metrics to gauge a pilot’s success should include a mix of learning and financial impact (with more emphasis on the former) as well as test-versus-control experiments where applicable to measure the incremental value delivered by analytics.
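To make the test-versus-control idea concrete, here is a minimal sketch that estimates a pilot's incremental value by comparing a pilot group against a control group. This is not McKinsey's methodology; the data are synthetic and the effect size is invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic per-agent outcomes (e.g., monthly premium written, indexed to 100):
# agents using the analytics tooling (pilot) versus business as usual (control).
pilot = rng.normal(loc=105.0, scale=15.0, size=80)
control = rng.normal(loc=100.0, scale=15.0, size=80)

# Incremental value delivered by analytics = difference in group means.
lift = pilot.mean() - control.mean()

# Welch's two-sample t-test: is the lift distinguishable from noise?
t_stat, p_value = stats.ttest_ind(pilot, control, equal_var=False)

print(f"estimated lift: {lift:.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```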

Phase 5—Establish new business unit and scale operations

Scaling successful prototypes and establishing foundational capabilities by recruiting talent and building the “data factory” will position an insurer to formally launch the new venture and begin the scaling process. The new venture will call for new roles in the organization, including not only data scientists who can analyze big data and solution architects to manage the delivery road map, but also experts who can translate business needs into analytics language.

As the data-driven business matures, firms should explore establishing a new branded unit based on its capabilities and revenue growth projections. Insurers should also establish and manage key metrics to assure implementation is on track and that the business unit is delivering anticipated value. The primary focus should always be on value delivered and on investments based on clear value realization milestones.

Building a data-driven business is often a multiyear journey requiring parallel efforts in such areas as data and analytics modeling and business building, along with a heavy customer engagement and go-to-market component.

Implementing an agile cross-functional operating model

Developing a data monetization business calls for strong go-to-market capabilities. Insurers must quickly develop the data analytics offerings, conduct tests with real customers, refine them in quick iterations, and price solutions based on the value delivered and the customer’s willingness to pay. For most insurance carriers, this approach will represent a change from their traditional way of operating. However, leading digital companies around the world are using such agile approaches to deliver business and customer value quickly and effectively. Carriers should consider taking the following actions:

  • Get close to customers and collaborate with them on solution design. Instead of relying on the judgment of select stakeholders such as sales executives or on market research, insurers should base each step of a solution’s design and development on active customer engagement and feedback. This focus should also help inform the value at stake for customers and their willingness to pay.
  • Create cross-functional teams that own well-defined business and customer outcomes. Typical insurance carrier silos such as sales and marketing, product, IT, finance, and HR create significant coordination overhead. Dedicated cross-functional teams (preferably with dedicated resources) can own the solution end to end and focus on customer outcomes.
  • Adopt a test-and-learn approach and focus on launching in weeks rather than months. Rather than aiming to build feature-rich, comprehensive solutions that take months and years to design, develop, and launch, carriers should focus on quickly delivering an MVP followed by subsequent releases to expand and improve on features, functionality, and reach. Each release should be based on meeting a set of milestones and success criteria agreed upon in advance.

Since this approach to developing and running a business represents a sharp break for most carriers, executives should consider establishing the new business so that it does not get burdened by the larger organization’s requirements and processes. For example, a new business will need to attract different talent profiles and develop unique partnerships, so established processes and lead times (for instance, hiring processes) might not be effective. Senior leaders should proactively think about addressing these issues while building momentum, generating excitement, and celebrating successes.

The exploding volume of data available to insurance carriers is giving rise to new business models, revenue streams, and enormous opportunities to increase value. Embarking on the journey to monetize data requires insurers to rethink their approach to building and managing data and analytics assets and to develop distinctive go-to-market capabilities to bring new data-centric offerings to their clients. Executives that can manage investments in analytics while identifying new business lines can capture significant rewards.

Ari Libarikian is a senior partner in McKinsey’s New York office, where Ani Majumder is an associate partner; Kia Javanmardian is a partner in the Washington, DC, office, where Doug McElhaney is a vice president.

The authors wish to thank David Rose for his contributions to this article.


Business Intelligence | Power BI | Machine Learning & AI | SaaS & Enterprise portals

Transforming businesses with modern web and data solutions through data intelligence and AI products, portals, and services

Case Study – Azure and Power BI

Global Insurer reduces operational process time by over 99%

End-to-end modern business intelligence solution and automation with data processing, pipelines, a cloud data warehouse, and business reporting using Azure and Power BI

Business Challenge

A global health and life insurance solutions provider with operations worldwide had challenges in data availability, reporting, and insights, and wanted to improve its MIS functions.

  • Insurance product partner reports were available only annually, and delays in processing loss-ratio data raised the risk of underwriting policies with incorrect premiums
  • Reliance on manual data collection from various vendor portals
  • Quarterly reporting was cumbersome, with manual data preparation taking over four days, compounded by Excel file-size limitations
  • Inability to view partner broker/distributor performance analysis
  • Historical data was unavailable, preventing trend analysis
  • Shortage of resources hampered technology-driven reporting and insights delivery
  • Lack of ownership of the reporting process

The Solution

After a meticulous vendor evaluation process, BIGINT Solutions was chosen as the solution provider due to its extensive experience in data analytics and business intelligence.

I was amazed at the speed with which they could analyze and propose a totally end-to-end automated solution, unlike others, who took a long while for the system study phase. Moreover, their solution is working wonders now and has boosted operational efficiency. – CFO

The team conducted a diagnosis to learn more about the challenges, then proposed and implemented a cloud-based data processing and reporting architecture aligned with the firm's strategic objectives. A new operating model with an automated end-to-end solution was implemented to resolve the business challenges.

Key directives included:

  • Data extraction from multiple source systems: CSV, SQL, Excel, and APIs
  • An event-driven data pipeline scheduled to initiate on new data arrival (a sketch follows this list)
  • Data transformation and consolidation into an Azure cloud-based data warehouse
  • Power BI reporting models and data visualization using analytical dashboards for data insights
  • Shareable data reports in multiple formats
  • An end-to-end automated solution to save costs and time
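To make the event-driven item above concrete, here is a minimal sketch of a blob-triggered Azure Function that loads a newly arrived file into a staging table. This is an illustration only, not the implementation from this case study: it assumes the Azure Functions Python v2 programming model with pyodbc for Azure SQL, and the container name, staging table, and columns are hypothetical.

```python
import csv
import io
import os

import azure.functions as func
import pyodbc

app = func.FunctionApp()

# Fires whenever a new file lands in the (hypothetical) "incoming" container.
@app.blob_trigger(arg_name="blob", path="incoming/{name}",
                  connection="AzureWebJobsStorage")
def load_new_file(blob: func.InputStream):
    rows = list(csv.DictReader(io.StringIO(blob.read().decode("utf-8"))))

    # Land the raw rows in a staging table; transformation into the
    # warehouse's reporting model happens downstream.
    conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])
    with conn:  # commits on success, rolls back on error
        conn.cursor().executemany(
            "INSERT INTO staging.policy_feed (policy_id, premium, loss) "
            "VALUES (?, ?, ?)",
            [(r["policy_id"], float(r["premium"]), float(r["loss"]))
             for r in rows],
        )
```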

Business Outcomes

BIGINT Solutions transformed the client’s reporting and analytics space to deliver an efficient and automated solution resulting in positive and quantifiable business outcomes.

  • Easier data availability: A centralized cloud-based data repository combined data on insured profiles, member policies, premiums, claims, group policies, brokers and distributors, and metadata for both internal and external insurance products, enabling historical comparison and trend analysis.
  • Quicker reporting: Report generation time was reduced from 4 days to 30 minutes using an Azure cloud-based event-driven data pipeline for data transformation and storage.
  • Improved reporting frequency: Reporting moved from quarterly to bi-monthly using the end-to-end automated BI solution, resulting in faster data availability and an improved ability to understand risks, policy movements, and financials for planning and budgeting purposes.
  • Real-time data insights: Enhanced reporting capabilities using Power BI models surface actionable intelligence and provide the client with a strategic edge and cost savings.
  • 360-degree views: Leveraging business intelligence with Power BI led to 360-degree views of partner performance and customer portfolios.
Our staff now has bandwidth for higher value-added activities, unlike before, when they spent days trying to get quarterly consolidated data right in an Excel sheet. Highly recommended. Keep up the good work, guys! – Managing Director

Area of focus

Data Pipelines, Cloud Data Warehouse, Data Management, Business Intelligence and Reporting, Data Analytics

Technology used

Azure Storage, Azure SQL, Azure Functions, Power Platform, Power BI


The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, by Ralph Kimball and Margy Ross


Chapter 15. Insurance

We will bring together concepts from nearly all the previous chapters to build a data warehouse for a property and casualty insurance company in this final case study. If you are from the insurance industry and jumped directly to this chapter for a quick fix, please accept our apology, but this material depends heavily on ideas from the earlier chapters. You'll need to turn back to the beginning of the book to have this chapter make any sense.

As has been our standard procedure, this chapter is launched with background information for a business case. While the requirements unfold, we'll draft the data warehouse bus matrix, much like we would in a real-life requirements analysis effort. We'll then design a series of dimensional models by overlaying the core techniques learned thus far in a manner similar to the overlay of overhead transparencies.

Chapter 15 reviews the following concepts:

Requirements-driven approach to dimensional design

Value-chain implications

Data warehouse bus matrix

Complementary transaction, periodic snapshot, and accumulating snapshot schemas

Four-step design process for dimensional models

Dimension role-playing

Handling of slowly changing dimension attributes (see the sketch after this list)

Minidimensions for dealing with large, more rapidly changing dimension attributes

Multivalued dimension attributes

Degenerate dimensions for operational control numbers

Audit dimensions to track data lineage

Heterogeneous products with attributes and facts that vary by line of business ...
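To ground one of those techniques, here is a minimal, hypothetical sketch of a Type 2 slowly changing dimension update in pandas. It is not code from the book; the dimension, keys, and attributes are invented:

```python
import pandas as pd

# Policyholder dimension with SCD Type 2 bookkeeping columns.
dim = pd.DataFrame({
    "policyholder_key": [1],                        # surrogate key
    "policyholder_id": ["PH-001"],                  # natural key
    "risk_class": ["standard"],
    "row_effective": [pd.Timestamp("2018-01-01")],
    "row_expiration": [pd.Timestamp("9999-12-31")],
    "is_current": [True],
})

def scd2_update(dim, natural_key, new_attrs, change_date):
    """Expire the current row for `natural_key` and append a new version."""
    mask = (dim["policyholder_id"] == natural_key) & dim["is_current"]
    dim.loc[mask, ["row_expiration", "is_current"]] = [change_date, False]
    new_row = {
        "policyholder_key": dim["policyholder_key"].max() + 1,
        "policyholder_id": natural_key,
        **new_attrs,
        "row_effective": change_date,
        "row_expiration": pd.Timestamp("9999-12-31"),
        "is_current": True,
    }
    return pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

# The policyholder's risk class changes: history is preserved, not overwritten.
dim = scd2_update(dim, "PH-001", {"risk_class": "preferred"},
                  pd.Timestamp("2019-06-01"))
print(dim)
```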



Data Warehousing for Insurance Reporting and Analytics

The significance of data warehousing for insurance cannot be overstated. It forms the bedrock of modern insurance operations, facilitating data-driven insights and streamlined processes to better serve policyholders.

In recent years, data warehouses have emerged as foundational tools that allow insurance companies to efficiently manage, analyze, and leverage the immense volume of data at their disposal for seamless reporting and analytics. These repositories play a crucial role in transforming raw data from disparate sources into actionable insights, steering insurance companies toward informed decision-making.

The data warehouse has the highest adoption of data solutions, used by 54% of organizations. (Flexera 2021)

Data Warehousing for Insurance: Creating a Single Source of Truth

Insurance companies generate and receive large amounts of data from various business functions and subsidiaries that are stored in disparate systems and in a variety of formats. Centralizing and organizing data from disparate sources, such as customer information, policies, claims, and market trends, provides a unified platform for comprehensive analysis.

This analytical capacity enables insurance professionals to conduct intricate risk assessments, predictive modeling, and accurate trend analyses, facilitating the formulation of effective strategies. That’s where a data warehouse comes in!

A data warehouse enables them to integrate this disparate data into a single source of truth, persist it in a report-oriented data structure, and create a centralized gateway to enable seamless reporting and analytics at the enterprise scale. The benefits of data warehousing for insurance companies are many, including:

  • Centralized Data: A data warehouse allows the consolidation of data from diverse sources spread across different systems. This centralized repository ensures consistent, accurate, and up-to-date information is available for analysis.
  • Efficient Reporting: Standardized data in a data warehouse simplifies the reporting process. The company can generate consistent and accurate reports for regulatory compliance, financial audits, and performance evaluation.
  • Enhanced Analytics: A data warehouse facilitates advanced analytics, including predictive modeling, risk assessment, and customer behavior analysis. This enables the company to identify opportunities, optimize processes, and mitigate risks effectively.
  • Cross-subsidiary Insights: With a data warehouse, the insurance company can gain insights that cut across subsidiaries. This can highlight cross-selling opportunities, identify areas of operational synergy, and improve customer experiences.
  • Deeper Customer Awareness: Using a data warehouse, an insurance company can learn more about its customers. They can pinpoint customer preferences, behaviors, and requirements, thereby enabling precise marketing and customer service strategies.
  • Improved Decision-Making: Access to a comprehensive dataset enables better decision-making. Executives can analyze trends, performance, and risk factors across the entire organization, leading to more informed strategic choices.

In addition, data warehousing helps improve other data management aspects, including:

  • Data Security: Centralizing data in a data warehouse enables the implementation of robust security measures, ensuring that sensitive information is appropriately protected.
  • Data Integration: A data warehouse supports data integration across various subsidiaries, systems, and data formats, fostering interoperability and reducing data silos.
  • Data Quality and Consistency: A well-maintained data warehouse enforces data quality standards, ensuring that data is accurate, complete, and consistent.

Data Warehousing for Insurance: Who Can Benefit?

Data Team Leaders and Senior Personnel

As heads of the data team or senior members of an insurance organization, these individuals play a critical role in shaping data strategies. Utilizing a data warehouse empowers them to streamline reporting and analytics processes. By centralizing data, incorporating data quality management, and providing efficient querying capabilities, they can make more informed decisions and drive the company's overall data-driven strategy. This leads to improved operational efficiency and a competitive edge in the insurance industry.

Data Analysts and Engineers

Data analysts and engineers within the organization benefit significantly from a data warehouse. They often find themselves spending a substantial amount of time on mundane, repetitive tasks like data extraction, transformation, and loading (ETL). With a data warehouse in place, these tasks can be automated, allowing them to focus on higher-value activities such as data analysis, modeling, and innovation. This not only boosts their job satisfaction but also empowers them to contribute more effectively to building innovative insurance products and solutions that can drive business growth.

Business Users

Business users in the insurance industry face challenges related to data dependency, often experiencing delays in obtaining critical information. They rely on timely insights to make informed decisions and solve problems swiftly. A data warehouse addresses this by providing self-service reporting and analytics capabilities. Business users can generate reports instantly, reducing their dependence on IT or data teams. This agility accelerates their ability to respond to changing market conditions, customer needs, and emerging opportunities, ultimately enhancing the organization’s agility and competitiveness.

Fraud Detection & Prevention Using Data Warehouse

Utilizing a data warehouse, insurance companies can improve their fraud detection process. A consolidated data repository enables them to employ anomaly detection and process integrity checks. Through continuous analysis of policyholder data and transaction records, the system establishes behavioral baselines, promptly flagging deviations for potential fraud. This centralized approach enables correlations across diverse data sources, unveiling hidden patterns indicative of fraudulent activities.

A data warehouse’s capability to consolidate information empowers insurers to minimize financial losses caused by fraud. Monitoring various operational aspects allows insurers to gain a comprehensive overview, facilitating rapid identification of irregularities and potential fraud indicators. Real-time transaction monitoring aids in halting fraudulent payouts, while predictive models, built on historical patterns, enable proactive risk mitigation.
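As a toy illustration of the baseline-and-flag idea described above (not Astera functionality; the fields, amounts, and threshold are invented), one might z-score each claim against its policyholder's history:

```python
import pandas as pd

# Hypothetical claims history: one row per claim per policyholder.
claims = pd.DataFrame({
    "policyholder_id": ["PH-1"] * 6 + ["PH-2"] * 6,
    "claim_amount": [200, 220, 190, 210, 205, 1500,   # PH-1: last claim spikes
                     800, 790, 810, 805, 795, 802],   # PH-2: stable behavior
})

# Behavioral baseline per policyholder: mean and spread of that
# policyholder's claims.
baseline = claims.groupby("policyholder_id")["claim_amount"].agg(["mean", "std"])

# z-score each claim against its policyholder's baseline and flag deviations.
claims = claims.join(baseline, on="policyholder_id")
claims["z"] = (claims["claim_amount"] - claims["mean"]) / claims["std"]
flagged = claims[claims["z"].abs() > 2]   # deviations worth investigating

print(flagged[["policyholder_id", "claim_amount", "z"]])
```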

Data Warehousing for Insurance: A Smart, Long-term Financial Decision

An enterprise-grade data warehouse with end-to-end automation offers a great return on investment (ROI) to insurance companies by improving operational efficiency, introducing cost-saving opportunities, and enabling faster business intelligence. The ROI depends on the business goals and size of each organization, but in most cases, companies recover their cost of investment within the first three years.

Data warehousing for insurance requires a considerable allocation of organizational resources, which sparks significant interest in both its initial justification and ongoing evaluation. It's essential to acknowledge that despite this commitment, data warehouses frequently prove to be exceptionally valuable and rewarding investments.

Data Warehousing for Insurance: Try Astera DW Builder!

Astera DW Builder is our flexible and scalable data warehouse automation tool that allows you to design, develop, and deploy high-volume data warehouses for your insurance business in days.

Built upon Astera’s industrial-strength ETL engine and refined by years of use by Fortune 1000 companies, our solution allows you to lead with speed through its robust automation capabilities. Astera DW Builder accelerates design, development, and implementation phases by automating low-level, repetitive tasks, allowing the business to refocus resources on optimizing data processes and generating winning results.

Go from Source to Insights at Unprecedented Speeds

Combining the power of advanced data modeling features and parallel processing ETL/ELT engine with enhanced automation capabilities, Astera DW Builder streamlines data warehousing for insurance companies, allowing them to speed up time-to-information and reduce dependency on IT, ensuring that analysts and underwriters have the right data at the right time to facilitate better decision-making.

An All-Round Platform for Data Warehouse Development

Astera DW Builder offers a comprehensive set of data warehousing features tailored to insurance companies' data management requirements. It simplifies the process of bringing together data from on-premises and cloud sources, ensuring a unified and high-quality data foundation for improved reporting and analytics. Insurance companies can benefit from Astera's capabilities to manage historical data effortlessly and connect to a wide array of sources and destinations. The push-down optimization (ELT) feature enhances query performance, enabling them to focus on value-added tasks like data analysis and modeling.

Astera also caters to the needs of insurance companies by providing instant access to data through self-service reporting and analytics dashboards. This reduces dependency and empowers them to make agile, data-driven decisions. With the ability to scale via cloud deployment, Astera ensures seamless growth and scalability as insurance organizations expand. Additionally, Astera’s data lineage capabilities offer transparency and confidence in data management processes, while secure data movement from on-premises to the cloud ensures data security and compliance.

In summary, Astera equips insurance professionals with the tools they need to harness the full potential of their data for informed decision-making and competitive advantage.

Astera Advantage

Astera’s enterprise-grade data warehousing solution eliminates the need for extensive coding and complex infrastructure, reducing upfront and ongoing costs associated with traditional data warehousing development projects. You don’t need a team of certified data architects and modelers to manage your data warehouse. Moreover, our solution is built on a no-code architecture, enabling faster project completion.

Designing and maintaining a sustainable data warehouse for timely reporting and insights requires fewer (and lower) man hours with Astera. Projects that would typically take several months can be completed within a few weeks or even days through our tool’s intuitive, drag-and-drop environment and advanced data modeling and ETL/ELT capabilities.

With over 40 pre-built transformations, the Astera ETL solution offers increased uptime and greater reliability as compared to custom-coded solutions, which rely on certified data architects and engineers for maintenance. Additionally, Astera’s unified solution saves resources by eliminating the need for separate licenses, maintenance, and support for multiple tools and vendors, resulting in cost savings and improved resource allocation.

As a result, data warehousing with Astera offers a substantially lower total cost of ownership (TCO).

By partnering with Astera, you can build a data warehouse that will serve your current and future data management needs. To get started, reach us at [email protected] or request a free trial today.



Enabling Advanced Insurance Analytics and Reporting

[Image: advanced insurance analytics - policy flow diagram]

Enabling advanced insurance analytics is both challenging and highly rewarding. This article explains the challenge, recommends a proven solution and shows specific examples.

Topics covered:

What is advanced insurance analytics, and why should you care?

What factors make it difficult to produce advanced insurance analytics?

What benefits will an insurance data warehouse produce?

What do advanced insurance analytics look like in practice?

What does it take to implement an insurance data warehouse?

Advanced insurance analytics and reporting will lead to better decision making

Analytics are nothing new to the insurance industry. Successful insurance carriers long ago mastered the use of statistical analysis to assess the probability of loss, and price their risk accordingly. Ironically, employees at companies from agencies to carriers to third-party administrators still lack the information necessary to be data-driven and optimize their business. Advanced insurance analytics and reporting can be a confusing topic. But the underlying concept is easy to grasp. It is the ability to give employees, at all levels of the organization, the information they need to make fact-based decisions in real time. With advanced analytics in place, employees can ask questions of data and understand the impact of their actions on performance. Best of all, they can do so without the need to involve technical resources.

The key performance indicators (KPIs) may vary depending on the type of company and the individual's role, but the competitive advantage conferred by advanced reporting solutions is consistent across them all. This ability, to make fact-based decisions in the moment, is a hallmark of successful companies in every industry. A global CFO study conducted by IBM showed that companies utilizing advanced analytics to drive decisions had twice the revenue growth and 20 times the EBITDA of their competitors.

By exposing data to business stakeholders, advanced reporting solutions quickly shine a light on data quality, business process and performance issues. This allows managers to identify missed opportunities and specific areas of improvement they can control. As they implement course corrections this process can dramatically improve the bottom line.

Insurance companies are no exception. If you want to outperform your competition, it is critical that you implement advanced analytics and reporting in your organization. Unfortunately, there are many ways this effort can go wrong, and only a few ways it can go right. Individual careers and entire companies hang in the balance, so it is worth taking a moment to explain the problem in some detail before we discuss the various solutions available.


Why is reporting so difficult?

All companies face challenges with analytics and reporting at one time or another. Though it may not be immediately obvious, all of these challenges stem from the same root cause — the necessary data exists, but isn’t easy to access or consume.

Modern reporting tools make data easy to visualize, but the resulting reports are only as good as the data feeding them. Before it can be utilized, we have to take all that data from internal systems, partners and 3rd-party data sources, and synthesize it into something useful. Data may need to be moved, integrated, aggregated, grouped, filtered or cleansed before it can be used to produce accurate and meaningful analytics. As the size and complexity of the data grows, and reporting needs become more advanced, the problem only gets worse.

The lack of "report-ready" data explains why data analysts, on average, spend 80% of their time preparing and organizing data. Anyone who has attempted to take raw data and calculate a loss ratio by accident year, or just about any advanced insurance analytics, will immediately understand the nature of the problem.
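For a sense of what that preparation involves, here is a hedged sketch of a loss ratio by accident year computed from raw extracts. The field names are invented, and the earned-premium alignment is deliberately simplified:

```python
import pandas as pd

# Hypothetical raw extracts from a policy system and a claims system.
premiums = pd.DataFrame({
    "policy_id": ["P1", "P2", "P3"],
    "earned_premium": [12000.0, 8000.0, 15000.0],
    "policy_year": [2021, 2021, 2022],
})
claims = pd.DataFrame({
    "policy_id": ["P1", "P1", "P3"],
    "loss_date": pd.to_datetime(["2021-03-02", "2021-11-15", "2022-07-09"]),
    "incurred_loss": [4000.0, 2500.0, 9000.0],
})

# Accident year = the year the loss occurred, regardless of when reported.
claims["accident_year"] = claims["loss_date"].dt.year

losses = claims.groupby("accident_year")["incurred_loss"].sum()
earned = premiums.groupby("policy_year")["earned_premium"].sum()

# Simplification: treat premium earned in a calendar year as the exposure
# backing that accident year. Division aligns the two series on year.
loss_ratio = losses / earned   # 2021: 6500/20000, 2022: 9000/15000
print(loss_ratio)
```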

To learn what data visualization tools can and can’t do…

Can Reporting Tools Alone Solve your Reporting Challenges?

Given the complexity of transforming data into advanced analytics, and the associated expense, choosing the best approach is critically important. At LeapFrogBI we specialize in the centralized approach, often referred to as data warehousing and business intelligence. This approach is ideal for producing dashboards and reports with KPIs that are needed on an ongoing basis.

How business intelligence will transform your company

There are many things to like about the centralized approach to advanced insurance analytics. It is the most cost-efficient approach, and it places advanced analytics and self-service reporting capabilities in the hands of the employees making decisions. This results in a data-driven culture, deeper engagement and increased strategic decision making. There are a variety of other benefits, some technical and some business related.

The 7 Advantages of a Data Warehouse for Analytics and Reporting

Empowers everyone throughout the organization with the information they need to make correct decisions

Enables self-service analytics for diving deeper into the data

Illuminates data and business process issues that need improving

Sharing curated, validated data creates alignment across the organization

Data from all sources are integrated to provide a 360-degree view of the business

Reduces dependence on source systems and provides data retention

More cost-efficient than distributed approach to analytics

By contrast, the decentralized approach requires data transformation and other data science expertise throughout the organization. This comes at a huge cost, and often results in duplication of effort and different answers to the same questions. Of course, there are types of data analysis that the data warehouse can’t support, specifically predictive model development, segmentation model development and other types of extensive statistical analysis. These are perfect applications for point-solution data science, but it must be understood that distributed analytics is not a substitute for a centralized, governed data solution like a data warehouse when it comes to day-to-day reporting.

To better understand the data warehouse versus the data lake…

What You Should Know Before You Go Sailing on the Data Lake

What advanced insurance analytics look like in practice

Every insurance company will have a unique set of metrics to track based on the type of company and its unique focus. What follows are a few examples of insurance business intelligence KPIs you might see on an insurance dashboard. These particular metrics are related to production, underwriting, claims and other common insurance data.

Insurance KPIs Made Possible by a Data Warehouse and Business Intelligence

  • Written Premium – The amount of premium written by policy effective date. Usually reported alongside forecasted written premium for the period. May be further segmented into new policies versus renewals, divided by line of business, etc.
  • Earned Premium – The amount of premium for the elapsed period of each policy, calculated using the desired logic and business rules (a simple pro-rata sketch follows this list). Typically reported based on policy effective date and may be further segmented by coverage, line, program, risk state and other relevant dimensions.
  • Commissions – The amount of money earned or paid as commissions by date. May be further segmented by coverage, line, program, employee/partner, etc.
  • Distribution Performance – Policy counts, written premium and commissions by distribution channel partner by date.
  • Rate Adequacy – Comparison of bound policy premiums to technical premiums by policy effective date. May be further segmented by coverage, line, program, underwriter, etc.
  • Cycle Time – The number of days it takes a policy to progress from a submission to a bound policy. May be reported by period, underwriter, line, program, etc.
  • Hit Ratios – The ratio of bound policies to submissions and bound policies to quotes. May be reported by time, line, program, underwriter, etc.
  • Underwriting Process Efficiency – A visual depiction of policy flow showing status changes from submission to bound by quantity. (see an example)
  • Loss Ratio – The ratio of incurred loss expense to earned premium, by program year, accident year and calendar year.
  • Incurred and Reserve Amounts – by calendar year and by accident year, by program, line, etc.
  • Policy Counts – New, renewal and endorsement transaction counts by time by program, line, coverage type, market segment, risk state, etc.
  • Policy Renewal Retention Rate – Percent of expiring policies renewed, by period, cohort, program, line, etc.

Want to see and explore a sample insurance dashboard and reports?

Fully Interactive Insurance Dashboard

Your preferred KPIs may look different, and the development process allows for that. When designing a data warehouse, each business unit and its stakeholders have the opportunity to participate by defining reporting requirements that dictate the data model and data transformation rules. You will also have the opportunity to define hierarchies in the data which will enable users to drill down to the most granular transaction details or roll results up to custom groupings you’ve defined.

For many people the concept of a data warehouse and business intelligence can be confusing. These terms have been used by different people to mean different things, so it is helpful to break this process down to basic concepts.

Every data problem requires a data solution

In the simplest terms, the data needed to generate advanced insurance analytics exists, but isn’t report-ready. To make it report-ready we write and automate processes that re-engineer the data nightly, or more frequently. The transformed data then becomes the new data source, and it is consumed using standard reporting and data visualization software.


As stated by Kerry Small of Vodafone in HBR’s recent The New Decision Makers report, “It all starts with the data” and investing in a centralized data solution with a single record for everything “was the right thing to do.”

This data engineering process and its various components can take many forms. Choosing the best data solution depends on the number of data sources, data quantity, data quality, the complexity of reporting requirements and the existing data engineering and reporting infrastructure, if any.

Eventually, most insurance companies grow to the point where the benefits of a data warehouse outweigh the cost. The data warehouse has many advantages over other available data solutions, and chief among them is the dimensional data model.

The dimensional model is a unique data structure that was invented specifically to facilitate advanced reporting. When properly implemented, it allows non-technical users to ask very complex questions of the data with ease and receive answers in a fraction of a second. This concept of enabling self-service analytics is so powerful that we refer to it as the key to analytics that everyone should understand.
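As a toy illustration of why this works (a hypothetical star schema, not LeapFrogBI code), a fact table of premium transactions surrounded by small dimension tables turns a question like "written premium by line of business and risk state" into two joins and a group-by:

```python
import pandas as pd

# Star schema: one fact table of premium transactions...
fact_premium = pd.DataFrame({
    "policy_key": [1, 2, 3, 4],
    "line_key": [10, 10, 20, 20],
    "state_key": [100, 200, 100, 200],
    "written_premium": [1200.0, 900.0, 3000.0, 2200.0],
})
# ...surrounded by descriptive dimension tables.
dim_line = pd.DataFrame({"line_key": [10, 20],
                         "line_of_business": ["Auto", "Property"]})
dim_state = pd.DataFrame({"state_key": [100, 200],
                          "risk_state": ["TX", "CA"]})

# Written premium by line of business and risk state.
report = (fact_premium
          .merge(dim_line, on="line_key")
          .merge(dim_state, on="state_key")
          .groupby(["line_of_business", "risk_state"])["written_premium"]
          .sum())
print(report)
```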

For a brief overview of available data solutions…

Choosing the Best Data Solution for Your Analytics Needs

Implementing an insurance data warehouse can be difficult. The overall failure rate for data warehouse projects is historically very high, and the costs are significant. There are important steps you can take to reduce, or even eliminate, these risks.

7 Sure-Fire Ways to Improve Data Warehouse Outcomes

  • Commit to analytics as a business function
  • Involve data warehouse experts early
  • Include a persistent staging area (PSA)
  • Use well-defined design patterns and code standards
  • Load less data, not more
  • Utilize agile instead of waterfall
  • Leverage data warehouse automation

In a competitive and rapidly evolving market, insurance companies that successfully enable fact-based decision making across the organization have a powerful advantage. But deciding to be great at analytics, and making it so, are two different things. Some of the work must be done in-house, such as developing a culture of information, investing in data governance and recognizing analytics as the critical business function that it is. Much of the work, however, is usually best left to expert partners with the specialized skills and tools necessary to succeed.


IBM Insurance Information Warehouse

Sound infrastructure techniques, data management methods, rich functional content and an implementation roadmap help to reduce data warehouse development costs and minimize project risks.

The solution helps address regulatory compliance issues associated with reporting by providing the right level of data granularity. Examples include the GDPR, the CCPA and Solvency II.

Consolidating financial and actuarial data gives you more control, while reducing the time it takes to scope requirements, perform subsequent customization and carry out data warehouse extensions.

With no modeling or abstraction involved, business terms define in plain language the concepts used in the insurance industry. Clearly defined business terms help support standardization and communication within an organization. Mapping to the data models makes it possible to create a common, enterprise-wide picture of the data requirements and transform IT data structures based on those requirements.

The dimensional warehouse model provides the data design support needed to transform enterprise-level business requirements into efficient, business-specific structures dedicated to the design of a dimensional data repository. The comprehensive logical data models contain the predefined data warehouse structures required to store all financial services data in an efficient layout for analytics.

Supportive content captures non-reporting requirements in a particular domain and relates them to the data warehouse model entities, relationships and attributes. It provides a method of mapping both external and internal terms, from business standards and other requirements to business terms and atomic and dimensional warehouse models.

The IBM solution provides an industry-specific vocabulary that can help discover and govern privacy data. It includes KPI templates for regulatory reporting and a hierarchy of General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) terminology. The glossary and underlying data warehouse models help organizations ensure that their enterprise data architecture is able to provide the necessary data artifacts to report on data protection issues.

Analytical requirements reflect the most common queries and analyses for business performance measurement and reporting, while supporting other analytical functions such as ad hoc reporting and decision support. Over 140 predefined business reporting requirement templates are provided, addressing the common business reporting and analysis requests from risk, finance, compliance, CRM and line-of-business users.

IBM Insurance Information Warehouse provides the necessary modeling tools and support for requirements gathering to accelerate Solvency II implementations and build a flexible, fit-for-purpose risk management warehouse. The models make up a flexible, scalable solution and provide a unified view of critical business data for risk management. Coverage for Solvency II includes support for asset management; balance sheet; premiums, claims and expenses; and reinsurance, life and non-life technical provisions. Aligned to data point model 2.3.0.

This is the first point where various business requirements are brought together and modeled in an entity-relationship format. This component includes common design constructs that can be transformed into separate models for dedicated purposes, such as operational data stores, warehouses and data marts. Designed for the insurance industry, the business data model contains thousands of business definitions and provides an enterprise-wide view of data common to all insurers.

The atomic warehouse model is a logical model consisting of the data structures typically needed by an insurer for a data warehouse. The comprehensive logical data models contain the predefined data warehouse structures required to store all financial services data in an efficient layout for historical and atomic data.

Accelerates the design of an enterprise data warehouse or business intelligence solution, based on financial services business requirements.

Predefined energy and utilities-specific vocabularies, KPIs and data structures, which can help accelerate enterprise governance and analytics projects.

Predefined healthcare-specific vocabularies, KPIs and data structures, which can help accelerate enterprise governance and analytics projects.


Case Study | Multinational Insurance Company

Automation of the technical closure.

Data warehouse and reporting to reduce operating costs, minimise manual intervention and reduce operational risk


Optimise the management of internal processes relating to the monthly technical closure, reducing work time and eliminating manual Excel files. Optimise data quality and ensure data integrity.

  • Incorporation of multiple data sources.
  • Standardisation.
  • Design of the data warehouse and processes.
  • Design of the analytical model / semantic layer.
  • Creation of a reporting portal in SSRS that includes all the technical closure options.
  • Standardised and unique data.
  • Reduction of recurring incidents.
  • Reduction of closing time from one month to one day: data is constantly updated and closing is performed on demand, with all calculation processes automated.




Open Access

Good practices for clinical data warehouse implementation: A case study in France

* E-mail: [email protected]

Affiliations Mission Data, Haute Autorité de Santé, Saint-Denis, France, Inria, Soda team, Palaiseau, France


Affiliation Mission Data, Haute Autorité de Santé, Saint-Denis, France

Affiliations Univ. Lille, CHU Lille, ULR 2694—METRICS: Évaluation des Technologies de santé et des Pratiques médicales, Lille, France, Fédération régionale de recherche en psychiatrie et santé mentale (F2RSM Psy), Hauts-de-France, Saint-André-Lez-Lille, France

Affiliation Sorbonne Université, Inserm, Université Sorbonne Paris-Nord, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-Santé, LIMICS, France

  • Matthieu Doutreligne, 
  • Adeline Degremont, 
  • Pierre-Alain Jachiet, 
  • Antoine Lamer, 
  • Xavier Tannier

PLOS

Published: July 6, 2023

  • https://doi.org/10.1371/journal.pdig.0000298

29 Sep 2023: Doutreligne M, Degremont A, Jachiet PA, Lamer A, Tannier X (2023) Correction: Good practices for clinical data warehouse implementation: A case study in France. PLOS Digital Health 2(9): e0000369. https://doi.org/10.1371/journal.pdig.0000369 View correction


Real-world data (RWD) holds great promise to improve the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon the national case study of the 32 French regional and university hospitals, we highlight key aspects of modern clinical data warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes. Semi-structured interviews, as well as a review of reported studies on French CDWs, were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schemas, and development of data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the studies and of the data transformation tools must improve to allow successful multicentric data reuse as well as innovations in routine care.

Author summary

Reusing routine care data does not come free of charge. Attention must be paid to the entire life cycle of the data to create robust knowledge and develop innovation. Building upon the first overview of CDWs in France, we document key aspects of the collection and organization of routine care data into homogeneous databases: governance, transparency, types of data, main objectives of data reuse, technical tools, documentation, and data quality control processes. The landscape of CDWs in France dates from 2011 and accelerated in late 2020, showing a progressive but still incomplete homogenization. National and European projects are emerging, supporting local initiatives in standardization, methodological work, and tooling. From this sample of CDWs, we draw general recommendations aimed at consolidating the potential of routine care data to improve healthcare. Particular attention must be paid to the sustainability of the warehouse teams and to multilevel governance. The transparency of the data transformation tools and studies must improve to allow successful multicentric data reuse as well as innovations for the patient.

Citation: Doutreligne M, Degremont A, Jachiet P-A, Lamer A, Tannier X (2023) Good practices for clinical data warehouse implementation: A case study in France. PLOS Digit Health 2(7): e0000298. https://doi.org/10.1371/journal.pdig.0000298

Editor: Dukyong Yoon, Yonsei University College of Medicine, REPUBLIC OF KOREA

Copyright: © 2023 Doutreligne et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: MD, AD, PAJ salaries were funded by the French Haute Autorité de Santé (HAS). XT received funding to participate in the interviews and in the writing of the article. AL received no funding for this study. The funders validated the study's original idea and the study conclusions. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: The first author did a (non-paid) visit to Leo Anthony Celi's lab during the first semester of 2023.

Introduction

Real-world data.

Health information systems (HIS) are increasingly collecting routine care data [1–7]. This source of real-world data (RWD) [8] holds great promise to improve the quality of care. On the one hand, the use of this data translates into direct benefits (primary uses) for the patient by serving as the cornerstone of developing personalized medicine [9, 10]. On the other hand, it brings indirect benefits (secondary uses) by accelerating and improving knowledge production: on pathologies [11], on the conditions of use of health products and technologies [12, 13], and on measures of their safety [14], efficacy or usefulness in everyday practice [15]. It can also be used to assess the organizational impact of health products and technologies [16, 17].

In recent years, health agencies in many countries have conducted extensive work to better support the generation and use of real-life data [8, 17–19]. Study programs have been launched by regulatory agencies: the DARWIN EU program by the European Medicines Agency and the Real World Evidence Program by the Food and Drug Administration [20].

Clinical data warehouse

In practice, the possibility of mobilizing these routinely collected data depends very much on their degree of concentration, on a gradient that goes from centralization in a single, homogeneous HIS to fragmentation across a multitude of HIS with heterogeneous formats. The structure of the HIS reflects the governance structure; thus, the ease of working with these data depends heavily on the organization of the healthcare actors. The 2 main sources of RWD are insurance claims (more centralized) and clinical data (more fragmented).

Claims data is often collected by national agencies into centralized repositories. In South Korea, the government agency responsible for healthcare system performance and quality (HIRA) is connected to the HIS of all healthcare stakeholders; HIRA data consists of national insurance claims [21]. England has a centralized healthcare system under the National Health Service (NHS). Although it lacks detailed clinical data, the NHS was able to merge claims data with detailed data from 2 large urban medicine databases, corresponding to the 2 major software publishers [22]. This data is currently accessed through Opensafely, a platform initially focused on Coronavirus Disease 2019 (COVID-19) research [23]. In the United States, even if scattered between different insurance providers, claims are pooled into large databases such as Medicare, Medicaid, or IBM MarketScan. Lastly, in Germany, the distinct federal claims have been centralized only very recently [24].

Clinical data, on the other hand, tends to be distributed among many entities that made different choices, without common management or interoperability. But large institutional data-sharing networks are beginning to emerge. South Korea very recently launched an initiative to build a nationwide data network focused on intensive care. The United States is building Chorus4ai, an analysis platform pooling data from 14 university hospitals [25]. To unlock the potential of clinical data, the German Medical Informatics Initiative [26] created 4 consortia in 2018. They aim at developing technical and organizational solutions to improve the consistency of clinical data.

Israel stands out as one of the rare countries that pooled together both claims and clinical data at a large scale: half of the population depends on 1 single healthcare provider and insurer [27].

An infrastructure is needed to pool data from 1 or more medical information systems, whatever the organizational framework, into homogeneous formats for management, research, or care reuse [28, 29]. Fig 1 illustrates, for a CDW, the 4 phases of data flow from the various sources that make up the HIS:

  • Collection and copying of original sources.
  • Integration of the sources into a unique database.
  • Deduplication of identifiers.
  • Standardization: a unique data model, independent of the software models, harmonizes the different sources in a common schema, possibly with common nomenclatures.
  • Pseudonymization: removal of directly identifying elements (a toy sketch of this step follows the list).
  • Provision of subpopulation data sets and transformed datamarts for primary and secondary reuse.
  • Usage via dedicated applications and tools accessing the datamarts and data sets.
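
To make the pseudonymization step concrete, here is a minimal sketch in Python, assuming records arrive as dictionaries; the field names, the hard-coded salt, and the truncation are invented for illustration and do not describe any specific CDW's implementation.

```python
import hashlib

# Toy pseudonymization: replace directly identifying fields with a salted
# one-way hash, so records can still be linked across sources without
# exposing identity. Field names and the hard-coded salt are illustrative
# assumptions; real CDWs rely on audited key management.
SALT = b"salt-kept-outside-the-warehouse"
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}

def pseudonymize(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            out[field + "_pseudo"] = digest[:16]  # truncated for readability
        else:
            out[field] = value  # clinical content passes through unchanged
    return out

print(pseudonymize({"name": "Jane Doe", "national_id": "1234", "diagnosis": "I10"}))
```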

In France, the national insurer collects all hospital activity and city care claims into a unique reimbursement database [13]. However, clinical data is historically scattered across numerous HIS at each care site. Several hospitals have been working for about 10 years to create CDWs from electronic medical records [30–39]. This work has accelerated recently, with the beginning of CDW structuring at the regional and national levels. Regional cooperation networks are being set up, such as the Ouest Data Hub [40]. In July 2022, the Ministry of Health opened a 50-million-euro call for projects to set up and strengthen a network of hospital CDWs coordinated with the national platform, the Health Data Hub, by 2025.


CDW: 4 phases of data flow from the Hospital Information System: (1) collection, (2) transformations, (3) provisioning, and (4) usage. CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g001

Based on an overview of university hospital CDWs in France, this study makes general recommendations for properly leveraging the potential of CDWs to improve healthcare. It focuses on: governance, transparency, types of data, data reuse, technical tools, documentation, and data quality control processes.

Material and methods

Interviews were conducted from March to November 2022 with 32 French regional and university hospitals, both with existing and prospective CDWs.

Ethics statement

This work has been authorized by the board of the French High Authority of Health (HAS). Every interviewed participant was asked by email for their participation and informed of the possible forms of publication: a French official report and an international publication. Furthermore, at each interview, every participant was asked for their agreement before recording the interview. Only 1 participant declined to be recorded.

Semi-structured interviews were conducted on the following themes: the initiation and construction of the CDWs, the current status of the project and the studies carried out, opportunities and obstacles, and quality criteria for observational research. S1 Table lists all interviewed people with their team titles. The complete form, with the precise questions, is available in S2 Table.

The interview form was sent to participants in advance and then used as a support to conduct the interviews. The interviews lasted 90 min and were recorded for reference.

Quantitative methods

Three tables in S1 Text detail the structured answers. The first 2 tables deal with the characteristics of the actors and those of the data warehouses. We completed them based on the notes taken during the interviews, the recordings, and by asking the participants for additional information. The third table focuses on ongoing studies in the CDWs. We collected the list of these studies from the dedicated reporting portals, which we found for 8 out of 14 operational CDWs. We developed a classification of studies based on the typology of retrospective studies described by the OHDSI research network [41]. We enriched this typology by comparing it with the collected studies, resulting in the 6 following categories:

  • Outcome frequency: Incidence or prevalence estimation for a medically well-defined target population.
  • Population characterization: Characterization of a specific set of covariates. Feasibility and prescreening studies belong to this category [42].
  • Risk factors: Identification of the covariates most associated with a well-defined clinical target (disease course, care event). These studies look at associations without quantifying the causal effect of the factors on the outcome of interest.
  • Treatment effect: Evaluation of the effect of a well-defined intervention on a specific outcome target. These studies intend to show a causal link between these 2 variables [43].
  • Development of diagnostic and prognostic algorithms: Improve or automate a diagnostic or prognostic process, based on clinical data from a given patient. This can take the form of a risk or preventive score, or the implementation of a diagnostic assistance system. These studies are part of the individualized medicine approach, with the goal of inferring relevant information at the level of the individual patient's file.
  • Medical informatics: Methodological or tool oriented. These studies aim to improve the understanding and capacity for action of researchers and clinicians. They include the evaluation of a decision support tool, the extraction of information from unstructured data, or automatic phenotyping methods.

Studies were classified according to this nomenclature based on their title and description.

Fig 2 summarizes the state of progress of CDWs in France. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The results are described for all projects that are at least at the prospective stage, minus the 3 that we were unable to interview after multiple reminders (Orléans, Metz, and Caen), resulting in a denominator of 21 university hospitals.


Base map and data from OpenStreetMap and OpenStreetMap Foundation. Link to the base layer of the map: https://github.com/mapnik/mapnik . CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g002

Fig 3 shows the history of the implementation of CDWs. A distinction must be made between the first works (in blue), which systematically precede the regulatory authorization (in green) from the French Commission on Information Technology and Liberties (CNIL).


CDW, clinical data warehouse.

https://doi.org/10.1371/journal.pdig.0000298.g003

The CDWs have so far been initiated by 1 or 2 people from the hospital world with an academic background in bioinformatics, medical informatics, or statistics. The sustainability of the CDW is accompanied by the construction of a cooperative environment between different actors: the Medical Information Department (MID), the Information Systems Department (IT), the Clinical Research Department (CRD), clinical users, and the support of the management or the Institutional Medical Committee. It is also accompanied by the creation of a team, or entity, dedicated to the maintenance and implementation of the CDW. More recent initiatives, such as those of the HCL (Hospitals of the city of Lyon) or the Grand-Est region, are distinguished by initial, institutional, high-level support.

The CDW has a federating potential for the different business departments of the hospital with the active participation of the CRD, the IT Department, and the MID. Although there is always an operational CDW team, the human resources allocated to it vary greatly: from half a full-time equivalent to 80 people for the AP-HP, with a median of 6.0 people. The team systematically includes a coordinating physician. It is multidisciplinary with skills in public health, medical informatics, informatics (web service, database, network, infrastructure), data engineering, and statistics.

Historically, the first CDWs were based on in-house solution development. More recently, private actors have been offering their services for the implementation and operation of CDWs (15/21). These services range from technical expertise to build the data flows and clean the data, up to the delivery of a platform integrating the different stages of data processing.

Management of studies

Before starting, projects are systematically analyzed by a scientific and ethical committee. A local submission and follow-up platform is often mentioned (12/21), but its functional scope is not well defined: it ranges from simple authorization of the project to the automatic provision of data into a Trusted Research Environment (TRE) [44]. The processes for starting a new project on the CDW are always communicated internally but rarely documented publicly (8/21).

Transparency

Ongoing studies in CDWs are unevenly referenced publicly on hospital websites. Some institutions have comprehensive study portals, while others list only a dozen studies on their public site while mentioning several hundred ongoing projects during interviews. In total, we found 8 of these portals out of 14 CDWs in production. Uses other than ongoing scientific studies are very rarely documented. The publication of the list of ongoing studies is very heterogeneous and fragmented between several sources: clinicaltrials.gov, the mandatory project portal of the Health Data Hub [45], or the website of the hospital data warehouse.

Strong dependence on the HIS.

CDW data reflect the HIS used on a daily basis by hospital staff. Stakeholders point out that the quality of CDW data and the amount of work required for rapid and efficient reuse are highly dependent on the source HIS. The possibility of accessing data from an HIS in a structured and standardized format greatly simplifies its integration into the CDW and then its reuse.

Categories of data.

Although the software landscape is varied across the country, the main functionalities of HIS are the same. We can therefore conduct an analysis of the content of the CDWs, according to the main categories of common data present in the HIS.

The common base for all CDWs is constituted by data from the Patient Administrative Management software (patient identification, hospital movements) and the billing codes. Then, data flows are progressively developed from the various software packages that make up the HIS. The goal is to build a homogeneous data schema, linking the sources together, controlled by the CDW team. The prioritization of sources is done through thematic projects, which feed the CDW construction process. These projects improve the understanding of the sources involved, by confronting the CDW team with the quality issues present in the data.

Table 1 presents the proportions of data categories integrated in French CDWs. Structured biology results and texts are almost always integrated (20/21 and 20/21). The texts contain a large amount of information; they constitute unstructured data and are therefore more difficult to use than structured tables. Other integrated sources are the hospital drug circuit (prescriptions and administration, 16/21), the Intensive Care Unit (ICU, 2/21), and nurse forms (4/21). Imaging is rarely integrated (4/21), notably for reasons of volume. Genomic data are well identified but never integrated, even though they are sometimes considered important and included in the CDW work program.


https://doi.org/10.1371/journal.pdig.0000298.t001

Data reuse.

Today, the main use put forward for the constitution of CDWs is that of scientific research.

The studies are mainly observational (non-interventional). Fig 4 presents the distribution of the 6 categories defined in Quantitative methods for 231 studies collected on the study portals of 9 hospitals. The studies focus first on population characterization (25%), followed by the development of decision support processes (24%), the study of risk factors (18%), and treatment effect evaluations (16%).


https://doi.org/10.1371/journal.pdig.0000298.g004

The CDWs are used extensively for internal projects such as student theses (at least 9/21) and serve as an infrastructure for single-service research, their great interest being the de-siloing of different information systems. For most of the institutions interviewed, there is still a lack of resources and of maturity of methods and tools for conducting inter-institutional research (such as in the Grand-Ouest region of France) or via European calls for projects (EHDEN). These 2 research networks are made possible by supra-local governance and a common data schema, respectively eHop [46] and OMOP [47]. The Paris hospitals, thanks to their regional coverage and the choice of OMOP, are also well advanced in multicentric research. At the same time, the Grand-Est region is building a network of CDWs based on the model of the Grand-Ouest region, also using eHop.

CDWs are used for monitoring and management (16/21).

CDWs have sometimes been initiated to improve and optimize billing coding (4/21). The clinical texts gathered in the same database are queried using keywords to facilitate the structuring of information. The data are then aggregated into indicators, some of which are reported at the national level. The construction of indicators from clinical data can also be used for the administrative management of the institution. Finally, closer to the clinic, some actors state that the CDW could also be used to provide regular and appropriate feedback to healthcare professionals on their practices. This feedback would help to increase the involvement and interest of healthcare professionals in CDW projects. The CDW is also sometimes of interest for health monitoring (e.g., during COVID-19) or pharmacovigilance (13/21).

Strong interest in CDWs in the context of care (13/21).

Some CDWs develop specific applications that provide new functionalities compared to care software. Search engines can be used to query all the hospital's data gathered in the CDW, without data compartmentalization between different software packages. Dedicated interfaces can then offer a unified view of the history of a patient's data, with inter-specialty transversality, which is particularly valuable in internal medicine. These cross-disciplinary search tools also enable healthcare professionals to conduct rapid searches in all the texts, for example, to find similar patients [32]. Uses for prevention, automation of repetitive tasks, and care coordination are also highlighted. Concrete examples are the automatic sorting of hospital prescriptions by order of complexity or the setting up of specialized channels for primary or secondary prevention.

Technical architecture

The technical architecture of modern CDWs has several layers:

  • Data processing: connection and export of source data, diverse transformation (cleaning, aggregation, filtering, standardization).
  • Data storage: database engines, file storage (on file servers or object storage), indexing engines to optimize certain queries.
  • Data exposure: raw data, APIs, dashboards, development and analysis environments, specific web applications.

Supplementary cross-functional components ensure the efficient and secure operation of the platform: identity and authorization management, activity logging, automated administration of servers and applications.

The analysis environment (JupyterHub or RStudio datalabs) is a key component of the platform, as it allows data to be processed within the CDW infrastructure. A few CDWs had such an operational datalab at the time of our study (6/21), and almost all of them have decided to provide one to researchers. Currently, clinical research teams are still often working on data extractions in less secure environments.

Data quality, standard formats

Quality tools.

Systematic data quality monitoring processes are being built in some CDWs. Often (8/21), scripts are run at regular intervals to detect technical anomalies in data flows. Data quality investigation tools, in the form of dashboards, are still rare but beginning to be developed internally (3/21). Theoretical reflections are underway on the possibility of automating data consistency checks, for example demographic or temporal. Some facilities randomly pull records from the EHR to compare them with the information in the CDW. A loose illustration of such a scheduled check follows.
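
This is a minimal sketch only, assuming a flow freshly loaded into a pandas DataFrame; the thresholds, the event_date column, and the alert wording are our own invented assumptions, not a description of any hospital's actual checks.

```python
import pandas as pd

# Toy scheduled data-quality check: flag technical anomalies in a freshly
# loaded data flow. Thresholds and column names are illustrative only.
def check_flow(df: pd.DataFrame, expected_min_rows: int = 1000) -> list[str]:
    alerts = []
    # Volume check: a sudden drop often signals a broken upstream export.
    if len(df) < expected_min_rows:
        alerts.append(f"row count {len(df)} below expected {expected_min_rows}")
    # Completeness check: flag columns with a high null rate.
    null_rates = df.isna().mean()
    for col, rate in null_rates[null_rates > 0.2].items():
        alerts.append(f"column {col!r} is {rate:.0%} null")
    # Temporal consistency check: no event should be dated in the future.
    if "event_date" in df.columns:
        future = (pd.to_datetime(df["event_date"], errors="coerce")
                  > pd.Timestamp.now()).sum()
        if future:
            alerts.append(f"{future} rows have event_date in the future")
    return alerts
```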

Standard format.

No single standard data model stands out as being used by all CDWs. All are aware of the existence of the OMOP model (a research standard) [47] and of HL7 FHIR (a communication standard) [48]. Several CDWs consider the OMOP model to be a central part of the warehouse, particularly for research purposes (9/21). This tendency has been encouraged by the European call for projects EHDEN, launched by the OHDSI research consortium, the originator of this data model. In the Grand-Ouest region of France, the CDWs use the eHop warehouse software, which uses a common data model also named eHop. This model will be extended with the future warehouse network of the Grand-Est region, which has also chosen this solution. Including this grouping and the other establishments that have chosen eHop, this model covers 12 of the 32 university hospitals, which allows eHop adopters to launch ambitious interregional projects. However, eHop does not define a standard nomenclature to be used in its model and is not aligned with emerging international standards.

Documentation.

Half of the CDWs have put in place documentation, accessible within the organization, on data flows and on the meaning and proper use of qualified data (10/21 mentioned). This documentation is used by the team that develops and maintains the warehouse. It is also used by users to understand the transformations performed on the data. However, it is never publicly available, and no schema of the data once it has been transformed and prepared for analysis is published.

Principal findings

We give the first overview of the CDWs in university hospitals of France, with 32 hospitals reviewed. The implementation of CDWs dates from 2011 and accelerated in late 2020. Today, 24 of the university hospitals have an ongoing CDW project. From this case study, some general considerations can be drawn that should be valuable to all healthcare systems implementing CDWs on a national scale.

As the CDW becomes an essential component of data management in the hospital, the creation of an autonomous internal team dedicated to data architecture, process automation, and data documentation should be encouraged [44]. This multidisciplinary team should develop an excellent knowledge of the data collection process and of potential reuses in order to qualify the different flows coming from the source IS, standardize them towards a homogeneous schema, and harmonize the semantics. It should have a sound knowledge of public health, as well as the technical and statistical skills to develop high-quality software that facilitates data reuse.

The resources specific to the warehouse are scarce and often taken from other budgets or from project-based credits. While this is natural for an initial prototyping phase, it is not suited to the perennial and transversal nature of the tool. As a research infrastructure of growing importance, the CDW must have the financial and organizational means to plan for the long term.

The governance of the CDW has multiple layers: local within the university hospital, interregional, and national/international. The first level ensures the quality of data integration as well as the pertinence of data reuse by clinicians themselves. The interregional level is well suited to resource pooling and collaboration. Finally, the national and international levels assure coordination, encourage consensus on committing choices such as metadata or interoperability, and provide financial, technical, and regulatory support.

Health technology assessment agencies advocate for public registration of comparative observational study protocols before conducting the analysis [8, 17, 49]. They often refer to clinicaltrials.gov as a potential but not ideal registration portal for observational studies. The research community advocates for public registration of all observational studies [50, 51]. More recently, it has emphasized the need for easier data access and the publication of study code [29, 52, 53]. We embrace these recommendations, and we point to the unfortunate duplication of these study reporting systems in France: one source could be favored at the national level and the second one automatically fed from the reference source, by agreeing on common metadata.

From a patient’s perspective, there is currently no way to know if their personal data is included for a specific project. Better patient information about the reuse of their data is needed to build trust over the long term. A strict minimum is the establishment and update of the declarative portals of ongoing studies at each institution.

Data and data usage

When using a CDW, the analyst has not defined the data collection process and is generally unaware of the context in which the information is logged. This new dimension of medical research requires much greater development of data science skills, shifting the focus from the implementation of the statistical design to the data engineering process. Data reuse requires more effort to prepare the data and document the transformations performed.

The more heterogeneous a HIS is, the lower the quality of the CDW built on top of it. There is a need for increasing interoperability to help EHR vendors interface the different hospital software packages, thus facilitating CDW development. One step in this direction would be the open-source publication of HIS data schemas and vocabularies. At the analysis level, international recommendations insist on the need for common data formats [52, 54]. However, there is still a lack of adoption of research standards by hospital CDWs to conduct robust studies across multiple sites. Building open-source tools on top of these standards, such as those of OHDSI [41], could foster their adoption. Finally, in many clinical domains, a sufficient sample size is hard to obtain without international data-sharing collaborations. Thus, more incentives are needed to maintain and update the terminology mappings between local nomenclatures and international standards.

Many ongoing studies concern the development of decision support processes whose goal is to save time for healthcare professionals. These are often research projects, not yet integrated into routine care. The analysis of study portals and the interviews revealed that data reuse oriented towards primary care is still rare and rarely supported by appropriate funding. The translation from research to clinical practice takes time and needs to be supported over the long run to yield substantial results.

Tools, methods, and data formats of CDWs lack harmonization due to strong technical innovation and the presence of many actors. As suggested by the recent report on the use of data for research in the UK [44], it would be wise to focus on a small number of model technical platforms.

These platforms should favor open-source solutions to assure transparency by default, foster collaboration and consensus, and avoid technological lock-in of the hospitals.

Data quality and documentation

Quality is not sufficiently considered a relevant scientific topic in itself. However, it is the backbone of all research done within a CDW. In order to improve the quality of the data with respect to research uses, it is necessary to conduct continuous studies dedicated to this topic [52, 54–56]. These studies should contribute to a reflection on methodologies and standard tools for data quality, such as those developed by the OHDSI research network [41].

Finally, there is a need for open-source publication of research code to ensure quality retrospective research [55, 57]. Recent research in data analysis has shown that innumerable biases can lurk in training data sets [58, 59]. Open publication of data schemas is considered an indispensable prerequisite for all data science and artificial intelligence uses [58]. Inspired by data set cards [58] and data set publication guides, it would be interesting to define a standard CDW card documenting the main data flows.

Limitations

The interviews were conducted in a semi-structured manner within a limited time frame. As a result, some topics were covered more quickly, and only those explicitly mentioned by the participants could be recorded. The uneven existence of study portals introduces a bias in the recording of the types of studies conducted on CDWs: those with a transparency portal already have more maturity in their use cases.

For clarity, our results are focused on the perimeter of university hospitals. We have not covered the exhaustive healthcare landscape in France. CDW initiatives also exist in primary care, in smaller hospital groups and in private companies.

Conclusions

The French CDW ecosystem is beginning to take shape, benefiting from an acceleration thanks to national funding, the multiplication of industrial players specializing in health data, and the beginning of a supra-national reflection on the European Health Data Space [60]. However, some points require special attention to ensure that the potential of the CDW translates into patient benefits.

The priority is the creation and perpetuation of multidisciplinary warehouse teams capable of operating the CDW and supporting the various projects. A combination of public health, data engineering, data stewardship, statistics, and IT competences is a prerequisite for the success of the CDW. The team should be the privileged point of contact for data exploitation issues and should collaborate closely with the existing hospital departments.

The constitution of a multilevel collaboration network is another priority. The local level is essential to structure the data and understand its possible uses. Interregional, national, and international coordination would make it possible to create thematic working groups in order to stimulate a dynamic of cooperation and mutualization.

A common data model should be encouraged, with precise metadata allowing the integrated data to be mapped, in order to qualify the uses to be developed from the CDWs. More broadly, open-source documentation of the data flows and of the transformations performed for quality enhancement would require more incentives, to unleash the potential for innovation for all health data reusers.

Finally, the question of expanding the scope of the data beyond the purely hospital domain must be asked. Many risk factors and patient follow-up data are missing from the CDWs, but are crucial for understanding pathologies. Combining city data and hospital data would provide a complete view of patient care.

Supporting information

S1 Table. List of interviewed stakeholders with their teams.

https://doi.org/10.1371/journal.pdig.0000298.s001

S2 Table. Interview form.

https://doi.org/10.1371/journal.pdig.0000298.s002

S1 Text. Study data tables.

https://doi.org/10.1371/journal.pdig.0000298.s003

Acknowledgments

We want to thank all participants and experts interviewed for this study. We also want to thank the other people who proofread the manuscript for external review: Judith Fernandez (HAS), Pierre Liot (HAS), Bastien Guerry (Etalab), Aude-Marie Lalanne Berdouticq (Institut Santé numérique en Société), Albane Miron de L'Espinay (ministère de la Santé et de la Prévention), and Caroline Aguado (ministère de la Santé et de la Prévention). We also thank Gaël Varoquaux for his support and advice.



Aspire Systems


Building a Data Warehouse and its benefits to the Insurance Industry


“As of 2022, over 60% of all corporate data is stored in the cloud. This is up from 30% in 2015” 

A data warehouse is an essential tool for businesses looking to make better use of their data. In the insurance industry, a cloud data warehouse can help organizations store and analyse vast amounts of data to gain insights into their operations, customers, and markets.

Steps to implement data warehouse tools in your organisation:  

  • Define your data requirements : 

The first step in implementing a data warehouse is to identify the data that needs to be stored and analysed. In the insurance industry, this could include data on policyholders, claims, premiums, underwriting, and risk assessment. 

  • Choose an appropriate data warehouse platform : 

There are many data warehouse tools available, and each has its strengths and weaknesses. Insurance companies should evaluate different options based on their specific needs, such as scalability, security, and cost. 

  • Build the data warehouse : 

Once the desired cloud data warehouse platform is chosen, the next step is to build the data warehouse. This involves creating a database schema and data structures that can efficiently store and retrieve data (a minimal end-to-end sketch follows this list). 

  • Extract and load the data : 

The next step is to extract data from various sources, such as legacy systems, spreadsheets, and databases, and load it into the data warehouse. This can be a complex process that requires careful planning and execution. 

  • Analyse the data : 

With the data warehouse in place, insurance companies can begin to analyse their data. This could include building dashboards and reports to visualize data, creating predictive models to identify trends and patterns, and performing ad-hoc analysis to answer specific questions. 

  • Make data-driven decisions : 

The ultimate goal of building a data warehouse is to help companies make better decisions. In the insurance industry, this could involve identifying new market opportunities, improving risk assessment, and enhancing customer satisfaction. 
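
As a rough illustration of steps 3 to 5 above, the sketch below builds a schema, loads a CSV extract, and runs a simple analysis using only the Python standard library. The file name claims_extract.csv, the table layout, and the column names are invented for illustration; a real implementation would target the chosen warehouse platform and ETL tooling rather than SQLite.

```python
import csv
import sqlite3

# Toy warehouse build (step 3): a single claims table. A production design
# would use a dimensional schema on the chosen cloud platform.
con = sqlite3.connect("insurance_dw.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS claims (
        claim_id TEXT, policy_id TEXT, claim_date TEXT,
        line_of_business TEXT, amount REAL
    )
""")

# Extract and load (step 4): pull rows from a hypothetical source extract.
with open("claims_extract.csv", newline="") as f:
    rows = [(r["claim_id"], r["policy_id"], r["claim_date"],
             r["line_of_business"], float(r["amount"]))
            for r in csv.DictReader(f)]
con.executemany("INSERT INTO claims VALUES (?, ?, ?, ?, ?)", rows)
con.commit()

# Analyse (step 5): monthly claim totals per line of business.
for row in con.execute("""
    SELECT substr(claim_date, 1, 7) AS month, line_of_business,
           COUNT(*) AS n_claims, SUM(amount) AS total_claimed
    FROM claims
    GROUP BY month, line_of_business
    ORDER BY month
"""):
    print(row)
```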

The following are the key steps that insurers should take when implementing data warehousing in the insurance industry:  

Understand the data needs of the business:  

Before embarking on a data warehousing project, it is crucial to understand the specific data requirements of the business. This involves understanding existing business processes and identifying any new or existing data sources that could be leveraged for improved decision-making. 

Choose the right platform:  

Insurers need to decide which platform is best suited for their specific needs, considering factors such as scalability, performance, cost, and integration capabilities. 

Develop an efficient data architecture:  

Insurers must design an efficient data architecture that ensures the security, availability, and scalability of the warehouse. This includes developing a standard data model and setting up processes for data ingestion and storage. 

Implement data governance policies:  

Data governance is essential to ensure the accuracy and integrity of the data stored in the warehouse. This involves setting policies related to data security, privacy, and quality. 

Leverage analytics:  

Once the data is stored in the warehouse, insurers can leverage analytics tools to gain insights into customer behaviour, market trends, and financial performance. 

Implementing an effective data warehouse requires careful planning and consideration. However, when done right, it can bring many benefits such as improved customer service, increased operational efficiency, reduced costs, and better decision-making. 

Benefits of data warehouse implementation in Insurance sector:  

  • Improved Data Integration : 

Data warehousing allows the integration of data from various sources, such as policy administration systems, claims systems, underwriting systems, and financial systems. By integrating data from these different sources, insurers can gain a holistic view of their business, identify trends and patterns, and make better-informed decisions. 

  • Enhanced Reporting and Analytics:  

With data warehousing, insurers can perform advanced analytics and reporting on their data. This can help insurers to identify new business opportunities, optimize their underwriting processes, improve their claims management, and detect fraud. 

  • Better Risk Management : 

Data warehousing can help insurers to identify and manage risks more effectively. By analysing historical data, insurers can identify patterns of risky behaviour and take proactive measures to mitigate potential losses. 

  • Improved Customer Insights : 

With data warehousing, insurers can gain a deeper understanding of their customers. By analysing customer data, insurers can identify customer preferences, behaviours, and needs, and develop more targeted marketing and customer service strategies. 

  • Increased Operational Efficiency : 

Data warehousing can help insurers to streamline their operations by reducing the time and effort required to access and analyse data. By centralizing data in a single location, insurers can also reduce the risk of data errors and inconsistencies. 

Data warehouse: a positive disruption in the realm of insurance business 

There is little question that the world is witnessing a remarkable revolution in technology and industry. Now that you know how to put data warehousing to good use, you can give a new lease of life to the business data that has gone unnoticed, hidden across your organization. Implementing an effective data warehouse requires careful planning and consideration. However, when done right with the right technological partner such as Aspire Systems, the implementation brings a plethora of benefits, such as improved customer service, increased operational efficiency, reduced costs, and better decision making. For insurers looking to get the most out of their data, investing in a robust data warehouse system is a worthwhile endeavour. 


Kimball Group

  • Data Warehouse Insurance
  • By Ralph Kimball
  • December 1, 1995


Insurance is an important and growing sector for the data warehousing market. Several factors have come together in the last year or two to make data warehouses for large insurance companies both possible and extremely necessary. Insurance companies generate several complicated transaction types that must be analyzed in many different ways. Until recently, it wasn’t practical to consider storing hundreds of millions — or even billions — of transactions for online access. With the advent of powerful SMP and MPP Unix processors and powerful database query software, these big, complicated databases have begun to enter the comfort zone for data warehousing. At the same time, the insurance industry is under incredible pressure to reduce costs. Costs in this business come almost entirely from claims or “losses,” as the insurance industry more accurately describes them.

The design of a big insurance data warehouse must deal with several issues common to all insurance companies. This month, I use InsureCo as a case study to illustrate these issues and show how to resolve them in a data warehouse environment. InsureCo is the pseudonym of a major insurance company that offers automobile, homeowner’s, and personal property insurance to about two million customers. InsureCo has annual revenues of more than $2 billion. My company designed InsureCo’s corporate data warehouse for analyzing all claims across all its lines of business, with history in some cases stretching back more than 15 years.

The first step at InsureCo was to spend two weeks interviewing prospective end users in claims analysis, claims processing, field operations, fraud and security management, finance, and marketing. We talked to more than 50 users, ranging from individual contributors to senior management. From each group of users we elicited descriptions of what they did in a typical day, how they measured the success of what they did, and how they thought they could understand their businesses better. We did not ask them what they wanted in a computerized database. It was our job to design, not theirs.

From these interviews we found three major themes that profoundly affected our design. First, to understand their claims in detail, the users needed to see every possible transaction. This precluded presenting summary data only. Many end-user analyses required the slicing and dicing of the huge pool of transactions.

Second, the users needed to view the business in monthly intervals. Claims needed to be grouped by month and compared at month’s end to other months of the same year, or to months in previous years. This conflicted with the need to store every transaction, because it was impractical to roll up complex sequences of transactions just to get monthly premiums and monthly claims payments. Third, we needed to deal with the heterogeneous nature of InsureCo’s lines of business. The facts recorded for an automobile accident claim are different from those recorded for a homeowner’s fire loss claim or for a burglary claim.

These data conflicts arise in many different industries, and are familiar themes for data warehouse designers. The conflict between the detailed transaction view and the monthly snapshot view almost always requires that you build both kinds of tables in the data warehouse. We call these the transaction views and monthly snapshot views of a business. Note that we are not referring to SQL views here, but to physical tables. The need to analyze the entire business across all products (lines of business in InsureCo’s case) versus the need to analyze a specific product with unique measures is called the “heterogeneous products” problem. At InsureCo, we first tackled the transaction and monthly snapshot views of the business by carefully dimensionalizing the base-level claims processing transactions. Every claims processing transaction was able to fit into the star join schema.

This structure is characteristic of transaction-level data warehouse schemas. The central transaction-level fact table consists almost entirely of keys. Transaction fact tables typically have only one additive fact, which we call Amount. The interpretation of the Amount field depends on the transaction type, which is identified in the transaction dimension. The Time dimension is actually two instances of the same dimension table connecting to the fact table to provide independent constraints on the Transaction Date and the Effective Date.
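
To make this shape concrete, here is a minimal sketch in Python/SQLite; the table and column names are hypothetical, not InsureCo's actual design. Note the fact table made almost entirely of keys, the single additive amount fact, and the one date dimension aliased into two roles.

```python
import sqlite3

# Minimal sketch of a transaction-level star schema (hypothetical names).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT, month INTEGER, year INTEGER
);
CREATE TABLE transaction_dim (
    transaction_key INTEGER PRIMARY KEY,
    transaction_type TEXT            -- gives the Amount fact its meaning
);
CREATE TABLE claims_transaction_fact (
    transaction_date_key INTEGER REFERENCES date_dim(date_key),
    effective_date_key   INTEGER REFERENCES date_dim(date_key),
    claim_key            INTEGER,
    claimant_key         INTEGER,
    employee_key         INTEGER,
    third_party_key      INTEGER,
    transaction_key      INTEGER REFERENCES transaction_dim(transaction_key),
    amount               REAL       -- the single additive fact
);
""")

# Role-playing dates: alias date_dim twice to constrain Transaction Date
# and Effective Date independently.
rows = con.execute("""
SELECT t.transaction_type, SUM(f.amount)
FROM claims_transaction_fact f
JOIN date_dim td ON f.transaction_date_key = td.date_key
JOIN date_dim ed ON f.effective_date_key   = ed.date_key
JOIN transaction_dim t ON f.transaction_key = t.transaction_key
WHERE td.year = 1995 AND ed.month = 6
GROUP BY t.transaction_type
""").fetchall()
```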

This transaction-level star join schema provided an extremely powerful way for InsureCo to analyze claims. The number of claimants, the timing of claims, the timing of payments made, and the involvement of third parties, such as witnesses and lawyers, were all easily derived from this view of the data. Strangely enough, it was somewhat difficult to derive “claim-to-date” measures, such as monthly snapshots, because of the need to crawl through every detailed transaction from the beginning of history. The solution was to add to InsureCo’s data warehouse a monthly snapshot version of the data. The monthly snapshot removed some of the dimensions, while adding more facts.

The grain of this monthly snapshot fact table was the monthly activity of each claimant’s claim against InsureCo’s insured party. Several of the transaction schema dimensions were suppressed in this monthly snapshot, including Effective Date, Employee, Third Party, and Transaction Type. However, it was important to add a Status dimension to the monthly snapshot so that InsureCo could quickly find all open, closed, and reopened claims. The list of additive, numeric facts was expanded to include several useful measures. These include the amount of the reserve set aside to pay for a claim, amounts paid and received during the month, and an overall count of the transaction activity for this claim. This monthly snapshot schema was extremely useful at InsureCo as a way to rapidly analyze the month-to-month changes in claims and exposure to loss. Monthly snapshot tables were very flexible because interesting summaries could be added as facts, almost at will. Of course, we could never add enough summary buckets to do away with the need for the transaction schema itself. There are hundreds of detailed measures, representing combinations and counts and timings of interesting transactions, all of which would be suppressed if we didn’t preserve the detailed transaction history.
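
Continuing the toy schema from the previous sketch (same connection), a hedged sketch of the monthly snapshot and its periodic load might look like the following; the status and reserve logic is deliberately stubbed out, since the text does not detail it.

```python
# One snapshot row per claim x claimant x month (hypothetical names).
con.executescript("""
CREATE TABLE status_dim (
    status_key INTEGER PRIMARY KEY,
    status TEXT                      -- 'open', 'closed', 'reopened'
);
CREATE TABLE claim_monthly_snapshot (
    month_key         INTEGER,       -- e.g. 199506
    claim_key         INTEGER,
    claimant_key      INTEGER,
    status_key        INTEGER REFERENCES status_dim(status_key),
    reserve_amount    REAL,          -- reserve set aside to pay the claim
    paid_amount       REAL,          -- paid out this month
    received_amount   REAL,          -- received this month
    transaction_count INTEGER        -- overall activity for the claim
);
""")

# Monthly load: roll transaction detail up to the snapshot grain, so
# month-to-month questions no longer crawl the full transaction history.
con.execute("""
INSERT INTO claim_monthly_snapshot
SELECT td.year * 100 + td.month, f.claim_key, f.claimant_key,
       NULL,  -- status resolved by a separate end-of-month lookup
       NULL,  -- reserve derived from reserve-setting transactions
       SUM(CASE WHEN t.transaction_type = 'payment' THEN f.amount ELSE 0 END),
       SUM(CASE WHEN t.transaction_type = 'receipt' THEN f.amount ELSE 0 END),
       COUNT(*)
FROM claims_transaction_fact f
JOIN date_dim td ON f.transaction_date_key = td.date_key
JOIN transaction_dim t ON f.transaction_key = t.transaction_key
GROUP BY td.year, td.month, f.claim_key, f.claimant_key
""")
```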

After dispensing with the first big representation problem, we faced the problem of how to deal with heterogeneous products. This problem arose primarily in the monthly snapshot fact table, in which we wanted to store additional monthly summary measures specific to each line of business: automobile coverage, homeowner’s fire coverage, and personal article loss coverage. After talking to the insurance specialists in each line of business, we realized that there were at least 10 custom facts for each line of business. Logically, our fact table design could be extended to include the custom facts for each line of business, but physically we had a disaster on our hands.

Because the custom facts for each line of business were incompatible with each other, for any given monthly snapshot record, most of the fact table was filled with nulls. Only the custom facts for the particular line of business were populated in any given record. The answer was to separate physically the monthly snapshot fact table by coverage type. We ended up with a single core monthly snapshot schema, and a series of custom monthly snapshot schemas, one for each coverage type.

A key element of this design was the repetition of the core facts in each of the custom schemas. This is sometimes hard for a database designer to accept, but it is very important. The core schema is the one InsureCo uses when analyzing the business across different coverage types. Those kinds of analyses use only the core table. InsureCo uses the Automobile Custom schema when analyzing the automobile segment of the business. When performing detailed analyses within the automobile line of business, for example, it is important to avoid linking to the core fact table to get the core measures such as amounts paid and amounts received. In these large databases, it is very dangerous to access more than one fact table at a time. It is far better, in this case, to repeat a little of the data in order to keep the users’ queries confined to single fact tables.
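
A compressed sketch of that physical separation, again with invented column names: the core table carries only the facts shared by every coverage type, while the automobile custom table repeats those core facts next to its line-of-business measures, so a query never has to span two fact tables.

```python
# Heterogeneous products: a single core snapshot for cross-line analysis,
# plus one custom snapshot per line of business. Core facts are repeated
# in the custom table on purpose, so automobile analyses stay within one
# fact table instead of joining back to the core.
con.executescript("""
CREATE TABLE snapshot_core (
    month_key INTEGER, claim_key INTEGER, claimant_key INTEGER,
    paid_amount REAL, received_amount REAL            -- core facts only
);
CREATE TABLE snapshot_auto_custom (
    month_key INTEGER, claim_key INTEGER, claimant_key INTEGER,
    paid_amount REAL, received_amount REAL,           -- core facts repeated
    bodily_injury_reserve REAL,                       -- illustrative
    collision_paid REAL,                              -- auto-specific
    rental_reimbursed REAL                            -- facts
);
""")
```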

The data warehouse we built at InsureCo is a classic example of a large data warehouse that has to accommodate the conflicting needs for detailed transaction history, high-level monthly summaries, company-wide views, and individual lines of business. We used standard data warehouse design techniques, including transaction views and monthly snapshot views, as well as heterogeneous product schemas to address InsureCo’s needs. This dimensional data warehouse gives the company many interesting ways to view its data.

About the Author: Ralph Kimball


Ralph Kimball is the founder of the Kimball Group and Kimball University where he has taught data warehouse design to more than 10,000 students. He is known for the best selling series of Toolkit books. He started with a Ph.D. in man-machine systems from Stanford in 1973 and has spent nearly four decades designing systems for users that are simple and fast.


  9. Building Data Warehouse at life insurance corporation of India: A case

    A Data Warehouse is the main repository of an organisation's historical data, its corporate memory. This paper presents a case description into the new IT strategies in general and Data Warehouse ...

  10. 15. Insurance

    The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by. Chapter 15. Insurance. We will bring together concepts from nearly all the previous chapters to build a data warehouse for a property and casualty insurance company in this final case study. If you are from the insurance industry and jumped directly to this chapter for a ...

  11. Data Warehousing for Insurance Reporting and Analytics

    The data warehouse has the highest adoption of data solutions, used by 54% of organizations. (Flexera 2021) Data Warehousing for Insurance: Creating a Single Source of Truth. Insurance companies generate and receive large amounts of data from various business functions and subsidiaries that are stored in disparate systems and in a variety of ...

  12. Enabling Advanced Insurance Analytics and Reporting

    The 7 Advantages of a Data Warehouse for Analytics and Reporting. Empowers everyone throughout the organization with the information they need to make correct decisions. Enables self-service analytics for diving deeper into the data. Illuminates data and business process issues that need improving.

  13. Building Data Warehouse at life insurance corporation of India: a case

    A well implemented Data Warehouse itself can bring manifold benefits to an organisation. A Data Warehouse is the main repository of an organisation's historical data, its corporate memory. This paper presents a case description into the new IT strategies in general and Data Warehouse in particular adopted by Indian insurance giant Life ...

  14. IBM Insurance Information Warehouse

    IBM Insurance Information Warehouse is an industry blueprint that provides business vocabularies, data warehouse design models and analysis templates. It helps to accelerate the development of data architecture, data governance and data warehouse initiatives. The IBM Insurance Information Warehouse data model provides a comprehensive, scalable ...

  15. Case Study

    Success story of a multinational insurer: Automated technical closure, data warehouse, and reporting to reduce operational costs, manual interventions, and operational risks. Learn how we can optimize your business processes. ... Case Study | Multinational Insurance Company Automation of the technical completion.

  16. Good practices for clinical data warehouse implementation: A case study

    Author summary Reusing routine care data does not come free of charges. Attention must be paid to the entire life cycle of the data to create robust knowledge and develop innovation. Building upon the first overview of CDWs in France, we document key aspects of the collection and organization of routine care data into homogeneous databases: governance, transparency, types of data, data reuse ...

  17. Data Warehouse for Insurance Industry

    The first step in implementing data warehouse is to identify the data that needs to be stored and analysed. In the insurance industry, this could include data on policyholders, claims, premiums, underwriting, and risk assessment. There are many data warehouse tools available, and each has its strengths and weaknesses.

  18. Real-time Access To Insurance Claims Data Insights Case Study

    This leading medical insurance company is a reputable organization with a long-standing history of providing comprehensive medical insurance to clients all over Australia. The company has a team of highly experienced professionals and maintains local offices throughout the country. In 2020, the company engaged Altis Consulting to provide a detailed architecture plan for an implementation […]

  19. PDF Big Data Analytics in Motor and Health Insurance:

    data such as Internet of Things (IoT) data, online data, or bank account / credit card data in order to perform more sophisticated and comprehensive analysis, in a process that is commonly known as 'data enrichment.' The data used by insurance firms in the different stages of the insurance value chain may include personal data. 3 (e.g. 2

  20. The Data Warehouse Toolkit

    The goal of this book is to provide a one-stop shop for dimensional modeling techniques. The book is authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence. The book begins with a primer on data warehousing, business intelligence, and ...

  21. Data Warehouse Insurance

    The design of a big insurance data warehouse must deal with several issues common to all insurance companies. This month, I use InsureCo as a case study to illustrate these issues and show how to resolve them in a data warehouse environment. InsureCo is the pseudonym of a major insurance company that offers automobile, homeowner's, and ...

  22. PDF European Insurance Company Case Study

    Case Study. European Insurance Company. 2. the encryption solution had to work with the . insurer's data integration software, Informatica . PowerCenter. This is used to extract data from . various sources, transform it, and load it into a new system, such as a data warehouse. Solution. To address these challenges, the insurer turned

  23. Data Warehouse

    Based on my prior experience as Data Engineer and Analyst, I will explain Data Warehousing and Dimensional modeling using an e-Wallet case study. — Manoj. Data Warehouse. A data warehouse is a large collection of business-related historical data that would be used to make business decisions.