Opening a data set for the first time can be challenging and exciting. Turning raw data into insights is part of what I enjoy most about analytics, but it can also be overwhelming. Where should you start? Which analyses should you run? How do you ensure that what you do has a tangible business impact?
A strong initial analysis doesn’t necessarily give you the answers, but it informs which questions you should be asking next.
The data set deals with agency performance for a set of property and casualty insurance agencies. It contains, among other things, a list of agencies over the period 2005–2015, their premiums by product, and their losses incurred by product.
To have a business impact, it’s important to understand the context of the data and the metrics that are most relevant to the business. For insurance data, there are two key considerations that will help guide where we start the analysis.
First, insurance focuses on risk mitigation: an insurer's ability to collect more in premiums than it pays out in losses is what keeps it in business. A metric of interest is, therefore, the loss ratio, which is approximately calculated as total losses divided by premiums written. Understanding what pushes the loss ratio up or down can inform product strategy and pricing, among other areas.
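As a minimal sketch of the approximate loss ratio described above, the function below divides losses by written premiums. The function name and the dollar figures are hypothetical and are not taken from the data set.

```python
def loss_ratio(total_losses: float, premiums_written: float) -> float:
    """Approximate loss ratio: total losses divided by premiums written."""
    if premiums_written <= 0:
        raise ValueError("premiums_written must be positive")
    return total_losses / premiums_written


# Hypothetical figures: $650K of losses against $1M of written premiums.
example_ratio = loss_ratio(650_000, 1_000_000)
print(f"Loss ratio: {example_ratio:.0%}")
```

A ratio below 100% means more premium was collected than was paid out in losses; this simple version ignores expenses and reserves.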
Second, most insurance is sold through brokers or agents rather than directly from the insurer. Beyond the product, then, understanding the composition and performance of the agencies is another analysis that can drive a quick impact. Finding the traits among top performers that can be generalized to the broader agency base can improve overall efficiency.
Further, identifying the types of agencies that are delivering the best value to the insurer can inform future partnership strategy. All businesses are limited by time and resources. Data can ensure that the time and resources available are focused in the right places.
With the above in mind, and seeing this data with no additional context on the business or its goals, I have three key questions that I would want to answer:
- What are the variables that correlate to a higher (or lower) loss ratio?
- How does the agency base segment by value?
- How are the agencies performing relative to their potential?
The collective answer to these three questions informs which products to focus on, the agencies that are delivering the most (and least) value, and on which agencies to focus going forward.
To start, when trended over time, the loss ratio grows steadily over the early years and then stabilizes.
An interesting early finding for me was that if you remove just the top 1% of losses in the data set, the loss ratio drops by over 30%. Removing the top 5% of losses drops the loss ratio by nearly 70%, and removing the top 10% of losses yields a loss ratio of just 2%. Said another way, the top 10% of losses account for ~96% of the overall loss ratio.
This indicates to me that the losses are concentrated on a small subset of products or agencies. Understanding which products and agencies are driving the losses will be key to writing more profitable policies in the future.
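One way to run this kind of concentration check is sketched below. It assumes that only the loss amounts are trimmed while total premiums stay fixed, which the analysis above does not spell out, and the data is made up purely for illustration.

```python
def loss_ratio_without_top(losses, total_premium, top_fraction):
    """Loss ratio after dropping the largest `top_fraction` of loss records.

    Assumption: premiums are left untouched; only loss amounts are trimmed.
    """
    n_drop = round(len(losses) * top_fraction)
    kept = sorted(losses)[: len(losses) - n_drop]
    return sum(kept) / total_premium


# Hypothetical data: 99 small losses and one very large one.
losses = [1.0] * 99 + [901.0]
total_premium = 2000.0

baseline = loss_ratio_without_top(losses, total_premium, 0.0)
trimmed = loss_ratio_without_top(losses, total_premium, 0.01)
print(f"baseline={baseline:.4f}, without top 1%={trimmed:.4f}")
```

In this toy data a single record carries most of the losses, so removing the top 1% collapses the ratio. The same pattern on real data would point to concentrated losses.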
From a product perspective, we can see that Homeowners Insurance should be the first area for further investigation. It’s the second-largest category by net written premiums and also carries one of the highest loss ratios. My hunch is that it’s a more competitive market, which inhibits any pricing power that could better price in risk, but this would be an area for more research.
When the data is rolled up to the product-line level and the top 1% of outliers are removed, the loss ratio for Property insurance products overall is ~50% higher than that of Commercial insurance, potentially driven by Homeowners.
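The roll-up itself can be sketched as a simple group-by. The product names, the product-to-line mapping, and the amounts below are hypothetical stand-ins for the real columns, and the outlier trimming step is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical mapping from product to product line.
PRODUCT_LINE = {
    "Homeowners": "Property",
    "Dwelling Fire": "Property",
    "Commercial Auto": "Commercial",
    "General Liability": "Commercial",
}

# Hypothetical rows of (product, written_premium, incurred_losses).
rows = [
    ("Homeowners", 400.0, 300.0),
    ("Dwelling Fire", 100.0, 30.0),
    ("Commercial Auto", 300.0, 150.0),
    ("General Liability", 200.0, 70.0),
]


def loss_ratio_by_line(rows, product_line):
    """Sum premiums and losses per product line, then divide."""
    premiums = defaultdict(float)
    losses = defaultdict(float)
    for product, premium, loss in rows:
        line = product_line[product]
        premiums[line] += premium
        losses[line] += loss
    return {
        line: losses[line] / premiums[line]
        for line in premiums
        if premiums[line] > 0
    }


by_line = loss_ratio_by_line(rows, PRODUCT_LINE)
for line, ratio in sorted(by_line.items()):
    print(f"{line}: {ratio:.0%}")
```

Summing premiums and losses before dividing weights each product by its size, which is usually what you want when comparing lines of business.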
The takeaways and implications from the initial exploratory analysis are:
- A minority of losses drive a majority of the loss ratio. Diving deeper on the products, pricing, and general underwriting of the policies that experienced these losses to understand what they have in common could yield valuable insights.
- Homeowners Insurance is a clear area for further investigation to understand why its loss ratio is higher despite high net premiums written (which should distribute risk).
- Property insurance carries a higher loss ratio than Commercial insurance. Over time, shifting the mix of business toward Commercial could be beneficial.
Now that we have a better understanding of performance and the underlying drivers of performance, we’ll analyze the agencies themselves.
Clustering is a technique that seeks to find a natural grouping among the data. It can be useful in creating a more nuanced story—not everyone in a data set acts similarly, and clustering can help identify the groups within the data that do.
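To make the idea concrete, here is a minimal k-means sketch in plain Python. This article does not say which clustering algorithm or features were used, so k-means, the two features (premium growth and loss ratio), and the toy data points are all assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, labels). Initial centroids are sampled from the data.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical agencies described by (premium growth, loss ratio).
points = [
    (0.10, 0.50), (0.12, 0.55), (0.08, 0.52),     # growing, lower loss ratio
    (-0.20, 1.40), (-0.25, 1.50), (-0.22, 1.45),  # declining, higher loss ratio
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On real data the features would normally be scaled to comparable ranges first, and the number of clusters would be chosen by comparing several values of `k`.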
After testing different numbers of clusters, I settled on five as the optimal number, and the following visualization shows how the clusters differ on the key metrics of interest within the data set:
From this analysis, I would classify the clusters as follows:
- Cluster 1: Stable, moderate size, well-performing
- Cluster 2: Moderately growing, large, well-performing
- Cluster 3: Declining, very small, significantly under-performing
- Cluster 4: Growing, midsize, well-performing
- Cluster 5: Declining, small, moderately under-performing
From a business perspective, I have a few takeaways from the above charts and analysis.
First is that despite having ~1,600 agencies in the data set, only ~60% of those matter to the business. Quality of relationship is more important than the quantity of partners.
Next is that, while the loss ratio in cluster 3 is pretty alarming, the small average agency size and low total premiums written in that segment mean that it’s not actually having an outsize impact on performance. My recommendation here would be to consider discontinuing the partnership with most of these agencies; the value (or lack thereof) that they provide is unlikely to be worth the revenue they’re generating, and the business would see little impact if most were no longer agency partners.
My third takeaway is that we should be looking at the agencies in cluster 4 to understand why they have doubled in size over the time period. The current data set has its limitations, but possible explanations are that these agencies were growing their agent count while others weren’t or they’re serving specific industries (or local communities) that were faster growing coming out of the recession. One explanation that I ruled out is that there’s a specific set of products that they offer at a higher frequency than most other segments; there was no evidence in the data to indicate that this was true.
Finally, I would want to understand why clusters 1 and 2 have a similar average number of agents, agencies in the cluster, and loss ratio, yet cluster 2 is delivering 2x the premiums compared to cluster 1.
For clusters 1 and 2, we find that the typical agency in cluster 2 is selling 4 additional products compared to cluster 1. Of those, one product that is more likely to be sold by cluster 2 is among the highest by average written premium revenue. Further qualitative information (calls with agencies in each cluster to gain a better understanding of their business, for example) would be a helpful place to start.
The last analysis seeks to measure the efficiency of each agency as it relates to its premium revenue. It quantifies how an agency is performing against its peers given the resources available to it. Here, we consider the resources (inputs) to be the number of producers (agents) at an agency and the total number of unique products offered; the output is the revenue from premiums written in 2014.
To illustrate, let’s imagine that there are only two agencies that both had 5 producers and 10 products sold. The first agency had $1M of written premiums and the second had $500K. We would consider agency 1 to be operating at 100% efficiency and agency 2 at 50% efficiency because, given the same inputs, agency 2 is producing only 50% of the output.
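The two-agency illustration can be reproduced with the simplified sketch below. It only compares agencies that share exactly the same inputs and scores each against the best output in that group; it is not a full frontier method such as data envelopment analysis, which would also handle agencies with differing inputs. The agency names and the function are hypothetical.

```python
from collections import defaultdict


def relative_efficiency(agencies):
    """Score each agency against the best peer with identical inputs.

    agencies: dict of name -> (producers, products, written_premium).
    Returns a dict of name -> efficiency in [0, 1].
    """
    best = defaultdict(float)
    for producers, products, premium in agencies.values():
        key = (producers, products)
        best[key] = max(best[key], premium)
    scores = {}
    for name, (producers, products, premium) in agencies.items():
        top = best[(producers, products)]
        scores[name] = premium / top if top > 0 else 0.0
    return scores


# The example from the text: same inputs, different written premiums.
agencies = {
    "agency_1": (5, 10, 1_000_000),
    "agency_2": (5, 10, 500_000),
}
efficiency = relative_efficiency(agencies)
for name, score in efficiency.items():
    print(f"{name}: {score:.0%}")
```

With identical inputs, agency 1 defines the benchmark and agency 2 scores half of it, matching the 100% and 50% figures in the illustration.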
While no cluster is necessarily performing efficiently, it’s unsurprising to see that the results map directionally to the revenue performance seen earlier. Clusters 3 and 5, which were declining, are operating at the lowest efficiency.
Cluster 2 is operating at more than 2x the efficiency of cluster 1, which could help to explain why their revenue was significantly higher. In effect, all else equal, their agents are doing a better job of selling than those of cluster 1, and they typically offer more products, as evidenced by the earlier analysis.
Next steps for this analysis are:
- Identify the common characteristics of the high-efficiency agencies. What are they doing differently? Which products are they selling? How are they training their producers/agents?
- Target the under-performing, high-potential agencies. Which agencies have the most potential for growth based on the number of agents or past performance? Which are already generating strong revenue (and are therefore good partners) and have the potential to increase in size?
In the equivalent of ~1 day of work, we can quickly understand the factors that lead to better and worse outcomes (as measured by the loss ratio), the agencies that are driving the majority of revenue and are most valuable to the business, and the runway within the current base for improved efficiency.
We’ve also identified the areas within the data that warrant further research. Why does Homeowners Insurance, the second-largest category by premiums, have such a high loss ratio? Why are the agencies in cluster 2 significantly outperforming those in cluster 1? Why is cluster 4 growing faster than other segments? What are the high-efficiency agencies doing that leads to out-performance? Is this scalable to other agencies?
As next steps, there could be a discussion with a broader team on the implications of these findings and the prioritization of which research areas to focus on first that will drive the most value to the organization.