Ditching Excel for Python – Lessons Learned from a Legacy Industry Ditching Excel for Python – Lessons Learned from a Legacy Industry
I’ve been in a reflective mood lately, probably because it’s the end of a very eventful year. Change is all around... Ditching Excel for Python – Lessons Learned from a Legacy Industry

I’ve been in a reflective mood lately, probably because it’s the end of a very eventful year. Change is all around us and technology is the enabler.

On that note, I’ve been thinking a lot about my own experiences with technology-driven change in the reinsurance industry, where I worked as an analyst from 2017 until recently.

During these three short years, I observed a radical shift in data analysis methodologies. Excel-based models, which had seemed top-of-the-line suddenly were too slow and too rigid; Integration with 3rd party data sources, which was once a luxury, became the norm; And analysts began to utilize scripts to accomplish many labor-intensive tasks typically performed by hand or in spreadsheets.

Enabling this change is a suite of accessible Python-powered tools. These technologies are rapidly displacing the old way of doing things, ushering in a new wave of reinsurance models and the talent needed to support them.

Admittedly, the following observations do come from a very niche industry. But I believe that the broader trends discussed here will also apply to other legacy companies and sectors.

In this post, I explore the following topics:

  • Section 1 – An introduction to reinsurance
  • Section 2 – The need for technological disruption in the reinsurance industry
  • Sections 3 to 5 – How Python is replacing Excel
  • Section 6 – Arguments against adopting new technologies
  • Section 7 – The evolution of the reinsurance industry

If you are curious about the transition from Excel to Python but aren’t interested in reinsurance, skip to section 3.

Without further ado, let’s get into it.

1. Reinsurance, from an analyst’s perspective

Most people understand insurance, but few have heard of its rather exotic-sounding cousin: reinsurance. So, before I begin, here’s a quick intro:

Just as individuals transfer risk to insurers in exchange for a premium, insurers too must mitigate their risk by transferring some or all of their liabilities to one or more reinsurance companies. This enables insurers to protect themselves in the case of a large event, such as a hurricane, which affects many individual policyholders. Without reinsurers, insurance companies would not be able to survive catastrophes and would be much more risk averse, ultimately impacting the end consumer.

Reinsurance programs come in a variety of flavors to suit the need of the reinsured. I’ll spare you the details but suffice it to say that all potential reinsurance programs must first be analyzed. In industry speak, this is the process of “pricing a deal”.

For each deal, brokers gather relevant data from the insurance companies and send it to potential reinsurers as a “submission”. It is the underwriting analyst’s responsibility to parse this data, enter it into Excel-based models and produce a number.

This number, the loss ratio, represents the expected loss divided by the total premium and typically ranges between 10% to 150+%. LRs above 100% mean that the premium is not sufficient to cover the losses that we expect based on our model. In other words, we’d be writing the deal at a loss.

In reality, reinsurance companies have pricing hurdles well below 100% because they have other money outflows to account for such as brokerage (intermediary fees) and G&A (general and administrative expenses).

The point is, Excel is the key piece of software that enables underwriting teams to assess new business. Throughout the company, different teams use different models, each with different processes and different pricing hurdles. But the one common theme is that all of the models are based on Excel spreadsheets. That’s just how the industry works. Everyone uses Excel for everything.

Well, until recently.

2. Change is coming

In an industry that is notoriously behind the times, a fundamental change is underway. This change is in response to new business demands, including:

  1. The desire to price increasingly complex deals with increasingly large datasets
  2. The desire to integrate with 3rd party data to augment pricing models
  3. The desire for faster models and robust codebases that adhere to software development best practices

Excel fails miserably at these tasks.

First, expressing complex reinsurance deals requires tabs and tabs of structural details, interlinking formulas and limiting assumptions. It’s a mess to build and a headache to understand. Have a lot of data? That’s more tabs and formulas to add to the model. Performance slows dramatically and minutes of your life are wasted waiting for bloated spreadsheets to finally open.

Second, you’re forced to use the VBA programming language if you want to query external data or implement user-defined functionality. According to one Stack Overflow survey, VBA ranked “most dreaded” of all the languages, which wholly sums up my feelings. The development environment is not user friendly, the syntax confusing, there’s no support for unit testing – I could go on.

Third, Excel is fragile. Views and logic are intermingled, making it very hard to troubleshoot errors. Also, models aren’t put into source control, so you can’t restore an old version. Massive spreadsheets with dozens of tabs frequently crash without warning, much to the dismay of underwriting analysts (like myself).

During my three years in reinsurance, I witnessed a dire need for new tooling. And, from what I observed, it is clear that this tooling will be powered by Python.

3. The future is Python

As a technical underwriting analyst with programming skills, I was among the first to build and benefit from what I think are the next generation of pricing tools. These tools fall into two main categories:

  • Excel models running Python code
  • Python notebooks in the browser

Both options offer enhanced analysis capabilities while still catering to a non-technical or semi-technical end user.

Now, let’s explore each technology in more detail.

4. Excel models running Python code

The first category of new technology is based on augmentation. Rather than adding VBA code to your Excel spreadsheet, you can use Python instead. This is possible thanks to 3rd party Excel add-ins such PyXLL, which brings the power of Python data analysis into the humble spreadsheet.

When you open Excel, the add-in makes your Python code accessible either as functions or as macros. This means that you can write Python code in the development environment of your choice, add unit tests, integrate with 3rd party libraries and push to source control.

PyXLL facilitated a step change in how we construct reinsurance models. Specifically, here’s how our setup looked before and after augmenting our model with Python:

Before Python After Python
We used an Excel spreadsheet-based model with dozens of tabs containing complex formulas, endless pivot tables and unintelligible VBA code. The tangled mess of VBA was re-written into independent Python modules, each of which performs a distinct function.
The spreadsheet took 30+ seconds to open. The Excel spreadsheet is now just a view, so it is lightweight and quick to open.
There was no visibility into errors. “Debugging” meant manually breaking apart lengthy Excel formulas. The new model logic is testable and can be upgraded independently.
We were limited by the assumptions made when the model was first created, with little ability to add new features. The codebase lives in source control, enabling multiple collaborators to work in tandem.
Developers can use their preferred development environment.
A robust peer review process helps to perpetuate high quality code.

Using PyXLL, we built an underwriting pricing model powered entirely by Python code. We were able to express structural complexities of deals at a level never before achieved. And we were able to add simulation-based pricing techniques that simply cannot be done using spreadsheets alone.

The impact of PyXLL on how we build models and ultimately price business cannot be understated.

Which leads us to the second technology that is changing the way we perform underwriting data analysis…

5. Python notebooks in the browser

The second novel approach to reinsurance risk assessment and pricing is enabled via Python notebooks. These are web-based documents that can contain narrative text, live code, charts and visuals, plus results from your data analysis.

Python notebooks are all about collaboration, visualization and rapid iteration. They are a way to execute code with minimal setup and overhead, making it easier for new programmers to explore data outside of Excel. One of my favorite parts of notebooks is how easy it is to access your data, whether that’s from a file, SQL databases or an external source.

Let’s look at what a data analysis project looked like for me with and without Python notebooks:

Without Python Notebooks With Python Notebooks
Manually query SQL databases and copy data into Excel. Keep data fresh by re-querying SQL and copying in new data. Drop in a csv file, connect directly to an internal SQL database or utilize a 3rd party API to retrieve external data.
Perform analysis using Excel functions, pivot tables and VBA code. Functions quickly became unwieldy and difficult to troubleshoot. Manipulate data easily and rapidly using Python libraries like Pandas.
Collaborate with a colleague by closing Excel and letting your colleague know that the spreadsheet is available for them to work on. Collaborate with colleagues by sharing the notebook.
Copy tabular results and charts into Word or Powerpoint and add written narrative as necessary. Export results to csv or generate professional-looking PDFs to give directly to key stakeholders.
Add explanatory narrative directly to the document alongside related code blocks.
Create interactive charts and static visuals using libraries like Plotlyseaborn or matplotlib.
Leverage existing libraries like SciPy or scikit-learn to perform more advanced data analysis.

My first experience with notebooks was in the interactive development environment created by JupyterLab. Nowadays, the term “Jupyter Notebook” has become synonymous with web-based notebook applications (think: Kleenex for notebooks).

More recently, I discovered Deepnote, which I now recommend before Jupyter due to several key features, including real-time inline collaboration, a built-in variable explorer and interactive plots. Deepnote’s integrations with other 3rd party services, such as GitHub and S3, are also very handy.

All of this to say: Python notebooks are enabling a level of rapid data analysis that dwarfs the power of Excel spreadsheets. Moreover, they are beginner friendly. I’ve personally witnessed how an interested business user started writing code in a notebook and, within several months, was coding in his own local environment.

6. Things to consider before adopting Python

But, despite obvious improvements to pricing capabilities, not everyone is rushing to adopt Python-enabled technologies. This lack of enthusiasm and stubbornness to innovate stems from two related grievances:

  1. We don’t have enough resource. There are simply not enough underwriters with requisite programming skills and not enough in-house staff with software development backgrounds to support them.
  2. We don’t want to lose control over model development. Without sufficient programming resource, underwriting teams must rely on outside help, either from other teams or external contractors. This added step in the model development chain is extremely unappealing for individuals who are used to the immediacy of an Excel/VBA workflow.

At the end of the day, making a change in an Excel sheet is easy; understanding formulas is achievable; but learning to code is hard. Time and again, I observed how swapping out a familiar Excel spreadsheet for a development environment with Python immediately caused observers’ eyes to glaze over.

Unfortunately, the situation is a catch-22.

Lack of understanding about the technology means that there’s a resulting lack of confidence and buy-in at the management level. Without buy-in, there isn’t a coordinated effort to hire or train the resource needed to support the technology. Add legacy systems and bloated models to the mix, and it’s no surprise that reinsurance professionals have developed a protective incompetence for learning and adopting new technologies.

Will there eventually be a breakthrough? I think so. In my opinion, the reinsurer who embraces technological change will lead the industry into a new era of underwriting.

7. Where the industry goes from here

For decades, the reinsurance industry has performed data analysis using Excel. Custom spreadsheet-based models were the norm, with underwriters relying on their output to decide whether or not to write a deal.

During my time as an underwriting analyst, I witnessed the beginning of a transition away from Excel and toward new Python-based tools, including Excel add-ins such as PyXLL and Python notebooks such as Deepnote. These tools enable enhanced data analysis capabilities, performant tooling, robust codebases and collaborative development environments. The end result is that underwriters can make better decisions, faster.

Let’s now imagine how the reinsurance industry might evolve as a result of this step change in technology.

First, the deals: I predict that reinsurance structures will become increasingly complex as our ability to price them increases.

Rather than relying on approximations in Excel spreadsheets, Python-based models will be able to account for the nuance of complicated structural terms. This will enable underwriters to more accurately predict whether a deal will be loss making. Likewise, brokers will get better at structuring coverage to fit the exact needs of their clients; this will result in more bespoke reinsurance programs that, in turn, will require more bespoke pricing techniques.

Second, the people: I believe that, in the future, all high-performing underwriters will need to know the basics of programming.

Business users with an appreciation for software development are a dangerous breed who understand the application as well as the implementation. They are rare now, but they won’t be for long. I am confident that, as easily accessible Python-based tools deliver new capabilities to the underwriting function, programming skills will become a requirement for all new hires.

Lastly, the industry: I foresee a reinsurance industry that leverages technology for rapid risk transfer.

Already, high-touch interactions such as broker meetings are being replaced by self-service portals and always-live portfolio dashboards. Data cleaning scripts and automation tasks enable nearly end-to-end modeling – no humans needed! Soon, submissions will be processed automatically, freeing up underwriting analysts to review key details and perform more in-depth data analysis.

And, as someone who spent three years with my head in an Excel spreadsheet, I can only say – it’s about bloody time!

Originally posted here. Reposted with permission.

ODSC Community

The Open Data Science community is passionate and diverse, and we always welcome contributions from data science professionals! All of the articles under this profile are from our community, with individual authors mentioned in the text itself.