Why Open Source Integration is Key to Success in the Era of Analytics Heterogeneity

Until recently, companies focused mainly on how analytics works and how models were built. For many organizations, it was largely a lab exercise to see whether there was value in data science.

Now that this value is more certain and organizations are investing much more in analytics, we have quickly moved to creating many more models, considering them for many more use cases, and greatly raising our expectations of analytics. We can create these models using literally hundreds of different tools. Open source integration is key to this effort.

We live in the era of Analytics Heterogeneity

When doing analytics, a typical organization uses multiple tools, methodologies, algorithms, and solutions to get the job done. And the ecosystem of tools available is diverse, fast-growing, and rapidly changing.

Every year, new frameworks and technologies, mainly open source, enter the market in different areas. The figure below shows how open source is prevalent throughout the technological landscape.

Figure 1- Open Source Ecosystem. Please note that this is not an exhaustive list. There are tons out there!

I like to define it as the era of Analytics Heterogeneity – where analytics is not limited to one single methodology, tool or algorithm, but is able to leverage the full potential of the fast-growing and rapidly changing ecosystem of analytical solutions and technologies available. 

Data scientists develop models using a diverse selection of interfaces, algorithms, and tools. Similarly, IT leaders adopt a variety of environments and paradigms in which to execute analytics: on-premises, in the cloud, hybrid, via APIs, real-time, in-database, on the server, on the edge, and the list goes on!

The byproduct of heterogeneity in the analytics value chain: a higher need for integration and governance

Heterogeneity brings innovation and drives creativity. And this is true when it comes to analytics as well. Using the tool of choice, the favorite programming language, the preferred package of functions is key.

However, as economic theory teaches, the downside of heterogeneity is complexity and a higher demand for integration, governance, and centralization. Otherwise, the result will be chaos, entropy, and missed business opportunities.

In fact, market data shows that, despite all their investments, organizations are struggling to manage the huge range of solutions available and, consequently, to create value from analytics.

Analyst firms estimate that only 35% (IDC) to 40% (Gartner) of models are fully deployed. And SAS research discovered that 44% of models take over seven months to deploy. Too few models get into production, and for those that do, it takes too long to turn them into business value.

SAS and open source integration: an end-to-end seamless analytics experience

As we enter the era of Analytics Heterogeneity, where we can potentially use many more technologies, frameworks, and tools to build our models, we need to update our capabilities to meet the new expectations we have of analytics.

We expect our models to make decisions for us, give us the best recommendation, do it in real time, and interact with us. And this is what we define as value: not just building the most accurate model, but being able to interact with it through a mobile app, web page, product, or service. That is where value is today.

Value is possible only if the open-source potential is combined with capabilities like scalability, reliability, integration, automation, versioning, monitoring, orchestration, and flexibility.

And therefore integration of the two technologies, SAS and open source, is needed. The goal is not to replace open source, but to extend its interoperability and utility for the enterprise, allowing the user to have a seamless analytical experience throughout the different stages of the analytical process.
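One way to picture this kind of integration is a thin, common scoring interface placed over models built with different tools. The sketch below is purely illustrative: `Scorer`, `RuleModel`, `LinearModel`, and `score_all` are hypothetical names, not part of SAS or any open source library, and the two toy models merely stand in for models built in different frameworks.

```python
from dataclasses import dataclass
from typing import Protocol


class Scorer(Protocol):
    """Hypothetical common interface that every wrapped model satisfies."""

    name: str

    def score(self, row: dict[str, float]) -> float: ...


@dataclass
class RuleModel:
    """Stand-in for a model built in one tool: a simple threshold rule."""

    name: str
    feature: str
    threshold: float

    def score(self, row: dict[str, float]) -> float:
        return 1.0 if row[self.feature] > self.threshold else 0.0


@dataclass
class LinearModel:
    """Stand-in for a model built in another tool: a weighted sum."""

    name: str
    weights: dict[str, float]
    intercept: float = 0.0

    def score(self, row: dict[str, float]) -> float:
        return self.intercept + sum(w * row[f] for f, w in self.weights.items())


def score_all(models: list[Scorer], row: dict[str, float]) -> dict[str, float]:
    """Score one record with every model through the shared interface."""
    return {m.name: m.score(row) for m in models}


# Illustrative values only; the feature names are assumptions.
models = [
    RuleModel(name="rule_v1", feature="age", threshold=40.0),
    LinearModel(name="linear_v1", weights={"age": 0.5, "income": 0.25}, intercept=1.0),
]
results = score_all(models, {"age": 50.0, "income": 8.0})
print(results)
```

The calling code never needs to know which tool produced a model, which is the sense in which integration extends open source rather than replacing it.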

Figure 2- Benefits of SAS and Open Source Integration

Integration across the model lifecycle

As I mentioned earlier, one of the biggest challenges that organizations face is putting machine learning (ML) into production. While developments in ML and AI keep pushing the edges of technology, the burning question remains: how do we bring those ideas into action, and how do we keep them alive?

This process is usually referred to as Model Lifecycle Operationalization (or ModelOps).

Taking its inspiration from DevOps in software development, ModelOps is a holistic approach for rapidly and iteratively moving models through the analytics lifecycle so that they are deployed faster and deliver the expected business value. It helps organizations to:

  • Scale the use of analytics for real-time decision-making
  • Operationalize their analytics, i.e. to take models from development to production effectively
  • Facilitate the collaboration between the different functions involved in managing analytic models as a corporate asset (data science, business units, IT & operations).
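To make the idea of moving models through a lifecycle more concrete, here is a minimal sketch of a toy, in-memory model registry. It is an illustration under stated assumptions, not how SAS or any particular ModelOps product works: the stage names in `STAGES` and the `ModelRegistry`, `ModelVersion`, `register`, `promote`, and `deployed` names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle stages, in the order a model moves through them.
STAGES = ("registered", "validated", "deployed", "retired")


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "registered"
    # Every stage the version has been in, oldest first.
    history: list[str] = field(default_factory=lambda: ["registered"])


class ModelRegistry:
    """Toy in-memory registry; a real one would persist and audit changes."""

    def __init__(self) -> None:
        self._models: dict[str, list[ModelVersion]] = {}

    def register(self, name: str) -> ModelVersion:
        """Add a new version of the named model, numbered from 1."""
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name=name, version=len(versions) + 1)
        versions.append(mv)
        return mv

    def promote(self, mv: ModelVersion) -> ModelVersion:
        """Move a version to the next stage; raise if it is at the last one."""
        idx = STAGES.index(mv.stage)
        if idx == len(STAGES) - 1:
            raise ValueError(f"{mv.name} v{mv.version} is already retired")
        mv.stage = STAGES[idx + 1]
        mv.history.append(mv.stage)
        return mv

    def deployed(self, name: str) -> list[ModelVersion]:
        """Return the versions of a model that are currently deployed."""
        return [mv for mv in self._models.get(name, []) if mv.stage == "deployed"]


registry = ModelRegistry()
v1 = registry.register("churn")
registry.promote(v1)  # registered -> validated
registry.promote(v1)  # validated -> deployed
v2 = registry.register("churn")  # a retrained candidate, not yet deployed
print([(m.version, m.stage) for m in registry.deployed("churn")])
```

Keeping versions and stage history together in one place is what lets data science, business units, and IT treat a model as a shared, governed asset rather than a file on someone's laptop.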

A detailed overview of this topic is provided by Véronique Van Vlasselaer, Data Scientist at SAS, in her breakout session at ODSC East. She talks about the often-forgotten steps after model development, focusing on what is required to turn ML and AI into value, how to manage and govern machine learning models once they run in production, and how model management & governance helps ease the process to keep ML alive.

Another key resource, strongly recommended if you want to get hands-on with ModelOps, is the new interactive guide “Mastering Model Lifecycle Orchestration”, a blueprint for technical experts looking to master model lifecycle orchestration. It has practical examples and how-tos for each step of the model lifecycle, from model registration to deployment and retraining, including how to automate tasks and create custom workflows that match business requirements and processes.

Also, follow the SAS Model Management page to check the latest resources on ModelOps.

The impact on a data scientist’s job

How is this going to impact the skillset required for data scientists entering the job market?

Companies are increasingly asking for model operationalization skills, and this is something that Dr. Iain Brown will talk about in his keynote session “Data Science change is inevitable, growth is optional”.

Which skills are now valued by employers, how should data scientists best invest their time and what are the keys to unlocking a successful career? Drawing from first-hand experience as a practitioner, a tutor, and now a leader in the field over the last decade, Iain will shine a light on what is required to succeed in an evolving discipline.

How ModelOps was key during COVID-19: stories from the healthcare industry

As COVID-19 struck, one of the most critical decisions governments and healthcare providers around the world faced was how best to allocate limited medical resources, including intensive care unit (ICU) beds and ventilators. Not that this was a new issue: providers are always managing the fickle, sensitive balance between supply and demand for medical resources. But the pandemic threatened to upend everything, on a scale never previously encountered.

In this article, Steve Bennet talks about how healthcare leaders deployed analytical models when the crisis hit. For example, Cleveland Clinic created a range of models that help forecast patient volume, bed capacity, ventilator availability, and more. The models, freely available via GitHub, provide timely, reliable information for hospitals and health departments to optimize health care delivery for COVID-19 and other patients and to predict impacts on the supply chain, finance, and other critical areas.

Follow SAS for Open Integration and our YouTube channel to always be updated on the latest and greatest ModelOps skills/features.

By Marinela Profi, Global Product Marketing at SAS
