The Future of Data Engineering Goes Through Data Contracts

Editor’s note: Jean-Georges Perrin is a speaker for ODSC East this April 23-25. Be sure to check out his talk, “Building Data Contracts with Open Source Tools,” there!

Data engineering is a critical function in all industries. However, the data engineering workload grows exponentially as a company grows, acquires others, or merges with them.

Why is that? The number of data sources (whether internal or external) is growing. The number of use cases is growing. The maintenance to keep existing things “alive” is not going down; at best, it stays constant. Many companies I talk to end up with 50,000 to 100,000 pipelines, which magically work thanks to a magical set of nonetheless magical technologies.

One solution is to bring more tools to the rescue, but the promise is often along the lines of “use my tool, and it will be better in the future.” The magic of the fabric, factory, or modern data engineering assumes you commit to a single tool for the rest of your existence. What a coincidence: the message from this handful of vendors is the same and is often tied to their cloud.

Another solution is Data Mesh, as proposed by Zhamak Dehghani (Data Mesh Principles and Logical Architecture). This brilliant solution is pretty disruptive, and although I am a great believer in Data Mesh, particularly after the successful implementation at PayPal (The next generation of Data Platforms is Data Mesh), it involves a lot of changes at the organizational level. These are changes that not all companies are willing to make, mainly because the approach is still young and its return is limited at this time.

Here comes the data contract. Although the term is relatively new, the concept is old. If you remember the CASE tools from the early 90s, software engineers back then were using something that could be considered a data contract today.

A data contract defines a link between the data producer and one or more consumers of the data. It also links a logical world, dear to architects, and an implementation world, loved by engineers. The richness of the contract lets us get value extremely quickly by simplifying discovery, keeping documentation up to date, and using targeted tools. There are many more benefits as the number of contracts grows and they combine into data products while staying agnostic to the existing infrastructure and tooling.
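To make the idea concrete, here is a minimal sketch in Python of what a contract between one producer and several consumers could look like. It is only an illustration: the `Column` and `DataContract` classes, their field names, and the `payments.transactions` dataset are all hypothetical and do not follow the layout of any published standard. The point is that one object carries both the logical description (what a column means) and the implementation detail (its name and type), and that the same object can be used to check data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column of the dataset (hypothetical structure, for illustration)."""
    name: str            # physical column name: the implementation world
    logical_type: type   # a Python type standing in for a logical type
    description: str     # business meaning: the logical world
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """Links one producer to one or more consumers of a dataset."""
    dataset: str
    producer: str
    consumers: tuple
    columns: tuple

    def validate(self, record: dict) -> list:
        """Return a list of violations; an empty list means the record conforms."""
        violations = []
        for col in self.columns:
            value = record.get(col.name)
            if value is None:
                if col.required:
                    violations.append(f"{col.name}: missing required value")
                continue
            if not isinstance(value, col.logical_type):
                violations.append(
                    f"{col.name}: expected {col.logical_type.__name__}, "
                    f"got {type(value).__name__}"
                )
        return violations


# Hypothetical contract; every name below is made up for the example.
contract = DataContract(
    dataset="payments.transactions",
    producer="payments-team",
    consumers=("fraud-analytics", "finance-reporting"),
    columns=(
        Column("txn_id", str, "Unique transaction identifier"),
        Column("amount", float, "Transaction amount"),
        Column("memo", str, "Free-text note", required=False),
    ),
)

good = {"txn_id": "t-1", "amount": 12.5}
bad = {"txn_id": "t-2", "amount": "12.5"}

print(contract.validate(good))
print(contract.validate(bad))
```

In this sketch, the producer and consumers share one definition, so documentation and validation cannot drift apart. A real contract would typically live in a tool-agnostic file rather than in code, and would cover much more than column types.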

While at PayPal, we released a data contract template, which became so popular that we had to bring it to a larger group. The template is now called the Open Data Contract Standard (ODCS, not to be confused with ODSC). The standard is part of the Bitol project, hosted by the Linux Foundation AI & Data and AIDA User Group. The project plans to expand standardization beyond data contracts, but let’s start at the beginning.

I have been elected and am honored to chair the technical steering committee (TSC), a great group of data contract users, vendors, and service providers, all motivated by making data engineering easier and better.

I am a big fan of open source, but I am an even bigger fan of open standards. In my talk, I will explain how to get started with data contracts and show you some of their power.

About the Author:

Jean-Georges “jgp” Perrin is the chief innovation officer at AbeaData, focusing on building innovative and modern data tooling. He is also chair of the Open Data Contract Standard (ODCS) at the Linux Foundation project Bitol, president of AIDA User Group, and author of multiple books, including Implementing Data Mesh (O’Reilly) and Spark in Action, 2nd edition (Manning). He is passionate about software engineering and all things data. His latest endeavors take him deeper into data engineering, data governance, the industrialization of data science, and his favorite theme, Data Mesh. He is proud to have been recognized as a Lifetime IBM Champion. Jean-Georges shares over 25 years of experience in the IT industry by presenting and participating at conferences and publishing articles in print and online media. His blog is available at http://jgp.ai. When not immersed in tech, which he loves, he enjoys exploring Upstate New York and New England with his wife and kids.
