10 Important Topics Featured at the 2024 Data Engineering Summit

Conferences aren’t just talking heads in front of podiums at venues; they reflect the trends, topics, and problems that matter in everyday work. At the Data Engineering Summit, co-located with ODSC East from April 23rd to 24th, we’ll be examining several important topics that will help guide your data engineering team to success. Here are ten of the topics that will be covered at the Data Engineering Summit this April.

Is Gen AI A Data Engineering or Software Engineering Problem?

Generative AI isn’t solely a data engineering or software engineering problem, but rather a collaborative effort requiring both. Data engineers prepare the training data, while software engineers design and build the models, making generative AI a two-pronged effort. Teams will have to decide which parts of the generative AI pipeline each of them owns so that it doesn’t become everybody’s problem!

Related Session: Is Gen AI A Data Engineering or Software Engineering Problem?: Barr Moses, Co-Founder & CEO at Monte Carlo

Data Infrastructure

Data engineering teams face headaches like wrangling data from various sources into a usable format, scaling systems to handle growing data volumes, and ensuring data security and compliance. They also have to pay down technical debt from past shortcuts and maintain data quality to avoid unreliable results.

Related Session: Data Infrastructure through the Lens of Scale, Performance and Usability: Ryan Boyd, Co-founder of MotherDuck

Foundation Models

Foundation models are game-changers in AI. Trained on massive, diverse data, they’re like super-powered, adaptable AI tools. Unlike single-use models, they can be fine-tuned for many tasks, from language tasks to image generation. Their power is pushing the boundaries of what AI can do.

Related Session: From Research to the Enterprise: Leveraging Foundation Models for Enhanced ETL, Analytics, and Deployment: Ines Chami, Co-founder and Chief Scientist at NUMBERS STATION AI

Data Contracts

A data contract is like a handshake for data exchange. It clarifies between provider and consumer: what the data looks like (format), what it means (definitions), how good it is (quality), and how it’s delivered (frequency, access). It ensures everyone speaks the same data language.
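
To make the idea concrete, here is a minimal sketch of a data contract expressed in plain Python. It is not taken from the related session or from any particular open source tool; the `FieldSpec` and `DataContract` classes, the `orders` dataset, and its field names are all hypothetical. The sketch covers format (field names and types), definitions (descriptions), one simple quality rule (nullability), and delivery frequency.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str                # format: the field's name
    dtype: type              # format: the expected Python type
    description: str         # definition: what the field means
    nullable: bool = False   # quality: whether missing values are allowed


@dataclass(frozen=True)
class DataContract:
    dataset: str
    fields: Tuple[FieldSpec, ...]
    delivery: str            # delivery: how often the data arrives, e.g. "daily"

    def validate(self, record: dict) -> List[str]:
        """Return the list of violations for one record (empty if it conforms)."""
        errors = []
        for spec in self.fields:
            if spec.name not in record:
                errors.append(f"missing field: {spec.name}")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    errors.append(f"null not allowed: {spec.name}")
            elif not isinstance(value, spec.dtype):
                errors.append(
                    f"wrong type for {spec.name}: expected {spec.dtype.__name__}"
                )
        return errors


# Hypothetical contract agreed between the producer and consumers of "orders".
orders_contract = DataContract(
    dataset="orders",
    fields=(
        FieldSpec("order_id", str, "Unique identifier for the order"),
        FieldSpec("amount_usd", float, "Order total in US dollars"),
        FieldSpec("coupon", str, "Coupon code applied, if any", nullable=True),
    ),
    delivery="daily",
)
```

A producer could call `orders_contract.validate(record)` before publishing each record, and a consumer could run the same check on arrival, so both sides test against the same written agreement.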

Related Session: Building Data Contracts with Open Source Tools: Jean-Georges Perrin, CIO at AbeaData

Semantic Layers

A semantic layer simplifies data analysis by translating complex data structures into business terms and presenting a unified view from various sources. This empowers users and fosters data-driven decisions.
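
As a rough illustration of that translation step, the sketch below maps business terms to physical column expressions and builds a SQL string from them. It is an assumption-laden toy, not how AtScale or any specific product works; the table names (`fct_ord`, `dim_cust`), column names, and the `build_query` helper are all hypothetical.

```python
# Hypothetical semantic model: business terms on the left,
# physical SQL expressions on the right.
SEMANTIC_MODEL = {
    "metrics": {
        "revenue": "SUM(f.amt_usd)",
        "order count": "COUNT(DISTINCT f.ord_id)",
    },
    "dimensions": {
        "region": "d.rgn_nm",
        "order month": "DATE_TRUNC('month', f.ord_ts)",
    },
    "from": "fct_ord f JOIN dim_cust d ON f.cust_id = d.cust_id",
}


def build_query(metric: str, dimension: str, model: dict = SEMANTIC_MODEL) -> str:
    """Translate a (metric, dimension) pair in business terms into SQL text."""
    metric_sql = model["metrics"][metric]      # KeyError if the term is unknown
    dim_sql = model["dimensions"][dimension]
    return (
        f'SELECT {dim_sql} AS "{dimension}", {metric_sql} AS "{metric}" '
        f'FROM {model["from"]} '
        f"GROUP BY {dim_sql}"
    )


# A user asks for "revenue" by "region" without knowing the underlying schema.
query = build_query("revenue", "region")
```

The point of the design is that the cryptic physical names live in one place, so every dashboard or tool that asks for "revenue" gets the same definition.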

Related Session: The Value of A Semantic Layer for GenAI: Jeff Curran, Senior Data Scientist at AtScale

Unstructured Data

Unstructured data is information that doesn’t fit neatly into a pre-defined format like a spreadsheet. Picture a big pile of documents, emails, videos, and social media posts. While valuable, this data can be messy and difficult for computers to analyze directly.

Related Session: Unlocking the Unstructured with Generative AI: Trends, Models, and Future Directions: Jay Mishra, Chief Operating Officer at Astera

Monolithic Architecture

In software development, a monolithic architecture is a traditional approach where the entire application is built as a single, self-contained unit. Imagine a massive, monolithic rock – everything is tightly coupled and inseparable. This includes the user interface (what you see and interact with), the business logic (the core functionalities), and the data storage (where information is kept).

Related Session: Data Pipeline Architecture – Stop Building Monoliths: Elliott Cordo, Founder, Architect, and Builder at Datafutures

Experimentation Platforms

An experimentation platform is a tool for running A/B tests on websites, apps, or marketing campaigns. You create variations of what you want to test (e.g., new layout, pricing), and the platform shows them to different users, analyzes results, and tells you which variation works best. It helps make data-driven decisions and improve product performance.
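
Two of the core pieces described above, showing variations to different users and analyzing the results, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of DoorDash's platform: the function names are hypothetical, assignment uses a hash of the user ID so the same user always sees the same variant, and the analysis is a standard pooled two-proportion z-test.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user: same inputs always give the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic for the difference in conversion rate between B and A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


# Hypothetical results: 100 of 1000 control users converted, 130 of 1000 treated.
z = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
# |z| above roughly 1.96 corresponds to p < 0.05 in a two-sided test.
significant = abs(z) > 1.96
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user who lands in the treatment group of one test is not automatically in the treatment group of every other test.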

Related Session: Experimentation Platform at DoorDash: Yixin Tang, Engineer Manager at DoorDash

Open Data Lakes

An open data lake is a data lake that prioritizes openness and flexibility. It stores data in vendor-neutral formats and uses open standards for easier access and collaboration, avoiding lock-in to specific vendors. Think of it as a public park for your data, instead of a private walled garden.

Related Session: Dive into Data: The Future of the Single Source of Truth is an Open Data Lake: Christina Taylor, Senior Staff Engineer at Catalyst Software

Data-Centric AI

Data-centric AI flips the traditional approach. Instead of prioritizing models, it focuses on high-quality data (labeling, cleaning, augmentation) to train them. This iterative cycle continuously improves data to get better AI results. Imagine building a house: using the best tools with bad materials won’t work. Data-centric AI ensures strong data is the foundation for reliable AI.

Related Session: How to Practice Data-Centric AI and Have AI Improve its Own Dataset: Jonas Mueller, Chief Scientist and Co-Founder at Cleanlab

Sign me up!

As any data engineering professional knows, the best way to stay ahead of the curve is by keeping up with the latest in all things related to data and data engineering. One of the easiest ways to do that is by joining us at ODSC’s Data Engineering Summit and ODSC East.

At the Data Engineering Summit on April 24th, co-located with ODSC East 2024, you’ll be at the forefront of all the major changes before they hit. So get your pass today, and keep yourself ahead of the curve.



ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists, with major conferences in the USA, Europe, and Asia.