Editor’s note: Jonathan will be presenting on our Ai+ Training platform in January. Be sure to check out the session, “Data Governance Essentials,” on January 26th.
Organizations of all types are creating, storing, and managing increasing volumes of data. If that data is to be well governed, well managed, and easily discoverable, an organization must know, for example, what data it has, how that data is being used, whether it's secure, and where it resides.
Answering these questions requires an organized inventory of all enterprise data, along with access to that inventory for the right staff.
Today this data inventory has matured into what is known as a data catalog.
A data catalog provides an enterprise-wide view of all data. Beyond serving as an inventory that efficiently lists all of an organization's data, a data catalog adds substantial value by storing metadata about each dataset. Metadata is, simply put, data about data.
Types of Metadata in a Data Catalog
What kind of metadata is stored in a data catalog? It falls into three categories.
The first is technical metadata. This is data about the design of a dataset, such as tables, columns, file names, and other documentation related to the source system.
The second is business metadata. This is organizational information about a dataset, such as its business description, how it's used, its relevance, an assessment of its quality, and who its users are.
Finally, there is operational metadata. This includes metadata such as when the data was last accessed, who accessed it, and when it was last backed up.
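To make the three categories concrete, here is a minimal Python sketch of how a single catalog entry might group them. The class and field names (CatalogEntry, TechnicalMetadata, quality_score, record_access, and so on) and the sample dataset are hypothetical illustrations, not the schema of any particular data catalog product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TechnicalMetadata:
    # Design of the dataset: source system, table, and columns.
    source_system: str
    table_name: str
    columns: list[str]


@dataclass
class BusinessMetadata:
    # Organizational context: meaning, usage, quality, and users.
    description: str
    usage: str
    quality_score: float  # hypothetical 0.0-1.0 data-quality assessment
    users: list[str] = field(default_factory=list)


@dataclass
class OperationalMetadata:
    # Runtime facts: last access, who accessed it, last backup.
    last_accessed: Optional[datetime] = None
    last_accessed_by: Optional[str] = None
    last_backup: Optional[datetime] = None


@dataclass
class CatalogEntry:
    # One hypothetical catalog record combining all three categories.
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata = field(default_factory=OperationalMetadata)

    def record_access(self, user: str, when: datetime) -> None:
        # Keep operational metadata current each time the dataset is read.
        self.operational.last_accessed = when
        self.operational.last_accessed_by = user


# Usage with made-up sample values.
entry = CatalogEntry(
    name="customer_orders",
    technical=TechnicalMetadata("crm_db", "orders", ["order_id", "customer_id", "total"]),
    business=BusinessMetadata("Customer orders from the CRM", "Revenue reporting", 0.9, ["finance"]),
)
entry.record_access("analyst_a", datetime(2023, 1, 5, 9, 30))
print(entry.operational.last_accessed_by)  # analyst_a
```

Keeping each category in its own structure mirrors the way the categories are usually owned by different people: engineers maintain technical metadata, data stewards maintain business metadata, and operational metadata is typically updated automatically.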
Access to this data catalog and its metadata provides many advantages to an organization. Assuming all worthwhile data is captured and documented in detail, finding data becomes much easier. Most of us know from experience how much time and effort can be wasted trying to find the right dataset.
Because a data catalog tells us what data we have, it can also help us determine what data we're missing. It also makes data reuse possible. It's always frustrating when organizations recreate datasets that already exist. In addition, duplicate datasets can inadvertently create integrity issues because they lower confidence in knowing which dataset is current and relevant.
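As a rough illustration of why documented metadata makes data easier to find, the following Python sketch runs a simple keyword search over catalog entries. The Dataset class, the search_catalog function, and the sample entries are hypothetical; real catalog tools offer far richer search, but the principle is the same: the descriptions and tags stored as metadata are what make a dataset findable.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical minimal catalog record: a name plus business metadata.
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


def search_catalog(catalog: list[Dataset], query: str) -> list[str]:
    """Return names of datasets whose name, description, or tags contain every query term."""
    terms = query.lower().split()
    results = []
    for ds in catalog:
        # Combine all searchable metadata into one lowercase string.
        haystack = " ".join([ds.name, ds.description, *ds.tags]).lower()
        if all(term in haystack for term in terms):
            results.append(ds.name)
    return results


catalog = [
    Dataset("customer_orders", "Orders placed by customers", ["sales", "finance"]),
    Dataset("web_sessions", "Website visit logs", ["marketing"]),
    Dataset("customer_profiles", "Contact details for customers", ["crm"]),
]

print(search_catalog(catalog, "customer"))        # ['customer_orders', 'customer_profiles']
print(search_catalog(catalog, "customer sales"))  # ['customer_orders']
```

A dataset with an empty description and no tags would only match on its name, which is one concrete reason the "documented in detail" assumption matters.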
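Because a catalog records the structure of every dataset, it can also surface likely duplicates. The Python sketch below is one simple, assumed heuristic: it flags datasets whose column names are identical after normalizing case and whitespace. The function name find_possible_duplicates and the sample schemas are hypothetical, and matching columns only suggests a duplicate; a data steward would still need to confirm it.

```python
from collections import defaultdict


def find_possible_duplicates(schemas: dict[str, list[str]]) -> list[list[str]]:
    """Group dataset names whose column sets are identical (order-insensitive)."""
    groups = defaultdict(list)
    for name, columns in schemas.items():
        # Normalize column names so "Customer_ID" and "customer_id" match.
        key = frozenset(col.strip().lower() for col in columns)
        groups[key].append(name)
    # Only groups containing more than one dataset are candidate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up catalog schemas: dataset name -> column names.
schemas = {
    "sales.orders": ["order_id", "customer_id", "total"],
    "finance.orders_copy": ["Customer_ID", "Order_ID", "Total"],
    "marketing.campaigns": ["campaign_id", "channel", "budget"],
}
print(find_possible_duplicates(schemas))  # [['finance.orders_copy', 'sales.orders']]
```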
The advantages of a data catalog will benefit many stakeholders within an organization.
Benefits of a Data Catalog
Staff on the technical end of the business can use the data catalog to understand and support data needs. For example, this view greatly enhances data modeling by informing analysts about existing data, its structures, security, and quality. Data scientists can be much more effective by being able to tap into disparate data sets and build and evaluate more complex data models and reports. Cybersecurity professionals can use the data catalog to prioritize and manage their approach to information security.
On the business side, a data catalog is a powerful asset for data stewards. As those responsible for managing the life cycle of data, data stewards get a real-time view of the state of data. This includes its quality, usage, and management across systems and organizational units. It becomes another input into ensuring that maximum value is derived from data. For data stewards and owners, a data catalog can help improve operational efficiency and reduce costs.
Finally, all staff, depending on their access level, can more easily find and discover data across the enterprise.
With an understanding of the role of a well-managed data catalog, it becomes clear that the functions of both data governance and data management are greatly enhanced by its implementation and use.
About the author/Ai+ Speaker on Data Catalogs: Dr. Jonathan Reichental
Dr. Jonathan Reichental is a multiple-award-winning technology and business leader whose career has spanned both the private and public sectors. He's been a senior software engineering manager, a director of technology innovation, and has served as a chief information officer at both O'Reilly Media and the City of Palo Alto, California. Reichental is currently the founder of the advisory, investment, and education firm Human Future, and also creates online education for LinkedIn Learning. He has written two books on the future of cities: Smart Cities for Dummies and Exploring Smart Cities Activity Book for Kids.
More on the Ai+ Training session, “Data Governance Essentials”: In this masterclass, you'll be introduced to data governance, an increasingly essential approach to increasing the value of data in any organization and managing the risks associated with it. You'll discover how your organization can benefit from data governance, how it is implemented, and how it is used to manage a wide range of risks. You will learn how to measure data governance efforts to increase the probability of success. Hot career opportunities in this field will also be discussed. Join Dr. Jonathan Reichental, a renowned business and technology expert, author, and professor, as he takes you on a journey to explore the remarkable benefits of a robust data governance strategy. Success begins here.