Don’t get us wrong, we love and use BI tools, but we believe that business logic should live somewhere more universal and accessible.
It’s no secret that most companies use multiple downstream tools to visualize, report, analyze, and act on data. By downstream, we’re referring to tools that come after the data has been extracted, loaded, and transformed in a data warehouse. Examples of common downstream tools are Mode, Tableau, and Optimizely.
There are many other tools that might be used by business consumers or data teams. This blog post will focus on downstream business intelligence (BI) products.
Most organizations have more than one business intelligence tool
A 2020 business intelligence trends study by SAP showed that businesses on average use 3.8 different BI solutions. Let that sink in. 3.8! That’s the average, which means there are many companies that use four or more BI tools. This may not come as a surprise to some, and maybe your company has found its groove with five or more BI tools, but this statistic gives me the heebie-jeebies.
Here’s why it should scare you too (hint: it’s not because of the money)…
Better decisions, less painful process
Whether you’re a data producer like a data analyst, data scientist, analytics engineer, or data engineer—or if you’re a data consumer, like a business analyst, manager, or executive—you’ve likely suffered the pain that manifests from having multiple BI tools at your company. Storing critical business logic in the business intelligence layer, and having that logic differ between tools, is a cause of confusion and erodes trust.
Let’s start from the perspective of the data consumer…
You’re the one that’s using data to report out to Wall Street, make a crucial decision for your company’s sales motions, or make the call on how much to invest in product-led growth initiatives. You’re trying to understand why the metrics in your Mode dashboard and Tableau dashboard are showing different numbers.
You know the questions:
- “Which one is correct and up to date?”
- “Could both of them be wrong?”
- “Have these dashboards rotted entirely?”
We’ve all been there.
This can be a paralyzing experience and it obviously slows down your decision-making. When your job is to make good decisions for the business, driven by data, but you have no idea what to do next, you worry that you don’t have the information or the context you need to do your job.
Likely, your next move is to find someone who might have the context to get the right answer. Maybe you track down your Head of Data, or a data analyst or data scientist you’ve relied on in the past, and try to get them to help you find the right metric value.
Shifting to the data producer’s standpoint…
The data person probably greets you with a smile and offers to help. Internally, though, they’re annoyed that you keep asking a similar set of basic questions every few weeks (“Again?!”). They would rather focus on the exploratory analysis that’s helping fuel innovation on the product side than answer your boring questions (harsh, but true).
The data person adjusts their priorities and computes the metrics you’re looking for, but doesn’t realize that the fields they use to compute the metric are not the same fields that other data people at the company use to compute the same metric. Lovely! That means there are multiple definitions of this same-name-different-logic metric floating around in SQL and in dashboards, all siloed across the organization and likely reported out and used to make decisions (oops).
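To make that concrete, here’s a toy Python sketch (with invented field names and data) of how two teams can compute a metric with the same name but different logic, and get different numbers from the same table:

```python
# Hypothetical example: two teams define "revenue" differently
# over the same raw transaction rows.
transactions = [
    {"value": 100.0, "status": "complete", "refunded": False},
    {"value": 40.0,  "status": "complete", "refunded": True},
    {"value": 25.0,  "status": "pending",  "refunded": False},
]

def revenue_team_a(rows):
    # Team A: every completed transaction counts as revenue.
    return sum(r["value"] for r in rows if r["status"] == "complete")

def revenue_team_b(rows):
    # Team B: exclude refunded and pending transactions.
    return sum(
        r["value"]
        for r in rows
        if r["status"] == "complete" and not r["refunded"]
    )

print(revenue_team_a(transactions))  # 140.0
print(revenue_team_b(transactions))  # 100.0
```

Both dashboards say “revenue,” yet they disagree by 40%, and nothing in either tool flags the discrepancy.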
You probably get the point now. No matter what side of the data you’re on, this rigmarole sucks. Everyone loses time to work on valuable things, dashboards are not trustworthy, and no one will ever feel confident in the data, perpetuating the same cycle of computing metrics in a siloed way (i.e., “I’ll just ask this person who has helped me before”).
When you experience these situations, there’s no easy or scalable way to be confident in your data. Ensuring that business logic stays perfectly consistent across every tool a business uses would require an enormous amount of manual overhead and resources.
And this is exactly why *megaphone please*—logic to define key metrics for your business should NOT live in BI Tools!
Avoiding dashboard sprawl
It is unrealistic to assume that a given BI product that a team or company is using will be the end-all-be-all for consuming and analyzing data. There will always be another tool. As such, BI tools are not the right layer to capture the important logic that goes into defining metrics, because companies often use many of them. This is all in addition to the other downstream tools that analysts and other consumers might use, like experimentation or CRM tools.
Because there’s no good way to maintain the definitions across all these tools, companies end up with dashboard sprawl—multiple dashboards representing similar things—where no one is confident about which one is actually “true.” This puts business teams in the paralyzing positions demonstrated above, and leads to wasted time for both data consumers and data producers seeking out the “right” answer.
We propose that the data stack needs a metric store that is independent of any BI tool to capture business logic and make downstream consumption consistent. Transform is the layer of consistency in the stack that allows business users to assert their metrics in code, allowing these critical metrics to be accurate in all downstream tools regardless of what’s used by an organization or team. Maybe your analytics team likes Mode and your customer success team likes Tableau—it doesn’t matter! When metric logic is defined ahead of all the tools data consumers use, you eliminate dashboard sprawl, metric inconsistency, and a lack of trust in data.
How to define metric logic in Transform’s metrics store
Transform solves this problem because it allows you to define your metrics in a single place, use those same definitions in every downstream tool, and version control them as you would any other software product. Transform also asserts ownership and approval over metrics, so there’s an accountable team or person for ensuring each metric is correct and usable.
At a high level, this is where Transform sits in the data stack:
To define your data sources and metrics, Transform’s Metric Framework has a concise YAML format. Let’s work through a revenue example: say you want to define revenue from a table that contains a transaction value, a transaction count, and the product line that the transaction came from.
First, define the data source that includes your measures in the table (like transaction value), and dimensions (like product line). This data source will serve as a building block to metrics that will be created on top of it. You do not need to define this data anywhere else and the logic of the data model now lives in code and cannot be changed without approval.
```yaml
data_source:
  name: transactions
  description: Transaction table
  owners:
    - firstname.lastname@example.org
  sql_table: finance.txns
  identifiers:
    - name: transaction_id
      type: primary
  measures:
    - name: transaction_value
      description: The total USD value of the transaction.
      agg: sum
    - name: transactions
      description: The total number of transactions.
      agg: sum
      expr: 1
  dimensions:
    - name: txn_date
      type: time
    - name: product_line
      type: categorical
```
That’s your data source—and you can build one or many metrics on top of it. Now, we can define the revenue metric! Again, once it’s defined here, there’s no need to define it anywhere else, and you can use Transform’s Metric APIs to move this data into whichever downstream tools you please.
Now that you have your data source, here’s how you would define revenue, calculated as an expression of transaction value times the number of transactions.
```yaml
metric:
  name: revenue
  description: Revenue from all products
  owners:
    - email@example.com
  type: expr
  type_params:
    expr: transaction_value * transactions
    measures:
      - transaction_value
      - transactions
```
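As a rough sketch (not Transform’s actual implementation), you can think of an `expr` metric as aggregating each measure first, then applying the expression to the aggregated results:

```python
# Illustrative only: evaluate the expr metric above by hand.
# Field names and data are invented for this sketch.
rows = [
    {"transaction_value": 100.0, "product_line": "widgets"},
    {"transaction_value": 40.0,  "product_line": "gadgets"},
]

# Each measure aggregates with agg: sum.
# The `transactions` measure sums the constant expr: 1, i.e. a row count.
transaction_value = sum(r["transaction_value"] for r in rows)  # 140.0
transactions = sum(1 for _ in rows)                            # 2

# type: expr applies the expression to the aggregated measures.
revenue = transaction_value * transactions
print(revenue)  # 280.0
```

Because the aggregation and the expression both live in the metric definition, every downstream tool that requests `revenue` gets the same number.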
Transform can help mitigate the pain of dashboard sprawl whether you’re producing or consuming data. By streamlining metric definitions and letting business users get context on metrics in a self-serve manner, Transform reduces the inaccurate decisions caused by inconsistent metrics and cuts the time data teams spend fielding repetitive, mundane requests—increasing productivity and focusing your teams on higher-value initiatives.
Defining logic in a single place is the best way to ensure accurate metrics across your organization. It also means that you are making the best decisions for your business, regardless of which tools your company uses for visualization, reporting, or analysis.
We know how important it is for our customers to be able to integrate with all of their BI tools. That’s why we’re busy building out integrations (we recently launched our first-class Mode integration). Follow our blog to keep up with new releases!