Sebastian Karrer
The best way to a good database: Manage Data as Products
Data analytics programs place high demands on the database. Data must be available in sufficient quality and quantity and it must be the right data. However, this is often not the case in a company when the analytics initiative starts. In this situation, companies should be inspired by their own product management or successful start-ups: data are products for very specific users and should also be managed as such.

It's a recurring pattern that you may have already experienced yourself: the first data analytics use cases take longer to complete and cost more than planned — even though large investments and efforts were made in advance to have an extensive database available at the start of the project.

Is this normal at the beginning of an analytics program, even though so much preparatory work is done beforehand? Will the next use cases become more efficient and faster? What is needed to ensure the next batch of problems is not just around the corner as the analytics program scales?

There is no panacea for companies at this point. Every business is different, especially when it comes to the goals they set for their analytics program. The goals of the analytics program in turn determine which methods are used and what the requirements for the database are in terms of the required amount of data, the information content of the data and metadata, tolerances, throughput of data pipelines and much more.

So if there is no cure-all, is there at least a proven procedure that companies can use to raise the data basis to the level required for a professional analytics program? Practice shows that the development of a reliable database is indeed possible with a limited investment in a relatively short period of time. There are proven approaches that have helped analytics programs achieve resounding success.

The challenges of an inadequate database

To once again emphasize the importance of a solid database, let's start with the example of a large automobile manufacturer. The company had set itself the goal of reducing the costs of assembly corrections in production. From this goal, the task of predicting dimensional accuracy in the production of certain parts was derived. The insights gained would then flow back into the manufacturing process and render corrective work redundant.

When the analytics team dealt with this use case, it quickly became apparent that 80 to 90 percent of all data points taken during the production of a car were not quality-assured. They simply hadn't been needed before. Measurements had not been taken at the defined positions, and the metadata — the information describing what a data set contains — was also incorrect. Matching measurements from the body shop to measurements from the finished vehicle was difficult because different IDs were used for vehicle assignment. So the amount of available data was huge, but any model trained on that data would have suffered from poor prediction quality. At the same time, the cost of maintaining the database was enormous, especially considering that only a fraction of the stored data series — and only the data points from 15-20 production cycles within those series — were eventually used for the project.

The team used a very quick approach to identify and understand the problems in the datasets:

  1. Consistency check of the existing data
  2. Examination of the data with a simple model
  3. Validation of the suspected fundamental relationships
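The three checks described above can be sketched in a few lines of code. The data and tolerance values below are purely illustrative assumptions, not the manufacturer's actual pipeline: a consistency check over measurement records, then a deliberately simple least-squares model to validate the suspected relationship between body-shop and finished-vehicle deviations.

```python
# Step 1: consistency check -- are values complete, taken at the defined
# position, and within a plausible tolerance? (All data here is simulated.)
measurements = [
    {"vehicle_id": "V1", "position": "A", "deviation_mm": 0.4},
    {"vehicle_id": "V2", "position": "A", "deviation_mm": 0.6},
    {"vehicle_id": "V3", "position": "A", "deviation_mm": None},   # missing value
    {"vehicle_id": "V4", "position": "B", "deviation_mm": 12.0},   # wrong position
]

def consistency_check(rows, expected_position="A", max_abs_mm=5.0):
    """Flag rows with missing values, wrong measurement position, or implausible magnitude."""
    issues = []
    for row in rows:
        if row["deviation_mm"] is None:
            issues.append((row["vehicle_id"], "missing value"))
        elif row["position"] != expected_position:
            issues.append((row["vehicle_id"], "unexpected position"))
        elif abs(row["deviation_mm"]) > max_abs_mm:
            issues.append((row["vehicle_id"], "out of tolerance"))
    return issues

print(consistency_check(measurements))

# Steps 2 and 3: fit a simple least-squares line on clean measurement pairs
# and check whether the suspected relationship (body-shop deviation predicts
# final-vehicle deviation) actually holds in the data.
body_shop = [0.2, 0.4, 0.5, 0.7, 0.9]
final_car = [0.25, 0.45, 0.55, 0.75, 0.95]

n = len(body_shop)
mean_x = sum(body_shop) / n
mean_y = sum(final_car) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(body_shop, final_car)) / \
        sum((x - mean_x) ** 2 for x in body_shop)
intercept = mean_y - slope * mean_x
print(f"simple model: final = {slope:.2f} * body_shop + {intercept:.2f}")
```

A slope close to 1 with a small intercept would support the suspected relationship; large residuals, by contrast, would point at further data problems worth investigating before any complex model is trained.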

This process led to a good understanding of the data and provided exactly the insights needed to improve the database in a targeted manner. Once that was done, after a few weeks, more complex models could be used, which then delivered better-than-expected prediction quality.

Applying the most important insights from this case, what is the right way for companies to go about building a data basis so that such problems are prevented, rather than solved during a project?

What is a data product?

We recommend that companies approach the development of their database as if they were launching a new product on the market. In this case, data or information is the product and the target market is internal departments and processes that generate added value with the data.

[Figure: Analytics Project Scheme]

Similar to a multi-stage production process, data products can be differentiated by different “degrees of vertical integration” or “value depths”.

The added value of good data products is that they are fast, secure and easy to use. They make it possible to get the desired information much more precisely and to complete projects up to 10x faster. On the user side, many errors are prevented because the complexity of using the data is minimal. Quality assurance and the qualitative improvement of the data basis are anchored in the process.

It is just as important for the company that good data products keep the cost of maintaining the data infrastructure to a minimum, for example because the overall system architecture is simpler, the data volume is smaller, data is stored in the optimal format, and scaling of data use is provided for from the outset.

With a data product so defined, what is the right approach to building good ones?

How to manage data like a product in practice

If companies should think about data the way they think about products, can established processes from product management be transferred to the building of a database?

The simple answer is "yes", although these are of course products of a different nature than, for example, physical goods or standard software. Companies should always consider the following in order to develop the right approach for their organization:

  1. The fundamental question: Which data products provide the greatest possible benefit for the analytics program?

What does the company want to achieve with the analytics program? What aspects of the core business does the program aim to improve? The starting point is again the data strategy level, which, in addition to the vision, also defines what the use cases for the data will be, e.g. whether a predictive model should control or automate decisions in real time. And that is exactly what “data products” should aim for: they should contain the necessary information and be provided in a way that the analytics program, as the “customer”, can easily access them.

In fact, regardless of the size of the budget invested, the companies that make the best progress are those that develop their database toward a goal. That goal is to use the data, usually through an analytics program, which in turn is geared toward the core business of the company. This way, a common thread runs through the initiative, and it becomes possible to make the value contribution of individual projects transparent and measurable.

A particularly important strategic aspect for the structure of the database relates to scalability. Right from the start, it is imperative to set up data infrastructures in such a way that the company can use the data as flexibly as is necessary for the analytics strategy.

  2. Database and analytics simply develop faster in coordination with one another

How many of your products and how many of your databases have a dedicated product manager?

As with a new product, the company should not only expect a learning curve when building the database, but plan for it. The question, then, is how to accelerate this learning curve.

In our experience, an agile approach works best here. The company should start with the first applications right away: the existing database can almost always be used for introductory analyses, which often already provide the first usable insights and added value. On the other hand, many inconsistencies in data sets only become apparent when the result of a model is discussed with a domain expert. The database and the analysis methods improve fastest in step with one another.

Product management, with its many methods of incorporating customer feedback into the product, shows how the development of the database and analytics program are best aligned with each other. A transparent and simple process is necessary for the analytics team to make requests to those building the data product - and a selection process that decides whether to accept or reject the requests.

Also consider the time it takes for your program to build up the necessary analytics competence and establish communication structures with the specialist departments. Analytics is a multidisciplinary matter, and it often takes time for data scientists and business departments to understand how best to work with their counterparts.

  3. Build data products that are easy to use

What has long been a mantra in sales has not yet found a foothold everywhere in the IT environment: a product, its added value and how it can be used must be easy to understand. It's no different with data products: even if the users are exclusively internal specialists, "ease of use" is still extremely important. More analytics programs fail because of difficult-to-access database systems than because of missing data.

Companies often don't even know what data they actually have. Getting an overview of which data is available in the individual business units is an enormous advantage for other departments. This knowledge enables employees to develop their own ideas or process use cases much more quickly. And in the end, it's mission critical to scaling an analytics program.

An important aspect is metadata, such as IDs or categorizations, which are still neglected in many companies. However, they are often the key to success and the basis for employees to identify opportunities for improvement or use of data sets that they did not personally develop.
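To make the point concrete, a metadata record can be as simple as a small, machine-readable structure attached to every data set. The field names below (owner, unit, categories, and so on) are illustrative assumptions, not a standard — in practice they would follow the conventions of your organization's data catalog.

```python
# A minimal sketch of a machine-readable metadata record for a data product.
# All field names and values are illustrative, not a real catalog schema.
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    dataset_id: str          # stable ID used to join with other data products
    description: str         # what information the data set contains
    owner: str               # who is responsible for quality and maintenance
    unit: str                # physical unit of the measurements
    categories: list = field(default_factory=list)  # tags for discoverability

meta = DatasetMetadata(
    dataset_id="body-shop-measurements-v2",
    description="Dimensional deviations measured at defined body-shop positions",
    owner="body-shop-quality-team",
    unit="mm",
    categories=["production", "quality", "measurement"],
)
print(meta.dataset_id, meta.unit)
```

Even this minimal record answers the questions that block reuse most often: what the data means, what units it is in, and who to ask when something looks wrong.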

Clearly defined interfaces for all users are also very important, as they make access to the data easy and quick. APIs with open, well-known and documented specifications, such as OpenAPI or AsyncAPI , are often ideal and enable data teams to start working on the data right away, rather than spending the first days of a project forging data connections.
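As a sketch of what such a contract can look like, here is a minimal OpenAPI 3 description for a hypothetical data-product endpoint, expressed as a Python dict (it could equally be written as YAML). The path and schema names are assumptions for illustration only.

```python
# A minimal, hypothetical OpenAPI 3 description of a data-product endpoint.
# Consumers read this contract instead of querying the database directly.
import json

openapi_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Body-Shop Measurements Data Product", "version": "1.0.0"},
    "paths": {
        "/measurements/{vehicle_id}": {
            "get": {
                "summary": "Quality-assured measurement series for one vehicle",
                "parameters": [{
                    "name": "vehicle_id", "in": "path", "required": True,
                    "schema": {"type": "string"},
                }],
                "responses": {
                    "200": {
                        "description": "Measurement series as JSON",
                        "content": {"application/json": {
                            "schema": {"type": "array",
                                       "items": {"type": "object"}},
                        }},
                    }
                },
            }
        }
    },
}

print(json.dumps(openapi_spec, indent=2)[:60])
```

With a published contract like this, a data team can generate clients, mock the endpoint, and start modeling on day one, instead of spending the first days of a project forging ad-hoc database connections.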

  4. Good data governance leads to smooth data maintenance and use

Data governance is a term that is often used but not always well understood. It describes the framework conditions for the use of data — the house rules for data users, so to speak. This is often needed simply to meet regulatory requirements. For example, many countries, and even different regions of the same country, have very different laws regarding the storage and use of employee or customer data. All of these must be observed by the company, and critical data must then be anonymized before it can be used.

Analytics needs high-quality data, but the business needs to stay in control of the cost of delivering that data. This, in turn, requires the organization to clearly define who is responsible for creating and maintaining a data source, and who may use the data and to what extent. Companies are well advised to also address data security transparently from the start.

Organizational issues: From data warehouse to data mesh

Thinking of data as products is already standard in highly digitized companies. But in order to bring data products into being, the company needs to staff the responsible teams.

For organizations that have a central IT and/or implement the entire analytics program with a central team, it makes sense to also manage the databases centrally. But there are other approaches that may have advantages. In any case, it is important that technology and organization fit together.

One of the best-known alternatives to a centralized database is the data mesh. In a data mesh, the development and maintenance of data products is decentralized. Databases are managed close to where the data is generated and where most of the knowledge about the processes behind the data resides. The entire responsibility for the development and operation of data products lies in the hands of the functional departments, not in a central administrative unit. A data mesh is therefore also, technically, a decentralized data architecture within which data is treated like products.

This approach enables more flexible control mechanisms and higher speed than a centrally managed initiative and aims to move the entire company to a data-driven mindset.

The central IT or analytics program planning still has an important control function: It must ensure that there is no “uncontrolled growth”, that the various data products complement each other and that common standards are observed.


Good data is the foundation of any analytics program. Yet many companies do not proceed efficiently when setting up and optimizing their database — and thus waste time and money.

The most successful companies manage their data like products. They develop data products tailored to the needs of the users — usually a wide variety of specialist departments and their own analytics program. Standards and responsibilities for the provision and use of the data products should be clearly defined and documented. This approach automatically leads to the strategic expansion of the database and to quality improvements in a very targeted manner.

Good data products reduce IT costs and increase the speed and reach of the analytics program, which can focus on what it was created for: giving the company a competitive advantage. And the most modern advances, such as a data mesh, do an even better job of enabling the analytics program to create even more value for the business. Most importantly, decentralized approaches foster a data-driven company mindset.

By the way: feel free to think outside the box. Users and providers of data are not only found in your own organization. Industry-wide or cross-company databases often have special value, as they may give a company access to information that it could not otherwise obtain from its own circle of customers or suppliers. And customers certainly also appreciate customized data products.
