Do your customers value data quality? This is a critical question, because there are often low-cost sources of dirty data that provide neither accuracy nor extensive coverage. When building a premium solution, it’s important to begin with a plan to deliver high-quality data that has been normalized across the many data sources available. In fact, the variance between data sources is one of the barriers to entry for competitors who might want to offer their own data solution. Tapping data sources via API, rather than cutting corners with less rigorous methods, helps keep the data accurate as trend data changes over time.
Each data source has its own nomenclature and its own methodology for handling changes, so it’s essential to start with an understanding of how each source is structured, what is included and excluded, and how modifications to the data are shared. In the case of sales data, how are cancellations and returns tracked in relation to the original order? For market data, how does each ERP or eCommerce solution handle sales and listings? Skipping the documentation of these details ultimately results in errors down the road, so ensure that the state of the data is normalized across all of those sources to reduce customer questions in the future. Another challenge is that inconsistencies can occur when working with a wide range of sources for similar data. Those differences can be as simple as varied formats, misleading naming conventions or variant spellings. A focus on quality in this area means continuously resolving inconsistencies and committing to invest in dataset profiling and adaptive rules that learn from the data itself.
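To make the normalization step concrete, here is a minimal Python sketch of mapping sales records from two sources onto one shared schema. The source names (`source_a`, `source_b`), the field names and the status vocabulary are all hypothetical, chosen only for illustration; a real ERP or eCommerce API will have its own structure, which is exactly what needs documenting up front.

```python
from dataclasses import dataclass

# Hypothetical field mappings: shared field name -> field name in each source.
FIELD_MAPS = {
    "source_a": {"order_id": "id", "amount": "total", "status": "state"},
    "source_b": {"order_id": "OrderNumber", "amount": "Amount", "status": "Status"},
}

# Hypothetical mapping of each source's status wording onto one shared vocabulary.
STATUS_MAP = {
    "complete": "completed",
    "completed": "completed",
    "shipped": "completed",
    "canceled": "cancelled",
    "cancelled": "cancelled",
    "returned": "returned",
    "refund": "returned",
}


@dataclass(frozen=True)
class Order:
    source: str
    order_id: str
    amount: float
    status: str


def normalize(source, raw):
    """Convert one raw record from a known source into the shared Order shape."""
    fields = FIELD_MAPS[source]
    raw_status = str(raw[fields["status"]]).strip().lower()
    if raw_status not in STATUS_MAP:
        # Fail loudly so an undocumented status is investigated, not guessed at.
        raise ValueError(f"unmapped status {raw_status!r} from {source}")
    return Order(
        source=source,
        order_id=str(raw[fields["order_id"]]).strip(),
        amount=float(raw[fields["amount"]]),
        status=STATUS_MAP[raw_status],
    )


def net_sales(orders):
    """Sum only completed orders, leaving out cancellations and returns."""
    return sum(order.amount for order in orders if order.status == "completed")
```

Raising an error on an unmapped status is a deliberate choice in this sketch: a new value from a source is treated as a documentation gap to resolve rather than something to pass through silently.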
In a product portfolio, each product needs to be able to track a customer’s behavior with regard to the key functionality of that individual product. Deduplicating records across the portfolio is especially challenging in a merger and acquisition scenario, where the same customer may already exist in several separately built systems. Deduplication enables a bigger picture to emerge that can empower more effective marketing initiatives, as well as provide greater potential for valuable insights for the customers themselves.
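As a rough illustration of deduplication across a portfolio, the Python sketch below merges customer records from several products into one entry per customer. It assumes, purely for the example, that a trimmed, lower-cased email address identifies a customer and that each record carries hypothetical `email`, `name` and `product` fields. Real portfolios often need fuzzier matching on names, addresses or account identifiers.

```python
def match_key(record):
    """Build the key used to decide that two records are the same customer.

    Assumption for this sketch: the email address, trimmed and lower-cased,
    is a good enough identifier.
    """
    return record["email"].strip().lower()


def deduplicate(records):
    """Merge records that share a match key into one entry per customer."""
    merged = {}
    for record in records:
        key = match_key(record)
        entry = merged.setdefault(
            key, {"email": key, "names": set(), "products": set()}
        )
        # Collapse repeated whitespace so "Ann  Lee" and "Ann Lee" agree.
        entry["names"].add(" ".join(record["name"].split()))
        entry["products"].add(record["product"])
    return merged
```

Each merged entry keeps the set of products the customer uses, which is the cross-portfolio view that marketing and customer insights can build on.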
Overall, the data in a solution must be trustworthy, because business decisions are made based on it. While there are core principles to building a solution beyond data quality, such as timeliness of updates, data relevance, reliability and completeness, implementing data quality processes is a solid place to start the productization journey.