Data Completeness – Another Foundational Element of Data Quality

I’ve shared thoughts on data consistency, and next it’s time to talk about a related topic: data completeness. As a Product Leader, it’s important to put a plan in place to increase the completeness of data over time. Data completeness means that every necessary piece of information is present in the dataset. Outline which data is most essential and what can be missing without compromising the usefulness and trustworthiness of your dataset, and define this at the beginning of your project.
Your customer master provides a great example of the importance of data completeness. When assessing your business opportunity, having complete customer information (enterprise business hierarchies, sales volume, market focus, etc.) for each enterprise ensures that analysis accurately reflects the diverse characteristics of each segment your solutions serve and allows for better product feature targeting. For instance, if the actual parent/child relationships of business partnerships are not understood, conclusions drawn about price and sales connect rates might not be applicable. If a CRM unintentionally duplicates customer records, or fails to identify which sub-companies are part of an enterprise, the number of target customers can be over- or understated.
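To make the over-counting risk concrete, here is a minimal sketch in Python. The record layout (an id, a company name, and an optional parent id), the sample companies, and the name-normalization rule are all assumptions made for illustration; a real customer master would need far more careful matching.

```python
# Hypothetical customer master records: (record_id, company_name, parent_record_id).
# parent_record_id is None for a top-level enterprise.
RECORDS = [
    ("c1", "Acme Corp", None),
    ("c2", "Acme Logistics", "c1"),   # sub-company of Acme Corp
    ("c3", "Acme Retail", "c1"),      # sub-company of Acme Corp
    ("c4", "Globex", None),
    ("c5", "globex ", None),          # unintended duplicate of c4
]


def normalize(name):
    """Very rough name normalization (an assumption): lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def count_enterprises(records):
    """Count distinct top-level enterprises after rolling up children and duplicates."""
    parent_of = {rid: parent for rid, _, parent in records}
    name_of = {rid: normalize(name) for rid, name, _ in records}

    def ultimate_parent(rid):
        # Walk up the hierarchy; stop at the top, at an unknown parent, or on a cycle.
        seen = {rid}
        while True:
            parent = parent_of[rid]
            if parent is None or parent not in parent_of or parent in seen:
                return rid
            seen.add(parent)
            rid = parent

    return len({name_of[ultimate_parent(rid)] for rid in parent_of})


naive_count = len(RECORDS)                  # every record treated as a customer
rolled_up_count = count_enterprises(RECORDS)  # enterprises after roll-up
```

With this sample data, counting raw records reports five target customers, while rolling records up to their top-level parent and collapsing the duplicate leaves two.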
There are multiple approaches to consider for achieving data completeness, ranging from attribute-level and record-level checks to data sampling and data profiling, which surfaces metadata about your data. You can use profiling to analyze patterns, distributions, and missing value frequencies within your datasets. As the scale and diversity of data ingestion grow, more automation is needed. Support your team in implementing data observability capabilities that leverage historical patterns to detect incomplete data (such as when 10 million rows turn into 1 million during a cloud migration) and establish alerts based on rules and thresholds. This allows the team to address issues proactively, before customers notice that data is incomplete.
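As a rough illustration of two of these ideas, the sketch below computes missing-value frequencies for required attributes and flags a sudden drop in row count against a historical baseline. The function names, the field names, and the 50% drop threshold are assumptions chosen for the example, not a recommendation for any particular observability tool.

```python
def missing_rates(rows, required_fields):
    """Return, per required field, the fraction of rows where it is absent, None, or empty."""
    total = len(rows)
    rates = {}
    for field in required_fields:
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


def volume_alert(current_count, historical_counts, max_drop=0.5):
    """Return True when the current row count is more than max_drop below the historical average."""
    if not historical_counts:
        return False  # no history yet, so nothing to compare against
    baseline = sum(historical_counts) / len(historical_counts)
    if baseline == 0:
        return False
    return current_count < baseline * (1 - max_drop)


# Hypothetical sample of customer rows with some gaps.
sample_rows = [
    {"name": "Acme", "sales_volume": 100},
    {"name": "Globex", "sales_volume": None},
    {"name": "", "sales_volume": 5},
    {"name": "Initech"},
]
rates = missing_rates(sample_rows, ["name", "sales_volume"])

# Row counts from earlier loads versus the count after a migration.
history = [10_000_000, 10_200_000, 9_800_000]
alert = volume_alert(1_000_000, history)
```

In practice the thresholds would be tuned per dataset, and a check like `volume_alert` would run on every load so that an alert fires before the incomplete data reaches customers.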
Another automation approach is to visualize how data is collected and how it moves through the system. This is called data lineage. Once implemented, data lineage helps operational and engineering teams understand and monitor upstream and downstream data dependencies. Producing data lineage requires identifying your data assets, tracking those assets starting with ingestion sources, documenting all those sources, mapping the path of data as it moves through various pipelines and transforms, and finally pinpointing where the data is being served up in reports and across portfolio products.
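One simple way to picture lineage is as a directed graph of data assets, from ingestion sources through pipelines to the reports that serve the data. The sketch below assumes a hand-maintained list of edges with made-up asset names; real lineage tooling typically derives these edges from pipeline metadata rather than a hard-coded list.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream_asset, downstream_asset).
EDGES = [
    ("crm_export", "raw_customers"),
    ("billing_feed", "raw_invoices"),
    ("raw_customers", "customer_master"),
    ("raw_invoices", "customer_master"),
    ("customer_master", "segment_report"),
    ("customer_master", "pricing_dashboard"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True points each asset at its upstream sources."""
    graph = defaultdict(set)
    for up, down in edges:
        if reverse:
            graph[down].add(up)
        else:
            graph[up].add(down)
    return graph


def reachable(graph, start):
    """Breadth-first search: every asset reachable from start by following graph edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def downstream(edges, asset):
    """Assets that would be affected if `asset` were incomplete."""
    return reachable(build_graph(edges), asset)


def upstream(edges, asset):
    """Assets that `asset` ultimately depends on."""
    return reachable(build_graph(edges, reverse=True), asset)


affected = downstream(EDGES, "raw_customers")
sources = upstream(EDGES, "segment_report")
```

Here `downstream` answers "what breaks if this asset is incomplete?" and `upstream` answers "where could the missing data have come from?", which are the two questions operational and engineering teams usually ask of a lineage map.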
As a Product Leader, it’s easy to be enticed into focusing on product features and functions while overlooking the importance of setting up the team to support ongoing growth and success. Just as there are minimum viable product features for product users, there are minimum viable features for the architecture. Ensure you leave enough engineering bandwidth in your development plan to build a foundation of data quality that can be extended and enhanced as your data offerings are adopted. You’ll be most effective if you lay out a long-term product vision that demands the rigor of enterprise-class data quality from the get-go. Another benefit is that your engineers will be excited, too!