Going Beyond Taxonomies and Templates
Data Quality Management – The How
In the first part of our Data Quality Management series, DQM: The Why, we noted that our DQM framework was built not because of, but despite, the rigid data templates we have in place, which already set specific filters on incoming data.
But how do we go beyond taxonomies and templates and ensure our data is actually usable? Our approach is both simple and complex: simple in how we run our DQM, and complex in what it has taken to get to where we are now. In other words, how we ensure data quality is not even close to rocket science, yet it is extremely difficult to replicate, and that makes us unique.
Our “Data Quality Factory” balances human intelligence and machinery. It takes the raw data and produces two outputs: first, an error report on the data set, and second, a separate, adjusted copy of the data set in which data has been corrected automatically where possible.
This “factory” is also designed to be “portable”. It can be tightly integrated into an official data collection workflow or run as a separate pre-scanning tool before data enters an official workflow.
As for the machinery, we have four units:
- A data quality rule builder, an in-house proprietary tool through which all our analysts enter data quality rules, subject to a four-eye approval system. What makes this simple tool complex is its content. Our analysts have developed more than 2,500 rules in six years, and that number continues to grow. These rules cover various types of checks, including simple field-value checks, duplicates and outliers, as well as inter-field and inter-submission checks.
- A data quality rule engine that takes in raw data as it arrives in our repository and performs all checks on the data.
- A data quality toolbox that our analysts use for several quality assurance activities, such as checking the results of our rule engine and building a knowledge base of why we accept or reject particular automated rule results.
- A quality ticketing system, integrated into our core repository platform, through which we communicate our data quality findings to our clients and give them the opportunity to either correct the data or provide explanations, which in turn feed our knowledge base.
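To make the rule engine idea a little more concrete, here is a minimal sketch in Python of how a set of rules could turn raw records into the two outputs described above: an error report and a separate, adjusted copy. It is an illustration only, not our production engine. The `Rule` class, the `run_rules` function, the rule IDs and the loan fields are all hypothetical names chosen for this example.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict  # one row of submitted data (hypothetical shape)


@dataclass
class Rule:
    """A single data quality rule (hypothetical structure)."""
    rule_id: str
    description: str
    check: Callable[[Record], bool]                   # True when the record passes
    fix: Optional[Callable[[Record], Record]] = None  # optional automatic correction


def run_rules(records, rules, key_field):
    """Return (error_report, adjusted_copy); the input records are left untouched."""
    report = []
    adjusted = deepcopy(records)
    seen_keys = set()
    for row, record in enumerate(adjusted):
        # Duplicate check on the identifying field.
        key = record.get(key_field)
        if key in seen_keys:
            report.append({"row": row, "rule": "DUPLICATE_KEY", "corrected": False})
        seen_keys.add(key)
        for rule in rules:
            if rule.check(record):
                continue
            corrected = False
            if rule.fix is not None:
                candidate = rule.fix(dict(record))
                # Accept the correction only if the record now passes the rule.
                if rule.check(candidate):
                    record.update(candidate)
                    corrected = True
            report.append({"row": row, "rule": rule.rule_id, "corrected": corrected})
    return report, adjusted


rules = [
    # Simple field-value check with an automatic correction.
    Rule(
        rule_id="R001",
        description="Country must be a two-letter upper-case code",
        check=lambda r: isinstance(r.get("country"), str)
        and len(r["country"]) == 2
        and r["country"].isalpha()
        and r["country"].isupper(),
        fix=lambda r: {**r, "country": str(r.get("country", "")).strip().upper()},
    ),
    # Inter-field check with no safe automatic correction.
    Rule(
        rule_id="R002",
        description="Current balance must not exceed original balance",
        check=lambda r: r["current_balance"] <= r["original_balance"],
    ),
]

loans = [
    {"loan_id": "A1", "country": " de", "original_balance": 1000.0, "current_balance": 800.0},
    {"loan_id": "A2", "country": "FR", "original_balance": 500.0, "current_balance": 650.0},
    {"loan_id": "A1", "country": "DE", "original_balance": 1000.0, "current_balance": 800.0},
]

report, adjusted = run_rules(loans, rules, key_field="loan_id")
for finding in report:
    print(finding)
```

In this sketch, findings marked as not corrected are the ones an analyst would review and, where appropriate, raise with the client through a ticket. The original submission is never modified, so the adjusted copy can always be compared against it.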
As for the human intelligence, I would call that our secret recipe! Our team is extremely skilled in the structured finance space and has a broad spectrum of experience across the industry. We have people who are experienced in credit ratings, data research, statistics and ABS issuance. Perhaps the most important quality across our team is the zeal to learn quickly, improve our knowledge base and share learnings among peers. Not only do our analysts check the rule results from our automated processes, but they also study the data in detail, compare it with other documents such as prospectuses and investor reports, and assess the data from a usability perspective by simulating what data users would do to see if it all makes sense. All of this is channelled through our toolbox to our ticketing system, establishing a seamlessly integrated blend of technology, process and knowledge.
Well, that’s the how!
“A rule engine producing quality checks that are fed into a ticketing system after some manual scans. That’s it?”, one might wonder. Exactly! It really is that simple. And yet, it is extremely difficult to replicate. That makes us unique!
————-
Gopal
