In the article, "Should you Build BI before you clean your data?", BI champion and VP of BI Sales at Jet Reports, Jon Oesch invites users to question the paradigm of waiting till data has been "enthusiastically governed to a state of truth and reliability" before implementing a BI solution. Seems counter-intuitive, but it actually makes sense, How do you know you've got dirty data unless you're actively looking for it already?
I build tons of reports for customers that ask "What happened in my company," but I seldom encounter clients that ask me to identify transactions that are dissimilar from everyone else. Customers frequently ask for "Top 10 Item Sales", but what about reports that show Invoices that were 100% more (or less) than the average? OK fine, maybe 100% variance isn't a big deal, but what about 1000%?
If you're not worried about sales outliers, what about any Shipping Address where the Length of State is greater than two characters? Jon points out that "Nothing highlights inconsistency in your data like trying to build a dashboard against it." I was doing work for a customer where we were trying to report on sales by state. When we dropped "State" into the Pivot table, out came the mess pictured below! Despite using a powerful ERP like NAV or GP, user entry errors were clearly mucking up what should have been an easy dashboard--"I just want to see Sales grouped by State!"
In "Garbage in, garbage out: Why the quality of your data maters more than its speed", Andrea Drajewicz from SimpleBI.com presents the counterpoint to the Jon's article, "If the data you're putting in is less than high quality, you definitely won't be getting high quality insights at the other end." I know that 97206 is in OR and Oregeon is a misspelling, but how are we ever going to know Total Sales by State without a thorough data scrubbing exercise?
With increasingly sophisticated reporting software and an infinitude of self-service BI options, we're getting ever-nearer to the holy grail of real-time answers to virtually any business question you can think of! We're promised "one version of the truth", but as we look at our Dashboards full of "UNKNOWNs", or states that clearly aren't states, we have to decide, HOW are we going to handle the errata in our data? There definitely needs to be a data cleanup process, but what should that look like, and do you have access to resources who can help you with that?
When it comes to handling dirty data, Andrea recommends four options:
I think of the last two options as required steps in part of a BI implementation. Yes we get excited about the fancy dashboards, but please go back and make sure the numbers are right!
I recently was engaged to work on a 'customer rescue' where the customer said, "we can't use our BI product because it keeps spitting out the wrong number!" "What's the definition of wrong?" I asked. The quip response might sound cheeky but that's the first step of a data audit.
Grab a report and answer:
Answering the last question usually involves reverse engineering the BI solution to see how we got from your ERP to the number you see in the Pivot Table. Many times, the problem is just realizing that our definition and your definition of "Expected Run Time" are different! Or maybe we just needed to add a filter for "Company" to differentiate between our many companies.
Data Audits are not a one person job! One of the biggest mistakes businesses make when they implement a BI project is to assume that the product is going to work straight out of the box without any customization. When Excel drops a number or list of States onto your spreadsheet and the number is 'wrong', this is not a problem, this is an opportunity! "What are the components of those numbers?" ,"What were the transactions that placed those numbers into the system." Ideally your BI solution allows a savvy user to go under the hood and identify the fields, calculations, transformations and filters that were implemented to come up with a number.
Ex. When you're doing an analysis of Sales, Inventory or Manufacturing, is Item Cost = Cost Amount (Actual) or should it be Cost Amount (Actual) + Cost Amount (Expected)?
At the end of the day Andrea and Jon are both right -- no matter how fancy the tool is, if you throw garbage into a reporting system, you'll get garbage out, but I wholeheartedly agree with Jon. There IS "magic in seeing those imperfect numbers in an analysis" because it gives you a place to start!
Ugly dashboards are an opportunity to identify, audit and sanitize dirty data! Maybe it's a tweak in how a measure is calculated in your reports, but if it's a data entry issue, you have a world of opportunity! You can implement new business procedures, or leverage automated Jet Essentials reports to audit the outliers for you! And let's face it, who doesn't sleep better at night knowing that you're actually capturing and recording cleaner data?
Jae Wilson is a freelance Jet Reports and BI consultant at OnyxReporting.com. Visit our website to schedule your free 30 minute introductory and fact-finding consultation.