ONYX REPORTING LTD.
  • Welcome
  • Services
  • Blog
  • YouTube Channel
  • Contact
  • Domo IDEA Exchange
    • Schedule
    • Call for Presenters
  • Welcome
  • Services
  • Blog
  • YouTube Channel
  • Contact
  • Domo IDEA Exchange
    • Schedule
    • Call for Presenters
Search

Garbage In = Garbage Out? or Build BI on Dirty Data?

28/3/2015

0 Comments

 
Garbage In Garbage Out
In the article, "Should you Build BI before you clean your data?", BI champion and VP of BI Sales at Jet Reports, Jon Oesch invites users to question the paradigm of waiting till data has been "enthusiastically governed to a state of truth and reliability" before implementing a BI solution.  Seems counter-intuitive, but it actually makes sense, How do you know you've got dirty data unless you're actively looking for it already?  

I build tons of reports for customers that ask "What happened in my company," but I seldom encounter clients that ask me to identify transactions that are dissimilar from everyone else.  Customers frequently ask for "Top 10 Item Sales", but what about reports that show Invoices that were 100% more (or less) than the average?  OK fine, maybe 100% variance isn't a big deal, but what about 1000%?  

If you're not worried about sales outliers, what about any Shipping Address where the Length of State is greater than two characters?  Jon points out that "Nothing highlights inconsistency in your data like trying to build a dashboard against it."  I was doing work for a customer where we were trying to report on sales by state.  When we dropped "State" into the Pivot table, out came the mess pictured below!  Despite using a powerful ERP like NAV or GP, user entry errors were clearly mucking up what should have been an easy dashboard--"I just want to see Sales grouped by State!"
Dashboard Problems
We've got a data entry problem!
  In "Garbage in, garbage out: Why the quality of your data maters more than its speed", Andrea Drajewicz from SimpleBI.com presents the counterpoint to the Jon's article, "If the data you're putting in is less than high quality, you definitely won't be getting high quality insights at the other end."  I know that 97206 is in OR and Oregeon is a misspelling, but how are we ever going to know Total Sales by State without a thorough data scrubbing exercise?

With increasingly sophisticated reporting software and an infinitude of self-service BI options, we're getting ever-nearer to the holy grail of real-time answers to virtually any business question you can think of!  We're promised "one version of the truth", but as we look at our Dashboards full of "UNKNOWNs", or states that clearly aren't states, we have to decide, HOW are we going to handle the errata in our data?  There definitely needs to be a data cleanup process, but what should that look like, and do you have access to resources who can help you with that?

When it comes to handling dirty data, Andrea recommends four options:
  • If it's a one-off problem, make an exception.
  • Fix the data after it gets to you - (Some users might clean up messy data using a blend of fancy Excel know how and Jet Essentials wizardry, but can you imagine all the nasty VLOOKUP's you'd need to substitute the correct spellings for Oregon?)
  • Fix the data at the source
  • Have a Data Audit.

I think of the last two options as required steps in part of a BI implementation.  Yes we get excited about the fancy dashboards, but please go back and make sure the numbers are right!

 I recently was engaged to work on a 'customer rescue' where the customer said, "we can't use our BI product because it keeps spitting out the wrong number!"  "What's the definition of wrong?" I asked.  The quip response might sound cheeky but that's the first step of a data audit.  

Grab a report and answer:  
  • What does your BI product say?
  • What does your ERP say? 
  • Why do they differ?  

Answering the last question usually involves reverse engineering the BI solution to see how we got from your ERP to the number you see in the Pivot Table.  Many times, the problem is just realizing that our definition and your definition of "Expected Run Time" are different!  Or maybe we just needed to add a filter for "Company" to differentiate between our many companies.


Data Audits are not a one person job!  One of the biggest mistakes businesses make when they implement a BI project is to assume that the product is going to work straight out of the box without any customization.  When Excel drops a number or list of States onto your spreadsheet and the number is 'wrong', this is not a problem, this is an opportunity!  "What are the components of those numbers?" ,"What were the transactions that placed those numbers into the system."  Ideally your BI solution allows a savvy user to go under the hood and identify the fields, calculations, transformations and filters that were implemented to come up with a number.


Ex. When you're doing an analysis of Sales, Inventory or Manufacturing, is Item Cost = Cost Amount (Actual) or should it be Cost Amount (Actual) + Cost Amount (Expected)?
DataMovement Jet Data Manager
How do you calculate Cost?

At the end of the day Andrea and Jon are both right -- no matter how fancy the tool is, if you throw garbage into a reporting system, you'll get garbage out, but I wholeheartedly agree with Jon.  There IS "magic in seeing those imperfect numbers in an analysis" because it gives you a place to start!  


Ugly dashboards are an opportunity to identify, audit and sanitize dirty data!  Maybe it's a tweak in how a measure is calculated in your reports, but if it's a data entry issue, you have a world of opportunity!  You can implement new business procedures, or leverage automated Jet Essentials reports to audit the outliers for you!  And let's face it, who doesn't sleep better at night knowing that you're actually capturing and recording cleaner data?


Jae Wilson is a freelance Jet Reports and BI consultant at OnyxReporting.com.  Visit our website to schedule your free 30 minute introductory and fact-finding consultation. 

0 Comments

Your comment will be posted after it is approved.


Leave a Reply.

    Profile Picture Jae Wilson
    View my profile on LinkedIn

    Stay Informed.

    * indicates required

    RSS Feed

    Categories

    All
    Automation
    Basic Training Series
    Business Intelligence
    Connect
    Dashboard
    Data Pipeline
    Data Science
    Domo
    Excel Tricks
    Executive Training & Leadership
    Extract
    Jet Enterprise
    Jet Essentials
    New Release
    NP Function
    Onyx Reporting
    Planning
    Power Pivot
    Python
    Report Writing
    Statistics And Analytics
    TimeXtender
    Visualization

London, UK
jae@OnyxReporting.com
+44 747.426.1224
Jet Reports Certified Trainer Logo
  • Welcome
  • Services
  • Blog
  • YouTube Channel
  • Contact
  • Domo IDEA Exchange
    • Schedule
    • Call for Presenters