By Jae Myong Wilson -- get more articles like this sent to your email.
Last week as part of a Jet Reports Brainteaser, we explored a common report requirement--grouping transactions by an attribute off the vendor card. The frequently-used solution would yield an incorrect result.
"I need help creating a report in Jet Express [report writing software]. I have the "Age" field in the vendor table and I want to show purchases grouped by age bucket: ex. 10 - 20, 20 - 30 , 30 - 40 and the count of vendors .. need help"
Business Use Cases and the Target Audience
Understanding the implications between grouping by historical attribute values versus present day value (consider the example of customer reservations grouped by preferred status in a frequent flyer program) facilitates the accurate interpretation of data and also has valuable applications for business analytics dealing with maturation.
Problem: Calculating age the old-fashioned way is slow and inefficient
Aggregating data by rapidly changing dimension attributes (as opposed to slow changing) are the textbook use case for Junk Dimensions.
Report Requirement: Build a report that shows purchases grouped into vendor age buckets.
Challenge: Even if age existed on the vendor card, that value would be current age as opposed to the vendor's age at the time of the activity. It is not possible to create the required report without calculating age for each transaction--[Posting Date] - [Birthdate].
While the solution is theoretically possible without a data warehouse, as data volume scales, this report would become increasingly slower because every time you run the report, you have to look up birthdate then recalculate age for each transaction before grouping by age buckets. Imagine if you had to do all that for 1 million transactions multiple times a day. Now imagine that you're passing that reporting requirement to your transactional system while other users are trying to input or update data. You'd be really unpopular for slowing down the system.
Note: the user who submitted the support request may have been working with a version of NAV where the vendors are donors.
When a Slowly Changing Dimension is NOT slowly Changing.
A reader from last week's article will gravitate toward applying slowly changing dimensions (SCDII) to the vendor age column to generate a new row in the vendor table every time they have a birthday. This solution while not 'wrong' is not efficient.
The idea behind a slowly changing dimension attribute is that the attribute changesslowly. Demographics information like salary, number of cats, household or zip code' are all excellent candidates for SCDII because they change infrequently and irregularly whereas age moves on with the regularity of clockwork.
Consider a database of 1000 Vendors. With an SCDII age column, after 10 years guaranteed you'd have 10,000 records, that's not very efficient!
Junk Dimensions - A sophisticated alternative to SCDII
The optimal solution would add a calculated column, Vendor Age, on the vendor transactions fact table as you load it into your data warehouse. Because it's part of the table, we don't have to lookup and recalculate age for every transaction every time we run our analytics.
Smart, Extensible & Discrete Grouping BucketsIf you're wondering why we stamped each transaction on the fact table with the actual age instead of an age bucket, think about how your office REALLY works. Today the analysts want to see transactions grouped by 10 years and tomorrow they'll probably ask for 5 and the day after tomorrow... who knows?
Create a separate dimension, Age Dimension, that contains all your permutations of vendor age buckets, (in data warehousing nomenclature, this is referred to as a 'junk dimension') then add columns for each grouping variation (Group by 5, Group by 10, 30-60-90, etc.).
Side Note for BI Developers: Develop your solutions assuming that you can't reload your entire fact table. With this premise in mind, try to develop the most flexible dimensional model possible. It's easy to extend and rebuild a dimension with 100 or 1000 rows and columns, but very time intensive to reload a fact table with millions or billions of rows.
What about buckets for Lifetime Purchases?
Just like vendor age, aggregating by historical life-to-date activity facilitates analyzing maturation. We could analyze whether shopping basket size, frequency or item combinations change as our vendors mature with our company.
For those familiar with cubes from PowerBI or some other front end tool like Tableau, Targit or Jet Enterprise, you may think that 'because lifetime sales are precalculated and retrieved almost instantaneously from SSAS cubes, we'll use a cube-based solution to show sales grouped into lifetime buckets.'
Oops. There's the answer. Buckets. Just like age, we have to consider the implications of aggregating by present-day lifetime sales versus historical. In the case of the latter, we'll extend the fact table with a running total for lifetime activity, then use a junk dimension (or extend an existing one) for discretized buckets.
In Conclusion - The Most Important Part
Whether an internal developer or external consultant, you must understand how the users intend to use the report.
Ask for clarification!
Original Report Request: "I need help creating a report in Jet Express [report writing software]. I have the "Age" field in the vendor table and I want to show purchases grouped by age bucket: ex. 10 - 20, 20 - 30 , 30 - 40 and the count of vendors .. need help"
The Answer is not always the Right Answer:
Report requests are opportunities for internal developers and external consultants to test their knowledge. In the original brainteaser I proposed 3 solution permutations that did not involve a data warehouse, but the real trick was to see if contributors recognized that the requested solution could lead misleading results.
At Onyx Reporting our data strategists recognize and recommend the following:
Join our mailing list. Stay informed about the world of analytics, reporting, and Data Strategy.
Author Jae Wilson, lead data strategist at Onyx Reporting, partners with co-author Joel Conarton, director at executive and management consultancy Catalystis, to provide strategic solutions for data-driven organizations.
Are you ready to partner with us?