Last week as part of a Jet Reports Brainteaser, I reposted a request for help I found in a user group forum:
"I need help creating a report in Jet Express [report writing software]. I have the "Age" field in the vendor table and I want to show purchases grouped by age bucket: ex. 10 - 20, 20 - 30 , 30 - 40 and the count of vendors .. need help"
In the comments section, I outlined three solution permutations suitable for a mixed range of reporting prowess, but building the report described is secondary to the primary issue and focus of this article which addresses:
Why or When is grouping by data on the Vendor card wrong?
In addition to grouping by Vendor age, this two-part article will introduce two new variants of grouping by dimension attributes which will require vastly different solutions:
Get more articles like this sent to your email.
Analytics based on Historical Attributes (not Current Values)
The header says it all.
Although it is theoretically possible to build the report as described against the Vendor card, in most cases the results would be 'incorrect,' because we're aggregating by current age instead of the age at the time of the transaction. This nuance has more obvious implications when analyzing changing purchasing behaviors of 10+ year customers or if vendors are actually aging donors contributing to non-profit organizations.
With regards to the validity of aggregating by current or historical dimension attributes, there is no right answer except to ask the business users for clarification.
Do I need Business Intelligence?
Although the report is theoretically feasible (with caveats) in either permutation, the most efficient and flexible solution requires a data warehouse.
A data warehouse effectively mirrors the data in your ERP, but the tables are restructured and optimized to support reporting and analytics. Each of our three use cases (aggregate by age, preferred status or lifetime sales) calls for discretization of a dimension attribute into buckets, but each reporting requirement will require a slightly different implementation of foundational BI modeling techniques:
Tracking Slowly Changing Values
Analyzing customer hotel reservations grouped by historical preferred status is the textbook use case for data warehousing and slowly changing dimensions
From 2011 to 2012 Billy had bronze preferred status. Then in 2012 he quickly moved through silver and into gold where he remained until 2016 when he finally achieved platinum status.
Build a report that shows the dates Billy changed status and total reservations grouped by status.
The report may be impossible if the Vendor card is the only record of preferred status.
Your Transactional System LOSES Key Analytics Data
In many ERP implementations, you'd see a field "Preferred Status" on the Vendor card, and as Billy achieved a new status, that value would be updated. No matter how robust PowerBI, Jet Professional or any other reporting tool gets, when users update the Vendor table, RECORD OF THE HISTORICAL VALUE IS LOST FOREVER (changelogs or other costly non-Normal customizations notwithstanding).
From your ERP, it is IMPOSSIBLE answer:
Note: this report variation would slow as data volume scales and potentially affect users using the ERP.
Historical Preferred Status: A textbook use case for Slowly Changing Dimensions
Slowly changing dimensions (SCDII) facilitates tracking updates to key dimension attributes. Recall, the data warehouse duplicates the ERP tables but restructures data for reporting and analytics.
In the data warehouse, we'd duplicate Billy's record in the Vendor table each time the "Preferred Status" was changed, and add two new columns "Valid From Date" and "Valid To Date" to track which version of the 'bjackson' record applies to a transaction.
Side Note: In the data warehouse because we can no longer use the Vendor ID, 'bjackson', to uniquely identify a record in the Vendor table, we create a new ID column, Vendor Key, to relate the correct bjackson record with a transaction. The new ID is called a 'surrogate key' in BI nomenclature.
Side Note 2: although PowerBI is presented as an alternative to traditional data warehousing, to date slowly changing dimensions is not available in PowerBI, IMHO excluding it from data warehouse alternatives. See post: Why PowerBI will not replace your Data Warehouse.
What if the Dimension doesn't change Slowly?
A savvy reader may be thinking, "if SCDII can be used to track changes in preferred status, couldn't the same solution be applied to track changes in age?" The short answer is "yes". The long answer is "that would be inefficient and not make sense." While demographics information like salary, number of cats, household or zip code' are all excellent candidates for SCDII because they change infrequently and irregularly, age moves on with the regularity of clockwork.
Consider a database of 1000 Vendors. With an SCDII age column, after 10 years guaranteed you'd have 10,000 records. There must be a better way of analyzing data in age buckets. There is! "Join us next week!"
Sorry to leave you on a cliffhanger! Sign up for our mailing list to be the first to know when the next article is published!
In the interim, when creating or assembling report requirements that involve aggregation by dimension attributes, make sure to ask: "do I want to aggregate by historical or current values?" If the implications of either option are unclear, clarification can only come from the business user.
Author Jae Wilson, lead data strategist and BI consultant at Onyx Reporting, partners with co-author Joel Conarton, director at executive and management consultancy, Catalystis, to provide strategic solutions for data-driven organizations.