ONYX REPORTING LTD.
  • Welcome
  • Services
  • Blog
  • YouTube Channel
  • Contact

Data Warehouse and Cubes Tips and Tricks - March 2017

19/3/2017

Solutions:  
  • Set a default budget for Budget Amount
  • Invert the sign of Income Statement Accounts.

Why this blog series exists:
When you buy an out-of-the-box data warehouse and cube solution like Jet Enterprise from Jet Reports (www.jetreports.com) or ZapBI, it usually meets the 80/20 rule: the stock solution satisfies 80% of your reporting requirements, and you'll need further customization to meet the remaining 20% of your analytics requirements.
Client requests from customers like you inspire this blog series.

Set a Default Value for Budget Amount

Problem:  Some measures require filtering by a default value to show reasonable results.  

Consider the example of a company that creates and revises a new budget each quarter.  While it's perfectly reasonable to show the [Budget Amount] for March 2017 from the 'Initial 2017 Budget' or the 'Q1 2017 Amended' budget, it would be misleading to show the sum of both budgets for March 2017.  Similarly, if your organization uses a forecasting tool to calculate expected item consumption or sales, you may have multiple forecast numbers for the same item and month.

Solution:  To minimize the risk of 'double counting', we'll modify [Budget Amount] to always filter by a default [Budget Name].
In this solution, we hard-coded a budget into the Default Member as part of the dimension definition.  For easier maintenance, we could instead add an [IsDefaultBudget] attribute to the [Budget] dimension in our cube and data warehouse, then reference that column when defining the Default Member.  Ideally our source ERP system can identify the current budget; otherwise we can implement business processes and SQL code to auto-identify the correct budget.
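As a sketch, the two approaches might look like the following in the cube's MDX calculation script. The dimension, attribute, and member names below are illustrative placeholders, not names from a specific Jet Enterprise project:

```mdx
-- Option 1: hard-code the default budget into the dimension definition
ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Budget].[Budget Name],
    DEFAULT_MEMBER = [Budget].[Budget Name].&[Q1 2017 Amended];

-- Option 2: let the data decide -- pick the member flagged by the
-- [IsDefaultBudget] attribute added in the data warehouse
ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Budget].[Budget Name],
    DEFAULT_MEMBER = Exists(
        [Budget].[Budget Name].[Budget Name].Members,
        [Budget].[IsDefaultBudget].&[Yes]).Item(0);
```

With either approach, [Budget Amount] resolves against a single budget unless the report author explicitly selects another [Budget Name].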

Note:  Because we can have multiple concurrent budgets, but shouldn't sum [Budget Amount] across different budgets, the measure is a semi-additive fact -- in business intelligence nomenclature, a fact that can be summed across some dimensions but not all of them.  Other common examples include balances, profit percent, average unit price, company consolidations, and most cost-allocation schemes.  When customizing our data warehouses and cubes, we must be careful where we implement the calculation to avoid reporting errors.

Non-additive facts cannot be summed (ex. [First Invoice Date]).  Fully-additive facts are 'normal' facts which can be summed across all dimensions.

Invert the sign of Income Statement Accounts

Problem:  Sometimes the dimension used affects how we want to calculate a measure.

In many ERP systems the [Amount] or [Quantity] columns have counter-intuitive signs.  Consider G/L Entries, which carry negative signs for revenue and positive signs for expenses, or inventory transactions, which show negative signs for item sales and shipments and positive signs for item purchases.  While the data makes sense to accountants and analysts, business users frequently appreciate seeing signs that are intuitively correct.

In cube-based reports, we'd like to invert the sign on [Amount] for all Income Statement accounts but not Balance Sheet accounts.  Note, it's arguably 'wrong' to invert the sign in the data warehouse because that would make our [Amount] column semi-additive, and potentially cause problems for our auditors!

Solution:  In the cube, use a SCOPE statement to assign a different calculation to [Amount] based on the [GL Account] dimension.
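A minimal sketch of that SCOPE assignment, assuming the [GL Account] dimension carries an [Income/Balance] attribute (as NAV's G/L Account table does); the exact attribute and member names will vary by cube:

```mdx
-- Invert [Amount] for Income Statement accounts only;
-- Balance Sheet accounts keep their original sign
SCOPE ([Measures].[Amount],
       [GL Account].[Income/Balance].&[Income Statement]);
    THIS = [Measures].[Amount] * -1;
END SCOPE;
```

Because the assignment lives in the cube script, the underlying data warehouse column keeps its auditable, fully-additive sign.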
Note:  in SSAS cube nomenclature we typically deal with two out of three measure types:  Standard and Calculated (avoid Derived measures when possible).

A Standard measure is typically a SUM or COUNT of a column that exists on the fact table in the data warehouse:  ex. sum(Amount) or sum(Quantity).

A Calculated measure is an expression that typically uses standard measures in conjunction with MDX expressions and functions.  Typical examples include division (profit margin percent or avg. daily sales) as well as all business functions (Year To Date, Balance, or  Previous Period). 

A Derived measure is a calculation based on fields from the same fact table -- ex. Profit Margin ([Sales Amount] - [Cost Amount]).  When possible, avoid derived measures and use standard measures instead -- just add the calculation as a new column in the data warehouse.  Using standard instead of derived measures has two major advantages:
  1. Users reporting directly against the data warehouse can use the predefined column.
  2. The cubes will process faster during their overnight load.

That's All Folks

Remember, out of the box solutions by definition are designed to work for ALL customers.  Further customization is expected, even required!  Onyx Reporting is available for development services, training and mentoring as you implement minor (if not major) customizations to your Jet Enterprise implementation.

For clients not using Jet Enterprise, please contact us to find out how data warehouse automation can cut your business intelligence implementation time from weeks and months into days.

SQL - What Version of NAV is it?

15/12/2016

If you're anything like me, you've got multiple copies of NAV from various clients stored on your SQL server, and if you forget to write down which version of NAV they're on, that can be a pain.

Fortunately, table structure doesn't change THAAAAAT much ;) but just in case:

SELECT 
    [NAV Database Version] = CASE 
        WHEN [databaseversionno] = 40
            THEN 'Navision 4 SP2'
        -- ...the remaining WHEN branches of the original post are elided here...
        ELSE 'Unknown - databaseversionno ' + CAST([databaseversionno] AS varchar(10))
    END
FROM [$ndo$dbproperty]  -- NAV's system table holding the database version number

Best Practices:  Working with Multiple Environments

29/11/2016

What exactly are "multiple environments?" It's an infrastructure that separates your BI development endeavors from the live environment the organization uses for reporting and analytics.

"Why would I need it?" Development is not always a quick 5 or even 15-minute fix. Larger projects can take days, weeks, even months to complete, so you need a sandbox for the developers to 'play' in while the rest of the organization continues forward with 100% uptime on their live reporting environment. Some organizations may even choose to separate the Dev sandbox from QA (Quality Assurance) efforts, so a third Test environment may be needed!
But "do I need multiple environments?" As with any business question, 'it depends'. If your development is limited to adding a field here or a new measure there during the initial implementation of your business intelligence project, you may be able to get by without separate environments.

It may make sense to separate development from the live environment if:
  • your organization has many customizations in the ERP (which won't be reflected in an off-the-shelf implementation)
  • you're tackling questions unanswerable in the off-the-shelf BI environment you purchased
  • your organization is actively using and reliant on the reporting environment

It may make sense to implement a QA environment if your organization has
  • formal processes for auditing data quality
  • legal obligations to report 100% accurate data to external bodies (ex. financial auditors, government bodies)
  • low tolerance for seeing 'weird data' in their live environment

Ready to get started today?
  • Review your current implementation with our 4-hour Scoping & Assessment Workshop
  • Put your BI development team through our DWA Master's Course

This presentation is preparation for a training series Onyx Reporting will be conducting in 2017! Join our mailing list to keep up with special offers, training opportunities and our blog series.

The Data Discovery Hub: a platform for insights

27/10/2016

At two conferences in the same month, I overheard CTOs express that enterprise resource planning (ERP) systems like Dynamics NAV, GP or Sage can only capture what happened inside a company; to uncover the 'why', organizations have to consider data collected outside their ERP (Google Analytics, CRM, webstore, Mailchimp, Twitter, etc.).

Ex. Data analysts can build sales or marketing-spend reports based on data stored in NAV with aplomb.  They can even ascertain product and customer trends, but without external data it'd be difficult to determine why organizations are seeing that behavior or, more importantly, how they can influence it.

This blog post will review
  • the use of TimeXtender's Data Warehouse Automation platform to create a proof-of-concept data warehouse which integrates Dynamics CRM in the cloud and Dynamics NAV on premise. 
  • the adoption of a data insights model (the Data Discovery Hub) which facilitates data mashup, manipulation and exploration for data and business analysts.  
  • for classically trained Kimball dimensional modelers, a workflow for developing conformed dimensions sourced from multiple data sources.
The Data Discovery Hub facilitates data integration by separating ETL (extract, transform, load) into three databases:  ODS, DSA, MDW
  • Operational Data Store (ODS):  Extract and store raw source data from web-based (CRM) and on premise (NAV) systems.  Ideally the ODS will be configured to touch the source systems as infrequently as functionally possible and extract the least amount of data necessary to meet reporting requirements (incremental load).
  • Data Staging Area (DSA): data analysts use this area to forge the raw data from the ODS into a presentable format.  Business analysts will ensure the application of business rules and implement transformations required to calculate and feed business metrics while data analysts will find relationships between entities from disparate data sources. (How do you relate an account from CRM to a customer in NAV?)
  • Master Data Warehouse (MDW): This final data store is the source for all reporting and analytics.  This database will feed Qlik visualizations, Jet Reports cubes and reports, PowerBI dashboards etc.  To ensure consistency, users will forgo data transformations and implementing complex logic in their reporting tools, and instead rely on a governed data source that has been cleaned up and prepared for presentation.

Ensure Data Validity (and Completeness) in your Data Warehouse

20/10/2016

In an effort to escape 'Excel-anarchy', many organizations transition away from error-prone spreadsheet-driven reporting toward governed enterprise data warehouses, which are expected to provide a 'single version of the truth' for all reporting and analytics endeavors.  To that end, during the implementation process, bespoke business rules and constraints are applied to the data warehouse to enforce the consistency and validity of reported data.

Although data warehouse applications (like Jet Reports' Jet Data Manager) simplify the process of applying business rules to data sets, the majority of implementations Onyx Reporting encounters do not report and monitor outlier records that fail data validation constraints.  Translation: you may be truncating data and not know it.  Double translation: you may be making decisions based on incomplete data!  That's not to say that supporting tools within the product ecosystem don't exist -- they just frequently go underutilized.
Every Data Warehouse is built on (un)declared Business Rules
The expectation that business keys (like Customer No or G/L Account No) will uniquely identify one member of a dimension is virtually universal.  Though this seems obvious, conflicts between the expected and actual data can arise, particularly when organizations integrate data from disparate sources (multiple companies, a legacy system, web-based or unstructured data sources).  Additionally, there can be a disconnect between the expected values laid out by standard operating procedures and the actual values recorded in the ERP system (ex. every sales event should be attributed to a Salesperson or Geographic Region).

To close the loop, and prevent your organization from making strategic decisions based on incomplete information, system implementors must add measures and controls for monitoring and correcting records that fail data validation.
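As a sketch of that 'monitor, don't silently truncate' idea, here's a minimal quarantine pattern in Python. The rule names and field names are illustrative; in a Jet Data Manager implementation the equivalent would be the tool's own data-quality and violation-reporting features:

```python
# Split incoming records into "clean" and "quarantine" sets instead of
# silently dropping rows that fail validation, and surface the failures.
RULES = {
    "has_salesperson": lambda r: bool(r.get("salesperson_code")),
    "has_region": lambda r: bool(r.get("region_code")),
}

def validate(records):
    clean, quarantine = [], []
    for rec in records:
        failed = [name for name, rule in RULES.items() if not rule(rec)]
        if failed:
            quarantine.append((rec, failed))  # keep the row AND why it failed
        else:
            clean.append(rec)
    return clean, quarantine

rows = [
    {"entry_no": 1, "salesperson_code": "JW", "region_code": "UK"},
    {"entry_no": 2, "salesperson_code": "", "region_code": "UK"},
]
clean, quarantine = validate(rows)
print(f"{len(clean)} clean, {len(quarantine)} quarantined")
```

The point is the second return value: failed rows stay visible and countable, so someone is responsible for correcting them rather than never learning they were dropped.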

​Go forth!
Join our mailing list to be informed when we publish new articles.

Author Jae Wilson, lead data strategist at Onyx Reporting, partners with co-author Joel Conarton, director at executive and management consultancy Catalystis, to provide strategic solutions for data-driven organizations.

Are you ready to partner with us?

Data Warehouse Automation - Advanced Tips & Tricks

5/10/2016

As I create new How-To guides specific to data warehouse automation, I'll add them here.

Fuzzy Lookups

Fuzzy look-ups allow businesses to match and conform data to a reference set.

Use Cases:
  • A government organisation had several databases for employees, including temp hires, contractors, and diverse databases for various civil services.  They also identified challenges when employees left an organisation and came back as contractors under new EmployeeIDs.
  • An entertainment venue with online marketing systems, retail stores, customer loyalty programs, and web sales wanted to create a unified view of customers while consolidating duplicate entries across the systems.
  • A parcel delivery and courier service had rampant data entry issues in order intake.  Fuzzy logic in conjunction with geocoded addresses facilitated location-based analytics and faster, more accurate deliveries.
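The video below demonstrates SSIS's Fuzzy Lookup transform, but the core idea fits in a few lines. Here's a toy illustration using Python's standard-library difflib (the company names are made up):

```python
import difflib

# Reference set we want dirty input conformed to
reference = ["Jet Reports Ltd", "Onyx Reporting Ltd", "Catalystis"]

def fuzzy_match(name, choices, cutoff=0.6):
    """Return the closest reference match for a dirty input, or None."""
    hits = difflib.get_close_matches(name, choices, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(fuzzy_match("Jet Report Limited", reference))  # -> "Jet Reports Ltd"
```

Production tools layer performance tricks (token weighting, n-gram indexes) on top, but the match-against-a-reference-set workflow is the same.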

In this video, we use the Jet Data Manager, a data warehouse automation tool, in conjunction with SSIS and Visual Studio to rapidly implement a data quality architecture.

BI Solutions: Aggregate by Dimension Attributes (pt. 2 - Discrete Buckets)

19/8/2016

By Jae Myong Wilson -- get more articles like this sent to your email.


Last week as part of a Jet Reports Brainteaser, we explored a common report requirement--grouping transactions by an attribute off the vendor card.  The frequently-used solution would yield an incorrect result.
"I need help creating a report in Jet Express [report writing software]. I have the "Age" field in the vendor table and I want to show purchases grouped by age bucket: ex. 10 - 20, 20 - 30 , 30 - 40 and the count of vendors .. need help"


Brainteaser Solution: Aggregate by Dimension Attributes (pt 1.  SCDII)

10/8/2016

Last week as part of a Jet Reports Brainteaser, I reposted a request for help I found in a user group forum:  

"I need help creating a report in Jet Express [report writing software]. I have the "Age" field in the vendor table and I want to show purchases grouped by age bucket: ex. 10 - 20, 20 - 30 , 30 - 40 and the count of vendors .. need help"

In the comments section, I outlined three solution permutations suitable for a mixed range of reporting prowess, but building the described report is secondary to the primary focus of this article:
Why or When is grouping by data on the Vendor card wrong? 

In addition to grouping by Vendor age, this two-part article will introduce two new variants of grouping by dimension attributes which will require vastly different solutions:
  • Analyze hotel reservations by a client's preferred status (platinum, gold, silver, bronze) at the time of the booking
  • Analyze customer sales grouped by lifetime sales volume at the time of the transaction.

Get more articles like this sent to your email.


Choosing the Right Forecasting Method

6/6/2016

With modern reporting tools allowing more and more companies to answer 'what did we do yesterday?', the logical progression is to predict 'what are we going to do tomorrow?'  Forecasting solutions vary in complexity, with options including elaborate third-party software, complex spreadsheets developed by internal analysts, and occasionally a shake of the good ol' crystal eight ball. I recently had a conversation with a consultant who confided that their client swears by XYZ solution. "How does it work?" I asked. "Does it look at bell curves and confidence intervals? A 12-month moving average? A 3-month moving average? A weighted average? Linear regression? Seasonal adjustment?" "It looks at all of that," I was assured. "How does the system know which forecast method to use?" ... silence.
To be honest, almost any method could be a completely (in)appropriate way of forecasting the future! If you sell ice cream or agricultural pesticides, there's likely to be seasonality in your data. If you're predicting the stock market, moving averages and linear regression are common prediction methods. If you're tracking click rates after launching a new marketing campaign ... well that's another level of complicated, but to get the ball rolling, linear or exponential regression models may be a solid place to start.

Calculating Error in a Forecast

In a nutshell, cost functions evaluate a forecast method's goodness of fit by measuring the amount of error where:
Absolute Error = ABS(Expected - Actual).
In the sample data set I whipped up, we have several months of Actuals and two forecast methods, 3 Month Moving Average and 5 Month Moving Average (3MA and 5MA respectively). An n-month moving average forecasts the current month by taking the average of n preceding months.  
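The n-month moving average is tiny in code. A sketch in Python (the history numbers are illustrative, chosen so the 3-month average reproduces the article's December 3MA of 625):

```python
def moving_average_forecast(actuals, n):
    """Forecast the next period as the mean of the last n actuals."""
    if len(actuals) < n:
        raise ValueError(f"need at least {n} periods of history")
    return sum(actuals[-n:]) / n

history = [700, 650, 525]  # illustrative Sep-Nov actuals
print(moving_average_forecast(history, 3))  # (700 + 650 + 525) / 3 = 625.0
```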
In a future blog post, I'll share how I built these forecast models into my SSAS Sales cube using MDX.
For the month of December, 3MA forecast 625 while 5MA predicted 596. Given that actuals were 846, 3MA was clearly the better forecast because it had the smaller absolute error.
  • Dec 3MA Absolute Error: ABS(625 - 846) = 221
  • Dec 5MA Absolute Error: ABS(596 - 846) = 250
To avoid the possibility that December was a lucky fluke for 3MA, it would be reasonable, even expected, to compare the absolute error between 3MA and 5MA for other periods.
  • Nov 3MA Absolute Error: ABS(637 - 729) = 92
  • Nov 5MA Absolute Error: ABS(542 - 729) = 187
Because 3MA's forecast is closer than the 5MA forecast two periods in a row, we might say '3MA's average absolute error is less than 5MA.' We've just unwittingly defined the cost function Mean Absolute Deviation (MAD) and how we use it to select 'the best' forecast.
Google tells me that cost functions "provide a criterion by which forecast methods can be compared," but I've chosen to reinterpret the definition as, "cost functions help us decide which forecasting method to use."
DISCLAIMER
To be clear, I am not a statistician. While developing a forecasting data model for a client, I was able to consult a trusted mathematician who guided me through the basics of using cost functions to evaluate and select the 'best' forecasting model. I'm writing this article with the assumption that you either have forecast models you're currently using, or are interested in adopting forecasting tools.
Calculating the Cost Functions MAD, MAPE, and MSE

Don't go running for the hills just yet! Cost functions and abbreviations can sound intimidating, but this 5th-grade-level math will enable us to quantify and compare the error between two forecasts. The three most common cost functions are MAD (Mean Absolute Deviation), MAPE (Mean Absolute Percent Error) and MSE (Mean Squared Error).

Calculating MAD (Mean Absolute Deviation)
MAD is the sum of absolute error (also called deviations) divided by the number of periods measured.
In December and November the absolute errors for 3MA were 221 and 92 as opposed to 5MA's 250 and 187. We concluded that 3MA on average had less error.
  • 3MA MAD: [221 + 92] / 2 = 157
  • 5MA MAD: [250 + 187] / 2 = 219
In application, although MAD measures how much error there was, it's insufficient for a user seeking to understand the severity of the error. A MAD of 157 against a data set measured in the millions would be negligible compared to the same MAD on a data set ranging between 500 and 1000. This desire to give MAD context leads to the next cost function, Mean Absolute Percent Error (MAPE).
Calculating MAPE (Mean Absolute Percent Error)
MAPE differs from MAD by calculating the average absolute percent error instead of simply taking the average absolute error.
Absolute Percent Error: ABS(Expected - Actual) / Actual
  • 3MA MAPE: [221 / 846 + 92 / 729] / 2 = 19%
  • 5MA MAPE: [250 / 846 + 187 / 729] / 2 = 28%
Although MAPE and MAD weigh error equally, the statement summarizing MAPE -- "our forecasts for November and December were off by 19% on average" -- paints a clearer picture than MAD's "our forecasts were off by 157 units on average." For this reason, MAPE is the cost function of choice in most business analytics; however, because MAPE and MAD treat each deviation equally, they are unsuitable for forecasts that seek to minimize error per period (as opposed to overall error).

Calculating MSE (Mean Squared Error)
Mean Squared Error takes the average of the squared errors.
  • GF Squared Error Period 11: (90 - 70)^2 = 400
  • GF Squared Error Period 12: (115 - 90)^2 = 625
NOTE: With MSE the unit of measurement is square units as opposed to MAD which is measured in units.
As aforementioned, MAPE is the preferred cost function because it minimizes overall error, but consider the challenge of equipping production lines across factories in different countries. Once resources have been allocated, it may be difficult to reapportion or procure additional machinery; therefore, in this instance our primary concern is not minimizing error across the entire data set so much as minimizing error at each forecast point. In other words, the ideal forecast will allow many smaller errors while avoiding a few significant errors.
In this new data set we have two forecast methods, 3MA and Growth Factor (wherein using a crystal ball we 'managed' to predict expected as actuals * 1.28). Both forecast methods have the same total deviation and MAD but 3MA, with a max percent error of 70%, has larger swings in absolute deviation while Growth Factor consistently has a 28% error for each forecast.
In terms of our factory example, using MSE we would have selected the forecast that over-allocated equipment equally at +28% across all factories, instead of the 3MA model with its wide swings of over- and under-allocation.
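All three cost functions are one line each. Here they are in Python, checked against the article's November/December example (and remember: MAPE divides by actuals, so zeros in your data will break it):

```python
def mad(forecasts, actuals):
    """Mean Absolute Deviation: average of the absolute errors."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

def mape(forecasts, actuals):
    """Mean Absolute Percent Error: absolute error as a share of actuals."""
    return sum(abs(f - a) / a for f, a in zip(forecasts, actuals)) / len(actuals)

def mse(forecasts, actuals):
    """Mean Squared Error: average squared error (punishes big misses)."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

actuals  = [729, 846]   # Nov, Dec
three_ma = [637, 625]   # 3MA forecasts from the article
five_ma  = [542, 596]   # 5MA forecasts from the article

print(mad(three_ma, actuals))             # 156.5 -> the article's ~157
print(mad(five_ma, actuals))              # 218.5 -> the article's ~219
print(round(mape(three_ma, actuals), 2))  # 0.19
```

Swapping in the Growth Factor series and mse() reproduces the factory comparison above.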
Which Cost Method Should I Choose?
Someone said "there are lies, damned lies and statistics," and it's still uncertain whether to credit Mark Twain or a British dignitary. Go figure. As with all things business-related, when faced with the question "Which cost function is best?" the answer is, "It depends." Although MAPE and MAD weigh error evenly, MAPE provides context for the severity of error where MAD does not, making it the de facto choice in business analysis.  However, because MAPE represents percent error and 5th-grade math tells us we can't divide by zero, you may have to use MAD or MSE in cases where your data set's actuals could include zeros -- consider monthly inventory demand, for example. Lastly, in instances where we're more concerned with reducing error per forecast, it may be more appropriate to use the MSE cost function.
One last thought to keep in mind before embarking on the world of forecast evaluation: not all errors were made equal! With MAD, MSE, and MAPE, we're assuming positive error and negative error are equally weighted, but consider a retail environment where running out of inventory may be more costly than carrying inventory. While warehouse fees or product obsolescence from overstocking inventory may add up, overall it may be less costly than a lost sale or disappointed customer.

Visualization:  Closing the Loop

[Chart: historical demand with every candidate forecast method overlaid]
Take a look at this ball of spaghetti! In a world where our reporting and analytics tools can graph historical demand and quickly generate multiple forecasting models, this mess is just a few clicks away. Though loaded with forecasting data, the chart is too busy and doesn't actually tell us which forecast method to use. Which option has the lowest MAPE?
[Chart: the same forecasts, filtered to highlight the method with the lowest MAPE]
The above chart is halfway there insofar as it could be filtered to show us today's period and identify the forecast with the lowest MAPE (or MSE, as appropriate). I can then simplify my chart to the bare essentials: actuals, expected, a trend line, and the prediction for the next period.
[Chart: actuals, expected, trend line, and the next-period prediction]

That's All Folks

With the exception of the simplified introductory data sets at the top of the blog, all the charts were generated from MDX added to the stock cubes provided with Jet Enterprise. If you're interested in adding forecasting or forecast evaluation to your cube-based reporting environment, give me a holler at Jae@OnyxReporting.com.

My Favorite Things

18/4/2016

[Embedded Docs.com presentation: My Favorite Things -- Jae Wilson]


London, UK
jae@OnyxReporting.com
+44 747.426.1224
Jet Reports Certified Trainer