ONYX REPORTING LTD.
  • Welcome
  • Services
  • Blog
  • YouTube Channel
  • Contact
  • Domo IDEA Exchange
    • Schedule
    • Call for Presenters

Domo Data Science Toolkit - Product Review and Best Practices

28/1/2021

As part of my "Data Science Demystified" series, I had the opportunity to interview Data Science consultant and Major Domo practitioner John "Doc" Stevens from Crystal Ballers about the Domo Data Science Toolkit.  

These tools include the PyDomo and DomoR libraries for local development, Magic ETL tiles, and a Domo Jupyter Notebook integration. Because each tool has distinct strengths and weaknesses, they fit into different parts of your data pipeline and data science project rollout.

In this video, we review each feature as well as make recommendations on how and when to use each tool as part of your pipeline.


VIDEO: 10 Tips to Polish Your Dashboards & Simplify Charts

28/1/2021

I had the opportunity to appear as a guest speaker on Janice Taylor's webinar series on Dashboard Design and Best Practices, https://www.reportsyourway.com/. During this 15-minute presentation, I reviewed 10 easy checkpoints dashboard designers can use to produce concise and effective dashboards.


Domo + Python Tutorial: Execute Datasets and Dataflows via API and PyDomo

28/1/2021

We all agree that Domo is a very powerful data and visualization platform, but automating Domo from outside the platform (with languages like Python or Java) can feel daunting if you've never dipped your toes in that space.

The PyDomo library, https://github.com/domoinc/domo-pytho..., makes it surprisingly easy to start automating Domo without touching the user interface. This past week I sat down with Everli consultant Giuseppe Russo to work through a simple Python script for automating dataset and dataflow execution.

This is the first step in a larger #DataflowOrchestration project we're working on!  Stay tuned!


Preventing Stockout and Improving Supply Chain Logistics with Data

21/5/2020


The Problem.

Acme Corp is a national distributor of electronics and hardware to small and medium-sized companies. Because their size and cash flow prevent Acme Corp's SMB customers from carrying large inventories, they frequently stock out or request expensive last-minute air-freight deliveries.

Acme Corp recognized that by combining customer sell-out with product shipment data, they could apply forecasting models that would automate and optimize product replenishment for their SMBs.  This would deliver a better customer experience, while preventing stockout and creating new opportunities for optimizing product distribution logistics.

Unfortunately, although SMBs were happy to send till data to Acme Corp, the Domo implementation team did not have access to reliable starting inventories at the customer locations, so they approached me to design a process for deriving a starting inventory.

In this tutorial, we'll:
  • build a dataflow that derives starting inventory
  • learn why certain Magic ETL tiles (blocking functions) have such a drastic impact on dataflow performance, and how you can optimize around them.
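Here is a rough illustration of why blocking tiles matter. It is a generic Python analogy, not a description of Magic ETL's internal engine. A streaming step can hand each row downstream as soon as it arrives. A blocking step such as a sort must read the entire input before it can emit a single row. The functions and the simulated data are hypothetical.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[dict]:
    """Simulated input dataset that records every row it hands out."""
    for i in range(n):
        log.append(f"read {i}")
        yield {"id": i, "qty": (i * 7) % 5}


def filter_rows(rows: Iterable[dict], log: List[str]) -> Iterator[dict]:
    """Streaming step: each qualifying row is passed on as soon as it is read."""
    for row in rows:
        log.append(f"filter saw {row['id']}")
        if row["qty"] > 0:
            yield row


def sort_rows(rows: Iterable[dict], log: List[str]) -> Iterator[dict]:
    """Blocking step: nothing comes out until every input row is buffered."""
    buffered = list(rows)  # must consume the entire input first
    log.append(f"sort buffered {len(buffered)} rows")
    yield from sorted(buffered, key=lambda r: r["qty"])


# Ask each pipeline for just its first output row and count the input reads.
streaming_log: List[str] = []
next(filter_rows(source(5, streaming_log), streaming_log))
blocking_log: List[str] = []
next(sort_rows(source(5, blocking_log), blocking_log))
print(sum(e.startswith("read") for e in streaming_log))  # 2
print(sum(e.startswith("read") for e in blocking_log))   # 5
```

Aggregations behave similarly, and joins typically must buffer at least one side. This is why filtering rows and dropping unneeded columns before such steps usually reduces the work they do.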

While this blog post is more developer-focused, it's important to recognise that these types of data normalization activities are the bridge step to building forecasting models and implementing workflow automation.
  • For example, if we build a forecasting model around customer sales but don't consider available inventory, it's impossible to gauge whether a lack of sales is 'normal customer behaviour' or attributable to a stockout.
  • In a similar vein, before 'advanced analytics' can begin, we must carefully evaluate whether the available data represents reality.  If we don't apply domain expertise to the problem, we might dismiss rows with negative inventory as invalid or outlier measurements and therefore exclude them from our forecasting analysis.
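Here is one possible way to derive a starting inventory. It is a sketch under stated assumptions and not necessarily the method used in the full tutorial. Assume each customer/product starts at zero and accumulate shipments minus sales over time. Treat the deepest negative balance as evidence of stock that must already have been on the shelf. The result is a lower bound on the starting inventory. The tuple layouts below are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def derive_starting_inventory(movements):
    """movements: (shipped_qty, sold_qty) per period, oldest first.

    Returns the smallest non-negative starting inventory that keeps the
    running balance from ever dropping below zero, plus that balance.
    """
    net = [shipped - sold for shipped, sold in movements]
    running = list(accumulate(net))           # balance if we had started at zero
    start = max(0, -min(running, default=0))  # lift the lowest point up to zero
    return start, [start + balance for balance in running]


def starting_inventory_by_item(rows):
    """rows: hypothetical (customer, product, period, shipped_qty, sold_qty) tuples."""
    series = defaultdict(list)
    # Sort by customer, product, then period so movements are in date order.
    for customer, product, _period, shipped, sold in sorted(rows, key=lambda r: r[:3]):
        series[(customer, product)].append((shipped, sold))
    return {key: derive_starting_inventory(moves)[0] for key, moves in series.items()}


print(derive_starting_inventory([(0, 5), (10, 3), (0, 8), (5, 2)]))
# (6, [1, 8, 0, 3])
```

The negative running balances that a naive analysis might discard as invalid are exactly what drive the estimate. Confirm the assumption with domain experts: returns, shrinkage, or transfers between sites would break it.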

With any project like this, validation is almost more important than the solution itself. Make sure to identify and confirm assumptions with domain experts before proceeding to the next step.

In any case... on to the tutorial!


7 Tips for the new Major Domo

11/5/2020

If you're new or newer to Domo, things can feel overwhelming. It may not be clear what the difference is between a dataflow and a dataset. Or maybe you can't figure out how to make your chart do exactly what you want without having to build yet another Magic ETL. Even if you're a seasoned BI veteran, getting started with Domo can feel frustrating.

Never fear.  Our team at Onyx Reporting has assembled a few tips to help you move faster and experience less frustration in Domo.


Domo - Parsing JSON in a MySQL 5.6 Dataflow

6/5/2020

Need to parse a JSON string in a MySQL dataflow?  See the tutorial below.

Doesn't MySQL support JSON specific transforms?
Yes; however, Domo's MySQL 5.6 environment predates native JSON support, which was introduced in MySQL 5.7 and expanded in MySQL 8.0.

Are there better ways to handle JSON parsing in Domo?
Domo's ETL and visualization engines require data structured in a relational format (one value per field). Users can use custom connectors, Python scripting, or Magic ETL to parse large string blobs, which should scale better than parsing the same data in SQL transforms.
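Here is a minimal sketch of the Python-scripting route. It turns JSON string blobs into one relational row per array element and flattens nested objects into dotted column names. The field names (`order_id`, `payload`, `lines`) are hypothetical; adapt them to your data.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level: {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_blobs_to_rows(records, id_field, blob_field, list_key):
    """Explode the JSON array under `list_key` in each record's blob into one row per element."""
    rows = []
    for record in records:
        payload = json.loads(record[blob_field])
        for item in payload.get(list_key, []):
            row = {id_field: record[id_field]}  # carry the parent key onto every row
            row.update(flatten(item))
            rows.append(row)
    return rows


records = [{
    "order_id": 1,
    "payload": '{"lines": [{"sku": "A1", "qty": 2, "price": {"amount": 9.5, "currency": "GBP"}},'
               ' {"sku": "B2", "qty": 1}]}',
}]
for row in json_blobs_to_rows(records, "order_id", "payload", "lines"):
    print(row)
```

Keys missing from an element simply don't appear in its row, so they become NULLs once the rows are loaded into a table. If pandas is available, `pandas.json_normalize` offers similar flattening.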

Can I do this as a stored procedure?
Yes, see the Domo KB Article. 

At scale, stored procedures can be tuned to outperform table transforms (a SELECT whose output is written to a new table); however, Onyx Reporting recommends the table approach during the initial implementation because the code is usually easier to read and troubleshoot than dynamic SQL.


#SpringCleaning. 15 Tips to Manage Your Data Mart in Domo

4/5/2020


Time for spring cleaning in the Data Mart?

  • Is your Domo implementation an unruly, tangled mess of dataflows and datasets?
  • Are you struggling to balance new user requests and rationalize ETL while retaining high performance in your dashboards and dataflows?
  • Do you want to leverage Domo to support an increasing number of advanced analytics workflows without creating unnecessary dataflow overhead?

The Domo Governance and Data Stats tools can certainly help identify problem areas or assets that are going unused; however, the first step to rationalizing your Domo data mart is defining best-practice conventions for how to name datasets and where to transform them. At Onyx Reporting, our consultants align the dataset naming conventions recommended in Domo's consulting methodology with the conventional wisdom of traditional BI strategy.

In this blog post we'll cover 15 tips for managing your RAW, INT, PROD and STACK dataflows.
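Here is a small example of enforcing a naming convention. The sketch audits dataset names against a hypothetical pattern, `<LAYER> | <source or subject> | <description>`, using the RAW, INT, PROD and STACK layers mentioned above. The exact pattern is an assumption for illustration, not Onyx Reporting's or Domo's published standard. The names themselves would come from an export of your Domo dataset list.

```python
import re
from collections import Counter

LAYERS = ("RAW", "INT", "PROD", "STACK")
# Hypothetical convention: "<LAYER> | <source or subject> | <description>"
NAME_PATTERN = re.compile(r"^(" + "|".join(LAYERS) + r") \| [^|]+ \| [^|]+$")


def audit_names(names):
    """Count datasets per layer and list the names that break the convention."""
    found = Counter()
    violations = []
    for name in names:
        match = NAME_PATTERN.match(name.strip())
        if match:
            found[match.group(1)] += 1
        else:
            violations.append(name)
    # Report every layer, including empty ones (Counter returns 0 for missing keys).
    return {layer: found[layer] for layer in LAYERS}, violations


counts, bad = audit_names([
    "RAW | Salesforce | Opportunities",
    "PROD | Sales | Pipeline by Region",
    "copy of sales data",
    "INT | Sales | Opp + Accounts",
])
print(counts)  # {'RAW': 1, 'INT': 1, 'PROD': 1, 'STACK': 0}
print(bad)     # ['copy of sales data']
```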


Your 9-Point Data Pipeline Resiliency Check

20/4/2020

New data initiatives and BI projects are fickle things. You only get one shot at making a good first impression with end users and senior stakeholders, and the last thing you want them saying is, "I don't trust these numbers."
  • IF your data pipeline is held together with 'scotch tape and bubble gum,'
  • OR you can't go on holiday because the world / your pipeline might go up in flames
  • OR you're having data trust issues in your user community
 
  • THEN it might be time to review your data pipeline and make sure it meets these 9 resiliency checkpoints.

If you have a stake in project adoption but most of the checkpoints read like technical jargon, give me a call. I'd be happy to sit with your developer team and co-review your data pipeline.


5+ Methods for Getting Data in and out of Domo

16/4/2020

An IT/IS stakeholder raised a concern that Domo, like many BI vendors, was a one-way street: a tool where you can easily push data in but not get it out. The underlying narrative: both business and IT stakeholders see vendor lock-in as a risk to minimize. I assured the gentleman that, in addition to extensive dashboard creation and distribution capabilities, Domo positions itself as a data distribution hub by providing several data extraction methods suited to a broad mix of users with different usage requirements.


TLDR for Data Governance teams: My most successful enterprise clients position Domo at the end of a data lake or data warehouse pipeline, then use built-in tools to secure, monitor and distribute data to different stakeholders. Domo's distribution methods get data into the hands of data quality and governance teams, data scientists, and business analysts in the platforms of their preference, which range from Tableau and Qlik to Jupyter notebooks, Office 365 products, or any other SQL or data storage platform. IT/IS accomplish this without increasing administrative overhead by applying security measures in Domo, which carry through to all the data distribution options outlined below.

TLDR for Developers: Domo facilitates systems integration, business process automation and app development, as well as data ingestion and distribution, via SDKs and CLIs (including Java, JavaScript, Python and R libraries) that leverage an openly accessible API framework. In addition to storing data in Domo's cloud, it also supports a federated query model that leaves data in its source location. Check out developer.domo.com. I could go on for days.


Data Science + Domo: KMeans

18/8/2018

Disclaimer:  Let's get it out of the way.  I am employed by Domo; however, the views and ideas presented in this blog are my own and not necessarily the views of the company.

In this post I'm going to show you how to use Domo's built-in data science functions to perform KMeans clustering on the Hotels dataset from Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, 3rd Edition, by Gordon S. Linoff and Michael J. A. Berry (Wiley, 2011). Then I'll show you how to do the same thing using Domo's platform to host the data and R to transform and analyze it.

Why are we doing this?
From the platform perspective, my goal is to showcase how Domo can support Business Analytics and Data Science workflows.

From the business perspective, an organization may want to cluster hotels to facilitate creating 'hotel personas'. These personas (e.g., luxury business hotel versus weekend warrior favorite) may enable the creation of marketing campaigns or 'similar hotel' recommendation systems.

Disclaimer 2:  I do not own the Hotels dataset.

The Domo Platform Organizes Datasets

Step 1:  Upload the Hotels dataset to the Domo datacenter using the file connector. 

Duration:   < 2 minutes or 8-ish clicks.

Domo enables analytics workflows by assembling all the recordsets in one place. The data can be sourced internally from a data warehouse, POS system, Excel spreadsheet or SSAS cube, or come from external sources like Facebook, Google Analytics, or Kaggle.
Once you've uploaded data to Domo you can apply Tags to keep track of your datasets or implement security to control who has access to your data (even down to the row-level).

Additional notes for InfoSec:
Given that the largest risk in data security is user behavior, finding solutions that are preferable to personal GitHub or Dropbox accounts, USB sticks or (God forbid) the desktop remains a priority.  One of Domo's most understated highlights is its ability to surface IT governed and cleansed datasets to analysts in a single place that's highly accessible yet fortified with bulletproof (Akamai) security.

Notes for architects and engineers:
Under the hood, Domo's platform combines the best of big data / data lake architecture (distributed, high availability, etc.) with (good) datamart recordset management techniques. Data workers familiar with Hadoop or Amazon Web Services will recognize Domo as a modular yet integrated big data stack wrapped in an easy-to-use GUI that business users will appreciate and get value out of from day one.

When you set up a trial of Domo you effectively have a free data lake that's ready for use -- complete with the ability to store literally millions of rows and/or gigabytes of data at rates you'd expect from Amazon S3 or Microsoft Azure storage at a fraction of the cost.  Try it.

Introducing DataFlows

To the horror of every data scientist out there, in this blog, I'll skip data preprocessing, profiling or exploration and go straight to using Magic ETL to apply KMeans clustering on a set of columns in my data.

Side Note
ETL stands for Extract, Transform and Load, and Domo provides a proprietary drag-and-drop interface (Magic ETL) for data transformation. Technically, the data has already been extracted from your source system and loaded into Domo, so all that's left is the transform phase.

Double Side Note for Clarity:  
Each DataSet is stored in Domo somewhere as a separate file.  
Each DataFlow has Input DataSets and creates Output DataSets - new file(s)

You have 3 types of dataflows native to the Domo platform.
  • Magic ETL (Domo's proprietary drag and drop tool)
  • MySQL and Redshift.  NOTE:  Redshift is a beta feature for transforming 5 million+ row datasets.  If you need the feature 'turned on' give me a holler.
  • Additionally, Domo supports data ingestion via API, ODBC, and various other SDKs. In other words, you can transform data using virtually any platform before pushing it to Domo. In this blog, I'll do some data analysis in R and then push the results to Domo from R.
Magic ETL has a user-friendly set of data science functions. They are in beta, so if you don't see them in Magic ETL, shoot me an email so we can get them enabled for you.

What is Clustering and why are we using it?
It is not my goal to make you a data scientist in a 5-minute blog post, but if you are interested, the Data Mining Techniques book I linked earlier may be a decent place to start.

Consider the following (completely fictitious) example:

"Jae, given your stay at the Threadneedles Hotel in London, you might like the Ace Hotel in Portland which has similar review rates by international guests, number of rooms and avg. room rates."  

In short, clustering allows you to group entities (hotels) by sets of attributes (in our case, number of rooms, number of domestic reviews, international reviews, whether the hotel is independent etc.).
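To make the idea concrete, here is a minimal KMeans sketch in plain Python. It uses Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is a generic implementation with a deterministic farthest-first starting point, chosen for reproducibility. The article doesn't say how Domo's KMeans initializes, measures distance, or preprocesses data, so don't expect identical results. The two-feature points stand in for hypothetical hotel attributes.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm with a deterministic farthest-first start."""
    # Seeding: start from the first point, then repeatedly add the point that
    # is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # no point changed cluster, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


hotels = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(hotels, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because KMeans relies on Euclidean distance, features on large scales (room counts, say) dominate features on small scales unless you center and scale them first.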
How easy is Magic ETL? 

It's 1, 2, 3.  Drag transformation functions (1) into the workspace (2), then define the function parameters (3).

In this example, I used the previously uploaded hotels-wiley dataset as an input, then I applied a K-Means function over a subset of columns.

Note:  I explicitly defined the number of clusters to create (5).
If you don't know what KMeans does, Domo's documentation is on par with what you'd expect from any data science-lite platform.
Step 3:  Preview and Output the Dataset for visualization or further analysis

In the final steps, we'll add a SELECT function to choose which columns to keep.  In this case, we'll only keep columns that were used by the KMeans algorithm, as well as columns to identify the hotel (hotel_code).

Lastly, we add an Output Dataset function to create a new dataset in Domo which we can later visualize or use for further analysis.
 Beware the false prophets ...

THIS IS IMPORTANT. 
Do not fall into the trap of misinterpreting your results!

In the output dataset below, you'll see a new column, cluster, which happens to sit next to the bubble_rating column. The hasty analyst might conclude: "Oh hey, there appears to be a correlation between cluster and hotel rating."

And all the data scientists in the room #facepalm while #cheerForJobSecurity.
[Image: KMeans output dataset, with the new cluster column next to bubble_rating]
There are 5 clusters because, in an earlier step, we told the algorithm to create 5 clusters. We could just as easily have created 3 or 7. There is not necessarily any correlation between the cluster a hotel ended up in and its rating. Keep in mind that the cluster number is just an arbitrary label. If you re-run the function, ideally you'll end up with the same groupings of hotels, but the groups could easily receive different cluster numbers (hotel_cluster_3 could become hotel_cluster_4, and hotel_cluster_4 could become hotel_cluster_1).
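One way to check whether two runs produced the same grouping, regardless of the arbitrary cluster numbers, is to renumber each run's labels in order of first appearance and compare. Here is a minimal sketch with made-up labels:

```python
def canonical(labels):
    """Renumber clusters in order of first appearance: [3, 3, 4, 1] -> [0, 0, 1, 2]."""
    first_seen = {}
    # len(first_seen) is evaluated before setdefault inserts a new label.
    return [first_seen.setdefault(label, len(first_seen)) for label in labels]


def same_grouping(labels_a, labels_b):
    """True when two runs put exactly the same hotels together, whatever the numbers."""
    return len(labels_a) == len(labels_b) and canonical(labels_a) == canonical(labels_b)


run_1 = [3, 3, 4, 1, 1]
run_2 = [4, 4, 1, 2, 2]  # same groups, different numbers
run_3 = [4, 4, 1, 1, 2]  # a genuinely different grouping
print(same_grouping(run_1, run_2), same_grouping(run_1, run_3))  # True False
```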

Side Note which will become important later:
It would be nice if Domo included metrics for measuring separation between clusters or the strength of clusters.  We arbitrarily told KMeans to create 5 clusters.  But who knows, maybe there are really only 3 clusters.  There are quantitative methods for identifying 'the right' number of clusters, but they aren't available in Domo out of the box.
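One such quantitative method is the silhouette score. For each point, compare its mean distance to the rest of its own cluster (a) with its mean distance to the nearest other cluster (b), score it (b - a) / max(a, b), and then average the scores. Values near 1 indicate tight, well-separated clusters; values near or below 0 suggest overlapping or wrongly assigned points. Here is a minimal pure-Python sketch; the sample points are made up.

```python
import math
from collections import defaultdict


def silhouette(points, labels):
    """Average silhouette score over all points (singleton clusters score 0)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's distance to itself is 0, hence the len(own) - 1 denominator).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest other cluster.
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


pts = [(0,), (1,), (10,), (11,)]
print(round(silhouette(pts, [0, 0, 1, 1]), 3))  # 0.9
print(round(silhouette(pts, [0, 1, 0, 1]), 3))  # -0.45
```

To choose k, run your clustering for several values (say 2 through 8) and compare the average silhouette alongside an elbow plot of within-cluster variance.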

Domo embraces all Analytic Tools

Let's do it again. In R.

As demonstrated, Domo has user-friendly data science functionality suitable for the novice data scientist; however, in real-world applications, analysts will likely use Domo's data science functions to build up and validate proof of concepts before transitioning to a 'proper' data science development platform to deliver analytics.

Domo has both Python and R packages that facilitate easy data manipulation in full-fledged data analytics environments.

To extract data from Domo into R:
  1. set up an access token and install the DomoR package
  2. find the dataset ID in the dataset's URL; it is the ID that follows /datasources/, e.g. https://<yourDomoInstance>.domo.com/datasources/0afe1c21-18f3-496a-acb5-599e07c9e33c/details/overview
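If you are scripting this, you can pull the ID out of the URL. Here is a small sketch, assuming (as in the example URL) that dataset IDs are 36-character GUIDs following `/datasources/`:

```python
import re

# Assumption: dataset IDs are 36-character GUIDs, as in the example URL.
DATASET_ID = re.compile(r"/datasources/([0-9a-fA-F-]{36})(?:/|$)")


def dataset_id_from_url(url):
    """Return the dataset ID embedded in a Domo dataset URL."""
    match = DATASET_ID.search(url)
    if not match:
        raise ValueError(f"no dataset ID found in {url!r}")
    return match.group(1)


url = ("https://<yourDomoInstance>.domo.com/datasources/"
       "0afe1c21-18f3-496a-acb5-599e07c9e33c/details/overview")
print(dataset_id_from_url(url))  # 0afe1c21-18f3-496a-acb5-599e07c9e33c
```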

The R Code:  

#install and load packages required to extract data from Domo
install.packages('devtools', dependencies = TRUE)
library(devtools)
install_github(repo='domoinc-r/DomoR')
library(DomoR)

#initialize connection to Domo Instance
#(use your instance name and access token, with no extra spaces inside the quotes)
domo_instance <- 'yourDomoInstance'
your_API_key <- 'yourAPIKey'
DomoR::init(domo_instance, your_API_key)

#extract data from Domo into R (fetch returns a data frame)
datasetID <- 'yourDataSetID'
raw_df <- DomoR::fetch(datasetID)

## PROFIT!! ##
Earlier we asked whether 5 was 'the right' number of clusters.

Given the variables from before (number of rooms, number of domestic reviews, etc.), once NULL values have been removed and the variables scaled and centered, it appears that 6 clusters may have been a better choice.
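The preprocessing described above can be sketched as follows: drop rows with NULLs in the clustering variables, then center and scale each variable. This mirrors the defaults of R's `scale()`, which subtracts the mean and divides by the sample standard deviation. The column names are hypothetical, and this is not a description of what Domo's KMeans does internally.

```python
import statistics


def prepare_features(rows, id_column, feature_columns):
    """Drop rows with a NULL in any feature, then center and scale each feature
    (subtract the mean, divide by the sample standard deviation, like R's scale()).
    Needs at least two complete rows, since statistics.stdev requires two values."""
    complete = [r for r in rows if all(r.get(c) is not None for c in feature_columns)]
    stats = {}
    for c in feature_columns:
        values = [r[c] for r in complete]
        stats[c] = (statistics.mean(values), statistics.stdev(values))
    prepared = []
    for r in complete:
        row = {id_column: r[id_column]}  # keep the identifier for joining back later
        for c in feature_columns:
            mean, sd = stats[c]
            row[c] = (r[c] - mean) / sd if sd else 0.0  # constant column -> 0
        prepared.append(row)
    return prepared


hotels = [
    {"hotel_code": "H1", "rooms": 100, "reviews": 10},
    {"hotel_code": "H2", "rooms": 200, "reviews": None},  # dropped: NULL review count
    {"hotel_code": "H3", "rooms": 300, "reviews": 30},
    {"hotel_code": "H4", "rooms": 200, "reviews": 20},
]
for row in prepare_features(hotels, "hotel_code", ["rooms", "reviews"]):
    print(row)
```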

Side bar: READ THE MANUAL!  Unless you read the detailed documentation, it's unclear how Domo's KMeans handles NULL values or whether any data pre-processing takes place.  This can have a significant impact on results.

With the revelation that we should have used 6 KMeans clusters, we can either adjust our Magic ETL dataflow in Domo, or use R to create a new dataset in Domo (or replace an existing one)!

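#cluster_df: hotel_code plus the 6-cluster assignments from the R analysis (not shown)
#create a new dataset in Domo...
DomoR::create(cluster_df, "R_hotel_KMeans")
#...or overwrite the data in an existing dataset, identified by its dataset ID
DomoR::replace_ds('yourOutputDataSetID', cluster_df)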

In Domo's Magic ETL, we'll bind both cluster results to the original dataset for comparison.
  1. Add the R generated dataset (R_hotel_KMeans) to Magic ETL
  2. JOIN the datasets on the hotel_code
  3. SELECT columns to keep (remove any duplicate or unwanted columns)
  4. Revel in the glory!
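The same join-and-compare can be sketched in Python: inner-join the two cluster outputs on hotel_code, then count how the hotels in each 5-cluster group are distributed across the 6-cluster groups. The assumption that both outputs name their column `cluster` is hypothetical, because the article doesn't show the R output's column name.

```python
from collections import Counter


def compare_clusterings(magic_etl_rows, r_rows, key="hotel_code"):
    """Inner-join two cluster outputs on `key` and count (etl_cluster, r_cluster) pairs."""
    r_by_key = {row[key]: row["cluster"] for row in r_rows}
    crosstab = Counter()
    for row in magic_etl_rows:
        if row[key] in r_by_key:  # inner join: skip hotels missing from either side
            crosstab[(row["cluster"], r_by_key[row[key]])] += 1
    return crosstab


magic_etl = [
    {"hotel_code": "A", "cluster": 1},
    {"hotel_code": "B", "cluster": 1},
    {"hotel_code": "C", "cluster": 2},
    {"hotel_code": "D", "cluster": 3},
]
r_output = [
    {"hotel_code": "A", "cluster": 5},
    {"hotel_code": "B", "cluster": 6},
    {"hotel_code": "C", "cluster": 2},
]
print(dict(compare_clusterings(magic_etl, r_output)))
```

The numbers on either side are arbitrary labels. What matters is whether a group stays together (one dominant pair) or splits across several clusters.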

Wrap it up

NEXT STEPS:  Create 'Cluster Personas.'  
Typically, organizations would use clusters of attributes to define 'hotel personas'. 
  1. 'Budget Business Hotel'
  2. 'Luxury Business Hotel'
  3. 'Backpackers Delight'
  4. 'Destination for Honeymooners'
  5. 'Weekend Treat'
  6. 'Long-Term Stay'
Because the attributes of 'Budget Business Hotel' and 'Backpackers Delight' are similar, with 5 clusters maybe they get combined into 'Budget Hotel'. Alternatively, perhaps 'Weekend Treat' combines with 'Destination for Honeymooners' into a 'Luxury Short Stay' persona.

From there, the outcome of this clustering and persona generation may influence discount packages or marketing campaigns to appeal to specific travelers.

REMEMBER:  Clustering is not intended to predict Ratings!  If you're still stuck on that, review the purpose of clustering.  
In this quick article, I give the most cursory of overviews of how Domo's native (and beta) features can enable data science workflows.  We also explored how other analytic workflows can integrate into the Domo offering.

If you're interested in finding out more, I can connect you with a Domo Sales representative (not me), or I'd love to talk to you about your analytic workflows and use case to see if Domo's platform might be a good match for your organization!

Disclaimer: the views, thoughts, and opinions expressed in this post belong solely to the author and do not necessarily reflect the views of the author's employer, Domo Inc.
Jae Wilson

    Categories

    All
    Automation
    Basic Training Series
    Business Intelligence
    Connect
    Dashboard
    Data Pipeline
    Data Science
    Domo
    Excel Tricks
    Executive Training & Leadership
    Extract
    Jet Enterprise
    Jet Essentials
    New Release
    NP Function
    Onyx Reporting
    Planning
    Power Pivot
    Python
    Report Writing
    Statistics And Analytics
    TimeXtender
    Visualization

London, UK
jae@OnyxReporting.com
+44 747.426.1224
Jet Reports Certified Trainer