New data initiatives and BI projects are fickle things. You only get one shot at making a good first impression with end users and senior stakeholders, and the last thing you want them saying is, "I don't trust these numbers."
If you have a stake in project adoption but most of these checkpoints read like technical jargon, give me a call. I'd be happy to sit with your development team and co-review your data pipeline.
Just throw more technology at it
I'm going to say something that will make me unpopular with IT/IS vendors: be wary of throwing more hardware at process and pipeline problems. Your IT vendor sees green when they hear, "yeah, we need more RAM" or "we've pushed the system to its limits, we need more horsepower." But more compute often comes at a steep cost with diminishing returns, and you're just delaying pipeline failure.
In my experience, refactoring troubled pipelines to adhere to these resiliency checkpoints provides more sustainable performance and growth.
9 resiliency checkpoints to make your pipeline more efficient and sustainable
** full disclosure: in addition to summarizing a decade of experience in BI development, some of these tips come from partnering with strategic architects and project managers at Domo.
Key focus areas for maturing Domo clients
Realistically, during development, initial implementation, and early adoption, strict adherence to these guidelines is probably overkill, and may even be detrimental to agile iteration. That said, as projects move into production, adoption and data volumes increase, forcing you to refactor pipelines to sustain performance and scalability.
Minimize the use of recursive dataflows.
There is a time and place for recursive ETL patterns, but this pattern will not scale with data volume. Instead, consider UPSERT or partition schemes when reprocessing large datasets.
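To make the contrast concrete, here's a minimal pandas sketch, outside of Domo, with hypothetical datasets keyed on order_id. A recursive dataflow re-reads and re-writes the entire history every run, so its cost grows with total volume; an upsert only needs to touch the keys that changed.

```python
import pandas as pd

# Output of previous runs: the full history, already materialized.
history = pd.DataFrame({
    "order_id": [1, 2, 3],
    "status":   ["shipped", "pending", "pending"],
})

# Today's increment: one updated row (order 2) and one new row (order 4).
increment = pd.DataFrame({
    "order_id": [2, 4],
    "status":   ["shipped", "pending"],
})

# Recursive pattern: the dataflow reads its own output, merges the
# increment, and rewrites the entire history on every run.
recursive_result = (
    pd.concat([history, increment], ignore_index=True)
      .drop_duplicates(subset="order_id", keep="last")
)

# Upsert semantics: matching keys are replaced, new keys are inserted,
# and untouched rows stay exactly where they are.
unchanged = history[~history["order_id"].isin(increment["order_id"])]
upsert_result = pd.concat([unchanged, increment], ignore_index=True)
```

Both produce the same rows; the difference is how much work each run does as the history grows.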
Minimize the use of APPEND with datasets if you're processing the data in Domo ETL.
This restates the problem with recursive dataflows: for optimum performance, avoid re-transforming data that hasn't changed. If new rows are simply being appended to the input, transform only the new rows and append the result to the output, as sketched below.
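Here's a rough illustration of that incremental pattern in pandas. The file names, column names, and high-water-mark logic are all hypothetical; the point is that each run's cost tracks the size of the increment, not the full history.

```python
import pandas as pd

def transform(rows: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for whatever ETL logic the pipeline applies.
    rows = rows.copy()
    rows["amount_usd"] = rows["amount"] * rows["fx_rate"]
    return rows

# Output dataset produced by previous runs, with a load timestamp.
output = pd.read_csv("transformed_output.csv", parse_dates=["loaded_at"])

# Full input dataset that new rows keep getting appended to.
source = pd.read_csv("raw_input.csv", parse_dates=["loaded_at"])

# High-water mark: the newest row we've already processed.
watermark = output["loaded_at"].max()

# Transform only the unprocessed increment, then append the result.
new_rows = source[source["loaded_at"] > watermark]
output = pd.concat([output, transform(new_rows)], ignore_index=True)
```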
Be smart about multi-stage transformations.
In Domo, passing data between the storage layer and the ETL engine frequently consumes the most time in your data pipeline. Design workflows that minimize unnecessary data transfer and maximize the availability of re-usable datasets in the data mart.
Consider the following example. In a campaign analysis, some dashboards need data at the granularity of one row per campaign, while others need one row for each day the campaign was active. Rather than re-aggregating the daily data inside every dataflow that needs campaign-level numbers, stage the aggregation once: materialize a campaign-level dataset in the data mart and let downstream dataflows read from it.
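A rough sketch of that staging step, again in pandas with hypothetical file and column names: the daily-grain data is rolled up to one row per campaign a single time, and the result is saved as its own dataset for downstream reuse.

```python
import pandas as pd

# Daily-grain source: one row per campaign per active day.
daily = pd.read_csv("campaign_daily.csv", parse_dates=["date"])

# Stage the expensive aggregation once: one row per campaign.
campaign_level = (
    daily.groupby("campaign_id", as_index=False)
         .agg(
             start_date=("date", "min"),
             end_date=("date", "max"),
             active_days=("date", "nunique"),
             total_spend=("spend", "sum"),
             total_clicks=("clicks", "sum"),
         )
)

# Materialize it as its own dataset so downstream dataflows read the
# small, pre-aggregated table instead of re-rolling the daily grain.
campaign_level.to_csv("campaign_summary.csv", index=False)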
Use Fusions. For goodness' sake, use Fusions! Fusions combine datasets without another full pass through the ETL engine, so simple joins and unions skip the storage-to-ETL round trip described above.
Self-service BI requires a balance between flexibility and performance
Any data engineer will tell you that performance comes at the expense of flexibility (or a higher price tag). It sounds bad, and every IT/IS vendor will play down that fact, but it's an inescapable truth of self-service models.
Domo's off-the-shelf performance and ease of use empower citizen developers and business analysts to design and deliver their own small-scale data projects. As these projects mature, the demand for flexibility goes down: schemas stabilize as end-user engagement drives the final iteration requests. From there, citizen developers can comfortably hand their mature data pipelines over to data engineers, who can then refactor them for performance.
-- END --
My name is Jae Wilson. I partner with clients to design, implement, monitor, and refactor data pipelines and ETL dataflows built on the Domo stack.