It is no secret that data offers a ridiculously important competitive advantage to those who know how to turn it into straightforward information. However, many companies do not make the best out of the available data. Transforming raw data into actionable insights is the key to improved decision-making.
A challenge for growing organizations
As organizations grow, they store and produce more and more data. Based on immediate needs, each system or business unit opts for a database option. But it soon turns out that these many databases use different technologies, different data organization and/or even different data formats. These heterogeneous databases turn data consolidation into a nightmare.
This is the common problem our client was facing when he turned to us. His accounting system was not unified. It involved its invoicing system, its marketing data and its consignment management software. All platforms – Filemaker, MySQL, AWS – were different and they were cohabitating on different environments, databases, vpns, etc.
This interfered with the readability of the data and resulted in the absence of a unified accounting source of trust. Long story short, an ETL problem.
The limits of scripts based synchronization
To deal with that problem, our client was running scripts based synchronizations coded in Python and Bash. It became an impediment when trying to validate processes because the complexity of the data sources made the error management a chaotic procedure. Automatically notifying managers about error occurrences, and notifying developers the cause of the mentioned errors could be described as a pain in the neck.
Debugging accounting reconciliation differences with multiple scripts in different languages was another big issue. Teams could spend hours trying to guess where the problem was.
The problems were not only on the development side but also impacted production environments. The lack of good logging practices made the team blind over the quality of the reported information and could not accurately decide if the report passed the tests before using it.
Robust modeling with Pentaho Data Integration
Changing approach was unavoidable, and so was choosing the right tool to tackle the pain and needs of this company.
- Complex data integration from the different source systems
- Lack of error management strategy
- Providing actionable accounting insights
When it comes to challenging these complex situations, Pentaho Data Integration (aka Kettle), the ETL tool part of the Pentaho suite, is a robust tool that should not be underestimated.
And here is why we decided to go for Pentaho.
- It is a powerful tool to integrate multiple data sources and processes (such as sending emails, downloading documents, scheduling actions…) into a smoothly functioning alliance
- It spares manual logging of information, thereby reducing error probabilities
- It offers robust error handling possibilities through process testing
Pentaho data integration prepares and blends data to create a complete picture of your business that drives actionable insights.”
- We modeled the problem “synchronising data sources”.
- We adapted the different sources, removed useless information and aggregated when needed.
- We tested the intermediate information. If it automatically passed the tests, a report was created and a notification was sent by email to confirm that the process had been successfully completed.
It is when things go wrong that the power of Pentaho best stands out. It proved to be remarkable to identify why a process is not completed when dealing with flows, schedules, testing and notifications. Now if an execution fails, we have a very detailed log to track the problem…and fix it!
Going further: an incredible Business Intelligence potential
And that is not it. Pentaho has an enormous potential that deserves to be exploited to not only load the information but deeply analyze it and let it do the talking.
We could create a Business Intelligence cube by creating dimensions. Or create useful dynamic reports using Pentaho Business Analytics. Thanks to data visualization tools, analysis could become almost instantaneous. And we could also integrate PDI with Hadoop to take advantage of tons of information that are currently squandered.
It would be great to hear about your experiences with Pentaho or other ETL tools. Comment section is all yours!