We live in a world that’s generating and collecting seemingly infinite amounts of data. In its raw form, that high volume of data requires transformation: the cleansing, modeling and manipulation that prepare it to inform insights.
Most transformations are performed in SQL, the standard language analysts use to query and manipulate data. But the methods and tools used to manage that SQL are often imperative and procedural rather than declarative and modular.
While stored procedures have given analysts and engineers the ability to create and reuse SQL queries, they were never intended to act as the foundation for today’s data transformation needs.
What is a stored procedure?
Stored procedures consist of one or more SQL statements that are used repeatedly. These statements are stored on your database so that you can “call” or “execute” them without rewriting the underlying logic every time.
Stored procedures provide real time-saving benefits, since you reuse the same, consistent logic for each procedural task. And because they are compiled and stored on the database, they are often more performant than equivalent one-off queries.
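As a minimal sketch of the pattern (the procedure, schema and table names here are hypothetical, and exact syntax varies by database; this follows PostgreSQL), defining and calling a stored procedure might look like this:

```sql
-- Define the procedure once; the database stores the logic server-side.
CREATE OR REPLACE PROCEDURE refresh_daily_revenue()
LANGUAGE SQL
AS $$
  DELETE FROM analytics.daily_revenue
  WHERE revenue_date = CURRENT_DATE;

  INSERT INTO analytics.daily_revenue (revenue_date, total_revenue)
  SELECT order_date, SUM(amount)
  FROM raw.orders
  WHERE order_date = CURRENT_DATE
  GROUP BY order_date;
$$;

-- Execute the stored logic on demand, without rewriting it.
CALL refresh_daily_revenue();
```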
In the eyes of a database administrator (DBA), this is perfectly sufficient for tasks like restricting access to sensitive data or improving database performance. But stored procedures introduce inefficiencies into analytics data pipelines, especially as your team and organization scale.
Where stored procedures fail
The increasing demand for clear, concise and consistent data has put serious stress on the functionality of stored procedures. While they might suffice to handle the volume of data, they were never designed for the needs of modern data analytics teams. In fact, relying on stored procedures can put companies at serious risk of accumulating vast technical debt.
Documentation and visibility
Stored procedures lack built-in functionality for documentation. Any addition or modification is untracked and applied immediately, with no accompanying record of what was changed or why.
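For example, redefining the (hypothetical) procedure from the sketch above silently overwrites the previous version; unless someone saved a copy elsewhere, there is no history to consult:

```sql
-- CREATE OR REPLACE swaps in the new logic immediately. The old
-- definition is gone, and nothing records who changed it or why.
CREATE OR REPLACE PROCEDURE refresh_daily_revenue()
LANGUAGE SQL
AS $$
  DELETE FROM analytics.daily_revenue;  -- revised logic, reason unrecorded
$$;
```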
If your company is lucky, you might find some comments in the code itself explaining the logic. If your company is really lucky, you might find a timestamp alongside those comments. Some companies even try to keep tabs on their stored procedures by saving copies in a shared drive, which is unsurprisingly difficult to maintain.
But typically, most knowledge of a stored procedure is held by its author. If that person leaves the company, as employees have been doing at record rates, they take that knowledge with them. This leaves pipelines vulnerable to breakage if there’s no official knowledge transfer to whoever must update and maintain the logic.
Logging and debugging
Similar to their lack of documentation, stored procedures also lack native logging functionality. When called or executed, a stored procedure simply responds with whatever output was written into the underlying code, meaning that debugging is as easy or as hard as the author made it.
As the complexity of a stored procedure increases, so does the difficulty of debugging it. The author thus becomes a major liability when business-critical reports are built on stored procedures that only one or a few people built or understand.
Flexibility and scalability
Stored procedures are best used for things that rarely or never change. Have you heard the infamous quote, “that’s the way we have always done it”? This is often the mantra adopted by companies reliant on legacy transformation infrastructure.
It’s inevitable that data and reporting will change over time as businesses evolve and scale. Stored procedures slow growth, especially for fast-moving businesses that need flexible transformations and reporting. Why let your data transformation process limit your data strategy?
Many data-driven companies rely on stored procedures to power their critical business decisions. Just as many want to modernize their transformation technology and move off of that reliance. The barrier to this modernization, though, is often resourcing and timing.
While these are legitimate concerns, there are straightforward ways to chip away at the issue at hand.
Better data transformations with dbt™ and Fivetran
The answer to this problem is actually simple: address it head on. There is no better time than the present to slash technical debt, especially when it comes to modernizing your transformation pipelines.
Arguably the best-in-class tool for addressing analytical tech debt is dbt™ by dbt Labs. They’ve built an amazing product that lets teams quickly and collaboratively deploy analytics code following software engineering best practices like modularity, portability, CI/CD and documentation.
A dbt project provides full documentation: descriptions of each model, including its business use case and value, plus markdown files for more in-depth write-ups. These populate a user-friendly documentation site your company can use to understand your transformations holistically.
Once your project is written and documented, logging and debugging become easier as well. Instead of sifting through hundreds of lines of procedural code, a dbt project breaks transformations into digestible, modular steps, and because each step is run and logged independently, debugging becomes straightforward.
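As a rough sketch (the model, source and column names are hypothetical, and `source('raw', 'orders')` assumes a source defined elsewhere in the project), the monolithic procedure sketched earlier might become two small models, each built and logged as its own step:

```sql
-- models/stg_orders.sql
-- Staging model: light cleanup of the raw table.
SELECT
    order_id,
    order_date,
    amount
FROM {{ source('raw', 'orders') }}
WHERE order_date IS NOT NULL
```

```sql
-- models/daily_revenue.sql
-- Downstream model: builds on the staging model via ref(), so dbt
-- knows the dependency order and logs each model's run separately.
SELECT
    order_date AS revenue_date,
    SUM(amount) AS total_revenue
FROM {{ ref('stg_orders') }}
GROUP BY order_date
```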
But dbt doesn’t operate alone, and is most impactful when paired with modern data movement technology. That’s why we built our native integration with dbt: Fivetran Transformations for dbt Core™*. This integration allows companies to fully automate and scale their ELT process while leveraging best practices along the way.
You can schedule your dbt transformations to run in unison with data loads to your destination, reducing data latency and computational costs, all while taking advantage of dbt’s full feature set. In addition, we provide a full library of pre-built data models that are fully documented and solve your most common data use cases.
All of these advantages further decrease your reliance on stored procedures, reduce your tech debt — and ultimately modernize your data processes.
If you still need to use a stored procedure, Fivetran can help
While dbt was not designed to call or execute stored procedures, it’s still possible to do so through dbt hooks or macros. Hooks can execute raw SQL commands in the warehouse before or after a model runs, which includes calling stored procedures (e.g. +post-hook: "call cool_stored_procedure();").
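The same hook can also be attached in an individual model’s config block. A minimal sketch, with a hypothetical model wrapping the cool_stored_procedure() call from the example above:

```sql
-- models/orders_model.sql (hypothetical model name)
-- The post_hook runs in the warehouse after this model builds.
{{ config(
    materialized='table',
    post_hook='call cool_stored_procedure();'
) }}

SELECT *
FROM {{ ref('stg_orders') }}
```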
You can then use Fivetran Transformations for dbt Core’s Integrated Scheduling feature to kick off your stored procedures directly after the relevant connectors have finished syncing. The combination of these features allows a seamless, simple approach to migrating from legacy procedures to a modern transformation architecture.
The best time to modernize is now
Together, Fivetran and dbt offer the best possible solution for reducing vulnerabilities in your pipelines.
If you need support getting started, we even offer professional services for integrating dbt Core and Fivetran, so you can migrate completely off stored procedures and into a dbt project where your transformations are documented and productionized.
[CTA_MODULE]