As an automated data movement platform, we frequently discuss how our ELT (Extract-Load-Transform) process is the modern approach to moving data in comparison to the legacy ETL process — but why? Beyond many of the reasons we’ve discussed previously, one of the most significant is when transformations occur in the process.
Transformations are an integral step in moving and centralizing data for actionable use, so let’s discuss what they are and how you can accomplish them faster.
What is “data transformation”?
“Data transformation” is the T in ELT. It’s the process of cleansing, modeling and manipulating raw data to prepare it for analysis. In its raw form, data is not prepared for reporting or generating insights — it requires transformation.
That transformation might include:
- Removing duplicate rows or entries
- Joining disparate tables to generate KPIs
- Removing PII from data analyst tables
- Normalizing data
- Creating calculations or business logic classifications
- And more!
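To make a few of these concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The table and column names (raw_customers, customers_clean, email, country) are made up for illustration. It removes duplicate rows, leaves the PII column out of the analyst-facing table and normalizes a text field.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_customers (id INTEGER, email TEXT, country TEXT);
    INSERT INTO raw_customers VALUES
        (1, 'ana@example.com', 'us'),
        (1, 'ana@example.com', 'us'),
        (2, 'bo@example.com',  'DE');
""")

# Transformation: drop duplicate rows, exclude the PII column (email)
# and normalize country codes to upper case.
conn.execute("""
    CREATE TABLE customers_clean AS
    SELECT DISTINCT id, UPPER(country) AS country
    FROM raw_customers
""")

clean_rows = conn.execute(
    "SELECT id, country FROM customers_clean ORDER BY id"
).fetchall()
print(clean_rows)  # [(1, 'US'), (2, 'DE')]
```

In a real warehouse the same SELECT would typically live in a version-controlled model rather than an ad hoc script.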
While raw data is great, transformed data is powerful! Want to learn more about data transformations? Check out this blog.
What are some use cases for transformed data?
Data transformations ultimately serve an end analytical goal. With transformations you turn raw data into data ready for analysis by:
- Joining marketing data with financial data to understand Customer Acquisition Cost (CAC)
- Normalizing time zones across your global datasets
- Preparing financial data into income statements and balance sheets for end of year reporting
- Creating lead scores out of CRM data to identify highest propensity to buy opportunities (and feeding it back into said CRM platform using reverse ETL)
- Extracting business logic from naming conventions to enrich data
- Modeling CRM data to capture custom fields and hierarchies
- Preparing data for reverse ETL or AI/ML workflows
These are just a few use cases! Anything that makes data "analytics-ready" counts, as transformations prepare your data for use and action.
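As a rough illustration of the first use case, the sketch below joins marketing spend with a count of new customers to compute CAC per month. It again uses sqlite3 in place of a warehouse. The tables (marketing_spend, new_customers) and the figures in them are invented for the example, and a real CAC definition may include more cost categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical inputs: spend from marketing tools, customer counts from finance.
    CREATE TABLE marketing_spend (month TEXT, channel TEXT, spend REAL);
    CREATE TABLE new_customers (month TEXT, customers INTEGER);
    INSERT INTO marketing_spend VALUES
        ('2024-01', 'search', 6000), ('2024-01', 'social', 4000),
        ('2024-02', 'search', 9000);
    INSERT INTO new_customers VALUES ('2024-01', 50), ('2024-02', 30);
""")

# CAC = total marketing spend in a month / customers acquired in that month.
cac_by_month = conn.execute("""
    SELECT s.month, SUM(s.spend) / c.customers AS cac
    FROM marketing_spend AS s
    JOIN new_customers AS c ON c.month = s.month
    GROUP BY s.month, c.customers
    ORDER BY s.month
""").fetchall()
print(cac_by_month)  # [('2024-01', 200.0), ('2024-02', 300.0)]
```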
But too often, this process is manual and time-intensive.
How do analysts and engineers typically transform data?
Most transformations are performed via SQL, the standard language analysts use to query and manipulate data. From there, the process differs.
Many analysts execute SQL manually by running ad hoc queries directly in the warehouse, or by wrapping SQL in a stored procedure that runs on a cron schedule. These options can prove inefficient and costly. Transformations also become hard to manage, or even to identify, when they are performed in silos.
These data models must also be run. In an attempt to “automate” this process, some analysts use orchestration tools to run transformations. This means a data team is managing several disparate tools just to perform ELT.
These tools often rely on cron jobs (cron takes its name from chronos, the Greek word for time), which are essentially timers that run SQL statements on a fixed schedule. But cron jobs are not natively synced to when data loads into the warehouse, meaning transformations running on a timed schedule can miss new data until the next scheduled run.
They also run regardless of whether new data has loaded, incurring unnecessary compute costs in the data warehouse. At a high level, teams using this cron method may see gaps between their freshest data and their reporting, which leads to data latency. The knock-on effects include timeouts in the data warehouse, additional costs and poorly transformed data.
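The trade-off can be seen in a small simulation. This is only a sketch: the load times, the 60-minute interval and the function names (cron_runs, load_triggered_runs) are assumptions made for illustration, not a model of any particular scheduler.

```python
def cron_runs(load_times, interval, horizon):
    """Simulate a timer that fires every `interval` minutes, data or no data."""
    runs = []
    previous_run = 0
    for run_at in range(interval, horizon + 1, interval):
        # How long each load that arrived since the last run has been waiting.
        waits = [run_at - load for load in load_times
                 if previous_run < load <= run_at]
        runs.append({"at": run_at,
                     "new_loads": len(waits),
                     "max_wait": max(waits, default=0)})
        previous_run = run_at
    return runs


def load_triggered_runs(load_times):
    """Simulate runs that start as soon as each load finishes."""
    return [{"at": load, "new_loads": 1, "max_wait": 0} for load in load_times]


load_times = [5, 130]  # minutes after midnight (hypothetical)
cron = cron_runs(load_times, interval=60, horizon=240)
triggered = load_triggered_runs(load_times)

wasted_runs = sum(1 for run in cron if run["new_loads"] == 0)
worst_wait = max(run["max_wait"] for run in cron)
print(len(cron), wasted_runs, worst_wait)  # 4 2 55
print(len(triggered))                      # 2
```

In this toy scenario the timer fires four times, twice with nothing new to transform, and data waits up to 55 minutes. The load-triggered version runs twice with no waiting.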
What’s the modern approach to data transformations?
In order to optimize their workflows and impart data engineering best practices like version control, CI/CD and testing, data analysts and analytics engineers have adopted dbt™.
dbt is an open-source, SQL-based tool from dbt Labs™ that simplifies data transformation and stands out for its ability to expedite the process of transforming data and building data pipelines. Users model their data using SQL SELECT statements, create relationships and dependencies between models, and then materialize those models as tables and views in a data warehouse. From there, it’s easy to turn the models into business intelligence.
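The idea of SELECT-based models with dependencies can be sketched in a few lines of Python. To be clear, this is not how dbt is implemented. In dbt, dependencies are declared with the ref function inside the model's SQL. Here they are listed by hand, the model names (stg_orders, customer_revenue) are hypothetical, and sqlite3 views stand in for warehouse materializations.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO raw_orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 11, 40.0);
""")

# Each hypothetical model is a SELECT statement plus the models it depends on.
models = {
    "customer_revenue": {
        "depends_on": ["stg_orders"],
        "sql": "SELECT customer_id, SUM(amount) AS revenue "
               "FROM stg_orders GROUP BY customer_id",
    },
    "stg_orders": {
        "depends_on": [],
        "sql": "SELECT order_id, customer_id, amount FROM raw_orders",
    },
}

# Order the models so that every dependency is built before its dependents.
graph = {name: model["depends_on"] for name, model in models.items()}
build_order = list(TopologicalSorter(graph).static_order())

# "Materialize" each model as a view, in dependency order.
for name in build_order:
    conn.execute(f"CREATE VIEW {name} AS {models[name]['sql']}")

revenue = conn.execute(
    "SELECT customer_id, revenue FROM customer_revenue ORDER BY customer_id"
).fetchall()
print(build_order)  # ['stg_orders', 'customer_revenue']
print(revenue)      # [(10, 100.0), (11, 40.0)]
```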
To use dbt, customers have two choices. They can self-host the open-source version, dbt Core™*, on their own machine and manage their own Git repository. Or they can buy dbt Cloud™, which provides a scheduler and a web-based integrated development environment (IDE) to help teams develop dbt projects. Some dbt Cloud features are free, while others, aimed at collaboration and enterprise use, are paid.
Fivetran Transformations - Overview
What is Fivetran Transformations?
These difficulties and inefficiencies for data transformations are why we’ve developed a better solution: Fivetran Transformations for dbt Core!
With our Transformations offering, data analysts can automate the transformation process and orchestrate it in harmony with their data load. This allows teams to manage the entire ELT process from within Fivetran, centralizing the ELT process in one platform with increased visibility and governance.
Let’s go through a few different examples of how Fivetran Transformations can help you accelerate your time to value and tackle end-to-end ELT.
All clicks, no code transformations for SaaS connectors
Fivetran’s fully-managed connector for Salesforce helps easily centralize your opportunities, accounts and leads data into your destination of choice. But, your sales and finance teams want more than just access to raw data — they want analysis on their recurring revenue and where teams should focus to close new business bookings.
That requires the data analyst team to take the raw Salesforce data, join it with opportunities including owner and sales data and calculate KPIs like ARR and daily activity.
With Fivetran’s Quickstart data models, all of this foundational work is done for you with clicks alone: no manual code, dbt project or third-party orchestration tools required.
With a few clicks you can:
- Download a data model that transforms your raw data into the analytics-ready tables you need
- Orchestrate your data model runs in synchronization with your connector loads using our “fully integrated scheduling”, reducing computational costs and data latency
- Manage the output model and all upstream dependencies — which are automatically detected — right from the Fivetran platform
You can read more about Quickstart data models here and find all of the SaaS sources we support here.
Data analysts can use the analytics-ready tables to create territory reporting for AEs, feed data back into Salesforce using their preferred reverse ETL tool or generate the topline analysis executives need to monitor the health of their business.
Regardless of your SaaS data source or your use case, Quickstart data models are here to help.
Automated dbt model orchestration and management
Not all data sources are as easy to model. Some hold proprietary data that doesn’t fit as easily into a predefined schema. Take, for example, product categorization data that an ecommerce team might enter into a Google Sheet, or sales data that is stored in your own database, such as MongoDB.
This product data is vital to understanding the performance of your business. But analyzing it alone neglects opportunities for optimization that could maximize your sales.
For example, your marketing team might want to analyze which channels influence sales of a particular product category.
In order to do this analysis you must:
- Join your marketing data across multiple channels - Facebook, Twitter, Google Ads - into one model
- Join marketing data with sales data - held in MongoDB
- Classify products into categories - manually input into Google Sheets - to group specific reporting and optimization
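The three steps above might look roughly like this in SQL. It is a simplified sketch run through sqlite3. Every table name, column and number is invented for illustration, only two channels are shown, and it assumes the MongoDB and Google Sheets data have already been loaded into tables alongside the ad data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, as they might look after loading into the destination.
    CREATE TABLE facebook_ads (product_id INTEGER, spend REAL);
    CREATE TABLE google_ads (product_id INTEGER, spend REAL);
    CREATE TABLE sales (product_id INTEGER, revenue REAL);
    CREATE TABLE product_categories (product_id INTEGER, category TEXT);
    INSERT INTO facebook_ads VALUES (1, 100), (2, 50);
    INSERT INTO google_ads VALUES (1, 200), (3, 80);
    INSERT INTO sales VALUES (1, 900), (2, 300), (3, 160);
    INSERT INTO product_categories VALUES (1, 'shoes'), (2, 'shoes'), (3, 'hats');
""")

category_report = conn.execute("""
    WITH marketing AS (
        -- Step 1: stack the channels into one model.
        SELECT 'facebook' AS channel, product_id, spend FROM facebook_ads
        UNION ALL
        SELECT 'google' AS channel, product_id, spend FROM google_ads
    ),
    spend_by_category AS (
        -- Step 3: classify spend by product category.
        SELECT c.category, m.channel, SUM(m.spend) AS spend
        FROM marketing AS m
        JOIN product_categories AS c ON c.product_id = m.product_id
        GROUP BY c.category, m.channel
    ),
    revenue_by_category AS (
        SELECT c.category, SUM(s.revenue) AS revenue
        FROM sales AS s
        JOIN product_categories AS c ON c.product_id = s.product_id
        GROUP BY c.category
    )
    -- Step 2: put marketing spend next to sales revenue.
    SELECT sp.category, sp.channel, sp.spend, r.revenue
    FROM spend_by_category AS sp
    JOIN revenue_by_category AS r ON r.category = sp.category
    ORDER BY sp.category, sp.channel
""").fetchall()

for row in category_report:
    print(row)
```

Each output row pairs a channel's spend in a category with that category's total revenue. Attributing revenue to individual channels would need additional data that this sketch does not model.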
If your team has adopted dbt, these transformations are built out in their dbt project and orchestrated using a 3rd party tool or independent schedulers.
In cases where the team already has a dbt project in a managed Git repository, our dbt Transformations help you find efficiencies. It’s as easy as connecting your Git repository to Fivetran. Then you can:
- Manage output models from the UI. We will automatically detect all upstream dependencies (tables and connectors) reducing your operational workload.
- Orchestrate the model runs in synchronization with connector load.
This improves your overall data pipeline. You are reducing tech stack complexity, as you use one tool - Fivetran - for both data movement and orchestration. You also reduce data latency, by running models immediately upon load of new data. And, since those models only run when there is new data, you reduce computational costs in your destination, as you aren’t running unnecessary models.
You can read more about our dbt Core integration here.
Data models orchestrated on your schedule
Across Quickstart and dbt Core Transformations, you have the ability to choose the orchestration schedule that suits your business needs. Our innovative “fully integrated scheduling” provides the highest levels of automation and efficiency.
With fully integrated scheduling, we automatically orchestrate the data models to run in your destination upon successful load of new data. This has several benefits. You:
- Reduce data latency, as data is transformed immediately upon load
- Reduce computational costs associated with transformations in the destination, as you only run models when new data is loaded
- Reduce tech stack complexity, as your end-to-end ELT is now automated from one platform
You can read more about our scheduling options here.
Open-source, pre-built data models
We also make custom data modeling easy. Fivetran creates and maintains a robust library of open-source dbt data models that all data teams can utilize. Built to complement our connector entity relationship diagrams (ERDs), they help you turn raw data into the tables you need for analysis.
15 of these data models are available via Quickstart Transformations, with more to come in the future. In the meantime, you can download and customize the data models of your choice in your own dbt project.
This includes our ad reporting data model, which helps you compare ad spend performance across multiple advertising platforms by joining data across multiple sources and many tables. You can read more about the ad reporting data model here.
Monitor your end-to-end ELT pipeline
Regardless of how you choose to transform your data in Fivetran, we provide you with visibility into the health and status of your end-to-end pipelines. If something in the pipeline goes awry, our logs help you pinpoint exactly what needs fixing. Our alerts and notifications ensure that you are informed the moment something happens, keeping your pipelines always-on.
Accelerate your time to insight
ELT is the modern approach to data movement, giving organizations the real-time, actionable stream of data they require to make data-driven decisions. That’s only possible due to the critical step of data transformation that evolves raw data into usable, normalized data ready for analysis.
No matter your level of customization or data model complexity, Fivetran has a data transformation solution for you, with ease and automation built-in.
Ready to automate your entire ELT pipeline and accelerate your time to insight? Get started with a 14-day free trial of Fivetran. Or, if you’re already a Fivetran customer, set up your first Transformation.
*dbt Core is a trademark of dbt Labs, Inc. All rights therein are reserved to dbt Labs, Inc. Fivetran Transformations is not a product or service of or endorsed by dbt Labs, Inc.