Fivetran announced the Fivetran Airflow provider in 2021. Fast forward to today, and thousands of Fivetran connectors are orchestrated together with other components of the modern data stack in Airflow. Now, in collaboration with Astronomer, Fivetran is the first data integration service to provide a deferrable operator for Airflow: the Fivetran async provider. Here, we’ll discuss the motivations for this new package and the new Airflow features that make it possible, before giving some instructions on how it can be added to DAGs.
Airflow architecture before deferrable operators
An Airflow deployment is made up of many components, including at least a database, a webserver, a scheduler and one or more workers. The database maintains a list of all tasks and their state, and the scheduler monitors that database to identify when a DAG has a task that needs to run. The scheduler then sends that task, via an executor, to a worker. A close look at the role of workers in Airflow helps explain the need for deferrable operators. Most Airflow deployments have a finite number of workers (typically 1 or 3; the number can be defined and scaled out via the executor), and each worker can run a finite number of tasks at a time (also defined by the executor, via the worker_concurrency setting). Once this limit is reached, the scheduler queues tasks until a worker has the capacity to accept a new one. A traditional operator or sensor holds its worker slot for the entire time it waits on an external system, so long waits tie up capacity without doing any real work.
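To make the capacity limit concrete, here is a minimal sketch of the arithmetic described above. It is a simplified model rather than Airflow code: the function name and the example numbers are our own illustration, and real deployments are also constrained by pools, parallelism settings and other limits.

```python
def split_running_and_queued(num_tasks: int, num_workers: int, worker_concurrency: int):
    """Return (running, queued) for a simplified model of worker capacity.

    Hypothetical helper for illustration only: it assumes every worker
    exposes exactly `worker_concurrency` slots and ignores pools and
    other Airflow limits.
    """
    total_slots = num_workers * worker_concurrency
    running = min(num_tasks, total_slots)  # tasks that get a worker slot
    queued = num_tasks - running           # tasks left waiting in the queue
    return running, queued


# Illustrative numbers: 2 workers with 16 slots each and 40 ready tasks.
running, queued = split_running_and_queued(40, num_workers=2, worker_concurrency=16)
print(f"running={running}, queued={queued}")
```

In this model, any task that merely waits on an external system still counts toward the occupied slots, which is the cost that deferrable operators are designed to avoid.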
New architecture and the async provider
Deferrable operators and sensors allow Airflow tasks to wait asynchronously. Airflow 2.2 added a new component to the architecture described above: the triggerer. Now, when an Airflow task is waiting for a condition to be met, it can be deferred to the triggerer instead of consuming a worker slot. The triggerer groups all of the deferred operators and sensors together in a single Python process that monitors their status asynchronously, which is well suited to I/O-bound operations like the movement of data that Fivetran performs.
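The idea of one process watching many waits at once can be sketched with plain asyncio. This is a conceptual illustration only, not the triggerer's actual implementation or the Fivetran async provider's API: the function names, connector IDs and poll counts are all hypothetical, and the "sync" is simulated with a sleep rather than a real API call.

```python
import asyncio


async def wait_for_sync(connector_id: str, polls_needed: int, poll_interval: float = 0.01) -> str:
    """Hypothetical trigger: poll until a simulated sync reports completion."""
    for _ in range(polls_needed):
        # Awaiting yields control to the event loop, so other waits can
        # make progress while this one is idle.
        await asyncio.sleep(poll_interval)
    return f"{connector_id}: sync complete"


async def run_triggerer(pending: dict) -> list:
    """Watch every pending wait concurrently inside a single process."""
    triggers = [wait_for_sync(cid, polls) for cid, polls in pending.items()]
    # gather() returns results in the order the coroutines were passed in.
    return await asyncio.gather(*triggers)


def demo() -> list:
    # Made-up connector IDs mapped to the number of polls each one needs.
    pending = {"connector_a": 3, "connector_b": 1, "connector_c": 2}
    return asyncio.run(run_triggerer(pending))


if __name__ == "__main__":
    for line in demo():
        print(line)
```

None of these waits occupies a worker slot in this model; a single event loop interleaves all of them, which is why the approach suits I/O-bound work.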
Try out the Fivetran async provider today
For more information on Astronomer’s Airflow provider, check out their blog post.