It’s no overstatement to say that nearly every company in the world is in some stage of digital transformation. Many are still forming their data strategies, searching for the best way to build reliable data pipelines and figuring out how to leverage scores of new tools to generate value from whatever amounts of data they possess.
This undertaking can prove daunting for even the biggest enterprises. But what if you're a smaller firm or a bootstrapped startup with limited money, time and staff? Maybe the entire data team is a single person, a team of one.
If so, making a case for building data pipelines from scratch and maintaining them in-house is incredibly difficult.
Make the most of lean resources
For starters, creating a data pipeline is a notoriously labor-intensive process that can take months. The lone analyst or engineer will likely have to devote large blocks of time to obtaining access to data sources, designing schemas or data models and building connector frameworks.
Building data pipelines in-house bogs companies down with development work as their businesses grow. Each time a company adds a new information source, it must create another connector, and that means more coding.
Maintenance is also a heavy burden for companies that build their own connectors. API and schema changes must be tracked, and the affected connectors must then be updated by hand. Remember, organizations today are constantly adding data sources, and each one increases the complexity. When data pipelines break, it’s up to the one-person data team to take scarce time out and fix them.
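To make that maintenance burden concrete, here is a minimal Python sketch of the kind of schema check a hand-built connector might need to run before each sync. It is an illustration only: the function name `diff_schema`, the column names and the type labels are all hypothetical, and a real connector would also have to decide what to do once drift is found.

```python
from typing import Dict, List

# Hypothetical representation of a table schema: column name -> type label.
Schema = Dict[str, str]


def diff_schema(expected: Schema, observed: Schema) -> Dict[str, List[str]]:
    """Compare the schema a connector was written for with what the source returns now."""
    shared = set(expected) & set(observed)
    return {
        # Columns the source has started sending that the connector does not know.
        "added": sorted(set(observed) - set(expected)),
        # Columns the connector expects but the source no longer sends.
        "removed": sorted(set(expected) - set(observed)),
        # Columns present on both sides whose type label has changed.
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


if __name__ == "__main__":
    # Illustrative schemas only; not taken from any real system.
    expected = {"id": "int", "amount": "float", "created_at": "timestamp"}
    observed = {"id": "int", "amount": "str", "currency": "str"}
    for kind, columns in diff_schema(expected, observed).items():
        print(kind, columns)
```

Every source needs its own version of this bookkeeping, plus the code that reacts to each kind of change, which is where the maintenance hours go.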
This was the case for Billie, a Berlin-based fintech startup. Igore Chtivelband, co-founder and VP of data, describes its DIY approach as time-consuming and difficult to scale. “Replicating 140 tables is a completely different story from trying to do 20,” Chtivelband said. “Your schema is constantly changing; it’s harder to maintain and you start to have performance problems.”
Companies using an off-the-shelf solution can add new data sources with a few keystrokes. As a result, they are more nimble and scale more easily. For Billie, an automated, out-of-the-box solution lets one person do what once required a team of eight engineers.
Collecting data is relatively easy. What separates the best data programs is the ability to integrate information across an entire business to inform day-to-day decisions.
Automation in action
Raider Express, a Texas-based trucking company, monitors a multitude of metrics in near real time, including fuel efficiency, traffic, refrigeration temperature and tire pressure. The company relies on Fivetran, a fully managed solution, to sync data from SQL Server to Snowflake and tracks 50,000 events daily. As a result, Raider’s operations team is always prepared to act on any issue.
“There’s zero upkeep in Fivetran,” said Dan Eggleton, chief financial officer at Raider Express. “We add a new table in our database, and it automatically populates the data into Snowflake. There’s no thinking about it on our end. I can’t imagine not having that functionality. Before Fivetran, there was a lot we couldn’t do.”
Final thoughts
It’s important to remember that data integration is a bear to build and manage. It demands a lot of coding and expertise.
For resource-strapped companies, devoting the staff’s only data analyst or engineer to building or repairing busted pipelines leaves that employee less time for what matters most: refining algorithms, analyzing data and generating the kind of insights that help companies find product-market fit, secure funding and ultimately grow.