Building Trino data pipelines with SQL or Python

Description

Trino and Starburst are well known for their performance and scalability when running analytical queries, but they are equally capable platforms for running data engineering transformation pipelines. This talk explains how fault-tolerant execution mode adds robustness and reliability to the long-running jobs that are prevalent in such pipelines. These pipelines can be written in SQL, but there are Python options as well. Using the PyStarburst and Ibis frameworks, the talk shows how data engineers can implement their transformation logic with familiar DataFrame APIs. Finally, each option is reviewed for its applicability to Trino and Starburst clusters.

Key Takeaways

Understand how Trino and Starburst can be used for data engineering transformation pipelines
Learn about fault-tolerant execution mode for robust long-running jobs
Explore Python options with PyStarburst and Ibis frameworks
Discover how to use familiar DataFrame APIs for transformation logic
Evaluate the applicability of these options to Trino and Starburst clusters

Speaker

Lester Martin

Lead Developer Advocate, Starburst Data

Lester Martin leads the DevRel function at Starburst. He is a seasoned developer advocate, trainer, blogger, and data engineer focused on data pipelines and data lake analytics using Starburst, Trino, Iceberg, Hive, Spark, Flink, Kafka, NiFi, NoSQL databases, and, of course, classical RDBMSs. Check out Lester's blog at https://lestermartin.blog.
