Building Trino data pipelines with SQL or Python

30 min · Intermediate · Sydney · Melbourne
Trino · Starburst · Data Engineering · Python · SQL · Data Pipelines

Description

Trino and Starburst are well known for their performance and scalability when running analytical queries, but they are equally valid platforms for executing data engineering transformation pipelines. This talk explains how fault-tolerant execution mode adds robustness and reliability to the long-running jobs that are common in transformation pipelines. These pipelines can be written in SQL, but there are Python options as well. The talk uses the PyStarburst and Ibis frameworks to show how data engineers can implement their transformation logic with familiar DataFrame APIs. Finally, it reviews how well each option applies to Trino and Starburst clusters.

Key Takeaways

  • Understand how Trino and Starburst can be used for data engineering transformation pipelines
  • Learn about fault-tolerant execution mode for robust long-running jobs
  • Explore Python options with PyStarburst and Ibis frameworks
  • Discover how to use familiar DataFrame APIs for transformation logic
  • Evaluate the applicability of these options to Trino and Starburst clusters
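
As a rough illustration of the SQL route described above, a transformation step can be expressed as a CREATE TABLE AS statement and submitted from Python. This is a minimal sketch and not material from the talk: it assumes the open-source `trino` Python client is installed and a cluster is reachable, and the `TransformStep` class, the helper names, the default host, port, and user, and the table names are all hypothetical. Fault-tolerant execution is a cluster-side setting (the `retry-policy` configuration property in Trino), so nothing in this client code turns it on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformStep:
    """One pipeline step: materialize a SELECT into a target table."""
    target: str       # hypothetical fully qualified name, e.g. catalog.schema.table
    select_sql: str   # the transformation logic as a SELECT statement


def render_ctas(step: TransformStep) -> str:
    """Render the step as a CREATE TABLE AS statement."""
    # Drop surrounding whitespace and any trailing semicolon from the SELECT.
    body = step.select_sql.strip().rstrip(";")
    return f"CREATE TABLE {step.target} AS\n{body}"


def run_pipeline(steps, host="localhost", port=8080, user="pipeline"):
    """Run the steps in order against a Trino cluster.

    The client import is deferred so that render_ctas can be used and
    tested without the `trino` package or a cluster being available.
    """
    from trino.dbapi import connect  # assumption: installed via `pip install trino`

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    for step in steps:
        cur.execute(render_ctas(step))
        cur.fetchall()  # drain the result so the statement runs to completion


# Hypothetical two-step pipeline; the catalog, schema, and table names are placeholders.
steps = [
    TransformStep(
        "lake.silver.orders_clean",
        "SELECT order_id, customer_id, total FROM lake.bronze.orders WHERE total IS NOT NULL",
    ),
    TransformStep(
        "lake.gold.customer_totals",
        "SELECT customer_id, sum(total) AS lifetime_total "
        "FROM lake.silver.orders_clean GROUP BY customer_id",
    ),
]

for step in steps:
    print(render_ctas(step))
    print()
```

Keeping the rendering separate from the execution means the generated SQL can be reviewed or unit tested offline. A DataFrame framework such as PyStarburst or Ibis would replace the hand-written SELECT strings with chained method calls.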

Speaker

Lester Martin

Lead Developer Advocate, Starburst Data

Lester Martin leads the DevRel function at Starburst. He is a seasoned developer advocate, trainer, blogger, and data engineer focused on data pipelines & data lake analytics using Starburst, Trino, Iceberg, Hive, Spark, Flink, Kafka, NiFi, NoSQL databases, and, of course, classical RDBMSs. Check out Lester's blog at https://lestermartin.blog.