Mohan Golla
4 min read · Feb 21, 2024

Build an ETL application using the AWS Glue Data Catalog, crawlers, and a Glue Spark ETL job, and use Athena to view the data

In this project, based on a real-world scenario, I acted as the Cloud DevOps engineer and built a streamlined ETL process from components of AWS Glue, a serverless data integration service. The process changes the schema of a CSV file uploaded to an S3 bucket and writes the result in Parquet format. Athena is then used to view the table structure and data.

Below are a few screenshots:

Create the AWS S3 bucket with the required folders
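For readers who prefer code to the console, here is a minimal boto3 sketch of the same step. S3 has no real folders, so the console's "folder" is just an empty object whose key ends in `/`. The bucket name, region, and the `input/customers` folder are assumptions for illustration; only `output/customers` appears in this walkthrough.

```python
def folder_keys(folders):
    """Turn folder names into S3 'folder' keys, each ending with a single '/'."""
    return [f.strip("/") + "/" for f in folders]


def create_bucket_with_folders(bucket, folders, region="us-east-1"):
    """Create the bucket and one empty placeholder object per folder."""
    import boto3  # imported lazily so folder_keys() works without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default and must not be passed as a LocationConstraint
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    for key in folder_keys(folders):
        s3.put_object(Bucket=bucket, Key=key)  # zero-byte object acts as the folder


# Hypothetical usage (bucket name is a placeholder):
# create_bucket_with_folders("my-etl-bucket", ["input/customers", "output/customers"])
```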

Add the data source to the bucket and create the IAM role
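The role has to be assumable by the Glue service. The sketch below shows one way to create such a role with boto3; the role name is a placeholder, and attaching the AWS managed `AWSGlueServiceRole` policy is an assumption, since the post does not say which policies were used. The role would also need read and write access to the project bucket, which is not shown here.

```python
import json

# Trust policy that lets the Glue service assume the role
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policy commonly attached to Glue roles (assumed here, not from the post)
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def create_glue_role(role_name):
    """Create the role and attach the managed Glue service policy."""
    import boto3  # lazy import: the policy document above needs no AWS access

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_SERVICE_POLICY_ARN)
```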

Create the Target database

Create the Glue Crawler

Run the Crawler

Table created
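The database, crawler, and crawler run above can also be scripted. This is a rough boto3 sketch, not the exact configuration used in the screenshots: every name, the role, and the S3 path are placeholders supplied by the caller.

```python
import time


def crawler_params(name, role, database, s3_path):
    """Build the arguments for glue.create_crawler with a single S3 target."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(params, poll_seconds=10):
    """Create the target database and crawler, run it, and list the tables found."""
    import boto3  # lazy import so crawler_params() can be used without boto3

    glue = boto3.client("glue")
    # Raises AlreadyExistsException if the database is already there
    glue.create_database(DatabaseInput={"Name": params["DatabaseName"]})
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])
    # The crawler goes RUNNING -> STOPPING -> READY; wait until it is READY again
    while glue.get_crawler(Name=params["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)
    tables = glue.get_tables(DatabaseName=params["DatabaseName"])["TableList"]
    return [t["Name"] for t in tables]
```

The same two functions could be reused later for the output database and its crawler by pointing `s3_path` at the Parquet folder.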

Table data can be viewed in Athena
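Outside the Athena console, the same preview could be run through boto3. The sketch below assumes a table name, a database name, and an S3 location for query results, none of which are given in the post.

```python
import time


def preview_sql(table, limit=10):
    """Build a simple preview query; only plain identifiers are accepted."""
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return f'SELECT * FROM "{table}" LIMIT {int(limit)}'


def run_query(sql, database, output_location, poll_seconds=1.0):
    """Start the query, wait for it to finish, and return the result rows."""
    import boto3  # lazy import so preview_sql() works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query ended in state {state}")
    # The first row of a SELECT result holds the column headers
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```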

Create the ETL job

Run the ETL job

ETL job ran successfully

Parquet file created in S3 output/customers folder
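The job script itself is not shown in the screenshots. As a rough idea of what a Glue Spark job for this kind of task might look like, here is a sketch that reads the crawled table, applies a schema change with `ApplyMapping`, and writes Parquet to S3. The column names in the usage comment are hypothetical, the `build_mappings` helper is my own, and `run_job` only works inside the Glue job runtime, where the `awsglue` library is available.

```python
import sys


def build_mappings(schema, renames=None, drop=()):
    """Build ApplyMapping tuples: (source_name, source_type, target_name, target_type).

    schema is a list of (column_name, type) pairs; columns in `drop` are omitted
    and columns in `renames` get a new target name. Types are kept unchanged.
    """
    renames = renames or {}
    return [
        (name, dtype, renames.get(name, name), dtype)
        for name, dtype in schema
        if name not in drop
    ]


def run_job(database, table_name, output_path, mappings):
    """Read a catalog table, change its schema, and write it as Parquet."""
    # These imports resolve only inside an AWS Glue Spark job
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    source = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    mapped = ApplyMapping.apply(frame=source, mappings=mappings)
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet",
    )
    job.commit()


# Hypothetical usage inside the job script (column names are made up):
# run_job("input_db", "customers", "s3://my-etl-bucket/output/customers/",
#         build_mappings([("customerid", "long"), ("fullname", "string")],
#                        renames={"customerid": "customer_id"}))
```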

Create the output database to read the parquet file

Create the Crawler

Run the Crawler

Crawler ran successfully and 1 table created

Table can be queried in Athena

This walkthrough shows how AWS Glue, a serverless data integration service, provides both visual and code-based interfaces that make data integration easier.
