Build an ETL application using the AWS Glue Data Catalog, crawlers, and a Glue Spark ETL job, and use Athena to view the data
In this project, based on a real-world scenario, I acted as the Cloud DevOps engineer and built a streamlined ETL process from the components of AWS Glue, a serverless data integration service. The process changes the schema of a CSV file uploaded to an S3 bucket and writes the result in Parquet format; Athena is then used to view the table structure and data.
Below are a few screenshots:
Create the S3 bucket with the required folders
Add the data source to the bucket and create the IAM role
Create the target database
Create the Glue Crawler
Run the Crawler
Table created
Table data can be viewed in Athena
Create the ETL job
Run the ETL job
ETL job ran successfully
Parquet file created in the S3 output/customers folder
Create the output database to read the parquet file
Create the Crawler
Run the Crawler
Crawler ran successfully and 1 table created
Table can be queried in Athena
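The Athena check in the last step could also be scripted rather than run from the console. The following is a minimal sketch, not the method used in this project: the database name, bucket name, and results folder are hypothetical placeholders, and the table name `customers` is assumed from the output/customers folder. It only builds the arguments for Athena's `start_query_execution` call, so it runs without AWS credentials.

```python
# Hypothetical placeholders, not values taken from the project.
OUTPUT_DB = "example_output_db"
RESULTS_LOCATION = "s3://example-etl-bucket/athena-results/"


def build_athena_query(database, table, output_location, limit=10):
    """Return keyword arguments for Athena's start_query_execution call."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {limit}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena, params):
    """Submit the query; `athena` is expected to behave like boto3.client("athena")."""
    response = athena.start_query_execution(**params)
    return response["QueryExecutionId"]


params = build_athena_query(OUTPUT_DB, "customers", RESULTS_LOCATION)
print(params["QueryString"])
```

With credentials configured, `run_query(boto3.client("athena"), params)` would submit the query. Athena runs it asynchronously and writes the results to the given S3 location.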
This walkthrough shows how AWS Glue, a serverless data integration service, provides both visual and code-based interfaces that make data integration easier.
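As an illustration of the code-based interface, the console steps for the input side (database, crawler, ETL job) could be expressed with boto3's Glue client. This is a sketch under assumptions, not the configuration used in the screenshots: every name, the `input/` and `scripts/` folders, and the Glue version are hypothetical placeholders. The Glue client is passed in as a parameter, so the block itself makes no AWS calls.

```python
# All names below are hypothetical placeholders, not values from the project.
BUCKET = "example-etl-bucket"
ROLE = "ExampleGlueServiceRole"
INPUT_DB = "example_input_db"


def crawler_params(name, database, s3_path, role=ROLE):
    """Keyword arguments for glue.create_crawler targeting one S3 prefix."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def job_params(name, script_location, role=ROLE):
    """Keyword arguments for glue.create_job for a Spark ("glueetl") job."""
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumed version
    }


def provision(glue):
    """Create the input database, its crawler, and the ETL job definition.

    `glue` is expected to behave like boto3.client("glue").
    """
    crawler = crawler_params(
        "example-input-crawler", INPUT_DB, f"s3://{BUCKET}/input/"
    )
    job = job_params(
        "example-csv-to-parquet", f"s3://{BUCKET}/scripts/csv_to_parquet.py"
    )
    glue.create_database(DatabaseInput={"Name": INPUT_DB})
    glue.create_crawler(**crawler)
    # start_crawler returns immediately; the crawl itself runs asynchronously.
    glue.start_crawler(Name=crawler["Name"])
    glue.create_job(**job)
    return crawler, job
```

With credentials configured, `provision(boto3.client("glue"))` would issue these calls. The job would then be started with `start_job_run(JobName=...)` once the crawler has finished and the source table exists in the Data Catalog.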