Use Amazon EMR Serverless to run Spark and Hive applications on demand.
In this project, based on a real-world scenario, I acted as the Cloud Specialist and used Amazon EMR Serverless to run jobs, so that we no longer have to manage the underlying infrastructure that comes with EMR.
Below are a few screenshots:
Create a custom trust policy for role1-notebook role
Add permissions to the policy
Create a custom trust policy for role2-serverless execution role
Add permissions to the policy
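For reference, the custom trust policy for the serverless execution role can be sketched in Python. This is a minimal sketch, not the exact policy from the screenshots: the helper name build_trust_policy is hypothetical, and the only service-specific detail is the emr-serverless.amazonaws.com principal that EMR Serverless uses to assume a job execution role.

```python
import json


def build_trust_policy(service_principal: str) -> dict:
    """Return an IAM trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# EMR Serverless assumes the job execution role through this service principal.
serverless_trust_policy = build_trust_policy("emr-serverless.amazonaws.com")

# JSON text that can be pasted into the "Custom trust policy" editor.
serverless_trust_json = json.dumps(serverless_trust_policy, indent=2)
print(serverless_trust_json)
```

The printed JSON can be pasted into the console's custom trust policy editor, or passed as AssumeRolePolicyDocument to the boto3 IAM create_role call if the role is created from code instead.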
Create the S3 bucket with folders
Add the S3 bucket name to the Hive SQL statement
Upload files to scripts folder
Upload the orders CSV file
Create EMR Studio
Create Spark serverless application
Create the job with Submit job
Submit Spark job
Create Hive serverless application
Submit Hive job
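The Spark and Hive job submissions shown in the console can also be expressed as request payloads for the EMR Serverless StartJobRun API. The following is a rough sketch under stated assumptions: the bucket name, script file names, application IDs, role ARN, and the logs, output, and hive folder names are all hypothetical placeholders, and only the scripts folder comes from the steps above.

```python
def spark_job_request(application_id: str, execution_role_arn: str, bucket: str) -> dict:
    """Build StartJobRun arguments for a Spark script stored in the scripts folder."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": "spark-job",  # hypothetical job name
        "jobDriver": {
            "sparkSubmit": {
                # Hypothetical script name; use the file uploaded to scripts/.
                "entryPoint": f"s3://{bucket}/scripts/spark-job.py",
                "entryPointArguments": [f"s3://{bucket}/output/"],
            }
        },
        "configurationOverrides": {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": f"s3://{bucket}/logs/"}
            }
        },
    }


def hive_job_request(application_id: str, execution_role_arn: str, bucket: str) -> dict:
    """Build StartJobRun arguments for a Hive query file stored in the scripts folder."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": "hive-job",  # hypothetical job name
        "jobDriver": {
            # Hypothetical query file name; use the file uploaded to scripts/.
            "hive": {"query": f"s3://{bucket}/scripts/hive-query.sql"}
        },
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "hive-site",
                    "properties": {
                        # Assumed scratch and warehouse locations in the same bucket.
                        "hive.exec.scratchdir": f"s3://{bucket}/hive/scratch",
                        "hive.metastore.warehouse.dir": f"s3://{bucket}/hive/warehouse",
                    },
                }
            ]
        },
    }


# Placeholder identifiers, not values from this project.
ROLE_ARN = "arn:aws:iam::123456789012:role/role2-serverless"
spark_request = spark_job_request("spark-app-id", ROLE_ARN, "my-emr-bucket")
hive_request = hive_job_request("hive-app-id", ROLE_ARN, "my-emr-bucket")
```

With boto3 installed and credentials configured, each payload could be submitted with boto3.client("emr-serverless").start_job_run(**spark_request). Building the payload separately from the API call keeps the request easy to review before anything runs.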
The emrdb database is created after the jobs run successfully
View schema of the table created in AWS Glue
View data in the table using Athena
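The Athena check can be scripted as well. This is a small sketch, assuming a table named orders (a guess based on the uploaded CSV, not confirmed by the screenshots) and a hypothetical S3 location for query results; only the emrdb database name comes from the steps above.

```python
def athena_query_request(database: str, table: str, output_location: str, limit: int = 10) -> dict:
    """Build StartQueryExecution arguments that preview rows from one table."""
    return {
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT {limit}',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# "orders" and the results location are assumptions for illustration.
preview_request = athena_query_request(
    "emrdb", "orders", "s3://my-emr-bucket/athena-results/"
)
print(preview_request["QueryString"])
```

With boto3 available, the payload could be passed to boto3.client("athena").start_query_execution(**preview_request), which returns a QueryExecutionId for fetching the results.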