Mohan Golla
Apr 2, 2024

Use AWS EMR Serverless, a service that lets you run Spark and Hive applications on demand without managing clusters.

In this project, based on a real-world scenario, I acted as the Cloud Specialist and used AWS EMR Serverless to run Spark and Hive jobs, so that we no longer had to manage the underlying infrastructure that comes with a traditional EMR cluster.

Below are a few screenshots of the steps:

Create a custom trust policy for the role1-notebook role

Attach permissions to the role
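The trust policy decides which AWS service may assume the role. The walkthrough does not show the policy text, so the sketch below assumes role1-notebook is the EMR Studio service role, which is trusted by the elasticmapreduce.amazonaws.com principal. The account ID is a placeholder and the aws:SourceAccount condition is an optional hardening step, not something taken from the screenshots.

```python
import json


def studio_trust_policy(account_id: str) -> dict:
    """Trust policy that lets the Amazon EMR service assume the Studio role.

    Assumption: role1-notebook is the EMR Studio service role, so the
    trusted principal is elasticmapreduce.amazonaws.com.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "elasticmapreduce.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Only honour requests made on behalf of this account.
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            }
        ],
    }


if __name__ == "__main__":
    # The JSON can be pasted into the console's "Custom trust policy" editor,
    # or passed as AssumeRolePolicyDocument to boto3's iam.create_role().
    print(json.dumps(studio_trust_policy("111122223333"), indent=2))
```

The policy is built as a Python dict and serialized with json.dumps, which keeps it valid JSON and easy to parameterize per account.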

Create a custom trust policy for the role2-serverless execution role

Attach permissions to the role
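The execution (runtime) role is the identity the Spark and Hive jobs run as, so it is trusted by emr-serverless.amazonaws.com and needs access to the project bucket and the Glue Data Catalog. The permissions below are an illustrative minimum under that assumption; the bucket name is a placeholder and your jobs may need additional actions.

```python
import json


def serverless_trust_policy() -> dict:
    """Trust policy that lets EMR Serverless assume the job runtime role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "emr-serverless.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def serverless_permissions_policy(bucket: str) -> dict:
    """Illustrative permissions for the runtime role (adjust to your jobs).

    Grants read/write on one project bucket plus the Glue Data Catalog
    actions that Spark and Hive typically use to create databases and tables.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProjectBucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                # ListBucket applies to the bucket ARN, object actions to bucket/*.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                "Sid": "GlueCatalogAccess",
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase", "glue:GetDatabases", "glue:CreateDatabase",
                    "glue:GetTable", "glue:GetTables", "glue:CreateTable",
                    "glue:UpdateTable", "glue:GetPartitions",
                    "glue:GetUserDefinedFunctions",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(serverless_trust_policy(), indent=2))
    print(json.dumps(serverless_permissions_policy("my-emr-serverless-bucket"), indent=2))
```

Scoping the S3 statement to a single bucket keeps the role from touching unrelated data; the Glue statement uses "*" for brevity and could be narrowed to the emrdb database ARN.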

Create the S3 bucket with folders

Add the S3 bucket name to the Hive SQL statements
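Rather than editing the bucket name into the SQL by hand, it can be filled in from a template. The Hive script below is hypothetical: only the emrdb database name appears in the walkthrough, while the orders table columns and the input/ prefix are assumptions for illustration.

```python
from string import Template

# Hypothetical Hive script: table name, columns and the input/ prefix are
# illustrative; only the emrdb database name comes from the walkthrough.
HIVE_SQL_TEMPLATE = Template("""\
CREATE DATABASE IF NOT EXISTS emrdb;

CREATE EXTERNAL TABLE IF NOT EXISTS emrdb.orders (
  order_id    STRING,
  customer_id STRING,
  amount      DOUBLE,
  order_date  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://$bucket/input/'
TBLPROPERTIES ('skip.header.line.count'='1');
""")


def render_hive_sql(bucket: str) -> str:
    """Fill the bucket name into the Hive statements.

    Template.substitute raises KeyError for any placeholder without a value,
    so a forgotten variable fails here instead of inside the Hive job.
    """
    return HIVE_SQL_TEMPLATE.substitute(bucket=bucket)


if __name__ == "__main__":
    print(render_hive_sql("my-emr-serverless-bucket"))
```

The rendered text can be saved as the .sql file that is uploaded to the scripts folder.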

Upload the script files to the scripts folder

Upload the orders CSV file
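S3 has no real directories: a "folder" is just a key prefix shared by the objects under it. The sketch below plans where each local file lands. The scripts/ prefix comes from the walkthrough, while input/ and the file names are hypothetical.

```python
from pathlib import Path

# scripts/ is from the walkthrough; input/ is a hypothetical prefix for data.
SCRIPTS_PREFIX = "scripts/"
INPUT_PREFIX = "input/"


def upload_plan(bucket: str, script_files: list[str],
                data_files: list[str]) -> list[tuple[str, str, str]]:
    """Return (local_path, bucket, key) triples for every file to upload."""
    plan = []
    for local in script_files:
        plan.append((local, bucket, SCRIPTS_PREFIX + Path(local).name))
    for local in data_files:
        plan.append((local, bucket, INPUT_PREFIX + Path(local).name))
    return plan


if __name__ == "__main__":
    steps = upload_plan(
        "my-emr-serverless-bucket",
        script_files=["spark-etl.py", "hive-query.sql"],  # hypothetical names
        data_files=["data/orders.csv"],
    )
    for local, bucket, key in steps:
        # With boto3 installed and credentials configured, each step maps to
        # boto3.client("s3").upload_file(local, bucket, key).
        print(f"{local} -> s3://{bucket}/{key}")
```

Keeping the plan separate from the upload call makes it easy to review the target keys before anything is written to S3.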

Create an EMR Studio

Create a Spark serverless application
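The console form corresponds to the EMR Serverless CreateApplication API. The sketch builds its keyword arguments for either a Spark or a Hive application; the release label, capacity cap and idle timeout are assumed values, not the ones used in the screenshots.

```python
VALID_TYPES = {"SPARK", "HIVE"}


def application_request(name: str, app_type: str,
                        release_label: str = "emr-6.15.0") -> dict:
    """Keyword arguments for the EMR Serverless CreateApplication call.

    The release label and capacity limits are illustrative defaults.
    """
    app_type = app_type.upper()
    if app_type not in VALID_TYPES:
        raise ValueError(f"unsupported application type: {app_type}")
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": app_type,
        # Cap total resources so a runaway job cannot scale without bound.
        "maximumCapacity": {"cpu": "16vCPU", "memory": "64GB"},
        # Start automatically on job submission; stop after 15 idle minutes.
        "autoStartConfiguration": {"enabled": True},
        "autoStopConfiguration": {"enabled": True, "idleTimeoutMinutes": 15},
    }


if __name__ == "__main__":
    req = application_request("orders-spark-app", "spark")
    # With boto3: boto3.client("emr-serverless").create_application(**req)
    print(req)
```

Auto-start and auto-stop are what make the application "serverless" in practice: you pay for workers only while jobs run or during the short idle window.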

Configure the job in the Submit job form

Submit the Spark job
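Submitting a job maps to the StartJobRun API with a sparkSubmit job driver. In the sketch below, the script key, the entry-point arguments, the Spark settings and the logs/ prefix are all hypothetical; the application ID and role ARN would come from the previous steps.

```python
def spark_job_request(application_id: str, role_arn: str, bucket: str,
                      script_key: str = "scripts/spark-etl.py") -> dict:
    """Keyword arguments for the EMR Serverless StartJobRun call (Spark)."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,  # the role2-serverless runtime role
        "name": "spark-orders-job",
        "jobDriver": {
            "sparkSubmit": {
                # PySpark script uploaded to the scripts folder.
                "entryPoint": f"s3://{bucket}/{script_key}",
                # Hypothetical arguments the script might read from sys.argv.
                "entryPointArguments": [f"s3://{bucket}/input/", f"s3://{bucket}/output/"],
                "sparkSubmitParameters": "--conf spark.executor.cores=1 --conf spark.executor.memory=4g",
            }
        },
        "configurationOverrides": {
            # Persist driver and executor logs to S3 for troubleshooting.
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": f"s3://{bucket}/logs/"}
            }
        },
    }


if __name__ == "__main__":
    req = spark_job_request(
        "00example-app-id",
        "arn:aws:iam::111122223333:role/role2-serverless",
        "my-emr-serverless-bucket",
    )
    # With boto3: boto3.client("emr-serverless").start_job_run(**req)
    print(req["jobDriver"]["sparkSubmit"]["entryPoint"])
```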

Create a Hive serverless application

Submit the Hive job
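A Hive job uses the same StartJobRun API, but with a hive job driver that points at the SQL file in S3. Hive on EMR Serverless is usually given S3 locations for its scratch and warehouse directories through the hive-site classification. The S3 prefixes and file name below are assumptions for illustration.

```python
def hive_job_request(application_id: str, role_arn: str, bucket: str,
                     query_key: str = "scripts/hive-query.sql") -> dict:
    """Keyword arguments for the EMR Serverless StartJobRun call (Hive)."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,  # the role2-serverless runtime role
        "name": "hive-orders-job",
        "jobDriver": {
            "hive": {
                # The Hive SQL file, with the bucket name already filled in.
                "query": f"s3://{bucket}/{query_key}",
                "parameters": "--hiveconf hive.log.explain.output=false",
            }
        },
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "hive-site",
                    "properties": {
                        # Hypothetical S3 prefixes for Hive scratch and warehouse data.
                        "hive.exec.scratchdir": f"s3://{bucket}/hive/scratch",
                        "hive.metastore.warehouse.dir": f"s3://{bucket}/hive/warehouse",
                    },
                }
            ],
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": f"s3://{bucket}/logs/"}
            },
        },
    }


if __name__ == "__main__":
    req = hive_job_request(
        "00example-app-id",
        "arn:aws:iam::111122223333:role/role2-serverless",
        "my-emr-serverless-bucket",
    )
    # With boto3: boto3.client("emr-serverless").start_job_run(**req)
    print(req["jobDriver"]["hive"]["query"])
```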

The emrdb database is created after the jobs run successfully

View the schema of the table created in the AWS Glue Data Catalog
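The same schema shown in the Glue console is returned by the Glue GetTable API. The helper below flattens that response into (column, type) pairs; the sample response and the orders table name are made up for illustration.

```python
def table_schema(get_table_response: dict) -> list[tuple[str, str]]:
    """Flatten a Glue GetTable response into (column, type) pairs.

    Regular columns live under StorageDescriptor.Columns; partition columns
    are listed separately under PartitionKeys, so both are included.
    """
    table = get_table_response["Table"]
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    partitions = table.get("PartitionKeys", [])
    return [(col["Name"], col["Type"]) for col in columns + partitions]


if __name__ == "__main__":
    # Shape of a boto3.client("glue").get_table(DatabaseName="emrdb", Name="orders")
    # response; the table and its columns here are hypothetical.
    sample = {
        "Table": {
            "Name": "orders",
            "DatabaseName": "emrdb",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "order_id", "Type": "string"},
                    {"Name": "amount", "Type": "double"},
                ]
            },
            "PartitionKeys": [],
        }
    }
    for name, col_type in table_schema(sample):
        print(f"{name}: {col_type}")
```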

View the data in the table using Amazon Athena
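Because the table is registered in the Glue Data Catalog, Athena can query it directly. The sketch builds the arguments for Athena's StartQueryExecution API; the table name and the results location are hypothetical, and Athena needs an S3 output location (or a workgroup that defines one) to store query results.

```python
def athena_preview_request(table: str, output_location: str,
                           database: str = "emrdb", limit: int = 10) -> dict:
    """Keyword arguments for Athena's StartQueryExecution call."""
    # The names are pasted into SQL text, so accept only plain identifiers.
    for ident in (database, table):
        if not ident.isidentifier():
            raise ValueError(f"unexpected identifier: {ident!r}")
    return {
        "QueryString": f"SELECT * FROM {database}.{table} LIMIT {int(limit)}",
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    req = athena_preview_request("orders", "s3://my-emr-serverless-bucket/athena-results/")
    # With boto3: boto3.client("athena").start_query_execution(**req)
    print(req["QueryString"])
```

A LIMIT keeps the preview cheap, since Athena bills by the amount of data scanned.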
