In this use case, we will use a graph dataset with Serverless Spark on R to build a network of the characters present in the Game of Thrones (GOT) books. The lab uses the following Google Cloud services (a sketch of enabling their APIs follows the list):
- Google Cloud Dataproc
- Google Cloud Storage
- Google Artifact Registry
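As a minimal sketch, the APIs for these services can be enabled from Cloud Shell. The service identifiers below are the standard API names; this assumes `gcloud` is authenticated and you have permission to enable APIs on the project.

```bash
# Enable the APIs for the services used in this lab.
# $PROJECT_ID is defined in the variables section further down this page.
gcloud services enable \
  dataproc.googleapis.com \
  storage.googleapis.com \
  artifactregistry.googleapis.com \
  --project=$PROJECT_ID
```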
The following permissions/roles are required to execute the serverless batch (a sketch of granting them is shown after this list):
- Viewer
- Dataproc Editor
- Service Account User
- Storage Admin
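As a hedged example, roles like these can be granted with `gcloud projects add-iam-policy-binding`. The role IDs below (e.g. `roles/dataproc.editor`) are the standard identifiers for the roles listed above, and the email placeholder is hypothetical.

```bash
# Grant the required roles to the identity running the lab
# (shown here for a user account; adjust --member for a service account).
USER_EMAIL=your_user@example.com   # hypothetical placeholder
for ROLE in roles/viewer roles/dataproc.editor roles/iam.serviceAccountUser roles/storage.admin; do
  gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="$ROLE"
done
```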
To perform the lab, complete the following activities (sketched as commands after this list):
1. GCP Prerequisites
2. Spark History Server Setup
3. Uploading scripts and datasets to GCP
4. Creating a custom container image
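As an illustrative sketch only (the local paths, script name, dataset name, Dockerfile, and Artifact Registry repository below are assumptions, not the lab's actual artifacts), the setup activities map to commands like these:

```bash
# 2. Spark History Server: a single-node Dataproc cluster that serves
#    event logs written to GCS by the serverless batches.
gcloud dataproc clusters create $HISTORY_SERVER_NAME \
  --region=$REGION \
  --single-node \
  --enable-component-gateway \
  --properties="spark:spark.history.fs.logDirectory=gs://$BUCKET_CODE/phs/*/spark-job-history"

# 3. Upload scripts and datasets (local paths here are placeholders).
gsutil cp scripts/network_analysis.R gs://$BUCKET_CODE/scripts/
gsutil cp data/got_edges.csv gs://$BUCKET_CODE/data/

# 4. Build and push a custom container image with the R packages the job
#    needs (Dockerfile and repository name are assumptions for illustration).
docker build -t $REGION-docker.pkg.dev/$PROJECT_ID/spark-images/spark-r:1.0 .
docker push $REGION-docker.pkg.dev/$PROJECT_ID/spark-images/spark-r:1.0
```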
Note down the values of the variables below to get started with the lab (an example of setting them is shown after the list):
PROJECT_ID #Current GCP project where we are building our use case
REGION #GCP region where all our resources will be created
SUBNET #Subnet which has Private Google Access enabled
BUCKET_CODE #GCP bucket where our code, data and model files will be stored
HISTORY_SERVER_NAME #Name of the history server which will store our application logs
UMSA #User-managed service account required for the Spark job executions
SERVICE_ACCOUNT=$UMSA@$PROJECT_ID.iam.gserviceaccount.com
NAME=<your_name_here> #Your Unique Identifier
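For example (the values below are placeholders, not the lab's actual configuration), the variables can be set in Cloud Shell before running any of the commands on this page:

```bash
# Placeholder values; substitute your own project details.
PROJECT_ID=my-gcp-project
REGION=us-central1
SUBNET=my-private-subnet
BUCKET_CODE=my-code-bucket
HISTORY_SERVER_NAME=spark-phs
UMSA=serverless-spark-sa
SERVICE_ACCOUNT=$UMSA@$PROJECT_ID.iam.gserviceaccount.com
NAME=jane-doe
```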
The lab consists of the following modules (a sketch of submitting the network-building module as a serverless batch follows the list):
- Understand the Data
- Solution Architecture
- Using the graph dataset to build a network
- Explore the output
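As a non-authoritative sketch, the network-building module runs as a Dataproc serverless batch; the script path and image tag below carry over the assumptions from the earlier sketches.

```bash
# Submit the SparkR script as a Dataproc serverless batch.
gcloud dataproc batches submit spark-r gs://$BUCKET_CODE/scripts/network_analysis.R \
  --region=$REGION \
  --subnet=$SUBNET \
  --service-account=$SERVICE_ACCOUNT \
  --container-image=$REGION-docker.pkg.dev/$PROJECT_ID/spark-images/spark-r:1.0 \
  --history-server-cluster=projects/$PROJECT_ID/regions/$REGION/clusters/$HISTORY_SERVER_NAME \
  --batch=network-analysis-$NAME
```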
There are two ways of performing the lab.
- Using Google Cloud Shell
- Using the GCP console
Please choose one of the methods to execute the lab.