Machine Learning Models
User Clustering
Loyalty programs often categorize users into discrete segments based on the user’s purchase behavior and site engagement, such as Platinum / Gold / Silver / Bronze tiers.
In this example, we showcase how an ML-driven clustering model can be applied to segment users.
K-means clustering
K-means is an unsupervised learning technique for identifying customer segments. Because it is unsupervised, model training requires neither labels nor a train/evaluation split of the data.
Step 1: Cluster users based on the following attributes
The following attributes describe user purchase and engagement behavior and can be used to cluster users into loyalty levels:
- Session count
- Total time spent
- Average order value
- Number of orders
Preprocess the data into training data:
WITH ViewStats AS (
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
    SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS total_time_spend_by_user_in_msec
  FROM
    -- Replace table name.
    ecommerce_public.events_flat_data
  GROUP BY 1
),
OrderStats AS (
  SELECT
    user_pseudo_id,
    AVG(items.price) AS average_order_value,
    COUNT(*) AS no_of_orders
  FROM
    ecommerce_public.events_flat_data,
    UNNEST(items) AS items
  WHERE
    event_name = 'purchase'
  GROUP BY 1
),
UserStats AS (
  SELECT
    v.*,
    o.average_order_value,
    o.no_of_orders
  FROM ViewStats v
  FULL OUTER JOIN OrderStats o
    ON v.user_pseudo_id = o.user_pseudo_id
)
SELECT * EXCEPT (user_pseudo_id) FROM UserStats
Step 2: Create a k-means model
CREATE OR REPLACE MODEL ecommerce_public.user_clusters
OPTIONS (
  model_type = 'kmeans',
  kmeans_init_method = 'KMEANS++',
  num_clusters = 4
) AS
WITH ViewStats AS (
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
    SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS total_time_spend_by_user_in_msec
  FROM
    -- Replace table name.
    ecommerce_public.events_flat_data
  GROUP BY 1
),
OrderStats AS (
  SELECT
    user_pseudo_id,
    AVG(items.price) AS average_order_value,
    COUNT(*) AS no_of_orders
  FROM
    ecommerce_public.events_flat_data,
    UNNEST(items) AS items
  WHERE
    event_name = 'purchase'
  GROUP BY 1
),
UserStats AS (
  SELECT
    v.*,
    o.average_order_value,
    o.no_of_orders
  FROM ViewStats v
  FULL OUTER JOIN OrderStats o
    ON v.user_pseudo_id = o.user_pseudo_id
)
SELECT * EXCEPT (user_pseudo_id) FROM UserStats
**Step 3: Use the ML.PREDICT function to predict a user’s cluster**
Query
WITH ViewStats AS (
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
    SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS total_time_spend_by_user_in_msec
  FROM
    -- Replace table name.
    ecommerce_public.events_flat_data
  GROUP BY 1
),
OrderStats AS (
  SELECT
    user_pseudo_id,
    AVG(items.price) AS average_order_value,
    COUNT(*) AS no_of_orders
  FROM
    ecommerce_public.events_flat_data,
    UNNEST(items) AS items
  WHERE
    event_name = 'purchase'
  GROUP BY 1
),
UserStats AS (
  SELECT
    v.*,
    o.average_order_value,
    o.no_of_orders
  FROM ViewStats v
  FULL OUTER JOIN OrderStats o
    ON v.user_pseudo_id = o.user_pseudo_id
)
SELECT * EXCEPT (nearest_centroids_distance)
FROM ML.PREDICT(
  MODEL ecommerce_public.user_clusters,
  (SELECT * FROM UserStats WHERE user_pseudo_id = '5053424.7664335544'))
Output
Step 4: Use your model to make data-driven decisions
Inference
- Cluster 1 has the most orders, the highest session count, and the most time engaged, making these users premium customers. We can offer them discounts to encourage further purchases.
- Cluster 2 has the highest average order value but very few orders. These are potential customers who tend to spend more and are selective in their purchases. We can recommend highly rated products so that they buy more, increasing their order count. They also have low engagement time, so we can send advertising messages to increase engagement and thereby the number of orders.
- Cluster 3 has more orders than cluster 2 but a very low session count. Here again we can send messages to increase user engagement.
- Cluster 4 has both a very low session count and a low number of orders. This segment can be characterized as our bronze segment.
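The per-cluster comparison above can be read directly from the trained model. As a minimal sketch (assuming the ecommerce_public.user_clusters model created in Step 2), ML.CENTROIDS lists each centroid's value for every input feature:

-- Inspect the learned centroids to compare segments feature by feature.
SELECT
  centroid_id,
  feature,
  ROUND(numerical_value, 2) AS value
FROM ML.CENTROIDS(MODEL ecommerce_public.user_clusters)
ORDER BY centroid_id, feature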
Product Recommendation
A common use case for eCommerce is product recommendations. Customers who are presented with contextual, personalized products tend to have higher average order values. Most retailers showcase “Customers who bought this also bought these” during the purchase process, which helps the user make quick purchase decisions based on recommendations. For contextual personalization, it is important that personalization also takes into account the context – which products the customer has been viewing recently. Other promotions, such as “10% off Product X”, can be targeted to the customer based on the interest shown in a particular category or product.
Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users. This example uses matrix factorization for creating a recommendation engine.
Matrix factorization
Matrix factorization (collaborative filtering) is one of the most common and effective methods of creating recommendation systems.
There are two types of matrix factorization based on the kind of user feedback:
- Explicit: With explicit feedback, the dataset must indicate a user’s preference for a product, like a rating between 1 and 5 stars.
- Implicit: Implicit preferences for the user need to be determined based on their purchase behavior.
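Only the model options change between the two feedback types. As a minimal sketch of the explicit variant (the ecommerce_public.product_ratings table and its star_rating column are illustrative placeholders; the implicit variant used in this document is shown in Step 2 below):

CREATE OR REPLACE MODEL ecommerce_public.product_recommender_explicit
OPTIONS (
  model_type = 'matrix_factorization',
  feedback_type = 'explicit',        -- ratings are given directly by users
  user_col = 'user_pseudo_id',
  item_col = 'item_name',
  rating_col = 'star_rating'
) AS
SELECT user_pseudo_id, item_name, star_rating
FROM ecommerce_public.product_ratings  -- hypothetical table of 1-5 star ratings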
Pre-process the data into training data
With matrix factorization, in order to train the model you will need a table with a user ID, an item ID, and a rating derived from the click-stream events (here, the products viewed and purchased).
**Recommendations**
Product affinity - based on products bought together historically (implicit feedback)
**Step 1: Preprocess the data into training data**
Query:
WITH UniqueUsersList AS (  ## get all unique users
  SELECT DISTINCT user_pseudo_id FROM ecommerce_public.events_flat_data
),
UserPurchaseEvents AS (  ## get all purchased items per user
  SELECT user_pseudo_id, items.item_name
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase'
  GROUP BY 1, 2
),
ProductBoughtTogetherHistorically AS (  ## get all items which were bought together in a single order
  SELECT user_pseudo_id AS prevUserId, STRING_AGG(items.item_name, ', ') AS productsBoughtTogether
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase' AND ecommerce.unique_items > 1
  GROUP BY 1
),
UserProductRecommendations AS (  ## for each item a user purchased, list the other items bought together historically
  SELECT user_pseudo_id, cust_prev_purchase, SPLIT(productsBoughtTogether, ',') AS product
  FROM (
    SELECT b.*, a.*
    FROM ProductBoughtTogetherHistorically a,
      (SELECT user_pseudo_id, item_name AS cust_prev_purchase
       FROM UserPurchaseEvents
       WHERE user_pseudo_id IN (SELECT user_pseudo_id FROM UniqueUsersList)) b
    WHERE a.productsBoughtTogether LIKE CONCAT('%', CAST(b.cust_prev_purchase AS STRING), '%')  ## compare each purchased item with orders bought historically
      AND user_pseudo_id != prevUserId  ## skip the user's own historical orders
  )
),
UserProductMatrix AS (
  SELECT
    UserProductRecommendations.user_pseudo_id,
    product_recommended,
    ecommerce_public.getratingbasedonProduct(TRIM(product_recommended)) AS rating
  FROM UserProductRecommendations
  CROSS JOIN UNNEST(UserProductRecommendations.product) AS product_recommended
  WHERE TRIM(product_recommended) <> TRIM(UserProductRecommendations.cust_prev_purchase)  ## drop a recommendation identical to the purchased product
  GROUP BY user_pseudo_id, product_recommended
)
SELECT *
FROM UserProductMatrix l
WHERE NOT EXISTS (
  SELECT 1 FROM UserPurchaseEvents
  WHERE user_pseudo_id = l.user_pseudo_id
    AND TRIM(item_name) = TRIM(l.product_recommended)  ## remove items the user has already bought
)
ORDER BY user_pseudo_id
The rating used above comes from a SQL UDF, getratingbasedonProduct, which sums the purchased quantity of the given product across all purchase events:

CREATE OR REPLACE FUNCTION ecommerce_public.getratingbasedonProduct(product STRING) AS ((
  SELECT SUM(CAST(items.quantity AS FLOAT64))
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase' AND items.item_name = product
));
Training data
**Step 2: Create a BigQuery ML model**
Query:
#standardSQL
CREATE OR REPLACE MODEL ecommerce_public.product_affinity
OPTIONS (
  model_type = 'matrix_factorization',
  feedback_type = 'implicit',
  user_col = 'user_pseudo_id',
  item_col = 'product_recommended',
  rating_col = 'rating',
  l2_reg = 30,
  num_factors = 15
) AS
WITH UniqueUsersList AS (  ## get all unique users
  SELECT DISTINCT user_pseudo_id FROM ecommerce_public.events_flat_data
),
UserPurchaseEvents AS (  ## get all purchased items per user
  SELECT user_pseudo_id, items.item_name
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase'
  GROUP BY 1, 2
),
ProductBoughtTogetherHistorically AS (  ## get all items which were bought together in a single order
  SELECT user_pseudo_id AS prevUserId, STRING_AGG(items.item_name, ', ') AS productsBoughtTogether
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase' AND ecommerce.unique_items > 1
  GROUP BY 1
),
UserProductRecommendations AS (  ## for each item a user purchased, list the other items bought together historically
  SELECT user_pseudo_id, cust_prev_purchase, SPLIT(productsBoughtTogether, ',') AS product
  FROM (
    SELECT b.*, a.*
    FROM ProductBoughtTogetherHistorically a,
      (SELECT user_pseudo_id, item_name AS cust_prev_purchase
       FROM UserPurchaseEvents
       WHERE user_pseudo_id IN (SELECT user_pseudo_id FROM UniqueUsersList)) b
    WHERE a.productsBoughtTogether LIKE CONCAT('%', CAST(b.cust_prev_purchase AS STRING), '%')  ## compare each purchased item with orders bought historically
      AND user_pseudo_id != prevUserId  ## skip the user's own historical orders
  )
),
UserProductMatrix AS (
  SELECT
    UserProductRecommendations.user_pseudo_id,
    product_recommended,
    ecommerce_public.getratingbasedonProduct(TRIM(product_recommended)) AS rating
  FROM UserProductRecommendations
  CROSS JOIN UNNEST(UserProductRecommendations.product) AS product_recommended
  WHERE TRIM(product_recommended) <> TRIM(UserProductRecommendations.cust_prev_purchase)  ## drop a recommendation identical to the purchased product
  GROUP BY user_pseudo_id, product_recommended
)
SELECT *
FROM UserProductMatrix l
WHERE NOT EXISTS (
  SELECT 1 FROM UserPurchaseEvents
  WHERE user_pseudo_id = l.user_pseudo_id
    AND TRIM(item_name) = TRIM(l.product_recommended)  ## remove items the user has already bought
)
ORDER BY user_pseudo_id
**Step 3: Evaluate**
Query
#standardSQL
SELECT
*
FROM
ML.EVALUATE(MODEL ecommerce_public.product_affinity,
(
WITH UniqueUsersList AS (  ## get all unique users
  SELECT DISTINCT user_pseudo_id FROM ecommerce_public.events_flat_data
),
UserPurchaseEvents AS (  ## get all purchased items per user
  SELECT user_pseudo_id, items.item_name
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase'
  GROUP BY 1, 2
),
ProductBoughtTogetherHistorically AS (  ## get all items which were bought together in a single order
  SELECT user_pseudo_id AS prevUserId, STRING_AGG(items.item_name, ', ') AS productsBoughtTogether
  FROM ecommerce_public.events_flat_data, UNNEST(items) AS items
  WHERE event_name = 'purchase' AND ecommerce.unique_items > 1
  GROUP BY 1
),
UserProductRecommendations AS (  ## for each item a user purchased, list the other items bought together historically
  SELECT user_pseudo_id, cust_prev_purchase, SPLIT(productsBoughtTogether, ',') AS product
  FROM (
    SELECT b.*, a.*
    FROM ProductBoughtTogetherHistorically a,
      (SELECT user_pseudo_id, item_name AS cust_prev_purchase
       FROM UserPurchaseEvents
       WHERE user_pseudo_id IN (SELECT user_pseudo_id FROM UniqueUsersList)) b
    WHERE a.productsBoughtTogether LIKE CONCAT('%', CAST(b.cust_prev_purchase AS STRING), '%')  ## compare each purchased item with orders bought historically
      AND user_pseudo_id != prevUserId  ## skip the user's own historical orders
  )
),
UserProductMatrix AS (
  SELECT
    UserProductRecommendations.user_pseudo_id,
    product_recommended,
    ecommerce_public.getratingbasedonProduct(TRIM(product_recommended)) AS rating
  FROM UserProductRecommendations
  CROSS JOIN UNNEST(UserProductRecommendations.product) AS product_recommended
  WHERE TRIM(product_recommended) <> TRIM(UserProductRecommendations.cust_prev_purchase)  ## drop a recommendation identical to the purchased product
  GROUP BY user_pseudo_id, product_recommended
)
SELECT *
FROM UserProductMatrix l
WHERE NOT EXISTS (
  SELECT 1 FROM UserPurchaseEvents
  WHERE user_pseudo_id = l.user_pseudo_id
    AND TRIM(item_name) = TRIM(l.product_recommended)  ## remove items the user has already bought
)
ORDER BY user_pseudo_id))
**Step 4: Recommend**
Top 5 recommendations for a user
#standardSQL
SELECT *
FROM ML.RECOMMEND(MODEL ecommerce_public.product_affinity,
  (SELECT '2534169.4418845320' AS user_pseudo_id))
ORDER BY predicted_rating_confidence DESC
LIMIT 5
Model Management - Vertex AI
We will need a Google Cloud Platform project with billing enabled to run this. To create a project, follow the instructions here.
Step 1: Enable the Compute Engine API
Navigate to Compute Engine and select Enable if it isn't already enabled. You'll need this to create your notebook instance.
Step 2: Enable the Vertex AI API
Navigate to the Vertex AI section of your Cloud Console and click Enable Vertex AI API.
Step 3: Create a Notebooks instance
We'll use Notebooks to get predictions after we've deployed our model. From the Vertex AI section of your Cloud Console, click on Notebooks. From there, select New Instance. Then select the TensorFlow Enterprise 2.3 instance type without GPUs:
Use the default options and then click Create.
**Train a BigQuery ML model**
Step 1: Create a BigQuery dataset in your project
To train a model in BigQuery ML, you'll need to create a dataset within your project to store this model. Click on your project in the left menu bar, and then select Create Dataset.
Step 2: Run a CREATE MODEL query
In the BigQuery Query editor, run a CREATE MODEL query to create and train a BigQuery ML model on the dataset we'll be using; a sketch is shown below.
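As a minimal sketch (assuming the ecommerce_public dataset used throughout this document), the model can be the k-means user-clustering model from the User Clustering section; the session-based features here are a simplified subset of that section's Step 1 preprocessing:

CREATE OR REPLACE MODEL ecommerce_public.user_clusters
OPTIONS (
  model_type = 'kmeans',
  kmeans_init_method = 'KMEANS++',
  num_clusters = 4
) AS
SELECT
  COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
  SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS total_time_spend_by_user_in_msec
FROM ecommerce_public.events_flat_data
GROUP BY user_pseudo_id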
**Export your BigQuery ML model**
With a trained BQML model, we can use the BQML SQL syntax to get predictions, or we can export the model to deploy it elsewhere. Here we'll export our model so that we can deploy it to Vertex AI to scalably serve the model and get predictions.
Step 1: Create a Cloud Storage bucket for your model
In the model details, click Export Model:
This will prompt you to enter the Google Cloud Storage (GCS) location where you'd like your model's assets to be exported. If you don't have a GCS bucket yet, don't worry! We're about to create one. First, click Browse:
Then click the + icon to create a new bucket:
Give it a unique name (Storage bucket names need to be globally unique). Click Continue. In the next step, under Location type select Region and choose any of the regions from the dropdown:
Use the default storage class, and under access control make sure Uniform is selected:
Click continue and use the defaults for the rest of the options. Then click Create.
Step 2: Export the BQML model
With your new bucket created, enter model-assets (or anything you'd like) in the Name field and then click Select:
Then click Export. This will create a job in BigQuery to export your model in TensorFlow's SavedModel format to the newly created GCS bucket you specified. This will take about a minute to export. While your model is being exported, navigate to the Storage section of your Cloud console. When your job completes, you should see your model assets exported to the bucket you just created under a model-assets subdirectory:
Import the model to Vertex AI
In this step we'll reference the GCS storage location where we just exported our model assets to create and import the model to Vertex AI.
Step 1: Import the model
In your Cloud console, navigate to the Vertex AI Models section. From there, select Import:
In the first step, give your model the name predict_default. Select the same region where you created your bucket (either us-central1, europe-west4, or asia-east1). Then click Continue. In Model settings, keep "Import model artifacts into a new pre-built container" selected. In the Model framework dropdown, select TensorFlow. Then select 2.3 as the framework version. In the Model artifact location field, click Browse, click into the GCS bucket you just created, and click on the model-assets directory:
Then click Import. It will take a few minutes to import your model. Once it has been created, you'll see it in the models section of your Cloud console:
Deploy the model to an endpoint
Now that we've uploaded our model, the next step is to create an Endpoint in Vertex. A Model resource in Vertex can have multiple endpoints associated with it, and you can split traffic between endpoints.
Step 1: Create an endpoint
On your model page, navigate to the Deploy and test tab and click Deploy to endpoint:
Give your endpoint a name, like default_pred_v1, leave the traffic splitting settings as is, and then select a machine type for your model deployment. We used an n1-highcpu-2 here, but you can choose whichever machine type you'd like. Then select Done and click Continue. Leave the selected location settings as is and then click Deploy. Your endpoint will take a few minutes to deploy. When it is completed you'll see a green check mark next to it:
Testing the Model Online
UI
Google Cloud Function
Request
curl -m 70 -X POST https://us-central1-ecomm-analysis.cloudfunctions.net/UserCluster \
  -H "Authorization: bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"userId":"2534169.4418845320"}'
REST API
Request
curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/ecomm-analysis/locations/us-central1/endpoints/6123971903456542720:predict \
  -d '{"instances": [{"user_pseudo_id":"2534169.4418845320"}]}'
Sample request:
{"instances": [{"average_order_value": 12.8, "no_of_orders": 60, "session_count": 8, "total_time_spend_by_user_in_msec": 2224770}]}
Sample response:
{
  "predictions": [
    {
      "nearest_centroid_id": [3],
      "centroid_distances": [17.12813603951723, 142.2885807575315, 14.73747037975442, 15.66436157266043],
      "centroid_ids": [1, 2, 3, 4]
    }
  ],
  "deployedModelId": "5967942407382106112",
  "model": "projects/591702499920/locations/us-central1/models/7915265461103624192",
  "modelDisplayName": "userclustermodel",
  "modelVersionId": "1"
}
**Model updating with only incremental records**
Based on real-time incremental changes to the BigQuery table data, our model also has to be retrained incrementally. BigQuery ML provides a 'warm_start' option with which we can retrain an existing model on new data.
BigQuery change history lets us track the history of changes to a BigQuery table using the APPENDS table-valued function (TVF).
The APPENDS TVF returns a table of all rows appended to a table for a given time range. The following operations add rows to the APPENDS change history:
CREATE TABLE DDL statement
INSERT DML statement
MERGE DML statement
Loading data into BigQuery
Streaming ingestion
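As a minimal sketch (the table name comes from this document; the time window is illustrative), the newly appended rows can be selected like this:

-- Rows appended to the events table in the last 7 days.
SELECT *
FROM APPENDS(
  TABLE ecommerce_public.events_flat_data,
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),  -- start of the window
  NULL)                                                -- NULL = up to the current time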
Once we are able to get only the incremental records, we can retrain the existing model.
Once we have the incremental training query, we can schedule it with BigQuery scheduled queries, as sketched below.
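A minimal sketch of such a retraining query, assuming the existing ecommerce_public.user_clusters model from the User Clustering section and an illustrative one-day window of appended rows:

CREATE OR REPLACE MODEL ecommerce_public.user_clusters
OPTIONS (
  model_type = 'kmeans',
  kmeans_init_method = 'KMEANS++',
  num_clusters = 4,
  warm_start = TRUE  -- continue training from the existing model instead of starting over
) AS
WITH NewEvents AS (
  -- Only the rows appended since the previous scheduled run (window is illustrative).
  SELECT *
  FROM APPENDS(
    TABLE ecommerce_public.events_flat_data,
    TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY),
    NULL)
),
ViewStats AS (
  SELECT
    user_pseudo_id,
    COUNT(DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_count,
    SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS total_time_spend_by_user_in_msec
  FROM NewEvents
  GROUP BY 1
),
OrderStats AS (
  SELECT user_pseudo_id, AVG(items.price) AS average_order_value, COUNT(*) AS no_of_orders
  FROM NewEvents, UNNEST(items) AS items
  WHERE event_name = 'purchase'
  GROUP BY 1
),
UserStats AS (
  SELECT v.*, o.average_order_value, o.no_of_orders
  FROM ViewStats v
  FULL OUTER JOIN OrderStats o ON v.user_pseudo_id = o.user_pseudo_id
)
SELECT * EXCEPT (user_pseudo_id) FROM UserStats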
Note: Currently only the k-means model is supported for retraining. Matrix factorization models require purchasing flex slots for model creation, so they can't be scheduled.
**Business Analytics**
Looker – BigQuery Integration
The general steps for setting up a Google BigQuery Standard SQL or Google BigQuery Legacy SQL connection are:
- Create a service account with access to the Google project and download the JSON credentials certificate.
- Set up the Looker connection to your database.
- Test the connection.
Creating a service account and downloading the JSON credentials certificate
You must have Google Cloud admin permissions to create a service account. Google has documentation on creating a service account and generating a private key. Open the credentials page in the Google Cloud Platform API Manager and, if necessary, select your project:
Click CREATE CREDENTIALS and choose Service account:
Enter a name for the new service account, optionally add a description, and click CREATE:
Your service account requires two Google BigQuery predefined roles:
- BigQuery > BigQuery Data Editor
- BigQuery > BigQuery Job User
Select the first role in the Select a role field, then click ADD ANOTHER ROLE and select the second role:
After selecting both roles, click CONTINUE:
Click CREATE KEY:
Select JSON and click CREATE:
The JSON key will be saved to your computer. BE SURE TO REMEMBER WHERE IT IS SAVED. After noting the download location, click CLOSE:
Click DONE:
Find the email address corresponding to the service account. You will need this to configure the Looker connection to BigQuery.
- Set up the Looker connection to your database.
Use the service account created in the step above and the generated JSON key as the authentication mechanism.
- Test the connection
Creating Projects
Configuring Projects
Create Views from table
Sample view file
Dashboard
Sales Distribution
Sales Trend
Sales Distribution by Category
Products with repeat orders