Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

new recipe notebook for scalefit and transform function #850

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
359 changes: 359 additions & 0 deletions Recipes/ClearScape_Functions/ScaleFitandTransform.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,359 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dad53685-d8fb-4bdc-a179-8676d96ac7a0",
"metadata": {},
"source": [
"<header>\n",
" <p style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>\n",
" ScaleFit and ScaleTransform function in Vantage\n",
" <br>\n",
" <img id=\"teradata-logo\" src=\"https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg\" alt=\"Teradata\" style=\"width: 125px; height: auto; margin-top: 20pt;\">\n",
" </p>\n",
"</header>"
]
},
{
"cell_type": "markdown",
"id": "de955807-45c2-43e9-b2d8-a685f24b2846",
"metadata": {},
"source": [
"<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>ScaleFit is used to calculate scaling statistics for selected columns, preparing data for standardization or normalization.<br>\n",
" ScaleTransform function scales specified columns in input data, using ScaleFit function output.\n",
"<br> In this notebook we will use the ScaleFit and ScaleTransform functions available in Vantage to demonstrate the real-life application of data scaling in a financial context. </p>"
]
},
{
"cell_type": "markdown",
"id": "83f1e3af-aaee-442c-aa1f-bf0446fc8729",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Initiate a connection to Vantage</b>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d8b43f8-2b67-4824-90ff-92bb67110b62",
"metadata": {},
"outputs": [],
"source": [
"from teradataml import *\n",
"\n",
"# Modify the following to match the specific client environment settings\n",
"display.max_rows = 5"
]
},
{
"cell_type": "markdown",
"id": "f1a476e3-916c-4a5c-96e1-fd0a16852047",
"metadata": {},
"source": [
"<hr style=\"height:1px;border:none;background-color:#00233C;\">\n",
"<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Connect to Vantage</b></p>\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a721387a-5a5c-47e5-a3d6-8c2b0c198dd5",
"metadata": {},
"outputs": [],
"source": [
"%run -i ../../UseCases/startup.ipynb\n",
"eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)\n",
"print(eng)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d815784-0c36-40c5-9d38-697a3b309dfa",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"execute_sql('''SET query_band='DEMO=PP_ScaleFitandTransform_Python.ipynb;' UPDATE FOR SESSION; ''')"
]
},
{
"cell_type": "markdown",
"id": "91dc957d-96e7-4b5b-963a-ae6f1e4f3db3",
"metadata": {},
"source": [
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>"
]
},
{
"cell_type": "markdown",
"id": "478319bb-a179-401f-8f89-ae548b315c58",
"metadata": {},
"source": [
"<hr style='height:1px;border:none;background-color:#00233C;'>\n",
"\n",
"<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Getting Data for This Demo</b></p>\n",
"\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables to access the data without any storage on your environment or download the data to local storage, which may yield faster execution. Still, there could be considerations of available storage. Two statements are in the following cell, and one is commented out. You may switch which mode you choose by changing the comment string.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc41f652-ca52-4bbe-941d-a235012b1e5c",
"metadata": {},
"outputs": [],
"source": [
"%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_BankChurn_cloud');\" # Takes 30 seconds\n",
"#%run -i ../../UseCases/run_procedure.py \"call get_data('DEMO_BankChurn_local');\" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12c3e237-edd8-4119-ae16-d22bee0e9dce",
"metadata": {},
"outputs": [],
"source": [
"%run -i ../../UseCases/run_procedure.py \"call space_report();\" # Takes 10 seconds"
]
},
{
"cell_type": "markdown",
"id": "34e358f1-390d-4784-b92c-ccb476433481",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Data Exploration</b>\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Create a \"Virtual DataFrame\" that points to the data set in Vantage. Check the shape of the dataframe as check the datatype of all the columns of the dataframe.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85993f0c-be64-4513-b610-0baf9f134030",
"metadata": {},
"outputs": [],
"source": [
"tdf = DataFrame(in_schema(\"DEMO_BankChurn\", \"customer_churn\"))\n",
"print(\"Shape of the data: \", tdf.shape)\n",
"tdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "416adc93-139c-4dcf-b2bf-fe3f1503a1cf",
"metadata": {},
"outputs": [],
"source": [
"tdf.tdtypes"
]
},
{
"cell_type": "markdown",
"id": "4b32c6ff-5960-491c-95f6-90d0ecf39c70",
"metadata": {},
"source": [
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A bank wants to scale some numerical columns like CreditScore, Age, and Balance based on a certain method. For this, we'll use the mean scaling method. This means each column will be scaled by subtracting the mean and dividing by the standard deviation.\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b05ee0fe-8fa4-4243-b5a0-7a383019ba2a",
"metadata": {},
"outputs": [],
"source": [
"help(ScaleFit)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ffab126-32a7-4c4c-a504-23862f1dc228",
"metadata": {},
"outputs": [],
"source": [
"# Apply ScaleFit to relevant columns\n",
"fit_result = ScaleFit(data=tdf, \n",
" target_columns=[\"CreditScore\", \"Age\", \"Balance\"], \n",
" scale_method=\"MEAN\", # Scaling based on mean\n",
" miss_value=\"KEEP\", # Keep missing values as they are\n",
" global_scale=False) # Do not scale globally across all data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6c259e0-398b-48cc-abce-20d9135e9673",
"metadata": {},
"outputs": [],
"source": [
"# Print the scaling model result\n",
"fit_result.output"
]
},
{
"cell_type": "markdown",
"id": "30020e94-e004-4de2-9421-d104d123aaec",
"metadata": {},
"source": [
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The ScaleTransform will use the statistics from ScaleFit to scale the data. We will keep the Exited column unchanged cause it doesn't require scaling, and apply the scaling only to CreditScore, Age, and Balance.\n",
"</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7aa7e047-8150-4b90-8344-89e9a5c99288",
"metadata": {},
"outputs": [],
"source": [
"help(ScaleTransform)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d17123f-5369-4fd9-afc7-339bd2b3228a",
"metadata": {},
"outputs": [],
"source": [
"# Apply scaling to the data using the model fit earlier\n",
"transformed_data = ScaleTransform(data=tdf, \n",
" object=fit_result.output, \n",
" accumulate=\"Exited\") # Keep the \"Exited\" column as is"
]
},
{
"cell_type": "markdown",
"id": "607192d7-99c8-45e1-ae28-ed1de4c29d7d",
"metadata": {},
"source": [
"The scaling has transformed the CreditScore, Age, and Balance columns, standardizing their values to a comparable range, making them suitable for machine learning models without being influenced by their original magnitudes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9589c755-8c6a-4850-8183-a6f1ed8c9953",
"metadata": {},
"outputs": [],
"source": [
"# Display the result of scaled data\n",
"transformed_data.result"
]
},
{
"cell_type": "markdown",
"id": "d2d7c981-c440-427c-9ae4-c11d6221c848",
"metadata": {},
"source": [
"<hr style=\"height:2px;border:none;background-color:#00233C;\">\n",
"<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Cleanup</b>"
]
},
{
"cell_type": "markdown",
"id": "016af885-1571-4d80-aa1b-3119d1ce5d4e",
"metadata": {},
"source": [
"<hr style=\"height:1px;border:none;background-color:#00233C;\">\n",
"<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98498b6a-ad96-4741-9641-2d4c311349e1",
"metadata": {},
"outputs": [],
"source": [
"%run -i ../../UseCases/run_procedure.py \"call remove_data('DEMO_BankChurn');\" # Takes 10 seconds"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9669a79-29cd-4505-92ac-a394203879cb",
"metadata": {},
"outputs": [],
"source": [
"remove_context()"
]
},
{
"cell_type": "markdown",
"id": "405b60bc-4900-47b3-bc93-6f3f42946242",
"metadata": {},
"source": [
"<hr style=\"height:1px;border:none;background-color:#00233C;\">\n",
"<b style = 'font-size:18px;font-family:Arial;color:#00233C'>Dataset:</b>\n",
"\n",
"- `RowNumber`: Row index\n",
"- `CustomerId`: Unique customer ID\n",
"- `Surname`: Customer's surname\n",
"- `CreditScore`: Credit score of the customer\n",
"- `Geography`: Country (Germany / France / Spain)\n",
"- `Gender`: Gender (Male / Female)\n",
"- `Age`: Age of the customer\n",
"- `Tenure`: Number of years the customer has been associated with the bank\n",
"- `Balance`: Account balance\n",
"- `NumOfProducts`: Number of bank products used\n",
"- `HasCrCard`: Credit card status (0 = No, 1 = Yes)\n",
"- `IsActiveMember`: Active membership status (0 = No, 1 = Yes)\n",
"- `EstimatedSalary`: Estimated salary of the customer\n",
"- `Exited`: Customer churn status (0 = No, 1 = Yes)\n",
"\n",
"<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Links:</b></p>\n",
"<ul style = 'font-size:16px;font-family:Arial'>\n",
" <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>\n",
" <li>ScaleFit function reference: <a href = 'https://docs.teradata.com/search/all?query=ScaleFit&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*1fpysxk*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczODg0MzQ3My4xNTIuMS4xNzM4ODQzNTUwLjUwLjAuMA..'>here</a></li>\n",
" <li>ScaleTransform function reference: <a href = 'https://docs.teradata.com/search/all?query=ScaleTransform&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*1fpysxk*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczODg0MzQ3My4xNTIuMS4xNzM4ODQzNTUwLjUwLjAuMA..'>here</a></li></li>\n",
"</ul>"
]
},
{
"cell_type": "markdown",
"id": "af29da16-3b8f-4249-9356-dd9fc510c4f9",
"metadata": {},
"source": [
"<footer style=\"padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C\">\n",
" <div style=\"float:left;margin-top:14px\">ClearScape Analytics™</div>\n",
" <div style=\"float:right;\">\n",
" <div style=\"float:left; margin-top:14px\">\n",
" Copyright © Teradata Corporation - 2025. All Rights Reserved\n",
" </div>\n",
" </div>\n",
"</footer>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}