From cca34f1e882254e9c85bc6bd767bd248d877e688 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Sat, 23 Sep 2023 01:56:43 -0400 Subject: [PATCH 1/7] Skip 17-home-rental-prediction.ipynb due to postgres setup --- script/test/test.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/script/test/test.sh b/script/test/test.sh index dd3c0b5a42..dc40092478 100644 --- a/script/test/test.sh +++ b/script/test/test.sh @@ -88,7 +88,7 @@ long_integration_test() { } notebook_test() { - PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb" --ignore="tutorials/16-homesale-forecasting.ipynb" + PYTHONPATH=./ python -m pytest --durations=5 --nbmake --overwrite "./tutorials" --capture=sys --tb=short -v --log-level=WARNING --nbmake-timeout=3000 --ignore="tutorials/08-chatgpt.ipynb" --ignore="tutorials/14-food-review-tone-analysis-and-response.ipynb" --ignore="tutorials/15-AI-powered-join.ipynb" --ignore="tutorials/16-homesale-forecasting.ipynb" --ignore="tutorials/17-home-rental-prediction.ipynb" code=$? print_error_code $code "NOTEBOOK TEST" } From 27a20f49079365481f79ac77a1be76384c912762 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Sat, 23 Sep 2023 01:58:47 -0400 Subject: [PATCH 2/7] Fix the format issues on existing pages. --- docs/source/overview/model-inference.rst | 2 +- docs/source/usecases/homesale-forecast.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/overview/model-inference.rst b/docs/source/overview/model-inference.rst index 6be8d281d8..2db183343b 100644 --- a/docs/source/overview/model-inference.rst +++ b/docs/source/overview/model-inference.rst @@ -43,7 +43,7 @@ In EvaDB, we can also use models in joins. 
The most powerful usecase is lateral join combined with ``UNNEST``, which is very helpful to flatten the output from `one-to-many` models. The key idea here is a model could give multiple outputs (e.g., bounding box) stored in an array. This syntax is used to unroll elements from the array into multiple rows. Typical examples are `face detectors `_ and `object detectors `_. -In the below example, we use `emotion detector _` to detect emotions from faces in the movie, where a single scene can contain multiple faces. +In the below example, we use `emotion detector `_ to detect emotions from faces in the movie, where a single scene can contain multiple faces. .. code-block:: sql diff --git a/docs/source/usecases/homesale-forecast.rst b/docs/source/usecases/homesale-forecast.rst index 5f1f2937ea..db3f96d61b 100644 --- a/docs/source/usecases/homesale-forecast.rst +++ b/docs/source/usecases/homesale-forecast.rst @@ -74,7 +74,7 @@ Particularly, we are interested in the price of the properties that have three b In the ``home_sales`` dataset, we have two different property types, houses and units, and price gap between them are large. We'd like to ask EvaDB to analyze the price of houses and units independently. -To do so, we specify the ``propertytype`` column as the ``ID `` of the time series data, which represents an identifier for the series. +To do so, we specify the ``propertytype`` column as the ``ID`` of the time series data, which represents an identifier for the series. Here is the query's output ``DataFrame``: .. 
note:: From 8bd0adfb37928ee154cce8c081dfa2177456227b Mon Sep 17 00:00:00 2001 From: xzdandy Date: Sun, 24 Sep 2023 20:12:32 -0400 Subject: [PATCH 3/7] Checkpoint --- docs/source/usecases/homesale-forecast.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/usecases/homesale-forecast.rst b/docs/source/usecases/homesale-forecast.rst index db3f96d61b..a1f08bf199 100644 --- a/docs/source/usecases/homesale-forecast.rst +++ b/docs/source/usecases/homesale-forecast.rst @@ -22,7 +22,7 @@ Home Sale Forecasting Introduction ------------ -In this tutorial, we present how to use :ref:`forecasting models` in EvaDB to predict home sale price. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function. +In this tutorial, we present how to use :ref:`Forecasting AI Engines` in EvaDB to predict home sale price. EvaDB makes it easy to do time series predictions using its built-in Auto Forecast function. .. include:: ../shared/evadb.rst @@ -34,7 +34,7 @@ To load the home sales data into your database, see the complete `home sale fore Preview the Home Sales Data ------------------------------------------- -We use the `raw_sales.csv of the House Property Sales Time Series `_ in this usecase. The data contains five columns: postcode, price, bedrooms, datesold, and propertytype. +We use the `raw_sales.csv of the House Property Sales Time Series `_ in this usecase. The data contains five columns: ``postcode``, ``price``, ``bedrooms``, ``datesold``, and ``propertytype``. .. 
code-block:: sql From bd853f6ed21a3705a7336c34593ff69e8bc5383f Mon Sep 17 00:00:00 2001 From: xzdandy Date: Tue, 26 Sep 2023 01:04:50 -0400 Subject: [PATCH 4/7] Update predicion usecases --- docs/_toc.yml | 2 + docs/source/usecases/homerental-predict.rst | 124 ++++++++++++++++++++ 2 files changed, 126 insertions(+) create mode 100644 docs/source/usecases/homerental-predict.rst diff --git a/docs/_toc.yml b/docs/_toc.yml index 189d5523ff..135d36d921 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -36,6 +36,8 @@ parts: title: Emotion Analysis - file: source/usecases/homesale-forecast.rst title: Home Sale Forecasting + - file: source/usecases/homerental-predict.rst + title: Home Rental Prediction # - file: source/usecases/privategpt.rst # title: PrivateGPT diff --git a/docs/source/usecases/homerental-predict.rst b/docs/source/usecases/homerental-predict.rst new file mode 100644 index 0000000000..be7711eb56 --- /dev/null +++ b/docs/source/usecases/homerental-predict.rst @@ -0,0 +1,124 @@ +.. _homerental-predict: + +Home Rental Prediction +======================= + +.. raw:: html + + + + + + +
+ Run on Google Colab + + View source on GitHub + + Download notebook +


+ + +Introduction +------------ + +In this tutorial, we present how to use :ref:`Prediction AI Engines` in EvaDB to predict home rental prices. EvaDB makes it easy to do predictions using its built-in AutoML engines with your existing databases. + +.. include:: ../shared/evadb.rst + +.. include:: ../shared/postgresql.rst + +We will assume that the input data is loaded into a ``PostgreSQL`` database. +To load the home rental data into your database, see the complete `home rental prediction notebook on Colab `_. + +Preview the Home Sales Data +------------------------------------------- + +We use the `home rental data `_ in this usecase. The data contains eight columns: ``number_of_rooms``, ``number_of_bathrooms``, ``sqft``, ``location``, ``days_on_market``, ``initial_price``, ``neighborhood``, and ``rental_price``. + +.. code-block:: sql + + SELECT * FROM postgres_data.home_rentals LIMIT 3; + +This query previews the data in the home_rentals table: + +.. code-block:: + + +------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+ + | home_rentals.number_of_rooms | home_rentals.number_of_bathrooms | home_rentals.sqft | home_rentals.location | home_rentals.days_on_market | home_rentals.initial_price | home_rentals.neighborhood | home_rentals.rental_price | + |------------------------------|----------------------------------|-------------------|-----------------------|-----------------------------|----------------------------|---------------------------|---------------------------| + | 1 | 1 | 674 | good | 1 | 2167 | downtown | 2167 | + | 1 | 1 | 554 | poor | 19 | 1883 | westbrae | 1883 | + | 0 | 1 | 529 | great | 3 | 2431 | south_side | 2431 | + 
+------------------------------+----------------------------------+-------------------+-----------------------+-----------------------------+----------------------------+---------------------------+---------------------------+ + +Train a Home Rental Prediction Model +------------------------------------- + +Let's next train a prediction model from the home_rental table using EvaDB's ``CREATE FUNCTION`` query. +We will use the built-in :ref:`Ludwig` engine for this task. + +.. code-block:: sql + + CREATE OR REPLACE FUNCTION PredictHouseRent FROM + ( SELECT * FROM postgres_data.home_rental ) + TYPE Ludwig + PREDICT 'rental_price' + TIME_LIMIT 3600; + +In the above query, we use all the columns (except ``rental_price``) from ``home_rental`` table to predict the ``rental_price`` column. +We set the training time out to be 3600 seconds. + +.. note:: + + Go over :ref:`predict` page on exploring all configurable paramters for the model training frameworks. + +.. code-block:: + + +----------------------------------------------+ + | Function PredictHouseRent successfully added | + +----------------------------------------------+ + +Predict the Home Rental Price using the Trained Model +----------------------------------------------------- + +Next we use the trained ``PredictHouseRent`` to predict the home rental price. + +.. code-block:: sql + + SELECT PredictHouseRent(*) FROM postgres_data.home_rentals LIMIT 3; + +We use ``*`` to simply pass all columns into the ``PredictHouseRent`` function. + +.. code-block:: + + +-------------------------------------------+ + | predicthouserent.rental_price_predictions | + +-------------------------------------------+ + | 2087.763672 | + | 1793.570190 | + | 2346.319824 | + +-------------------------------------------+ + +We have the option to utilize a ``LATERAL JOIN`` to compare the actual rental prices in the ``home_rentals`` dataset with the predicted rental prices generated by the trained model, ``PredictHouseRent``. + +.. 
code-block:: sql + + SELECT rental_price, predicted_rental_price + FROM postgres_data.home_rentals + JOIN LATERAL PredictHouseRent(*) AS Predicted(predicted_rental_price) + LIMIT 3; + +Here is the query's output: + +.. code-block:: + + +---------------------------+----------------------------------+ + | home_rentals.rental_price | Predicted.predicted_rental_price | + +---------------------------+----------------------------------+ + | 2167 | 2087.763672 | + | 1883 | 1793.570190 | + | 2431 | 2346.319824 | + +---------------------------+----------------------------------+ + +.. include:: ../shared/footer.rst From b1d3def5e422b21e1a561d92a19efb676cb27b2c Mon Sep 17 00:00:00 2001 From: xzdandy Date: Tue, 26 Sep 2023 02:35:53 -0400 Subject: [PATCH 5/7] Update the ludwig documentation --- .../source/reference/ai/model-forecasting.rst | 2 +- docs/source/reference/ai/model-train.rst | 53 +++++++++++++------ docs/source/reference/evaql/create.rst | 2 +- docs/source/usecases/homerental-predict.rst | 6 +-- 4 files changed, 41 insertions(+), 22 deletions(-) diff --git a/docs/source/reference/ai/model-forecasting.rst b/docs/source/reference/ai/model-forecasting.rst index ac61527838..f88462be17 100644 --- a/docs/source/reference/ai/model-forecasting.rst +++ b/docs/source/reference/ai/model-forecasting.rst @@ -47,7 +47,7 @@ EvaDB's default forecast framework is `statsforecast `_ for details. If not provided, an auto increasing ID column will be used. diff --git a/docs/source/reference/ai/model-train.rst b/docs/source/reference/ai/model-train.rst index 30c701a0dd..8442be18da 100644 --- a/docs/source/reference/ai/model-train.rst +++ b/docs/source/reference/ai/model-train.rst @@ -1,25 +1,32 @@ -.. _predict: +.. _ludwig: -Training and Finetuning -======================== +Model Training with Ludwig +========================== -1. You can train a predication model easily in EvaDB + +1. Installation +--------------- -..
note:: +To use the `Ludwig framework `_, we need to install the extra ludwig dependency in your EvaDB virtual environment. + +.. code-block:: bash + + pip install evadb[ludwig] - Install Ludwig in your EvaDB virtual environment: ``pip install evadb[ludwig]``. +2. Example Query +---------------- .. code-block:: sql - CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM + CREATE OR REPLACE FUNCTION PredictHouseRent FROM ( SELECT sqft, location, rental_price FROM HomeRentals ) TYPE Ludwig PREDICT 'rental_price' TIME_LIMIT 120; -In the above query, you are creating a new customized function by automatically training a model from the `HomeRentals` table. The `rental_price` column will be the target column for predication, while `sqft` and `location` are the inputs. +In the above query, you are creating a new customized function by automatically training a model from the ``HomeRentals`` table. +The ``rental_price`` column will be the target column for predication, while ``sqft`` and ``location`` are the inputs. -You can also simply give all other columns in `HomeRentals` as inputs and let the underlying automl framework to figure it out. Below is an example query: +You can also simply give all other columns in ``HomeRentals`` as inputs and let the underlying AutoML framework to figure it out. Below is an example query: .. code-block:: sql @@ -29,18 +36,30 @@ You can also simply give all other columns in `HomeRentals` as inputs and let th PREDICT 'rental_price' TIME_LIMIT 120; -2. After training completes, you can use the `PredictHouseRent` like all other functions in EvaDB - -.. code-block:: sql +.. note:: - CREATE PredictHouseRent(sqft, location) FROM HomeRentals; + Check out our :ref:`homerental-predict` for working example. -You can also simply give all columns in `HomeRentals` as inputs for inference. The customized function with the underlying model can figure out the proper inference columns via the training columns. +3. 
Model Training Parameters +---------------------------- -.. code-block:: sql +.. list-table:: Available Parameters + :widths: 25 75 - CREATE PredictHouseRent(*) FROM HomeRentals; + * - PREDICT (**required**) + - The name of the column we wish to predict. + * - TIME_LIMIT + - Time limit to train the model in seconds. Default: 120. + * - TUNE_FOR_MEMORY + - Whether to refine hyperopt search space for available host / GPU memory. Default: False. -Check out our `Integration Tests `_ for working example. +Below is an example query specifying the above parameters: +.. code-block:: sql + CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM + ( SELECT * FROM HomeRentals ) + TYPE Ludwig + PREDICT 'rental_price' + TIME_LIMIT 3600 + TUNE_FOR_MEMORY True; diff --git a/docs/source/reference/evaql/create.rst b/docs/source/reference/evaql/create.rst index a175a3c8cd..1783977ef0 100644 --- a/docs/source/reference/evaql/create.rst +++ b/docs/source/reference/evaql/create.rst @@ -117,7 +117,7 @@ Where the `parameter` is ``key value`` pair. .. note:: - Go over :ref:`hf`, :ref:`predict`, and :ref:`forecast` to check examples for creating function via type. + Go over :ref:`hf`, :ref:`ludwig`, and :ref:`forecast` to check examples for creating function via type. CREATE MATERIALIZED VIEW ------------------------ diff --git a/docs/source/usecases/homerental-predict.rst b/docs/source/usecases/homerental-predict.rst index be7711eb56..5339ff4eae 100644 --- a/docs/source/usecases/homerental-predict.rst +++ b/docs/source/usecases/homerental-predict.rst @@ -22,7 +22,7 @@ Home Rental Prediction Introduction ------------ -In this tutorial, we present how to use :ref:`Prediction AI Engines` in EvaDB to predict home rental prices. EvaDB makes it easy to do predictions using its built-in AutoML engines with your existing databases. +In this tutorial, we present how to use :ref:`Prediction AI Engines` in EvaDB to predict home rental prices. 
EvaDB makes it easy to do predictions using its built-in AutoML engines with your existing databases. .. include:: ../shared/evadb.rst @@ -56,7 +56,7 @@ Train a Home Rental Prediction Model ------------------------------------- Let's next train a prediction model from the home_rental table using EvaDB's ``CREATE FUNCTION`` query. -We will use the built-in :ref:`Ludwig` engine for this task. +We will use the built-in :ref:`Ludwig` engine for this task. .. code-block:: sql @@ -71,7 +71,7 @@ We set the training time out to be 3600 seconds. .. note:: - Go over :ref:`predict` page on exploring all configurable paramters for the model training frameworks. + Go over the :ref:`ludwig` page to explore all configurable parameters for the model training frameworks. .. code-block:: From 1061828cdcf9c5f14054cf5d68514a38746fd475 Mon Sep 17 00:00:00 2001 From: xzdandy Date: Tue, 26 Sep 2023 02:44:42 -0400 Subject: [PATCH 6/7] Add sklearn documentation --- docs/_toc.yml | 6 ++- docs/source/reference/ai/model-train.rst | 65 ------------------------ 2 files changed, 4 insertions(+), 67 deletions(-) delete mode 100644 docs/source/reference/ai/model-train.rst diff --git a/docs/_toc.yml b/docs/_toc.yml index 135d36d921..7efb746f28 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -71,8 +71,10 @@ parts: - file: source/reference/ai/index title: AI Engines sections: - - file: source/reference/ai/model-train - title: Model Training + - file: source/reference/ai/model-train-ludwig + title: Model Training with Ludwig + - file: source/reference/ai/model-train-sklearn + title: Model Training with Sklearn - file: source/reference/ai/model-forecasting title: Time Series Forecasting - file: source/reference/ai/hf diff --git a/docs/source/reference/ai/model-train.rst b/docs/source/reference/ai/model-train.rst deleted file mode 100644 index 8442be18da..0000000000 --- a/docs/source/reference/ai/model-train.rst +++ /dev/null @@ -1,65 +0,0 @@ -..
_ludwig: - -Model Training with Ludwig -========================== - -1. Installation ---------------- - -To use the `Ludwig framework `_, we need to install the extra ludwig dependency in your EvaDB virtual environment. - -.. code-block:: bash - - pip install evadb[ludwig] - -2. Example Query ----------------- - -.. code-block:: sql - - CREATE OR REPLACE FUNCTION PredictHouseRent FROM - ( SELECT sqft, location, rental_price FROM HomeRentals ) - TYPE Ludwig - PREDICT 'rental_price' - TIME_LIMIT 120; - -In the above query, you are creating a new customized function by automatically training a model from the ``HomeRentals`` table. -The ``rental_price`` column will be the target column for predication, while ``sqft`` and ``location`` are the inputs. - -You can also simply give all other columns in ``HomeRentals`` as inputs and let the underlying AutoML framework to figure it out. Below is an example query: - -.. code-block:: sql - - CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM - ( SELECT * FROM HomeRentals ) - TYPE Ludwig - PREDICT 'rental_price' - TIME_LIMIT 120; - -.. note:: - - Check out our :ref:`homerental-predict` for working example. - -3. Model Training Parameters ----------------------------- - -.. list-table:: Available Parameters - :widths: 25 75 - - * - PREDICT (**required**) - - The name of the column we wish to predict. - * - TIME_LIMIT - - Time limit to train the model in seconds. Default: 120. - * - TUNE_FOR_MEMORY - - Whether to refine hyperopt search space for available host / GPU memory. Default: False. - -Below is an example query specifying the above parameters: - -.. 
code-block:: sql - - CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM - ( SELECT * FROM HomeRentals ) - TYPE Ludwig - PREDICT 'rental_price' - TIME_LIMIT 3600 - TUNE_FOR_MEMORY True; From da0ca5a4d3c3a3461575090564a386562c7ff8ab Mon Sep 17 00:00:00 2001 From: xzdandy Date: Tue, 26 Sep 2023 02:45:15 -0400 Subject: [PATCH 7/7] Add missing files --- .../reference/ai/model-train-ludwig.rst | 65 +++++++++++++++++++ .../reference/ai/model-train-sklearn.rst | 26 ++++++++ 2 files changed, 91 insertions(+) create mode 100644 docs/source/reference/ai/model-train-ludwig.rst create mode 100644 docs/source/reference/ai/model-train-sklearn.rst diff --git a/docs/source/reference/ai/model-train-ludwig.rst b/docs/source/reference/ai/model-train-ludwig.rst new file mode 100644 index 0000000000..8442be18da --- /dev/null +++ b/docs/source/reference/ai/model-train-ludwig.rst @@ -0,0 +1,65 @@ +.. _ludwig: + +Model Training with Ludwig +========================== + +1. Installation +--------------- + +To use the `Ludwig framework `_, we need to install the extra ludwig dependency in your EvaDB virtual environment. + +.. code-block:: bash + + pip install evadb[ludwig] + +2. Example Query +---------------- + +.. code-block:: sql + + CREATE OR REPLACE FUNCTION PredictHouseRent FROM + ( SELECT sqft, location, rental_price FROM HomeRentals ) + TYPE Ludwig + PREDICT 'rental_price' + TIME_LIMIT 120; + +In the above query, you are creating a new customized function by automatically training a model from the ``HomeRentals`` table. +The ``rental_price`` column will be the target column for predication, while ``sqft`` and ``location`` are the inputs. + +You can also simply give all other columns in ``HomeRentals`` as inputs and let the underlying AutoML framework to figure it out. Below is an example query: + +.. code-block:: sql + + CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM + ( SELECT * FROM HomeRentals ) + TYPE Ludwig + PREDICT 'rental_price' + TIME_LIMIT 120; + +.. 
note:: + + Check out our :ref:`homerental-predict` for a working example. + +3. Model Training Parameters +---------------------------- + +.. list-table:: Available Parameters + :widths: 25 75 + + * - PREDICT (**required**) + - The name of the column we wish to predict. + * - TIME_LIMIT + - Time limit to train the model in seconds. Default: 120. + * - TUNE_FOR_MEMORY + - Whether to refine hyperopt search space for available host / GPU memory. Default: False. + +Below is an example query specifying the above parameters: + +.. code-block:: sql + + CREATE FUNCTION IF NOT EXISTS PredictHouseRent FROM + ( SELECT * FROM HomeRentals ) + TYPE Ludwig + PREDICT 'rental_price' + TIME_LIMIT 3600 + TUNE_FOR_MEMORY True; diff --git a/docs/source/reference/ai/model-train-sklearn.rst b/docs/source/reference/ai/model-train-sklearn.rst new file mode 100644 index 0000000000..2428677366 --- /dev/null +++ b/docs/source/reference/ai/model-train-sklearn.rst @@ -0,0 +1,26 @@ +.. _sklearn: + +Model Training with Sklearn +============================ + +1. Installation +--------------- + +To use the `Sklearn framework `_, we need to install the extra sklearn dependency in your EvaDB virtual environment. + +.. code-block:: bash + + pip install evadb[sklearn] + +2. Example Query +---------------- + +.. code-block:: sql + + CREATE OR REPLACE FUNCTION PredictHouseRent FROM + ( SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals ) + TYPE Sklearn + PREDICT 'rental_price'; + +In the above query, you are creating a new customized function by training a model from the ``HomeRentals`` table using the ``Sklearn`` framework. +The ``rental_price`` column will be the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs.