The following workflow will demonstrate how to use an LSTM network to do deep learning in KNIME for time series forecasting on a daily weather dataset.
Daily weather forecast (India): https://www.kaggle.com/sumanthvrao/daily-climate-time-series-data
Daily Weather Forecast (India) Workflow: https://tinyurl.com/2p9bywnr
All the training data is in CSV format with the following columns:
1. date
2. mean temperature
3. humidity
4. wind speed
5. mean pressure
There are 1462 rows with the dates spanning from 2013-01-01 to 2017-01-01. The testing data has the same columns with dates spanning from 2017-01-01 to 2017-04-24.
Before anything else, the data from the dataset has to be cleaned and processed into a form the model can accept.
We will first use two CSV Reader nodes to read both the training and test CSV files from the downloaded dataset. We can then use the Concatenate node to combine both tables into one.
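For readers who want to mirror these two nodes outside KNIME, here is a minimal Python sketch of reading and concatenating the two files. The inline strings are hypothetical stand-ins for the downloaded train and test CSVs (the column names follow the Kaggle dataset); in practice you would open the real files instead.

```python
import csv
import io

# Hypothetical stand-ins for the downloaded train/test CSV files.
train_csv = """date,meantemp,humidity,wind_speed,meanpressure
2013-01-01,10.0,84.5,0.0,1015.67
2013-01-02,7.4,92.0,2.98,1017.8
"""
test_csv = """date,meantemp,humidity,wind_speed,meanpressure
2017-01-01,15.9,85.87,2.74,59.0
"""

def read_rows(text):
    # Equivalent of one CSV Reader node: parse rows into dicts keyed by header.
    return list(csv.DictReader(io.StringIO(text)))

# Equivalent of the Concatenate node: stack both tables into one.
rows = read_rows(train_csv) + read_rows(test_csv)
print(len(rows))  # 3 rows after concatenation
```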
Inside the Pre Processing metanode, we will clean the data up before feeding it into the model. We use the Sorter node to sort the data in ascending order by date (as a time series is usually highly dependent on the date/time, this step is crucial).
We then convert the date column - originally a string - to KNIME's date/time type using the String to Date&Time node. You can configure the date format and type, as well as the execution behavior (whether the node should fail on a parsing error). Follow the configurations in the image.
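As a point of reference, the conversion this node performs is the same as parsing a date string with an explicit format. A minimal sketch, assuming the dataset's `yyyy-MM-dd` format (written `%Y-%m-%d` in Python's directive syntax):

```python
from datetime import datetime

# Parse a date string like the ones in the dataset's date column.
parsed = datetime.strptime("2013-01-01", "%Y-%m-%d").date()
print(parsed.year, parsed.month, parsed.day)  # 2013 1 1
```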
Now we can use the Column Filter node to filter out the unneeded columns. In our case we are going to use the mean temperature column to train our model, so we are only passing through the date and mean temperature columns.
The Timestamp Alignment node checks whether the selected timestamp column is uniformly sampled at the selected time scale; missing values will be inserted at skipped sampling times. Select Period = Day, Timestamp column = date, and tick Replace timestamp column.
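The alignment step above can be sketched in plain Python: walk the calendar day by day between the first and last observed dates, and insert a missing value (`None`) wherever a day was skipped. The sample values here are illustrative, not from the real dataset.

```python
from datetime import date, timedelta

# Illustrative observations with one skipped day (2013-01-03 is missing).
observed = {
    date(2013, 1, 1): 10.0,
    date(2013, 1, 2): 7.4,
    date(2013, 1, 4): 8.7,
}

# Align to a uniform daily grid, as Timestamp Alignment does with Period = Day.
start, end = min(observed), max(observed)
aligned = []
d = start
while d <= end:
    aligned.append((d, observed.get(d)))  # None marks an inserted missing value
    d += timedelta(days=1)

print(len(aligned))  # 4 days in the aligned range
print(aligned[2])    # the inserted row for the skipped day, with value None
```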
We are now going to lag and partition the data before feeding it into our model. In the Lag Column node, enter the following configuration.
This will lag our input column 200 times with a lag interval of 1, producing 200 lagged columns. We then aggregate all the lagged columns into a single list column.
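The lag-and-collect step can be sketched as a sliding window: for each row, gather the previous n values into one list (the model input) alongside the current value (the target). This is a hedged Python analogue of the Lag Column node plus the list aggregation, with a small illustrative series and a window of 3 instead of 200.

```python
def lag_to_lists(values, n_lags):
    # For each position, collect the n_lags past values into a list (the
    # "list column") paired with the current value as the prediction target.
    rows = []
    for i in range(n_lags, len(values)):
        rows.append((values[i - n_lags:i], values[i]))
    return rows

series = [10.0, 7.4, 7.2, 8.7, 6.0]
windows = lag_to_lists(series, 3)
print(windows[0])  # ([10.0, 7.4, 7.2], 8.7)
```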
Finally, partition the data into an 80-20 split using the Partitioning node's Take from top option.
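"Take from top" matters here because the rows are sorted by date: the first 80% (the earlier dates) become the training set and the last 20% the test set, so the split preserves time order instead of shuffling. A minimal sketch:

```python
def partition_top(rows, train_fraction=0.8):
    # Take the first train_fraction of rows from the top, keep the rest.
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = list(range(10))
train, test = partition_top(rows)
print(len(train), len(test))  # 8 2
```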
Follow the configurations listed below for all 3 layers of the LSTM model.
Follow the configurations below for the Keras Network Learner.
The Deployment loop component can be taken from the original cited workflow, or built from scratch. Essentially, this component first converts the Keras network to TensorFlow. Then, to execute the network, we start with an input sequence of the same length as the training window. We apply our network to predict the next value, drop the first value, append the prediction to the sequence, and apply the network again to the new sequence, and so on.
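The rolling forecast inside that loop can be sketched as follows. The `predict_next` function here is a hypothetical placeholder (a simple moving average) standing in for the converted TensorFlow network; the loop structure itself mirrors the component's predict / drop-oldest / append-prediction cycle.

```python
def predict_next(window):
    # Placeholder "model": in the real workflow this would be the network's
    # prediction from the window of past values.
    return sum(window) / len(window)

def rolling_forecast(seed_window, steps):
    window = list(seed_window)
    forecasts = []
    for _ in range(steps):
        nxt = predict_next(window)
        forecasts.append(nxt)
        window.pop(0)        # drop the oldest value...
        window.append(nxt)   # ...and feed the prediction back in
    return forecasts

forecasts = rolling_forecast([10.0, 12.0, 14.0], 2)
print(forecasts[0])  # first step predicts 12.0 with this placeholder model
```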
The Scorer node will use the selected columns to determine the model's accuracy. Follow the configurations. To see the confusion matrix, right-click on the node and select "Confusion matrix."
This Line Plot node will present a visual plot of how accurate the predicted forecast was in comparison with the actual values (mean temperature). To see this plot, just right-click on the node and select the line plot view.