How to give a new dataset (my dataset) as an input this project? #3

Open
PavithranRick opened this issue Feb 13, 2018 · 13 comments

@PavithranRick

I have my dataset of 100 features and 1 target variable in a numpy array. How do I give it as an input to this project? Thanks.

@titu1994
Owner

titu1994 commented Feb 13, 2018

Prepare the dataset in a format similar to the UCR datasets. The train and test files should be CSV files, with the 0th column being the class label and the remaining columns being the 100 features. Note that these files must have no extension such as .csv at the end. Put them in the data folder.
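A rough sketch of preparing such a file from a numpy array (the array names, class count, and file names below are placeholders, not anything defined by this repo):

```python
import os
import numpy as np

# Placeholder data: replace with your own (n_samples, 100) features and integer labels.
X_train = np.random.rand(60, 100)
y_train = np.random.randint(0, 3, size=60)

os.makedirs("data", exist_ok=True)

# UCR-style layout: column 0 is the class label, the remaining columns are the features.
train = np.column_stack([y_train, X_train])
np.savetxt("data/MyDataset_TRAIN", train, delimiter=",", fmt="%g")  # note: no .csv extension
```

Write the test split to a second file (e.g. data/MyDataset_TEST) in the same way.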

Next, go to utils/constants.py and edit each of the arrays at the end to provide the path to the train and test files, the number of classes and the number of variables. Note the index of your new entry in these arrays (probably the 86th index).
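For example, the edit might look roughly like this; MAX_SEQUENCE_LENGTH_LIST is the name discussed later in this thread, while the other list names are assumptions, so match whatever your copy of constants.py actually defines:

```python
# utils/constants.py -- append one entry to each list at the end of the file (sketch).
TRAIN_FILES.append("data/MyDataset_TRAIN")   # path to the new train file
TEST_FILES.append("data/MyDataset_TEST")     # path to the new test file
MAX_SEQUENCE_LENGTH_LIST.append(100)         # number of feature columns (timesteps) per sample
NB_CLASSES_LIST.append(3)                    # number of classes in the dataset

# The position of the new entry in these lists (e.g. index 86) is the value
# the *_model script's DATASET_ID must point to.
```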

Then go to the root folder and duplicate any script with _model in its name and edit it. Change the DATASET_ID variable to point to the index from the constants.py file. Edit the models as needed. Edit the main portion by changing the string names, and then run the script.
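A sketch of what the edited main section of the duplicated script might look like; the helper function names and keyword arguments are illustrative only, so keep whatever calls the script you copied already makes and change just the values:

```python
# my_dataset_model.py -- copy of an existing *_model script (sketch only).
DATASET_ID = 86  # index of the new entry added to the utils/constants.py lists

if __name__ == "__main__":
    model = generate_model()  # whichever model-building function the copied script defines

    # Change the string names used for weight files / logs to match your dataset.
    train_model(model, DATASET_ID, dataset_prefix="my_dataset", epochs=2000, batch_size=128)
    evaluate_model(model, DATASET_ID, dataset_prefix="my_dataset", batch_size=128)
```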

@titu1994 titu1994 reopened this Feb 13, 2018
@PavithranRick
Author

Cool, thanks bro... But:

  1. What does MAX_SEQUENCE_LENGTH_LIST indicate for a dataset?
  2. How do you decide when to use the Attention LSTM versus the normal LSTM?
  3. Why haven't you experimented with standard architectures for the CNN part, like ResNet or Inception (I know they are designed for images, but have you tried them)?

@titu1994
Owner

  1. The number of timesteps your univariate time series (sequence) dataset has. This model is meant for univariate datasets, where at each timestep you have 1 input (see the shape sketch after this list).

  2. Try both. A small indicator: if you don't overfit even when using 128 LSTM cells, then switch to 128 Attention LSTM cells. But the Attention LSTM can perform better even on smaller data, so try both.

  3. The UCR dataset is a standard benchmark, but it is a very simple one. Complex CNN parts like ResNets and Inception will overfit and reduce performance. There has been some work on trying ResNet-like architectures prior to this work, and it was beaten by regular CNNs.
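To make point 1 concrete, here is a minimal sketch of how a univariate UCR-style file relates to MAX_SEQUENCE_LENGTH_LIST (the file name is a placeholder, and this is a generic reading of the format rather than the repo's exact loader):

```python
import numpy as np

# Each row: column 0 is the class label, the remaining columns are the
# timesteps of one univariate sequence (one value per timestep).
data = np.loadtxt("data/MyDataset_TRAIN", delimiter=",")
y = data[:, 0].astype(int)   # class labels
X = data[:, 1:]              # shape (n_samples, n_timesteps)

# The entry to put in MAX_SEQUENCE_LENGTH_LIST for this dataset is the
# number of timesteps, i.e. X.shape[1].
print(X.shape[1])
```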

@PavithranRick
Author

Thanks for the clarification, Titu...

Just a few more doubts:

  1. According to your 3rd point, "There has been some work on trying ResNet-like architectures prior to this work, and it was beaten by regular CNNs."
    -- Can you point us to any literature to learn more about this behaviour of standard CNN architectures?

  2. Can you please provide the source from which you took the state-of-the-art algorithms' accuracies?
    (the existing SOTA in https://github.com/titu1994/LSTM-FCN/blob/master/images/LSTM-FCN-scores.png)

@fazlekarim
Collaborator

  1. Z. Wang, W. Yan, T. Oates, "Time series classification from scratch with deep neural networks: A strong baseline", Proc. Int. Joint Conf. Neural Netw. (IJCNN), pp. 1578-1585, May 2017.

  2. The exact source for each SOTA model on each dataset can be found in our paper: http://ieeexplore.ieee.org/abstract/document/8141873/
    For a summary of all the models we compared against on each dataset, see:
    https://www2.informatik.hu-berlin.de/~schaefpa/weasel/results.xlsx

@ligesangmeiduo

Hello, I saw a note in the LSTM-FCN code that a fine-tuning model can be added, but the input data is one-dimensional. How can an image classification model like VGG16 be used for fine-tuning?

@sxjjxs

sxjjxs commented Jul 5, 2019

Hi! I want to use your model to do the classification of my sequence data. But in the context vector visualization section, how do you get the attention vector? The code is as follows:

# Min-max normalize the attention vector to [0, 1], then rescale it to [-1, 1].
attention_vector = (attention_vector - attention_vector.min()) / (attention_vector.max() - attention_vector.min())
attention_vector = (attention_vector * 2.) - 1.

Looking forward to your reply! Thanks!
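For context, the generic Keras pattern I assume produces attention_vector in the first place is extracting an intermediate layer's output with a sub-model; a toy sketch (the model, layer name, and shapes here are placeholders, not the repo's actual attention layer):

```python
import numpy as np
from tensorflow.keras import layers, models

# Toy stand-in model: any trained model with a named intermediate layer would do.
inp = layers.Input(shape=(100,))
scores = layers.Dense(8, activation="softmax", name="attention_like")(inp)  # stand-in for attention scores
out = layers.Dense(3, activation="softmax")(scores)
model = models.Model(inp, out)

# Sub-model that exposes the intermediate layer's output.
extractor = models.Model(model.input, model.get_layer("attention_like").output)
attention_vector = extractor.predict(np.random.rand(1, 100)).squeeze()

# Same rescaling as in the snippet above: min-max to [0, 1], then to [-1, 1].
attention_vector = (attention_vector - attention_vector.min()) / (attention_vector.max() - attention_vector.min())
attention_vector = (attention_vector * 2.) - 1.
```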

@Soyiba

Soyiba commented Aug 26, 2019

"Then go to the root folder and duplicate any script with _model and edit it. Change the DATASET_ID variable to point to the index from the constants.py file. Edit the models as needed. Edit the main portion by changing the string names and then run".
How to locate the root folder?

@sherrygarg

> Prepare the dataset in a format similar to the UCR datasets. The train and test files should be CSV files, with the 0th column being the class label and the remaining columns being the 100 features. Note that these files must have no extension such as .csv at the end. Put them in the data folder.
>
> Next, go to utils/constants.py and edit each of the arrays at the end to provide the path to the train and test files, the number of classes and the number of variables. Note the index of your new entry in these arrays (probably the 86th index).
>
> Then go to the root folder and duplicate any script with _model in its name and edit it. Change the DATASET_ID variable to point to the index from the constants.py file. Edit the models as needed. Edit the main portion by changing the string names, and then run the script.

Is there any script available to convert my or any time series dataset to UCR format?

@GaotongWu

I am very confused about why the dataset files have no extension. What is their format?

@titu1994
Owner

They are CSV files. It's an old, outstanding issue that the file extension gets dropped.

@GaotongWu

I got this error: 'ModelCheckpoint' object has no attribute '_implements_train_batch_hooks'. Why does this happen and how do we fix it? ModelCheckpoint is imported from keras, so I have no idea why something is missing.
