
For our own dataset #15

Open
Better-Boy opened this issue Apr 2, 2018 · 3 comments

Comments

Better-Boy commented Apr 2, 2018

How can I apply the same architecture to my own dataset of images and captions? Please provide instructions.

314rated commented Apr 3, 2018

Yes, this info would be greatly useful. Thanks.

zsdonghao (Owner) commented Apr 4, 2018

For a customised dataset, you need to prepare the data to fit the same format as here: https://github.com/zsdonghao/text-to-image/blob/master/data_loader.py#L166

BTW, these are the steps to create the vocabulary: https://github.com/wagamamaz/tensorlayer-tricks/blob/master/README.md#9-sentences-tokenization
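To make the vocabulary step concrete, here is a minimal, self-contained sketch of building a word-to-id mapping from captions. This is plain Python rather than the repo's actual tokenization helpers, and all function names here (`tokenize`, `build_vocab`) are illustrative assumptions, not the project's API:

```python
# Minimal sketch: build a word-to-id vocabulary from caption sentences.
# Assumes captions are plain English text; adapt the tokenizer as needed.
import re
from collections import Counter

def tokenize(sentence):
    # Lowercase and split on non-alphanumeric characters.
    return [w for w in re.split(r"\W+", sentence.lower()) if w]

def build_vocab(captions, min_count=1):
    # Count token frequencies across all captions.
    counts = Counter()
    for cap in captions:
        counts.update(tokenize(cap))
    # Reserve ids 0/1 for padding and unknown words.
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

captions = ["A small bird with a red head.",
            "The bird has a short beak."]
vocab = build_vocab(captions)
# Map a caption to id sequences, falling back to <unk> for unseen words.
ids = [vocab.get(w, vocab["<unk>"]) for w in tokenize(captions[0])]
```

Once you have the mapping, each caption becomes a list of integer ids that can be padded to a fixed length for training.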

Better-Boy (Author) commented

I went through the code of data_loader.py. This code makes no use of ".t7" files. Just arrange your dataset with all the images in one directory, and put the text descriptions (with file names matching the images) in per-class directories, each named after its class. All the directories containing text descriptions should be under one parent directory called "text_c10". Then run data_loader.py, and you will get the output required to train the model.

If there is any mistake in my understanding, please correct it.
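The layout described above can be sketched as follows. This is a hedged illustration, not the repo's loader: the directory names, the `.jpg`/`.txt` extensions, and the `pair_images_and_captions` helper are all assumptions you should adjust to your own dataset:

```python
# Sketch: pair each caption file under text_c10/<class>/ with the image
# of the same stem in the image directory. A tiny example layout is
# created in a temp dir to demonstrate the expected structure.
import os
import tempfile

def pair_images_and_captions(image_dir, text_root):
    pairs = []
    for class_name in sorted(os.listdir(text_root)):
        class_dir = os.path.join(text_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for txt in sorted(os.listdir(class_dir)):
            if not txt.endswith(".txt"):
                continue
            stem = os.path.splitext(txt)[0]
            image_path = os.path.join(image_dir, stem + ".jpg")
            if os.path.exists(image_path):
                pairs.append((image_path, os.path.join(class_dir, txt)))
    return pairs

# Build the example layout: images/ alongside text_c10/<class>/.
root = tempfile.mkdtemp()
img_dir = os.path.join(root, "images")
cls_dir = os.path.join(root, "text_c10", "class_001")
os.makedirs(img_dir)
os.makedirs(cls_dir)
open(os.path.join(img_dir, "image_0001.jpg"), "w").close()
with open(os.path.join(cls_dir, "image_0001.txt"), "w") as f:
    f.write("a bird with a red head\n")

pairs = pair_images_and_captions(img_dir, os.path.join(root, "text_c10"))
```

If the pairing comes back empty, the file stems in the image directory don't match the caption file names, which is the most common mistake with this layout.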
