(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python
Python 3.6.6 (default, Jun 28 2018, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> exit()
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ pyton -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
No command 'pyton' found, did you mean:
Command 'python' from package 'python-minimal' (main)
Command 'pytone' from package 'pytone' (universe)
pyton: command not found
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
TRAIN: 0%| | 0/350 [00:00<?, ?it/s]
Loaded hyperparameters from configuration config.json, model: original-fp: {'state_description': False, 'g_layers': [256, 256, 256, 256], 'question_injection_position': 0, 'f_fc1': 256, 'f_fc2': 256, 'dropout': 0.5, 'lstm_hidden': 128, 'lstm_word_emb': 32, 'rl_in_size': 52}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_val_questions.pkl
CLEVR dataset initialized!
Supposing original DeepMind model
Training (350 epochs) is starting...
Dataset reinitialized with batch size 640
Current learning rate: 1e-05
██████████▊| 1093/1094 [11:21:28<00:37, 37.41s/it, loss=1.92]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 418, in <module>
    main(args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 356, in main
    train(clevr_train_loader, model, optimizer, epoch, args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 40, in train
    output = model(img, qst)
  File "/data/Rudra/virtualenvs/rn_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Rudra/RelationNetworks-CLEVR/model.py", line 200, in forward
    x = torch.cat([x, self.coord_tensor], 1) # (B x 24+2 x 8*8)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 469 and 640 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Train Epoch: 1 [0/700160 (0%)] Train loss: 39.945804595947266
Train Epoch: 1 [6400/700160 (1%)] Train loss: 36.57775611877442
Train Epoch: 1 [12800/700160 (2%)] Train loss: 29.848896408081053
Train Epoch: 1 [19200/700160 (3%)] Train loss: 24.984291648864748
Train Epoch: 1 [25600/700160 (4%)] Train loss: 20.945134353637695
.
.
.
Train Epoch: 1 [684800/700160 (98%)] Train loss: 1.8508247494697572
Train Epoch: 1 [691200/700160 (99%)] Train loss: 1.8768051743507386
Train Epoch: 1 [697600/700160 (100%)] Train loss: 1.8581566572189332
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$
I have also attached my logfile with this. When I run the plot function, I get empty plots for everything apart from training loss. Please let me know where the issue might be. Thanks.
Hi @saharudra, this is probably a batch-handling problem in the multi-GPU setup.
You should be able to run the code by simply removing the condition (the entire line):
This is not the most efficient solution; however, if that is the problem, I will fix it permanently as soon as possible using a better approach.
Thanks!
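For anyone reading the traceback later: the numbers in it are consistent with the last batch of the epoch being smaller than the rest. The run fails at step 1093/1094 with batch size 640, and `torch.cat` reports 469 versus 640 in dimension 0, which suggests that `x` holds the final 469 samples while `self.coord_tensor` is still sized for a full batch. The sketch below only reproduces that arithmetic in plain Python. The helper `batch_sizes` is hypothetical and not part of the repository, and the sample count is an assumption inferred from the log (1093 full batches plus 469).

```python
def batch_sizes(n_samples, batch_size, drop_last=False):
    """Return the size of every batch in one epoch (hypothetical helper)."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    # Keep the partial final batch unless asked to drop it
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


# Assumption: sample count inferred from the log, not read from the dataset
N_SAMPLES = 1093 * 640 + 469

sizes = batch_sizes(N_SAMPLES, 640)
print(len(sizes), sizes[-1])        # number of batches, size of the last one

dropped = batch_sizes(N_SAMPLES, 640, drop_last=True)
print(len(dropped), set(dropped))   # every remaining batch is full
```

If this is indeed the cause, two possible workarounds (neither confirmed as the maintainer's fix) would be passing `drop_last=True` to `torch.utils.data.DataLoader` so the partial batch is skipped, or slicing the coordinate tensor to the current batch size before concatenating.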
When I run the code, I get the following output:
logfile.log