
Error for creating ImageNet LMDB #1

Closed · wlw208dzy opened this issue Jun 11, 2016 · 3 comments

wlw208dzy commented Jun 11, 2016

Thanks for releasing the training script for Inception-ResNet-v2. When I ran create-imagenet-lmdb.lua, it crashed with the following error:

*** Error in `PATH-OF-TORCH/install/bin/luajit': double free or corruption (!prev): 0x00007e61b30fbe10 ***
Aborted (core dumped)

Have you run into this problem? Thanks!
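For context, the failing step boils down to an lmdb.torch write loop. This is only a minimal sketch following the pattern in the lmdb.torch README; the path, name, and dummy tensors are placeholders, not this repo's actual create-imagenet-lmdb.lua code:

-- Minimal lmdb.torch write loop (sketch only; Path/Name and the random
-- tensors are placeholders for real decoded training images).
require 'lmdb'

local db = lmdb.env{
   Path = './imagenet-train-lmdb',  -- placeholder output directory
   Name = 'imagenet-train-lmdb'
}
db:open()

local txn = db:txn()
for i = 1, 10 do
   -- the real script would put a decoded ImageNet image here
   txn:put(i, torch.rand(3, 224, 224))
end
txn:commit()  -- the crash above occurs while running this kind of loop
db:close()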

lim0606 (Owner) commented Jun 11, 2016

@wlw208dzy

Oh... I used the normal imagenet.lua. The problem might be related to eladhoffer/lmdb.torch#8, but I'm not sure, since I haven't touched that code for a long time.

As far as I remember, Torch's LMDB support was not stable in my case (the process frequently died when I ran create-imagenet-lmdb.lua), so I just bought an SSD and ran the script with the normal imagenet.lua instead.

I made a mistake in not cleaning up the irrelevant files.

I'm sorry for the inconvenience.

Best regards,

Jaehyun

wlw208dzy (Author) commented Jun 11, 2016

@lim0606 Thanks very much for your reply; you are very kind. Strangely, though, we also bought an SSD and ran the script with the normal imagenet.lua, and our training speed is still not stable (mainly because of the data loading time), as shown below:

| Epoch: [1][6252/8008] Time 1.856 Data 1.156 Err 4.5771 top1 86.875 top5 66.250 LR 0.045000
| Epoch: [1][6253/8008] Time 1.696 Data 0.922 Err 4.6205 top1 81.250 top5 68.125 LR 0.045000
| Epoch: [1][6254/8008] Time 6.023 Data 5.354 Err 4.1523 top1 83.125 top5 62.500 LR 0.045000
| Epoch: [1][6255/8008] Time 1.894 Data 1.209 Err 4.1505 top1 83.750 top5 63.125 LR 0.045000
| Epoch: [1][6256/8008] Time 7.237 Data 6.376 Err 4.1333 top1 83.750 top5 61.875 LR 0.045000
| Epoch: [1][6257/8008] Time 2.363 Data 1.634 Err 4.4003 top1 88.125 top5 64.375 LR 0.045000
| Epoch: [1][6258/8008] Time 3.415 Data 2.793 Err 4.2830 top1 80.000 top5 61.875 LR 0.045000
| Epoch: [1][6259/8008] Time 4.514 Data 3.685 Err 4.3248 top1 81.875 top5 66.250 LR 0.045000
| Epoch: [1][6260/8008] Time 1.063 Data 0.353 Err 4.3800 top1 87.500 top5 64.375 LR 0.045000
| Epoch: [1][6261/8008] Time 1.116 Data 0.490 Err 4.3797 top1 85.625 top5 63.750 LR 0.045000

As for the rest of the configuration, 4 Titan X GPUs are used in parallel and the batch size is 160 (4 x 40). I noticed that your training speed is quite stable in your logs, so could you tell me whether you have any other methods to improve the data loading speed? Thanks again!

@lim0606
Copy link
Owner

lim0606 commented Jun 11, 2016

Actually, I've never tried a 4-GPU setting, since I only have 2 GPUs. However, my friend working with me on this project (@shuni1001) suffered from the same problem; see facebookarchive/fb.resnet.torch#61.

I think it is a fairly general problem for settings with more than 2 GPUs in Torch.
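For what it's worth, fb.resnet.torch-style training scripts load data with a pool of worker threads, and raising the thread count (the -nThreads option, which defaults to 2 there) is the usual first thing to try when the Data time fluctuates. Below is only a minimal, hypothetical sketch of that pattern using the torch 'threads' package, not this repo's actual loader:

-- Hypothetical sketch of the fb.resnet.torch-style threaded loader:
-- worker threads prepare batches while the main thread trains.
local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')

local nThreads = 8      -- try raising this from the default of 2 when Data time dominates
local batchSize = 160

local pool = Threads(
   nThreads,
   function()
      require 'torch'   -- each worker initializes its own torch state
   end
)

for i = 1, 100 do       -- 100 dummy iterations for illustration
   pool:addjob(
      function()
         -- worker thread: stands in for reading/decoding a real batch
         return torch.randn(batchSize, 3, 299, 299)
      end,
      function(batch)
         -- main thread: the batch is ready; the training step would go here
      end
   )
end
pool:synchronize()      -- wait for all outstanding jobs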

Best regards,

Jaehyun
