Error when processing training images #4

Closed
blixt opened this issue Nov 21, 2015 · 13 comments

Comments

@blixt
Contributor

blixt commented Nov 21, 2015

Edit: See @susemeee's comment below (the image COCO_train2014_000000167126.jpg is corrupted, and you can download a replacement at https://msvocds.blob.core.windows.net/images/262993_z.jpg)


I was trying to run prepro.py but eventually ran into an issue in scipy's pilutil package (see below).

I've installed all dependencies, run the coco_preprocess.ipynb, and downloaded train2014.zip + val2014.zip and extracted them into coco/images.

Am I missing something?

$ python prepro.py --input_json coco/coco_raw.json --num_val 5000 --num_test 5000 --images_root coco/images --word_count_threshold 5 --output_json coco/cocotalk.json --output_h5 coco/cocotalk.h5
parsed input parameters:
{
  "output_json": "coco/cocotalk.json",
  "images_root": "coco/images",
  "input_json": "coco/coco_raw.json",
  "word_count_threshold": 5,
  "max_length": 16,
  "output_h5": "coco/cocotalk.h5",
  "num_test": 5000,
  "num_val": 5000
}
example processed tokens:
['a', 'woman', 'riding', 'a', 'bike', 'down', 'a', 'bike', 'trail']
... lots of info deleted for brevity ...
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size  (616767, 16)
processing 0/123287 (0.00% done)
... lots of percentages deleted for brevity ...
processing 60000/123287 (48.67% done)
Traceback (most recent call last):
  File "prepro.py", line 236, in <module>
    main(params)
  File "prepro.py", line 186, in main
    Ir = imresize(I, (256,256))
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 424, in imresize
    im = toimage(arr, mode=mode)
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 234, in toimage
    raise ValueError("'arr' does not have a suitable array shape for "
ValueError: 'arr' does not have a suitable array shape for any mode.
@karpathy
Owner

Thank you for posting an issue.

poof! I'm not quite sure what's up here. One strategy to follow here is to print the filenames as they are being processed, and then manually look at the filename that failed. Presumably there is something wrong with its encoding. Could you report what the filename is? And can you try opening it in some image editing program, saving it back to a jpg, replacing the original, and rerunning?

@blixt
Contributor Author

blixt commented Nov 21, 2015

Okay, I'm going to rerun the script now with the following change in place:

diff --git a/prepro.py b/prepro.py
index ea581da..7440963 100644
--- a/prepro.py
+++ b/prepro.py
@@ -183,7 +183,11 @@ def main(params):
   for i,img in enumerate(imgs):
     # load the image
     I = imread(os.path.join(params['images_root'], img['file_path']))
-    Ir = imresize(I, (256,256))
+    try:
+        Ir = imresize(I, (256,256))
+    except:
+        print 'failed resizing image %s' % (img['file_path'],)
+        raise
     # handle grayscale input images
     if len(Ir.shape) == 2:
       Ir = Ir[:,:,np.newaxis]

@blixt
Contributor Author

blixt commented Nov 21, 2015

Okay here's the error I got:

 failed resizing image train2014/COCO_train2014_000000167126.jpg

The file is definitely corrupted (the file size is too small and most of it is just gray). The question is if it got corrupted during decompression on my side or if it's actually like that in the original train2014.zip file. I'll try to investigate.

The image at mscoco.org: http://mscoco.org/explore/?id=167126

@susemeee

Note: The image COCO_train2014_000000167126.jpg was also corrupted on my side, so I replaced it with a fresh copy from http://mscoco.org/explore/?id=167126 and the preprocessing worked.

https://msvocds.blob.core.windows.net/images/262993_z.jpg is the actual URL of the image.

@karpathy
Owner

Thank you! I am going to close this issue and adjust the documentation to point to it.

@cdluminate

Hello, I wrote some scripts which may be useful:
https://github.com/CDLuminate/cocofetch

The script check_jpeg.py finds broken JPEGs and files that are not JPEGs at all.
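
For readers who want a rough idea of what such a check can look like, here is a minimal sketch. It is not the check_jpeg.py script linked above, and the function names are hypothetical. It relies only on the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers, so it is a heuristic: a truncated file will usually lack the trailing marker, but a valid file with extra bytes after it would be flagged too, and damage in the middle of a file would go unnoticed.

```python
import os


def looks_like_complete_jpeg(path):
    """Heuristic: the file starts with SOI (FF D8) and ends with EOI (FF D9)."""
    if os.path.getsize(path) < 4:
        return False
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(-2, os.SEEK_END)  # jump to the last two bytes
        tail = f.read(2)
    return head == b"\xff\xd8" and tail == b"\xff\xd9"


def find_suspect_images(images_dir):
    """Return sorted paths of .jpg files under images_dir that fail the check."""
    suspects = []
    for root, _dirs, files in os.walk(images_dir):
        for name in files:
            if name.lower().endswith(".jpg"):
                path = os.path.join(root, name)
                if not looks_like_complete_jpeg(path):
                    suspects.append(path)
    return sorted(suspects)
```

Calling `find_suspect_images("coco/images")` before running prepro.py would list candidates to inspect by hand; anything it reports should still be opened in an image viewer before being replaced.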

@Tejeshwarabm

Tejeshwarabm commented Jan 29, 2018

47: 7 0.001135%
48: 1 0.000162%
49: 4 0.000649%
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size (616767, 16)
Traceback (most recent call last):
  File "prepro.py", line 245, in <module>
    main(params)
  File "prepro.py", line 190, in main
    I = imread(os.path.join(params['images_root'], img['file_path']))
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/utils.py", line 101, in newfunc
    return func(*args, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 164, in imread
    im = Image.open(name)
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: u'coco/images/train2014/COCO_train2014_000000152328.jpg'

Do we need to download the images ourselves?

@heartraeh

@Tejeshwarabm
I have the same problem. Have you solved it? If so, could you share how?

@mymuli

mymuli commented Apr 4, 2019

I ran into the same problem and don't know how to solve it...

@mymuli

mymuli commented Apr 5, 2019

To address this error: IOError: [Errno 2] No such file or directory: u'coco/images/train2014/COCO_train2014_000000152328.jpg'
I found a solution... Under the coco folder, create an images folder, then create a train2014 folder and a val2014 folder inside it, and place the corresponding dataset images in each.
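
A quick way to confirm the layout described above before rerunning prepro.py is sketched below. The function name is hypothetical, and the default folder names simply follow the paths quoted in this thread; adjust them if your setup differs.

```python
import os


def summarize_image_dirs(images_root, subdirs=("train2014", "val2014")):
    """Map each expected subfolder to its .jpg count, or None if it is missing."""
    summary = {}
    for sub in subdirs:
        path = os.path.join(images_root, sub)
        if os.path.isdir(path):
            summary[sub] = sum(
                1 for name in os.listdir(path) if name.lower().endswith(".jpg")
            )
        else:
            summary[sub] = None  # folder does not exist under images_root
    return summary
```

For example, `summarize_image_dirs("coco/images")` returning `None` or `0` for a folder suggests the archives were extracted somewhere else, which would explain the "No such file or directory" error.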

@sssilence

To address this error: IOError: [Errno 2] No such file or directory: u'coco/images/train2014/COCO_train2014_000000152328.jpg'
I found a solution... Under the coco folder, create an images folder, then create a train2014 folder and a val2014 folder inside it, and place the corresponding dataset images in each.

Hello, do you mean data/coco/images/train2014 and val2014?

@sssilence

When I run "python2 scripts/prepro_feats.py --input_json data/dataset_coco.json --output_dir data/cocotalk", I get this error:
IOError: [Errno 2] No such file or directory: u'val2014/COCO_val2014_000000391895.jpg'
What should I do to fix this?
Thank you!

@Mayurji

Mayurji commented Dec 29, 2019

I was getting a FileNotFound error for 524 images in train2014, so I wrote a script that removes those missing images and their entries from the related JSON files.

The same can be done for val2014 if files are missing from that folder; only the folder name needs to be changed.
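
The general idea can be sketched as follows. This is not the linked script; the function names are hypothetical, and it assumes the input JSON is a list of objects that each carry a "file_path" key, as the `img['file_path']` lookup in the tracebacks above suggests.

```python
import json
import os


def split_by_file_presence(imgs, images_root):
    """Partition entries into (kept, dropped) by whether their image file exists."""
    kept, dropped = [], []
    for img in imgs:
        path = os.path.join(images_root, img["file_path"])
        if os.path.isfile(path):
            kept.append(img)
        else:
            dropped.append(img)
    return kept, dropped


def write_filtered_json(input_json, images_root, output_json):
    """Write a copy of input_json without entries whose image file is missing."""
    with open(input_json) as f:
        imgs = json.load(f)  # assumed: a list of dicts with a "file_path" key
    kept, dropped = split_by_file_presence(imgs, images_root)
    with open(output_json, "w") as f:
        json.dump(kept, f)
    return len(kept), len(dropped)
```

Writing to a separate output file keeps the original JSON untouched, so the filtering can be repeated after re-downloading any images that turn out to be recoverable.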

https://github.com/Mayurji/Computer-Vision/blob/master/Image_Captioning/missingFile_cocodataset.py
