
Handling 'point_cloud' Field in JSON Export Without Images: Import Error in Datumaro #1626

Closed
ainayves opened this issue Oct 3, 2024 · 4 comments

@ainayves

ainayves commented Oct 3, 2024

Hello, dear developers of Datumaro,

First, thank you for Datumaro, which I recently started using and find very interesting.

I have a question that has been on my mind for a few days.

Here are the steps I followed:

  1. Exporting a dataset from CVAT in Datumaro format without requiring the corresponding images to be re-downloaded.

  2. Running the following commands:

    datum project create
    datum project import -f datumaro <path to the JSON file>

  3. I then encountered the following error:

    datumaro.components.errors.MediaTypeError: Unexpected media type of a dataset '<class 'datumaro.components.media.Image'>'.
    Expected media type is '<class 'datumaro.components.media.PointCloud'>.

  4. Upon investigation, I found that the JSON file includes a "point_cloud" field when exporting without images and when the image has no annotations:

    "items": [
      {
        "id": "1713265728.502414164",
        "annotations": [],
        "attr": {
          "frame": 0
        },
        "point_cloud": {
          "path": ""
        }
      }
    ]
    ....

  5. I manually removed all "point_cloud" fields to make the import work.
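
The manual removal described in the last step could be scripted. The following is a minimal sketch, not a Datumaro feature: it assumes the exported file has a top-level "items" list as in the excerpt above, and it only drops "point_cloud" entries whose "path" is empty. The function name `strip_empty_point_clouds` is hypothetical.

```python
import json
from pathlib import Path


def strip_empty_point_clouds(src, dst):
    """Drop "point_cloud" entries with an empty path and return how many were removed.

    Assumes the layout shown in the excerpt: a top-level "items" list whose
    entries may carry a "point_cloud" object with a "path" key.
    """
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    removed = 0
    for item in data.get("items", []):
        point_cloud = item.get("point_cloud")
        # Only touch placeholder entries; a real point cloud path is left alone.
        if isinstance(point_cloud, dict) and not point_cloud.get("path"):
            del item["point_cloud"]
            removed += 1
    Path(dst).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return removed
```

For example, `strip_empty_point_clouds("default.json", "default_clean.json")` would write a cleaned copy (both file names are placeholders) that can then be passed to `datum project import -f datumaro`.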

My question: is there a way to make Datumaro ignore the "point_cloud" field automatically? Or should I always remove it manually when exporting without images? Alternatively, could you suggest a different approach?

Note: Datasets annotated in CVAT can include thousands of images, so re-downloading them would take a very long time.

Thanks in advance for your help, and thank you again for this tool.

@sooahleex
Contributor

Hi @ainayves, sorry for the late reply. I tried to reproduce your problem using our CVAT-format test asset:

import datumaro as dm
test_path = "tests/assets/cvat_dataset/for_images/export_project"
dm_dataset = dm.Dataset.import_from(test_path, format="cvat")
dm_dataset.export("cvat2datum", format="datumaro")

Then I ran the commands you mentioned:

datum project create
datum project import -f datumaro ~/workspace/datumaro/cvat2datum/annotations/Train.json

For me, these commands work and produce the following output:

2024-10-12 14:10:30,168 INFO: Checking source... 
2024-10-12 14:10:30,217 INFO: Source 'source-1' with format 'datumaro' has been added to the project

If my method differs from yours, could you share the dataset you used? I will take another look.

@ainayves
Author

ainayves commented Oct 14, 2024

Thank you for your answer, @sooahleex.

In fact, I export the annotations directly from CVAT in Datumaro format:

image

I then get this JSON file, which contains "point_cloud" entries, and the import command fails on it:

default.json
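
To check whether an exported file is affected before importing it, one could count which media fields its items carry. This is only a sketch under assumptions: it relies on the top-level "items" list shown earlier in the thread, and the default field names "image" and "point_cloud" are assumed keys rather than a confirmed schema. The function name `count_media_fields` is hypothetical.

```python
import json
from collections import Counter


def count_media_fields(json_path, fields=("image", "point_cloud")):
    """Count how many items contain each of the given media field names."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    counts = Counter()
    for item in data.get("items", []):
        for field in fields:
            if field in item:
                counts[field] += 1
    # Fields that never occur are simply absent from the result.
    return dict(counts)
```

A result that contains only "point_cloud" for a dataset that was annotated on images would suggest the file has the placeholder entries described in this issue.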

@sooahleex
Contributor

Hi @ainayves, I updated the importer so that it no longer reads the point cloud when neither the images nor the point cloud exist. This update will be included in the next release. Thank you for reporting this issue.

@ainayves
Author

Thank you very much
