[Bug]: Impossible to import data from local FS using bulk_import in docker-compose build #40571

Open
1 task done
EvilaMany opened this issue Mar 11, 2025 · 3 comments
Assignees
Labels
area/deployment any issues that related to deployment kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@EvilaMany

EvilaMany commented Mar 11, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.6
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 16 CPU / 64 GB
- Deployment method: docker-compose version of standalone Milvus

Current Behavior

I tried mounting my data folder into the Milvus and MinIO containers: at /root/data, at /minio-data and /minio-data/a-bucket (inside the MinIO container), and in many other locations.
Then I run

    from pymilvus.bulk_writer import bulk_import
    resp = bulk_import(url=url, collection_name=collection_name, files=bulk_files)

with the names of the Parquet files to import.
Every single attempt fails with

    new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.: importing data failed (PreImportTask)

The Parquet files were created with RemoteBulkWriter and uploaded to S3, then downloaded manually to this new server.

The error did not occur when I ran Milvus as a single container; it reproduces only with the docker-compose version.
What am I missing?

Expected Behavior

No response

Steps To Reproduce

Milvus Log

No response

Anything else?

No response

@EvilaMany EvilaMany added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 11, 2025

dosubot bot commented Mar 11, 2025

I found a similar closed issue regarding the use of bulk import with Parquet files in Milvus, where only one file was effectively processed despite multiple file names being passed [1]. However, no solution was found in the comments of that issue.

For your specific problem, it seems like the error "parquet: could not retrieve footer offset" might be related to how the file paths and permissions are handled in the docker-compose environment. Here are a few things you might want to check:

  1. File Paths and Mounting: Ensure that the file paths are correctly mounted in the docker-compose setup. The best practice is to use the volumes key in the docker-compose.yml file to specify the local directory paths that should be mounted into the containers. For example, you should have something like this for the Milvus and MinIO containers:

    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
  2. File Path Format: When using the bulk_import function, ensure that each file path is wrapped in its own list within the main files list. The expected format is:

    files=[
        ['file1.parquet'],
        ['file2.parquet'],
        ['file3.parquet']
    ]

    Using a single flat list like files=['file1.parquet', 'file2.parquet', 'file3.parquet'] will not work [2].

  3. Permissions: Verify that the permissions for the mounted directories allow the Milvus and MinIO containers to read the files.

These steps should help address the issue you're facing with importing data using the docker-compose setup.
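
The nested-list format from item 2 can be sketched as follows. This is a minimal illustration, not part of pymilvus: `to_bulk_files` is a hypothetical helper name, and the commented-out call simply mirrors the `bulk_import` call from the original report.

```python
from typing import Iterable, List, Union


def to_bulk_files(paths: Iterable[Union[str, List[str]]]) -> List[List[str]]:
    """Wrap each bare path in its own list; copy entries that are already lists.

    Hypothetical helper for illustration only.
    """
    nested: List[List[str]] = []
    for entry in paths:
        if isinstance(entry, str):
            nested.append([entry])
        else:
            nested.append(list(entry))
    return nested


bulk_files = to_bulk_files(["file1.parquet", "file2.parquet", "file3.parquet"])
print(bulk_files)

# The result can then be passed to the call from the report:
# from pymilvus.bulk_writer import bulk_import
# resp = bulk_import(url=url, collection_name=collection_name, files=bulk_files)
```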


@dosubot dosubot bot added the area/deployment any issues that related to deployment label Mar 11, 2025
@yanliang567
Contributor

Since you deployed Milvus with Docker Compose, you need to upload the files to the MinIO bucket. Could you please retry and collect the Milvus logs?

/assign @EvilaMany
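
Uploading the files to the bucket could look roughly like the sketch below, which uses the `minio` Python package. Everything specific here is an assumption to check against your own docker-compose.yml and Milvus configuration: the endpoint `localhost:9000`, the `minioadmin` credentials, the bucket name `a-bucket` (taken from the report above), and the `import` key prefix. The helper names `object_key`, `upload_files`, and `make_client` are hypothetical. The sketch also assumes that the paths passed to `bulk_import` are interpreted as object keys inside that bucket.

```python
import posixpath
from pathlib import Path
from typing import Iterable, List


def object_key(local_path: str, prefix: str = "") -> str:
    """Build the object key for a local file: '<prefix>/<file name>'."""
    name = Path(local_path).name
    return posixpath.join(prefix, name) if prefix else name


def upload_files(client, bucket: str, local_paths: Iterable[str],
                 prefix: str = "") -> List[List[str]]:
    """Upload each file and return the keys in the nested-list form used for `files`."""
    files: List[List[str]] = []
    for path in local_paths:
        key = object_key(path, prefix)
        # fput_object uploads a file from the local filesystem to the bucket.
        client.fput_object(bucket_name=bucket, object_name=key, file_path=path)
        files.append([key])
    return files


def make_client():
    """Create a MinIO client. Endpoint and credentials are ASSUMED defaults."""
    from minio import Minio  # imported lazily so the helpers work without it
    return Minio(endpoint="localhost:9000", access_key="minioadmin",
                 secret_key="minioadmin", secure=False)


# Example usage (requires a reachable MinIO server and existing local files):
# bulk_files = upload_files(make_client(), "a-bucket",
#                           ["/root/data/1.parquet"], prefix="import")
# resp = bulk_import(url=url, collection_name=collection_name, files=bulk_files)
```

Passing the client in as a parameter keeps the upload logic independent of how the connection is created, so the same function works with whatever endpoint and credentials your deployment actually uses.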

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 12, 2025
@xiaofan-luan
Collaborator

Key difference between single-container and Docker Compose deployments of Milvus

Single-container standalone:

  • Parquet files can be imported directly from a local directory.
  • Local filesystem mounts (/root/data) work seamlessly.

Docker Compose (Milvus with a separate MinIO container):

  • Parquet files must be imported from MinIO (S3-compatible).
  • Milvus reads files from MinIO directly, not from the local filesystem.

This is precisely why you are getting the error:

    new parquet reader failed, err=parquet: could not retrieve footer offset: The specified key does not exist.

The error occurs when Milvus tries to load the Parquet files from MinIO but cannot find them there: the keys (files) do not exist at the specified path in MinIO.
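
One way to confirm this before calling `bulk_import` is to compare the requested paths with the keys that actually exist in the bucket. The sketch below is a hedged illustration: `missing_keys` and `list_bucket_keys` are hypothetical helper names, the client is assumed to be a `minio.Minio` instance pointed at the MinIO container, and `a-bucket` is the bucket name from the report.

```python
from typing import Iterable, List


def list_bucket_keys(client, bucket: str, prefix: str = "") -> List[str]:
    """Return all object keys under `prefix`, listing recursively."""
    objects = client.list_objects(bucket_name=bucket, prefix=prefix, recursive=True)
    return [obj.object_name for obj in objects]


def missing_keys(files: Iterable[Iterable[str]], existing: Iterable[str]) -> List[str]:
    """Return every requested path that is not present among the existing keys."""
    present = set(existing)
    return [key for group in files for key in group if key not in present]


# Example usage (requires a reachable MinIO server; `client` is an assumed
# minio.Minio instance configured for your deployment):
# existing = list_bucket_keys(client, "a-bucket")
# absent = missing_keys(bulk_files, existing)
# if absent:
#     print("Not found in MinIO:", absent)
```

If this reports missing keys, the "The specified key does not exist" error is expected, and the files need to be uploaded to the bucket first or the paths adjusted to match the stored keys.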
