
[Question]: Support for rewriting the adelete_by_doc_id function #926

Open
2 tasks done
FeHuynhVI opened this issue Feb 23, 2025 · 1 comment
Labels
question Further information is requested

Comments


FeHuynhVI commented Feb 23, 2025

Do you need to ask a question?

  • I have searched the existing questions and discussions and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Hello everyone,

I am currently investigating and rewriting the adelete_by_doc_id function because it does not work as expected.

When I insert data, the following files are generated:

  • kv_store_doc_status.json
  • kv_store_full_docs.json
  • kv_store_llm_response_cache.json
  • kv_store_text_chunks.json
  • vdb_chunks.json
  • vdb_entities.json
  • vdb_relationships.json

However, after performing a delete operation, only kv_store_doc_status.json and kv_store_full_docs.json are cleared.
The other files, such as kv_store_text_chunks.json, vdb_chunks.json, vdb_entities.json, and vdb_relationships.json, still contain data from the deleted document.
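To check which stores still hold data after a delete, a small inventory script can help. This is a hypothetical helper, not part of LightRAG, and it rests on assumptions about the file layouts: the kv_store_*.json files are taken to be flat JSON objects keyed by record id, and the vdb_*.json files are taken to keep their records in a top-level "data" list. For kv_store_llm_response_cache.json, which may be nested by query mode in some versions, the count is only of top-level keys.

```python
import json
from pathlib import Path
from typing import Dict

# File names as listed in the issue.
KV_FILES = [
    "kv_store_doc_status.json",
    "kv_store_full_docs.json",
    "kv_store_llm_response_cache.json",
    "kv_store_text_chunks.json",
]
VDB_FILES = ["vdb_chunks.json", "vdb_entities.json", "vdb_relationships.json"]


def count_entries(working_dir: str) -> Dict[str, int]:
    """Return the number of records in each storage file (missing files count as 0)."""
    counts: Dict[str, int] = {}
    root = Path(working_dir)
    for name in KV_FILES + VDB_FILES:
        path = root / name
        if not path.exists():
            counts[name] = 0  # file was never written, or has been removed
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        if name in VDB_FILES:
            # Assumption: vector files store their records in a top-level "data" list.
            counts[name] = len(data.get("data", []))
        else:
            # Assumption: KV files are one JSON object keyed by record id.
            counts[name] = len(data)
    return counts
```

Calling count_entries on the working directory before and after the delete shows exactly which files were left untouched.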

When I execute the delete function, the log output looks like this:

Entity "CHƯƠNG TRÌNH HỖ TRỢ SINH VIÊN KHUYẾT TẬT" will be updated with new source_id: chunk-162bce74c760fe66c99533a859f0a99
...
Relationship "ĐẠI HỌC FPT"-"HỌC PHÍ" will be updated with new source_id: chunk-0394bc729a1a95b5ed78520c45a90f0f<SEP>chunk-07faeca75ea529d68bebc9e1091eec8d<SEP>chunk-f39536497bcaec92d0ac376e4b61683a<SEP>chunk-4362f624b29f410655a1368526e76f77<SEP>chunk-c39de74fa236f94e2d915488411f73e0<SEP>chunk-490fa4d198f8570338fe53b9743a508c<SEP>chunk-e73aeb5552144dcf82d5fbc81212cdf2<SEP>chunk-62e09ccb2c3dd69d68ceba6a3cc955b3<SEP>chunk-2f7b29bb73884c282149cce1dceda279<SEP>chunk-0ac6c7ccc71c9a895860e2c25ab1c2de<SEP>chunk-c0806ff190e6a16a376d3227cde6624d<SEP>chunk-e107bba564834ace74c34f2be9e7627e<SEP>chunk-137923f6465df5c0693bb192b9f00f08<SEP>chunk-2b335583f7fd18045987663ac98a9319<SEP>chunk-bf19d4b9a24dbc166ffff75f36238200<SEP>chunk-d914e3cf67ebc677cd073f8d98c8a370<SEP>chunk-963a8aded463f18dc584fe3cce413325<SEP>chunk-c3e78e09392f89cb9794a172a16fca5d
...
Updated entity "CÂU HỎI 5" with new source_id: chunk-102a463cac1fa32c7a584bac79b093ca
...
Updated relationship "TOP 40 THPT NĂM 2025"-"ĐIỂM LỚP 11" with new source_id: chunk-65890d78138df035f0f1b26eacbacc4a
...
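The log shows entities and relationships being rewritten with a new source_id rather than removed. Judging from the log, a source_id is a list of chunk ids joined by <SEP>. A minimal sketch of the pruning step a delete needs, assuming that format: drop the deleted chunk ids, and treat the entity or relationship as deletable only when no chunk ids remain. The function name prune_source_id is hypothetical. Note that pruning can only be as accurate as the set of deleted chunk ids passed in; if that set is empty, every source_id comes back unchanged.

```python
from typing import Optional, Set

GRAPH_FIELD_SEP = "<SEP>"  # separator as it appears in the log output


def prune_source_id(source_id: str, deleted_chunk_ids: Set[str]) -> Optional[str]:
    """Remove deleted chunk ids from a <SEP>-joined source_id.

    Returns the shortened source_id, or None when no source chunks remain,
    meaning the entity/relationship itself should be deleted.
    """
    remaining = [
        chunk_id
        for chunk_id in source_id.split(GRAPH_FIELD_SEP)
        if chunk_id and chunk_id not in deleted_chunk_ids
    ]
    return GRAPH_FIELD_SEP.join(remaining) if remaining else None
```

A caller would delete the entity or relationship (and its vector in vdb_entities.json or vdb_relationships.json) when the result is None, and write the shorter source_id back otherwise.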

It seems like the chunks are not being deleted from the database.

I have already created an issue for this, but it has not been fixed yet.
Since I frequently need to delete documents and update them for my project, I usually have to wipe all data and reinsert everything, which is very costly and time-consuming.

To fix this, I am investigating and modifying the function. I suspect that the issue is with this line:

chunks = await self.text_chunks.get_by_id(doc_id)
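One likely reason this line finds nothing: in kv_store_text_chunks.json the keys appear to be chunk ids (chunk-…), not document ids (doc-…), so looking up doc_id as a key returns no chunks. Below is a sketch of a lookup that works from the JSON file instead, assuming each chunk record names its parent document in a full_doc_id field (check your own file, since the field name may differ between versions). find_chunks_for_doc is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict


def find_chunks_for_doc(chunks_path: str, doc_id: str) -> Dict[str, dict]:
    """Return {chunk_id: record} for every chunk that belongs to doc_id.

    Assumptions: the file is one JSON object keyed by chunk id, and each
    record stores its parent document id in a "full_doc_id" field.
    """
    path = Path(chunks_path)
    if not path.exists():
        return {}
    all_chunks = json.loads(path.read_text(encoding="utf-8"))
    return {
        chunk_id: record
        for chunk_id, record in all_chunks.items()
        if isinstance(record, dict) and record.get("full_doc_id") == doc_id
    }
```

Inside adelete_by_doc_id, a comparable change would be to collect chunk ids with this kind of filter instead of get_by_id(doc_id). Those ids then need to be removed from text_chunks, from the vdb_chunks vector store, and from the source_id values of entities and relationships.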

To test this, I replaced "doc" with "chunk" in the document ID:

chunks = await self.text_chunks.get_by_id(doc_id.replace("doc", "chunk"))

However, the chunks are still not being deleted from the database.
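A plausible explanation for why the substitution cannot work: the ids in the log look like a prefix followed by a 32-character MD5 hex digest. If ids are built that way, a document id hashes the whole document's text while a chunk id hashes only that chunk's text, so swapping the prefix keeps the document's hash and never matches a chunk key. The helper mdhash_id below is a stand-in written for this illustration, not LightRAG's own function.

```python
from hashlib import md5


def mdhash_id(content: str, prefix: str = "") -> str:
    # Hypothetical helper: assumes an id is the prefix plus the MD5 hex digest of the content.
    return prefix + md5(content.encode()).hexdigest()


doc_text = "Full document text. It is split into several chunks."
chunk_text = "Full document text."

doc_id = mdhash_id(doc_text, prefix="doc-")
chunk_id = mdhash_id(chunk_text, prefix="chunk-")

# The substitution from the issue only swaps the prefix (hex digits never contain "doc"),
# so the hash part still belongs to the whole document.
guessed = doc_id.replace("doc", "chunk")
print(guessed == chunk_id)  # False
```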

If anyone has experience with this issue, please share your insights.
I would greatly appreciate any help!

Additional Context

No response

@FeHuynhVI FeHuynhVI added the question Further information is requested label Feb 23, 2025
@chain-qq

me too
