community[minor]: GLiNER link extraction #24314
bjchambers
commented
Jul 16, 2024
- Description: This allows extracting links between documents with common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- Issue: None
- Dependencies: None
libs/community/langchain_community/graph_vectorstores/extractors/gliner_link_extractor.py
Looks good -- could you update the in-code docstrings so we can merge?
    Link.bidir(kind="entity:Award", tag="European Golden Shoes"),
    Link.bidir(kind="entity:Competitions", tag="European\nChampionship"),
    Link.bidir(kind="entity:Award", tag="UEFA Men's\nPlayer of the Year Awards"),
    Link.bidir(kind="entity:Date", tag="5 February 1985"),
Would these tags be difficult to use in practice? Dates usually benefit from ISO-8601 formatting so that one can run comparison tests on them.
Tags currently only support equality, so not particularly. These edges would be most useful for something like "what other things happened on this day".
I based this on an example from GLiNER for extracting named entities. In practice, the other kinds of entities are likely more useful (awards, competitions, etc.) for linking related content.
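To make the equality-only behaviour concrete, here is a minimal sketch of how documents that share a tag end up linked. It does not use the actual graph store; the `doc_tags` data and the `docs_by_tag` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical data: each document id maps to the (kind, tag) pairs extracted from it.
doc_tags = {
    "doc1": {("entity:Date", "5 February 1985"), ("entity:Award", "European Golden Shoes")},
    "doc2": {("entity:Date", "5 February 1985")},
    "doc3": {("entity:Award", "European Golden Shoes")},
}


def docs_by_tag(doc_tags):
    """Build an index from each (kind, tag) pair to the documents that carry it."""
    index = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for key in tags:
            index[key].add(doc_id)
    return index


index = docs_by_tag(doc_tags)

# "What other things happened on this day": every document with an identical date tag.
same_day = sorted(index[("entity:Date", "5 February 1985")])
same_award = sorted(index[("entity:Award", "European Golden Shoes")])
```

Because the lookup is a plain dictionary key match, two documents are only connected when the kind and the tag string are exactly equal.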
Equality would have the same issue -- 5 February 1985 vs. 1985-02-05.
Maybe there's a need for another component at some point to standardize fields that can be standardized?
Dealing with dates is obviously nontrivial, since a document can reference something like "last Monday" -- this isn't something we'd expand the entity detection model to be able to do.
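As a rough illustration of what such a standardizing component might do for dates, here is a minimal sketch using only the standard library. The `normalize_date_tag` helper and the list of accepted formats are assumptions for this example, not part of this PR, and relative expressions such as "last Monday" are deliberately left unresolved.

```python
from datetime import datetime

# Assumed input formats; a real normalizer would need a broader list.
KNOWN_FORMATS = ("%d %B %Y", "%Y-%m-%d", "%B %d, %Y")


def normalize_date_tag(tag):
    """Return the tag as an ISO-8601 date string, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(tag.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


a = normalize_date_tag("5 February 1985")
b = normalize_date_tag("1985-02-05")
c = normalize_date_tag("last Monday")
```

With this, "5 February 1985" and "1985-02-05" normalize to the same string, so an equality match would link them, while "last Monday" comes back as `None` and would need separate handling. Note that `%B` relies on the process locale using English month names.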
Resolved merge conflicts
    extractor = GLiNERLinkExtractor(
        labels=["Person", "Award", "Date", "Competitions", "Teams"]
    )
    results = extractor.extract_one("some long text...")
The list of example labels is helpful!
To make this even clearer it would be nice to see an example sentence with actual example results.
The difficulty for users is going to be understanding what kind of transformation this code performs (what goes in and what comes out), so they can figure out when and if they should use it.
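A sketch of the "what goes in, what comes out" shape, without loading a model: the `BidirLink` dataclass below is a stand-in for the `Link.bidir(kind=..., tag=...)` values shown earlier in this review, the `entities_to_links` helper is hypothetical, and the entity list is hand-written in the style of named-entity output (dicts with `text` and `label`), not real GLiNER results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidirLink:
    """Stand-in for the Link.bidir(kind=..., tag=...) values shown in this PR."""
    kind: str
    tag: str


def entities_to_links(entities, kind="entity"):
    """Turn labelled entities into one link per distinct (label, text) pair."""
    return {BidirLink(kind=f"{kind}:{e['label']}", tag=e["text"]) for e in entities}


# Input: a sentence (hand-written for illustration).
text = "The player was born on 5 February 1985 and later won the European Golden Shoes."

# Hand-written entities for that sentence, NOT actual model output.
entities = [
    {"text": "5 February 1985", "label": "Date"},
    {"text": "European Golden Shoes", "label": "Award"},
]

# Output: a set of links keyed by entity label and entity text.
links = entities_to_links(entities)
```

Here the result contains an `entity:Date` link tagged `5 February 1985` and an `entity:Award` link tagged `European Golden Shoes`; any other document producing the same pair would be linked to this one.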
    Example:

    .. code_block: python
Formatting here is very strict -- this needs to be `.. code-block:: python` (note the use of `-` and the double `::`), followed by a new line.
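For reference, a minimal sketch of a docstring that follows this format. The `extract_example` function is hypothetical and exists only to carry the docstring; the directive line is followed by an empty line and then the indented example.

```python
def extract_example():
    """Hypothetical function used only to show the directive syntax.

    Example:

        .. code-block:: python

            results = extractor.extract_one("some long text...")
    """


doc_lines = extract_example.__doc__.splitlines()
# Locate the directive and the line that immediately follows it.
directive_index = next(i for i, line in enumerate(doc_lines) if ".. code-block:: python" in line)
line_after_directive = doc_lines[directive_index + 1]
```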
Updated the formatting of the docstrings to use the code block directive correctly.
- **Description:** This allows extracting links between documents with common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>