community[minor]: GLiNER link extraction #24314

Merged (6 commits) on Jul 19, 2024

bjchambers
Contributor

  • Description: This allows extracting links between documents with common named entities using [GLiNER](https://github.com/urchade/GLiNER).
  • Issue: None
  • Dependencies: None

@dosubot dosubot bot added the size:L, community, and 🤖:enhancement labels Jul 16, 2024
@eyurtsev eyurtsev self-assigned this Jul 16, 2024
@dosubot dosubot bot added the lgtm label Jul 18, 2024
@eyurtsev eyurtsev added the waiting-on-author label Jul 18, 2024
@eyurtsev
Collaborator

Looks good -- could you update the in-code docstrings so we can merge?

@bjchambers bjchambers requested a review from eyurtsev July 18, 2024 15:10
Link.bidir(kind="entity:Award", tag="European Golden Shoes"),
Link.bidir(kind="entity:Competitions", tag="European\nChampionship"),
Link.bidir(kind="entity:Award", tag="UEFA Men's\nPlayer of the Year Awards"),
Link.bidir(kind="entity:Date", tag="5 February 1985"),
Collaborator

Would these tags be difficult to use in practice? Dates usually benefit from ISO-8601 formatting so that one can run comparison tests on them.

Contributor Author

Tags currently only support equality, so not particularly. These edges would be most useful for something like "what other things happened on this day".

I based this on an example from GLiNER for extracting named entities. In practice, the other kinds of entities are likely more useful (awards, competitions, etc.) for linking related content.
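
To illustrate the equality-only matching described above, here is a minimal sketch. It does not use the real `Link` class: `EntityLink` and `group_by_link` are hypothetical stand-ins, and the document IDs are made up for the example. Two documents end up connected only when they carry a link with exactly the same kind and tag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityLink:
    """Hypothetical stand-in for a bidirectional link: a kind plus a tag."""

    kind: str
    tag: str


def group_by_link(doc_links):
    """Map each link to the IDs of the documents that carry it."""
    groups = defaultdict(set)
    for doc_id, links in doc_links.items():
        for link in links:
            groups[link].add(doc_id)
    # Only links shared by two or more documents connect anything.
    return {link: ids for link, ids in groups.items() if len(ids) > 1}


# Made-up documents; the tags are taken from the test snippet under review.
doc_links = {
    "doc-a": {
        EntityLink("entity:Date", "5 February 1985"),
        EntityLink("entity:Award", "European Golden Shoes"),
    },
    "doc-b": {EntityLink("entity:Date", "5 February 1985")},
}
connected = group_by_link(doc_links)
```

Here `connected` holds only the date link, shared by `doc-a` and `doc-b`, which is the "what other things happened on this day" case.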

Collaborator

@eyurtsev eyurtsev Jul 19, 2024

Equality would have the same issue -- 5 February 1985 vs. 1985-02-05.

Maybe there's a need for another component at some point to standardize fields that can be standardized?

Dealing with dates is obviously nontrivial, since a document can reference something like "last Monday" -- this isn't something we'd expand the entity detection model to handle.
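
A separate standardization step of the kind suggested above might look roughly like the following sketch. It is not part of this PR: `normalize_date_tag` and `KNOWN_FORMATS` are hypothetical names, the list of formats is an assumption, and `%B` parsing depends on an English locale. Relative expressions such as "last monday" are deliberately left unresolved.

```python
from datetime import datetime
from typing import Optional

# Formats this sketch knows about (an assumption); anything else is rejected.
KNOWN_FORMATS = ("%d %B %Y", "%Y-%m-%d", "%B %d, %Y")


def normalize_date_tag(tag: str) -> Optional[str]:
    """Return the tag as an ISO-8601 date, or None if it is not recognized."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(tag.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    # Relative or free-form dates cannot be resolved without more context.
    return None


same_day = normalize_date_tag("5 February 1985") == normalize_date_tag("1985-02-05")
unresolved = normalize_date_tag("last monday")
```

With this step applied before link creation, both spellings of the date would map to the same tag, while unrecognized values could be kept as-is or dropped, depending on the desired behavior.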

@eyurtsev
Collaborator

Resolved merge conflicts

extractor = GLiNERLinkExtractor(
    labels=["Person", "Award", "Date", "Competitions", "Teams"]
)
results = extractor.extract_one("some long text...")
Collaborator

The list of example labels is helpful!

To make this even clearer it would be nice to see an example sentence with actual example results.

The difficulty for users is going to be understanding what type of transformation this code does (what goes in and what comes out), so they can figure out when and if they should use it.
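
As a rough picture of "what goes in and what comes out", the sketch below maps entity records to (kind, tag) pairs. It does not call GLiNER or the extractor: `entities_to_links` is a hypothetical helper, the `entity:<label>` kind format is inferred from the test snippet in this PR, and the entity dicts are hand-written on the assumption that the model reports a `text` and a `label` per entity. They are not real model output.

```python
def entities_to_links(entities):
    """Turn entity records into a set of (kind, tag) pairs."""
    return {(f"entity:{e['label']}", e["text"]) for e in entities}


# Hand-written records in the assumed shape, not actual model predictions.
sample_entities = [
    {"text": "5 February 1985", "label": "Date"},
    {"text": "European Golden Shoes", "label": "Award"},
]
links = entities_to_links(sample_entities)
```

Text goes in, the model finds spans for the requested labels, and each span becomes one link whose kind encodes the label and whose tag is the matched text.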


Example:

.. code_block: python
Collaborator

Formatting here is very strict -- this needs to be `.. code-block:: python` (note the `-` and the double `::`), followed by a blank line.
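
For reference, a docstring laid out this way could look like the sketch below. The function name `extract_links_example` is hypothetical and exists only to carry the docstring; the example body reuses the snippet from this PR.

```python
def extract_links_example() -> None:
    """Hypothetical function used only to show the docstring layout.

    Example:

        .. code-block:: python

            extractor = GLiNERLinkExtractor(
                labels=["Person", "Award", "Date", "Competitions", "Teams"]
            )
            results = extractor.extract_one("some long text...")
    """


doc = extract_links_example.__doc__
```

The directive line is followed by an empty line, and the example code is indented one level deeper than the directive.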

@eyurtsev
Collaborator

Updated the docstring formatting to use the code-block directive correctly.

@eyurtsev eyurtsev changed the title community: GLiNER link extraction community[minor]: GLiNER link extraction Jul 19, 2024
@eyurtsev eyurtsev enabled auto-merge (squash) July 19, 2024 13:51
@eyurtsev eyurtsev merged commit 83f3d95 into langchain-ai:master Jul 19, 2024
43 checks passed
@bjchambers bjchambers deleted the link-extractor-gliner branch July 19, 2024 15:43
olgamurraft pushed a commit to olgamurraft/langchain that referenced this pull request Aug 16, 2024
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>