Is your feature request related to a problem? Please describe.
No.
Describe the solution you'd like
Additional zero-shot models, such as Grounding DINO, and possibly Detectron2 or Segment Anything. Grounding DINO in particular would be great, since it is promptable.
Describe alternatives you've considered
n/a
Additional context
The Grounding DINO model is promptable and reportedly scores higher than CLIP.
I think Segment Anything / Grounding DINO produce more restrictive embeddings due to their promptable nature (more focused training data). In other words, CLIP on its own lets you search using more "obscure" language, while the others might be restricted to more common words (car, sky, bird, face, etc.).
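To illustrate the kind of open-vocabulary search CLIP enables, here is a minimal sketch of embedding-based retrieval. Random vectors stand in for real embeddings (in practice they would come from an encoder such as CLIP's image and text towers); the function names and dimensions here are illustrative assumptions, not this project's API.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for real embeddings: in practice these would be produced by
# a model's encode_image / encode_text calls (names vary by library).
rng = np.random.default_rng(0)
dim = 512  # CLIP ViT-B/32 uses a 512-dim joint embedding space
image_embeddings = {f"img_{i}": rng.standard_normal(dim) for i in range(100)}

def search(query_embedding, image_embeddings, top_k=5):
    """Rank stored image embeddings by similarity to a query embedding."""
    scored = [(name, cosine_similarity(query_embedding, emb))
              for name, emb in image_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# The query embedding would normally encode free-form text, which is why
# CLIP can handle "obscure" phrases: any string maps into the same space.
query = rng.standard_normal(dim)
results = search(query, image_embeddings)
```

A promptable detector like Grounding DINO instead takes the text prompt at inference time and returns boxes, so it doesn't produce a single reusable per-image vector the same way.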
We're preparing an update that also adds support for GPT-Vision and LLaVA-like models, which would let you ingest and prompt images directly too.