Add RFC for Routing Agent #310
Conversation
Assigned as a feature in #308 - please approve the PR. |
@haim-barad thank you for your proposal. Would you kindly add an alternatives section covering any existing open source projects in this space? Perhaps OPEA can re-use one instead of building its own, or perhaps you have noticed some missing features that you could contribute to a project we could reuse. Some options are listed at https://github.com/Not-Diamond/awesome-ai-model-routing#intelligent-ai-model-routing. Also consider support for discovering inference endpoints and obtaining metadata about them, perhaps from a model card, to help determine the best match for an incoming request: cost, access latency, query domain (math, healthcare, finance), etc. See https://docs.withmartian.com/martian-model-router, which even mentions migrating away from inference endpoints with degraded performance and integrating new models. The more detail you can provide upfront at a high level, the more folks can chime in before coding begins. |
Our code is already 95% ready and is based on the open source RouteLLM framework. We also have plans to incorporate a semantic router and further features to help route between models of different capability and cost.
Our RFC cites RouteLLM and the other work that has been incorporated into our routing agent. |
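For readers unfamiliar with RouteLLM, here is a minimal sketch of how a RouteLLM-style controller routes a single request between a strong and a weak model, following the usage pattern shown in the RouteLLM README; the model names and the cost threshold below are placeholders rather than the values proposed in this RFC:

```python
from routellm.controller import Controller

# Placeholder endpoints: any OpenAI-compatible strong/weak model pair can be used.
client = Controller(
    routers=["mf"],                      # matrix-factorization router
    strong_model="gpt-4-1106-preview",   # expensive, high-quality model (placeholder)
    weak_model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # cheaper fallback (placeholder)
)

# The "model" string selects the router and its cost threshold; the controller
# classifies the prompt and forwards it to one of the two configured models.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Summarize the attached meeting notes."}],
)
print(response.choices[0].message.content)
```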
@haim-barad @mkbhanda @ftian1 thank you for addressing this problem. Please help review the RFC. Thank you. |
Is there a reason this PR is still awaiting review? Do we need to submit our code (which is ready for a first release) at the same time? |
@haim-barad what you mention in this conversation is missing from the RFC. Please add it and I shall approve. |
There is a DCO issue too. |
Code is not necessary. |
A link to RouteLLM has been added to the RFC, and the commits are signed off for the DCO. Does it take time for the DCO check to recognize the sign-off? |
The DCO check now passes. Please accept. |
What types of input is this particular solution intended for? I.e., which OPEA apps could benefit from it: ChatQnA, AudioQnA, VisualQnA, DocSum...?
The router is a decision maker (classifier). Currently it supports text-based prompts and makes decisions based on prompt complexity. We expect it to be used initially with chat-based apps, but there is no inherent limit depending on how the model is constructed, and we expect it to be useful in many scenarios. |
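As a purely hypothetical sketch of that decision-making step (the function names, endpoints, and length-based scorer below are illustrative only, not the actual OPEA or RouteLLM implementation), the router boils down to a classifier that scores a text prompt and picks an endpoint:

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    url: str

STRONG = Endpoint("strong-llm", "http://strong-llm:8080/v1")  # hypothetical endpoints
WEAK = Endpoint("weak-llm", "http://weak-llm:8080/v1")

def complexity_score(prompt: str) -> float:
    """Hypothetical classifier: returns a score in [0, 1], higher meaning the
    prompt likely needs the stronger model. A real router would use a trained
    model (e.g. matrix factorization over prompt embeddings); this proxy just
    uses prompt length so the example stays self-contained."""
    return min(len(prompt) / 2000.0, 1.0)

def route(prompt: str, threshold: float = 0.5) -> Endpoint:
    """Return the endpoint that should serve this text prompt."""
    return STRONG if complexity_score(prompt) >= threshold else WEAK

print(route("What is 2 + 2?").name)                              # -> weak-llm
print(route("Prove the Cauchy-Schwarz inequality " * 60).name)   # -> strong-llm
```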
it supports text-based prompts and makes decisions based on prompt complexity.
Maybe the RFC could mention that it is for text-based LLM prompts?
The RFC now mentions text-based inputs. Please approve. (I still need two more approvals.) |
LGTM!
Looks good. Looking forward to the PRs.
How does this cooperate with Kubernetes routing solutions such as the Gateway API for LLMs and service-level load balancers (e.g., here)? Is this an additional solution on top of those, or a replacement? It is also unclear what use case this solution is solving. |
The use case is cost optimization: getting better latency with a weaker model or cheaper HW by using the cheaper-to-run model when it is deemed good enough for a given prompt. However, that implies a potential risk of low or inconsistent quality replies, would probably require better testing than what OPEA currently has (real-life prompts), and otherwise smells a bit of premature optimization. There are, IMHO, other improvements that should be done for OPEA first, both to improve service latency and utilization of the already available HW, and error handling in stress situations. Adding this kind of routing unconditionally would complicate fixing those. I don't think that's necessarily a blocker for merging the RFC, though. Merging the implementation for the RFC can be delayed until OPEA is otherwise in good shape performance- and deployment-wise. |
I actually have a different take on the optimization:
However, I will say that we've developed the features of this router with simplicity in mind. The router is a simple classifier making a decision. It's not nearly as complex as an LLM, and while it does provide benefit, it's not an essential part of the workflow when developing or debugging. Testing of the router can be done independently; see the sketch below. I like the discussion, though... |
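To make the "tested independently" point concrete, a hypothetical pytest-style check can exercise the routing decision without standing up any LLM backend. The `route` helper and `router_sketch` module here refer to the illustrative sketch earlier in this thread, not the actual OPEA code:

```python
# test_router.py -- hypothetical; exercises only the decision logic,
# no inference endpoints are contacted.
from router_sketch import route  # the illustrative route() sketched above

def test_simple_prompt_goes_to_weak_model():
    assert route("What is 2 + 2?").name == "weak-llm"

def test_complex_prompt_goes_to_strong_model():
    prompt = "Prove the Cauchy-Schwarz inequality " * 60
    assert route(prompt).name == "strong-llm"
```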
Thanks, that was a great response! I hadn't even considered that it could transparently route queries to another cluster; that's a good edge use case, and a higher-level alternative to, e.g., using PoCL remote with the oneAPI driver for AI workloads.
Those results depend a lot on what it was tested on and how much the training prompts differ from the quality-testing prompts. Are those sets available for evaluation?
LLM routing could interfere with LLM scale-up routing optimizations like prefix caching: https://www.kubeai.org/blog/2025/02/26/llm-load-balancing-at-scale-chwbl/ But if it can be turned off when it does not help, that's OK. |
MT-Bench was used for the accuracy and performance claim. Yes, it's available. But even better would be using quality tools on the customer's actual data: query both models during a testing phase and determine when the weaker model gave a good-enough answer. Currently we have the matrix factorization model with (only) the embedding layer updated to use Hugging Face embeddings, which makes it more useful by removing the OpenAI dependency. Additionally, we have a method to fully train our own matrix factorization model on customer data (a future feature), so we can have even higher-quality decision making. I see the merge is blocked - how does this conversation get resolved? I like the back and forth, but I really want to understand whether something is actually blocking the merge. |
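As a hedged illustration of the "Hugging Face embeddings feeding a matrix-factorization-style router" idea described above (the embedding model choice, weight vector, and logistic scorer below are assumptions for demonstration, not the actual OPEA or RouteLLM implementation):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # Hugging Face embedding model

# Any locally hosted sentence-embedding model avoids the OpenAI embedding dependency.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical learned parameters: a projection of the prompt embedding onto a
# "needs the strong model" direction. Random values stand in for trained weights.
rng = np.random.default_rng(0)
w = rng.normal(size=384)   # 384 = MiniLM embedding dimension
b = 0.0

def win_probability(prompt: str) -> float:
    """Estimated probability that the strong model answers noticeably better."""
    e = embedder.encode(prompt, normalize_embeddings=True)
    return float(1.0 / (1.0 + np.exp(-(e @ w + b))))  # logistic score

# Prompts scoring above a chosen threshold would be routed to the strong model.
print(win_probability("Explain quantum error correction in detail."))
```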
Agreed! Currently, all you need is another gatekeeper to approve your PR, and once that's in place, we can merge it. |
I'm not a gatekeeper in this project myself; I'm just reviewing it. That, along with comments from other non-gatekeepers, is just input for the required two gatekeeper approvals, i.e. people with write access (shield icon in the reviewer list?). |
Hi @haim-barad, does the Routing Agent use the GenAIComps agent component? Where do you want to put the routing agent, GenAIComps or GenAIExamples? |
We plan for the router code to go into GenAIComps and some examples (Jupyter notebooks) to go into GenAIExamples. |
RFC for Routing Agent