Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Bugfix]: only log extra fields for successfully deserialized objects #11367

Closed

Conversation

gcalmettes
Copy link
Contributor

@gcalmettes gcalmettes commented Dec 20, 2024

This PR prevents unnecessary warning logging of detected extra fields present in the request for objects that are not validated by Pydantic.

In #10463 we allowed extra fields to be present in the request payload to comply with the OpenAI behavior. In order to still provide the users useful debug log if necessary, any extra field detected is warning-logged.

However, with the current implementation of the extra-field logging process, the extra fields detected are logged even if the pydantic validation process does not go through: this is a problem for objects that are tested against multiple BaseModel, as we log the detected extra fields, even though they are not relevant.

For example, when calling the /v1/embeddings endpoint, the request payload will be validated for a EmbeddingRequest object, which can be either a EmbeddingCompletionRequest or a EmbeddingChatRequest object.
This leads to warning logs being generated even for valid embedding requests, since input which is a valid field for a EmbeddingCompletionRequest is not a valid field for a EmbeddingChatRequest (which expects the messages field instead), which can mislead the user that something was wrong with their perfectly fine request.

# curl -X POST  -H "Content-Type: application/json; charset=utf-8"  --url "http://localhost:8003/v1/embeddings"  --data '{"model": "bge-multilingual-gemma2", "input": ["Hello world"]}'

WARNING 12-20 00:55:43 protocol.py:52] The following fields were present in the request but ignored: {'input'}

This PR changes the logic of logging those extra fields, by only logging the extra fields if the deserialization went through (object validated). This will only log the extra fields for validated objects:

Example with the following payload
{"model": "bge-multilingual-gemma2", "input": ["Hello world"], "another_field": "not_supported"}

BEFORE (detects extra fields both for the EmbeddingCompletionRequest and EmbeddingChatRequest objects):

WARNING 12-20 00:59:23 protocol.py:52] The following fields were present in the request but ignored: {'another_field'}
WARNING 12-20 00:59:23 protocol.py:52] The following fields were present in the request but ignored: {'another_field', 'input'}

AFTER (detects extra fields only for EmbeddingCompletionRequest which was validated):

WARNING 12-20 01:04:43 protocol.py:59] The following fields were present in the request but ignored: {'another_field'}

cc: @mgoin @DarkLight1337

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Dec 20, 2024
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
@gcalmettes gcalmettes force-pushed the fix/extra-fields-logging branch from bdfe0af to fc75172 Compare December 20, 2024 09:14
@DarkLight1337
Copy link
Member

DarkLight1337 commented Dec 20, 2024

This feels a bit hacky, is this a common pattern in Pydantic? I think that the assumption that the two methods are being called in sequence might not be valid in a multi-threaded setting.

@gcalmettes
Copy link
Contributor Author

This feels a bit hacky, is this a common pattern in Pydantic? I am not sure whether this implementation is thread safe or not.

Oh yes, it is definitely a bit hacky !

The usual behavior in Pydantic is that the extra fields are silently ignored, since they are not part of the schema.

The need to log the extra fields was just to provide a way to allow the user with a log stating which fields were ignored, in order to debug a request (for example if the user had a typo in a field). And as such we implemented the hack described in the pydantic repo (see here). However this hack does not take in account objects that are union of several models.

@gcalmettes
Copy link
Contributor Author

closing this since #12420 fixes it way more elegantly

@gcalmettes gcalmettes closed this Jan 29, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants