[Bugfix]: only log extra fields for successfully deserialized objects #11367

gcalmettes · 2024-12-20T09:06:36Z

This PR prevents unnecessary warning logging of detected extra fields present in the request for objects that are not validated by Pydantic.

In #10463 we allowed extra fields to be present in the request payload to comply with the OpenAI behavior. In order to still provide the users useful debug log if necessary, any extra field detected is warning-logged.

However, with the current implementation of the extra-field logging process, the extra fields detected are logged even if the pydantic validation process does not go through: this is a problem for objects that are tested against multiple BaseModel, as we log the detected extra fields, even though they are not relevant.

For example, when calling the /v1/embeddings endpoint, the request payload will be validated for a EmbeddingRequest object, which can be either a EmbeddingCompletionRequest or a EmbeddingChatRequest object.
This leads to warning logs being generated even for valid embedding requests, since input which is a valid field for a EmbeddingCompletionRequest is not a valid field for a EmbeddingChatRequest (which expects the messages field instead), which can mislead the user that something was wrong with their perfectly fine request.

# curl -X POST  -H "Content-Type: application/json; charset=utf-8"  --url "http://localhost:8003/v1/embeddings"  --data '{"model": "bge-multilingual-gemma2", "input": ["Hello world"]}'

WARNING 12-20 00:55:43 protocol.py:52] The following fields were present in the request but ignored: {'input'}

This PR changes the logic of logging those extra fields, by only logging the extra fields if the deserialization went through (object validated). This will only log the extra fields for validated objects:

Example with the following payload
{"model": "bge-multilingual-gemma2", "input": ["Hello world"], "another_field": "not_supported"}

BEFORE (detects extra fields both for the EmbeddingCompletionRequest and EmbeddingChatRequest objects):

WARNING 12-20 00:59:23 protocol.py:52] The following fields were present in the request but ignored: {'another_field'}
WARNING 12-20 00:59:23 protocol.py:52] The following fields were present in the request but ignored: {'another_field', 'input'}

AFTER (detects extra fields only for EmbeddingCompletionRequest which was validated):

WARNING 12-20 01:04:43 protocol.py:59] The following fields were present in the request but ignored: {'another_field'}

cc: @mgoin @DarkLight1337

github-actions · 2024-12-20T09:06:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

DarkLight1337 · 2024-12-20T11:39:43Z

This feels a bit hacky, is this a common pattern in Pydantic? I think that the assumption that the two methods are being called in sequence might not be valid in a multi-threaded setting.

gcalmettes · 2024-12-20T11:47:08Z

This feels a bit hacky, is this a common pattern in Pydantic? I am not sure whether this implementation is thread safe or not.

Oh yes, it is definitely a bit hacky !

The usual behavior in Pydantic is that the extra fields are silently ignored, since they are not part of the schema.

The need to log the extra fields was just to provide a way to allow the user with a log stating which fields were ignored, in order to debug a request (for example if the user had a typo in a field). And as such we implemented the hack described in the pydantic repo (see here). However this hack does not take in account objects that are union of several models.

gcalmettes · 2025-01-29T09:48:38Z

closing this since #12420 fixes it way more elegantly

mergify bot added the frontend label Dec 20, 2024

fix: only log extra fields for successfully deserialized objects

fc75172

Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>

gcalmettes force-pushed the fix/extra-fields-logging branch from bdfe0af to fc75172 Compare December 20, 2024 09:14

gcalmettes closed this Jan 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix]: only log extra fields for successfully deserialized objects #11367

[Bugfix]: only log extra fields for successfully deserialized objects #11367

gcalmettes commented Dec 20, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Dec 20, 2024

DarkLight1337 commented Dec 20, 2024 •

edited

Loading

gcalmettes commented Dec 20, 2024

gcalmettes commented Jan 29, 2025

[Bugfix]: only log extra fields for successfully deserialized objects #11367

[Bugfix]: only log extra fields for successfully deserialized objects #11367

Conversation

gcalmettes commented Dec 20, 2024 • edited by github-actions bot Loading

github-actions bot commented Dec 20, 2024

DarkLight1337 commented Dec 20, 2024 • edited Loading

gcalmettes commented Dec 20, 2024

gcalmettes commented Jan 29, 2025

gcalmettes commented Dec 20, 2024 •

edited by github-actions bot

Loading

DarkLight1337 commented Dec 20, 2024 •

edited

Loading