[Bug]: Inference with the same inputs changes to different output values after many model invocations #28824

Open
Dobiasd opened this issue Feb 4, 2025 · 0 comments
Labels
bug, support_request

Comments


Dobiasd commented Feb 4, 2025

OpenVINO Version

2024.6.0

Operating System

Other (Please specify in description)

Device used for inference

CPU

Framework

None

Model used

ResNet50

Issue description

When running the same forward pass with identical input many times, the output suddenly changes to different values (after roughly 15,000 invocations in the runs below).

Here is a minimal (dockerized) example to reproduce the problem:

main.py

#!/usr/bin/env python3

import datetime
import logging
import threading
import time

import keras
import numpy as np
import openvino as ov
import requests
from flask import Flask
from flask_restful import Resource, Api
from openvino import CompiledModel


def load_model() -> CompiledModel:
    # Build a Keras ResNet50, convert it to OpenVINO, and compile it for the CPU plugin.
    base = keras.applications.ResNet50(input_shape=(224, 224, 3), weights="imagenet", include_top=True)
    image_input = keras.layers.Input(shape=(224, 224, 3), name="input_layer")
    ov_model = ov.convert_model(keras.Model(inputs=image_input, outputs=base(image_input), name="image_model"))
    return ov.Core().compile_model(ov_model, "CPU")


# Fixed pseudo-random input, so every request runs the exact same forward pass.
np.random.seed(0)
image = np.random.rand(224, 224, 3)
model = load_model()
iteration = 0


class Controller(Resource):
    def get(self):
        # Run inference on the (always identical) image and log the first output value.
        preprocessed_images = [image.copy()]
        result = model(np.array(preprocessed_images))[0]
        global iteration
        iteration = iteration + 1
        print(f"{datetime.datetime.now()}: {iteration}, {result[0, 0]}", flush=True)


# Expose the inference through a minimal Flask API and call it in an endless loop.
logging.getLogger("werkzeug").setLevel(logging.WARNING)
app = Flask("foo")
Api(app).add_resource(Controller, "/foo/")

threading.Thread(target=lambda: app.run(port=8080), daemon=True, name="foo").start()
time.sleep(1)
while True:
    requests.get("http://127.0.0.1:8080/foo", timeout=30)

Command:

docker build --rm --progress=plain .

Example output 1 (run on Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz):

[...]
#8 15.25 2025-02-04 17:30:19.986595: 1, 0.00015759124653413892
#8 15.28 2025-02-04 17:30:20.018549: 2, 0.00015759124653413892
#8 15.31 2025-02-04 17:30:20.047465: 3, 0.00015759124653413892
[...]
#8 465.6 2025-02-04 17:37:50.374849: 14762, 0.00015759124653413892
#8 465.7 2025-02-04 17:37:50.406194: 14763, 0.00015759124653413892
#8 465.7 2025-02-04 17:37:50.436870: 14764, 0.00015759124653413892
#8 466.0 2025-02-04 17:37:50.735640: 14765, 0.00015886615437921137
#8 466.0 2025-02-04 17:37:50.770898: 14766, 0.00015886615437921137
#8 466.1 2025-02-04 17:37:50.800891: 14767, 0.00015886615437921137
[...]

Example output 2 (run on Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz):

#8 15.12 2025-02-04 18:54:19.707492: 1, 0.00015759124653413892
#8 15.15 2025-02-04 18:54:19.738060: 2, 0.00015759124653413892
#8 15.18 2025-02-04 18:54:19.766186: 3, 0.00015759124653413892
[...]
#8 458.6 2025-02-04 19:01:43.206777: 14762, 0.00015759124653413892
#8 458.6 2025-02-04 19:01:43.236077: 14763, 0.00015759124653413892
#8 458.7 2025-02-04 19:01:43.265731: 14764, 0.00015759124653413892
#8 459.0 2025-02-04 19:01:43.566637: 14765, 0.0
#8 459.0 2025-02-04 19:01:43.600268: 14766, 0.0
#8 459.0 2025-02-04 19:01:43.629758: 14767, 0.0
[...]

Example output 3 (run on Intel(R) Core(TM) i7-14700K):

[...]
#8 8.141 2025-02-04 17:43:08.254667: 1, 0.000157591188326478
#8 8.153 2025-02-04 17:43:08.266280: 2, 0.000157591188326478
#8 8.164 2025-02-04 17:43:08.277242: 3, 0.000157591188326478
[...]
#8 173.4 2025-02-04 17:45:53.514144: 14806, 0.000157591188326478
#8 173.4 2025-02-04 17:45:53.525131: 14807, 0.000157591188326478
#8 173.4 2025-02-04 17:45:53.536286: 14808, 0.000157591188326478
#8 173.5 2025-02-04 17:45:53.617902: 14809, 0.00015934312250465155
#8 173.5 2025-02-04 17:45:53.629846: 14810, 0.00016571232117712498
#8 173.5 2025-02-04 17:45:53.640727: 14811, 0.00016571232117712498
#8 173.5 2025-02-04 17:45:53.651473: 14812, 0.00016571232117712498
[...]

Example output 4 (run on Intel(R) Core(TM) i7-14700K):

[...]
#8 8.591 2025-02-04 18:16:08.593288: 1, 0.000157591188326478
#8 8.601 2025-02-04 18:16:08.604064: 2, 0.000157591188326478
#8 8.612 2025-02-04 18:16:08.614221: 3, 0.000157591188326478
[...]
#8 169.6 2025-02-04 18:18:49.633181: 14854, 0.000157591188326478
#8 169.6 2025-02-04 18:18:49.643533: 14855, 0.000157591188326478
#8 169.7 2025-02-04 18:18:49.654100: 14856, 0.000157591188326478
#8 169.7 2025-02-04 18:18:49.728072: 14857, 1.0
#8 169.7 2025-02-04 18:18:49.740769: 14858, 1.0
#8 169.7 2025-02-04 18:18:49.751848: 14859, 1.0

[...]

This problem haunted us in production (with a more complex application, a different model, and different data, of course), and it took quite some work to distill this minimal example. I hope it helps to find and fix the cause of the problem.
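
In case it helps with narrowing things down: I have not verified whether the Flask/requests layer is actually needed to trigger this. A stripped-down variant (an untested sketch, not what produced the logs above) that just calls the compiled model in a loop and prints whenever the output value changes would look roughly like this:

#!/usr/bin/env python3
# Untested sketch: same model setup as in main.py, but without Flask/threading,
# to check whether plain repeated invocations alone already show the drift.

import keras
import numpy as np
import openvino as ov


def load_model() -> ov.CompiledModel:
    base = keras.applications.ResNet50(input_shape=(224, 224, 3), weights="imagenet", include_top=True)
    image_input = keras.layers.Input(shape=(224, 224, 3), name="input_layer")
    ov_model = ov.convert_model(keras.Model(inputs=image_input, outputs=base(image_input), name="image_model"))
    return ov.Core().compile_model(ov_model, "CPU")


model = load_model()
np.random.seed(0)
batch = np.expand_dims(np.random.rand(224, 224, 3), axis=0)  # shape (1, 224, 224, 3), same input as in main.py

reference = model(batch)[0][0, 0]
for iteration in range(1, 100_001):
    value = model(batch)[0][0, 0]
    if value != reference:
        print(f"output changed at iteration {iteration}: {reference} -> {value}", flush=True)
        reference = value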

Step-by-step reproduction

No response

Relevant log output

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
Dobiasd added the bug and support_request labels on Feb 4, 2025