Fix error showing time spent in llama perf context print #1898
This PR addresses the issue reported in #1830: after the 0.3.0 update, `llama_perf_context_print()` no longer correctly displays inference time, tokens per second, and other related data.
After some investigation, I found that this change, f8fcb3e, caused the problem because of a commit in the upstream llama.cpp repo: ggml-org/llama.cpp@0abc6a2. When the `no_perf` parameter was introduced there, it defaults to false in the struct definition, but `llama_context_default_params()` sets it to true for external callers. As a result, the performance metrics are calculated incorrectly when `llama_synchronize()` is called, and llama-cpp-python then displays wrong numbers from `llama_perf_context_print()`.
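As a quick illustration (a minimal sketch, assuming the `no_perf` field has already been added to the `llama_context_params` struct in `llama_cpp.py`; the attribute does not exist without that change):

```python
import llama_cpp

# llama_context_default_params() mirrors the upstream C function: it returns
# the defaults intended for external callers, where no_perf is enabled and
# the context therefore never accumulates timing data.
params = llama_cpp.llama_context_default_params()
print(params.no_perf)  # expected: True with the upstream defaults

# Re-enable performance collection before the context is created, so that
# llama_perf_context_print() has real timings to report.
params.no_perf = False
```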
In addition to adding the `no_perf` field in `llama_cpp.py`, we should also set `no_perf` to false in `llama.py`. Since llama-cpp-python always calls `llama_perf_context_print()` during usage, I don't see a reason not to collect this information. Of course, if we want to stay consistent with llama.cpp's defaults, we could add an API that lets users set the `no_perf` value, providing a way to toggle performance statistics.
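A minimal sketch of what that could look like in `llama.py`; the `no_perf` keyword mentioned at the end is hypothetical and only illustrates the optional toggle, it is not part of the current API:

```python
import llama_cpp

# Inside Llama.__init__ (llama.py), right after the context params are built:
context_params = llama_cpp.llama_context_default_params()

# Override the upstream default so timing data is collected and
# llama_perf_context_print() reports meaningful numbers.
context_params.no_perf = False

# Optional, hypothetical toggle: accept a `no_perf` argument on Llama.__init__
# and forward it here instead of hard-coding False, if we prefer to let users
# match llama.cpp's default behaviour.
```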