Fix error showing time spent in llama perf context print #1898

Conversation

shakalaca
Contributor
This PR addresses the issue reported in #1830: after the 0.3.0 update, llama_perf_context_print() no longer correctly displays inference time, tokens per second, and other related data.

After some investigation, I found that this change, f8fcb3e, introduced the issue by picking up a commit from the upstream llama.cpp repo: ggml-org/llama.cpp@0abc6a2. That commit added the no_perf parameter; although it defaults to false, llama_context_default_params() sets it to true for external callers, so llama_synchronize() no longer accumulates performance metrics. As a result, llama-cpp-python displays incorrect information when llama_perf_context_print() is called.

In addition to adding the no_perf field in llama_cpp.py, we should also set no_perf to false in llama.py (see the sketch below). Since the llama-cpp-python project always calls llama_perf_context_print() during usage, I don't see a reason not to collect this information. Of course, if we want to stay consistent with llama.cpp's defaults, we could add an API that lets users set the no_perf value themselves, giving them a way to toggle performance statistics.
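For illustration, a minimal sketch of how the flag could be re-enabled through the low-level bindings once the field is exposed; the exact call sequence here is an assumption for demonstration, not the project's documented API:

```python
import llama_cpp

# Fetch the default context parameters; per the analysis above, upstream
# llama.cpp sets no_perf to True here, which suppresses the timing data
# that llama_perf_context_print() reports.
ctx_params = llama_cpp.llama_context_default_params()

# Re-enable performance collection so llama_synchronize() accumulates timings.
# Assumes the no_perf field added by this PR is present on llama_context_params.
ctx_params.no_perf = False
```

Setting the flag on the default params before the context is created mirrors what this PR proposes for llama.py.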

shakalaca and others added 3 commits January 18, 2025 10:37
Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.
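A rough sketch of what the ctypes declaration in llama_cpp.py might look like; the neighboring fields shown are placeholders, and the real declaration must mirror the field order of struct llama_context_params in llama.h exactly:

```python
import ctypes

class llama_context_params(ctypes.Structure):
    _fields_ = [
        # ... earlier fields elided; ctypes reads each field at a fixed offset,
        #     so the order must match the C struct exactly ...
        ("embeddings", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),
        ("flash_attn", ctypes.c_bool),
        ("no_perf", ctypes.c_bool),  # new: disables performance timing measurements when True
        # ... trailing callback fields elided ...
    ]
```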
abetlen merged commit 4442ff8 into abetlen:main on Jan 29, 2025
14 checks passed