[FR] Support Dynamic PMU detection #1377
Comments
I think @mtrofin was looking at something similar...
OK, let me know if there is already something in progress. I think I might be able to get something into PR form by the weekend, if you like it?
I was looking to build internal switching, i.e. assuming the limit is N but the user wants P = kN+r counters, allow them to specify P counters and then, internally, execute the workload k+1 times. I believe this FR is orthogonal. One recommendation: please ensure the storage in PerfCounterValues is still inlined, to avoid the risk of additional cache misses.
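To illustrate the internal switching idea, here is a minimal sketch of how P requested counters could be split into batches of at most N, one batch per workload run. `SplitIntoBatches` is a hypothetical helper, not something in the library, and the counter names used with it are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the requested counter names into batches of at
// most `limit` names. Each batch would be measured in its own workload run.
std::vector<std::vector<std::string>> SplitIntoBatches(
    const std::vector<std::string>& requested, std::size_t limit) {
  assert(limit > 0 && "the per-run counter limit must be positive");
  std::vector<std::vector<std::string>> batches;
  for (std::size_t i = 0; i < requested.size(); i += limit) {
    const std::size_t end = std::min(i + limit, requested.size());
    batches.emplace_back(
        requested.begin() + static_cast<std::ptrdiff_t>(i),
        requested.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

With P = kN+r this yields k+1 batches when r > 0, and k batches when r = 0.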
@mtrofin thanks for pinging back. Is this something you have started working on, or are you just planning to at this point?
I don't have anything done.
- Instead of allowing for up to 3 counters, libpfm's internal capabilities of reporting PMU info are used to manage a per-PMU "registry" and dynamically allocate slots according to the specific counter requested
- In this PR/commit, it is still impossible to get more detailed (x86) information about each specific counter's properties in terms of being a fixed/non-fixed counter, due to what seems to be a lack of API surface on libpfm itself: https://sourceforge.net/p/perfmon2/mailman/message/37631173/
- The maximal number of counters was bumped from 3 to 63, which together with the current padding "scheme" means we pre-allocate 64 in-place counter slots (64 bits each) per measurement instance
- Closes google#1377
- Instead of allowing for up to 3 counters, libpfm's internal capabilities of reporting PMU info are used to manage a per-PMU "registry" and dynamically allocate "slots" according to the specific counters requested.
- Per-PMU information is obtained, where each PMU reports its own capabilities in the form of fixed/non-fixed counter limits.
- In this PR/commit, it is *still* impossible to get more detailed (x86-only) counter information in terms of fixed/non-fixed counter association, due to what seems to be a lack of API surface on libpfm itself: https://sourceforge.net/p/perfmon2/mailman/message/37631173/
- The maximal number of counters is bumped from 3 to 63, which together with the current padding "scheme" means we pre-allocate/inline up to 64 counter slots (64 bits each) per measurement instance
- Closes google#1377
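The slot arithmetic in the commit message (63 counters plus padding giving 64 inlined 64-bit slots) can be sketched roughly as below. `InlineCounterValues` is an illustrative stand-in, not the library's actual `PerfCounterValues`. The assumption that the padding is a single leading slot is mine, inferred only from the 63 + 1 = 64 arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for an inlined counter-value container.
class InlineCounterValues {
 public:
  static constexpr std::size_t kMaxCounters = 63;  // limit named in the commit message
  static constexpr std::size_t kPadding = 1;       // assumption: one leading padding slot

  explicit InlineCounterValues(std::size_t nr_counters)
      : nr_counters_(nr_counters) {
    assert(nr_counters <= kMaxCounters);
  }

  // Counter values live after the padding slot(s).
  std::uint64_t operator[](std::size_t pos) const {
    assert(pos < nr_counters_);
    return values_[kPadding + pos];
  }

  void Set(std::size_t pos, std::uint64_t value) {
    assert(pos < nr_counters_);
    values_[kPadding + pos] = value;
  }

  std::size_t size() const { return nr_counters_; }

 private:
  // std::array keeps the storage inside the object itself (no heap
  // allocation), which is the "inlined" property requested in the discussion.
  std::array<std::uint64_t, kPadding + kMaxCounters> values_{};
  std::size_t nr_counters_;
};

static_assert((InlineCounterValues::kPadding + InlineCounterValues::kMaxCounters) *
                      sizeof(std::uint64_t) == 512,
              "64 slots of 64 bits each occupy 512 bytes");
```

Using a fixed-size array trades some unused space per instance for avoiding an extra pointer indirection when the values are read.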
Currently, on modern HW where multiple PMU counters can be recorded in a single run (example: Ice Lake with 16 concurrent PMU counters), the code in `perf_counters.cc` hard-codes a limit of 3 counters globally.
I'd like to use libpfm's internal API to detect at runtime the PMU that each requested counter is associated with, and internally track how many counters are "consumed" from each PMU, given the information retrieved from calling `pfm_get_pmu_info()`, instead of the current hard-coded limit of 3 built into the code.
I am opening this issue in preparation for providing a PR that would implement such logic, and wanted to see if this is something that needs more discussion / blessing before submitting a PR.
I have already started some preliminary work on tracking the requested counters vs. the availability of each PMU.
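A rough sketch of the tracking described above, under stated assumptions: `PmuRegistry`, `PmuCapacity`, and the PMU names are hypothetical, and the sketch does not call libpfm at all. In a real implementation the per-PMU limits would presumably come from the structure filled in by `pfm_get_pmu_info()` (which, as far as I know, reports counts such as `num_cntrs` and `num_fixed_cntrs`); here they are passed in by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-PMU bookkeeping: how many counters the PMU offers and
// how many have been handed out so far.
struct PmuCapacity {
  std::size_t available;
  std::size_t used;
};

// Hypothetical registry that tracks counter consumption per PMU.
class PmuRegistry {
 public:
  // Records a PMU and its counter limit (supplied by the caller here; in
  // practice it would be queried from libpfm at runtime).
  void RegisterPmu(const std::string& pmu, std::size_t num_counters) {
    assert(!pmu.empty());
    pmus_[pmu] = PmuCapacity{num_counters, 0};
  }

  // Consumes one slot and returns true if the PMU is known and still has a
  // free counter; returns false otherwise.
  bool TryAllocate(const std::string& pmu) {
    auto it = pmus_.find(pmu);
    if (it == pmus_.end() || it->second.used >= it->second.available) {
      return false;
    }
    ++it->second.used;
    return true;
  }

 private:
  std::map<std::string, PmuCapacity> pmus_;
};
```

A request for a counter would then be rejected only when its own PMU is exhausted, rather than when a global limit of 3 is reached.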