[FR] Support Dynamic PMU detection #1377

Open
damageboy opened this issue Mar 24, 2022 · 5 comments · May be fixed by #1380

@damageboy

Currently, on modern HW where multiple PMU counters can be recorded in a single run (for example, Ice Lake with 16 concurrent PMU counters), perf_counters.cc hard-codes a global limit of 3 counters.

I'd like to use libpfm's internal API to detect at runtime the PMU that each requested counter is associated with,
and internally track how many counters are "consumed" from each PMU given the information retrieved from calling
pfm_get_pmu_info() instead of the current hard-coded limit of 3 built into the code.

I am opening this issue in preparation for a PR that would implement this logic, and wanted to see whether it
needs more discussion / blessing before I submit one.
I have already started some preliminary work on tracking the requested counters vs. the availability of each PMU.
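The per-PMU tracking described above could look roughly like the sketch below. It is only an illustration: `PmuLimits`, `PmuBudget`, and `TryReserve` are hypothetical names, and the limits are passed in by hand here, whereas a real implementation would presumably fill them from the `num_cntrs` and `num_fixed_cntrs` fields that `pfm_get_pmu_info()` reports. It also assumes the caller already knows whether a requested counter is fixed or generic.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-PMU limits. With libpfm these would come from the
// num_cntrs / num_fixed_cntrs fields filled in by pfm_get_pmu_info().
struct PmuLimits {
  int num_generic;
  int num_fixed;
};

// Hypothetical registry that tracks how many counters each PMU has handed out.
class PmuBudget {
 public:
  // Registers a PMU with its limits and resets its usage to zero.
  void AddPmu(const std::string& pmu, PmuLimits limits) {
    assert(limits.num_generic >= 0 && limits.num_fixed >= 0);
    pmus_[pmu] = Usage{limits, 0, 0};
  }

  // Tries to reserve one counter on `pmu`. Returns false if the PMU is
  // unknown or its budget for that counter kind is already exhausted.
  bool TryReserve(const std::string& pmu, bool fixed) {
    auto it = pmus_.find(pmu);
    if (it == pmus_.end()) return false;
    Usage& u = it->second;
    int& used = fixed ? u.used_fixed : u.used_generic;
    const int limit = fixed ? u.limits.num_fixed : u.limits.num_generic;
    if (used >= limit) return false;
    ++used;
    return true;
  }

 private:
  struct Usage {
    PmuLimits limits;
    int used_generic;
    int used_fixed;
  };
  std::map<std::string, Usage> pmus_;
};
```

With a budget like this, a request for a counter is rejected only when the specific PMU it belongs to is full, rather than when a global count of 3 is reached.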

@dmah42 dmah42 changed the title [FR] Support Dyanamic PMU detection [FR] Support Dynamic PMU detection Mar 24, 2022
@dmah42
Member

dmah42 commented Mar 24, 2022

i think @mtrofin was looking at something similar...

@damageboy
Author

OK, let me know if there is already something in progress. I think I might be able to get something into PR form by the weekend, if you guys like it?

@mtrofin
Contributor

mtrofin commented Mar 24, 2022

I was looking to build internal switching, i.e. assuming the limit is N, but the user wants P = kN+r counters, allow them to specify P counters and then, internally, execute the workload k+1 times.

I believe this FR is orthogonal. One recommendation: please ensure the storage in PerfCounterValues is still inlined, to avoid risk of additional cache misses.
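The internal switching idea above can be sketched as a simple batching step: split the P requested counters into groups of at most N and run the workload once per group. `SplitIntoBatches` is a hypothetical helper written for illustration, not code from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `counters` into consecutive batches of at most `limit` names each.
// Each batch would then be measured in its own run of the workload.
std::vector<std::vector<std::string>> SplitIntoBatches(
    const std::vector<std::string>& counters, std::size_t limit) {
  assert(limit > 0);
  std::vector<std::vector<std::string>> batches;
  for (std::size_t i = 0; i < counters.size(); i += limit) {
    const std::size_t end = std::min(i + limit, counters.size());
    batches.emplace_back(
        counters.begin() + static_cast<std::ptrdiff_t>(i),
        counters.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

For P = kN + r this produces k + 1 batches when r > 0, and only k batches when r = 0, so the number of runs is the ceiling of P / N.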

@damageboy
Author

@mtrofin thanks for pinging back.
Yeah, your feature suggestion to re-execute the workload until you get all the requested counters is both orthogonal and yet connected (in the sense that both would change the existing code base).

Is this something you started work on or just planning to at this point in time?
I already have some basic code that tracks the counters and aggregates them into per-PMU and fixed/non-fixed counter "counts" for the "budgeting" aspects of this. I wouldn't mind rewriting it if you have a more mature branch.

@mtrofin
Contributor

mtrofin commented Mar 29, 2022

I don't have anything done.

damageboy added a commit to damageboy/benchmark that referenced this issue Apr 2, 2022
- Instead of allowing for up to 3 counters, libpfm's internal
  capabilities of reporting PMU info are used to manage a per-PMU
  "registry" and dynamically allocate slots according to the specific
  counter requested
- In this PR/commit, it is still impossible to get more detailed (x86)
  information about each specific counter's properties in terms of being
  a fixed/non-fixed counter, due to what seems to be a lack of API
  surface on libpfm itself:
  https://sourceforge.net/p/perfmon2/mailman/message/37631173/
- The maximal number of counters was bumped from 3 to 63, which together
  with the current padding "scheme" means we pre-allocate
  64 in-place counter slots (64-bits each) per measurement instance
- Closes google#1377
damageboy added a commit to damageboy/benchmark that referenced this issue Apr 2, 2022
- Instead of allowing for up to 3 counters, libpfm's internal
  capabilities of reporting PMU info are used to manage a per-PMU
  "registry" and dynamically allocate "slots" according to the specific
  counters requested.
- per-PMU information is obtained, where each PMU reports its own
  capabilities in the form of fixed/non-fixed counter limits.
- In this PR/commit, it is *still* impossible to get more detailed
  (x86-only) counter information in terms of fixed/non-fixed counter
  association, due to what seems to be a lack of API surface on libpfm
  itself: https://sourceforge.net/p/perfmon2/mailman/message/37631173/
- The maximal number of counters is bumped from 3 to 63, which together
  with the current padding "scheme" means we pre-allocate/inline up to
  64 counter slots (64-bits each) per measurement instance
- Closes google#1377
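The sizing in the last bullet, and the earlier request to keep the storage inlined, can be illustrated with a fixed-capacity buffer. This is a sketch under stated assumptions: `InlineCounterValues` is a hypothetical class, the constants simply mirror the numbers in the commit message, and the padding is assumed to be a single leading 64-bit slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constants taken from the numbers in the commit message.
constexpr std::size_t kMaxCounters = 63;
// Assumption: the padding is one extra leading 64-bit slot.
constexpr std::size_t kPadding = 1;

// Fixed-capacity value buffer: the storage lives inside the object itself,
// so reading or writing counter values never touches the heap.
class InlineCounterValues {
 public:
  explicit InlineCounterValues(std::size_t nr_counters)
      : nr_counters_(nr_counters) {
    assert(nr_counters <= kMaxCounters);
    values_.fill(0);
  }

  std::size_t size() const { return nr_counters_; }

  // Access to the i-th counter value, skipping the padding slot.
  std::uint64_t& operator[](std::size_t i) {
    assert(i < nr_counters_);
    return values_[kPadding + i];
  }

 private:
  std::array<std::uint64_t, kPadding + kMaxCounters> values_;
  std::size_t nr_counters_;
};

static_assert(kPadding + kMaxCounters == 64,
              "63 counters plus one padding slot give 64 inline slots");
```

The trade-off is that every instance reserves all 64 slots of 8 bytes each (512 bytes of value storage) even when only a few counters are requested, in exchange for avoiding a separate heap allocation.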
@damageboy damageboy linked a pull request Apr 2, 2022 that will close this issue
damageboy added a commit to damageboy/benchmark that referenced this issue Apr 4, 2022
damageboy added a commit to damageboy/benchmark that referenced this issue Apr 5, 2022
damageboy added a commit to damageboy/benchmark that referenced this issue Apr 6, 2022