
Adding PsetHash from Edmprovdump fails with CMSSW_15_1 #8953

Open
aspiringmind-code opened this issue Feb 27, 2025 · 13 comments

@aspiringmind-code
Contributor

See here
This line assumes that the hash value is the last token on the line dumped by edmProvDump, so the tests fail. A more robust way to extract the hash value should be worked out.
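As a minimal sketch of a more robust extraction: instead of assuming the hash is the last token on the line, one could match a 32-character hex digest anywhere in the line. The sample line format below is made up for illustration; the real edmProvDump output may differ.

```python
import re

# A PSet hash is a 32-character hexadecimal digest; match it anywhere on the
# line rather than assuming it is the last whitespace-separated token.
HASH_RE = re.compile(r"\b[0-9a-f]{32}\b")

def extract_pset_hash(line):
    """Return the first 32-hex-char token found in `line`, or None."""
    m = HASH_RE.search(line)
    return m.group(0) if m else None
```

This survives trailing annotations after the hash, which a "last token" assumption does not.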

@belforte
Member

see also discussion in cms-sw/cmssw#47355 (comment)

aspiringmind-code added a commit to aspiringmind-code/CRABServer that referenced this issue Feb 27, 2025
@belforte belforte changed the title Adding PsetHash from Edmprovdump fails Adding PsetHash from Edmprovdump fails with CMSSW_15_1 Feb 27, 2025
@makortel

I want to have a better and more stable way to communicate what CRAB needs. To me the framework job report XML sounds like a good way to convey the necessary information (rather than parsing the output of an additional script). Would you agree?

I'd also want to understand better if it is really the PSet hash that is needed here, or e.g. reduced process history ID (which is what the framework internally uses for the segregation of different processing histories).

In particular I'd advise against making any quick patches, because I'll change the edmProvDump output (especially this part) more drastically, and then we'd be here again. I'm already restoring the edmProvDump output (temporarily) in cms-sw/cmssw#47473 so that we'd have more time to make a proper fix.

@belforte
Member

Thanks Matti,

I agree that the FJR is preferable, even if we may have to make the change in WMCore code: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/FwkJobReport/Report.py

As to "what is needed", I'd say that the use case is simple, even if possibly not technically defined: limit users' possibilities to shoot themselves in the foot and come complaining to us about obscure problems later. That said, we need guidance from FWK Core developers on what to check.

@aspiringmind-code
Contributor Author

@makortel and @belforte Thank you for your insight. Yes, we'd like not to rely on parsing like this. I'll refrain from making the regex change as a quick fix, and wait for the final edmProvDump output format before making the necessary changes on our end.

@makortel

makortel commented Feb 27, 2025

So if I understood the present situation correctly, CRAB uses the PSet hash as part of the dataset name published in DBS as described in https://twiki.cern.ch/twiki/bin/view/CMSPublic/Crab3DataHandling#Output_dataset_names_in_DBS, but not for anything else.

I saw references to "PSet hash" in the WMCore repository as well, but it was unclear to me if WMCore uses the PSet hash only to fill the field in DBS or also for something else, and from where WMCore gets the values.

Listing the various options of "IDs"

  1. The PSet hash is an ID for the configuration itself (including the process name).
  2. The reduced ProcessConfiguration ID adds the major and minor numbers from CMSSW version into the hashing (i.e. cuts the version string to CMSSW_15_0; e.g. CMSSW_15_0_1 and CMSSW_15_0_10 are considered the same, and CMSSW_15_0_1 and CMSSW_15_1_1 are considered different).
  3. The reduced ProcessHistory ID effectively adds the ProcessConfiguration IDs of all the earlier processes in the chain. A single file may contain events with different reduced ProcessHistory IDs (and I think we have seen those).

The CMSSW framework uses 3. to segregate data. But given that a single file can contain events with different ProcessHistory IDs (e.g. merged MiniAOD files that contain data taken with different HLT configurations), that doesn't seem like a good fit here.

Between 1 and 2 the question is then if CRAB should use the CMSSW major/minor version to segregate datasets.

For complex PSets there is a good chance that some component added a new (tracked) parameter or changed the value of some existing parameter between CMSSW major or minor versions (e.g. between CMSSW_14_2 and CMSSW_15_0). But for simple PSets the PSet hash alone could have the same value between CMSSW versions, which could then result in a user publishing files produced with two CMSSW versions into the same DBS dataset. But I have no feeling how (im)probable such a case would be.
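For illustration, the version reduction described in option 2. can be sketched as follows. This is a hypothetical helper for clarity only; the framework computes the reduced ProcessConfiguration ID internally and does not expose such a function.

```python
import re

def reduce_cmssw_version(version):
    """Cut a CMSSW version string to major/minor,
    e.g. CMSSW_15_0_1 -> CMSSW_15_0 (illustrative only)."""
    m = re.match(r"(CMSSW_\d+_\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version}")
    return m.group(1)
```

Under this reduction, CMSSW_15_0_1 and CMSSW_15_0_10 are considered the same, while CMSSW_15_0_1 and CMSSW_15_1_1 differ, as described above.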

@belforte
Member

Thanks @makortel
I stand corrected. The PSet hash is part of the dictionary needed to publish one block in DBS [1]. CRAB uses the one discussed here, and I do not know if/how that information is used later on.
So WMCore must be filling this field too, or DBS would complain. I guess @amaltaro or @khurtado are the ones who know where they get it and can tell if/how WMCore will be impacted by the change. Maybe this value in DBS is always a dummy [2]? I can't list it with DAS (goclient) to check what's there.

Do I understand correctly that the current edmProvDump prints out ID number 1. above?
If yes, I would stay with that and not take chances by getting more strict. We have no evidence of problems.

OTOH, if you are going to put this in the FJR XML, which is already a very large file, maybe you can write all three possibilities and free yourself :-)

[1]
https://github.com/dmwm/dbs2go/blob/master/docs/apis.md
https://github.com/dmwm/dbs2go/blob/master/dbs/bulkblocks.go

[2]
have we reached the point where the current DBS developers are remote enough from the original developers that we no longer know what the various metadata fields mean?

@makortel

Do I understand correctly that the current edmProvDump prints out ID number 1. above?

Correct.

If yes, I would stay with that and not take chances by getting more strict. We have no evidence of problems.

Ok.

OTOH, if you are going to put this in the FJR XML, which is already a very large file, maybe you can write all three possibilities and free yourself :-)

That would indeed be a possibility (except maybe for 3., for which there may be multiple values).

@belforte
Member

Indeed it sounds like 3. is only useful for the framework.

One last point. I presume that we will keep using edmProvDump for up to CMSSW_14 (and maybe 15_0?) and switch to the new way from 15_1 onward, right?

@makortel

I presume that we will keep using edmProvDump for up to CMSSW_14 (and maybe 15_0?) and switch to the new way from 15_1 onward, right?

You can continue using edmProvDump for sure up to CMSSW_14_X_Y. I don't foresee any need to touch edmProvDump there. And the new way with FWJR will be part of 15_1_X.

Fate of 15_0_X is a good question. Until this issue I was planning to backport the (present and upcoming) edmProvDump changes there (although that is more about it being nice to print the new provenance information we're adding in 15_0_X with a 15_0_X release than strictly necessary). I can backport the upcoming change in FWJR to 15_0_X, and I can delay the (possible) backport of the edmProvDump changes until we have a new version of CRAB in production. But that would mean that CRAB would have to use edmProvDump until some X in 15_0_X, and FWJR from X+1 onwards. How feasible would that be?

@belforte
Member

But that would mean that CRAB would have to use edmProvDump until some X in 15_0_X, and FWJR from X+1 onwards. How feasible would that be?

It still seems a simple if to me. No problem. I am sure @aspiringmind-code will have no problem implementing it.
To ease the transition, can we put in a small hack to use e.g. edmProvDump from CMSSW_14 in such cases, while a proper change to WMCore and CRAB production releases goes through the pipeline? Will it work on files produced by 15_x?

So we can decouple our and your timelines.

@makortel

I'd rather not use edmProvDump from an earlier release. It might work, but is not guaranteed (in this particular case I'd think it could work, but there are subtleties that I'd prefer not to think about).

One option for a transition would be that CRAB would first try to use the PSet hash from the framework job report, and if it isn't there, then use edmProvDump. That way explicit release checks could be avoided, and even backports to earlier CMSSW release cycles would work automatically (as long as the fwjr change is made before or at the same release as the edmProvDump change). There would still be some timeline coupling (framework needs to decide the XML schema change, then CRAB can do their change, and only after the deployment of new CRAB can framework deploy more changes to edmProvDump). I'd be fine with that though.

@belforte
Member

belforte commented Mar 4, 2025

Sorry for the late reply @makortel
Sure. I am fine with going in the order you describe:

  1. define how CRAB will get the PSet hash from the FJR
  2. implement this in CRAB+WMCore, with fallback to edmProvDump
  3. deploy in production
  4. update edmProvDump to new format

@aspiringmind-code @sinonkt I think it is better if you decide who among you is going to do 2. and use the time until 1. is ready to understand where changes will have to be done.

@makortel

makortel commented Mar 6, 2025

We are planning to proceed with the following addition to the FJR XML file (under the <FrameworkJobReport> element)

<Process>
  <Name>PROD1</Name>
  <ReducedConfigurationID>783c5ab60bfe18ac97262801adef9de4</ReducedConfigurationID>
  <ParameterSetID>1f94f8e1ac0ae9d7dade1c99cc5862b3</ParameterSetID>
</Process>

The ParameterSetID is the element whose content is the "PSet hash". I'm working to open a CMSSW PR soon (I'll let you know here).
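A minimal sketch of reading these values with Python's standard-library ElementTree; the sample XML reproduces the snippet above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample framework job report fragment, as proposed in this thread.
SAMPLE = """<FrameworkJobReport>
  <Process>
    <Name>PROD1</Name>
    <ReducedConfigurationID>783c5ab60bfe18ac97262801adef9de4</ReducedConfigurationID>
    <ParameterSetID>1f94f8e1ac0ae9d7dade1c99cc5862b3</ParameterSetID>
  </Process>
</FrameworkJobReport>"""

root = ET.fromstring(SAMPLE)
process = root.find("Process")
process_name = process.findtext("Name")
pset_hash = process.findtext("ParameterSetID")          # the "PSet hash"
reduced_config_id = process.findtext("ReducedConfigurationID")
```

Since the element sits directly under `<FrameworkJobReport>`, a single `find("Process")` suffices from the document root.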
