CMS generator cards for 2015 files #97

Closed
katilp opened this issue May 19, 2021 · 3 comments

katilp commented May 19, 2021

(edited 11.10.2021)

The generator "gridpacks" are stored in /cvmfs/cms.cern.ch/phys_generator/gridpacks/
Note, however, that not all the 2015 LHE cards are stored there yet.

Take an example dataset from https://github.com/cernopendata/data-curation/blob/master/cms-YYYY-simulated-datasets/inputs/CMS-2015-mc-datasets.txt

Find the generator cards "by-hand" with:

Case no LHE:

Three options:

Through "fragments" stored in McM

Advantage: gets directly the relevant information

  1. search the dataset in McM (request -> output dataset) (example query MiniAODSIM)
  2. find the parent name and redo 1.
  3. if GEN-SIM, find the generator parameters in "Name of fragment"

Because the metadata script reads the full dictionary, we should already have this information.

Config files

Advantage: already available as config for GEN-SIM step

Disadvantage: shows the full config file, not only the cards

The steps 1,2 as above, then

From edmProvDump

Advantage: get the information directly from the file

Disadvantage: must be run in a CMSSW release area, and the output formatting is not ideal for display

  1. Find a single root file name $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
    edmProvDump -f "generator SIM" root://eospublic.cern.ch/$file | grep -A9999 "generator SIM"

Case LHE:

Case no gridpack

  • MINIAODSIM has mcdb_id > 1 in McM
  • in use before gridpack was adopted
  1. Example McM query for a MINIAODSIM
  2. Take "Mcdb id", in the dictionary: "mcdb_id": 15839
  3. Find the lhe file in /eos/cms/store/lhe/$mcdb_id
  4. Extract file and read the header with
    xz -d -c /eos/cms/store/lhe/$mcdb_id/* > lhe.lhe
    awk '/<header>/,/<\/header>/' lhe.lhe > lhe_header

Case gridpack

  • mcdb_id = 0
  • Find the gridpack address, two options
From McM dictionary
From edmProvDump
  1. Find a single root file name $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
edmProvDump -f "externalLHEProducer LHE" root://eospublic.cern.ch/$file | grep gridpacks > line
gp=$(sed "s/'/ /g" line | awk '{print $6}')
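The sed/awk pair above depends on the gridpack path being the sixth field of the matched line. A minimal sketch of a position-independent alternative follows, assuming only that the path starts with the /cvmfs prefix quoted in this issue and ends at the first quote or whitespace character. The function name gridpack_path is hypothetical.

```shell
# Print the first gridpack path found on stdin.
# Assumption: the path begins with the /cvmfs prefix mentioned in this issue
# and stops at the first quote or whitespace character.
gridpack_path() {
    grep -o "/cvmfs/cms\.cern\.ch/phys_generator/gridpacks/[^'\"[:space:]]*" | head -n 1
}

# Intended use in a CMSSW release area after cmsenv (not executed here):
#   gp=$(edmProvDump -f "externalLHEProducer LHE" "root://eospublic.cern.ch/$file" | gridpack_path)
```

This avoids the temporary file `line` and keeps working if the number of fields before the path changes.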
Extract the cards once the $gp address is known
# $gp: gridpack tarball, $dir: existing output directory for the cards
# Patterns with * need --wildcards (GNU tar) and quotes so the shell does not expand them
if [[ $file == *"madgraph"* ]]; then
      tar -xf "$gp" ./process/madevent/Cards/run_card.dat
      tar -xf "$gp" --wildcards './process/madevent/Cards/proc_card*.dat'
      tar -xf "$gp" ./process/madevent/Cards/param_card.dat
      mv ./process/madevent/Cards/*.dat "$dir"/
elif [[ $file == *"powheg"* ]]; then
      tar -xf "$gp" --wildcards '*.input'
      mv *.input "$dir"/
elif [[ $file == *"amcatnlo"* ]]; then
      tar -xf "$gp" process/Cards/run_card.dat
      tar -xf "$gp" --wildcards 'process/Cards/proc_card*.dat'
      tar -xf "$gp" process/Cards/param_card.dat
      mv process/Cards/*.dat "$dir"/
fi

katilp commented Oct 25, 2021

@OsamaMomani as we discussed, this could be done separately from the other script. In that case, the pseudocode is

input: list of datasets

  • find the datasets which have the LHE step (the ones without it should already have generator parameters)
  • if mcdb_id > 0:
    • get the generator parameters from the "LHE header" and name it according to the recid with
      xz -d -c /eos/cms/store/lhe/$mcdb_id/* > lhe.lhe
      awk '/<header>/,/<\/header>/' lhe.lhe > ${recid}_lhe_header

    or something similar
    • write the header to a generator parameter store
  • else:
    • get the gridpack address (it is a string starting with /cvmfs/cms.cern.ch/phys_generator/gridpacks/) from the fragment or from the dictionary
    • write the whole gridpack in a gridpack store (add the recid to the name, or have them as folders)
    • get the generator parameters from the gridpack tar file with the code snippet at the end of the previous comment ($file can be the dataset name)
    • save the generator parameter files to files starting with recid and record them in the generator parameter store
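The branching in the pseudocode above could be sketched roughly as follows. This is only an illustration under assumptions: the function name process_record, the OUT directory and its mcdb/ and gridpacks/ subfolders are hypothetical, and LHE_STORE defaults to the /eos path quoted in this thread. xz is called with -f so that files which are not xz-compressed are passed through unchanged.

```shell
LHE_STORE="${LHE_STORE:-/eos/cms/store/lhe}"   # LHE location quoted in this thread
OUT="${OUT:-lhe_generators}"                   # hypothetical output directory

# process_record RECID MCDB_ID [GRIDPACK]
process_record() {
    recid=$1; mcdb_id=$2; gp=${3:-}
    if [ "$mcdb_id" -gt 0 ]; then
        # LHE header case: keep only the <header> ... </header> block
        mkdir -p "$OUT/mcdb"
        xz -d -c -f "$LHE_STORE/$mcdb_id"/* \
            | awk '/<header>/,/<\/header>/' > "$OUT/mcdb/${recid}_lhe_header"
    else
        # gridpack case: store the whole tarball in a per-recid folder;
        # extracting the cards from the tarball would follow here
        mkdir -p "$OUT/gridpacks/$recid"
        cp "$gp" "$OUT/gridpacks/$recid/"
    fi
}
```

A call such as `process_record "$recid" "$mcdb_id" "$gp"` per dataset would then fill both stores. Writing `${recid}_lhe_header` with braces matters, because `$recid_lhe_header` would be read as one (empty) variable name.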

@katilp


katilp commented Nov 21, 2021

Exceptions/additions:

mcdb_id, but no <header>

eg: /GluGluToContinToZZTo2e2nu_13TeV_MCFM701_pythia8/RunIIWinter15pLHE-MCRUN2_71_V1-v1/LHE -> mcdb_id = 14301

$ ls /eos/cms/store/lhe/15401/
gg_ZZ_2El2Nu_13TeV_NNPDF30_lo_as_0130_MCFM70.lhe
$ head /eos/cms/store/lhe/15401/gg_ZZ_2El2Nu_13TeV_NNPDF30_lo_as_0130_MCFM70.lhe
<LesHouchesEvents version="1.0">
<!--
file generated with MCFM version 7.0
Input file input.DAT contained:
#  Cross-section is:             14.9316     +/-            0.136621E-01)

 #  Contribution from parton sub-processes:
#         GG     |        0.0000        0.00%
#         GQ     |        0.0000        0.00%
#         GQB    |        0.0000        0.00%
  • take what is in the comment, i.e. between <!-- .... -->
  • keyword for identification "generators": ["MCFM701"]
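For files like this one, a possible fallback is to take the <header> block when it exists and the first <!-- ... --> comment otherwise. The sketch below assumes an uncompressed LHE file; the function names lhe_leading_comment and lhe_metadata are hypothetical.

```shell
# Print the first <!-- ... --> block of an LHE file, markers included.
lhe_leading_comment() {
    awk '/<!--/ {inside = 1} inside {print} inside && /-->/ {exit}' "$1"
}

# Print the <header> block if the file has one, else the leading comment.
# Note: grep reads the whole file when there is no <header>.
lhe_metadata() {
    if grep -q '<header>' "$1"; then
        awk '/<header>/,/<\/header>/' "$1"
    else
        lhe_leading_comment "$1"
    fi
}
```

Usage would be along the lines of `lhe_metadata file.lhe > ${recid}_lhe_header`.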

swp, swo files

/GluGluWWTo2L2Nu_MCFM_13TeV/RunIIWinter15pLHE-MCRUN2_71_V1-v1/LHE

$ ls /eos/cms/store/lhe/15275
ggWWbx_lord_NNPDF30_proc127_ll_500kevents.lhe  ggWWbx_l.swo  ggWWbx_l.swp

to be investigated

no files in ./process/madevent/Cards for madgraph

/BulkGravTohhTohtatahbb_narrow_M-1000_13TeV-madgraph/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.2.2/exo_diboson/Spin-2/BulkGraviton_hh_htatahbb/narrow/v3/BulkGraviton_hh_htatahbb_narrow_M1000_tarball.tar.xz

./process/madevent/Cards/run_card.dat: Not found in archive

  • Remove ./ and use tar -xf $gp process/madevent/Cards/run_card.dat etc

no powheg.input for powheg

/GluGluHToBB_M125_13TeV_powheg_pythia8/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/powheg/V2/gg_H_quark-mass-effects_NNPDF30_13TeV_M125/v2/gg_H_quark-mass-effects_NNPDF30_13TeV_M125_tarball.tar.gz

$ tar -tf $gp | grep powheg.input
./powheg.input
  • Take ./powheg.input instead of powheg.input
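Both path exceptions above (members stored with or without a leading ./) can be handled by listing the archive first and extracting whatever name is actually stored. A minimal sketch, with the hypothetical helper name extract_member; it reads the archive twice, which may be slow for large gridpacks.

```shell
# extract_member ARCHIVE PATH DESTDIR
# Find the first member whose name equals PATH after removing an optional
# leading "./", then write its content to DESTDIR/<basename>.
extract_member() {
    archive=$1; want=$2; dest=$3
    member=$(tar -tf "$archive" | awk -v want="$want" '
        { name = $0; sub(/^\.\//, "", name) }
        name == want { print $0; exit }')
    [ -n "$member" ] || return 1          # not in the archive
    mkdir -p "$dest"
    tar -xf "$archive" -O "$member" > "$dest/$(basename "$member")"
}
```

For example, `extract_member "$gp" powheg.input "$dir"` and `extract_member "$gp" process/madevent/Cards/run_card.dat "$dir"` would work for either naming.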

corrupt compression? / timeout?

/RSGravToWW_width0p2_M-3000_13TeV-madgraph/RunIIWinter15wmLHE-MCRUN2_71_V1-v1/LHE

/cvmfs/cms.cern.ch/phys_generator/gridpacks/slc6_amd64_gcc481/13TeV/madgraph/V5_2.3.3/exo_diboson/Spin-2/RSGraviton_WW_inclu/wide/v1/RSGraviton_WW_inclu_width0.2_M3000_tarball.tar.xz

The files are there and get extracted, but tar fails on exit. If I kill the process before it fails:

$ tar -xf $gp ./process/madevent/Cards/run_card.dat
^C
$ ls process/madevent/Cards/run_card.dat
process/madevent/Cards/run_card.dat
$ head process/madevent/Cards/run_card.dat
#*********************************************************************
#                       MadGraph5_aMC@NLO                            *
#                                                                    *
#                     run_card.dat MadEvent                          *
#                                                                    *
#  This file is used to set the parameters of the run.               *
#                                                                    *
#  Some notation/conventions:                                        *
#                                                                    *
#   Lines starting with a '# ' are info or comments                  *

Multiple lhe files

e.g.

$ ls /eos/cms/store/lhe/15453/
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_0.lhe   Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_2.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_6.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_10.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_3.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_7.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_11.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_4.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_8.lhe
Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_1.lhe   Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_5.lhe  Higgs0L1ToWWTo2L2Nu_M-125_13TeV-powheg2-JHUgenV6_9.lhe
  • take ..._0.lhe
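Picking the file could be sketched as below: prefer a name ending in _0.lhe, otherwise fall back to the first .lhe file in the directory. The function name first_lhe_file is hypothetical, and the .xz variants are an assumption about how compressed files might be named. Editor leftovers such as .swp and .swo files are skipped because they do not match the patterns.

```shell
# first_lhe_file DIR
# Print the file ending in _0.lhe (or _0.lhe.xz) if present, otherwise the
# first *.lhe (or *.lhe.xz) file in glob order. Return 1 if there is none.
first_lhe_file() {
    for f in "$1"/*_0.lhe "$1"/*_0.lhe.xz "$1"/*.lhe "$1"/*.lhe.xz; do
        # an unmatched glob stays literal and fails the -f test
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

The header extraction would then run on `$(first_lhe_file /eos/cms/store/lhe/$mcdb_id)` instead of on the `*` glob, which would otherwise concatenate all the files.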

@OsamaMomani

folder structure

lhe_generators
├── mcdb
│   ├── {mcdb_id}_header (files)
│   └── .....
└── gridpacks
    ├── {recid} (folders)
    │   └── (one or more .dat|.input files)
    └── .....

~3478 mcdb files (tens of MB)
~3636 gridpacks folders (few hundreds of MB)

in opendata portal

  • check if record has lhe then check mcdb or gridpacks
    • if mcdb provide a link to /eos/..../lhe_generators/mcdb/{mcdb_id}_header
    • if gridpacks provide links to files inside /eos/..../lhe_generators/gridpacks/{recid}
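The lookup for the portal could be sketched like this, assuming the folder structure above. The function name generator_links is hypothetical, and BASE stands for the lhe_generators location, which is left as a parameter because the full /eos path is not spelled out here.

```shell
# generator_links BASE RECID MCDB_ID
# Print the path(s) a record should link to.
generator_links() {
    base=$1; recid=$2; mcdb_id=$3
    if [ "$mcdb_id" -gt 0 ]; then
        # mcdb case: a single header file named after the mcdb_id
        printf '%s\n' "$base/mcdb/${mcdb_id}_header"
    else
        # gridpack case: every card file in the per-recid folder
        for f in "$base/gridpacks/$recid"/*; do
            if [ -f "$f" ]; then printf '%s\n' "$f"; fi
        done
    fi
    return 0
}
```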
