feat: create tables for HEPData #215

Merged
merged 53 commits into from
Jul 12, 2024
Changes from 36 commits
Commits
6cb7dfc
reana.yaml file
Nov 6, 2023
6596b4d
Separate the samples in fileset for paralelisation
Nov 21, 2023
c514b3d
Merge step with histograms_merdeg.root
Nov 21, 2023
104f799
A lot of changes
Dec 8, 2023
ad3b0b5
Snakemake multi cascading
AndriiPovsten Feb 7, 2024
486faec
Snakemake multicascading and the submission.yaml for HEPData
AndriiPovsten Feb 7, 2024
7703d16
Snakemake multicascading
AndriiPovsten Feb 7, 2024
b69b21f
Without HEPData workspace
AndriiPovsten Feb 7, 2024
5df86b5
The HEPData folder with submission files for the cabinetry submission
AndriiPovsten Feb 7, 2024
e078f4b
Merge pull request #1 from AndriiPovsten/new_branch
AndriiPovsten Feb 7, 2024
be5c7a8
Better naming for the files with some suggestions for the Snakefile
AndriiPovsten Feb 29, 2024
f5bde77
Merge pull request #2 from AndriiPovsten/new_branch
AndriiPovsten Feb 29, 2024
a4911f3
The separate REANA folder
AndriiPovsten Feb 29, 2024
a8d3786
Merge pull request #3 from AndriiPovsten/new_branch
AndriiPovsten Feb 29, 2024
e2990a1
Test the file processing locally
AndriiPovsten Mar 1, 2024
2e4b13e
Merge pull request #4 from AndriiPovsten/Reproducibility_REANA
AndriiPovsten Mar 1, 2024
a0612d7
HEPData submission
AndriiPovsten Mar 11, 2024
87f510c
Merge pull request #5 from AndriiPovsten/main
AndriiPovsten Mar 11, 2024
af1906e
Change the naming
AndriiPovsten Mar 19, 2024
7ec7e4f
Cleaner Snakefile and main analysis notebook
AndriiPovsten Mar 19, 2024
fff6740
Merge pull request #7 from AndriiPovsten/new_branch
AndriiPovsten Mar 19, 2024
3edac4b
HEPData in the utils
AndriiPovsten Mar 21, 2024
3db44d4
The ultimate hepdata function for both current models
AndriiPovsten Mar 21, 2024
c51264c
HEPData submission with function in utils folder
AndriiPovsten Mar 22, 2024
0e052b7
Updated HEP_data fucntion
AndriiPovsten May 16, 2024
a37fd02
Shorter Snakefile version
AndriiPovsten May 16, 2024
dcafae7
Change the folder name
AndriiPovsten May 16, 2024
a439d95
adding the environment folder for local run
AndriiPovsten May 16, 2024
89e5585
Resolved conflicts in ttbar_analysis_pipeline.ipynb
AndriiPovsten May 16, 2024
8fe4084
Merge branch 'hepdata' into Reproducibility_REANA
AndriiPovsten May 16, 2024
97ccbdc
Merge pull request #10 from AndriiPovsten/Reproducibility_REANA
AndriiPovsten May 16, 2024
53bc00a
HEP data function
AndriiPovsten May 16, 2024
4a22831
Leave only the HEP data related files
AndriiPovsten May 17, 2024
04ddb39
Updated README with explanation of submission process
AndriiPovsten May 17, 2024
1911dd3
Deleted unecessary files, updated ttbar.py file
AndriiPovsten May 17, 2024
e281aef
Restore accidentally deleted files
AndriiPovsten May 20, 2024
6407587
Changes after comments
AndriiPovsten May 20, 2024
c0063bb
Getting rid of hidden files
AndriiPovsten May 23, 2024
db2439e
Getting rid of submodule
AndriiPovsten May 23, 2024
eb62980
original script file
AndriiPovsten May 31, 2024
625e224
new line
AndriiPovsten May 31, 2024
d813c29
right sync script
AndriiPovsten May 31, 2024
36c6a2c
the right model_prediction_ml object
AndriiPovsten May 31, 2024
cab75f5
== warning to is
AndriiPovsten May 31, 2024
86e78cf
new empty line
AndriiPovsten May 31, 2024
afc740b
getting rid of hist.intervals and submission object
AndriiPovsten May 31, 2024
80f547b
Updated HEPData submission with having as a units of measurements Var…
AndriiPovsten Jun 19, 2024
7a81bed
Get rid of unecesary hist.intervals in hepdata.py
AndriiPovsten Jun 28, 2024
c43da33
cleaned gitignore
AndriiPovsten Jun 28, 2024
530f484
change the name of the hep data function
AndriiPovsten Jul 5, 2024
efab560
Update analyses/cms-open-data-ttbar/README.md
alexander-held Jul 12, 2024
46da882
Update analyses/cms-open-data-ttbar/README.md
alexander-held Jul 12, 2024
52c9e03
Update analyses/cms-open-data-ttbar/README.md
alexander-held Jul 12, 2024
Binary file added .DS_Store
Binary file not shown.
27 changes: 27 additions & 0 deletions .gitignore
@@ -33,3 +33,30 @@ analyses/cms-open-data-ttbar/metrics

# dask
dask-worker-space/
analyses/cms-open-data-ttbar/first_option.py
analyses/cms-open-data-ttbar/reana_higgs_tautau.yaml
analyses/cms-open-data-ttbar/reana_samples.yaml
analyses/cms-open-data-ttbar/Sample_execution.py
analyses/cms-open-data-ttbar/out.ipynb
analyses/cms-open-data-ttbar/.snakemake/log/2023-11-13T230445.475964.snakemake.log
analyses/cms-open-data-ttbar/samples.py
analyses/cms-open-data-ttbar/uproot.py
.bash_history
.cache/matplotlib/fontlist-v330.json
.ipython/profile_default/startup/README
analyses/cms-open-data-ttbar/loops.py
.DS_Store
analyses/cms-open-data-ttbar/ca-certificates/cerngridca.crt
analyses/cms-open-data-ttbar/ca-certificates/cernroot.crt
HEPData_cabinetry/.DS_Store
analyses/cms-open-data-ttbar/Dockerfile_root_certificates
.DS_Store
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_single_top_s_chan__nominal_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_single_top_t_chan__nominal_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_single_top_tW__nominal_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_ttbar__ME_var_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_ttbar__nominal_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_ttbar__PS_var_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_ttbar__scaledown_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_wjets__nominal_paths.txt
analyses/cms-open-data-ttbar/Reproducibility_REANA/sample_ttbar__scaleup_paths.txt
1 change: 1 addition & 0 deletions AGC_clean/analysis-grand-challenge
Submodule analysis-grand-challenge added at a0612d
Binary file added analyses/.DS_Store
Binary file not shown.
6 changes: 6 additions & 0 deletions analyses/cms-open-data-ttbar/README
@@ -0,0 +1,6 @@
This runs the Analysis Grand Challenge using REANA.
REANA is a reproducible analysis platform.
To do this, first set up a virtual environment for REANA.
Then install reana-client.
you can execute the ttbar analysis.ipynb using this command
Now you can check your repository.
38 changes: 38 additions & 0 deletions analyses/cms-open-data-ttbar/README.md
@@ -18,6 +18,7 @@ This directory is focused on running the CMS Open Data $t\bar{t}$ analysis throu
| models/ | Contains models used for ML inference task (when `USE_TRITON = False`) |
| utils/ | Contains code for bookkeeping and cosmetics, as well as some boilerplate. Also contains images used in notebooks. |
| utils/config.py | This is a general config file to handle different options for running the analysis. |
| utils/hepdata.py | Contains the function that creates the tables to be submitted to and stored on the [HEPData website](https://www.hepdata.net) (use `HEP_DATA = True`). |

#### Instructions for paired notebook

@@ -51,3 +52,40 @@ argument is the appropriate reference file for the number of files per process a
For full usage help see the output of `python validate_histograms.py --help`.

`validate_histograms.py` can also be used to create new references by passing the `--dump-json` option.

#### HEPData creation and submission
For a valid submission, edit `submission.yaml` so that it properly describes your variables and your table.
To submit the created histograms to HEPData, install the necessary packages and make some modifications to the `ttbar_analysis_pipeline.ipynb` notebook.
```console
pip install hepdata_lib hepdata-cli
```
Also install ROOT, which is mandatory for the current version:
```console
mamba install -c conda-forge root htcondor=10.8 -y
```
Next, modify the notebook so that the submission happens in a single run. You need to create a zip archive of your data for uploading.

```python
import shutil

folder_path = "hepdata_model"   # folder created with the HEPData layout
archive_base = "hepdata_model"  # make_archive appends ".zip" itself
temp_folder = "temp_folder"
# Copy the folder to a temporary one, leaving out unwanted files
shutil.copytree(folder_path, temp_folder, ignore=shutil.ignore_patterns(".ipynb_checkpoints"))
# Create hepdata_model.zip from the temporary folder
shutil.make_archive(archive_base, "zip", temp_folder)
# Remove the temporary folder
shutil.rmtree(temp_folder)
```

```python
from getpass import getpass
import subprocess

# Prompt for the password without echoing it
password = getpass("Enter your password: ")

archive = "/home/cms-jovyan/analysis-grand-challenge/analyses/cms-open-data-ttbar/hepdata_model.zip"
command = ["hepdata-cli", "upload", archive, "-e", "yourname.yoursurname@cern.ch"]
# Feed the password to the hepdata-cli prompt on stdin, without going through a shell
subprocess.run(command, input=password + "\n", text=True, check=True)
```
If the submission is successful, you'll see your uploaded data in the provided link.
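
As an optional check before uploading, it can help to confirm that the archive contains `submission.yaml` and no notebook checkpoint files. The following is a minimal sketch using only the Python standard library; the helper names `build_archive` and `check_archive` are hypothetical and not part of `utils/hepdata.py`, and the assumption that `submission.yaml` sits at the top level of the folder should be adjusted to your layout.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def build_archive(folder: Path, archive_base: Path) -> Path:
    """Zip `folder` without notebook checkpoints and return the archive path.

    `archive_base` is the target name WITHOUT the ".zip" extension,
    because shutil.make_archive appends the extension itself.
    """
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / folder.name
        # Copy everything except .ipynb_checkpoints into a staging folder
        shutil.copytree(
            folder, staging, ignore=shutil.ignore_patterns(".ipynb_checkpoints")
        )
        # The archive is written outside the staging area, so it survives cleanup
        return Path(shutil.make_archive(str(archive_base), "zip", staging))


def check_archive(zip_path: Path) -> list[str]:
    """Return the names stored in the archive; raise if submission.yaml is missing."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    # Assumption: submission.yaml is expected at the top level of the archive
    if "submission.yaml" not in names:
        raise FileNotFoundError(f"submission.yaml not found in {zip_path}")
    return names
```

For example, `check_archive(build_archive(Path("hepdata_model"), Path("hepdata_model")))` would write `hepdata_model.zip` in the current directory and list its contents.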

2 changes: 1 addition & 1 deletion analyses/cms-open-data-ttbar/cabinetry_config.yml
@@ -121,4 +121,4 @@ NormFactors:
- Name: "ttbar_norm"
Samples: "ttbar"
Nominal: 1.0
Bounds: [0, 10]
Bounds: [0, 10]
2 changes: 1 addition & 1 deletion analyses/cms-open-data-ttbar/cabinetry_config_ml.yml
@@ -214,4 +214,4 @@ NormFactors:
- Name: "ttbar_norm"
Samples: "ttbar"
Nominal: 1.0
Bounds: [0, 10]
Bounds: [0, 10]
15 changes: 15 additions & 0 deletions analyses/cms-open-data-ttbar/ttbar_analysis_pipeline.ipynb
@@ -1299,6 +1299,21 @@
" utils.plotting.plot_data_mc(model_prediction, model_prediction_postfit, data_ml, config_ml)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54967894",
"metadata": {},
"outputs": [],
"source": [
"if utils.config[\"preservation\"][\"HEP_DATA\"]:\n",
"    import utils.hepdata\n",
"    # Submission of model prediction\n",
"    utils.hepdata.submission_hep_data(model, model_prediction, \"hepdata_model\")\n",
"    # Submission of model_ml prediction\n",
"    utils.hepdata.submission_hep_data(model_ml, model_prediction_ml, \"hepdata_model_ml\")"
]
},
{
"cell_type": "markdown",
"id": "a2ce2d14",