ServiceX_DID_finder_CERNOpenData

Access datasets for ServiceX from the CERN Open Data Portal.

Finding datasets

The CERN Open Data Portal is CERN's portal to all of its open data. At the time of writing it held thousands of datasets in many different formats. ServiceX understands some of these formats: flat ROOT files from all experiments, and CMS Run 1 AOD files.

Use the search bar to find a dataset, for example this simulated CMS Higgs-to-four-lepton dataset (H → ZZ → ℓℓℓℓ).

On the dataset's web page you can quickly see what type of files it supplies by looking at the file list. You'll need some context to know what kind of ROOT files these are: in this case they are CMS Run 1 AOD files. As a result, you'll also need to use the ServiceX backend that can process them.

Once you've worked this out, you can specify the dataset with a DID: cernopendata://1507. The finder translates the record number 1507 into the list of files that are fed to the ServiceX transformers, as long as this DID finder is running as part of your ServiceX deployment.
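To illustrate the DID format, the following Python sketch splits a DID such as cernopendata://1507 into its scheme and numeric record number. The function name parse_cernopendata_did is hypothetical and this is not the finder's actual code; it assumes the full DID string, including the cernopendata:// prefix, is available to parse.

```python
from urllib.parse import urlparse


def parse_cernopendata_did(did: str) -> int:
    """Return the CERN Open Data record number from a DID like 'cernopendata://1507'.

    Hypothetical helper for illustration only; not taken from the finder's source.
    """
    parsed = urlparse(did)
    if parsed.scheme != "cernopendata":
        raise ValueError(f"not a cernopendata DID: {did!r}")
    # urlparse places the text after '//' (here '1507') in the netloc field.
    record = parsed.netloc
    if not (record.isascii() and record.isdigit()):
        raise ValueError(f"expected a numeric record id, got {record!r}")
    return int(record)


print(parse_cernopendata_did("cernopendata://1507"))  # prints 1507
```

Rejecting anything that is not a plain integer keeps malformed DIDs from ever reaching the step that looks up files for a record.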

Deploying the DID Finder

You'll need a Kubernetes deployment file to run this DID finder. Here is a tested sample, written as a Helm template so that it can be part of the ServiceX distribution:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-did-finder-cernopendata
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-did-finder-cernopendata
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-did-finder-cernopendata
    spec:
      containers:
      - name: {{ .Release.Name }}-did-finder-cernopendata
        image: {{ .Values.didFinderCERNOpenData.image }}:{{ .Values.didFinderCERNOpenData.tag }}
        imagePullPolicy: {{ .Values.didFinderCERNOpenData.pullPolicy }}
        env:
          - name: INSTANCE_NAME
            value: {{ .Release.Name }}
        args:
          - --rabbit-uri
          - amqp://user:{{ .Values.rabbitmq.auth.password }}@{{ .Release.Name }}-rabbitmq:5672

The value passed to --rabbit-uri is perhaps the most crucial: it tells the DID finder which RabbitMQ broker to connect to, and therefore where it listens for dataset requests.
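The Python sketch below shows what the --rabbit-uri value in the template expands to and how its parts can be checked. The helper name rabbit_uri, the release name servicex and the password s3cret are hypothetical; the user name user, the host suffix -rabbitmq and port 5672 are taken from the template above.

```python
from urllib.parse import quote, urlparse


def rabbit_uri(release_name: str, password: str, user: str = "user", port: int = 5672) -> str:
    """Assemble the --rabbit-uri value that the Helm template produces.

    Hypothetical helper for illustration. Unlike the template, it percent-encodes
    the password so reserved characters ('@', '/', '#', ...) cannot corrupt the URI.
    """
    # The template points at the host name '<release>-rabbitmq'.
    host = f"{release_name}-rabbitmq"
    return f"amqp://{user}:{quote(password, safe='')}@{host}:{port}"


uri = rabbit_uri("servicex", "s3cret")
parts = urlparse(uri)
print(uri)                         # prints amqp://user:s3cret@servicex-rabbitmq:5672
print(parts.hostname, parts.port)  # prints servicex-rabbitmq 5672
```

The template inserts the password verbatim, so a password containing characters such as '/' or '#' would yield a URI that parses to the wrong host; encoding it, as the sketch does, avoids that.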

ssl-hep/ServiceX_DID_Finder_CERNOpenData