Running the DAQ
To run the PADME DAQ system, the shifter must log on to l0padme1 as user daq. The password can be obtained from the Run Coordinator (or from most members of the collaboration). After logging on, cd to the DAQ directory.
[padme@padmecr4 ~]$ ssh -Y daq@l0padme1
daq@l0padme1's password:
Last login: Mon Oct 8 09:39:46 2018 from padmecr4.lnf.infn.it
[daq@l0padme1 ~]$ cd DAQ
[daq@l0padme1 DAQ]$
N.B. if you are doing a remote shift (i.e. from outside the laboratory), you must first connect to the INFN-LNF VPN following the procedure described in the Guide for remote shifters.
All DAQ procedures are handled through the RunControl server. This is a daemon process running on l0padme1.
To verify that the process is running:
[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID PID PPID C STIME TTY TIME CMD
...
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
...
If it is not running, please restart it:
[daq@l0padme1 DAQ]$ ./RunControl --server
Starting RunControlServer in background
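The check above can also be scripted. The following is a minimal sketch, not part of the PADME software: it assumes the `ps -fu daq` output has the column layout shown above, and the helper name `find_runcontrol_server_pids` is hypothetical. The `grep` line in the sample text is made up for illustration.

```python
def find_runcontrol_server_pids(ps_output):
    """Return the PIDs of all 'RunControl --server' processes in `ps -fu daq` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # A `ps -f` line has at least 8 columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, "..." placeholder, or malformed line
        command = fields[7:]
        # Keep only the server: the client (no --server) and grep itself are skipped.
        if any(word.endswith("RunControl") for word in command) and "--server" in command:
            pids.append(int(fields[1]))
    return pids


# Sample text: the server line is taken from this page, the grep line is illustrative.
sample = """UID PID PPID C STIME TTY TIME CMD
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
daq 178100 177000 0 10:07 pts/1 00:00:00 grep RunControl
"""
print(find_runcontrol_server_pids(sample))  # -> [177988]
```

An empty list means the server is not running and should be restarted as described here.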
All output from the RunControl server process is written to the log/RunControlServer.log file in the DAQ directory. Looking into this file can help when troubleshooting DAQ problems.
The RunControl client is used to issue commands to the RunControl server (start a new run, stop the run, ...). To start the client:
[daq@l0padme1 DAQ]$ ./RunControl
Connecting to RunControl server on host localhost port 10000
SEND (q or Q to Quit):
This will start the RunControl client in text mode. All commands are given from this terminal. The help command can be used to get a list of available commands at any point of the RunControl procedure.
SEND (q or Q to Quit): help
Sending help
Available commands:
help Show this help
get_state Show current state of RunControl
get_setup Show current setup name
get_setup_list Show list of available setups
get_board_list Show list of boards in use with current setup
get_board_config_daq <b> Show current configuration of board DAQ process <b>
get_board_config_zsup <b> Show current configuration of board ZSUP process <b>
get_trig_config Show current configuration of trigger process
get_run_number Return last run number in DB
change_setup <setup> Change run setup to <setup>
new_run Initialize system for a new run
shutdown Tell RunControl server to exit (use with extreme care!)
SEND (q or Q to Quit):
Before starting a new run it is wise to verify which setup is currently loaded and change it if needed. For the time being, unless told otherwise by the Run Coordinator, the correct setup is full202007, which enables all ADC boards and acquires data from all PADME detectors. Before starting any run, please make sure that the setup is correct:
SEND (q or Q to Quit): get_setup
Sending get_setup
full202007
If asked by the Run Coordinator, you can change the setup, e.g.
SEND (q or Q to Quit): get_setup_list
Sending get_setup_list
['full202007', 'ecal_sac_cosmics', 'ecal_sac_cosmics_nozsup', 'target_sac_201907', 'test201907', 'test2020_nozsup']
SEND (q or Q to Quit): change_setup ecal_sac_cosmics
Sending change_setup ecal_sac_cosmics
ecal_sac_cosmics
The change_setup command is also used to reload a setup if any of its files have changed (WARNING: only the Run Coordinator is allowed to edit the setup files):
SEND (q or Q to Quit): get_setup
Sending get_setup
full202007
SEND (q or Q to Quit): change_setup full202007
Sending change_setup full202007
full202007
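As an illustration of the setup check, here is a minimal sketch that tests whether a setup name appears in the reply to get_setup_list. It assumes the reply is printed as a Python-style list of quoted names, as in the transcript above. The function names are hypothetical and are not part of RunControl.

```python
import ast


def parse_setup_list(reply):
    """Turn the text printed by get_setup_list into a list of setup names."""
    setups = ast.literal_eval(reply.strip())
    if not isinstance(setups, list) or not all(isinstance(s, str) for s in setups):
        raise ValueError("unexpected get_setup_list reply: %r" % reply)
    return setups


def setup_is_available(name, reply):
    """True if `name` is one of the setups listed in the get_setup_list reply."""
    return name in parse_setup_list(reply)


# Reply copied from the get_setup_list example on this page.
reply = ("['full202007', 'ecal_sac_cosmics', 'ecal_sac_cosmics_nozsup', "
         "'target_sac_201907', 'test201907', 'test2020_nozsup']")
print(setup_is_available("full202007", reply))    # -> True
print(setup_is_available("full2020", reply))      # -> False
```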
SEND (q or Q to Quit): new_run
Sending new_run
Current setup is full202007
Available run types: CALIBRATION,COSMICS,DAQ,FAKE,OTHER,RANDOM,TEST,TESTBEAM
Run type: DAQ
Sending DAQ
New run will be of type DAQ
New run will have run number 30046
Note: supported run types are TEST, DAQ, CALIBRATION, COSMICS, RANDOM, OTHER. The system also supports the FAKE and TESTBEAM run types but these are only to be used by experts.
N.B. Uppercase is mandatory.
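The run type rules in this note can be summarised in a few lines of code. This is only a sketch of the rules as stated here. The function name and the returned labels are hypothetical, and the real check is done by the RunControl server.

```python
# Run types listed on this page as available to the shift crew.
SHIFTER_RUN_TYPES = ("TEST", "DAQ", "CALIBRATION", "COSMICS", "RANDOM", "OTHER")
# Run types reserved to experts.
EXPERT_RUN_TYPES = ("FAKE", "TESTBEAM")


def check_run_type(text):
    """Classify the text typed at the 'Run type:' prompt."""
    if text in SHIFTER_RUN_TYPES:
        return "ok"
    if text in EXPERT_RUN_TYPES:
        return "expert-only"
    if text.upper() in SHIFTER_RUN_TYPES + EXPERT_RUN_TYPES:
        return "wrong-case"  # uppercase is mandatory
    return "unknown"


print(check_run_type("DAQ"))       # -> ok
print(check_run_type("cosmics"))   # -> wrong-case
print(check_run_type("TESTBEAM"))  # -> expert-only
```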
Shift crew: Emanuele
Sending Emanuele
Start of run comment: My first test run
Sending My first test run
Both "Shift crew" and "Start of run comment" accept free-format text of (almost) unlimited length. Try to be as detailed as possible in describing the run conditions (beam status, HV status, ADC boards included, special conditions, etc.).
Now the run initialization procedure can start. Expect a delay of several seconds before the first message is shown.
New run initialization start
level1 0 ready
level1 1 ready
...
merger ready
trigger init
adc 0 zsup_init
adc 1 zsup_init
...
adc 28 zsup_init
adc 0 daq_init
adc 1 daq_init
...
adc 28 daq_init
trigger ready
adc 0 ready
adc 1 ready
...
adc 28 ready
adc all ready
New run initialization completed correctly
init_ready
The initialization procedure for the full experiment (29 ADC boards) takes up to 2 minutes, so wait patiently.
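If the initialization messages are saved to a text file or copied from the terminal, a short script can tell which boards never reported ready. The sketch below assumes the message format shown above and that the boards are numbered consecutively from 0, as in the example (0 to 28 for 29 boards). The function names are hypothetical.

```python
import re


def boards_not_ready(transcript, n_boards=29):
    """Return the sorted ADC board numbers that have no 'adc <n> ready' line."""
    ready = set()
    for line in transcript.splitlines():
        match = re.fullmatch(r"adc (\d+) ready", line.strip())
        if match:
            ready.add(int(match.group(1)))
    return sorted(set(range(n_boards)) - ready)


def init_completed(transcript):
    """True if the final 'init_ready' message is present."""
    return any(line.strip() == "init_ready" for line in transcript.splitlines())


# Shortened transcript for a hypothetical 3-board setup where board 2 is stuck.
sample = """New run initialization start
adc 0 zsup_init
adc 1 zsup_init
adc 2 zsup_init
adc 0 daq_init
adc 1 daq_init
adc 0 ready
adc 1 ready
"""
print(boards_not_ready(sample, n_boards=3))  # -> [2]
print(init_completed(sample))                # -> False
```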
On some occasions the initialization procedure can time out, fail or get stuck. If any of this happens, please check the DAQ Troubleshooting page for the correct recovery procedure. If all recovery procedures fail, it is time to call an expert.
SEND (q or Q to Quit): start_run
Sending start_run
Run started correctly
run_started
SEND (q or Q to Quit): stop_run
Sending stop_run
End of run comment: My end of run
Sending My end of run
adc 0 daq_terminate_ok
adc 0 zsup_terminate_ok
...
adc 28 daq_terminate_ok
adc 28 zsup_terminate_ok
trigger terminate_ok
merger terminate_ok
level1 0 terminate_ok
level1 1 terminate_ok
...
Run terminated correctly
terminate_ok
Only a single RunControl client can connect to the RunControl server at any given time. If you want to move the client from one terminal to another, issue the Q command on the original client and then start the new one with the usual command: this procedure will not affect the RunControl server in any way (e.g. if a run is in progress it will keep taking data).
Please note that the client MUST NOT be stopped while the new_run procedure is in progress: this would leave the RunControl server in an undefined state, and the server would then have to be stopped and restarted.
If you are leaving after your shift and no one is coming after you, please close the client window: in this way, anybody will be able to take over and manage the RunControl even if they are not physically at the laboratory.
Use this procedure only if you want to stop the main RunControl server. This should be done only if the initialization or stop_run procedures fail and/or the system gets into a pathological state (no response to the client).
SEND (q or Q to Quit): shutdown
Sending shutdown
exiting
Server's gone. I'll take my leave as well...
Closing socket
If the server is stuck and does not respond to user commands (this can happen in rare cases, e.g. during the procedure to connect to the database), it can be killed with the Unix kill (or possibly kill -9) command:
[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID PID PPID C STIME TTY TIME CMD
...
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
...
[daq@l0padme1 DAQ]$ kill 177988
All active processes created during the DAQ produce individual log files, which can be very useful to verify that the DAQ is running smoothly. All log files for a given run are stored in a single directory named after the run (e.g. run_0000000_20181005_094240/log). This directory is created inside the DAQ/runs subdirectory (i.e. DAQ/runs/run_0000000_20181005_094240/log for the previous example).
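The directory layout described above can be written down as a small helper. This is a sketch under two assumptions: paths are relative to the daq home directory, and log file names follow the run-name, underscore, process-name pattern seen for the trigger and merger logs on this page. Whether other processes use the same pattern is not confirmed here. The function names are hypothetical.

```python
import posixpath


def run_log_dir(run_name, daq_dir="DAQ"):
    """Directory holding all log files of a run, e.g. DAQ/runs/<run_name>/log."""
    return posixpath.join(daq_dir, "runs", run_name, "log")


def run_log_file(run_name, process, daq_dir="DAQ"):
    """Log file of one process of a run, e.g. <run_name>_trigger.log."""
    return posixpath.join(run_log_dir(run_name, daq_dir),
                          "%s_%s.log" % (run_name, process))


run = "run_0000000_20181005_094240"
print(run_log_dir(run))
# -> DAQ/runs/run_0000000_20181005_094240/log
print(run_log_file(run, "trigger"))
# -> DAQ/runs/run_0000000_20181005_094240/log/run_0000000_20181005_094240_trigger.log
```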
To check if the trigger board is correctly receiving the trigger from the BTF:
[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_trigger.log
... Some setup messages ...
2020/06/13 13:31:56 - Starting trigger generation
- Setting process status to RUNNING (5)
DBINFO - 2020/06/13 13:31:56 - process_set_status 5
DBINFO - 2020/06/13 13:31:56 - process_set_time_start 2020/06/13 13:31:56
- Enabling requested triggers.
trig_set_register cmd = 0x010203841002
Current trigger mask: 0x02
- Trigger 0 0x028e01061ebf9fcc 26285678540 0x01 142 0 1
- Trigger 100 0x02f2010627b72903 26436118787 0x01 242 0 1 1880.503ms 1671.407ms 53.18Hz
- TrigMsk 1605591101 0(93,92,48.92Hz) 1(4,4,2.13Hz) 3(2,2,1.06Hz) 7(2,2,1.06Hz)
- Trigger 200 0x0256010630964964 26584959332 0x01 86 0 1 1860.507ms 1850.141ms 53.75Hz
- TrigMsk 1605591102 0(184,91,48.91Hz) 1(10,6,3.22Hz) 3(4,2,1.07Hz) 7(3,1,0.54Hz)
... Trigger number keeps growing steadily ...
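The same check can be done on a copy of the log with a short script. The sketch below assumes, from the sample lines above, that the number right after "Trigger" is the trigger counter and that the last field, when it ends in Hz, is the trigger rate. The meaning of the other columns is not used. The function names are hypothetical.

```python
def parse_trigger_line(line):
    """Return (trigger_count, rate_hz or None) for a '- Trigger ...' line, else None."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "-" or fields[1] != "Trigger":
        return None
    if not fields[2].isdigit():
        return None
    rate = None
    if fields[-1].endswith("Hz"):
        try:
            rate = float(fields[-1][:-2])
        except ValueError:
            rate = None
    return int(fields[2]), rate


def counts_are_growing(lines):
    """True if at least two trigger lines are present and the counter strictly increases."""
    counts = [p[0] for p in map(parse_trigger_line, lines) if p is not None]
    return len(counts) >= 2 and all(a < b for a, b in zip(counts, counts[1:]))


# Lines copied from the trigger log example on this page.
sample = [
    "- Enabling requested triggers.",
    "- Trigger 0 0x028e01061ebf9fcc 26285678540 0x01 142 0 1",
    "- Trigger 100 0x02f2010627b72903 26436118787 0x01 242 0 1 1880.503ms 1671.407ms 53.18Hz",
    "- TrigMsk 1605591101 0(93,92,48.92Hz) 1(4,4,2.13Hz) 3(2,2,1.06Hz) 7(2,2,1.06Hz)",
    "- Trigger 200 0x0256010630964964 26584959332 0x01 86 0 1 1860.507ms 1850.141ms 53.75Hz",
]
print(parse_trigger_line(sample[2]))  # -> (100, 53.18)
print(counts_are_growing(sample))     # -> True
```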
To check if the event merger is receiving data from all boards with the correct synchronization:
[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_merger.log
... Some setup messages ...
Board 19 has id 23 and SN 223
Board 20 has id 27 and SN 182
- Written 100 events
Event 100 size 43825 time 1592055124.879055715s clock 620781087 status 0001 trigger mask 02 fifo 00 auto 01 missing boards 00000000
- Written 200 events
Event 200 size 43825 time 1592055133.315418273s clock 1296144461 status 0001 trigger mask 02 fifo 00 auto 01 missing boards 00000000
- Written 300 events
Event 300 size 43825 time 1592055140.345031194s clock 1858490107 status 0001 trigger mask 02 fifo 00 auto 01 missing boards 00000000
... Number of written events keeps growing steadily ...
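For an offline look at a merger log, the "Event" lines can be parsed as sketched below. The field names follow the sample lines above. Reading the "missing boards" field as a hexadecimal value where zero means that no board is missing is an assumption, as are the function names.

```python
import re

EVENT_RE = re.compile(
    r"Event\s+(\d+)\s+size\s+(\d+)\s+time\s+([\d.]+)s\b.*missing boards\s+([0-9a-fA-F]+)"
)


def parse_event_line(line):
    """Return a dict with event number, size, time and missing-boards value, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return {
        "event": int(match.group(1)),
        "size": int(match.group(2)),
        "time": float(match.group(3)),
        # Assumption: hexadecimal field, 0 when all boards contributed.
        "missing_boards": int(match.group(4), 16),
    }


def events_with_missing_boards(lines):
    """Return the event numbers whose 'missing boards' field is not zero."""
    parsed = (parse_event_line(line) for line in lines)
    return [p["event"] for p in parsed if p is not None and p["missing_boards"] != 0]


# Line copied from the merger log example on this page.
line = ("Event 100 size 43825 time 1592055124.879055715s clock 620781087 "
        "status 0001 trigger mask 02 fifo 00 auto 01 missing boards 00000000")
info = parse_event_line(line)
print(info["event"], info["size"], info["missing_boards"])  # -> 100 43825 0
```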
Any problem in the DAQ will immediately show up in the event merger log file. Since problems with the merger process are most often related to problems with the ADC boards, the fastest recovery procedure is to reset all VME and NIM crates (see the procedures to Reset VME crates and to Reset NIM Crates and Vetos) and then perform the Clean-up Procedure.
An example of a problem caused by the trigger board losing packets looks like this:
... all good up to now ...
- Written 7000 events
- Written 7100 events
*** Board 0 - Board time 357818173696 less than Trigger time 394468561288: skip event and try to recover
*** Board 0 - Board time 357838171793 less than Trigger time 394468561288: skip event and try to recover
*** Board 0 - Board time 357878168337 less than Trigger time 394468561288: skip event and try to recover
... problem messages keep repeating over and over ...
N.B. on some occasions a single trigger event is not reported to the DAQ system by the Trigger Board. In this case the system will report the standard error messages but will be able to recover from the problem automatically: only stop the run if you see the error messages repeating over and over.
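To help tell an isolated glitch from a persistent one, the sketch below counts how many of these error messages sit at the very end of the merger log, after the last normal line. It assumes the error format shown above, and the function name is hypothetical. This page gives no number of repetitions after which the run should be stopped, so that decision stays with the shifter.

```python
import re

ERROR_RE = re.compile(
    r"\*\*\* Board (\d+) - Board time (\d+) less than Trigger time (\d+)"
)


def trailing_error_streak(lines):
    """Count the synchronization error lines found after the last non-error line."""
    streak = 0
    for line in reversed(lines):
        if not line.strip():
            continue  # ignore blank lines
        if ERROR_RE.search(line):
            streak += 1
        else:
            break  # a normal line (e.g. '- Written ...') ends the streak
    return streak


# Lines copied from the error example on this page.
stuck = [
    "- Written 7000 events",
    "- Written 7100 events",
    "*** Board 0 - Board time 357818173696 less than Trigger time 394468561288: skip event and try to recover",
    "*** Board 0 - Board time 357838171793 less than Trigger time 394468561288: skip event and try to recover",
    "*** Board 0 - Board time 357878168337 less than Trigger time 394468561288: skip event and try to recover",
]
print(trailing_error_streak(stuck))  # -> 3

# Same log followed by a normal line: the system recovered, so the streak is 0.
print(trailing_error_streak(stuck + ["- Written 7200 events"]))  # -> 0
```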
© 2015 PADME Collaboration