Running the DAQ
To run the PADME DAQ system, the shifter must log on to l0padme1 as user daq. The password should be written somewhere in the Control Room (otherwise, ask anybody from the collaboration). After logging on, cd to the DAQ directory.
[padme@padmecr4 ~]$ ssh -Y daq@l0padme1
daq@l0padme1's password:
Last login: Mon Oct 8 09:39:46 2018 from padmecr4.lnf.infn.it
[daq@l0padme1 ~]$ cd DAQ
[daq@l0padme1 DAQ]$
All DAQ procedures are handled through the RunControl server. This is a daemon process running on l0padme1.
To verify that the process is running:
[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID PID PPID C STIME TTY TIME CMD
...
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
...
If it is not running, please restart it:
[daq@l0padme1 DAQ]$ ./RunControl --server
Starting RunControlServer in background
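The check above can also be scripted. The following is a minimal sketch, not part of the PADME DAQ software: the function name find_runcontrol_server is hypothetical, and it assumes the `ps -fu daq` column layout shown above (UID PID PPID C STIME TTY TIME CMD). It also skips the grep process itself, which would otherwise appear in the output of `ps ... | grep RunControl`.

```python
def find_runcontrol_server(ps_output):
    """Return the PIDs of RunControl server processes found in `ps -fu daq` output.

    Hypothetical helper: assumes the 8-column layout
    UID PID PPID C STIME TTY TIME CMD shown in the example above.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, "..." or anything malformed
        command = fields[7]
        if command.startswith("grep"):
            continue  # the grep process also mentions "RunControl"
        if "RunControl" in command and "--server" in command:
            pids.append(int(fields[1]))
    return pids
```

On l0padme1 the input could be obtained with `subprocess.run(["ps", "-fu", "daq"], capture_output=True, text=True).stdout` (Python 3.7 or later). An empty result means the server is not running and should be restarted as shown above.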
All output from the RunControl server process is written to the log/RunControlServer.log file in the DAQ directory. Looking into this file can help to troubleshoot DAQ problems.
The RunControl client is used to give commands to the RunControl server (start a new run, stop the run, ...). To start it:
[daq@l0padme1 DAQ]$ ./RunControl --no-gui
Connecting to RunControl server on host localhost port 10000
SEND (q or Q to Quit):
This will start the RunControl client in text mode (a GUI is foreseen but is not available yet). All commands will be given from this terminal. The help command can be used to know which commands are available at any point of the RunControl procedure.
SEND (q or Q to Quit): help
Sending help
Available commands:
help Show this help
get_state Show current state of RunControl
get_setup Show current setup name
get_setup_list Show list of available setups
get_board_list Show list of boards in use with current setup
get_board_config_daq <b> Show current configuration of board DAQ process <b>
get_board_config_zsup <b> Show current configuration of board ZSUP process <b>
get_trig_config Show current configuration of trigger process
get_run_number Return last run number in DB
change_setup <setup> Change run setup to <setup>
new_run Initialize system for a new run
shutdown Tell RunControl server to exit (use with extreme care!)
SEND (q or Q to Quit):
Before starting a new run it is wise to verify which setup is currently loaded and change it if needed. For the time being, unless told otherwise by the Run Coordinator, the correct setup is full201809, which will enable all ADC boards and acquire data from all PADME detectors. Before starting any run, please make sure that the setup is correct:
SEND (q or Q to Quit): get_setup
Sending get_setup
full201809
If asked by the Run Coordinator, you can change the setup:
SEND (q or Q to Quit): get_setup_list
Sending get_setup_list
['full201809', 'sac201807', 'single201810', 'target201809', 'test201806', 'test201809', 'veto201809']
SEND (q or Q to Quit): change_setup target201809
Sending change_setup target201809
target201809
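Before sending change_setup it can help to confirm that the requested name is really in the list. The sketch below assumes that the get_setup_list reply is always a Python list literal, as in the example above; the helper names parse_setup_list and check_setup are hypothetical and are not part of RunControl.

```python
import ast


def parse_setup_list(reply):
    """Parse a get_setup_list reply, assumed to be a Python list literal of strings."""
    setups = ast.literal_eval(reply.strip())
    if not isinstance(setups, list) or not all(isinstance(s, str) for s in setups):
        raise ValueError("unexpected get_setup_list reply: %r" % (reply,))
    return setups


def check_setup(requested, reply):
    """Return True if `requested` is one of the setups listed in `reply`."""
    return requested in parse_setup_list(reply)
```

ast.literal_eval is used instead of eval so that only plain literals are accepted.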
The change_setup command is also used to reload a setup if any of its files have changed (WARNING: only the Run Coordinator is allowed to edit the setup files):
SEND (q or Q to Quit): get_setup
Sending get_setup
full201809
SEND (q or Q to Quit): change_setup full201809
Sending change_setup full201809
full201809
To initialize the system for a new run, use the new_run command. It asks in turn for the run number, the run type, the shift crew and a start of run comment:
SEND (q or Q to Quit): new_run
Sending new_run
Run number (next or dummy): dummy
Sending dummy
new_run - new run will have number 0
WARNING: for the time being, only dummy is supported.
Run type: TEST
Sending TEST
new_run - new run will have type TEST
Note: supported run types are TEST, DAQ, CALIBRATION, COSMICS, RANDOM. Uppercase is mandatory.
Shift crew: Emanuele
Sending Emanuele
Start of run comment: My first test run
Sending My first test run
Both "Shift crew" and "Start of run comment" accept free-format text of (almost) unlimited length. Try to be as detailed as possible in describing the run conditions (beam status, HV status, ADC boards included, special conditions, etc.).
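The run type rule noted above (one of five values, uppercase mandatory) is easy to get wrong when typing. A minimal sketch of a pre-check follows; the function name check_run_type and its messages are hypothetical, and only the list of run types comes from this page.

```python
# Run types listed on this page; uppercase is mandatory.
SUPPORTED_RUN_TYPES = ("TEST", "DAQ", "CALIBRATION", "COSMICS", "RANDOM")


def check_run_type(run_type):
    """Return (ok, message) for a run type typed by the shifter."""
    if run_type in SUPPORTED_RUN_TYPES:
        return True, "ok"
    if run_type.upper() in SUPPORTED_RUN_TYPES:
        # Right name, wrong case: suggest the uppercase spelling.
        return False, "use uppercase: %s" % run_type.upper()
    return False, "unknown run type, choose one of: %s" % ", ".join(SUPPORTED_RUN_TYPES)
```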
Now the run initialization procedure can start. Expect a delay of several seconds before the first message is shown.
level1 0 ready
merger ready
trigger ready
adc 0 zsup_ready
adc 0 init
adc 0 ready
New run initialization completed correctly
init_ready
Warning: the initialization procedure for the full experiment (29 ADC boards) takes up to 2 minutes, so wait patiently.
Once init_ready is shown, start the run with start_run. When data taking is over, stop it with stop_run, which asks for an end of run comment:
SEND (q or Q to Quit): start_run
Sending start_run
Run started correctly
run_started
SEND (q or Q to Quit): stop_run
Sending stop_run
End of run comment: My end of run
Sending My end of run
adc 0 daq_terminate_ok
adc 0 zsup_terminate_ok
trigger terminate_ok
merger terminate_ok
level1 0 terminate_ok
Run terminated correctly
terminate_ok
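In the transcripts above each command ends with a status line: init_ready for new_run, run_started for start_run and terminate_ok for stop_run. The sketch below checks for that line. It assumes that the status token is always the last non-empty line printed by the client, which is inferred from these examples only; the names EXPECTED_STATUS and command_succeeded are hypothetical.

```python
# Final status token per command, as seen in the example transcripts on this page.
EXPECTED_STATUS = {
    "new_run": "init_ready",
    "start_run": "run_started",
    "stop_run": "terminate_ok",
}


def command_succeeded(command, output_lines):
    """Return True if the last non-empty output line is the expected status token."""
    lines = [line.strip() for line in output_lines if line.strip()]
    return bool(lines) and lines[-1] == EXPECTED_STATUS[command]
```

If the check fails, the per-process log files of the run are the place to look for the cause.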
Only a single RunControl client can connect to the RunControl server at any given time. If you want to move the client from one terminal to another, issue the Q command on the original client and then start the new one with the usual command. This procedure will not affect the RunControl server in any way (e.g. if a run is in progress it will keep taking data).
Please note that the client MUST NOT be stopped while the new_run procedure is in progress: this would leave the RunControl server in an undefined state and would require stopping and restarting it.
Use the shutdown command only if you want to stop the main RunControl server. This should be done only if the stop_run procedure fails and/or the system gets into a pathological state.
SEND (q or Q to Quit): shutdown
Sending shutdown
exiting
Server's gone. I'll take my leave as well...
Closing socket
If the server is stuck and does not respond, it can be killed with the standard kill command (or kill -9 if needed):
[daq@l0padme1 DAQ]$ ps -fu daq
UID PID PPID C STIME TTY TIME CMD
...
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
...
[daq@l0padme1 DAQ]$ kill 177988
If the server gets stuck or the run initialization fails, there is a long timeout before the system gives control back to the user. This will be improved. In the meantime one can stop both the client and the server with CTRL-C and/or kill and then apply the standard Clean-up Procedure before restarting the whole DAQ system.
All active processes created during the DAQ produce individual log files, which can be very useful to verify that the DAQ is running smoothly. All log files for a given run are stored in a single log directory under a directory named after the run (e.g. run_0000000_20181005_094240/log). This directory is created inside the DAQ/runs subdirectory (i.e. DAQ/runs/run_0000000_20181005_094240/log for the previous example).
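The naming scheme described above can be sketched in code. The helper names parse_run_name and log_file are hypothetical. The sketch assumes that the run name has the form run_<number>_<YYYYMMDD>_<HHMMSS>, which is inferred from the single example on this page, and that each process writes to <run name>_<process>.log, as in the trigger and merger log file names used on this page.

```python
import posixpath
import re
from datetime import datetime

# Assumed layout: run_<number>_<YYYYMMDD>_<HHMMSS>
RUN_NAME_RE = re.compile(r"^run_(\d+)_(\d{8})_(\d{6})$")


def parse_run_name(run_name):
    """Split a run name into (run_number, timestamp)."""
    match = RUN_NAME_RE.match(run_name)
    if match is None:
        raise ValueError("unexpected run name: %r" % (run_name,))
    number = int(match.group(1))
    stamp = datetime.strptime(match.group(2) + match.group(3), "%Y%m%d%H%M%S")
    return number, stamp


def log_file(run_name, process, daq_dir="DAQ"):
    """Build the path of the log file of `process` (e.g. 'trigger' or 'merger')."""
    return posixpath.join(daq_dir, "runs", run_name, "log",
                          "%s_%s.log" % (run_name, process))
```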
To check if the trigger board is correctly receiving the trigger from the BTF:
[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_trigger.log
... Some setup messages ...
2018/10/05 09:44:54 - Starting trigger generation
- Opening output stream '/home/daq/DAQ/local/streams/run_0000000_20181005_094240/run_0000000_20181005_094240_trigger'
Current masks: trig 0x01 busy 0x00 dummy 0x00 0x00
- Trigger 0 0x3a418cf3f0c8fb 605388065019 0x1 233
- Trigger 100 0x53418d0bc69413 605787952147 0x1 333 4998.588867ms 5s
- Trigger 200 0x6c418d239c546c 606187836524 0x1 433 4998.554688ms 5s
... Trigger number keeps growing steadily ...
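The trigger log can also be used to estimate the trigger rate. The sketch below rests on an assumption that this page does not confirm: that the value ending in "ms" on each "Trigger N" line is the time elapsed since the previous report. The function name trigger_rates is hypothetical.

```python
import re

# Matches e.g. "- Trigger 100 0x53418d0bc69413 605787952147 0x1 333 4998.588867ms 5s".
# The trailing "<value>ms" field is optional (the first report has none).
TRIGGER_RE = re.compile(
    r"Trigger\s+(\d+)\s+0x[0-9a-fA-F]+\s+\d+\s+0x[0-9a-fA-F]+\s+\d+(?:\s+([0-9.]+)ms)?"
)


def trigger_rates(log_lines):
    """Return a list of (trigger_number, rate_in_hz), one per report with an elapsed time.

    Assumes the "ms" field is the time elapsed since the previous report.
    """
    rates = []
    previous = None
    for line in log_lines:
        match = TRIGGER_RE.search(line)
        if match is None:
            continue  # setup messages and other lines
        number = int(match.group(1))
        elapsed_ms = match.group(2)
        if previous is not None and elapsed_ms is not None and float(elapsed_ms) > 0:
            rate = (number - previous) / (float(elapsed_ms) / 1000.0)
            rates.append((number, rate))
        previous = number
    return rates
```

Under that assumption, the three example lines above correspond to about 20 triggers per second (100 triggers in roughly 5 s).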
To check if the event merger is receiving data from all boards with the correct synchronization:
[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_merger.log
... Some setup messages ...
Board 26 has id 7 and SN 203
Board 27 has id 8 and SN 187
Board 28 has id 9 and SN 188
- Written 100 events
- Written 200 events
... Number of written events keeps growing steadily ...
If the event merger is not synchronized with the trigger, check the setup first. For example, when taking data at 50 Hz with the complete set of detectors, full201809 is the setup to choose. Using a different setup at this rate could cause a de-synchronization.
Any problem in the DAQ will immediately show up in the event merger log file. In this case the run should be stopped and the clean-up procedure should be applied before starting a new run.
An example of a problem caused by the trigger board losing packets looks like this:
... all good up to now ...
- Written 7000 events
- Written 7100 events
*** Board 0 - Board time 357818173696 less than Trigger time 394468561288: skip event and try to recover
*** Board 0 - Board time 357838171793 less than Trigger time 394468561288: skip event and try to recover
*** Board 0 - Board time 357878168337 less than Trigger time 394468561288: skip event and try to recover
... problem messages keep repeating over and over ...
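A problem like the one above can be spotted automatically by scanning the merger log. The following is a minimal sketch that relies only on the two message formats shown on this page ("Written N events" and the "*** Board ..." line); the function name scan_merger_log is hypothetical, and other error messages the merger may produce are not covered.

```python
import re

WRITTEN_RE = re.compile(r"Written\s+(\d+)\s+events")
DESYNC_RE = re.compile(
    r"\*\*\* Board (\d+) - Board time (\d+) less than Trigger time (\d+)"
)


def scan_merger_log(log_lines):
    """Return (last_written_events, {board: number_of_desync_messages})."""
    last_written = None
    desync_counts = {}
    for line in log_lines:
        match = WRITTEN_RE.search(line)
        if match is not None:
            last_written = int(match.group(1))
            continue
        match = DESYNC_RE.search(line)
        if match is not None:
            board = int(match.group(1))
            desync_counts[board] = desync_counts.get(board, 0) + 1
    return last_written, desync_counts
```

A non-empty dictionary means that at least one board is out of step with the trigger, in which case the run should be stopped and the clean-up procedure applied.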
© 2015 PADME Collaboration