Running the DAQ

Emanuele Leonardi edited this page Oct 8, 2018 · 39 revisions

Logging in

To run the PADME DAQ system, the shifter must log on to l0padme1 as user daq. The password should be written down somewhere in the Control Room (otherwise, ask anybody from the collaboration). After logging on, cd to the DAQ directory.

[padme@padmecr4 ~]$ ssh -Y daq@l0padme1
daq@l0padme1's password: 
Last login: Mon Oct  8 09:39:46 2018 from padmecr4.lnf.infn.it
[daq@l0padme1 ~]$ cd DAQ
[daq@l0padme1 DAQ]$ 

Starting the RunControl server

All DAQ procedures are handled through the RunControl server. This is a daemon process running on l0padme1.

To verify whether the process is running:

[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID         PID   PPID  C STIME TTY          TIME CMD
...
daq      177988      1  0 10:05 ?        00:00:00 /usr/bin/python ./RunControl --server
...

If it is not running, please restart it:

[daq@l0padme1 DAQ]$ ./RunControl --server
Starting RunControlServer in background
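
The check above can also be scripted. The following is a minimal Python sketch, not part of the RunControl package: it scans the output of `ps -fu daq` for a command line containing `RunControl --server`, as in the listing above. The function names are illustrative, the column layout is assumed to match the listing, and `subprocess.run(..., capture_output=True)` assumes Python 3.7 or later.

```python
import subprocess


def find_server_pids(ps_output):
    """Return the PIDs of all processes whose command contains 'RunControl --server'.

    Expects `ps -f` style output with columns: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so that CMD keeps its internal spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, "..." or anything else that is not a process row
        if "RunControl --server" in fields[7]:
            pids.append(int(fields[1]))
    return pids


def server_is_running(user="daq"):
    """Run `ps -fu <user>` and report whether a RunControl server was found."""
    result = subprocess.run(["ps", "-fu", user], capture_output=True, text=True)
    return bool(find_server_pids(result.stdout))
```

On l0padme1, `server_is_running()` would return True when the daemon is up; if it returns False, start the server with `./RunControl --server` as shown above.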

All output from the RunControl server process is written to the log/RunControlServer.log file in the DAQ directory. Looking into this file can help troubleshoot DAQ problems.
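
As a small illustration of how to inspect this file from a script, the sketch below returns the last few lines of a text file. The helper name `last_lines` is hypothetical; only the log path comes from the text above, and it is relative to the DAQ directory.

```python
from collections import deque

# Path relative to the DAQ directory, as given in the text above.
SERVER_LOG = "log/RunControlServer.log"


def last_lines(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as f:
        # A bounded deque keeps only the most recent n lines while reading.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

From the DAQ directory, `last_lines(SERVER_LOG)` gives the 20 most recent server messages; the standard `tail` command does the same job interactively.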

Starting the RunControl client

The RunControl client is used to issue commands to the RunControl server (start a new run, stop the run, ...). To start the client:

[daq@l0padme1 DAQ]$ ./RunControl --no-gui
Connecting to RunControl server on host localhost port 10000
SEND (q or Q to Quit):

This will start the RunControl client in text mode (a GUI is planned but is not available yet). All commands will be given from this terminal. The help command lists the commands available at any point of the RunControl procedure.

SEND (q or Q to Quit): help
Sending help
Available commands:
help		Show this help
get_state	Show current state of RunControl
get_setup	Show current setup name
get_setup_list	Show list of available setups
get_board_list	Show list of boards in use with current setup
get_board_config_daq <b>	Show current configuration of board DAQ process <b>
get_board_config_zsup <b>	Show current configuration of board ZSUP process <b>
get_trig_config	Show current configuration of trigger process
get_run_number	Return last run number in DB
change_setup <setup>	Change run setup to <setup>
new_run		Initialize system for a new run
shutdown		Tell RunControl server to exit (use with extreme care!)
SEND (q or Q to Quit):

Verifying and changing the setup

Before starting a new run, verify which setup is currently loaded and change it if needed. For the time being, unless told otherwise by the Run Coordinator, the correct setup is full201809, which enables all ADC boards and acquires data from all PADME detectors:

SEND (q or Q to Quit): get_setup
Sending get_setup
full201809

If asked by the Run Coordinator, you can change the setup:

SEND (q or Q to Quit): get_setup_list
Sending get_setup_list
['full201809', 'sac201807', 'single201810', 'target201809', 'test201806', 'test201809', 'veto201809']
SEND (q or Q to Quit): change_setup target201809
Sending change_setup target201809
target201809
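
The reply to get_setup_list shown above has the form of a Python list literal. Assuming that format holds in general (it is inferred from this single example), a script can parse it and check that a setup exists before requesting it. The function names below are illustrative.

```python
import ast


def parse_setup_list(reply):
    """Turn the text of a get_setup_list reply into a list of setup names."""
    # literal_eval only accepts Python literals, so it never executes code.
    setups = ast.literal_eval(reply.strip())
    if not isinstance(setups, list) or not all(isinstance(s, str) for s in setups):
        raise ValueError("unexpected get_setup_list reply: %r" % reply)
    return setups


def setup_available(reply, name):
    """Return True if `name` appears in the get_setup_list reply."""
    return name in parse_setup_list(reply)
```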

The change_setup command is also used to reload a setup if any of its files have changed (WARNING: only the Run Coordinator is allowed to edit the setup files):

SEND (q or Q to Quit): get_setup
Sending get_setup
full201809
SEND (q or Q to Quit): change_setup full201809
Sending change_setup full201809
full201809

Initializing a new run

SEND (q or Q to Quit): new_run
Sending new_run
Run number (next or dummy): dummy
Sending dummy
new_run - new run will have number 0

WARNING: for the time being, only dummy is supported.

Run type: TEST
Sending TEST
new_run - new run will have type TEST

Note: supported run types are TEST, DAQ, CALIBRATION, COSMICS, RANDOM. Uppercase is mandatory.

Shift crew: Emanuele
Sending Emanuele
Start of run comment: My first test run
Sending My first test run

Both "Shift crew" and "Start of run comment" accept free format text of (almost) unlimited length. Try to be as detailed as possible in describing the run conditions (beam status, HV status, ADC boards included, special conditions, etc.).
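
The two constraints noted above (only dummy is currently supported as run number, and the run type must be one of the listed uppercase values) can be summarized in a short check. This is only an illustrative sketch: the function name is hypothetical and RunControl itself may validate the answers differently.

```python
# Run types listed in the note above; uppercase is mandatory.
SUPPORTED_RUN_TYPES = ("TEST", "DAQ", "CALIBRATION", "COSMICS", "RANDOM")


def check_new_run_answers(run_number, run_type):
    """Return a list of problems with the answers to the new_run prompts."""
    problems = []
    if run_number != "dummy":
        problems.append("run number: only 'dummy' is supported for now")
    if run_type not in SUPPORTED_RUN_TYPES:
        if run_type.upper() in SUPPORTED_RUN_TYPES:
            problems.append("run type must be uppercase: use %s" % run_type.upper())
        else:
            problems.append("unknown run type: %s" % run_type)
    return problems
```

An empty list means the answers are acceptable according to the rules stated on this page.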

Now the run initialization procedure can start. Expect a delay of several seconds before the first message is shown.

level1 0 ready
merger ready
trigger ready
adc 0 zsup_ready
adc 0 init
adc 0 ready
New run initialization completed correctly
init_ready

Warning: the initialization procedure for the full experiment (29 ADC boards) takes up to 2 minutes, so wait patiently.
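
With 29 ADC boards the list of status messages gets long, so it can help to reduce it to the last state reported by each component. The sketch below assumes the message format seen in the example above (`adc <n> <state>`, `level1 <n> <state>`, `merger <state>`, `trigger <state>`); this format is inferred from the sample output and is not a documented protocol. The function names are illustrative.

```python
import re

# Matches e.g. "adc 0 zsup_ready", "level1 0 ready", "merger ready", "trigger ready".
STATUS_RE = re.compile(r"^(?:(adc|level1) (\d+)|(merger|trigger)) (\w+)$")


def last_states(lines):
    """Map each component (e.g. 'adc 0', 'merger') to the last state it reported."""
    states = {}
    for line in lines:
        m = STATUS_RE.match(line.strip())
        if not m:
            continue  # summary lines such as "init_ready" are ignored here
        name = "%s %s" % (m.group(1), m.group(2)) if m.group(1) else m.group(3)
        states[name] = m.group(4)
    return states


def not_ready(lines):
    """Return the components whose last reported state is not 'ready'."""
    return sorted(n for n, s in last_states(lines).items() if s != "ready")
```

Applied to the messages received so far, `not_ready` shows which components the initialization is still waiting for.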

Starting a new run after initialization

SEND (q or Q to Quit): start_run
Sending start_run
Run started correctly
run_started

Stopping a run

SEND (q or Q to Quit): stop_run
Sending stop_run
End of run comment: My end of run
Sending My end of run
adc 0 daq_terminate_ok
adc 0 zsup_terminate_ok
trigger terminate_ok
merger terminate_ok
level1 0 terminate_ok
Run terminated correctly
terminate_ok

Moving the DAQ client window to another terminal

Only a single RunControl client can connect to the RunControl server at any given time. If you want to move the client from one terminal to another, issue the Q command on the original client and then start the new one with the usual command: this procedure will not affect the RunControl server in any way (e.g. if a run is in progress it will keep taking data).

Please note that the client MUST NOT be stopped while the new_run procedure is in progress: this would leave the RunControl server in an undefined state and would require stopping and restarting it.

Exiting from the system

Use this procedure only if you want to stop the main RunControl server. This should be done only if the stop_run procedure fails and/or the system gets into a pathological state.

SEND (q or Q to Quit): shutdown
Sending shutdown
exiting
Server's gone. I'll take my leave as well...
Closing socket

If the server is stuck and does not respond, it can be killed with the standard kill (or kill -9 if needed) command:

[daq@l0padme1 DAQ]$ ps -fu daq
UID         PID   PPID  C STIME TTY          TIME CMD
...
daq      177988      1  0 10:05 ?        00:00:00 /usr/bin/python ./RunControl --server
...
[daq@l0padme1 DAQ]$ kill 177988
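
The "kill first, kill -9 only if needed" sequence can be sketched in Python as follows. This is an illustration, not an official tool: the function names and the default grace period are assumptions, and the PID must be looked up with ps as shown above.

```python
import os
import signal
import time


def is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_process(pid, grace=5.0, poll=0.2):
    """Send SIGTERM (plain kill), then SIGKILL (kill -9) if the process survives.

    Waits up to `grace` seconds after each signal. Returns the name of the
    last signal sent, or None if the process was not running to begin with.
    """
    if not is_alive(pid):
        return None
    last = None
    for sig, name in ((signal.SIGTERM, "SIGTERM"), (signal.SIGKILL, "SIGKILL")):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return last  # the process went away in the meantime
        last = name
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if not is_alive(pid):
                return last
            time.sleep(poll)
    return last
```

For the listing above, `stop_process(177988)` would return "SIGTERM" if the server exits cleanly and "SIGKILL" if it had to be forced.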

Notes

If the server gets stuck or the run initialization fails, there is a long timeout before the system gives control back to the user. This will be improved. In the meantime, one can stop both the client and the server with CTRL-C and/or kill and then apply the standard Clean-up Procedure before restarting the whole DAQ system.

Log files

All active processes created during the DAQ produce individual log files, which can be very useful to verify that the DAQ is running smoothly. All log files for a given run are stored in the log subdirectory of a directory named after the run (e.g. run_0000000_20181005_094240/log). This directory is created inside the DAQ/runs subdirectory (i.e. DAQ/runs/run_0000000_20181005_094240/log for the previous example).
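
The naming scheme can be captured in a few helper functions. The sketch below is based only on the examples on this page: it assumes that the run name has the form run_<number>_<YYYYMMDD>_<HHMMSS> and that each log file is named <run name>_<process>.log, as for the trigger and merger logs shown below. All function names are illustrative.

```python
import posixpath
import re
from datetime import datetime

# Assumed layout: run_<run number>_<date>_<time>, e.g. run_0000000_20181005_094240
RUN_NAME_RE = re.compile(r"^run_(\d+)_(\d{8}_\d{6})$")


def parse_run_name(run_name):
    """Split a run name into (run number, creation time)."""
    m = RUN_NAME_RE.match(run_name)
    if m is None:
        raise ValueError("unexpected run name: %r" % run_name)
    return int(m.group(1)), datetime.strptime(m.group(2), "%Y%m%d_%H%M%S")


def run_log_dir(run_name, daq_dir="DAQ"):
    """Return the log directory of a run, e.g. DAQ/runs/<run name>/log."""
    return posixpath.join(daq_dir, "runs", run_name, "log")


def process_log_file(run_name, process, daq_dir="DAQ"):
    """Return the path of one process log, e.g. .../log/<run name>_trigger.log."""
    return posixpath.join(run_log_dir(run_name, daq_dir), "%s_%s.log" % (run_name, process))
```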

To check if the trigger board is correctly receiving the trigger from the BTF:

[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_trigger.log 
... Some setup messages ...
2018/10/05 09:44:54 - Starting trigger generation
- Opening output stream '/home/daq/DAQ/local/streams/run_0000000_20181005_094240/run_0000000_20181005_094240_trigger'
Current masks: trig 0x01 busy 0x00 dummy 0x00 0x00
- Trigger 0 0x3a418cf3f0c8fb  605388065019 0x1  233
- Trigger 100 0x53418d0bc69413  605787952147 0x1  333 4998.588867ms 5s
- Trigger 200 0x6c418d239c546c  606187836524 0x1  433 4998.554688ms 5s
... Trigger number keeps growing steadily ...
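
The "trigger number keeps growing" check can be automated with a short script. The sketch below only relies on the `- Trigger <n>` prefix visible in the log excerpt above; the meaning of the remaining fields is not documented on this page, so they are ignored. The function names are illustrative.

```python
import re

# Matches lines such as "- Trigger 100 0x53418d0bc69413  605787952147 0x1  333 ..."
TRIGGER_RE = re.compile(r"^- Trigger (\d+)\s")


def trigger_numbers(lines):
    """Extract the trigger counter from every '- Trigger <n> ...' line."""
    numbers = []
    for line in lines:
        m = TRIGGER_RE.match(line)
        if m:
            numbers.append(int(m.group(1)))
    return numbers


def is_growing(numbers):
    """Return True if at least two values were seen and each exceeds the previous one."""
    return len(numbers) >= 2 and all(b > a for a, b in zip(numbers, numbers[1:]))
```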

To check if the event merger is receiving data from all boards with the correct synchronization:

[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_merger.log 
... Some setup messages ...
Board 26 has id 7 and SN 203
Board 27 has id 8 and SN 187
Board 28 has id 9 and SN 188
- Written 100 events
- Written 200 events
... Number of written events keeps growing steadily ...

If the event merger is not synchronized with the trigger, please check the setup: for example, when taking data at 50 Hz with the complete set of detectors, "full201809" is the setup to choose. A different setup at this rate could cause a de-synchronization.

Any problem in the DAQ will immediately show up in the event merger log file. In this case the run should be stopped and the clean-up procedure should be applied before starting a new run.

An example of a problem caused by the trigger board losing packets looks like this:

... all good up to now ...
- Written 7000 events
- Written 7100 events
*** Board  0 - Board time 357818173696 less than Trigger time 394468561288: skip event and try to recover
*** Board  0 - Board time 357838171793 less than Trigger time 394468561288: skip event and try to recover
*** Board  0 - Board time 357878168337 less than Trigger time 394468561288: skip event and try to recover
... problem messages keep repeating over and over ...
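
A script can spot this condition by counting the `*** Board ...` messages in the merger log. The sketch below is based only on the two message formats shown on this page (`- Written <n> events` and the "Board time ... less than Trigger time ..." warning); other messages are ignored, and the function name is illustrative.

```python
import re
from collections import Counter

WRITTEN_RE = re.compile(r"^- Written (\d+) events")
DESYNC_RE = re.compile(
    r"^\*\*\* Board\s+(\d+) - Board time (\d+) less than Trigger time (\d+)"
)


def summarize_merger_log(lines):
    """Return (last 'Written' event count, {board number: desync message count})."""
    last_written = None
    desync = Counter()
    for line in lines:
        m = WRITTEN_RE.match(line)
        if m:
            last_written = int(m.group(1))
            continue
        m = DESYNC_RE.match(line)
        if m:
            desync[int(m.group(1))] += 1
    return last_written, dict(desync)
```

A non-empty second element indicates that the run should be stopped and the clean-up procedure applied, as described above.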