PyCBC Live O4 development

Tito Dal Canton edited this page Sep 13, 2023 · 26 revisions

Placeholder page to track ideas and proposals for PyCBC Live features we want in O4. Note that there is a PyCBC Live O3 development page as well, with open items that should be carried here. Here I am adopting a different format than the O3 page because I felt some items required more explanation than could fit in a table row.

Develop a more realistic example/test

The current example is quite minimal and does not test a variety of things that are used in a production search, notably:

  • Different combinations of detectors.
  • The state and DQ channels (#3261).

Who is interested? Tito

Improve autogating

The O3 implementation of autogating still feels a bit clunky and should be more carefully characterized and checked. For example, what happens if a glitch is close to the boundary of the analysis chunk? What happens when the analysis chunk becomes much shorter (early-warning analysis)?

There are also potentially better ways to remove loud glitches, for example the inpainting method developed by IAS. Can one of these methods be used in low latency?

Who is interested? Stéphanie, Tito

Outcome:

Revisit/reevaluate multidetector coincidence

As explained in the O3 paper, including relatively insensitive detectors in the analysis leads to an increase of the trials factor due to how the current statistic is organized. Can this be improved, for example by excluding detectors from the trials based on their local sensitivity?

It would also be useful to introduce something similar to the single-detector trigger fits used by the offline search, as the background rate varies quite a lot over the search space.

Finally, the p-value combination method described in the O3 paper might be more correctly done as a single combination of all p-values, instead of iteratively for each detector. We should understand whether this makes any difference.
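To make the question concrete, here is a minimal sketch comparing the two strategies. It assumes Fisher's method as the combination rule, which is an assumption for illustration only (the exact rule from the O3 paper is not restated on this page), and the p-values are made up.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(ln p) is chi-squared distributed with 2k
    degrees of freedom; for even degrees of freedom the survival
    function has the closed form used below.
    """
    k = len(p_values)
    s = -sum(math.log(p) for p in p_values)  # half of Fisher's statistic
    return math.exp(-s) * sum(s**i / math.factorial(i) for i in range(k))


def combine_iteratively(p_values):
    """Fold the p-values in one at a time, combining pairs at each step."""
    combined = p_values[0]
    for p in p_values[1:]:
        combined = fisher_combine([combined, p])
    return combined


p_values = [0.01, 0.2, 0.5]  # made-up values, one per detector
single = fisher_combine(p_values)
iterative = combine_iteratively(p_values)
print(f"single: {single:.5f}  iterative: {iterative:.5f}")
```

Under this assumption the two results differ, and the iterative result also depends on the order in which detectors are folded in, so the choice is not purely cosmetic.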

Who is interested?

Separate processing of different detectors

The latency of the analysis is currently sensitive to the number of observing detectors, because each MPI worker has to process all detectors. Can we improve this by only processing one detector per MPI worker?

Who is interested? Tito

This is being experimented with on Tito's branch https://github.com/titodalcanton/pycbc/tree/live_parallelize_detectors.

Centralized access and conditioning of h(t)

Right now each MPI worker requires access to h(t), and does the same conditioning to it. This has led to issues when the h(t) availability or timing becomes inconsistent across the cluster nodes. Can we read and prepare h(t) on the root process and broadcast it to the workers, while maintaining the same latency?
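The read-on-root-then-broadcast pattern could look roughly like the sketch below. `read_strain`, `condition` and `SingleProcessComm` are hypothetical placeholders, not PyCBC Live code; the stand-in communicator only exists so the sketch runs without MPI. With mpi4py installed, `mpi4py.MPI.COMM_WORLD` provides the same `Get_rank()` and `bcast()` methods and could be passed instead.

```python
class SingleProcessComm:
    """Hypothetical stand-in with the two communicator methods used below."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        return obj


def read_strain(detector, n_samples):
    """Placeholder for the real h(t) reader (hypothetical)."""
    return [float(i % 7) for i in range(n_samples)]


def condition(strain):
    """Placeholder conditioning: remove the mean (the real step does more)."""
    mean = sum(strain) / len(strain)
    return [x - mean for x in strain]


def get_conditioned_strain(comm, detectors, n_samples):
    """Read and condition h(t) on rank 0 only, then broadcast to all ranks."""
    data = None
    if comm.Get_rank() == 0:
        data = {det: condition(read_strain(det, n_samples)) for det in detectors}
    # Every rank calls bcast; non-root ranks receive the root's object.
    return comm.bcast(data, root=0)


strain = get_conditioned_strain(SingleProcessComm(), ["H1", "L1"], 16)
print(sorted(strain), len(strain["H1"]))
```

In mpi4py the lowercase `bcast` pickles the object, which may add latency for large arrays; the buffer-based `Bcast` with NumPy arrays is the usual way to avoid that cost. Whether either keeps the latency unchanged is exactly the open question above.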

Who is interested? Bhooshan

✅ Test the early-warning configuration on real data

This has partially been done using O2 replay data and seems to work well, but it has to be looked at more carefully.

Who is interested? Barna, Arthur, Stephanie

Outcome:

  • EW search has been running for many months both on replay and real data.
  • There is certainly room for improvement, but as an initial test, this is done.

✅ iDQ integration

Can iDQ be used to improve the robustness of the search against glitches?

Who is interested? Max

Outcome:

✅ Do not stall the main search when uploading a candidate

In O3, each upload was followed by a few immediate follow-up processes (for example adding plots and comments to GraceDB) which created noticeable spikes in the lag of the analysis. Can these operations be split off to separate threads in a nice way?
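One way to split the follow-ups off is a thread pool, sketched below with the standard library only. This illustrates the idea and is not necessarily what was implemented; `upload_candidate` and `follow_up` are hypothetical placeholders, and the sleep merely simulates slow plotting or commenting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_candidate(candidate):
    """Placeholder for the latency-critical upload (hypothetical)."""
    return f"G{candidate:04d}"


def follow_up(event_id, log):
    """Placeholder for slow follow-ups such as adding plots and comments."""
    time.sleep(0.5)
    log.append(event_id)


def run_search(candidates, executor, log):
    """Upload each candidate, but hand the follow-ups to worker threads."""
    futures = []
    for candidate in candidates:
        event_id = upload_candidate(candidate)
        futures.append(executor.submit(follow_up, event_id, log))
    return futures


log = []
with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.monotonic()
    futures = run_search([1, 2, 3], executor, log)
    loop_time = time.monotonic() - start  # main loop is already free here
# Leaving the with-block waits for all follow-ups to finish.
print(f"main loop took {loop_time:.3f} s, follow-ups done: {sorted(log)}")
```

The main loop returns as soon as the follow-ups are queued, so its lag no longer includes the time spent on plots and comments.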

Who is interested? Xan

Outcome:

Evaluate and/or improve the SNR optimization

What is the effect of the SNR optimization on the skymaps generated by BAYESTAR? Are there ways to improve the speed or accuracy of pycbc_optimize_snr?

Who is interested? Pierre-Alexandre

Outcome:

Investigate alternatives to MPI

MPI has a number of little quirks and annoyances that make it somewhat inconvenient to use on the CIT cluster. Here is a (probably incomplete) list:

  • Intel's MPI implementation appears to impose a barrier at each gather operation. OpenMPI does not.
  • However, OpenMPI does not work at CIT because it does not like computers with multiple IP addresses on the same bonded network interface (see https://github.com/open-mpi/ompi/issues/5818 for discussion on that).
  • There does not seem to be a way to do fault-tolerant gather operations: if a node dies, the whole analysis hangs and has to be manually killed. Not sure if this is just an mpi4py limitation, or a more general MPI issue.
  • The analysis also hangs at startup if one of the nodes is dead, and has to be manually killed.

Is there a different way to organize the multiprocess/multinode operation and communication, possibly using Condor?

Who is interested? Tito

✅ Bring p_astro calculation into PyCBC Live

We also want to improve the 'semi-analytic' approximations for the signal and noise distributions. A draft technical description is being worked on at this Overleaf link.

Who is interested? T. Dent, A. Lundgren, …

Outcome: lots of work by Tom and Veronica, e.g.

Low latency approximate PE

Starting from the highest-SNR (maximum-likelihood) mass and spin point, use a coordinate scheme in which the metric is flat or nearly flat to construct an expected parameter error region. This would provide parameter uncertainties for source classification, EM predictions, etc.
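As a rough illustration of how a flat metric turns into an error region, the sketch below works in two dimensions. It assumes the match is 1 - g_ij dx^i dx^j and that the log-likelihood behaves as SNR^2 / 2 times the match squared, so that the inverse covariance is 2 * SNR^2 * g. Both the normalization convention and the function name `error_region` are assumptions made for this example.

```python
import math


def error_region(metric, snr):
    """Return 1-sigma widths and covariance for a constant 2x2 metric.

    Assumes match = 1 - g_ij dx^i dx^j and ln L ~ snr^2 / 2 * match^2,
    which gives an inverse covariance of 2 * snr^2 * g.
    """
    (a, b), (c, d) = metric
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("metric must be positive definite")
    scale = 1.0 / (2.0 * snr**2 * det)
    # Covariance = inverse(metric) / (2 * snr^2), via the 2x2 adjugate.
    cov = [[d * scale, -b * scale], [-c * scale, a * scale]]
    sigmas = (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
    return sigmas, cov


# Made-up diagonal metric in two flat coordinates, at SNR 10.
sigmas, cov = error_region([[1.0, 0.0], [0.0, 4.0]], snr=10.0)
print(sigmas)
```

The widths shrink as 1/SNR, so louder candidates get proportionally tighter regions in the flat coordinates; mapping the region back to masses and spins is a separate step not shown here.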

Who is interested? Tom, Veronica

Add injection infrastructure

We should be able to run an instance with injections without storing a large set of frame files. There are various ways in which this can be done:

  1. Running two separate instances: one with and one without injections.
  2. Running a separate set of processes with injections and using the correct (injection-free) background on the fly.

Who is interested? Bhooshan
