Task running order in pipeline #181
Here's another one that is a bit more pernicious. The ordering of tasks can do very strange things to the way in which things are run. Here's a non-minimal example that goes crazy. You would think that this config would feed each generated mock catalog into the input of the later beamforming tasks:
tasks:
- type: draco.core.io.LoadProductManager
out: manager
params:
product_directory: /project/rpp-chime/chime/beam_transfers/chime_4cyl_585-800_I_XX-YY_{anbeam}/
- type: draco.core.io.LoadFITSCatalog
out: cat_for_selfunc
params:
catalogs:
- tag: "rcat"
files:
- "/project/rpp-chime/chime/catalogs/eBOSS_{tracer}_clustering_random-{field}-vDR16.fits"
freq_range: [585.0, 800.0]
- type: draco.synthesis.mockcatalog.SelectionFunctionEstimator
in: cat_for_selfunc
out: selfunc
params:
save: false
output_name: selfunc.h5
- type: draco.core.io.LoadBasicCont
out: source_map
params:
files:
- /project/rpp-chime/chime/stacking/sims/lss/{lss}/maps/map_{tracer}_log.h5
- type: draco.synthesis.mockcatalog.ResizeSelectionFunctionMap
in: [selfunc, source_map]
out: resized_selfunc
params:
smooth: true
save: false
output_name: resized_selfunc.h5
- type: draco.synthesis.mockcatalog.PdfGeneratorWithSelectionFunction
in: [source_map, resized_selfunc]
out: pdf_map
params:
tracer: {tracer}
save: false
- type: draco.synthesis.mockcatalog.MockCatalogGenerator
requires: pdf_map
out: mock_cat
params:
nsource: *{tracer}_{field}
tag: "{{count:04d}}"
ncat: 10000
- type: draco.synthesis.mockcatalog.AddEBOSSZErrorsToCatalog
in: mock_cat
out: mock_cat_zerror
params:
save: false
output_name: "mockcatalog_{{tag}}.h5"
- type: draco.core.io.LoadFilesFromParams
out: ringmap
params:
files:
- "/project/rpp-chime/chime/stacking/sims/analysis_ringmaps/{simbeam}/{anbeam}/{ss}_{lss}/dcringmap_{state}filter.h5"
distributed: true
- type: draco.core.io.LoadFilesFromParams
out: mask
params:
files:
- "/project/rpp-chime/chime/stacking/data/mask_rad_03_from_ringmap_intercyl_dayenu_relaxed_deconvolve_psbeam_fullstack.h5"
distributed: true
- type: draco.analysis.flagging.MaskBeamformedOutliers
in: [ringmap, mask]
out: ringmap_masked
- type: draco.analysis.beamform.RingMapBeamForm
requires: [manager, ringmap_masked]
in: mock_cat_zerror
out: formed_beam
params:
save: false
output_name: "formedbeam_filtered_{{tag}}.h5"
- type: draco.analysis.sourcestack.SourceStack
in: formed_beam
out: stack
params:
freqside: 50
If the tasks for loading the ringmap and the mask, and for applying the mask, are moved up towards the top of the config, it works fine:
tasks:
- type: draco.core.io.LoadProductManager
out: manager
params:
product_directory: /project/rpp-chime/chime/beam_transfers/chime_4cyl_585-800_I_XX-YY_psbeam/
- type: draco.core.io.LoadFilesFromParams
out: ringmap
params:
files:
- "/project/rpp-chime/chime/stacking/sims/analysis_ringmaps/psbeam/psbeam/dataweight_compderiv-00-FoG/dcringmap_postfilter.h5"
distributed: true
- type: draco.core.io.LoadFilesFromParams
out: mask
params:
files:
- "/project/rpp-chime/chime/stacking/data/mask_rad_03_from_ringmap_intercyl_dayenu_relaxed_deconvolve_psbeam_fullstack.h5"
distributed: true
- type: draco.analysis.flagging.MaskBeamformedOutliers
in: [ringmap, mask]
out: ringmap_masked
- type: draco.core.io.LoadFITSCatalog
out: cat_for_selfunc
params:
catalogs:
- tag: "rcat"
files:
- "/project/rpp-chime/chime/catalogs/eBOSS_QSO_clustering_random-NGC-vDR16.fits"
freq_range: [585.0, 800.0]
- type: draco.synthesis.mockcatalog.SelectionFunctionEstimator
in: cat_for_selfunc
out: selfunc
params:
save: false
output_name: selfunc.h5
- type: draco.core.io.LoadBasicCont
out: source_map
params:
files:
- /project/rpp-chime/chime/stacking/sims/lss/compderiv-00-FoG/maps/map_QSO_log.h5
- type: draco.synthesis.mockcatalog.ResizeSelectionFunctionMap
in: [selfunc, source_map]
out: resized_selfunc
params:
smooth: true
save: false
output_name: resized_selfunc.h5
- type: draco.synthesis.mockcatalog.PdfGeneratorWithSelectionFunction
in: [source_map, resized_selfunc]
out: pdf_map
params:
tracer: QSO
save: false
- type: draco.synthesis.mockcatalog.MockCatalogGenerator
requires: pdf_map
out: mock_cat
params:
nsource: *QSO_NGC
tag: "{count:04d}"
ncat: 10
- type: draco.synthesis.mockcatalog.AddEBOSSZErrorsToCatalog
in: mock_cat
out: mock_cat_zerror
params:
save: true
output_name: "mockcatalog_{tag}.h5"
- type: draco.analysis.beamform.RingMapBeamForm
requires: [manager, ringmap_masked]
in: mock_cat_zerror
out: formed_beam
params:
save: false
output_name: "formedbeam_filtered_{tag}.h5"
- type: draco.analysis.sourcestack.SourceStack
in: formed_beam
out: stack
params:
freqside: 50
The reason this happens is a little obscure, and comes down to how the pipeline reacts when a task reports that it has no input data with which to do anything (i.e. raises `_PipelineMissingData`; see line 642 in b578e60). When the pipeline gets to MaskBeamformedOutliers, at that point it knows it currently doesn't have any data it can run on, so it raises the `_PipelineMissingData` exception, and that causes it not to advance to the next task (which I think would be the right choice), but to skip straight back to the start of the list.
I think there are a few potential resolutions here:
I think @kiyo-masui is the only person out there who is likely to understand the ins and outs of this one. I think the first one is an easy fix, but I fear it may break things in ways I am not foreseeing.
In many ways it's a miracle that we made it this far (7 years by my count) before my convoluted dependency resolution pipeliner stopped meeting our needs! I've given these a single read, but need to dive deeper before I can comment. Will get back to you.
Awesome. Thanks for looking through it, Kiyo. I'll try and reduce the configs down to something which makes the problem more manifest. I just wanted to dump something in there before I forgot how to reproduce the issue!
I've been trying to think what to do here. I think beyond changing the
Few things: First, just for reference, here is the link to the docs describing the current rules: https://caput.readthedocs.io/en/latest/_autosummary/caput.pipeline.html#execution-order

Next, I want to assure you that the current execution ordering rules were developed for ease and simplicity, rather than optimality. I explicitly don't do any inspection of the input and output queues. The only information the current logic uses is the pipeline order, and the "state" of each task (is it at the setup, next, or finish stage). I think making some sort of priority system would be great... I didn't do it just because it was too complicated for the use cases envisaged at the time. I think my choice to use

In fact, with the current logic it is a requirement that dependencies are generated further up in the task list than they are consumed. In the example you provided where you suggested the workaround would be to move

In any case, if you make a change you should build in a legacy mode for execution order. The conservative thing would be to keep the current order as the default and only change the behaviour if a certain key appears in the
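As a toy illustration of that ordering requirement (the task names here are invented for the example, not real caput or draco tasks), a dependency needs to be produced by a task that sits above its consumer in the list:

tasks:
  # The product `thing` is generated further up the task list...
  - type: mymodule.MakeThing
    out: thing
  # ...than the task that consumes it, which is what the current rules expect.
  - type: mymodule.ConsumeThing
    in: thing

Reversing these two entries would put the consumer first, which is the kind of situation the workaround above avoids by moving the loading and masking tasks towards the top of the config.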
Yeah, definitely agree with this. I'm not in a huge rush to make changes here, but I think we could make some useful improvements to it. Do you think that changing
I think that is as dangerous as any other change you might do, so you might as well build in the legacy mode from the start. (Luckily it's a single if statement in this case... unlike the priority system which might involve substantial refactoring).
That's fair. So I guess the first thing to do is to abstract the current ordering code to allow that change to be done and have an optional legacy mode which retains the
Something we ran into for this bug: radiocosmology/draco#133
On days with no data, the
The pipeline framework should not be starting
data_available is set in `_process_config`, instead of in `_process_data`, bc of this bug: radiocosmology/caput#181 (comment) For `BeamFormCat`, `_process_data` is in `setup()`, and `setup()` is not called if there is no data available. Closes #133
I've found another interesting snippet from the daily config that has some weird behaviour.
# Mask out intercylinder baselines before beam forming to minimise cross
# talk. This creates a copy of the input that shares the vis dataset (but
# with a distinct weight dataset) to save memory
- type: draco.analysis.flagging.MaskBaselines
requires: manager
in: sstream_mask
out: sstream_inter
params:
share: vis
mask_short_ew: 1.0
# Load the source catalogs to measure fluxes of
- type: draco.core.io.LoadBasicCont
out: source_catalog
params:
files:
- "{catalogs[0]}"
- "{catalogs[1]}"
- "{catalogs[2]}"
- "{catalogs[3]}"
# Measure the observed fluxes of the point sources in the catalogs
- type: draco.analysis.beamform.BeamFormCat
requires: [manager, sstream_inter]
in: source_catalog
params:
timetrack: 300.0
save: true
output_name: "sourceflux_{{tag}}.h5"
# Mask out day time data
- type: ch_pipeline.analysis.flagging.DayMask
in: sstream_mask
out: sstream_mask1
- type: ch_pipeline.analysis.flagging.MaskMoon
in: sstream_mask1
out: sstream_mask2
# Remove ranges of time known to be bad that may effect the delay power
# spectrum estimate
- type: ch_pipeline.analysis.flagging.DataFlagger
in: sstream_mask2
out: sstream_mask3
params:
flag_type:
- acjump
- bad_calibration_acquisition_restart
- bad_calibration_fpga_restart
- bad_calibration_gains
- decorrelated_cylinder
- globalflag
- rain1mm
# Load the stack that we will blend into the daily data
- type: draco.core.io.LoadBasicCont
out: sstack
params:
files:
- "{blend_stack_file}"
selections:
freq_range: [{freq[0]:d}, {freq[1]:d}]
- type: draco.analysis.flagging.BlendStack
requires: sstack
in: sstream_mask3
out: sstream_blend1
params:
frac: 1e-4
Through this series of tasks, what we would think/want to happen is for
Instead, the processing order goes:
As a result, we end up having two copies of the sidereal stream (with shared vis dataset), plus a copy of the vis dataset held by
I'll update this once I have a proper config that can be run on its own to replicate this.
I'm just going to dump another note in here so I remember it. A fairly significant inefficiency that I've noticed is that tasks are kept around for one full pipeline iteration after their
I've been finding and thinking about a few issues with the task running order within caput.pipeline, and I figured I would start a discussion about potential fixes or mitigations.

Products kept around longer than needed
First let's think about an example config:
In this config (a sketch is given below) the initial stream is generated, each filter task is run, which adds two entries to the filtered_stream queue, and then finally the MakeMap task consumes them. While all of this is technically correct, it has kept the filtered_stream entries around for longer than was necessary, as each task is run in turn. In theory MakeMap could have run for the first time immediately after FilterA, which would have allowed it to drop the first item in filtered_stream. This may seem like a small issue, but we are often memory constrained, and so running tasks in an order which uses more memory than required is a real problem.

In the example above this could be fixed by the user by changing the task order in the file to be: MakeStream, FilterA, MakeMap, FilterB; but although this would work above, if we added a FilterC there would be no good place to put the MakeMap task.
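For concreteness, a minimal sketch of the kind of config being described (the mymodule class paths are placeholders; only the task ordering and the in/out wiring matter here):

tasks:
  # Generate the initial stream
  - type: mymodule.MakeStream
    out: stream
  # Each filter consumes the stream and appends its result to filtered_stream
  - type: mymodule.FilterA
    in: stream
    out: filtered_stream
  - type: mymodule.FilterB
    in: stream
    out: filtered_stream
  # Consume the filtered_stream entries to produce the map
  - type: mymodule.MakeMap
    in: filtered_stream

With the current ordering rules each task in this list is run in turn, so both filtered_stream entries exist before MakeMap gets its first chance to run.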
I can see a few potential changes that would help here, all of which add a concept of priority to the task running order, meaning prioritised tasks run ahead of other tasks provided their requirements are satisfied (i.e. they become greedy).
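As a purely hypothetical illustration of what that could look like (the priority key below does not exist in caput today, and the task names are the made-up ones from the sketch above), a greedy task might be flagged directly in the config:

tasks:
  - type: mymodule.MakeStream
    out: stream
  - type: mymodule.FilterA
    in: stream
    out: filtered_stream
  - type: mymodule.FilterB
    in: stream
    out: filtered_stream
  - type: mymodule.MakeMap
    in: filtered_stream
    # Hypothetical key: prefer running this task whenever filtered_stream has
    # an item waiting, so entries are consumed (and their memory freed)
    # immediately rather than sitting in the queue for a full iteration.
    priority: high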
Pairing up/produce output more often than you get input
Let's say I want to stack catalog sources on a set of maps. I might have something like this:
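A rough sketch of that kind of setup (the class paths, output names and tags here are illustrative placeholders, not the original config), assuming two catalogs and three maps:

tasks:
  # Load the two catalogs (tagged catalogA and catalogB)
  - type: mymodule.LoadCatalogs
    out: catalog
  # Load the three maps (tagged map1, map2 and map3)
  - type: mymodule.LoadMaps
    out: map
  # Stack a catalog on a map; one item is taken from each input queue per
  # process call, so the inputs are consumed strictly in pairs
  - type: mymodule.Stack
    in: [catalog, map]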
What I want this to do is to stack all pairs of catalogs and maps, but what it will do is to stack (catalogA, map1), (catalogB, map2), and then stop because it never gets anything to put with map3.
In some sense this is fair: there is no way the pipeline could know that is what I want it to do. I could attempt to write a task that sits before Stack and does the pairing for me, but it turns out that it is impossible to even write a task that would allow this pairing up. There's two reasons you can't do this: a task can only take a fixed set of inputs via the requires key and then produce an unlimited amount of output, but that is violated in the above example; and there's no way to make the above work with this restriction anyway, as there are five total inputs but six possible pairs (two catalogs times three maps).

Other issues
I have a few other problems, so I'll try and add them here later on