Merge v0.2.1 development branch (#324)
* Use nearest-neighbour interpolation in regions where extrapolation is required. (#285) (A sketch of this fallback behaviour follows the commit message.)

* Fix version drift.

* Bump to 0.2.0

* Use nearest-neighbour interpolation for points requiring extrapolation.

* Utilise environment variable when dask.address is unset. (#288)

* Fix version drift.

* Bump to 0.2.0

* Inspect envvar for scheduler address when one isn't specified.

* Encode environment variable as ASCII.

* Simplify.

* Add plotting functionality (#290)

* Fix version drift.

* Bump to 0.2.0

* Initial commit of basic plotting functionality.

* Change naming convention.

* Improve transform argument.

* Simplify transform selection.

* Add rudimentary time and frequency selection.

* Checkpoint plotter changes. Can now handle scans and SPWs, but is very slow.

* More work on plotter - can now plot datasets in parallel.

* Some tidying.

* Slightly improve plot speed. Dominant cost is still saving the figures.

* Commit some minor changes which speed up figure saving.

* Lots of tiny fixes.

* Tiny cosmetic changes.

* Add custom tick formatter so that plots are the same size regardless.

* Add matplotlib dependency.

* Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC.

* Rename variable to avoid confusion.

* Fix bug affecting recursive grouping.

* Avoid copies in grouping code.

* Checkpoint work on extending functionality.

* Make plotter more powerful. Add colourization option. Begin simplifying interface.

* Allow user specification of colourmap.

* Add plotsize parameter.

* Fix #293 - out-of-bounds access caused by `output.subtract_directions` (#294)

* Fix version drift.

* Bump to 0.2.0

* Fix #293.

* Named backups (#296)

* Fix version drift.

* Bump to 0.2.0

* Add optional label and single field selection to backup app

* remove item instead of pop@index

* do not .remove() from xds_list

* Simplify using some existing functionality.

---------

Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com>
Co-authored-by: landmanbester <lbester@ska.ac.za>

* Selectively disable MAD flagging criteria (#298)

* Fix version drift.

* Bump to 0.2.0

* Setting a MAD threshold to zero will disable flagging on that statistic. (A sketch of this convention follows the commit message.)

* Disable MAD flagging on off-diagonals by default (#300)

* Fix version drift.

* Bump to 0.2.0

* Disable flagging based on off-diagonal correlations in the MAD flagger by default. This should make the MAD flagger less aggressive on data with unmodelled polarised emission.

* Fix bug affecting non-standard columns in `input_ms.data_column` (#301)

* Fix version drift.

* Bump to 0.2.0

* Fix a bug affecting the use of non-standard columns in data column input.

* Don't allow restore app to overwrite metadata (#307)

* assign to ms to avoid over-writing metadata in restore app

* zip datasets in enumerate

* add comment to document failure case

* use backup_column_name in restore app

* Apply OCD.

---------

Co-authored-by: landmanbester <lbester@ska.ac.za>
Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com>

* Fix for summary reporting SOURCE_ID as FIELD_ID (#309)

* Fix version drift.

* Bump to 0.2.0

* Make summary correctly report FIELD_ID and SOURCE_ID.

* Fix receptor summary (#310)

* Fix version drift.

* Bump to 0.2.0

* Fix incorrect assumption that the FEED subtable will always have 2 receptors.

* Fix similar problem affecting parallactic angle construction.

* Update missing column selection for compatibility with upstream changes.

* Fix xarray dims (#318)

* Fix version drift.

* Bump to 0.2.0

* Move all usage of xds.dims[dim] to xds.sizes[dim] in preparation for the change of return type of xds.dims. (A sketch of this migration follows the commit message.)

* Fixes for changes relating to Numba error types. (#319)

* Move now-deprecated graph metrics function into the scheduler plugin code. (#320)

* Make small changes to enable Python 3.11 compatibility. Requires changes in stimela + a release. (#321)

* Restringify keys in scheduler plugin. (#322)

* Update pyproject.toml. Add poetry.lock. Update docs. (#323)

* Drop 3.8. Commit poetry lock file.

* Update stimela requirement.

* Update docs.

* Set min and max versions in pyproject.toml.

* Remove python3.8 from test matrix.

---------

Co-authored-by: Landman Bester <lbester@sarao.ac.za>
Co-authored-by: landmanbester <lbester@ska.ac.za>
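As referenced in #285 above, values are interpolated inside the sampled region and fall back to the nearest neighbour where extrapolation would otherwise be required. The snippet below is a hedged, one-dimensional illustration of that behaviour using SciPy; it is not QuartiCal's actual interpolation code, and the sample points are made up.

```python
import numpy as np
from scipy.interpolate import interp1d

# Interpolate linearly inside the sampled range; outside it, return the
# nearest (edge) sample rather than extrapolating.
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 15.0])

f = interp1d(x, y, kind="linear", bounds_error=False, fill_value=(y[0], y[-1]))
print(f([0.5, 2.5, 3.7]))  # [10.  17.5 15. ]
```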
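The MAD-flagging changes above (#298, #300) introduce the convention that a threshold of zero disables flagging for that statistic. The sketch below illustrates the convention in generic terms; the function, its signature and the 1.4826 scaling factor are illustrative assumptions, not QuartiCal's implementation or option names.

```python
import numpy as np

def mad_flags(residuals, threshold):
    # A threshold of zero (or less) disables flagging for this statistic.
    if threshold <= 0:
        return np.zeros(residuals.shape, dtype=bool)
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    # Flag points deviating by more than `threshold` scaled MADs from the median.
    return np.abs(residuals - med) > threshold * 1.4826 * mad
```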
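The xarray change noted above (#318) is a mechanical migration from indexing xds.dims to indexing xds.sizes, ahead of an upstream change to the return type of Dataset.dims. A generic xarray example rather than QuartiCal's actual call sites:

```python
import numpy as np
import xarray as xr

xds = xr.Dataset({"DATA": (("row", "chan", "corr"), np.zeros((10, 64, 4)))})

n_chan = xds.sizes["chan"]   # preferred: sizes is always a dim-name -> length mapping
# n_chan = xds.dims["chan"]  # previous pattern, deprecated as the return type changes
print(n_chan)  # 64
```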
3 people authored Jan 30, 2024
1 parent fe1a23d commit be28e2f
Showing 51 changed files with 3,735 additions and 176 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -18,7 +18,7 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-20.04, ubuntu-22.04]
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.9", "3.10"]

     steps:
       - name: Set up Python ${{ matrix.python-version }}
38 changes: 36 additions & 2 deletions docs/source/installation.rst
@@ -3,8 +3,11 @@ Installation

 This page details QuartiCal's recommended installation procedure.

-Ubuntu 18.04+
-~~~~~~~~~~~~~
+Ubuntu 18.04+ via pip
+~~~~~~~~~~~~~~~~~~~~~
+
+This is the preferred method of installation. It is simple but may be
+vulnerable to upstream changes.

 If you wish to install QuartiCal in a virtual environment (recommended), see
 `Using a virtual environment`_.
@@ -25,6 +28,37 @@ QuartiCal can be installed by running the following:

     pip3 install -e path/to/repo/

+Ubuntu 18.04+ via poetry
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Installing via poetry is less simple but should always work.
+
+Firstly, install `poetry <https://python-poetry.org/docs/>`_
+
+Assuming you have cloned the repository from git and checked out the relevant
+tag, run the following from inside the QuartiCal folder:
+
+.. code:: bash
+
+    poetry install
+
+.. note::
+
+    This will automatically install QuartiCal into a new virtual environment
+    matching your system Python. The Python version can be changed prior to
+    installation using:
+
+    .. code:: bash
+
+        poetry env use python3.10
+
+Users can enter the QuartiCal virtual environment using:
+
+.. code:: bash
+
+    poetry -C path/to/repo shell
+
 Using a virtual environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

2,887 changes: 2,887 additions & 0 deletions poetry.lock

Large diffs are not rendered by default.

34 changes: 17 additions & 17 deletions pyproject.toml
@@ -13,7 +13,6 @@ classifiers = [
     "License :: OSI Approved :: MIT License",
     "Operating System :: POSIX :: Linux",
     "Programming Language :: Python :: 3",
-    "Programming Language :: Python :: 3.8",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
     "Topic :: Scientific/Engineering :: Astronomy"
@@ -25,29 +24,30 @@ include = [
 ]

 [tool.poetry.dependencies]
-python = "^3.8"
-tbump = "^6.10.0"
-columnar = "^1.4.1"
-"ruamel.yaml" = "^0.17.26"
-dask = {extras = ["diagnostics"], version = "^2023.1.0"}
-distributed = "^2023.1.0"
-dask-ms = {extras = ["s3", "xarray", "zarr"], version = "^0.2.16"}
-codex-africanus = {extras = ["dask", "scipy", "astropy", "python-casacore"], version = "^0.3.4"}
-astro-tigger-lsm = "^1.7.2"
-loguru = "^0.7.0"
-requests = "^2.31.0"
-pytest = "^7.3.1"
-omegaconf = "^2.3.0"
-colorama = "^0.4.6"
-stimela = "2.0rc4"
+python = "^3.9"
+astro-tigger-lsm = ">=1.7.2, <=1.7.3"
+codex-africanus = {extras = ["dask", "scipy", "astropy", "python-casacore"], version = ">=0.3.4, <=0.3.4"}
+colorama = ">=0.4.6, <=0.4.6"
+columnar = ">=1.4.1, <=1.4.1"
+dask = {extras = ["diagnostics"], version = ">=2023.5.0, <=2023.12.1"}
+dask-ms = {extras = ["s3", "xarray", "zarr"], version = ">=0.2.16, <=0.2.18"}
+distributed = ">=2023.5.0, <=2023.12.1"
+loguru = ">=0.7.0, <=0.7.2"
+matplotlib = ">=3.5.1, <=3.8.2"
+omegaconf = ">=2.3.0, <=2.3.0"
+pytest = ">=7.3.1, <=7.4.4"
+requests = ">=2.31.0, <=2.31.0"
+"ruamel.yaml" = ">=0.17.26, <=0.17.40"
+stimela = "2.0rc8"
+tbump = ">=6.10.0, <=6.11.0"

 [tool.poetry.scripts]
 goquartical = 'quartical.executor:execute'
 goquartical-config = 'quartical.config.parser:create_user_config'
 goquartical-backup = 'quartical.apps.backup:backup'
 goquartical-restore = 'quartical.apps.backup:restore'
 goquartical-summary = 'quartical.apps.summary:summary'
-
+goquartical-plot = 'quartical.apps.plotter:plot'

 [build-system]
 requires = ["poetry-core"]
54 changes: 47 additions & 7 deletions quartical/apps/backup.py
@@ -1,5 +1,6 @@
 import argparse
 from math import prod, ceil
+from quartical.data_handling.selection import filter_xds_list
 from daskms import xds_from_storage_ms, xds_to_storage_table
 from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr
 from daskms.fsspec_store import DaskMSStore
@@ -10,8 +11,9 @@
 def backup():
     parser = argparse.ArgumentParser(
         description='Backup any Measurement Set column to zarr. Backups will '
-                    'be labelled automatically using the current datetime, '
-                    'the Measurement Set name and the column name.'
+                    'be labelled using a combination of the passed in label '
+                    '(defaults to datetime), the Measurement Set name and '
+                    'the column name.'
     )

     parser.add_argument(
@@ -33,19 +35,34 @@ def backup():
         type=str,
         help='Name of column to be backed up.'
     )
+    parser.add_argument(
+        '--label',
+        type=str,
+        help='An explicit label to include in the backup name. Defaults to '
+             'datetime at which the backup was created. Full name will be '
+             'given by [label]-[msname]-[column].bkp.qc.'
+    )
     parser.add_argument(
         '--nthread',
         type=int,
         default=1,
         help='Number of threads to use.'
     )
+    parser.add_argument(
+        '--field-id',
+        type=int,
+        help='Field ID to back up.'
+    )

     args = parser.parse_args()

     ms_name = args.ms_path.full_path.rsplit("/", 1)[1]
     column_name = args.column_name

-    timestamp = time.strftime("%Y%m%d-%H%M%S")
+    if args.label:
+        label = args.label
+    else:
+        label = time.strftime("%Y%m%d-%H%M%S")

     # This call exists purely to get the relevant shape and dtype info.
     data_xds_list = xds_from_storage_ms(
@@ -55,8 +72,11 @@ def backup():
         group_cols=("FIELD_ID", "DATA_DESC_ID", "SCAN_NUMBER"),
     )

+    # Use existing functionality. TODO: Improve and expose DDID selection.
+    xdso = filter_xds_list(data_xds_list, args.field_id)
+
     # Compute appropriate chunks (256MB by default) to keep zarr happy.
-    chunks = [chunk_by_size(xds[column_name]) for xds in data_xds_list]
+    chunks = [chunk_by_size(xds[column_name]) for xds in xdso]

     # Repeat of above call but now with correct chunking information.
     data_xds_list = xds_from_storage_ms(
@@ -67,9 +87,12 @@ def backup():
         chunks=chunks
     )

+    # Use existing functionality. TODO: Improve and expose DDID selection.
+    xdso = filter_xds_list(data_xds_list, args.field_id)
+
     bkp_xds_list = xds_to_zarr(
-        data_xds_list,
-        f"{args.zarr_dir.url}::{timestamp}-{ms_name}-{column_name}.bkp.qc",
+        xdso,
+        f"{args.zarr_dir.url}::{label}-{ms_name}-{column_name}.bkp.qc",
     )

     dask.compute(bkp_xds_list, num_workers=args.nthread)
@@ -112,9 +135,26 @@ def restore():
     zarr_root, zarr_name = args.zarr_path.url.rsplit("/", 1)

     zarr_xds_list = xds_from_zarr(f"{zarr_root}::{zarr_name}")
+    backup_column_name = list(zarr_xds_list[0].data_vars.keys()).pop()
+
+    # This will fail if the column does not exist but if we allow all columns
+    # we need to select out the relevant dims for rechunking below
+    ms_xds_list = xds_from_storage_ms(
+        args.ms_path,
+        columns=(args.column_name,),
+        index_cols=("TIME",),
+        group_cols=("FIELD_ID", "DATA_DESC_ID", "SCAN_NUMBER"),
+    )
+
+    for i, (ds, dsr) in enumerate(zip(ms_xds_list, zarr_xds_list)):
+        dsr = dsr.chunk(ds.chunks)
+        data_array = getattr(dsr, backup_column_name)
+        ms_xds_list[i] = ds.assign(
+            {args.column_name: (data_array.dims, data_array.data)}
+        )

     restored_xds_list = xds_to_storage_table(
-        zarr_xds_list,
+        ms_xds_list,
         args.ms_path,
         columns=(args.column_name,),
         rechunk=True
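In the backup() changes above, the dataset list is routed through filter_xds_list so that only the requested field is backed up. As a rough guide, and assuming (as in the diff) that the datasets were grouped on FIELD_ID, per-field selection over a list of dask-ms datasets could look like the sketch below; QuartiCal's real helper lives in quartical.data_handling.selection and may behave differently.

```python
def select_field(xds_list, field_id):
    """Keep only datasets whose FIELD_ID attribute matches field_id."""
    if field_id is None:  # no selection requested, keep everything
        return xds_list
    return [xds for xds in xds_list if xds.attrs.get("FIELD_ID") == field_id]
```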
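The backup() hunks above also compute "appropriate chunks (256MB by default) to keep zarr happy", i.e. they size row chunks so that each zarr chunk lands near a byte target. Below is a hedged sketch of that calculation; the function name, signature and default are assumptions, and the real chunk_by_size helper may differ.

```python
from math import prod

def rows_per_chunk(shape, itemsize, target_bytes=256 * 2**20):
    # Return a row-chunk size such that one chunk occupies roughly target_bytes.
    nrow, *rest = shape
    bytes_per_row = prod(rest) * itemsize if rest else itemsize
    return max(1, min(nrow, target_bytes // bytes_per_row))

# e.g. a (100000 row, 4096 chan, 4 corr) complex64 column (8 bytes per value):
print(rows_per_chunk((100_000, 4096, 4), 8))  # -> 2048 rows per chunk
```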
