Merge v0.2.1 development branch (#324)
* Use nearest-neighbour interpolation in regions where extrapolation is required. (#285) (A sketch of this fallback behaviour follows the commit message.)

* Fix version drift.

* Bump to 0.2.0

* Use nearest-neighbour interpolation for points requiring extrapolation.

* Utilise environment variable when dask.address is unset. (#288)

* Fix version drift.

* Bump to 0.2.0

* Inspect envvar for scheduler address when one isn't specified.

* Encode environment variable as ASCII.

* Simplify.

* Add plotting functionality (#290)

* Fix version drift.

* Bump to 0.2.0

* Initial commit of basic plotting functionality.

* Change naming convention.

* Improve transform argument.

* Simplify transform selection.

* Add rudimentary time and frequency selection.

* Checkpoint plotter changes. Can now handle scans and SPWs, but is very slow.

* More work on plotter - can now plot datasets in parallel.

* Some tidying.

* Slightly improve plot speed. Dominant cost is still saving the figures.

* Commit some minor changes which speed up figure saving.

* Lots of tiny fixes.

* Tiny cosmetic changes.

* Add custom tick formatter so that plots are the same size regardless.

* Add matplotlib dependency.

* Rework construction of plotting dictionary. Add a few utility functions which will likely be useful in other places in QC.

* Rename variable to avoid confusion.

* Fix bug affecting recursive grouping.

* Avoid copies in grouping code.

* Checkpoint work on extending functionality.

* Make plotter more powerful. Add colourization option. Begin simplifying interface.

* Allow user specification of colourmap.

* Add plotsize parameter.

* Fix #293 - out-of-bounds access caused by `output.subtract_directions` (#294)

* Fix version drift.

* Bump to 0.2.0

* Fix #293.

* Named backups (#296)

* Fix version drift.

* Bump to 0.2.0

* Add optional label and single field selection to backup app

* remove item instead of pop@index

* do not .remove() from xds_list

* Simplify using some existing functionality.

---------

Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com>
Co-authored-by: landmanbester <lbester@ska.ac.za>

* Selectively disable MAD flagging criteria (#298)

* Fix version drift.

* Bump to 0.2.0

* Setting a MAD threshold to zero will disable flagging on that statistic. (A sketch of this convention follows the commit message.)

* Disable MAD flagging on off-diagonals by default (#300)

* Fix version drift.

* Bump to 0.2.0

* Disable flagging based on off-diagonal correlations in the MAD flagger by default. This should make the MAD flagger less aggressive on data with unmodelled polarised emission.

* Fix bug affecting non-standard columns in `input_ms.data_column` (#301)

* Fix version drift.

* Bump to 0.2.0

* Fix a bug affecting the use of non-standard columns in data column input.

* Don't allow restore app to overwrite metadata (#307)

* assign to ms to avoid over-writing metadata in restore app

* zip datasets in enumerate

* add comment to document failure case

* use backup_column_name in restore app

* Apply OCD.

---------

Co-authored-by: landmanbester <lbester@ska.ac.za>
Co-authored-by: JSKenyon <jonathan.simon.kenyon@gmail.com>

* Fix for summary reporting SOURCE_ID as FIELD_ID (#309)

* Fix version drift.

* Bump to 0.2.0

* Make summary correctly report FIELD_ID and SOURCE_ID.

* Fix receptor summary (#310)

* Fix version drift.

* Bump to 0.2.0

* Fix incorrect assumption that the FEED subtable will always have 2 receptors.

* Fix similar problem affecting parallactic angle construction.

* Update missing column selection for compatibility with upstream changes.

* Fix xarray dims (#318)

* Fix version drift.

* Bump to 0.2.0

* Move all usage of xds.dims[dim] to xds.sizes[dim] in preparation for the change of return type of xds.dims. (A sketch of this migration follows the commit message.)

* Fixes for changes relating to Numba error types. (#319)

* Move now-deprecated graph metrics function into the scheduler plugin code. (#320)

* Make small changes to enable Python 3.11 compatibility. Requires changes in stimela + a release. (#321)

* Restringify keys in scheduler plugin. (#322)

* Update pyproject.toml. Add poetry.lock. Update docs. (#323)

* Drop 3.8. Commit poetry lock file.

* Update stimela requirement.

* Update docs.

* Set min and max versions in pyproject.toml.

* Remove python3.8 from test matrix.

---------

Co-authored-by: Landman Bester <lbester@sarao.ac.za>
Co-authored-by: landmanbester <lbester@ska.ac.za>
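As referenced in #285 above, values are interpolated inside the sampled region and fall back to the nearest neighbour where extrapolation would otherwise be required. The snippet below is a hedged, one-dimensional illustration of that behaviour using SciPy; it is not QuartiCal's actual interpolation code, and the sample points are made up.

```python
import numpy as np
from scipy.interpolate import interp1d

# Interpolate linearly inside the sampled range; outside it, return the
# nearest (edge) sample rather than extrapolating.
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 15.0])

f = interp1d(x, y, kind="linear", bounds_error=False, fill_value=(y[0], y[-1]))
print(f([0.5, 2.5, 3.7]))  # [10.  17.5 15. ]
```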
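The MAD-flagging changes above (#298, #300) introduce the convention that a threshold of zero disables flagging for that statistic. The sketch below illustrates the convention in generic terms; the function, its signature and the 1.4826 scaling factor are illustrative assumptions, not QuartiCal's implementation or option names.

```python
import numpy as np

def mad_flags(residuals, threshold):
    # A threshold of zero (or less) disables flagging for this statistic.
    if threshold <= 0:
        return np.zeros(residuals.shape, dtype=bool)
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    # Flag points deviating by more than `threshold` scaled MADs from the median.
    return np.abs(residuals - med) > threshold * 1.4826 * mad
```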
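The xarray change noted above (#318) is a mechanical migration from indexing xds.dims to indexing xds.sizes, ahead of an upstream change to the return type of Dataset.dims. A generic xarray example rather than QuartiCal's actual call sites:

```python
import numpy as np
import xarray as xr

xds = xr.Dataset({"DATA": (("row", "chan", "corr"), np.zeros((10, 64, 4)))})

n_chan = xds.sizes["chan"]   # preferred: sizes is always a dim-name -> length mapping
# n_chan = xds.dims["chan"]  # previous pattern, deprecated as the return type changes
print(n_chan)  # 64
```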
3 people authored Jan 30, 2024
1 parent fe1a23d commit be28e2f
Showing 51 changed files with 3,735 additions and 176 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -18,7 +18,7 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-20.04, ubuntu-22.04]
-        python-version: ["3.8", "3.9", "3.10"]
+        python-version: ["3.9", "3.10"]

     steps:
       - name: Set up Python ${{ matrix.python-version }}
38 changes: 36 additions & 2 deletions docs/source/installation.rst
@@ -3,8 +3,11 @@ Installation

 This page details QuartiCal's recommended installation procedure.

-Ubuntu 18.04+
-~~~~~~~~~~~~~
+Ubuntu 18.04+ via pip
+~~~~~~~~~~~~~~~~~~~~~
+
+This is the preferred method of installation. It is simple but may be
+vulnerable to upstream changes.

 If you wish to install QuartiCal in a virtual environment (recommended), see
 `Using a virtual environment`_.
@@ -25,6 +28,37 @@ QuartiCal can be installed by running the following:

     pip3 install -e path/to/repo/

+Ubuntu 18.04+ via poetry
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Installing via poetry is less simple but should always work.
+
+Firstly, install `poetry <https://python-poetry.org/docs/>`_
+
+Assuming you have cloned the repository from git and checked out the relevant
+tag, run the following from inside the QuartiCal folder:
+
+.. code:: bash
+
+    poetry install
+
+.. note::
+
+    This will automatically install QuartiCal into a new virtual environment
+    matching your system Python. The Python version can be changed prior to
+    installation using:
+
+    .. code:: bash
+
+        poetry env use python3.10
+
+Users can enter the QuartiCal virtual environment using:
+
+.. code:: bash
+
+    poetry -C path/to/repo shell
+
 Using a virtual environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

2,887 changes: 2,887 additions & 0 deletions poetry.lock

Large diffs are not rendered by default.

34 changes: 17 additions & 17 deletions pyproject.toml
@@ -13,7 +13,6 @@ classifiers = [
     "License :: OSI Approved :: MIT License",
     "Operating System :: POSIX :: Linux",
     "Programming Language :: Python :: 3",
-    "Programming Language :: Python :: 3.8",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
     "Topic :: Scientific/Engineering :: Astronomy"
@@ -25,29 +24,30 @@ include = [
 ]

 [tool.poetry.dependencies]
-python = "^3.8"
-tbump = "^6.10.0"
-columnar = "^1.4.1"
-"ruamel.yaml" = "^0.17.26"
-dask = {extras = ["diagnostics"], version = "^2023.1.0"}
-distributed = "^2023.1.0"
-dask-ms = {extras = ["s3", "xarray", "zarr"], version = "^0.2.16"}
-codex-africanus = {extras = ["dask", "scipy", "astropy", "python-casacore"], version = "^0.3.4"}
-astro-tigger-lsm = "^1.7.2"
-loguru = "^0.7.0"
-requests = "^2.31.0"
-pytest = "^7.3.1"
-omegaconf = "^2.3.0"
-colorama = "^0.4.6"
-stimela = "2.0rc4"
+python = "^3.9"
+astro-tigger-lsm = ">=1.7.2, <=1.7.3"
+codex-africanus = {extras = ["dask", "scipy", "astropy", "python-casacore"], version = ">=0.3.4, <=0.3.4"}
+colorama = ">=0.4.6, <=0.4.6"
+columnar = ">=1.4.1, <=1.4.1"
+dask = {extras = ["diagnostics"], version = ">=2023.5.0, <=2023.12.1"}
+dask-ms = {extras = ["s3", "xarray", "zarr"], version = ">=0.2.16, <=0.2.18"}
+distributed = ">=2023.5.0, <=2023.12.1"
+loguru = ">=0.7.0, <=0.7.2"
+matplotlib = ">=3.5.1, <=3.8.2"
+omegaconf = ">=2.3.0, <=2.3.0"
+pytest = ">=7.3.1, <=7.4.4"
+requests = ">=2.31.0, <=2.31.0"
+"ruamel.yaml" = ">=0.17.26, <=0.17.40"
+stimela = "2.0rc8"
+tbump = ">=6.10.0, <=6.11.0"

 [tool.poetry.scripts]
 goquartical = 'quartical.executor:execute'
 goquartical-config = 'quartical.config.parser:create_user_config'
 goquartical-backup = 'quartical.apps.backup:backup'
 goquartical-restore = 'quartical.apps.backup:restore'
 goquartical-summary = 'quartical.apps.summary:summary'
-
+goquartical-plot = 'quartical.apps.plotter:plot'

 [build-system]
 requires = ["poetry-core"]
54 changes: 47 additions & 7 deletions quartical/apps/backup.py
@@ -1,5 +1,6 @@
 import argparse
 from math import prod, ceil
+from quartical.data_handling.selection import filter_xds_list
 from daskms import xds_from_storage_ms, xds_to_storage_table
 from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr
 from daskms.fsspec_store import DaskMSStore
@@ -10,8 +11,9 @@
 def backup():
     parser = argparse.ArgumentParser(
         description='Backup any Measurement Set column to zarr. Backups will '
-                    'be labelled automatically using the current datetime, '
-                    'the Measurement Set name and the column name.'
+                    'be labelled using a combination of the passed in label '
+                    '(defaults to datetime), the Measurement Set name and '
+                    'the column name.'
     )

     parser.add_argument(
@@ -33,19 +35,34 @@ def backup():
         type=str,
         help='Name of column to be backed up.'
     )
+    parser.add_argument(
+        '--label',
+        type=str,
+        help='An explicit label to include in the backup name. Defaults to '
+             'datetime at which the backup was created. Full name will be '
+             'given by [label]-[msname]-[column].bkp.qc.'
+    )
     parser.add_argument(
         '--nthread',
         type=int,
         default=1,
         help='Number of threads to use.'
     )
+    parser.add_argument(
+        '--field-id',
+        type=int,
+        help='Field ID to back up.'
+    )

     args = parser.parse_args()

     ms_name = args.ms_path.full_path.rsplit("/", 1)[1]
     column_name = args.column_name

-    timestamp = time.strftime("%Y%m%d-%H%M%S")
+    if args.label:
+        label = args.label
+    else:
+        label = time.strftime("%Y%m%d-%H%M%S")

     # This call exists purely to get the relevant shape and dtype info.
     data_xds_list = xds_from_storage_ms(
@@ -55,8 +72,11 @@ def backup():
         group_cols=("FIELD_ID", "DATA_DESC_ID", "SCAN_NUMBER"),
     )

+    # Use existing functionality. TODO: Improve and expose DDID selection.
+    xdso = filter_xds_list(data_xds_list, args.field_id)
+
     # Compute appropriate chunks (256MB by default) to keep zarr happy.
-    chunks = [chunk_by_size(xds[column_name]) for xds in data_xds_list]
+    chunks = [chunk_by_size(xds[column_name]) for xds in xdso]

     # Repeat of above call but now with correct chunking information.
     data_xds_list = xds_from_storage_ms(
@@ -67,9 +87,12 @@ def backup():
         chunks=chunks
     )

+    # Use existing functionality. TODO: Improve and expose DDID selection.
+    xdso = filter_xds_list(data_xds_list, args.field_id)
+
     bkp_xds_list = xds_to_zarr(
-        data_xds_list,
-        f"{args.zarr_dir.url}::{timestamp}-{ms_name}-{column_name}.bkp.qc",
+        xdso,
+        f"{args.zarr_dir.url}::{label}-{ms_name}-{column_name}.bkp.qc",
     )

     dask.compute(bkp_xds_list, num_workers=args.nthread)
@@ -112,9 +135,26 @@ def restore():
     zarr_root, zarr_name = args.zarr_path.url.rsplit("/", 1)

     zarr_xds_list = xds_from_zarr(f"{zarr_root}::{zarr_name}")
+    backup_column_name = list(zarr_xds_list[0].data_vars.keys()).pop()
+
+    # This will fail if the column does not exist but if we allow all columns
+    # we need to select out the relevant dims for rechunking below
+    ms_xds_list = xds_from_storage_ms(
+        args.ms_path,
+        columns=(args.column_name,),
+        index_cols=("TIME",),
+        group_cols=("FIELD_ID", "DATA_DESC_ID", "SCAN_NUMBER"),
+    )
+
+    for i, (ds, dsr) in enumerate(zip(ms_xds_list, zarr_xds_list)):
+        dsr = dsr.chunk(ds.chunks)
+        data_array = getattr(dsr, backup_column_name)
+        ms_xds_list[i] = ds.assign(
+            {args.column_name: (data_array.dims, data_array.data)}
+        )

     restored_xds_list = xds_to_storage_table(
-        zarr_xds_list,
+        ms_xds_list,
         args.ms_path,
         columns=(args.column_name,),
         rechunk=True
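In the backup() changes above, the dataset list is routed through filter_xds_list so that only the requested field is backed up. As a rough guide, and assuming (as in the diff) that the datasets were grouped on FIELD_ID, per-field selection over a list of dask-ms datasets could look like the sketch below; QuartiCal's real helper lives in quartical.data_handling.selection and may behave differently.

```python
def select_field(xds_list, field_id):
    """Keep only datasets whose FIELD_ID attribute matches field_id."""
    if field_id is None:  # no selection requested, keep everything
        return xds_list
    return [xds for xds in xds_list if xds.attrs.get("FIELD_ID") == field_id]
```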
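The backup() hunks above also compute "appropriate chunks (256MB by default) to keep zarr happy", i.e. they size row chunks so that each zarr chunk lands near a byte target. Below is a hedged sketch of that calculation; the function name, signature and default are assumptions, and the real chunk_by_size helper may differ.

```python
from math import prod

def rows_per_chunk(shape, itemsize, target_bytes=256 * 2**20):
    # Return a row-chunk size such that one chunk occupies roughly target_bytes.
    nrow, *rest = shape
    bytes_per_row = prod(rest) * itemsize if rest else itemsize
    return max(1, min(nrow, target_bytes // bytes_per_row))

# e.g. a (100000 row, 4096 chan, 4 corr) complex64 column (8 bytes per value):
print(rows_per_chunk((100_000, 4096, 4), 8))  # -> 2048 rows per chunk
```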
