Pulled in currently deployed functionality #1

Merged
merged 13 commits into from
Sep 16, 2024
4 changes: 3 additions & 1 deletion .github/workflows/pipeline.yml
@@ -10,12 +10,14 @@ permissions:
 jobs:
   test-python:
     uses: NERC-CEH/dri-cicd/.github/workflows/test-python.yml@main
+    with:
+      optional_dependencies: "[lint,test,all]"

   build-test-deploy-docker:
     needs: [test-python]
     uses: NERC-CEH/dri-cicd/.github/workflows/build-test-deploy-docker.yml@main
     with:
-      package_name: driio
+      package_name: driutils
     secrets:
       AWS_REGION: ${{ secrets.AWS_REGION }}
       AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
2 changes: 1 addition & 1 deletion Dockerfile
@@ -22,4 +22,4 @@ COPY --chown=python:python tests/ /app/tests
 USER python
 ENV PATH="/app/.venv/bin:$PATH"
 ENV VIRTUAL_ENV="/app/.venv"
-CMD ["python", "-m", "driio"]
+CMD ["python", "-m", "driutils"]
167 changes: 106 additions & 61 deletions README.md
@@ -1,22 +1,8 @@
-# Python Project Template
+# DRI IO

-[![tests badge](https://github.com/NERC-CEH/python-template/actions/workflows/pipeline.yml/badge.svg)](https://github.com/NERC-CEH/python-template/actions)
-[![docs badge](https://github.com/NERC-CEH/python-template/actions/workflows/deploy-docs.yml/badge.svg)](https://nerc-ceh.github.io/python-template/)
+[![tests badge](https://github.com/NERC-CEH/dri-utils/actions/workflows/pipeline.yml/badge.svg)](https://github.com/NERC-CEH/dri-utils/actions)

-[Read the docs!](https://nerc-ceh.github.io/python-template)
-
-This repository is a template for a basic Python project. Included here is:
-
-* Example Python package
-* Tests
-* Documentation
-* Automatic incremental versioning
-* CI/CD
-  * Installs and tests the package
-  * Builds documentation on branches
-  * Deploys documentation on main branch
-  * Deploys docker image to AWS ECR
-* Githook to ensure linting and code checking
+This is a Python package that holds commonly implemented input/output actions, typically reading and writing files.

 ## Getting Started

@@ -64,82 +50,141 @@ The docs, tests, and linter packages can be installed together with:
 pip install -e .[dev]
 ```

-### Making it Your Own
-
-This repo has a single package in the `./src/...` path called `driio` (creative I know). Change this to the name of your package and update it in:
-
-* `docs/conf.py`
-* `src/**/*.py`
-* `tests/**/*.py`
-* `pyproject.toml`
-
-To make thing move a bit faster, use the script `./rename-package.sh` to rename all references of `driio` to whatever you like. For example:
-
-```
-./rename-package.sh "acoolnewname"
-```
-
-Will rename the package and all references to "acoolnewname"
+#### Other Optional Packages
+
+Some utilities need additional packages that aren't relevant to all projects. To install everything, run:
+
+```
+pip install -e .[all]
+```
+
+or, to include only the datetime utilities:
+
+```
+pip install -e .[datetime]
+```
+
-After doing this it is recommended to also run:
-
-```
-cd docs
-make apidoc
-```
-
-To keep your documentation in sync with the package name. You may need to delete a file called `driio.rst` from `./docs/sources/...`
+#### A Note on Remote Installs
+
+If you are including this package in another project, install it from its Git URL. For manual installs:
+
+```
+pip install "dri-utils[all] @ git+https://github.com/NERC-CEH/dri-utils.git"
+```
+
+or, if you are adding it to your project's dependencies:
+
+```
+dependencies = [
+    "another-package",
+    ...
+    "dri-utils[all] @ git+https://github.com/NERC-CEH/dri-utils.git"
+]
+```
+
+## Readers
+
-### Deploying Docs to GitHub Pages
-
-If you want docs to be published to github pages automatically, go to your repo settings and enable docs from GitHub Actions and the workflows will do the rest.
-
-### Building Docs Locally
-
-The documentation is driven by [Sphinx](https://www.sphinx-doc.org/) an industry standard for documentation with a healthy userbase and lots of add-ons. It uses `sphinx-apidoc` to generate API documentation for the codebase from Python docstrings.
-
-To run `sphinx-apidoc` run:
-
-```
-# Install your package with optional dependencies for docs
-pip install -e .[docs]
-
-cd docs
-make apidoc
-```
-
-This will populate `./docs/sources/...` with `*.rst` files for each Python module, which may be included into the documentation.
+### DuckDB Reader
+
+The DuckDB classes use the `duckdb` Python interface to read files from local storage or from S3 object storage, including S3-compatible services at a custom endpoint.
+
+To read a local file:
+
+```python
+from driutils.read import DuckDBFileReader
+
+reader = DuckDBFileReader()
+query = "SELECT * FROM READ_PARQUET('myfile.parquet');"
+result = reader.read(query)
+
+# The result is a DuckDBPyConnection; convert it to your preferred format,
+# for example a Polars DataFrame:
+df = result.pl()
+
+# or, instead, a pandas DataFrame:
+# df = result.df()
+
+# Close the connection
+reader.close()
+```
+
+Alternatively, use a context manager to close the connection automatically:
+
+```python
+...
+
+with DuckDBFileReader() as reader:
+    df = reader.read(query, params).df()
+```
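The context-manager usage and the `read(query, params)` call described in the README can be illustrated with a small, self-contained sketch. It is not the `driutils` implementation: the class name `ClosingReader` is hypothetical, and the standard-library `sqlite3` module stands in for DuckDB so the example runs without extra packages (both engines accept `?` placeholders for query parameters).

```python
import sqlite3
from types import TracebackType
from typing import Any, Optional, Sequence


class ClosingReader:
    """Hypothetical reader that owns a connection and closes it on exit.

    sqlite3 stands in for DuckDB here; both accept '?' placeholders.
    """

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")

    def read(self, query: str, params: Optional[Sequence[Any]] = None) -> sqlite3.Cursor:
        # Forward the query and its parameters to the database engine
        return self._conn.execute(query, params or ())

    def close(self) -> None:
        self._conn.close()

    def __enter__(self) -> "ClosingReader":
        return self

    def __exit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Release the connection even if the with-block raised
        self.close()


with ClosingReader() as reader:
    rows = reader.read("SELECT ? + ?", (2, 3)).fetchall()

print(rows)  # [(5,)]
```

Because `__exit__` runs whether or not the `with` body raises, the context-manager form is the safer way to guarantee the connection is released.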
-Documentation can then be built locally by running `make html`, or found on the [GitHub Deployment](https://nerc-ceh.github.io/python-template).
-
-### Run the Tests
-
-To run the tests run:
-
-```
-#Install package with optional dependencies for testing
-pip install -e .[test]
-
-pytest
-```
+To read from an S3 storage location, more configuration is available, and three use cases are supported:
+
+* Automatic credential loading from the current environment variables
+* Automatic credential loading from an assumed role
+* Authentication to a custom S3 endpoint, e.g. LocalStack. This currently assumes that no credentials are needed.
+
+The reader is instantiated like this:
+
+```python
+from driutils.read import DuckDBS3Reader
+
+# Automatic authentication from your environment
+auto_auth_reader = DuckDBS3Reader("auto")
+
+# Automatic authentication from your assumed role
+sts_auth_reader = DuckDBS3Reader("sts")
+
+# Custom URL for LocalStack
+endpoint = "http://localhost:<port>"
+custom_url_reader = DuckDBS3Reader(
+    "custom_endpoint",
+    endpoint_url=endpoint,
+    use_ssl=False
+)
+
+# Custom URL using the HTTPS protocol
+endpoint = "https://a-real.s3.endpoint"
+custom_url_reader = DuckDBS3Reader(
+    "custom_endpoint",
+    endpoint_url=endpoint,
+    use_ssl=True
+)
+```
+
+In the background, `reader.read()` forwards a DuckDB SQL query to DuckDB, together with any parameters that fill the placeholders in the query.
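One way the three authentication modes could map onto DuckDB configuration is through DuckDB's `CREATE SECRET` statement for the S3 secret type. The sketch below only builds the SQL string, so it runs without DuckDB installed. The function name `build_s3_secret_sql`, the `CHAIN` values, and the path-style URL choice are assumptions for illustration; the README does not say how `DuckDBS3Reader` is actually implemented. DuckDB's documented examples give the endpoint as a bare `host[:port]`, so the sketch strips the protocol and expresses it through `USE_SSL`; port 4566 (LocalStack's default) is used only as an example.

```python
from typing import Optional


def build_s3_secret_sql(
    auth_type: str,
    endpoint_url: Optional[str] = None,
    use_ssl: bool = True,
) -> str:
    """Hypothetical helper mapping an auth mode to a DuckDB CREATE SECRET statement."""
    if auth_type == "auto":
        # Credentials from the current environment variables
        return "CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN, CHAIN 'env');"
    if auth_type == "sts":
        # Credentials from an assumed role via AWS STS
        return "CREATE SECRET (TYPE S3, PROVIDER CREDENTIAL_CHAIN, CHAIN 'sts');"
    if auth_type == "custom_endpoint":
        if not endpoint_url:
            raise ValueError("endpoint_url is required for 'custom_endpoint'")
        # DuckDB expects a bare host[:port]; the scheme is carried by USE_SSL
        _, sep, rest = endpoint_url.partition("://")
        host = (rest if sep else endpoint_url).rstrip("/")
        ssl = "true" if use_ssl else "false"
        # No KEY_ID/SECRET: the README says credentials aren't needed here yet.
        # Path-style URLs are assumed, as is common for LocalStack.
        return (
            f"CREATE SECRET (TYPE S3, ENDPOINT '{host}', "
            f"URL_STYLE 'path', USE_SSL {ssl});"
        )
    raise ValueError(f"Unknown auth_type: {auth_type!r}")


print(build_s3_secret_sql("custom_endpoint", "http://localhost:4566", use_ssl=False))
```

Passing the scheme on as a separate flag mirrors the `use_ssl` argument in the README example, so a caller can supply a full URL without DuckDB receiving the protocol prefix.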

+## Writers
+
+### S3 Object Writer
+
+The `S3Writer` uploads files to S3 using a pre-existing `S3Client`, which the user is responsible for creating. This is commonly done as:
+
+```python
+import boto3
+from driutils.write import S3Writer
+
+# endpoint_url is optional; pass it only when targeting a custom S3 endpoint
+s3_client = boto3.client("s3")
+content = "Just a lil string"
+
+writer = S3Writer(s3_client)
+writer.write(
+    bucket_name="target-bucket",
+    key="path/to/upload/destination",
+    body=content
+)
+```
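Because the client is injected rather than created inside the writer, a writer like this can be exercised without AWS by passing in a stand-in object. The sketch below is hypothetical: `SketchS3Writer`, `PutObjectClient`, and `FakeS3Client` are illustrative names, not part of `driutils`. It assumes the writer delegates to boto3's S3 `put_object(Bucket=..., Key=..., Body=...)` method, which is a real client method, but the actual `S3Writer` may differ.

```python
from typing import Any, Dict, Protocol, Tuple, Union


class PutObjectClient(Protocol):
    """The only part of a boto3 S3 client this sketch relies on."""

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> Any: ...


class SketchS3Writer:
    """Hypothetical writer that uploads content through an injected client."""

    def __init__(self, s3_client: PutObjectClient) -> None:
        self._client = s3_client

    def write(self, bucket_name: str, key: str, body: Union[str, bytes]) -> None:
        # Encode text so the client always receives bytes
        data = body.encode("utf-8") if isinstance(body, str) else body
        self._client.put_object(Bucket=bucket_name, Key=key, Body=data)


class FakeS3Client:
    """In-memory stand-in for boto3's S3 client, for tests without AWS."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> Dict[str, Any]:
        self.objects[(Bucket, Key)] = Body
        return {}


client = FakeS3Client()
SketchS3Writer(client).write(
    bucket_name="target-bucket",
    key="path/to/upload/destination",
    body="Just a lil string",
)
print(client.objects[("target-bucket", "path/to/upload/destination")])  # b'Just a lil string'
```

The project already lists `moto` as a dependency; it can mock boto3's S3 API more faithfully than a hand-written fake when a test needs real client behaviour.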

-### Automatic Versioning
-
-This codebase is set up using [autosemver](https://autosemver.readthedocs.io/en/latest/usage.html#) a tool that uses git commit history to calculate the package version. Each time you make a commit, it increments the patch version by 1. You can increment by:
-
-* Normal commit. Use for bugfixes and small updates
-  * Increments patch version: `x.x.5 -> x.x.6`
-* Commit starts with `* NEW:`. Use for new features
-  * Increments minor version `x.1.x -> x.2.x`
-* Commit starts with `* INCOMPATIBLE:`. Use for API breaking changes
-  * Increments major version `2.x.x -> 3.x.x`
+## Logging
+
+This package includes a logging module that defines the base logging format used for all projects. To use it, add:
+
+```python
+from driutils import logger
+
+logger.setup_logging()
+```
+
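The README does not show what `setup_logging()` configures. As an illustration only, a shared logging setup is often a thin wrapper around the standard library's `logging.basicConfig`; the format string and the `setup_logging_sketch` name below are assumptions, not what `driutils.logger` actually uses.

```python
import logging

# Assumed format for illustration only; driutils.logger may use a different one
LOG_FORMAT = "%(asctime)s %(levelname)s [%(name)s] %(message)s"


def setup_logging_sketch(level: int = logging.INFO) -> None:
    """Hypothetical: configure the root logger with one shared format."""
    # force=True (Python 3.8+) removes existing root handlers first, so calling
    # this more than once still leaves a single handler
    logging.basicConfig(level=level, format=LOG_FORMAT, force=True)


setup_logging_sketch()
logging.getLogger(__name__).info("logging configured")
```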
-### Docker and the ECR
-
-The python code is packaged into a docker image and pushed to the AWS ECR. For the deployment to succeed you must:
+## Datetime Utilities
+
+The module `driutils.datetime` contains common utilities for working with dates and times; currently these are simple validation methods. Some of them require additional packages that are not needed by all projects, so install the package with `pip install .[datetime]` or `pip install .[all]`.
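The README does not list the validation functions in `driutils.datetime`. As a hypothetical example of the kind of check it describes, the helper below validates ISO 8601 date/time strings using only the standard library's `datetime.fromisoformat`. The function name is illustrative, and the real module may instead rely on the optional `isodate` package pulled in by the `[datetime]` extra (for example for ISO 8601 durations, which `fromisoformat` does not parse).

```python
from datetime import datetime


def is_iso8601_datetime(value: str) -> bool:
    """Hypothetical validator: True if `value` parses as an ISO 8601 date/time."""
    try:
        # On Python 3.11+ (the project requires 3.12) fromisoformat accepts
        # most ISO 8601 forms, including a trailing "Z"
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True


for candidate in ["2024-09-16T12:30:00+00:00", "2024-09-16", "16/09/2024"]:
    print(candidate, is_iso8601_datetime(candidate))
```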
-
-* Add 2 secrets to the GitHub Actions:
-  * AWS_REGION: \<our-region\>
-  * AWS_ROLE_ARN: \<the-IAM-role-used-to-deploy\>
-* Add a repository to the ECR with the same name as the GitHub repo
+
+## General Utilities
+
+The module `driutils.utils` contains utility methods that didn't fit anywhere else, such as ensuring that a list is always returned and removing protocols from URLs.
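The two utilities mentioned above can be sketched as follows. The names `ensure_list` and `strip_protocol` are hypothetical stand-ins, since the README does not give the real names or signatures in `driutils.utils`, and treating `None` as an empty list is an assumed design choice.

```python
from typing import Any


def ensure_list(value: Any) -> list:
    """Hypothetical: always return a list, wrapping scalars and passing lists through."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    if isinstance(value, (tuple, set)):
        return list(value)
    # Strings are treated as a single item rather than a sequence of characters
    return [value]


def strip_protocol(url: str) -> str:
    """Hypothetical: drop a leading scheme such as 'https://' from a URL."""
    # Keep everything after the first "://", or return the input unchanged
    _, sep, rest = url.partition("://")
    return rest if sep else url


print(ensure_list("a"), ensure_list(["a", "b"]), ensure_list(None))
print(strip_protocol("https://a-real.s3.endpoint"), strip_protocol("localhost:4566"))
```

A helper like `strip_protocol` is useful wherever a consumer wants a bare host, such as the endpoint of an S3-compatible service.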
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -20,4 +20,4 @@ help:
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

 apidoc:
-	@sphinx-apidoc -f ../src/driio -o sources
+	@sphinx-apidoc -f ../src/driutils -o sources
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -6,13 +6,13 @@
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

-import driio
+import driutils

 project = 'My Project'
 copyright = '2024, UKCEH'
 author = 'UKCEH'

-release = driio.__version__
+release = driutils.__version__
 version = release

 # -- General configuration ---------------------------------------------------
4 changes: 2 additions & 2 deletions docs/sources/modules.rst
@@ -1,7 +1,7 @@
-driio
+driutils
 =========

 .. toctree::
    :maxdepth: 4

-   driio
+   driutils
8 changes: 4 additions & 4 deletions docs/sources/mypackage.rst
@@ -1,21 +1,21 @@
-driio package
+driutils package
 =================

 Submodules
 ----------

-driio.module module
+driutils.module module
 -----------------------

-.. automodule:: driio.module
+.. automodule:: driutils.module
    :members:
    :undoc-members:
    :show-inheritance:

 Module contents
 ---------------

-.. automodule:: driio
+.. automodule:: driutils
    :members:
    :undoc-members:
    :show-inheritance:
23 changes: 16 additions & 7 deletions pyproject.toml
@@ -4,8 +4,15 @@ requires = ["setuptools >= 61.0", "autosemver"]

 [project]
 requires-python = ">=3.12"
-dependencies = ["autosemver"]
-name = "dri-io"
+dependencies = [
+    "autosemver",
+    "duckdb",
+    "boto3",
+    "mypy_boto3_s3",
+    "moto",
+    "polars",
+]
+name = "dri-utils"
 dynamic = ["version"]
 authors = [{ name = "John Doe", email = "johdoe@ceh.ac.uk" }]
 description = "A minimal setup for a template package."
@@ -14,19 +21,21 @@ description = "A minimal setup for a template package."
 test = ["pytest", "pytest-cov", "parameterized"]
 docs = ["sphinx", "sphinx-copybutton", "sphinx-rtd-theme"]
 lint = ["ruff"]
-dev = ["driio[test,docs,lint]"]
+datetime = ["isodate"]
+all = ["dri-utils[datetime]"]
+dev = ["dri-utils[all,test,docs,lint]"]

 [tool.setuptools.dynamic]
-version = { attr = "driio.__version__" }
+version = { attr = "driutils.__version__" }


 [tool.setuptools.packages.find]
 where = ["src"]
-include = ["driio*"]
+include = ["driutils*"]

 [tool.pytest.ini_options]

-addopts = "--cov=driio"
+addopts = "--cov=driutils --cov-report term-missing"
 markers = ["slow: Marks slow tests"]

 filterwarnings = [
@@ -35,7 +44,7 @@ filterwarnings = [
 ]

 [tool.coverage.run]
-omit = ["*__init__.py"]
+omit = ["*__init__.py", "**/logger.py"]

 [tool.ruff]
 src = ["src", "tests"]
3 changes: 0 additions & 3 deletions src/driio/__main__.py

This file was deleted.

15 changes: 0 additions & 15 deletions src/driio/module.py

This file was deleted.

2 changes: 1 addition & 1 deletion src/driio/__init__.py → src/driutils/__init__.py
@@ -1,6 +1,6 @@
 import autosemver

 try:
-    __version__ = autosemver.packaging.get_current_version(project_name="driio")
+    __version__ = autosemver.packaging.get_current_version(project_name="driutils")
 except Exception:
     __version__ = "0.0.0"
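`autosemver` derives `__version__` from the git commit history. The template README text removed in this PR describes its rules: a normal commit bumps the patch version, a commit starting with `* NEW:` bumps the minor version, and one starting with `* INCOMPATIBLE:` bumps the major version. The function below is a simplified, hypothetical restatement of those rules for illustration, not autosemver's implementation. It also assumes that a minor or major bump resets the lower components to zero, which the README does not state, and the commit messages are made up.

```python
def next_version(current: str, commit_message: str) -> str:
    """Hypothetical bump rule modelled on the README's description of autosemver."""
    major, minor, patch = (int(part) for part in current.split("."))
    if commit_message.startswith("* INCOMPATIBLE:"):
        # API-breaking change: bump major (resetting lower parts is assumed)
        return f"{major + 1}.0.0"
    if commit_message.startswith("* NEW:"):
        # New feature: bump minor (resetting patch is assumed)
        return f"{major}.{minor + 1}.0"
    # Any other commit: bump patch
    return f"{major}.{minor}.{patch + 1}"


version = "0.0.0"
for message in [
    "Initial commit",
    "* NEW: add DuckDB readers",
    "Fix typo",
    "* INCOMPATIBLE: change reader API",
]:
    version = next_version(version, message)
print(version)  # 1.0.0
```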