Feature/sc 377004/add raster loader support for snowflake #127

Merged
2 changes: 1 addition & 1 deletion Makefile
Expand Up @@ -10,7 +10,7 @@ init:
[ -d $(VENV) ] || python3 -m venv $(VENV)
$(BIN)/pip install -r requirements-dev.txt
$(BIN)/pre-commit install
$(BIN)/pip install -e .
$(BIN)/pip install -e .[snowflake,bigquery]

lint:
$(BIN)/black raster_loader setup.py
Expand Down
70 changes: 55 additions & 15 deletions docs/source/user_guide/cli.rst
Expand Up @@ -7,41 +7,64 @@ Most functions of the Raster Loader are accessible through the carto
command-line interface (CLI). To start the CLI, use the ``carto`` command in a
terminal.

Currently, Raster Loader allows you to upload a local raster file to a BigQuery table.
You can also download and inspect a raster file from a BigQuery table.
Currently, Raster Loader allows you to upload a local raster file to a BigQuery or Snowflake table.
You can also download and inspect a raster file from a BigQuery or Snowflake table.


Using the Raster Loader with BigQuery
-----------------------------------------

Before you can upload a raster file, you need to have set up the following in
BigQuery:

#. A `GCP project`_
#. A `BigQuery dataset`_

To use the bigquery utilities, use the ``carto bigquery`` command. This command has
several subcommands, which are described below.

.. note::

Accessing BigQuery with Raster Loader requires the ``GOOGLE_APPLICATION_CREDENTIALS``
environment variable to be set to the path of a JSON file containing your BigQuery
credentials. See the `GCP documentation`_ for more information.

Uploading to BigQuery
---------------------

To upload a raster file to a BigQuery table, use the ``carto bigquery upload`` command.
Using the Raster Loader with Snowflake
-----------------------------------------

Before you can upload a raster file, you need to have set up the following in
BigQuery:
Snowflake:

#. A `GCP project`_
#. A `BigQuery dataset`_
#. A Snowflake account
#. A Snowflake database
#. A Snowflake schema

To use the snowflake utilities, use the ``carto snowflake`` command. This command has
several subcommands, which are described below.

Uploading a raster layer
------------------------

To upload a raster file, use the ``carto [bigquery|snowflake] upload`` command.

The input raster must be a ``GoogleMapsCompatible`` raster. You can make your raster compatible
by converting it with the following GDAL command:

.. code-block:: bash

gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=NONE -co ADD_ALPHA=NO -co RESAMPLING=NEAREST <input_raster>.tif <output_raster>.tif
gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=NONE -co ADD_ALPHA=NO -co RESAMPLING=NEAREST -co BLOCKSIZE=512 <input_raster>.tif <output_raster>.tif

You have the option to also set up a `BigQuery table`_ and use this table to upload
You have the option to also set up a table in your provider and use this table to upload
your data to. In case you do not specify a table name, Raster Loader will automatically
generate a table name for you and create that table.

At a minimum, the ``carto bigquery upload`` command requires a ``file_path`` to a local
At a minimum, the ``carto upload`` command requires a ``file_path`` to a local
raster file that can be `read by GDAL`_ and processed with `rasterio`_. It also requires
the ``project`` (the GCP project name) and ``dataset`` (the BigQuery dataset name)
parameters. There are also additional parameters, such as ``table`` (BigQuery table
parameters in the case of Bigquery, or the ``database`` and ``schema`` parameters in the
case of Snowflake.

There are also additional parameters, such as ``table`` (table
name) and ``overwrite`` (to overwrite existing data). For example:

.. code-block:: bash
Expand All @@ -58,6 +81,23 @@ project named ``my-gcp-project``, a dataset named ``my-bigquery-dataset``, and a
named ``my-bigquery-table``. If the table already contains data, this data will be
overwritten because the ``--overwrite`` flag is set.

The same operation, performed with Snowflake, would be:

.. code-block:: bash

carto snowflake upload \
--file_path /path/to/my/raster/file.tif \
--database my-snowflake-database \
--schema my-snowflake-schema \
--table my-snowflake-table \
--account my-snowflake-account \
--username my-snowflake-user \
--password my-snowflake-password \
--overwrite

Authentication parameters are explicitly required in this case for Snowflake, since they
are not set up in the environment.
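
Because the Snowflake command takes its credentials as explicit flags, one option is to read them from environment variables in a small wrapper rather than typing them inline. The following is a minimal sketch under stated assumptions: the `SNOWFLAKE_*` variable names and the `build_snowflake_upload_args` helper are hypothetical and not part of Raster Loader; only the flag names are taken from the example above.

```python
import os

# Hypothetical environment variable names; Raster Loader does not define these.
ENV_TO_FLAG = {
    "SNOWFLAKE_ACCOUNT": "--account",
    "SNOWFLAKE_USERNAME": "--username",
    "SNOWFLAKE_PASSWORD": "--password",
}


def build_snowflake_upload_args(
    file_path, database, schema, table, env=None, overwrite=False
):
    """Assemble an argument list for ``carto snowflake upload``."""
    env = os.environ if env is None else env

    # Fail early if any credential is absent or empty.
    missing = [name for name in ENV_TO_FLAG if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    args = [
        "carto", "snowflake", "upload",
        "--file_path", file_path,
        "--database", database,
        "--schema", schema,
        "--table", table,
    ]
    for name, flag in ENV_TO_FLAG.items():
        args += [flag, env[name]]
    if overwrite:
        args.append("--overwrite")
    return args
```

The resulting list could be passed to `subprocess.run(args, check=True)`. Note that the password still appears in the process arguments while the command runs; the sketch only keeps it out of scripts and shell history.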

If no band is specified, the first band of the raster will be uploaded. If the
``--band`` flag is set, the specified band will be uploaded. For example, the following
command uploads the second band of the raster:
Expand Down Expand Up @@ -139,11 +179,11 @@ of 2000 rows:



Inspecting a raster file on BigQuery
Inspecting a raster file
------------------------------------

You can also use Raster Loader to retrieve information about a raster file stored in a
BigQuery table. This can be useful to make sure a raster file was transferred correctly
BigQuery or Snowflake table. This can be useful to make sure a raster file was transferred correctly
or to get information about a raster file's metadata, for example.

To access a raster file in a BigQuery table, use the ``carto bigquery describe`` command.
Expand Down
13 changes: 11 additions & 2 deletions docs/source/user_guide/installation.rst
Expand Up @@ -7,21 +7,30 @@ Raster Loader is available on PyPI_ and can be installed with pip_:

.. code-block:: bash

pip install raster-loader
pip install raster-loader[all]

To install from source:

.. code-block:: bash

git clone https://github.com/cartodb/raster-loader
cd raster-loader
pip install .
pip install .[all]

.. tip::

In most cases, it is recommended to install Raster Loader in a virtual environment.
Use venv_ to create and manage your virtual environment.

The above will install the dependencies required to work with both Snowflake and
BigQuery. In case you only want to work with one of them, you can install the
dependencies for each of them separately:

.. code-block:: bash

pip install raster-loader[snowflake]
pip install raster-loader[bigquery]
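
Since the provider dependencies are now optional extras, it can help to check which ones are importable in the current environment. This is a rough sketch, not part of Raster Loader: `has_module` and `available_providers` are hypothetical helpers, `google.cloud.bigquery` is the module the CLI imported before this change, and `snowflake.connector` is an assumption about what the `snowflake` extra installs.

```python
import importlib.util


def has_module(name):
    """Return True if ``name`` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example ``google``) is missing.
        return False


def available_providers():
    """List the providers whose client libraries appear to be installed."""
    # Assumed mapping from extra name to the module that extra provides.
    candidates = {
        "bigquery": "google.cloud.bigquery",
        "snowflake": "snowflake.connector",
    }
    return [
        provider for provider, module in candidates.items() if has_module(module)
    ]
```

Calling `available_providers()` after `pip install raster-loader[bigquery]` would be expected to include `"bigquery"` only if that extra does install `google-cloud-bigquery`.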

After installing the Raster Loader package, you will have access to the
:ref:`carto CLI <cli>`. To make sure the installation was successful, run the
following command in your terminal:
Expand Down
58 changes: 38 additions & 20 deletions docs/source/user_guide/use_with_python.rst
Expand Up @@ -3,34 +3,54 @@
Usage with Python projects
==========================

After installing Raster Loader, you can import the package into your Python project. For
example:
After installing Raster Loader, you can use it in your Python project.

First, import the corresponding connection class from the ``raster_loader`` package.
For Snowflake, use ``SnowflakeConnection``:

.. code-block:: python

from raster_loader import rasterio_to_bigquery, bigquery_to_records
from raster_loader import SnowflakeConnection

Uploading a raster file to BigQuery
-----------------------------------
For BigQuery, use ``BigQueryConnection``:

.. code-block:: python

from raster_loader import BigQueryConnection

Then, create a connection object with the appropriate parameters.

For Snowflake:

.. code-block:: python

Currently, Raster Loader allows you to upload a local raster file to an existing
BigQuery table using the :func:`~raster_loader.rasterio_to_bigquery` function.
connection = SnowflakeConnection('my-user', 'my-password', 'my-account', 'my-database', 'my-schema')

For BigQuery:

.. code-block:: python

connection = BigQueryConnection('my-project')

.. note::

Accessing BigQuery with Raster Loader requires the ``GOOGLE_APPLICATION_CREDENTIALS``
environment variable to be set to the path of a JSON file containing your BigQuery
credentials. See the `GCP documentation`_ for more information.

Uploading a raster file to BigQuery
-----------------------------------

To upload a raster file, use the ``upload_raster`` function


For example:

.. code-block:: python

rasterio_to_bigquery(
connector.upload_raster(
file_path = 'path/to/raster.tif',
project_id = 'my-project',
dataset_id = 'my_dataset',
table_id = 'my_table',
fqn = 'database.schema.tablename',
)

This function returns `True` if the upload was successful.
Expand All @@ -40,22 +60,20 @@ by converting it with the following GDAL command:

.. code-block:: bash

gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=NONE -co ADD_ALPHA=NO -co RESAMPLING=NEAREST <input_raster>.tif <output_raster>.tif
gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=NONE -co ADD_ALPHA=NO -co RESAMPLING=NEAREST -co BLOCKSIZE=512 <input_raster>.tif <output_raster>.tif

Inspecting a raster file on BigQuery
------------------------------------
Inspecting a raster file
------------------------

You can also access and inspect a raster file located in a BigQuery table using the
:func:`~raster_loader.bigquery_to_records` function.
You can also access and inspect a raster file located in a BigQuery or Snowflake table using the
:func:`get_records` function.

For example:

.. code-block:: python

records_df = bigquery_to_records(
project_id = 'my-project',
dataset_id = 'my_dataset',
table_id = 'my_table',
records = connector.get_records(
fqn = 'database.schema.tablename',
)

This function returns a DataFrame with some samples from the raster table on BigQuery
Expand Down
12 changes: 7 additions & 5 deletions raster_loader/__init__.py
@@ -1,12 +1,14 @@
from raster_loader._version import __version__

from raster_loader.io import (
rasterio_to_bigquery,
bigquery_to_records,
from raster_loader.io.bigquery import (
BigQueryConnection,
)
from raster_loader.io.snowflake import (
SnowflakeConnection,
)

__all__ = [
"__version__",
"rasterio_to_bigquery",
"bigquery_to_records",
"BigQueryConnection",
"SnowflakeConnection",
]
44 changes: 14 additions & 30 deletions raster_loader/cli/bigquery.py
Expand Up @@ -4,12 +4,7 @@
import click
from functools import wraps, partial

try:
import google.cloud.bigquery
except ImportError: # pragma: no cover
_has_bigquery = False
else:
_has_bigquery = True
from raster_loader.io.bigquery import BigQueryConnection


def catch_exception(func=None, *, handle=Exception):
Expand Down Expand Up @@ -69,7 +64,6 @@ def bigquery(args=None):
default=False,
is_flag=True,
)
@click.option("--test", help="Use Mock BigQuery Client", default=False, is_flag=True)
@catch_exception()
def upload(
file_path,
Expand All @@ -81,14 +75,12 @@ def upload(
chunk_size,
overwrite=False,
append=False,
test=False,
):
from raster_loader.tests.mocks import bigquery_client
from raster_loader.io import import_error_bigquery
from raster_loader.io import rasterio_to_bigquery
from raster_loader.io import get_number_of_blocks
from raster_loader.io import print_band_information
from raster_loader.io import get_block_dims
from raster_loader.io.common import (
get_number_of_blocks,
print_band_information,
get_block_dims,
)

# check that band and band_name are the same length
# if band_name provided
Expand All @@ -106,14 +98,7 @@ def upload(
table = os.path.basename(file_path).split(".")[0]
table = "_".join([table, "band", str(band), str(uuid.uuid4())])

# swap out BigQuery client for testing purposes
if test:
client = bigquery_client()
else: # pragma: no cover
"""Requires bigquery."""
if not _has_bigquery: # pragma: no cover
import_error_bigquery()
client = google.cloud.bigquery.Client(project=project)
connector = BigQueryConnection(project)

# introspect raster file
num_blocks = get_number_of_blocks(file_path)
Expand All @@ -134,14 +119,12 @@ def upload(

click.echo("Uploading Raster to BigQuery")

rasterio_to_bigquery(
fqn = f"{project}.{dataset}.{table}"
connector.upload_raster(
file_path,
table,
dataset,
project,
fqn,
bands_info,
chunk_size,
client=client,
overwrite=overwrite,
append=append,
)
Expand All @@ -156,10 +139,11 @@ def upload(
@click.option("--table", help="The name of the table.", required=True)
@click.option("--limit", help="Limit number of rows returned", default=10)
def describe(project, dataset, table, limit):
from raster_loader.io import bigquery_to_records
connector = BigQueryConnection(project)

df = bigquery_to_records(table, dataset, project, limit)
print(f"Table: {project}.{dataset}.{table}")
fqn = f"{project}.{dataset}.{table}"
df = connector.get_records(fqn, limit)
print(f"Table: {fqn}")
print(f"Number of rows: {len(df)}")
print(f"Number of columns: {len(df.columns)}")
print(f"Column names: {df.columns}")
Expand Down
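
The naming logic in the `upload` and `describe` commands above can be isolated for review. The sketch below restates the two expressions from the diff (the fallback table name and the fully qualified name passed to `upload_raster` and `get_records`) as standalone functions; `default_table_name` and `bigquery_fqn` are hypothetical names used here for illustration and do not exist in the package.

```python
import os
import uuid


def default_table_name(file_path, band):
    """Mirror the CLI fallback: <file stem>_band_<band>_<uuid4>."""
    # Everything before the first dot of the file name is used as the stem.
    stem = os.path.basename(file_path).split(".")[0]
    return "_".join([stem, "band", str(band), str(uuid.uuid4())])


def bigquery_fqn(project, dataset, table):
    """Join project, dataset, and table the way the CLI does."""
    return f"{project}.{dataset}.{table}"


if __name__ == "__main__":
    table = default_table_name("/path/to/my/raster/file.tif", 1)
    print(bigquery_fqn("my-gcp-project", "my-bigquery-dataset", table))
```

Because the fallback name embeds a random UUID, each upload without `--table` produces a different table name.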