Read data from Google BigQuery with Dask.
This package uses the BigQuery Storage API. Please refer to the data extraction # table for associated costs while using Dask-BigQuery.
dask-bigquery
can be installed with pip
:
pip install dask-bigquery
or with conda
:
conda install -c conda-forge dask-bigquery
dask-bigquery
assumes that you are already authenticated.
import dask_bigquery
ddf = dask_bigquery.read_gbq(
project_id="your_project_id",
dataset_id="your_dataset",
table_id="your_table",
)
ddf.head()
To run the tests locally you need to be authenticated and have a project created on that account. If you're using a service account, when created you need to select the role of "BigQuery Admin" in the section "Grant this service account access to project".
You can run the tests with
$ pytest dask_bigquery
if your default gcloud
project is set, or manually specify the project ID with
DASK_BIGQUERY_PROJECT_ID pytest dask_bigquery
This project stems from the discussion in this Dask issue and this initial implementation developed by Brett Naul, Jacob Hayes, and Steven Soojin Kim.