This Google Sheets tap produces JSON-formatted data following the Singer spec.
tap-google-sheets
is a Singer tap for the Google Sheets API built with the Meltano Tap SDK for Singer Taps.
A full list of supported settings and capabilities for this tap is available by running:
tap-google-sheets --about
OAuth
At Matatika, we have OAuth support for this tap. This means when you # and use this tap in one of our workspaces you can go through the Google OAuth flow, allowing the Matatika app access to your Google Sheet to sync data on your behalf.
Using the tap this way means you do not have to get any of the following credentials.
Client ID, Client Secret & Refresh Token
To get your google credentials we recommend reading and following the OAuth 2.0 Google API Documentation
The tap calls the following Google APIs, these need to be enabled in Google Cloud Console
Consent for these scopes needs to be supplied in required scopes during OAuth client creation and requested in your authorization flow.
https://www.googleapis.com/auth/spreadsheets.readonly https://www.googleapis.com/auth/drive.readonly
Sheet ID
Your sheet_id
is also required to run --discover
, as running this will build the streams schema based on your google sheet.
When you open your Google sheet, the url will look something like:
https://docs.google.com/spreadsheets/d/abc123/edit#gid=0
Your sheet_id
are the characters after spreadsheets/d/
, so in this case would be abc123
.
Setting | Required | Type | Description |
---|---|---|---|
oauth_credentials.client_id |
Required | String | Your google client id |
oauth_credentials.client_secret |
Required | String | Your google client secret |
oauth_credentials.refresh_token |
Required | String | Your google refresh token |
sheet_id |
Required | String | Your target google sheet id |
output_name |
Optional | String | Optionailly rename the stream and output file or table from the tap |
child_sheet_name |
Optional | String | Optionally choose a different sheet from your Google Sheet file |
key_properties |
Optional | Array of Strings | Optionally choose primary key column(s) from your Google Sheet file. Example: ["column_one", "column_two"] |
sheets |
Optional | Array of Objects | Optionally provide a list of configs for each sheet/stream. See "Per Sheet Config" below. Overrides the sheet_id provided at the root level. |
Setting | Required | Type | Description |
---|---|---|---|
sheet_id |
Required | String | Your target google sheet id |
output_name |
Optional | String | Optionailly rename the stream and output file or table from the tap |
child_sheet_name |
Optional | String | Optionally choose a different sheet from your Google Sheet file |
key_properties |
Optional | Array of Strings | Optionally choose primary key column(s) from your Google Sheet file. Example: ["column_one", "column_two"] |
These settings expand into environment variables of:
TAP_GOOGLE_SHEETS_OAUTH_CREDENTIALS_CLIENT_ID
TAP_GOOGLE_SHEETS_OAUTH_CREDENTIALS_CLIENT_SECRET
TAP_GOOGLE_SHEETS_OAUTH_CREDENTIALS_REFRESH_TOKEN
TAP_GOOGLE_SHEETS_SHEET_ID
TAP_GOOGLE_SHEETS_OUTPUT_NAME
TAP_GOOGLE_SHEETS_CHILD_SHEET_NAME
TAP_GOOGLE_SHEETS_KEY_PROPERTIES
TAP_GOOGLE_SHEETS_SHEETS
-
If you do not provide a
child_sheet_name
, the tap will find the first visible sheet in your Google Sheet and try to sync the data from there. -
You need to provide all the required settings for this tap to run the it. These settings are used to generate the stream and schema for the tap to use from your Google Sheet.
-
Currently the tap supports sheets that have the column name in the first row. (The tap builds a usable json object up by using these column names).
-
The tap will skip all columns without a name. (The tap builds a usable json object up by using these column names).
-
If syncing to a database it will not respect duplicated column names. The last column with the same name will be the only one synced along with its data.
-
The tap will use your Google Sheet's name as output file or table name unless you set an
output_name
. It will replace any spaces with underscores. -
The tap will again replace any spaces in column names with underscores.
-
When using the
key_properties
setting, you must choose columns with no null values. -
You can extract multiple sheets using the
sheets
config, which is just an array containing configurable properties for each item. Doing so will ignore any sheet config defined by the root levelsheet_id
,output_name
,child_sheet_name
,key_properties
properties.
- target-jsonl
- target-csv
- target-postgres transferwise variant
- target-snowflake meltano variant (matatika fork) - Does not like numbers as column names
-
Add setting to optionally allow the selection of a range of data from a sheet. (Add an optional range setting).
-
Improve default behavior of a sheet with multiple columns of the same name and
target-postgres
.
Currently if have duplicate column names, a database will either:
- Throw an error that you can't have two columns with the same name.
- Sync all the data but only persist the data from the last duplicate column into the table.
Use pip to install a release from GitHub.
pip install git+https://github.com/Matatika/tap-google-sheets@vx.x.x
You can easily run tap-google-sheets
by itself or in a pipeline using Meltano.
tap-google-sheets --version
tap-google-sheets --help
tap-google-sheets --config CONFIG --discover > ./catalog.json
Note: to run --discover
you need to have set the required tap settings found here.
pipx install poetry
poetry install
Create tests within the tap_google_sheets/tests
subfolder and
then run:
poetry run pytest
You can also test the tap-google-sheets
CLI interface directly using poetry run
:
poetry run tap-google-sheets --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Your project comes with a custom meltano.yml
project file already created. Open the meltano.yml
and follow any "TODO" items listed in
the file.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-google-sheets
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-google-sheets --version
# OR run a test `elt` pipeline:
meltano elt tap-google-sheets target-jsonl
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.