
Add standardized test approach to evaluate stream output against expectations #257

Closed
MeltyBot opened this issue Oct 26, 2021 · 2 comments

@MeltyBot
Contributor

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/259

Originally created by @stkbailey on 2021-10-26 10:45:25


Summary

This request covers the ability to write integration tests at the stream level for taps. Examples of tests a developer may want to create are listed below (a rough sketch of what these checks could look like follows the list):

  • Stream returns at least one record.
  • All discovered stream schema keys are available in the returned records.
  • All live record schema keys are recorded in the discovered stream.
  • All primary keys in Stream A also exist in Stream B, Column X.
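
To make these concrete, here is a minimal sketch (not SDK code) of what such per-stream assertions might look like. It assumes the sync output has already been parsed into Singer SCHEMA and RECORD message dictionaries; the helper names are illustrative only.

```python
# Hypothetical per-stream assertions. `schema_message` is assumed to be a
# parsed Singer SCHEMA message and `record_messages` a list of parsed RECORD
# messages for the same stream; neither name is part of the SDK.

def assert_returns_at_least_one_record(record_messages: list) -> None:
    """Stream returns at least one record."""
    assert len(record_messages) > 0, "expected at least one RECORD message"


def assert_record_keys_in_schema(schema_message: dict, record_messages: list) -> None:
    """All live record keys are declared in the discovered stream schema."""
    schema_keys = set(schema_message["schema"]["properties"])
    for message in record_messages:
        extra = set(message["record"]) - schema_keys
        assert not extra, f"record keys missing from schema: {extra}"


def assert_schema_keys_in_records(schema_message: dict, record_messages: list) -> None:
    """All discovered schema keys appear in at least one returned record."""
    seen_keys = set()
    for message in record_messages:
        seen_keys.update(message["record"])
    missing = set(schema_message["schema"]["properties"]) - seen_keys
    assert not missing, f"schema keys never observed in records: {missing}"
```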

Proposed benefits

An endorsed approach to testing streams will allow developers to easily implement test-driven development practices as well as increase the quality of taps overall.

Proposal details

I recently added some testing to tap-slack that might be worth refining/abstracting for the SDK. The approach was as follows (a rough sketch is included after the link below):

  1. In a Pytest fixture, perform a full tap sync with the sample config.
  2. Read stdout and parse the records into an array. Then group the records by TYPE and STREAM.
  3. Create a generic set of tests that can be applied on a stream basis: at least one record returned, catalog schema keys are in the record schema and vice versa.
  4. Apply the generic tests for each stream, passing in the parsed full sync results.

This approach allowed me to catch several schema mismatches and a few critical issues related to the state partitioning keys mentioned above.

https://github.com/MeltanoLabs/tap-slack/blob/7892c39667f7817e426ee025d2c52622568c38d6/tests/test_streams.py#L27
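
For reference, a rough sketch of that fixture-based approach is below. It is only an illustration under a few assumptions: the tap class and import path (`TapSlack` from `tap_slack.tap`) and `SAMPLE_CONFIG` are placeholders for whatever the tap under test provides, and stdout is captured in-process with `contextlib.redirect_stdout`, which is just one way to collect the sync output.

```python
# Hypothetical pytest fixture implementing steps 1-4 above.
import io
import json
from contextlib import redirect_stdout

import pytest

from tap_slack.tap import TapSlack  # placeholder import path

SAMPLE_CONFIG: dict = {}  # fill in with the tap's sample/test config


@pytest.fixture(scope="session")
def sync_output():
    """Run a full sync once, parse stdout, and group messages by type and stream."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        TapSlack(config=SAMPLE_CONFIG).sync_all()

    messages = [json.loads(line) for line in buf.getvalue().splitlines() if line.strip()]

    grouped: dict = {"SCHEMA": {}, "RECORD": {}}
    for message in messages:
        msg_type = message.get("type")
        if msg_type in grouped:
            grouped[msg_type].setdefault(message["stream"], []).append(message)
    return grouped


def test_each_stream_returns_records(sync_output):
    """Generic test applied to every stream found in the parsed sync output."""
    for stream_name, records in sync_output["RECORD"].items():
        assert records, f"no records returned for stream {stream_name!r}"
```

The schema-key checks sketched under "Summary" could then be applied per stream in the same way, iterating over the grouped SCHEMA and RECORD messages.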

Best reasons not to build

I don't think adding a feature like this would negatively affect existing taps, as the tests could be added "a la carte" by developers. However, I do think there is a risk in adding a test suite that takes a long time to run or is error-prone. For example, the approach outlined above works when the data volume is very small but would not scale to taps with large data volumes. So finding ways to control execution time in particular is very important.

@MeltyBot
Contributor Author

@kgpayne
Contributor

kgpayne commented Jan 30, 2023

Closed by #1171

@kgpayne kgpayne closed this as completed Jan 30, 2023