
Add standardized test approach to evaluate stream output against expectations #257

Closed
MeltyBot opened this issue Oct 26, 2021 · 2 comments

@MeltyBot
Contributor

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/259

Originally created by @stkbailey on 2021-10-26 10:45:25


Summary

This request covers the ability to write integration tests at the stream level for taps. Examples of tests a developer may want to create are listed below (a rough sketch of what these checks could look like follows the list):

  • Stream returns at least one record.
  • All discovered stream schema keys are available in the returned records.
  • All live record schema keys are recorded in the discovered stream.
  • All primary keys in Stream A also exist in Stream B, Column X.
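
To make these concrete, here is a minimal sketch (not SDK code) of what such per-stream assertions might look like. It assumes the sync output has already been parsed into Singer SCHEMA and RECORD message dictionaries; the helper names are illustrative only.

```python
# Hypothetical per-stream assertions. `schema_message` is assumed to be a
# parsed Singer SCHEMA message and `record_messages` a list of parsed RECORD
# messages for the same stream; neither name is part of the SDK.

def assert_returns_at_least_one_record(record_messages: list) -> None:
    """Stream returns at least one record."""
    assert len(record_messages) > 0, "expected at least one RECORD message"


def assert_record_keys_in_schema(schema_message: dict, record_messages: list) -> None:
    """All live record keys are declared in the discovered stream schema."""
    schema_keys = set(schema_message["schema"]["properties"])
    for message in record_messages:
        extra = set(message["record"]) - schema_keys
        assert not extra, f"record keys missing from schema: {extra}"


def assert_schema_keys_in_records(schema_message: dict, record_messages: list) -> None:
    """All discovered schema keys appear in at least one returned record."""
    seen_keys = set()
    for message in record_messages:
        seen_keys.update(message["record"])
    missing = set(schema_message["schema"]["properties"]) - seen_keys
    assert not missing, f"schema keys never observed in records: {missing}"
```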

Proposed benefits

An endorsed approach to testing streams will allow developers to easily implement test-driven development practices as well as increase the quality of taps overall.

Proposal details

I recently added some testing to tap-slack that might be worth refining/abstracting for the SDK. The approach was as follows (a rough sketch is included after the link below):

  1. In a Pytest fixture, perform a full tap sync with the sample config.
  2. Read stdout and parse the records into an array. Then group the records by TYPE and STREAM.
  3. Create a generic set of tests that can be applied on a stream basis: at least one record returned, catalog schema keys are in the record schema and vice versa.
  4. Apply the generic tests for each stream, passing in the parsed full sync results.

This approach allowed me to catch several schema mismatches and a few critical issues related to the state partitioning keys mentioned above.

https://github.com/MeltanoLabs/tap-slack/blob/7892c39667f7817e426ee025d2c52622568c38d6/tests/test_streams.py#L27
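
For reference, a rough sketch of that fixture-based approach is below. It is only an illustration under a few assumptions: the tap class and import path (`TapSlack` from `tap_slack.tap`) and `SAMPLE_CONFIG` are placeholders for whatever the tap under test provides, and stdout is captured in-process with `contextlib.redirect_stdout`, which is just one way to collect the sync output.

```python
# Hypothetical pytest fixture implementing steps 1-4 above.
import io
import json
from contextlib import redirect_stdout

import pytest

from tap_slack.tap import TapSlack  # placeholder import path

SAMPLE_CONFIG: dict = {}  # fill in with the tap's sample/test config


@pytest.fixture(scope="session")
def sync_output():
    """Run a full sync once, parse stdout, and group messages by type and stream."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        TapSlack(config=SAMPLE_CONFIG).sync_all()

    messages = [json.loads(line) for line in buf.getvalue().splitlines() if line.strip()]

    grouped: dict = {"SCHEMA": {}, "RECORD": {}}
    for message in messages:
        msg_type = message.get("type")
        if msg_type in grouped:
            grouped[msg_type].setdefault(message["stream"], []).append(message)
    return grouped


def test_each_stream_returns_records(sync_output):
    """Generic test applied to every stream found in the parsed sync output."""
    for stream_name, records in sync_output["RECORD"].items():
        assert records, f"no records returned for stream {stream_name!r}"
```

The schema-key checks sketched under "Summary" could then be applied per stream in the same way, iterating over the grouped SCHEMA and RECORD messages.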

Best reasons not to build

I don't think adding a feature like this would negatively affect existing taps, as the tests could be added "a la carte" by developers. However, I do think there is a risk in adding a test suite that takes a long time to run or is error-prone. For example, the approach outlined above works when the data volume is very small but would not scale to taps with large data volumes. So finding ways to control execution time in particular is very important.

@MeltyBot
Contributor Author

@kgpayne
Contributor

kgpayne commented Jan 30, 2023

Closed by #1171

@kgpayne kgpayne closed this as completed Jan 30, 2023