Add Tap Test Runner and Test Template classes #580

Closed
wants to merge 15 commits into from

MeltyBot (Contributor) commented May 30, 2022

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/merge_requests/197

Originally created by @stkbailey on 2021-10-27 19:41:01


Note: Content edited 11/13 with updated approach.

This MR addresses #259.

This MR creates two new object types in the SDK: the TapTestRunner and the TestTemplate object. Used together, they can automatically test taps based on properties of the Tap, Stream, and Stream Attribute. The MR also updates the utility function get_standard_tap_tests to leverage the test templates, and adds functionality for easily parameterizing pytest (rather than looping through a bunch of tests).

TestTemplate

The TestTemplate object provides a standardized interface for building tap-related tests. Each of the tests must have id, name, and required_args properties, and a run_test method. Any number of keyword arguments can be passed in when initializing the tests, but only the required arguments will be added as properties that can be referenced in the run_test() method.

There are currently templates for tap, stream, and attribute level tests.
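
To make the interface concrete, here is a minimal, standalone sketch of what such a template could look like. It does not import the SDK; the class names SketchTestTemplate and StreamReturnsRecordTest and the way arguments are validated are assumptions for illustration, not the MR's actual code.

```python
from typing import Any, ClassVar, List


class SketchTestTemplate:
    """Hypothetical base class illustrating the interface described above."""

    id: ClassVar[str] = "template"
    name: ClassVar[str] = "template"
    required_args: ClassVar[List[str]] = []

    def __init__(self, **kwargs: Any) -> None:
        missing = [arg for arg in self.required_args if arg not in kwargs]
        if missing:
            raise ValueError(f"Missing required args: {missing}")
        # Only the required arguments become attributes; extras are ignored.
        for arg in self.required_args:
            setattr(self, arg, kwargs[arg])

    def run_test(self) -> None:
        raise NotImplementedError("Subclasses implement the actual check.")


class StreamReturnsRecordTest(SketchTestTemplate):
    """Hypothetical stream-level test: the stream produced at least one record."""

    id = "stream__returns_record"
    name = "returns_record"
    required_args = ["stream_name", "records"]

    def run_test(self) -> None:
        assert len(self.records) > 0, f"No records for {self.stream_name}"
```

A subclass only declares its metadata and a run_test() body; anything passed in beyond required_args is dropped, which matches the behaviour described above.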

TapTestRunner

The Test Runner object is a convenience class designed to make it easy to access sync messages and generate "default tests" from the SDK classes. Generally what a user would do is initialize the runner, perform a sync, then generate tests.
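
As a rough illustration of "accessing sync messages", the sketch below sorts the line-delimited JSON a Singer tap emits into records, schema messages, and state messages. The function name bucket_singer_messages and the returned dictionary layout are assumptions; the runner in this MR may organise its output differently.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def bucket_singer_messages(lines: Iterable[str]) -> Dict[str, Any]:
    """Group raw Singer output lines by message type (hypothetical helper)."""
    records: Dict[str, List[dict]] = defaultdict(list)
    schema_messages: List[dict] = []
    state_messages: List[dict] = []

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the captured output
        message = json.loads(line)
        msg_type = message.get("type")
        if msg_type == "RECORD":
            # Records are keyed by the stream they belong to.
            records[message["stream"]].append(message["record"])
        elif msg_type == "SCHEMA":
            schema_messages.append(message)
        elif msg_type == "STATE":
            state_messages.append(message)

    return {
        "records": dict(records),
        "schema_messages": schema_messages,
        "state_messages": state_messages,
    }
```

Tests can then assert against the per-stream record lists rather than parsing stdout themselves.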

Example

In general, users would want to extract some initialized test objects through the get_standard_tap_pytest_parameters function, and then simply run them. This can be used to generate the tap-level, schema-level, or attribute-level tests.

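import pytest

from singer_sdk.samples.sample_tap_countries.countries_tap import SampleTapCountries
from singer_sdk.testing import TapTestRunner

# NOTE: `runner` is not defined in this snippet as posted; it is presumably a
# TapTestRunner instance (or the module that exposes this function).
pytest_params = runner.get_standard_tap_pytest_parameters(
    tap_class=SampleTapCountries,
    include_tap_tests=True,
    include_schema_tests=True,
    include_attribute_tests=True,
)


# Each generated test object becomes its own pytest case.
@pytest.mark.parametrize("test_object", **pytest_params)
def test_builtin_tap_tests(test_object):
    test_object.run_test()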
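
For the `**pytest_params` expansion to work, the returned value has to be a mapping whose keys are keyword arguments of pytest.mark.parametrize, such as argvalues and ids. The sketch below shows one plausible shape; FakeTest and build_pytest_params are hypothetical names, and the dictionary the MR actually returns may differ.

```python
from typing import Any, Dict, List


class FakeTest:
    """Stand-in for an initialized test object (hypothetical)."""

    def __init__(self, id: str) -> None:
        self.id = id

    def run_test(self) -> None:
        pass  # a real test would assert something here


def build_pytest_params(tests: List[Any]) -> Dict[str, List[Any]]:
    """Build keyword arguments suitable for pytest.mark.parametrize."""
    return {
        "argvalues": tests,             # one pytest case per test object
        "ids": [t.id for t in tests],   # readable case names in the report
    }


params = build_pytest_params([FakeTest("tap__cli"), FakeTest("countries__returns_record")])
```

Passing this as `@pytest.mark.parametrize("test_object", **params)` is equivalent to supplying argvalues and ids explicitly, so each generated test shows up as a separately named case instead of one looping test.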

The map of this MR is shown below:

  • TapTestRunner class
  • TestTemplate class
  • Default TapTest classes (run_cli, discovery, stream_connections)
  • Default StreamTest classes (catalog_schema_matches_records, record_schema_matches_catalog, returns_records, primary_keys)
  • Default AttributeTest classes
    • is_boolean
    • is_datetime
    • is_integer
    • is_number
    • is_object
    • not_null
    • unique
  • TapRunner tests
  • TapTest class tests
  • [] StreamTest class tests
  • AttributeTest class tests
  • Documentation of all attributes/tests
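
As an illustration of what the attribute-level checks listed above might do, here is a standalone sketch of not_null, unique, and is_integer over a list of record dictionaries. The function names and the decision to let null values pass the type check are assumptions, not the MR's implementation.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def check_not_null(records: List[Record], attribute: str) -> bool:
    """True if every record has a non-null value for the attribute."""
    return all(r.get(attribute) is not None for r in records)


def check_unique(records: List[Record], attribute: str) -> bool:
    """True if no value repeats. Assumes the values are hashable."""
    values = [r.get(attribute) for r in records]
    return len(values) == len(set(values))


def check_is_integer(records: List[Record], attribute: str) -> bool:
    """True if every non-null value is an int (bool is excluded on purpose)."""
    return all(
        isinstance(v, int) and not isinstance(v, bool)
        for v in (r.get(attribute) for r in records)
        if v is not None
    )
```

Keeping each check as a small predicate over (records, attribute) is what makes it straightforward to generate one test per stream attribute.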

--- original approach [included for posterity] ---
The initial approach chosen here creates a utility class that will run a tap and capture its output to a couple of fields. The goal is to make it very easy for developers to get access to a set of records from their tap, so that they can evaluate the actual data content against expectations.

An implementation in a real tap might look something like this:

import pytest

from singer_sdk.samples.sample_tap_countries.countries_tap import SampleTapCountries
from singer_sdk.testing import StreamTestUtility


# Run the sync once per test session and share the captured output.
@pytest.fixture(scope="session")
def st_util():
    stu = StreamTestUtility(SampleTapCountries, config={})
    stu.run_sync()
    yield stu


def test_countries_stream(st_util):
    st_util._test_stream_returns_at_least_one_record("countries")
    st_util._test_stream_catalog_attributes_in_records("countries")
    st_util._test_stream_record_attributes_in_catalog("countries")


def test_stream_continents(st_util):
    st_util._test_stream_returns_at_least_one_record("continents")
    st_util._test_stream_catalog_attributes_in_records("continents")
    st_util._test_stream_record_attributes_in_catalog("continents")

    # Custom assertion against the captured records for this stream.
    records = st_util.records["continents"]

    assert all(r["name"] in ["Asia", "North America", ...] for r in records)

I am very open to suggestions on how to implement or refine this. Some questions:

  • What should the class be named?
  • Are the raw, records, schema_messages, state_messages attributes sufficient?
  • Should there be any consolidation / connection with the get_standard_tap_tests functionality?
  • What other "out of the box" data tests should be included? Maybe test_primary_keys_are_not_null, test_replication_key_is_valid type of things?
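
One way a test along the lines of test_primary_keys_are_not_null could work is sketched below, extended to also flag duplicate composite keys. The helper name find_primary_key_problems and its return format are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def find_primary_key_problems(
    records: List[Record], primary_keys: Sequence[str]
) -> Dict[str, List[int]]:
    """Return indexes of records with null or duplicated primary keys."""
    null_rows: List[int] = []
    duplicate_rows: List[int] = []
    seen: set = set()

    for index, record in enumerate(records):
        key: Tuple[Any, ...] = tuple(record.get(k) for k in primary_keys)
        if any(part is None for part in key):
            null_rows.append(index)
            continue  # a null key is not compared for uniqueness
        if key in seen:
            duplicate_rows.append(index)
        seen.add(key)

    return {"null": null_rows, "duplicate": duplicate_rows}
```

A test built on this would assert that both lists are empty, and the indexes give a useful failure message when they are not.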

kgpayne (Contributor) commented Jan 30, 2023

Closed by #1171

Thanks again @stkbailey for your excellent work on this 🙌

@kgpayne kgpayne closed this Jan 30, 2023
@edgarrmondragon edgarrmondragon deleted the add_stream_test_classes branch January 19, 2024 01:39
3 participants