feat: add ability to perform comprehensions in stream maps #2002

haleemur · 2023-10-07T09:54:52Z

Feature scope

Taps (catalog, state, stream maps, tests, etc.)

Description

It would be helpful to have a way to parse out specific information from json fields. Currently my team is working on an integration where some information in an array of objects are considered too sensitive to bring into the data warehouse.

For instance, if a field custom_fields in stream test_data contains the json object [{"key": "a", "value": 1}, {"key": "b", "value": 2}, {"key": "c", "value": 10}], and the fields b & c are deemed sensitive and should not be brought into the data warehouse, then the following stream map definition would be ideal.

stream_maps:
    test_data:
        custom_fields: __NULL__
        custom_field_dict: "[x['value'] for x in custom_fields if x['key'] == 'a'][0]"

however this is not currently possible, it fails with the error

singer_sdk.exceptions.MapExpressionError: Failed to evaluate simpleeval expressions [x['value'] for x in custom_fields if x['key'] == 'a'][0]

Adding support for this should be fairly easy. simpleeval supports compound types through the EvalWithCompoundTypes class.

I believe the following changes would have to be made in mapper.py

# add new function
def compound_eval(expr, operators=None, functions=None, names=None):
    s = simpleeval.EvalWithCompoundTypes(operators=operators, functions=functions, names=names)
    return s.eval(expr)

# replace this (328 - 332):
            result: str | int | float = simpleeval.simple_eval(
                expr,
                functions=self.functions,
                names=names,
            )
# with this:
            result: str | int | float = compound_eval(
                expr,
                functions=self.functions,
                names=names,
            )

The text was updated successfully, but these errors were encountered:

edgarrmondragon · 2023-10-07T21:57:25Z

Hey @haleemur! Thanks for logging. Do you know if there are any significant downsides to using the EvalWithCompoundTypes class?

For example, drastic performance penalties?

I'm more confident about catching regressions in common expressions since we have tests for those but the project's currently lacking performance regression tests.

haleemur · 2023-10-08T13:34:50Z

Thank you @edgarrmondragon. I'll take a stab at writing out some performance tests this weekend.

My initial draft turned out a little more complicated than i had assumed.

the current way that type hints are provided (i.e. wrapping the transforming expression in an int()|float() etc). doesn't work when the transformed value is None, because int(None)|bool(None)|float(None) will throw a type error.

This makes it unwieldy to type out complex expressions (makes us increase complexity), which harm programmer ergonomics as well as performance.

for example:
int(dict([(x["id"], x["value"]) for x in custom_fields]).get(2)) will fail if the resulting dictionary does not have a key called 2 or it is explicitly set to None, and we have to use a ternary statement to get around this limitation.

int(dict([(x["id"], x["value"]) for x in custom_fields]).get(2)) if dict([(x["id"], x["value"]) for x in custom_fields]).get(2) else None

It would be nice to instead get the programmer to specify the resulting type explicitly in the tap config.

haleemur · 2023-10-08T13:41:07Z

Was reading the simple-eval docs, and it is mentioned

One useful feature of using the SimpleEval object is that you can parse an expression once, and then evaluate it mulitple times using different names

Meltano SDK currently does not do this, and I think this optimization is going to benefit stream map performance immensely.

edgarrmondragon · 2023-10-09T21:02:34Z

It would be nice to instead get the programmer to specify the resulting type explicitly in the tap config.

@haleemur agreed! I've logged #2010

haleemur added kind/Feature New feature or request valuestream/SDK labels Oct 7, 2023

edgarrmondragon mentioned this issue Oct 7, 2023

feat(taps): add ability to do list comprehensions in stream map expressions #2003

Merged

haleemur mentioned this issue Oct 10, 2023

test(taps): Add benchmark test on map transforms via pytest-benchmark #2011

Merged

edgarrmondragon closed this as completed in #2003 Oct 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add ability to perform comprehensions in stream maps #2002

feat: add ability to perform comprehensions in stream maps #2002

haleemur commented Oct 7, 2023

edgarrmondragon commented Oct 7, 2023

haleemur commented Oct 8, 2023

haleemur commented Oct 8, 2023

edgarrmondragon commented Oct 9, 2023

feat: add ability to perform comprehensions in stream maps #2002

feat: add ability to perform comprehensions in stream maps #2002

Comments

haleemur commented Oct 7, 2023

Feature scope

Description

edgarrmondragon commented Oct 7, 2023

haleemur commented Oct 8, 2023

haleemur commented Oct 8, 2023

edgarrmondragon commented Oct 9, 2023