
Python SDK get_historical_features does not use field mappings. #2248

Closed
michelle-rascati-sp opened this issue Jan 27, 2022 · 1 comment · Fixed by #2252

Comments

@michelle-rascati-sp
Contributor

Expected Behavior

When setting a field mapping such as {"column_name": "feature_name"} on an offline data source, I would expect to be able to call get_historical_features(features=["feature_name"]) and get back a DataFrame with feature_name as a column.
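To make the expected semantics concrete, here is a minimal plain-Python sketch (not Feast's implementation) of what a field mapping means: source columns are renamed to their feature names, so the feature can then be requested by the mapped name. The helper `apply_field_mapping` and the sample row are hypothetical.

```python
from typing import Dict, List


def apply_field_mapping(
    rows: List[Dict[str, object]], field_mapping: Dict[str, str]
) -> List[Dict[str, object]]:
    """Rename source columns to feature names; unmapped columns keep their names.

    Hypothetical helper for illustration only.
    """
    return [
        {field_mapping.get(column, column): value for column, value in row.items()}
        for row in rows
    ]


# Hypothetical offline row: the table physically stores "column_name".
source_rows = [{"user_id": 1, "column_name": 6}]
field_mapping = {"column_name": "feature_name"}  # source column -> feature name

mapped_rows = apply_field_mapping(source_rows, field_mapping)

# Requesting the feature by its mapped name should succeed on the result.
requested = ["feature_name"]
print([{name: row[name] for name in requested} for row in mapped_rows])
# → [{'feature_name': 6}]
```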

Current Behavior

  • File data source: works as expected.
  • BigQuery data source: fails with google.api_core.exceptions.BadRequest: 400 Unrecognized name: feature_name at [581:13]
  • Redshift data source: fails with Redshift SQL Query failed to finish. Details: ... 'Error': 'ERROR: column "feature_name" does not exist

Steps to reproduce

In the fraud detection tutorial, update fraud_features.py to use a field mapping in the user_transaction_count_7d feature view:

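from datetime import timedelta

from feast import BigQuerySource, FeatureView

# PROJECT_ID and BIGQUERY_DATASET_NAME are defined earlier in the tutorial.
driver_stats_fv = FeatureView(
    name="user_transaction_count_7d",
    entities=["user_id"],
    ttl=timedelta(weeks=1),
    batch_source=BigQuerySource(
        table_ref=f"{PROJECT_ID}.{BIGQUERY_DATASET_NAME}.user_count_transactions_7d",
        event_timestamp_column="feature_timestamp",
        # Source column "transaction_count_7d" is exposed as feature "transaction_count_7d_fm".
        field_mapping={"transaction_count_7d": "transaction_count_7d_fm"},
    ),
)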

When calling get_historical_features, you get an error saying that this column doesn't exist:

training_data = store.get_historical_features(
    entity_df=f"""
    select 
        src_account as user_id,
        timestamp,
        is_fraud
    from
        feast-oss.fraud_tutorial.transactions
    where
        timestamp between timestamp('{two_days_ago.isoformat()}') 
        and timestamp('{now.isoformat()}')""",
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    full_feature_names=True
).to_df()

training_data.head()

> BadRequest: 400 Unrecognized name: transaction_count_7d_fm; Did you mean transaction_count_7d? at [77:13]

Note that the materialize step handles the field mapping correctly, and get_online_features works as expected:

feature_vector = store.get_online_features(
    features=[
        "user_transaction_count_7d:transaction_count_7d_fm",
        "user_account_features:credit_score",
        "user_account_features:account_age_days",
        "user_account_features:user_has_2fa_installed",
        "user_has_fraudulent_transactions:user_has_fraudulent_transactions_7d"
    ],
    entity_rows=entity_rows
).to_dict()

> {'credit_score': [480], 'account_age_days': [655], 'user_has_2fa_installed': [1], 'transaction_count_7d_fm': [6], 'user_has_fraudulent_transactions_7d': [0.0]}

Specifications

  • Version: 0.17
  • Platform: Any
  • Subsystem:

Possible Solution

Update the offline data sources to query the original column name and return it under the mapped feature name.
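A rough sketch of that idea for SQL-based offline stores such as BigQuery and Redshift: invert the field mapping so each requested feature is read from its source column and aliased back to the feature name. The function `build_select_columns` is hypothetical and only illustrates the shape of the generated SELECT list; it is not taken from the fix in #2252.

```python
from typing import Dict, List


def build_select_columns(
    feature_names: List[str], field_mapping: Dict[str, str]
) -> List[str]:
    """Return SQL select expressions that read source columns under feature names.

    field_mapping maps source column name -> feature name, as passed to the
    data source. Hypothetical helper for illustration only.
    """
    # Invert the mapping: feature name -> source column name.
    feature_to_column = {feature: column for column, feature in field_mapping.items()}
    expressions = []
    for feature in feature_names:
        column = feature_to_column.get(feature, feature)
        if column == feature:
            # Unmapped feature: the column already carries the feature's name.
            expressions.append(column)
        else:
            # Mapped feature: query the real column, return it as the feature name.
            expressions.append(f"{column} AS {feature}")
    return expressions


select_list = build_select_columns(
    ["transaction_count_7d_fm", "credit_score"],
    {"transaction_count_7d": "transaction_count_7d_fm"},
)
print("SELECT " + ", ".join(select_list))
# → SELECT transaction_count_7d AS transaction_count_7d_fm, credit_score
```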

@michelle-rascati-sp
Contributor Author

I think I have a solution. Working on submitting PR.
