Go packages in protos use incorrect repo #16
Resolved.
Yanson pushed a commit to Yanson/feast that referenced this issue on Jul 29, 2020:
…-ingestion

Closes KE-609 - read data from Kafka
Closes KE-636 - write data into ADLS Gen2
Closes KE-655 - write data into Redis

Added a Spark ingestion job that reads from Kafka and writes into Delta Lake storage and Redis.

Created an integration test that runs against local Kafka, Redis and Spark, ingests 128 random features into Redis and 128 random features into Delta with different data types, and checks the result. Because Spark only runs on Java 8, the integration test is skipped in the CI build (which runs under Java 11), but it is run in the e2e tests when the emulator container (including the ingestion jar) is built.

Delta tables are automatically created and partitioned by day.

```bash
# run SparkIngestionTest.java
$ ls /var/folders/67/59_hhx6d5lz0wg__x35g8wbw0000gn/T/junit9193296653579758585/bXlwcm9qZWN0/bXlwcm9qZWN0L2ZlYXR1cmVfc2V0X2Zvcl9kZWx0YQ==/event_timestamp_day=2020-06-08/part-00002-abd1fbe1-c773-4d83-972d-5193c75885e5.c000.snappy.parquet
/var/folders/67/59_hhx6d5lz0wg__x35g8wbw0000gn/T/junit9193296653579758585/bXlwcm9qZWN0/bXlwcm9qZWN0L2ZlYXR1cmVfc2V0X2Zvcl9kZWx0YQ==/event_timestamp_day=2020-06-08/part-00004-440cca76-fdc1-437a-93f2-d6739296cfe4.c000.snappy.parquet
/var/folders/67/59_hhx6d5lz0wg__x35g8wbw0000gn/T/junit9193296653579758585/bXlwcm9qZWN0/bXlwcm9qZWN0L2ZlYXR1cmVfc2V0X2Zvcl9kZWx0YQ==/event_timestamp_day=2020-06-08/part-00005-bb670850-3001-493b-9000-30b9b41f81e6.c000.snappy.parquet
...

# bXlwcm9qZWN0 is base64 for "myproject"
# bXlwcm9...9kZWx0YQ== is base64 for "myproject/feature_set_for_delta"
```

```python
>>> import pandas as pd
>>> df = pd.read_parquet("/var/folders/67/59_hhx6d5lz0wg__x35g8wbw0000gn/T/junit9193296653579758585/bXlwcm9qZWN0/bXlwcm9qZWN0L2ZlYXR1cmVfc2V0X2Zvcl9kZWx0YQ==")
>>> df.iloc[1]
event_timestamp        2020-06-08 07:47:04.931000
created_timestamp      2020-06-08 07:47:43.915000
ingestion_id           testjob
entity_id_primary      -2101962939
entity_id_secondary    iAkKJCqcry6NTS4
f_BYTES                b'Smts8audYXVOYTw'
f_STRING               gK1chlMFi2Btbdd
f_INT32                -951335350
f_INT64                3659296699309908912
f_DOUBLE               0.697594
f_FLOAT                0.879062
f_BOOL                 True
f_STRING_LIST          [Xx92gwGy2PUQCLl]
f_INT32_LIST           [-1400465827]
f_INT64_LIST           [6251409099521094342]
f_DOUBLE_LIST          [0.11672805668860675]
f_FLOAT_LIST           [0.6432213]
f_BOOL_LIST            [False]
event_timestamp_day    2020-06-08
Name: 1, dtype: object
```

The spark-ingestion module uses several classes copied and adapted from feast-ingestion and feast-storage-connector-redis. To reduce downstream merge conflicts, I've kept those classes as close as possible to the originals. When we approach PR submission into public Feast, we can work on creating shared projects for both ingestion modes.

A few data types have been disabled in the tests because their equality checks fail, although manual inspection suggests the values are correct. This still needs to be debugged.
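The listing above suggests an on-disk layout for each Delta table: a base64-encoded project directory, then a base64-encoded `project/feature_set` directory, then a daily `event_timestamp_day=YYYY-MM-DD` partition. The Python sketch below decodes such a path, assuming that layout holds in general (it is inferred from this one listing, not from the ingestion job's source) and that the standard base64 alphabet is used. `decode_delta_partition` is a hypothetical helper, not part of Feast.

```python
import base64
from pathlib import PurePosixPath


def decode_delta_partition(path: str) -> dict:
    """Decode a Delta part-file path into project, table and partition day.

    Assumed layout (inferred from the test listing, hypothetical):
    <root>/<b64(project)>/<b64(project/feature_set)>/event_timestamp_day=<day>/<file>
    """
    parts = PurePosixPath(path).parts
    # Find the daily partition directory; the two components before it
    # are assumed to be base64-encoded names.
    day_idx = next(
        i for i, p in enumerate(parts) if p.startswith("event_timestamp_day=")
    )
    project = base64.b64decode(parts[day_idx - 2]).decode("utf-8")
    table = base64.b64decode(parts[day_idx - 1]).decode("utf-8")
    # Keep everything after the first "=" as the partition value.
    day = parts[day_idx].split("=", 1)[1]
    return {"project": project, "table": table, "event_timestamp_day": day}
```

Applied to the first part file in the listing, this would return project `myproject`, table `myproject/feature_set_for_delta` and day `2020-06-08`, matching the comments in the commit message.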
The Go packages in the protos are still pointing at gojektech:

```proto
option go_package = "github.com/gojektech/...";
```

They should be:

```proto
option go_package = "github.com/gojek/...";
```
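The fix is a mechanical rename of the organisation segment in every `go_package` option. A minimal Python sketch of that rewrite follows, assuming the options use the single-line, double-quoted form shown above; `fix_go_package` and `fix_tree` are hypothetical helpers, and the proto directory is whatever root you pass in, not a path taken from the repo.

```python
import re
from pathlib import Path
from typing import List

# Matches a go_package option whose import path starts with github.com/gojektech/.
# Only the organisation segment is rewritten; the rest of the path is kept as-is.
_GO_PACKAGE_RE = re.compile(r'(option\s+go_package\s*=\s*")github\.com/gojektech/')


def fix_go_package(proto_source: str) -> str:
    """Return proto_source with gojektech go_package paths pointed at gojek."""
    return _GO_PACKAGE_RE.sub(r"\g<1>github.com/gojek/", proto_source)


def fix_tree(root: str) -> List[str]:
    """Rewrite every *.proto file under root in place; return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.proto"):
        text = path.read_text(encoding="utf-8")
        new_text = fix_go_package(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Any Go code generated from these protos would presumably need to be regenerated afterwards so that its import paths pick up the new `go_package` values.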