-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Rollback inflights when using Spark [Streaming] write #660
Conversation
@@ -149,7 +149,7 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String | |||
// override above with Hoodie configs specified as options. | |||
.withProps(parameters).build(); | |||
|
|||
return new HoodieWriteClient<>(jssc, writeConfig); | |||
return new HoodieWriteClient<>(jssc, writeConfig, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a consumer wants to perform write() and read() operations in parallel, we have to turn off rollback for one of the operations. Can we make this configurable for now and then address how multiple write clients created for the same dataset should be handled in the longer term later..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3nash : IIRC, the DataSource read() path does not instantiate HoodieWriteClient. If that is the case, then the change would be consistent between DeltaStreamer and Spark Source. Let me know if this is ok ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry my bad, I meant write() and clean() operation..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the explicit constructor is better here.. user should not have to care about this flag..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with explicit constructor, I'm merely saying not hardcoded to true, instead default=true and driven by a config, in that case if someone is running 2 things in parallel, they can turn one off to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized, there is no way to trigger explicit clean from data sources I guess, so this is fine..
@n3nash @vinothchandar : when working on automatic compactor support for delta-streamer and Data Source, I will revisit the defaults. For now, merging it. |
@n3nash @vinothchandar @ovj : Don't see any reason why inflight rollback should be disabled