Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Rollback inflights when using Spark [Streaming] write #660

Merged
merged 1 commit into from
May 2, 2019

Conversation

bvaradar
Copy link
Contributor

@bvaradar bvaradar commented May 2, 2019

@n3nash @vinothchandar @ovj : Don't see any reason why inflight rollback should be disabled

@bvaradar bvaradar requested review from vinothchandar and n3nash May 2, 2019 00:22
@@ -149,7 +149,7 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
// override above with Hoodie configs specified as options.
.withProps(parameters).build();

return new HoodieWriteClient<>(jssc, writeConfig);
return new HoodieWriteClient<>(jssc, writeConfig, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a consumer wants to perform write() and read() operations in parallel, we have to turn off rollback for one of the operations. Can we make this configurable for now and then address how multiple write clients created for the same dataset should be handled in the longer term later..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash : IIRC, the DataSource read() path does not instantiate HoodieWriteClient. If that is the case, then the change would be consistent between DeltaStreamer and Spark Source. Let me know if this is ok ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry my bad, I meant write() and clean() operation..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the explicit constructor is better here.. user should not have to care about this flag..

Copy link
Contributor

@n3nash n3nash May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with explicit constructor, I'm merely saying not hardcoded to true, instead default=true and driven by a config, in that case if someone is running 2 things in parallel, they can turn one off to false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized, there is no way to trigger explicit clean from data sources I guess, so this is fine..

@n3nash n3nash self-requested a review May 2, 2019 19:47
@bvaradar
Copy link
Contributor Author

bvaradar commented May 2, 2019

@n3nash @vinothchandar : when working on automatic compactor support for delta-streamer and Data Source, I will revisit the defaults. For now, merging it.

@bvaradar bvaradar merged commit 978470a into apache:master May 2, 2019
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants