Apply Optimize Java 8 Streams refactoring #786

Closed
wants to merge 3 commits into from

@khatchad khatchad commented Aug 20, 2018

Apply Optimize Java 8 Streams Refactoring

Description

This is a semantics-preserving automated refactoring that attempts to optimize code using Java 8 streams when it is safe and possibly advantageous to do so. It may make certain streams parallel, make others sequential, and unorder streams where necessary. The tool does not add new functionality; it only rearranges existing code in an effort to improve its performance. Your program's results should be the same before and after the refactoring.
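To give a feel for the kind of rewrite described above, here is a minimal sketch. It is not taken from this pull request's diff; the class and method names are hypothetical, and it only assumes the standard `java.util.stream` API. It shows a sequential pipeline, the same pipeline made parallel, and a pipeline that drops its ordering constraint with `unordered()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; not code from the refactored project.
class StreamRefactoringSketch {

    // Before: a sequential pipeline.
    static int sumOfSquaresSequential(List<Integer> values) {
        return values.stream()
                .mapToInt(v -> v * v)
                .sum();
    }

    // After: the same pipeline run in parallel. The lambda is stateless and
    // integer addition is associative, so the result is unchanged.
    static int sumOfSquaresParallel(List<Integer> values) {
        return values.parallelStream()
                .mapToInt(v -> v * v)
                .sum();
    }

    // Unordering: the result is a Set, so encounter order does not matter and
    // distinct() is free to skip the work of preserving it.
    static Set<Integer> distinctUnordered(List<Integer> values) {
        return values.parallelStream()
                .unordered()
                .distinct()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 2, 1);
        System.out.println(sumOfSquaresSequential(values));
        System.out.println(sumOfSquaresParallel(values));
        System.out.println(distinctUnordered(values));
    }
}
```

Whether the parallel version is actually faster depends on the data size and the cost of each operation, which is why the change is described as only possibly advantageous.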

Details

  • We are evaluating a research prototype automated refactoring Eclipse plug-in called Optimize Java 8 Streams Refactoring. We have applied the tool to your project in the hopes of receiving feedback.
  • The approach is very conservative. That may mean that not all changes that can be made were made. Please feel free to continue the refactoring manually if you wish.
  • The source code should be semantically equivalent to the original.
  • We manually reverted some of the tool's changes because they parallelized logging statements.
  • The tool does not assess overhead vs. the amount of data processed; it only performs the refactoring when it is safe and possibly advantageous to do so. However, we ran several performance tests on the refactored code (see below).
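As an illustration of why parallelized logging is a poor fit, consider the sketch below. It assumes the concern is the order of log output, which is our reading rather than something measured here; the class and method names are hypothetical and the "log" is just a list of strings. A sequential `forEach` on a list records entries in encounter order, while a parallel `forEach` records the same entries in no guaranteed order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; the "log" is a plain list of strings.
class LoggingOrderSketch {

    // Sequential stream over a list: entries are recorded in encounter order.
    static List<String> logSequential(List<String> items) {
        List<String> log = new ArrayList<>();
        items.stream().forEach(item -> log.add("processing " + item));
        return log;
    }

    // Parallel stream: the same entries are recorded, but forEach makes no
    // ordering guarantee, so the sequence can differ from run to run.
    // A synchronized list is used because several threads may add at once.
    static List<String> logParallel(List<String> items) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        items.parallelStream().forEach(item -> log.add("processing " + item));
        return log;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        System.out.println(logSequential(items));
        System.out.println(logParallel(items));
    }
}
```

Both methods produce the same set of entries; only the sequential one is guaranteed to list them in the original order.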

Performance Evaluation

We did not find any dedicated performance tests in your repository, but we did find many unit tests. We converted the unit tests involved in our transformation to performance tests by integrating the JMH framework. Because unit tests are normally meant to be quick, we also increased the dataset size to better assess the increased parallelism. The results show an average speedup of ~3.92 (see below). Column avgt is the average run time in seconds per operation, with the standard deviation in parentheses:

| benchmark | original avgt (s/op) | refactored avgt (s/op) | speedup |
| --- | --- | --- | --- |
| com.iluwatar.abstractdocument.AbstractDocumentTest.shouldRetrieveChildren | 0.129 (0.007) | 0.016 (0.000) | 8.01 |
| com.iluwatar.abstractdocument.DomainTest.shouldConstructCar | 0.106 (0.002) | 0.014 (0.000) | 7.81 |
| com.iluwatar.dao.ExistingCustomer.addingShouldResultInFailureAndNotAffectExistingCustomers | 0.014 (0.000) | 0.004 (0.000) | 3.78 |
| com.iluwatar.dao.ExistingCustomer.deletionShouldBeSuccessAndCustomerShouldBeNonAccessible | 0.013 (0.000) | 0.003 (0.000) | 3.82 |
| com.iluwatar.dao.NonExistingCustomer.addingShouldResultInSuccess | 0.027 (0.000) | 0.005 (0.000) | 5.08 |
| com.iluwatar.dao.NonExistingCustomer.deletionShouldBeFailureAndNotAffectExistingCustomers | 0.014 (0.000) | 0.004 (0.000) | 3.90 |
| com.iluwatar.specification.app.AppTest.test | 12.666 (5.961) | 12.258 (1.880) | 1.03 |
| com.iluwatar.threadpool.CoffeeMakingTaskTest.testIdGeneration | 0.681 (0.065) | 0.469 (0.009) | 1.45 |
| com.iluwatar.threadpool.PotatoPeelingTaskTest.testIdGeneration | 0.676 (0.062) | 0.465 (0.008) | 1.45 |

In most cases, we expanded the dataset to operate on 1,000,000 objects. FYI, you may find the patch we used to convert the unit tests to performance tests here. We had to move some things around to make JMH work correctly (e.g., converting inner classes to outer ones). We also moved some of the setup code to its own method for profiling purposes.

Feedback

Thank you for your help in this evaluation! Any feedback you can provide would be very helpful. In particular, we are interested in whether each of the proposed changes is helpful.

@iluwatar
Owner

Thanks for the pull request. I can see parallelStream has been used to replace stream to improve performance. However, performance is not the point of the examples. The point is to show correct code implemented with industry best practices. In my opinion, the sequential stream should be the default choice, and parallelStream should be considered only if there are performance problems.
Good job by the tool, however. Seems to be working fine! 👍

@khatchad
Author

khatchad commented Oct 17, 2018 via email
