ENH: Implement Multivariate Rejection Sampling (MRS) #738

Open
wants to merge 18 commits into
base: develop

Lucas-Prates
Contributor

@Lucas-Prates Lucas-Prates commented Nov 28, 2024

Pull request type

  • Code changes (bugfix, features)

Checklist

  • Unit tests for the changes have been added
  • Integration tests for the changes have been added
  • Docs have been reviewed and added / updated
  • Lint (black rocketpy/ tests/) has passed locally
  • All tests (pytest tests -m slow --runslow) have passed locally
  • CHANGELOG.md has been updated (if relevant)
  • RST documentation
  • Monte Carlo comparison Feature

New behavior

This PR implements the MRS requested in #162 and described in the RocketPy paper.

Breaking change

  • Yes (perhaps)

Additional information

@Lucas-Prates Lucas-Prates requested a review from a team as a code owner November 28, 2024 21:36
@Lucas-Prates Lucas-Prates added Enhancement New feature or request, including adjustments in current codes Monte Carlo Monte Carlo and related contents labels Nov 28, 2024
@Lucas-Prates Lucas-Prates linked an issue Nov 28, 2024 that may be closed by this pull request
@Lucas-Prates Lucas-Prates marked this pull request as draft November 28, 2024 21:38

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 81.77778% with 41 lines in your changes missing coverage. Please review.

Project coverage is 79.18%. Comparing base (4df0b38) to head (d335d26).
Report is 1 commits behind head on develop.

Files with missing lines | Patch % | Lines
rocketpy/plots/monte_carlo_plots.py | 77.38% | 19 Missing ⚠️
rocketpy/prints/monte_carlo_prints.py | 57.69% | 11 Missing ⚠️
...ketpy/simulation/multivariate_rejection_sampler.py | 89.81% | 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #738      +/-   ##
===========================================
+ Coverage    79.11%   79.18%   +0.06%     
===========================================
  Files           96       97       +1     
  Lines        11575    11798     +223     
===========================================
+ Hits          9158     9342     +184     
- Misses        2417     2456      +39     


@Lucas-Prates
Contributor Author

This PR is ready for a first "Design Review." I would like your opinion on whether this implementation matches how you think the user should use the MRS.

Although an implementation as a function seems natural, I implemented it as a class because:

  1. the function would be humongous;
  2. if the user wants to resample several times, the Monte Carlo data is only read once from the hard drive.

It currently works as follows:

  1. Input: monte carlo filepath prefix, mrs filepath prefix, distribution dictionary;
  2. Load input and output data from a monte carlo simulation into memory (python objects - lists of jsons);
  3. To avoid having to read data twice, while loading, precompute some important properties required in the sampler algorithm;
  4. Select and save iteratively accepted samples;
  5. Output: files are saved in the same "scheme" as the MonteCarlo simulation.
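
The selection step in the list above can be illustrated with a minimal, single-variable sketch of rejection sampling. This is not the code in this PR: `normal_pdf` and `rejection_resample` are hypothetical names, the data lives in memory instead of files, and the bound on the density ratio is assumed to be the largest ratio observed among the loaded samples.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution, used for both original and target."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def rejection_resample(samples, original_pdf, target_pdf, rng):
    """Keep each sample with probability proportional to target/original."""
    # Density ratio for every stored sample (precomputed once, as in step 3).
    ratios = [target_pdf(x) / original_pdf(x) for x in samples]
    # Assumption: the supremum of the ratio is approximated by its empirical max.
    sup_ratio = max(ratios)
    return [x for x, r in zip(samples, ratios) if rng.random() < r / sup_ratio]


rng = random.Random(0)
# Stand-in for the inputs of an existing Monte Carlo run: N(0, 2).
original = [rng.gauss(0.0, 2.0) for _ in range(20000)]
# Resample so that the accepted inputs follow a narrower, shifted N(1, 1).
accepted = rejection_resample(
    original,
    original_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    target_pdf=lambda x: normal_pdf(x, 1.0, 1.0),
    rng=rng,
)
accepted_mean = sum(accepted) / len(accepted)
```

With several independent variables, the ratio for a sample would be the product of the per-variable ratios. No new simulation is run: the accepted subset simply reuses the outputs already stored for those inputs.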

I provided a quick and dirty notebook, which will be removed, just to show how the class is being used at the moment.

Member

@Gui-FernandesBR Gui-FernandesBR left a comment

I don't have much time so I'll be short

  • The class dies after you sample the data once, which makes having a class pointless.
  • Instead, I'd remove the sample dictionary from the arguments.
  • First read the data during initialization. Then use a function to set the variables that are allowed to vary. This way you can already anticipate which variables may be varied.
  • We want almost instant results for a MonteCarlo simulation after using MRS.
  • Another thing is that the user must supply the original pdf. Could we possibly estimate this from the data? (imagine 70k)
  • Finally, plotting is crucial for MRS, or even tables. We'd love to see more on that later.
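
One way to read the "read once, sample many times" suggestion is sketched below. The class name `ReusableSampler`, its `sample` method, and the `weight_fn` argument are all hypothetical and are not the interface in this PR; the input is assumed to be one JSON object per line.

```python
import json
import random


class ReusableSampler:
    """Hypothetical sketch: parse the stored inputs once, resample many times."""

    def __init__(self, input_lines):
        # Each line is assumed to hold one JSON object of simulation inputs.
        self.records = [json.loads(line) for line in input_lines]
        # Knowing the variable names up front lets us anticipate what may vary.
        self.variables = sorted(self.records[0]) if self.records else []

    def sample(self, weight_fn, seed=None):
        """Return the records accepted under weight_fn (record -> value in [0, 1])."""
        rng = random.Random(seed)
        return [r for r in self.records if rng.random() < weight_fn(r)]


lines = [json.dumps({"mass": 10.0 + i}) for i in range(5)]
sampler = ReusableSampler(lines)           # data is parsed only here
everything = sampler.sample(lambda r: 1.0)  # first resampling
nothing = sampler.sample(lambda r: 0.0)     # second resampling, no reload
```

Because the parsed records stay on the instance, repeated calls to `sample` with different weights cost no additional file reads.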

@Lucas-Prates
Contributor Author

Lucas-Prates commented Feb 24, 2025

This PR is ready for review again. Changes from last time:

  1. addressed some of the previous suggestions of @phmbressan;
  2. added a comparison_info method that prints a statistical summary comparing the results of two monte carlo simulations, similar to the method used in the regular monte carlo class;
  3. added a comparison_plots method that plots boxplot and histogram comparisons of the results of two monte carlo simulations;
  4. added a compare_ellipses method that plots the apogee and landing point ellipses to compare the results of two monte carlo simulations;
  5. added the mrs.rst file to the documentation explaining what the MRS is and how to use it;
  6. added the MultivariateRejectionSampler class .rst documentation;
  7. removed the test_mrs.ipynb notebook since the .rst file does the same in a much clearer way;
  8. on my pc, the MRS is 500x faster than running a monte carlo simulation of the equivalent sample size.

For review, I recommend mostly checking the .rst documentation. Here are some previews of the comparisons if you do not have the time to compile the html:

Print comparison:
image

Plot comparison:
image

Ellipses comparison:
image
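
For readers without the images, the kind of summary a print comparison reports can be sketched as follows. This is only an illustration with a hypothetical `compare_summary` helper and made-up apogee values; it is not the `comparison_info` method added in this PR.

```python
import statistics


def compare_summary(original, other):
    """Mean and standard deviation of two samples, plus relative mean change."""
    rows = {}
    for name, values in (("original", original), ("other", other)):
        rows[name] = {
            "mean": statistics.fmean(values),
            "std": statistics.stdev(values),  # sample standard deviation
        }
    base = rows["original"]["mean"]
    rows["relative_mean_change"] = (rows["other"]["mean"] - base) / base
    return rows


# Illustrative values only, standing in for one output of two simulations.
summary = compare_summary([100.0, 110.0, 120.0], [105.0, 115.0, 125.0])
```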

@Lucas-Prates
Contributor Author

A second comment: I made the design choice to implement the comparison methods in the MonteCarlo class instead of the MultivariateRejectionSampler. Here are the most relevant arguments:

  1. It makes sense to compare monte carlo simulations even outside of MRS, i.e. even if one is not a sub-sample of the other. Hence, implementing this inside the MRS would make this usage extremely odd;
  2. It keeps the MultivariateRejectionSampler class implementation really simple and with only one job: to sample.

One option is, of course, to create another class for that job, but it did not seem to be the best option.

@Gui-FernandesBR Gui-FernandesBR marked this pull request as ready for review February 25, 2025 18:44
Member

@Gui-FernandesBR Gui-FernandesBR left a comment

LGTM.

Great implementation, @Lucas-Prates !

I would kindly recommend adding unit tests for some of the methods you've developed, just to make sure we cover some of the new code lines.

@Gui-FernandesBR
Member

Also, please update the CHANGELOG file!

@Gui-FernandesBR Gui-FernandesBR changed the title ENH: Implementing Multivariate Rejection Sampling (MRS) in RocketPy ENH: Implement Multivariate Rejection Sampling (MRS) Feb 25, 2025
@Lucas-Prates
Contributor Author

I have addressed the suggestions made by @Gui-FernandesBR, implemented unit and integration tests for the MRS, and updated the CHANGELOG.

Member

@Gui-FernandesBR Gui-FernandesBR left a comment

LGTM

@Gui-FernandesBR
Member

@Lucas-Prates good work. Please squash and merge at your earliest convenience.

Member

@MateusStano MateusStano left a comment

This is really good overall! Great job

Comment on lines 373 to 389
plt.scatter(
    original_apogee_x,
    original_apogee_y,
    s=5,
    marker="^",
    color="green",
    label="Original Apogee",
)
plt.scatter(
    original_impact_x,
    original_impact_y,
    s=5,
    marker="v",
    color="blue",
    label="Original Landing Point",
)

Member

Improve the colors of the points/ellipses. Using opposite colors is probably best.

Member

There's no such thing as "opposite colors", but matplotlib does offer a fairly good guide to selecting colormaps: https://matplotlib.org/stable/users/explain/colors/colormaps.html

I suggest blue/orange or blue/red.

Ideally the user should be able to select their specific color... But I understand the current limitations.

Member

I guess the right term is "complementary", not "opposite". Anyway, I meant something like this:
image

Contributor Author

I will then use a "tetradic" color combination I found on this site. Here is the palette I got, just for reference:

image
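
For reference, complementary and tetradic combinations can be derived by rotating a color's hue on the HSV wheel. The sketch below uses only the standard library; `rotate_hue` and `tetradic` are hypothetical helpers, the "square" variant (90 degree steps) is assumed, and the starting color is arbitrary rather than the palette shown in the image.

```python
import colorsys


def rotate_hue(hex_color, degrees):
    """Rotate the hue of an '#rrggbb' color by the given angle on the HSV wheel."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    rotated = colorsys.hsv_to_rgb((h + degrees / 360) % 1.0, s, v)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rotated))


def tetradic(hex_color):
    """Four colors spaced 90 degrees apart, starting from hex_color."""
    return [rotate_hue(hex_color, angle) for angle in (0, 90, 180, 270)]


complementary = rotate_hue("#0000ff", 180)  # the color opposite to pure blue
palette = tetradic("#0000ff")
```

The third entry of a square tetradic palette is always the complementary color of the first, so one pair can be used for the original simulation and the other pair for the resampled one.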

Contributor Author

These are the tetradic colors for the ellipses plot:
image

@Lucas-Prates
Contributor Author

This PR is ready again for review. Since last time:

1 - addressed the points in Stano's review about documentation and color;
2 - removed some input checks which I think were not that useful but were a bit tricky to do correctly;
3 - modified the flatten_dict function to better handle variables in inner levels.

The most important point is 3 since it might introduce a breaking change.
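
To make point 3 concrete, here is a generic sketch of flattening a nested dictionary so that variables in inner levels get their own keys. `flatten_nested` is a hypothetical helper and the dotted key naming is an assumption; it does not reproduce the behavior of the `flatten_dict` function in rocketpy/tools.py.

```python
def flatten_nested(mapping, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining key paths with sep."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse so that arbitrarily deep variables are reached.
            flat.update(flatten_nested(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


nested = {"mass": 10.0, "motor": {"thrust": 2000.0, "nozzle": {"radius": 0.03}}}
flat = flatten_nested(nested)
```

Any change in how inner keys are named alters the keys that downstream code sees, which is why a modification of this kind can be breaking for existing callers.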

Member

@Gui-FernandesBR Gui-FernandesBR left a comment

All looking good to me, great implementation.
I don't see the changes to flatten_dict as a problem.
Nobody is using that function right now.

Waiting for @MateusStano 's final comments so we can proceed.

@Gui-FernandesBR Gui-FernandesBR requested a review from Copilot April 7, 2025 13:56

@Copilot Copilot AI left a comment

Copilot reviewed 14 out of 21 changed files in this pull request and generated 1 comment.

Files not reviewed (7)
  • .vscode/settings.json: Language not supported
  • docs/notebooks/monte_carlo_analysis/monte_carlo_analysis_outputs/mrs.outputs.txt: Language not supported
  • docs/reference/classes/MultivariateRejectionSampler.rst: Language not supported
  • docs/reference/index.rst: Language not supported
  • docs/user/index.rst: Language not supported
  • docs/user/mrs.rst: Language not supported
  • docs/user/sensitivity.rst: Language not supported
Comments suppressed due to low confidence (1)

rocketpy/tools.py:595

  • [nitpick] Consider renaming 'flatted_dict' to 'flattened_dict' for clarity and consistency with common terminology.
flatted_dict = {}

other_impact_x = np.array([])
other_impact_y = np.array([])

if len(original_apogee_x) == 0 and len(original_impact_x) == 0:
Member

Suggested change
if len(original_apogee_x) == 0 and len(original_impact_x) == 0:
if len(original_apogee_x) == 0 and len(original_impact_x) == 0: # pragma no cover

if image is not None:
    try:
        img = imageio.imread(image)
    except FileNotFoundError as e:
Member

Suggested change
except FileNotFoundError as e:
except FileNotFoundError as e: # pragma no cover

Labels
Enhancement New feature or request, including adjustments in current codes Monte Carlo Monte Carlo and related contents
Projects
Status: Next Version
Development

Successfully merging this pull request may close these issues.

ENH: Implement MRS method on RocketPy!
4 participants