Modified fast_array_util.py for faster implementation #2940
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #2940      +/-   ##
==========================================
- Coverage   70.39%   70.02%   -0.38%
==========================================
  Files         222      222
  Lines       16157    16161       +4
==========================================
- Hits        11373    11316      -57
- Misses       4784     4845      +61
```

☔ View full report in Codecov by Sentry.
Hi @andrewfullard @atharva-2001 @jvshields,
*beep* *bop*
Significantly changed benchmarks:
All benchmarks:
Benchmarks that have stayed the same:
| Change | Before [5e2d0bb3] <master> | After [6922871f] | Ratio | Benchmark (Parameter) |
|----------|------------------------------|---------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
| | 3.23±0.4μs | 4.01±0.4μs | ~1.24 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_within_shell |
| | 3.87±0.02ms | 4.54±0.08ms | ~1.17 | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('macroatom') |
| | 3.40±0.4μs | 3.84±0.4μs | ~1.13 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_bad_vpacket |
| | 6.69±0.9μs | 7.37±1μs | ~1.10 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket |
| | 2.68±3μs | 2.26±1μs | ~0.84 | transport_montecarlo_estimators_radfield_estimator_calcs.BenchmarkMontecarloMontecarloNumbaPacket.time_update_line_estimators |
| | 28.2±8μs | 22.0±5μs | ~0.78 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_last_interaction_tracker_list |
| | 2.62±0.5ms | 2.82±0.5ms | 1.08 | transport_montecarlo_single_packet_loop.BenchmarkTransportMontecarloSinglePacketLoop.time_single_packet_loop |
| | 63.8±0.2ms | 67.1±0.03ms | 1.05 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_rpacket_trackers_to_dataframe |
| | 531±100ns | 551±200ns | 1.04 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_pair_creation_opacity_calculation |
| | 38.7±0.04s | 39.1±0.02s | 1.01 | run_tardis.BenchmarkRunTardis.time_run_tardis |
| | 41.5±30μs | 41.8±20μs | 1.01 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_emission |
| | 1.05±0m | 1.05±0m | 1.00 | run_tardis.BenchmarkRunTardis.time_run_tardis_rpacket_tracking |
| | 2.08±0m | 2.08±0m | 1.00 | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_FormalIntegrator_functions |
| | 205±0.4ns | 205±0.07ns | 1.00 | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_intensity_black_body |
| | 1.68±0.02ms | 1.68±0.01ms | 1.00 | transport_montecarlo_main_loop.BenchmarkTransportMontecarloMontecarloMainLoop.time_montecarlo_main_loop |
| | 1.21±0.01μs | 1.20±0μs | 0.99 | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_boundary |
| | 37.9±0.08μs | 37.5±0.2μs | 0.99 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_tracker_list |
| | 531±200ns | 521±100ns | 0.98 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_compton_opacity_calculation |
| | 2.79±0ms | 2.72±0.01ms | 0.98 | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('scatter') |
| | 561±200ns | 541±200ns | 0.96 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_photoabsorption_opacity_calculation |
| | 1.63±0.4μs | 1.57±0.4μs | 0.96 | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_line |
| | 45.4±30μs | 43.7±30μs | 0.96 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_scatter |
| | 758±2ns | 723±0.6ns | 0.95 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_thomson_scatter |
| | 7.67±2μs | 7.25±2μs | 0.95 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_volley |
If you want to see the graph of the results, you can check it here
Hi @andrewfullard,
Hi @andrewfullard @KasukabeDefenceForce,
Please demonstrate the speedup using
@andrewfullard,
```python
# It is currently not possible to use scipy.integrate.cumulative_trapezoid in
# numba. So here is my own implementation.
import numpy as np
from numba import njit, prange
import timeit

from tardis.transport.montecarlo import njit_dict


@njit(**njit_dict)
def numba_cumulative_trapezoid(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate.
    x : numpy.ndarray, dtype float
        The coordinate to integrate along.

    Returns
    -------
    numpy.ndarray, dtype float
        The result of cumulative integration of f along x
    """
    integ = (np.diff(x) * (f[1:] + f[:-1]) / 2.0).cumsum()
    return integ / integ[-1]


@njit(**njit_dict)
def cumulative_integrate_array_by_blocks(f, x, block_references):
    """
    Cumulatively integrate a function over blocks.

    This function cumulatively integrates a function `f` defined at
    locations `x` over blocks given in `block_references`.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate. Shape is (N_freq, N_shells), where
        N_freq is the number of frequency values and N_shells is the number
        of computational shells.
    x : numpy.ndarray, dtype float
        The sample points corresponding to the `f` values. Shape is (N_freq,).
    block_references : numpy.ndarray, dtype int
        The start indices of the blocks to be integrated. Shape is (N_blocks,).

    Returns
    -------
    numpy.ndarray, dtype float
        Array with cumulatively integrated values. Shape is (N_freq, N_shells),
        same as f.
    """
    n_blocks = len(block_references) - 1
    integrated = np.zeros_like(f)
    for i in prange(f.shape[1]):  # columns
        # TODO: Avoid this loop through vectorization of cumulative_trapezoid
        for j in prange(n_blocks):  # blocks of rows
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start + 1 : stop, i] = numba_cumulative_trapezoid(
                f[start:stop, i], x[start:stop]
            )
    return integrated


@njit(**njit_dict)
def numba_cumulative_trapezoid2(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule with a manual loop.
    """
    n = len(f)
    integ = np.zeros(n, dtype=np.float64)
    for i in range(1, n):
        dx = x[i] - x[i - 1]
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * dx / 2.0
    return integ / integ[-1]


@njit(**njit_dict)
def cumulative_integrate_array_by_blocks2(f, x, block_references):
    """
    Cumulatively integrate a function over blocks using a manual loop-based trapezoidal rule.
    """
    n_blocks = len(block_references) - 1
    integrated = np.zeros_like(f)
    for i in prange(f.shape[1]):
        for j in prange(n_blocks):
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start:stop, i] = numba_cumulative_trapezoid2(
                f[start:stop, i], x[start:stop]
            )
    return integrated


N_freq = 100000
N_shells = 1000
N_blocks = 30

x = np.linspace(0, 10, N_freq)
f = np.random.random((N_freq, N_shells))
block_references = np.linspace(0, N_freq, N_blocks + 1, dtype=int)

# Warm up the JIT compilation before timing.
cumulative_integrate_array_by_blocks(f, x, block_references)
cumulative_integrate_array_by_blocks2(f, x, block_references)

original_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks(f, x, block_references)',
    globals=globals(),
    number=10
)
optimized_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks2(f, x, block_references)',
    globals=globals(),
    number=10
)

print(f"Original implementation average time: {original_time / 10:.4f} seconds")
print(f"Optimized implementation average time: {optimized_time / 10:.4f} seconds")

original_result = cumulative_integrate_array_by_blocks(f, x, block_references)
optimized_result = cumulative_integrate_array_by_blocks2(f, x, block_references)

assert np.allclose(original_result, optimized_result), "Results do not match!"
print("Both implementations give the same result.")
print("Original result (first 10 values):", original_result[:10, :10])
print("Optimized result (first 10 values):", optimized_result[:10, :10])
```

Output for running my test script is:

```
Original implementation average time: 3.0688 seconds
Optimized implementation average time: 2.8569 seconds
Both implementations give the same result.
Original result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]
Optimized result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]
```
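As an independent cross-check (not part of the PR), the block-wise integration can be compared against SciPy's reference `cumulative_trapezoid` outside of numba. This is a minimal sketch under the same normalization convention as the PR's functions (each block's last value scaled to 1); the name `reference_block_integrate` is hypothetical, not from the codebase:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid


def reference_block_integrate(f, x, block_references):
    # Pure NumPy/SciPy reference: cumulatively integrate each block of each
    # column, normalizing so the final value of every block equals 1, with
    # block start rows left at zero (matching the numba implementations).
    integrated = np.zeros_like(f)
    for j in range(len(block_references) - 1):
        start, stop = block_references[j], block_references[j + 1]
        seg = cumulative_trapezoid(f[start:stop], x[start:stop], axis=0)
        integrated[start + 1 : stop] = seg / seg[-1]
    return integrated


rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 1000)
f = rng.random((1000, 4))
block_references = np.array([0, 250, 500, 1000])

ref = reference_block_integrate(f, x, block_references)
```

With positive random `f`, the last row of each block should come out as exactly 1 after normalization, which makes it easy to sanity-check either numba version against `ref` via `np.allclose`.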
So the speedup is ~7% then. I think you should revisit your solution.
Thanks for your contribution.
📝 Description
Type: 🪲 bugfix | 🚀 feature
`numba_cumulative_trapezoid` method modified to integrate using a manual for loop.
Fixes #2757
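To illustrate the change described above in isolation, here is a pure-NumPy sketch contrasting the original array-temporary approach with the manual-loop approach (the numba decorator is omitted and the function names `cumtrapz_arrays`/`cumtrapz_loop` are illustrative, not from the codebase):

```python
import numpy as np


def cumtrapz_arrays(f, x):
    # Original approach: builds temporary arrays via np.diff and cumsum.
    # Returns len(f) - 1 values (no leading zero element).
    integ = (np.diff(x) * (f[1:] + f[:-1]) / 2.0).cumsum()
    return integ / integ[-1]


def cumtrapz_loop(f, x):
    # Modified approach: one manual loop, no temporary arrays.
    # Returns len(f) values, with the leading zero element kept.
    n = len(f)
    integ = np.zeros(n)
    for i in range(1, n):
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2.0
    return integ / integ[-1]


x = np.linspace(0.0, 1.0, 100)
f = x ** 2
assert np.allclose(cumtrapz_arrays(f, x), cumtrapz_loop(f, x)[1:])
```

Both compute the same trapezoid increments, so the results agree; the loop form avoids the `np.diff` and slice temporaries, which is where the PR expects numba to gain time.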
📌 Resources
Examples, notebooks, and links to useful references.
🚦 Testing
How did you test these changes?
I have tested on random data, comparing the original and the optimized implementations. Here is the snippet:


The snippets show that the outputs of both methods match exactly, even at high precision, and that the optimized implementation is more than ~50% faster.

☑️ Checklist
build_docs label