Batch kernels for backward pass of Preprocessing #3

Open — wants to merge 36 commits into base branch mlsys/forward_preprocess_batch
Conversation

@sandeepnmenon (Collaborator) commented on Apr 28, 2024:

This is a follow-up to PR #2 and should be merged after it.

Changes

  1. New batched preprocessCUDA kernels for the backward pass
  2. New package API: preprocess_gaussians_backward_batched
  3. Test file that runs and compares the batched and non-batched preprocess forward and backward kernels (see the sketch after this list)
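A minimal sketch of the kind of comparison such a test can perform; `compare_outputs` and the dict-of-tensors layout are illustrative assumptions, not the test file's actual structure:

```python
import torch

def compare_outputs(reference, batched, atol=1e-5):
    # Hypothetical helper: both code paths should produce numerically
    # matching tensors (means2D, conic_opacity, rgb, depths, radii, ...).
    for name, ref in reference.items():
        out = batched[name]
        assert ref.shape == out.shape, f"shape mismatch in {name}"
        assert torch.allclose(ref.float(), out.float(), atol=atol), \
            f"value mismatch in {name}"
```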

Results

Tests were run on a V100 with:

num_gaussians = 1000000
num_batches = 64
SH_ACTIVE_DEGREE = 3

| Function | Forward | Backward |
| --- | --- | --- |
| run_batched_gaussian_rasterizer | 95.2484 ms | 84.6223 ms |
| run_batched_gaussian_rasterizer_batch_processing | 28.9444 ms | 4.9838 ms |

The batch-processing path is roughly 3.3x faster on the forward pass and roughly 17x faster on the backward pass.
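These numbers are consistent with standard CUDA-event timing (the diff below records a start_backward_event); a hedged sketch, where time_cuda is a hypothetical helper rather than code from this PR:

```python
import torch

def time_cuda(fn, *args, **kwargs):
    # Standard CUDA-event timing: sync, record, run, record, sync.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    result = fn(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    return result, start.elapsed_time(end)  # milliseconds

# Usage (assumed signature):
# _, ms = time_cuda(run_batched_gaussian_rasterizer_batch_processing, inputs)
# print(f"Time taken by run_batched_gaussian_rasterizer_batch_processing: {ms:.4f} ms")
```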

sandeepnmenon and others added 29 commits on April 27, 2024, including (titles truncated):

…-gaussian-rasterization into mlsys/batched_preprocess
…o/diff-gaussian-rasterization into mlsys/batched_preprocess
@sandeepnmenon requested a review from @prapti19 on April 28, 2024.
@prapti19 (Collaborator) left a comment:

Tested everything, works great!

@sandeepnmenon changed the title from “[WIP] Batch kernels for backward pass of Preprocessing” to “Batch kernels for backward pass of Preprocessing” on May 1, 2024.
```diff
@@ -17,21 +17,21 @@ namespace cg = cooperative_groups;

 // Backward pass for conversion of spherical harmonics to RGB for
 // each Gaussian.
-__device__ void computeColorFromSH(int idx, int deg, int max_coeffs, const glm::vec3* means, glm::vec3 campos, const float* shs, const bool* clamped, const glm::vec3* dL_dcolor, glm::vec3* dL_dmeans, glm::vec3* dL_dshs)
+__device__ void computeColorFromSH(int point_idx, int result_idx, int deg, int max_coeffs, const glm::vec3* means, glm::vec3 campos, const float* shs, const bool* clamped, const glm::vec3* dL_dcolor, glm::vec3* dL_dmeans, glm::vec3* dL_dshs)
```
Collaborator:

We should make sure your two pull requests will not conflict with each other when we merge both of them. Maybe we could do this together in tomorrow's meeting.

Collaborator (Author):

They are separate PRs. We'll resolve merge conflicts after the forward-pass PR is merged in.

```diff
@@ -472,8 +472,8 @@ int CudaRasterizer::Rasterizer::preprocessForwardBatches(
 // In sep_rendering==True case, we will compute tiles_touched in the renderForward.
 // TODO: remove it later by modifying FORWARD::preprocess when we deprecate sep_rendering==False case
 uint32_t* tiles_touched_temp_buffer;
 CHECK_CUDA(cudaMalloc(&tiles_touched_temp_buffer, P * sizeof(uint32_t)), debug);
 CHECK_CUDA(cudaMemset(tiles_touched_temp_buffer, 0, P * sizeof(uint32_t)), debug);
```
Collaborator:

Same as the comments in the other PR: we could delete these memory allocations.

Collaborator (Author):

Created issue #4.

```python
torch.cuda.synchronize()
start_backward_event.record()

loss = compute_dummy_loss(means3D, scales, rotations, shs, opacity)
```
Collaborator:

This might be wrong. To test the correctness of your batched implementation, we should indeed execute the backward kernels. However, the dummy loss only uses the original parameters as input, so loss.backward() will never invoke the backward kernel of preprocess. You should change it to:

```python
loss = compute_dummy_loss(batched_means2D, batched_conic_opacity, batched_rgb, batched_depths, batched_radii)
```
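To make the point concrete, here is a hedged sketch of the suggested wiring; compute_dummy_loss and the batched_* tensors come from the comment above, and the scalar reduction below is an assumed stand-in for the real helper:

```python
def compute_dummy_loss(*outputs):
    # Gradients only flow through tensors that appear in the loss, so summing
    # the preprocess *outputs* (rather than the original input parameters)
    # forces loss.backward() to traverse, and therefore exercise, the batched
    # preprocess backward kernels. Integer outputs (e.g. radii) carry no grad.
    return sum(o.sum() for o in outputs if o.dtype.is_floating_point)

loss = compute_dummy_loss(batched_means2D, batched_conic_opacity,
                          batched_rgb, batched_depths, batched_radii)
loss.backward()
```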
