hw04 #25

Open. Wants to merge 1 commit into base: main

@Sduby22 Sduby22 commented Feb 12, 2022

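Platform

CPU: i7-4790k

OS: Windows 10 WSL2

For reasons I have not figured out, only MinGW gcc and gcc on WSL2 managed to vectorize the loops; MSVC and clang both warned that they could not vectorize them.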

Optimization results

  1. Unoptimized
Initial energy: -13.414000
Final energy: -13.356842
Time elapsed: 1946 ms
  2. Replace the global-namespace sqrt() with std::sqrt()
    • With a float argument, this selects the float overload instead of the double version
Initial energy: -13.414000
Final energy: -13.356841
Time elapsed: 1571 ms
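A minimal sketch of why the std::sqrt() change can matter, assuming the star data is stored as float. The C++ standard guarantees that std::sqrt has a float overload, whereas the C function sqrt takes and returns double, so depending on the headers and standard library an unqualified call may promote the computation to double. The helper name inv_dist_cubed is hypothetical and not taken from the homework code.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// std::sqrt is overloaded: a float argument yields a float result,
// so the arithmetic stays in single precision.
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "float overload selected for a float argument");
static_assert(std::is_same<decltype(std::sqrt(2.0)), double>::value,
              "double overload selected for a double argument");

// Hypothetical helper: 1 / d2^(3/2), computed entirely in float.
inline float inv_dist_cubed(float d2) {
    assert(d2 > 0.0f);
    return 1.0f / (d2 * std::sqrt(d2));
}
```

Staying in float also helps later vectorization, since an AVX register holds eight floats but only four doubles.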
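  3. Use SOA + SIMD optimization
    • With this layout, the same attribute of different objects is stored contiguously, so sequential access in a loop is easy to optimize with the AVX instruction set.

Steps:

  • Change each float in the Star struct to a vector<float>
  • Change the iterator-based loops to index-based access
  • Add the OpenMP package in CMakeLists, and add the -ffast-math -march=native options to enable the AVX instruction set
  • Use #pragma omp simd so that possible pointer aliasing does not block the optimization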
Initial energy: -13.414010
Final energy: -13.356913
Time elapsed: 234 ms
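The SOA + SIMD steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the struct name Stars, the function name step_velocities, and the values of G, eps and dt are placeholders, and the inner loop accumulates into local variables with a reduction clause, which is one possible way to write it. The pragma only takes effect when the compiler is given an OpenMP flag such as -fopenmp or -fopenmp-simd; otherwise it is ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical SoA layout: one contiguous array per attribute,
// instead of an array of per-star structs.
struct Stars {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};

// Assumed placeholder constants.
constexpr float G = 0.001f;
constexpr float eps = 0.001f;
constexpr float dt = 0.01f;

// Update every star's velocity from the pull of all stars.
void step_velocities(Stars &s) {
    const std::size_t n = s.px.size();
    assert(s.py.size() == n && s.pz.size() == n && s.mass.size() == n);
    assert(s.vx.size() == n && s.vy.size() == n && s.vz.size() == n);
    for (std::size_t i = 0; i < n; i++) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        // Tell the compiler the iterations are safe to vectorize.
        #pragma omp simd reduction(+ : ax, ay, az)
        for (std::size_t j = 0; j < n; j++) {
            float dx = s.px[j] - s.px[i];
            float dy = s.py[j] - s.py[i];
            float dz = s.pz[j] - s.pz[i];
            float d2 = dx * dx + dy * dy + dz * dz + eps * eps;
            d2 *= std::sqrt(d2);  // softened distance cubed
            float f = s.mass[j] * G * dt / d2;
            ax += dx * f;
            ay += dy * f;
            az += dz * f;
        }
        s.vx[i] += ax;
        s.vy[i] += ay;
        s.vz[i] += az;
    }
}
```

Because the inner loop reads px[j], py[j], pz[j] and mass[j] at consecutive indices, each attribute can be loaded as a full vector register without gather instructions.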
  4. Hoist loop invariants out of the loop to avoid recomputing them

In step(), the invariants are G*dt and eps*eps; hoisting them into global variables made it faster.

Initial energy: -13.414012
Final energy: -13.356912
Time elapsed: 217 ms
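As a sketch of the hoisting described above, assuming G, eps and dt are compile-time constants (the values here are placeholders, and the names eps2 and kick are hypothetical), the products can be computed once at namespace scope and reused in the inner loop:

```cpp
#include <cassert>
#include <cmath>

// Assumed placeholder constants.
constexpr float G = 0.001f;
constexpr float eps = 0.001f;
constexpr float dt = 0.01f;

// Loop invariants, evaluated once instead of on every inner iteration.
constexpr float Gdt = G * dt;
constexpr float eps2 = eps * eps;

// Hypothetical helper: velocity change along one axis, given the offset d
// along that axis, the squared distance r2 and the mass of the other star.
inline float kick(float d, float r2, float mass) {
    assert(r2 >= 0.0f);
    float d2 = r2 + eps2;  // softened squared distance
    d2 *= std::sqrt(d2);   // raised to the power 3/2
    return d * mass * Gdt / d2;
}
```

An optimizing compiler will often fold such constant products on its own, which may be why the measured gain from this step is small.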
  5. Apply some algebraic simplifications
float x = stars.mass[j] * Gdt / d2;
stars.vx[i] += dx * x;
stars.vy[i] += dy * x;
stars.vz[i] += dz * x;
Initial energy: -13.414010
Final energy: -13.356913
Time elapsed: 201 ms
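To illustrate the simplification, here is a hedged before/after comparison. The "before" form is an assumption about what the unsimplified update looked like; the function names and the nearly_equal helper are hypothetical. The factored form performs one division per star pair instead of three.

```cpp
#include <cassert>
#include <cmath>

// Assumed "before" form: each component repeats the multiply/divide chain.
inline void kick_naive(float dx, float dy, float dz, float mass, float Gdt,
                       float d2, float &vx, float &vy, float &vz) {
    assert(d2 > 0.0f);
    vx += dx * mass * Gdt / d2;
    vy += dy * mass * Gdt / d2;
    vz += dz * mass * Gdt / d2;
}

// "After" form: the common factor is computed once, with a single division.
inline void kick_factored(float dx, float dy, float dz, float mass, float Gdt,
                          float d2, float &vx, float &vy, float &vz) {
    assert(d2 > 0.0f);
    float x = mass * Gdt / d2;
    vx += dx * x;
    vy += dy * x;
    vz += dz * x;
}

// Relative comparison: the two forms round differently in the last bits.
inline bool nearly_equal(float a, float b) {
    return std::fabs(a - b) <= 1e-5f * std::fabs(a) + 1e-12f;
}
```

The two versions are mathematically identical but not bit-identical in floating point, which is consistent with the last digits of the energy values shifting between steps.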
  6. Use C arrays instead of STL containers

This gives some improvement, but it does not feel really necessary.

Initial energy: -13.414012
Final energy: -13.356912
Time elapsed: 170 ms
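A rough sketch of what the C-array variant might look like, assuming the number of stars is fixed at compile time. The count N, the struct layout and the kinetic_energy helper are all hypothetical. A plausible reason for the speedup is that the sizes are known to the compiler and the data sits directly inside the struct rather than behind heap pointers.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 48;  // assumed, fixed star count

// SoA layout with fixed-size C arrays instead of std::vector<float>.
struct Stars {
    float px[N], py[N], pz[N];
    float vx[N], vy[N], vz[N];
    float mass[N];
};

// Hypothetical helper: kinetic energy of the first `count` stars,
// the sum of m * v^2 / 2.
inline float kinetic_energy(const Stars &s, std::size_t count) {
    assert(count <= N);
    float e = 0.0f;
    for (std::size_t i = 0; i < count; i++) {
        float v2 = s.vx[i] * s.vx[i] + s.vy[i] * s.vy[i] + s.vz[i] * s.vz[i];
        e += s.mass[i] * v2 * 0.5f;
    }
    return e;
}
```

std::array<float, N> would give the same storage layout while keeping a container-style interface, which may be a reasonable middle ground.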
  7. Tried loop unrolling, but it had no effect
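For reference, manual loop unrolling looks roughly like the following. This is a generic, hypothetical example (summing an array four elements at a time), not the loop that was actually unrolled here. One plausible reason for seeing no effect is that the compiler already unrolls and vectorizes such loops at the optimization levels used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: sum an array with the body unrolled by hand 4 times.
inline float sum_unrolled(const float *a, std::size_t n) {
    assert(a != nullptr || n == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    // Main loop: four independent accumulators per iteration.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    // Remainder loop for the last n % 4 elements.
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}
```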

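Results & summary

Final result: 1946 ms -> 170 ms (11.4x)

  • Most effective optimization: enabling SIMD instructions
  • Other optimizations: using std::sqrt(), algebraic simplification, hoisting loop invariants, and avoiding STL containers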
