
finish hw04 #28


Open. luozhiya wants to merge 1 commit into base: main

@luozhiya luozhiya commented Mar 8, 2022

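Teacher Peng, long time no see again. I have been extremely busy lately and only now found time to do the homework, so please forgive the delay.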

luozhiya commented Mar 8, 2022

Lecture 04 notes

Results

Before optimization

Initial energy: -13.414000
Final energy: -13.356842
Time elapsed: 1304 ms

After optimization

Initial energy: -13.414000
Final energy: -13.403915
Time elapsed: 81 ms

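Optimizations

SOA

Change the data layout from AoS (array of structures) to SoA (structure of arrays). This gives up the object-oriented grouping of fields, but each field becomes contiguous in memory, which makes it easier to load data in blocks.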
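To make the layout change concrete, here is a minimal sketch of the two layouts. The type names, field names, and helper functions (`ParticleAoS`, `ParticlesSoA`, `to_soa`, `total_mass`) are assumptions for illustration and are not taken from the homework code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: all fields of one particle sit next to each other in memory.
// Field names are hypothetical.
struct ParticleAoS {
    float px, py, pz;
    float vx, vy, vz;
    float mass;
};

// SoA: each field is stored contiguously across all particles,
// so 8 consecutive px values can be fetched with one 256-bit load.
struct ParticlesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};

// Convert an AoS container to SoA, field by field.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    for (const ParticleAoS& p : in) {
        out.px.push_back(p.px);
        out.py.push_back(p.py);
        out.pz.push_back(p.pz);
        out.vx.push_back(p.vx);
        out.vy.push_back(p.vy);
        out.vz.push_back(p.vz);
        out.mass.push_back(p.mass);
    }
    assert(out.mass.size() == in.size());
    return out;
}

// Reading one field touches only that field's array in the SoA layout.
float total_mass(const ParticlesSoA& s) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < s.mass.size(); ++i) sum += s.mass[i];
    return sum;
}
```

In the AoS layout, consecutive `mass` values are `sizeof(ParticleAoS)` bytes apart, so a loop over one field drags the other six fields through the cache as well. In the SoA layout they are adjacent floats.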

Alignment

Align the data to the 64-byte cache line width. This is friendlier to the CPU cache and reduces the number of memory accesses.
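One way to request this alignment in C++ is `alignas`. The sketch below assumes a fixed particle count and statically sized arrays; `kCacheLine`, `kN`, `AlignedField`, `Particles`, and `is_aligned` are hypothetical names, not the homework's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache line width in bytes
constexpr std::size_t kN = 48;          // assumed particle count

// alignas makes every AlignedField object start on a 64-byte boundary.
struct alignas(kCacheLine) AlignedField {
    float v[kN];
};

// SoA container whose fields each start on their own cache line.
struct Particles {
    AlignedField px, py, pz, mass;
};

// Returns true if p is a multiple of alignment (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With arrays aligned this way, a 32-byte AVX load at an index that is a multiple of 8 floats stays within a single cache line. Note that a plain `std::vector<float>` does not promise 64-byte alignment, so heap data would need an aligned allocator instead.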

AVX256

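Hand-written AVX intrinsics process 8 float values per instruction.

  • _mm256_fmadd_ps (fused multiply-add) is faster than writing the multiply and the add separately
  • For sqrt, use the AVX intrinsic _mm256_sqrt_ps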
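The two intrinsics above can be combined as in the following sketch, which computes r^3 for 8 particle pairs at a time with r^2 = dx^2 + dy^2 + dz^2 + eps2. The function names, the softening term `eps2`, and the runtime dispatch are assumptions for illustration. The sketch targets x86-64 only and uses the GCC/Clang `target` attribute so it builds without global `-mavx2 -mfma` flags.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <immintrin.h>

// GCC/Clang: enable AVX2 and FMA for a single function.
#if defined(__GNUC__)
#define TARGET_AVX2_FMA __attribute__((target("avx2,fma")))
#else
#define TARGET_AVX2_FMA  // MSVC: assumed to be built with /arch:AVX2
#endif

// Scalar reference: out[i] = d2 * sqrt(d2), i.e. r^3.
void dist_cubed_scalar(const float* dx, const float* dy, const float* dz,
                       float eps2, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float d2 = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i] + eps2;
        out[i] = d2 * std::sqrt(d2);
    }
}

// AVX version: 8 floats per iteration, n must be a multiple of 8.
TARGET_AVX2_FMA
void dist_cubed_avx(const float* dx, const float* dy, const float* dz,
                    float eps2, float* out, std::size_t n) {
    assert(n % 8 == 0);
    const __m256 veps2 = _mm256_set1_ps(eps2);
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 x = _mm256_loadu_ps(dx + i);
        __m256 y = _mm256_loadu_ps(dy + i);
        __m256 z = _mm256_loadu_ps(dz + i);
        __m256 d2 = _mm256_fmadd_ps(x, x, veps2);  // x*x + eps2
        d2 = _mm256_fmadd_ps(y, y, d2);            // y*y + d2
        d2 = _mm256_fmadd_ps(z, z, d2);            // z*z + d2
        __m256 d3 = _mm256_mul_ps(d2, _mm256_sqrt_ps(d2));
        _mm256_storeu_ps(out + i, d3);
    }
}

// Runtime check so the AVX path is only taken on CPUs that support it.
bool cpu_has_avx2_fma() {
#if defined(__GNUC__)
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return true;  // assumption: the target machine supports AVX2 and FMA
#endif
}

void dist_cubed(const float* dx, const float* dy, const float* dz,
                float eps2, float* out, std::size_t n) {
    if (n % 8 == 0 && cpu_has_avx2_fma())
        dist_cubed_avx(dx, dy, dz, eps2, out, n);
    else
        dist_cubed_scalar(dx, dy, dz, eps2, out, n);
}
```

Because FMA rounds once instead of twice, results can differ from the scalar path in the last bits for general inputs.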

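Loop optimization

Move loop-invariant values out of the loop, or otherwise reduce the number of computations.

  • The product of Gdt and mass is constant
  • Doing the reduce_sum from an AVX __m256 to a float in a separate loop reduces the number of reductions
  • With AVX, the loop can advance by 8 per iteration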
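The three points above can be sketched together on a simplified weighted sum. This is not the homework's actual loop: the functions `weighted_sum_naive`, `weighted_sum_avx`, `weighted_sum`, and `hsum8`, and the array `w`, are hypothetical. The product `G * dt` is hoisted out of the loop, the loop advances by 8, and the horizontal reduction from `__m256` to `float` runs once after the loop instead of once per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>

#if defined(__GNUC__)
#define TARGET_AVX2_FMA __attribute__((target("avx2,fma")))
#else
#define TARGET_AVX2_FMA  // MSVC: assumed to be built with /arch:AVX2
#endif

// Naive version: recomputes G * dt on every iteration.
float weighted_sum_naive(const float* mass, const float* w, std::size_t n,
                         float G, float dt) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += mass[i] * G * dt * w[i];
    return sum;
}

// Horizontal sum of the 8 lanes of v.
TARGET_AVX2_FMA
float hsum8(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);
    __m128 hi = _mm256_extractf128_ps(v, 1);
    __m128 s = _mm_add_ps(lo, hi);                 // 4 partial sums
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));        // 2 partial sums
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x1));  // final sum in lane 0
    return _mm_cvtss_f32(s);
}

// Optimized version: n must be a multiple of 8.
TARGET_AVX2_FMA
float weighted_sum_avx(const float* mass, const float* w, std::size_t n,
                       float G, float dt) {
    assert(n % 8 == 0);
    const float gdt = G * dt;          // loop-invariant, computed once
    __m256 acc = _mm256_setzero_ps();  // 8 running partial sums
    for (std::size_t i = 0; i < n; i += 8) {
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(mass + i),
                              _mm256_loadu_ps(w + i), acc);
    }
    return gdt * hsum8(acc);           // single reduction after the loop
}

// Dispatch: use the AVX path only when the CPU and the size allow it.
float weighted_sum(const float* mass, const float* w, std::size_t n,
                   float G, float dt) {
#if defined(__GNUC__)
    const bool ok = __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    const bool ok = true;  // assumption: the target machine supports AVX2 and FMA
#endif
    return (ok && n % 8 == 0) ? weighted_sum_avx(mass, w, n, G, dt)
                              : weighted_sum_naive(mass, w, n, G, dt);
}
```

The summation order differs between the two versions, so for general inputs the results may differ slightly in the last bits.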

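Trading space for time

Store intermediate results as global variables so they are computed once and reused, which reduces the number of computations.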
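A minimal sketch of this idea, assuming the cached quantity is the per-particle product of mass, G, and dt. The globals `g_mass` and `g_mass_gdt` and the functions `init_cache` and `accumulate_dv` are hypothetical names, not the homework's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<float> g_mass;      // per-particle mass (input data)
std::vector<float> g_mass_gdt;  // cached mass[j] * G * dt, one float per particle

// Fill the cache once before the simulation loop.
// It must be refreshed if the masses, G, or dt ever change.
void init_cache(float G, float dt) {
    g_mass_gdt.resize(g_mass.size());
    for (std::size_t j = 0; j < g_mass.size(); ++j)
        g_mass_gdt[j] = g_mass[j] * G * dt;
}

// Inner loop reuses the cached products instead of recomputing them.
// w[j] stands for a per-pair geometric factor (hypothetical).
float accumulate_dv(const std::vector<float>& w) {
    assert(w.size() == g_mass_gdt.size());
    float dv = 0.0f;
    for (std::size_t j = 0; j < w.size(); ++j) dv += w[j] * g_mass_gdt[j];
    return dv;
}
```

The cost is one extra array of N floats. The saving is two multiplications per pair interaction in every step.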

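CMake configuration

MSVC and G++ each get their own configuration. This case is only tested in Release, so the compile options are added to the Release configuration only.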

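# Quote each generator expression and separate the flags with ';' so the
# expression is not split at whitespace and each flag is its own option.
if (CMAKE_COMPILER_IS_GNUCXX)
    target_compile_options(main PRIVATE "$<$<CONFIG:Release>:-march=native;-funroll-loops;-O3>")
endif()
if (MSVC)
    target_compile_options(main PRIVATE "$<$<CONFIG:Release>:/arch:AVX2;/fp:fast>")
endif()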
