Oxide: Conda-provided toolchain performance consistently different than fresh-built #311

Open
tcal-x opened this issue Oct 2, 2021 · 13 comments

@tcal-x
Collaborator

tcal-x commented Oct 2, 2021

We have nightly actions that build 3 designs, each with 3 different seeds. One of these actions gets Yosys and Nextpnr-Nexus via Conda packages. The other action clones the Yosys and Nextpnr repositories and builds them fresh.

The performance (achieved maximum frequency of the placed and routed design) is usually worse with the Conda-provided package, and there is no explanation for it.

See https://github.com/google/CFU-Playground/actions/workflows/fmax-trials.yml (Conda) and https://github.com/google/CFU-Playground/actions/workflows/fmax-trials-fresh-build.yml (fresh-built).

For the middle design with fresh-built tools, the (prelim/final) fmax in MHz were (70/84), (61/83), (62/82).

Using Conda-provided tools, the values were (66/73), (54/76), (64/75).

They should be identical unless there was a significant commit between the runs (the git hashes are printed out in each run), but that is not the case here. The fresh-built results have been the same for the last few days, and the Conda packages were built within the last day.

Are the tools built with different flags? Could there be some other executable in the Conda packages that somehow affects performance? You can run both ways locally (look at the Github actions for each).
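To put a rough number on the gap, here is a minimal sketch that averages the final fmax values quoted above for the middle design. The variable and function names are illustrative only; the figures are the ones reported in this comment.

```python
from statistics import mean

# (prelim, final) fmax in MHz for the middle design, copied from the report above.
fresh_built = [(70, 84), (61, 83), (62, 82)]
conda = [(66, 73), (54, 76), (64, 75)]

def mean_final(pairs):
    """Average the final (post-route) fmax across seeds."""
    return mean(final for _prelim, final in pairs)

fresh_mean = mean_final(fresh_built)
conda_mean = mean_final(conda)
gap_mhz = fresh_mean - conda_mean          # absolute gap in MHz
gap_pct = 100 * gap_mhz / fresh_mean       # gap relative to the fresh-built mean

print(f"fresh-built mean final fmax: {fresh_mean:.1f} MHz")
print(f"Conda mean final fmax:       {conda_mean:.1f} MHz")
print(f"gap: {gap_mhz:.1f} MHz ({gap_pct:.0f}% of fresh-built)")
```

With only three seeds per flow this is an indication rather than a statistically solid comparison.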

@mithro
Contributor

mithro commented Oct 2, 2021

This certainly seems weird. @PiotrZierhoffer - can you get someone to investigate?

My bet is that the versions are not as close as @tcal-x thinks they are.

@tcal-x
Collaborator Author

tcal-x commented Oct 2, 2021

I suppose the prjoxide executable/database is a potential source of difference as well. If the nextpnr-nexus Conda build in turn uses the prjoxide Conda package, that might be a bit old.

@tcal-x
Collaborator Author

tcal-x commented Oct 2, 2021

Yeah, actually the Yosys Conda package is a bit old (3 days). Piotr mentioned that some packages weren't getting approved as a new 'main' because of an unrelated CI failure.

@tcal-x
Collaborator Author

tcal-x commented Oct 5, 2021

@PiotrZierhoffer , I see the Litex-Hub Yosys 'main' issue has been resolved, so that we are getting an up-to-date Yosys version. I am still seeing differences between the Conda-provided tools and the fresh-built.

The yosys --version printouts are pretty different -- does this mean they were compiled with different flags? Do you know the story behind all of the flags in the Conda build?

From fresh-built:

Yosys 0.10+10 (git sha1 f3ef579a, clang 10.0.0-4ubuntu1 -fPIC -Os)
nextpnr-nexus -- Next Generation Place and Route (Version 9c32e2d8)

From Conda-provided:

Yosys 0.10+10 (git sha1 abc57006, x86_64-conda_cos6-linux-gnu-gcc 1.24.0.133_b0863d8_dirty -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -fdebug-prefix-map=/home/runner/work/conda-eda/conda-eda/workdir/conda-env/conda-bld/yosys_1633388921977/work=/usr/local/src/conda/yosys-0.9_5622_gabc57006 -fdebug-prefix-map=/home/runner/work/CFU-Playground/CFU-Playground/env/conda/envs/cfu-common=/usr/local/src/conda-prefix -fPIC -Os -fno-merge-constants)
nextpnr-nexus -- Next Generation Place and Route (Version 0.0.0-3848-g9c32e2d8)
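The two Yosys banners are easier to compare once they are split into fields. Below is a minimal sketch that does so; the regular expression and helper names are my own and assume the banner layout shown above (version, git sha1, then compiler and flags).

```python
import re

# Banners copied verbatim from the two `yosys --version` printouts above.
FRESH = "Yosys 0.10+10 (git sha1 f3ef579a, clang 10.0.0-4ubuntu1 -fPIC -Os)"
CONDA = (
    "Yosys 0.10+10 (git sha1 abc57006, x86_64-conda_cos6-linux-gnu-gcc "
    "1.24.0.133_b0863d8_dirty -fvisibility-inlines-hidden -fmessage-length=0 "
    "-march=nocona -mtune=haswell -ftree-vectorize -fPIC "
    "-fstack-protector-strong -fno-plt -O2 -ffunction-sections "
    "-fdebug-prefix-map=/home/runner/work/conda-eda/conda-eda/workdir/"
    "conda-env/conda-bld/yosys_1633388921977/work="
    "/usr/local/src/conda/yosys-0.9_5622_gabc57006 "
    "-fdebug-prefix-map=/home/runner/work/CFU-Playground/CFU-Playground/"
    "env/conda/envs/cfu-common=/usr/local/src/conda-prefix "
    "-fPIC -Os -fno-merge-constants)"
)

# Assumed layout: "Yosys <version> (git sha1 <sha>, <compiler...> <flags...>)"
BANNER = re.compile(r"Yosys (\S+) \(git sha1 ([0-9a-f]+), (.+)\)$")

def parse_banner(line):
    """Split a Yosys version banner into version, sha, compiler and flags."""
    match = BANNER.match(line)
    if match is None:
        raise ValueError(f"unrecognized banner: {line!r}")
    version, sha, build = match.groups()
    tokens = build.split()
    # Everything before the first token starting with '-' names the compiler.
    first_flag = next(
        (i for i, tok in enumerate(tokens) if tok.startswith("-")), len(tokens)
    )
    return {
        "version": version,
        "sha": sha,
        "compiler": " ".join(tokens[:first_flag]),
        "flags": tokens[first_flag:],
    }

def opt_flags(flags):
    """Return the -O<level> flags in command-line order."""
    return [flag for flag in flags if re.fullmatch(r"-O\w*", flag)]

fresh, conda = parse_banner(FRESH), parse_banner(CONDA)
for name, info in (("fresh-built", fresh), ("Conda", conda)):
    print(name, info["sha"], info["compiler"], opt_flags(info["flags"]))
```

The Conda banner carries both -O2 and -Os. GCC documents that the last -O option on the command line is the effective one, so both builds should end up at -Os, which leaves the compiler itself and the git sha1 as the visible differences.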

@PiotrZierhoffer
Collaborator

First of all, I see clang vs gcc, so it's a different toolchain. The flags come mainly from Conda; I see that we add -std=c++11 -Os -fno-merge-constants.

Do you still observe the performance difference here?

@tcal-x
Collaborator Author

tcal-x commented Oct 6, 2021

Hi @PiotrZierhoffer , yes, there is still a difference looking at the latest workflows (https://github.com/google/CFU-Playground/actions/workflows/fmax-trials.yml and https://github.com/google/CFU-Playground/actions/workflows/fmax-trials-fresh-build.yml).

The Yosys compile flags might not have anything to do with it; they matter only if they affect the Yosys output. But I don't see any -D<something> flags.

Can you have someone try to get to the bottom of it? E.g. see if the Yosys output differs, if so why, if not then what is different, etc. Maybe the difference can be reproduced on your machine, maybe not -- that would be 'interesting' too if there's no difference locally.
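One way to check whether the Yosys output differs is to compare the synthesized netlists from the two flows. The sketch below assumes the flow hands nextpnr a Yosys JSON netlist, and that the top-level "creator" key holds the version banner (which would differ between the two builds even if the logic were identical). The function names and file paths are hypothetical.

```python
import json

def load_netlist(path):
    """Load a Yosys JSON netlist, ignoring the tool banner."""
    with open(path) as handle:
        netlist = json.load(handle)
    # Assumption: "creator" only records the Yosys version/compiler string.
    netlist.pop("creator", None)
    return netlist

def differing_modules(path_a, path_b):
    """Return the sorted names of modules that differ between two netlists."""
    mods_a = load_netlist(path_a).get("modules", {})
    mods_b = load_netlist(path_b).get("modules", {})
    names = mods_a.keys() | mods_b.keys()
    return sorted(name for name in names if mods_a.get(name) != mods_b.get(name))
```

Calling `differing_modules("fresh/top.json", "conda/top.json")` (placeholder paths) would return an empty list if synthesis produced the same result, which would point the investigation at nextpnr or the prjoxide database instead.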

@kgugala
Collaborator

kgugala commented Oct 14, 2021

@tcal-x it seems the problem does not exist anymore (see the latest runs):

conda: https://github.com/google/CFU-Playground/runs/3882511524?check_suite_focus=true#step:19:1
fresh build: https://github.com/google/CFU-Playground/runs/3882327394?check_suite_focus=true#step:19:1

in both cases the results were:

Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)

@mithro
Contributor

mithro commented Oct 14, 2021

Should the results not be identical if given identical input / versions?

@tcal-x
Collaborator Author

tcal-x commented Oct 14, 2021

> Should the results not be identical if given identical input / versions?

Even though I should know what is going on, it confused me at first as well.
Then I remembered that each run gives out two max freq lines: one preliminary and one final.

So to make it more clear:

Conda results:

Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)

Fresh build results:

Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)

Thanks @kgugala ; I will check the runs again tomorrow, and assuming they still match, I'll close this.
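Since each run prints two "Max frequency" lines per clock, a small parser helps keep the preliminary and final values apart. This is a minimal sketch; the regular expression is my own and assumes the log line format quoted above, with the first occurrence per clock treated as preliminary and the last as final.

```python
import re

# Assumed line format, taken from the log excerpts in this thread.
FMAX = re.compile(
    r"Max frequency for clock '([^']+)': ([\d.]+) MHz "
    r"\((?:PASS|FAIL) at ([\d.]+) MHz\)"
)

def prelim_and_final(log_text):
    """Map each clock name to (preliminary fmax, final fmax) in MHz."""
    seen = {}
    for clock, fmax, _target in FMAX.findall(log_text):
        seen.setdefault(clock, []).append(float(fmax))
    # First report is the preliminary estimate, last report is the final one.
    return {clock: (values[0], values[-1]) for clock, values in seen.items()}

LOG = """\
Info: Max frequency for clock 'por_clk$glb_clk': 71.47 MHz (PASS at 70.72 MHz)
Info: Max frequency for clock 'por_clk$glb_clk': 76.56 MHz (PASS at 70.72 MHz)
"""

print(prelim_and_final(LOG))
```

Comparing the resulting dictionaries from the Conda and fresh-build logs makes it clear whether both the preliminary and the final values match.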

@tcal-x
Collaborator Author

tcal-x commented Oct 18, 2021

I see identical results with the most recent runs; I'll close this.

tcal-x closed this as completed Oct 18, 2021
tcal-x reopened this Feb 25, 2022
@tcal-x
Collaborator Author

tcal-x commented Feb 25, 2022

I'm again seeing significant performance (critical path / fmax) differences for HPS between locally-built tools and Conda-provided tools. I see it both in CI and building locally.

Using locally-built and installed yosys and nextpnr-nexus:

seed-1/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 89.08 MHz (PASS at 53.50 MHz)
seed-2/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 80.06 MHz (PASS at 53.50 MHz)
seed-3/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 85.35 MHz (PASS at 53.50 MHz)
seed-4/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 68.47 MHz (PASS at 53.50 MHz)
seed-5/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 82.35 MHz (PASS at 53.50 MHz)
seed-6/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 81.95 MHz (PASS at 53.50 MHz)
seed-7/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 81.95 MHz (PASS at 53.50 MHz)
seed-8/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 86.63 MHz (PASS at 53.50 MHz)

Using Conda-provided tools:

seed-1/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 79.23 MHz (PASS at 53.50 MHz)
seed-2/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 79.17 MHz (PASS at 53.50 MHz)
seed-3/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 71.94 MHz (PASS at 53.50 MHz)
seed-4/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 71.09 MHz (PASS at 53.50 MHz)
seed-5/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 73.35 MHz (PASS at 53.50 MHz)
seed-6/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 74.83 MHz (PASS at 53.50 MHz)
seed-7/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 73.35 MHz (PASS at 53.50 MHz)
seed-8/nextpnr-nexus.log:Info: Max frequency for clock 'clkout$glb_clk': 69.44 MHz (PASS at 53.50 MHz)
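Pairing the two lists seed by seed shows how consistent the gap is. The sketch below uses the fmax values listed above, in seed order; the variable names are illustrative.

```python
from statistics import mean

# Final fmax in MHz for seeds 1..8, copied from the two listings above.
local_build = [89.08, 80.06, 85.35, 68.47, 82.35, 81.95, 81.95, 86.63]
conda = [79.23, 79.17, 71.94, 71.09, 73.35, 74.83, 73.35, 69.44]

# Positive delta means the locally-built tools achieved the higher fmax.
deltas = [round(loc - con, 2) for loc, con in zip(local_build, conda)]
local_wins = sum(delta > 0 for delta in deltas)
conda_better_seeds = [seed for seed, delta in enumerate(deltas, start=1) if delta < 0]

for seed, delta in enumerate(deltas, start=1):
    print(f"seed-{seed}: {delta:+.2f} MHz")
print(f"local build faster on {local_wins} of {len(deltas)} seeds")
print(f"mean fmax: local {mean(local_build):.2f} MHz, Conda {mean(conda):.2f} MHz")
```

Only one seed favors the Conda tools, so the difference looks systematic rather than ordinary seed-to-seed noise.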

@tcal-x
Collaborator Author

tcal-x commented Feb 26, 2022

The difference is entirely due to whether gcc or clang is used to build Yosys. With a local build, clang is default. If I instead build Yosys with:

make config-gcc
make -j8
sudo make install

then I get exactly the same results as when using the Conda package.

I'll file an issue on Yosys to see if this is expected behavior.

@tcal-x
Collaborator Author

tcal-x commented Mar 2, 2022

I opened YosysHQ/yosys#3218 last week.
