--optimize-level 3 slowing things down #892

Open
mml opened this issue Dec 13, 2024 · 3 comments

Comments

@mml

mml commented Dec 13, 2024

Yesterday I filed a bug prematurely, but I knew something was not quite right. The code below takes more than 30% longer when compiled with --optimize-level 3 than when compiled without it (about 8.3s versus 6.1s in the runs below).

% scheme --script div-and-mod.ss
(time (for-each (lambda (...) ...) ...))
    no collections
    6.400004545s elapsed cpu time
    6.400545484s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
% echo '(compile-file "div-and-mod.ss")' | scheme -q
compiling div-and-mod.ss with output to div-and-mod.so
% scheme --script div-and-mod.so
(time (for-each (lambda (...) ...) ...))
    no collections
    6.117455272s elapsed cpu time
    6.117724356s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
% echo '(compile-file "div-and-mod.ss")' | scheme -q --optimize-level 3
compiling div-and-mod.ss with output to div-and-mod.so
% scheme --script div-and-mod.so
(time (for-each (lambda (...) ...) ...))
    no collections
    8.324555132s elapsed cpu time
    8.329026569s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000

div-and-mod.ss:

(set! total 0)
(let ([l (iota 300000000)])
  (time
    (for-each
      (lambda (x)
        (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
          (set! total (+ total quo rem))))
      l)))
(printf "Total ~:d~n" total)

% scheme --version
10.1.0
% uname -a
Linux welwitschia 6.1.0-27-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.115-1 (2024-11-01) x86_64 GNU/Linux

@jltaylor-us
Contributor

This works as expected (i.e., optimize level 3 is faster) on tarm64osx. Someone else will have to check the x64 linux builds.

@mbakhterev

I get approximately the same timings as @mml: level 3 is slower.

$ cat test.scm 
(set! total 0)

(let ([l (iota 300000000)])
  (time
    (for-each
      (lambda (x)
        (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
          (set! total (+ total quo rem))))
      l)))

(printf "Total ~:d~n" total)
$ echo '(compile-file "test.scm")' | chez -q
compiling test.scm with output to test.so
$ chez --script test.so
(time (for-each (lambda (...) ...) ...))
    no collections
    6.137851039s elapsed cpu time
    6.142206869s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
$ echo '(compile-file "test.scm")' | chez -q --optimize-level 3
compiling test.scm with output to test.so
$ chez --optimize-level 3 --script test.so
(time (for-each (lambda (...) ...) ...))
    no collections
    8.453322405s elapsed cpu time
    8.458856063s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
$ uname -a
Linux rocket 6.12.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 09 Dec 2024 14:31:57 +0000 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep 'model name' | uniq
model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
$ chez --version
10.1.0

@owaddell
Contributor

I see the slowdown on arm64osx and a6le.

It looks like the issue is that the cp0 inline handler for fxdiv-and-mod does not (yet) anticipate how np-expand-primitives will handle fxdiv and fxmod. Since 1000 is not a power of two, we end up turning one out-of-line call into two such calls. If the test program had used 1024 instead of 1000, then the optimize-level 3 version would likely run faster.
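
To make the point about divisors concrete, here is a rough source-level sketch, not the compiler's actual output. `split-div-and-mod` and `pow2-div-and-mod` are hypothetical helper names introduced only for illustration. The first mirrors the "two calls" shape described above. The second shows why a power-of-two divisor is cheap, assuming it reduces to a shift and a mask; what np-expand-primitives really emits is not shown in this thread.

```scheme
;; Illustration only: hypothetical helpers, not code from the Chez Scheme compiler.

;; The "two calls" shape: one fxdiv-and-mod becomes separate fxdiv and fxmod calls.
(define (split-div-and-mod x d)
  (values (fxdiv x d) (fxmod x d)))

;; For a divisor of 2^k (k a small nonnegative fixnum), the same results can be
;; computed with an arithmetic shift and a bit mask (assumed here to be the
;; cheap case that the power-of-two path can exploit).
(define (pow2-div-and-mod x k)
  (values (fxsra x k)
          (fxlogand x (fx- (fxsll 1 k) 1))))
```

Both helpers return the same two values as `fxdiv-and-mod` for their respective divisors, e.g. `(pow2-div-and-mod x 10)` matches `(fxdiv-and-mod x 1024)`.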

We can cut a lot of (unmeasured) overhead from the test by replacing the for-each over the output of iota with a do loop:

(let ()
  (define (add-quo-rem total x)
    (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
      (+ total quo rem)))
  (define total
    (time
     (do ([x 0 (fx+ x 1)]
          [total 0 (add-quo-rem total x)])
         ((fx= x 300000000) total))))
  (printf "Total ~:d~n" total))
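
To check the 1000 versus 1024 prediction with the same do loop, one option is a small macro that splices the divisor in as a literal, so the compiler still sees a constant. This is a sketch under that assumption; `bench` is a hypothetical name, and no timings are claimed here.

```scheme
;; Hypothetical helper: times the do loop above for a literal divisor and an
;; iteration count n, and returns the accumulated total.
(define-syntax bench
  (syntax-rules ()
    [(_ divisor n)
     (time
      (do ([x 0 (fx+ x 1)]
           [total 0 (let-values ([(quo rem) (fxdiv-and-mod x divisor)])
                      (+ total quo rem))])
          ((fx= x n) total)))]))
```

Compiling a file that contains `(bench 1000 300000000)` and `(bench 1024 300000000)` at each optimize level would give four timings to compare.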

4 participants