Toom/FFT multiplication #227

czurnieden · 2019-04-27T22:23:52Z

Toom-Cook 4, 5 and FFT (radix-2 Hartley transform, nothing fancy)
etc/tune has also been updated to tune the additional algorithms, too.
Toom-Cook-3 has been updated to allow to "proof" the algorithm in the same way as Toom-Cook 4 and 5: they contain comments that can be fed into pari/gp.

czurnieden · 2019-04-27T23:43:08Z

So my local astyle 3.1 (current) said it was already well formatted but the older astyle 2.06 had a different opinion about that.
Oh, my…

sjaeckel · 2019-04-27T23:59:00Z

IMO that's a regression in the current astyle!

minad · 2019-04-29T12:31:10Z

Toom-Cook-3 has been updated to allow to "proof" the algorithm...
they contain comments that can be fed into pari/gp...

I think it might also be good to add these pari/gp programs as separate scripts in a subdirectory for comparison.

Some superficial review comments:

Are there tests? Is it possible to test some of the functions separately? This PR contains quite a lot of code.
Is it possible to split the PR up in logical units?
Licensing situation - I've seen you refer to quite a few papers in your comments. I wonder if there is something to be feared regarding to unfree code?

czurnieden · 2019-04-29T19:02:20Z

I think it might also be good to add these pari/gp programs as separate scripts in a subdirectory for comparison.

Anything wrong with:
grep "/\*\*" $FILE | sed -e 's#/\*\*##g;s#\*/##g' | gp -q
If it ends in a zero it is correct.
If not it is either not correct or I forgot a line like in bn_s_mp_toom_cook_4_mul.c as I just saw.
*grrr*

Are there tests?

Yes, the tuning program contains testing code (It is already comparing the new code against the old one for benchmarking, so it wasn't a lot of work to add a mp_cmp in between. Came also quite handy at debugging)
It was already written long before you (plural) decided that all functions, including the private ones, can be tested by demo/test.c, too.

That program is included in testme.sh, so the actual Travis test tested them already.

"All checks have passed" means the new functions are correct ("as far as tested" and so on, the usual caveats apply here, too, of course)

The FFT functions have an additional run-time test but that is just casting-nines. It catches the point when the rounding errors start to appear but it is not strictly deterministic. If you remember from primary school: it has false positives.

Is it possible to test some of the functions separately?

Yes, and it has been done in the aforementioned program. T-C-[2,3,4](mul|sqr) and FFT(mul|sqr) are tested individually.

This PR contains quite a lot of code.

It has been stripped down quite a bit already ;-)
Most of the code is in the FFT compartment for all of the different MP_xBITs. It cannot be avoided without going into preprocessor hell.

Is it possible to split the PR up in logical units?

Yes.

The two commits (TC and FFT) are more or less independent and can be separated but it would need quite a bit of work to do so.

An even finer grid would be a lot of work, though, and to be honest, the amount of work involved to make TC and FFT completely independent alone is already something I cannot see the advantage of.

OK, making small logical units has a lot of advantages, but ripping things apart for the sake of ripping them apart is not something I can support with a clean conscious.
Especially if I am the one who's gonna do all of the work involved ;-)

Also: even if I separate them, all the glue (mp_mul.c and mp_sqr.c) will have to be adapted if you want to merge any of these individual little units. And the tuning program, which I nearly forgot, would also need some extra massage for every merge.

I cannot just C&P the ten tests from tune.c into demo/test.c, either, I would need to write them more or less from scratch, although that needs to be done at one time, we shouldn't have tests at various places.

Licensing situation - I've seen you refer to quite a few papers in your comments. I wonder if there is something to be feared regarding to unfree code?

I don't use unfree code, out of principle!
(And stinginess, but I'll never admit that! ;-) )

But you mean Public Domain code, which is a very different kind of beast, mainly because it doesn't really exist anymore in most of the jurisdictions of the ""developed world". For some reason (money?) the lobbyists were able to persuade the governments that Public Domain is something very icky.
It also means that the default for "no license" is "full copyright" which makes the "Unlicense" very problematic. But that's not my problem, I'm just here to give back (and have a bit of fun, admitted. I don't know about you, but writing code without pressure is fun for me ;-) )

No code has been copied, all had been written from scratch.

That's the reason it took me so long to prepare it for LTM. My private T-C implementations had code from Bodrato and Zimmermann in it, both GPL-3, which I had to replace with my own derived from the descriptions in the papers cited in my code. It's mostly the interpolation part that is of any interest, the rest is just optimizing for memory by reusing variables and for calculations by reusing results. Just shoving things to and fro until it looks good and runs well ;-)

It is also highly simplified now. You might take a look at the implementation of T-C-3 by Zimmermann for contrast ;-)

A lot of it is from Jörg Arndt's book "Matters Computational" including the idea to optimize the Hartley transform (actually, the idea to use the Hartley transform in the first place) and his way of avoiding the bit-reverse part. I was not able to use any of his code for LTM, because it is GPL-3, only the descriptions and pseudocodes in his book.

Leaves the question if it was OK for me, legally, to use Jörg Arndt's book in that manner.

While looking for a paper online Google offered me a piece of software from Harvard for atmospheric modelling who used J. Arndt's book to implement the Hartley transform, too. Their license, although very close to PD, is still a license and not PD, so I couldn't even use that. (I think I made a note in bn_mp_fft.c?).
But that find gives at least a little bit of confidence that using the book alone without the code is OK.

I will give it a try and ask Jörg himself but I don't have a lot of hope for an answer, his inbox is most probably full with questions regarding his book *sigh*

License is also the reason I didn't offer the fast-division part of it now.
It is already several years ago but I remember that I got stuck with implementing Burnikel-Ziegler. There is a typo in the paper but I wasn't sure if it is an actual typo or if I was just to stupid to write the code so I asked Google. I also remember that I found out that it was a typo but I also found somebody who had implemented it already, also straight from the description in the paper, no further optimizations, and funnily meant to get into LTM, too. What I do not remember is, if I took anything from that code. Problem: I wasn't able to find it, my Google-Foo left me here, so I cannot tell for sure until I found it. If it is gone from the web for good I have to port my Javascript implementation of which I am sure that is is my very own (the C version couldn't be easily ported) but it's also a bit slower.

No, found it. It was indeed a patch meant for LTM and the year fits, too, it was less than 10 years ago that I wrote my version and it is also straight from the paper, even the variable names are the same. I doubt I took anything from it, I would have made a note at a prominent place if I did, but I'll put a link to the Sourceforge archive in the head of my file, just to be sure.

The Newton Division seems to need some work. It is correct but benchmarking showed some odd slowdowns at unexpected places so I most probably put some kind of bummer in it somewhere and found no time yet, to look for it.

Fast division, B-Z at least, is necessary to implement fast number conversion which is the last addition I would like LTM to have so I can restart my work at LibTomFloat. I stopped it when I found out that I forgot to implement the guard bits for proper rounding but was sure I already did which was definitely one of the bigger forehead-flattening "Aaargh!" moments in my programming-life ;-)

czurnieden · 2019-05-01T13:13:36Z

I will give it a try and ask Jörg himself

I did and got an answer

Just re-assign everything you need(*) to whatever license. Suggest to put "by permission of the author" or something to that effect. (*) includes code from the fxt lib not printed, localized FHT comes to mind.
Parts of fxt are used in software with different licenses,
including closed source (commercial) software.

I basically put GPL3 to make things clear in case I
drop dead. I'd like the FSF to grab the fxt lib then.

When code is printed in a book one should always be
able to use under any license, except when the authors
decided to be extra nasty (think "Numerical Recipes").
This is from a "do the right thing" perspective,
your resident legal eagle may well disagree!

Best regards, Joerg

P.S.: feel free to put this message onto the github
discussion, so things are documented.

I "stamped" every file with his permission where there is a chance that I used any of his stuff, ~~couldn't be bothered to check all of the gory details~~ better safe than sorry.

czurnieden · 2019-05-02T19:10:12Z

@minad et. al. : found some time this evening to shovel the tests in etc/tune.c into their proper places in demo/test.c but found out that tommath_private.h is not included.

Is it because tommath_private.h wasn't needed yet and I'm supposed to simply add it if that need arises—which would be now—or did you change your mind w.r.t testing private functions?

Also: call me stingy if you want but I don't like to use precious entropy for the PRNG if it is not needed at all for an individual test that just needs a bag of mixed-bits like the tests for the multiplication functions. Another disadvantage is the fact that it is not repeatable.

So I would like to use a deterministic PRNG as the source of non-cryptographic bits for the non-cryptographic tests. My go-to function for that purpose is the 64-bit variant of Bob Jenkins' PRNG. It has a good mix, good avalanche and is sufficiently fast. That together with a fixed seed (0xdeadbeef) would make a good generator for the bags of mixed-bits needed for testing.

minad · 2019-05-02T20:58:17Z

@minad et. al. : found some time this evening to shovel the tests in etc/tune.c into their proper places in demo/test.c but found out that tommath_private.h is not included. Is it because tommath_private.h wasn't needed yet and I'm supposed to simply add it if that need arises—which would be now—or did you change your mind w.r.t testing private functions?

Maybe it got somehow lost on the way but I think it should be there. If needed, add it.

So I would like to use a deterministic PRNG as the source of non-cryptographic bits for the non-cryptographic tests. My go-to function for that purpose is the 64-bit variant of Bob Jenkins' PRNG. It has a good mix, good avalanche and is sufficiently fast. That together with a fixed seed (0xdeadbeef) would make a good generator for the bags of mixed-bits needed for testing.

Yes we could add a deterministic PRNG for this purpose. I used xoroshiro recently which is simpler than Jenkins.

sjaeckel · 2019-05-06T20:58:07Z

demo/test.c but found out that tommath_private.h is not included....

Maybe it got somehow lost on the way but I think it should be there. If needed, add it.

I'm still no fan of "adding private headers to what-should-be API tests" but until we've got this pulled apart properly I agree to do it like that.

minad · 2019-05-15T11:14:41Z

@czurnieden If you have time you might want to update your PRs to the current state on develop. After this consolidation work, I personally feel better about adding functionality. Furthermore you can test private functions in the unit tests. Also some of your work like the prime sieve goes in the right direction concerning #243.

czurnieden · 2019-05-16T17:01:23Z

Sorry for the silence but had to update my machine because LTS stopped last month. Failed completely (but my machine was as mess, would have made me wonder if it would have worked flawlessly ;-) ), had to install a new one, am still plucking my stuff out of the backups, cursing the Gnome and systemd developers, and trying to find out what's amiss.
But when I'm fully done: that was it for the next four years :-)

If you have time

Not at all, but that shall not matter.

you might want to update your PRs to the current state on develop

I think it cannot be done with a simple rebase now, probably needs more work, hence a bit more time than what is usual with me.

Also some of your work

List, please, so I can concentrate on what is wanted.

like the prime sieve goes in the right direction concerning #243.

OK, so prime sieve first.

I had some questions regarding the naming and scope of some the constants and types I need for the sieve. Did you make a decision or do you want me to try my own teeth at it?

BTW: why did you change the title? Is it a hint for me re. cutting my stuff into logical parts? ;-)

minad · 2019-05-16T19:18:16Z

Sorry for the silence but had to update my machine because LTS stopped last month. Failed completely (but my machine was as mess, would have made me wonder if it would have worked flawlessly ;-) ), had to install a new one, am still plucking my stuff out of the backups, cursing the Gnome and systemd developers, and trying to find out what's amiss.
But when I'm fully done: that was it for the next four years :-)

The usual issues ;)

you might want to update your PRs to the current state on develop
I think it cannot be done with a simple rebase now, probably needs more work, hence a bit more time than what is usual with me.
Yes probably not. I would really urge you to make your PRs smaller. This should also help with that.

List, please, so I can concentrate on what is wanted.

My focus until now was (or is) working on consolidation of the code base (Testing, API, ...), . So I am mostly interested in additions which improve/generalize what we have now. Also throwing away (or deprecating) things which good obsolete etc. But this is my personal preference, you have your own goals, I guess. So there is nothing barring us from adding more useful stuff.

Since you are adding also more special cases to mp_mul, I hope we can get #262 or something like it. I hope we can add stuff without degrading the quality of what we have now. Or even better - if you discover some old warts while working on new stuff, make a PR improving on those!

I had some questions regarding the naming and scope of some the constants and types I need for the sieve. Did you make a decision or do you want me to try my own teeth at it?

I think we have pretty good guidelines now? Generally put everything into tommath_private if possible.

BTW: why did you change the title? Is it a hint for me re. cutting my stuff into logical parts? ;-)

:)

minad · 2019-05-21T05:21:02Z

@czurnieden Can we close this one in favor of the split-up PRs you are proposing now? For example #265

czurnieden · 2019-05-21T12:02:39Z

Can we close this one in favor of the split-up PRs […]

Yes.

minad · 2019-05-21T12:10:57Z

You can also close ;)

addition of T-C-[45](mul|sqr), addition of statistics to tuning

dad2509

czurnieden force-pushed the faster_multiplication branch 2 times, most recently from 05bdfa4 to c962e52 Compare April 27, 2019 23:39

addition of FFT multiplication/squaring

07d24da

czurnieden force-pushed the faster_multiplication branch 2 times, most recently from b12dedf to 6e2f86d Compare May 1, 2019 13:12

Added J. Arndt's permission

c23ead9

czurnieden force-pushed the faster_multiplication branch from 6e2f86d to c23ead9 Compare May 1, 2019 13:22

moved tests for Karatsuba, T-C and FFT multiplications to demo/test.c

43cba1f

czurnieden mentioned this pull request May 3, 2019

Tunable cutoff parameters #231

Merged

added autotune

c0c44b9

minad changed the title ~~Faster multiplication~~ FFT multiplication May 15, 2019

minad changed the title ~~FFT multiplication~~ Toom/FFT multiplication May 15, 2019

czurnieden mentioned this pull request May 16, 2019

faster Toom-Cook 3 algorithms #265

Merged

minad added the work in progress label May 21, 2019

minad closed this May 21, 2019

sjaeckel removed the work in progress label May 21, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Toom/FFT multiplication #227

Toom/FFT multiplication #227

czurnieden commented Apr 27, 2019

czurnieden commented Apr 27, 2019

sjaeckel commented Apr 27, 2019

minad commented Apr 29, 2019 •

edited

Loading

czurnieden commented Apr 29, 2019

czurnieden commented May 1, 2019 •

edited

Loading

czurnieden commented May 2, 2019

minad commented May 2, 2019 •

edited

Loading

sjaeckel commented May 6, 2019

minad commented May 15, 2019

czurnieden commented May 16, 2019

minad commented May 16, 2019

minad commented May 21, 2019 •

edited

Loading

czurnieden commented May 21, 2019

minad commented May 21, 2019

Toom/FFT multiplication #227

Toom/FFT multiplication #227

Conversation

czurnieden commented Apr 27, 2019

czurnieden commented Apr 27, 2019

sjaeckel commented Apr 27, 2019

minad commented Apr 29, 2019 • edited Loading

czurnieden commented Apr 29, 2019

czurnieden commented May 1, 2019 • edited Loading

czurnieden commented May 2, 2019

minad commented May 2, 2019 • edited Loading

sjaeckel commented May 6, 2019

minad commented May 15, 2019

czurnieden commented May 16, 2019

minad commented May 16, 2019

minad commented May 21, 2019 • edited Loading

czurnieden commented May 21, 2019

minad commented May 21, 2019

minad commented Apr 29, 2019 •

edited

Loading

czurnieden commented May 1, 2019 •

edited

Loading

minad commented May 2, 2019 •

edited

Loading

minad commented May 21, 2019 •

edited

Loading