How to make rotationally invariant solves fast? #126

Open
dlfivefifty opened this issue May 5, 2022 · 5 comments


@dlfivefifty
Member

Rotationally invariant operators are typically represented by ModalInterlace, which combines matrices, each acting on a single Fourier mode, into a diagonal-block-banded matrix (that is, a block-banded matrix with diagonal blocks, i.e. subblockbandwidths of (0,0)).
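
For concreteness, here is a minimal sketch (with made-up block sizes) of that structure, built directly with BlockBandedMatrices.jl's BandedBlockBandedMatrix constructor:

```julia
using BlockBandedMatrices

# Hypothetical block sizes, one block per polynomial degree.
rows = cols = collect(1:2:9)

# Block-banded with block bandwidths (2, 2) and diagonal blocks, i.e. subblockbandwidths (0, 0).
A = BandedBlockBandedMatrix{Float64}(undef, rows, cols, (2, 2), (0, 0))

blockbandwidths(A)     # (2, 2)
subblockbandwidths(A)  # (0, 0): every block is diagonal
```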

At the moment we aren't taking advantage of this structure. There are three approaches:

  1. Do it at the level of factorizing a BandedBlockBandedMatrix by checking whether the subblockbandwidths are (0,0). The easiest way would be to copy to a banded matrix, do a QR, and then copy the data back (see the sketch after this list).
  2. Introduce a DiagonalBlockBandedMatrix, though this doesn't seem to have any obvious benefit over (1).
  3. Do it directly on a ModalInterlace. This has the benefit of being able to call qr on each of the banded operators separately (and potentially taking advantage of @threads). Though probably (1) could also be parallelised.
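
A rough sketch of what (1) could look like (not an implementation, just an illustration with a hypothetical helper name): when the subblockbandwidths are (0, 0), the system decouples into one banded system per within-block index (i.e. per Fourier mode), and each of those can be copied into a BandedMatrix, factorized with banded QR, and copied back.

```julia
using BandedMatrices, BlockBandedMatrices, BlockArrays, LinearAlgebra

# Hypothetical helper (not optimised): solve A*x = b when A is diagonal-block-banded,
# assuming A is square with matching row/column blocks of nondecreasing size
# (as produced by ModalInterlace).
function diagonalblockbanded_solve(A::BandedBlockBandedMatrix, b::AbstractVector)
    @assert subblockbandwidths(A) == (0, 0)
    l, u = blockbandwidths(A)
    ax = axes(A, 1)                              # blocked axis of A
    nblocks = blocksize(A, 1)
    x = zeros(float(eltype(A)), size(A, 1))
    for i in 1:length(ax[Block(nblocks)])        # one decoupled system per within-block index
        Ks = [K for K in 1:nblocks if i <= length(ax[Block(K)])]  # blocks containing index i
        m = length(Ks)
        Ai = BandedMatrix(zeros(m, m), (l, u))   # banded copy for this mode
        for p in 1:m, q in max(1, p - l):min(m, p + u)
            Ai[p, q] = A[ax[Block(Ks[p])][i], ax[Block(Ks[q])][i]]
        end
        bi = [b[ax[Block(K)][i]] for K in Ks]
        xi = qr(Ai) \ bi                         # banded QR: O(m) flops for fixed bandwidths
        for p in 1:m
            x[ax[Block(Ks[p])][i]] = xi[p]
        end
    end
    x
end
```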

@TSGut @ioannisPApapadopoulos any thoughts? Do you think this is important right now? It won't help with the variable coefficients problem but could be important for fractional DEs and time-stepping.

@ioannisPApapadopoulos
Member

Just to check I understand: this discussion is entirely about how to do the solve with a diagonal-block-banded matrix? What's the current way things are solved? QR on the whole matrix?

@dlfivefifty
Member Author

For a general banded-block-banded matrix it just reduces to block-banded QR, that is, it treats the blocks as dense. This is because there is invariably fill-in. This is an O(N^2) = O(n^4) algorithm, where N is the total number of degrees of freedom and n is the number of blocks (the block sizes grow linearly, so N ~ n^2). (In theory it can be reduced to O(N^(3/2)) using hierarchical methods but this may be unstable.)

At the moment diagonal-block-banded is handled the same way, though with any of the proposals above it reduces to O(N). It should also be easy to multithread (though we'll see whether my banded QR is actually multithread safe...).
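
For the per-mode route ((3) above), a minimal sketch, assuming hypothetical names: Lms is a vector of banded operators, one per Fourier mode, and bs holds the matching right-hand sides. The threading here is across independent factorizations, which is a separate question from whether a single banded QR is internally thread-safe.

```julia
using BandedMatrices, LinearAlgebra

# Hypothetical per-mode solve: each Fourier mode has its own banded operator,
# so the QR factorizations are independent and can be threaded.
function solve_modes(Lms::Vector{<:BandedMatrix{Float64}}, bs::Vector{<:AbstractVector})
    xs = Vector{Vector{Float64}}(undef, length(Lms))
    Threads.@threads for m in eachindex(Lms)
        xs[m] = qr(Lms[m]) \ bs[m]   # banded QR: O(size) flops for fixed bandwidths
    end
    xs                               # total work is O(N) summed over all modes
end
```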

@ioannisPApapadopoulos
Member

OK! And in option (1), each individual block would be copied to a banded matrix and then a QR factorization applied? So there would be n QR factorizations in both (1) and (3), where n is the number of blocks?

I guess the advantage of (1) is that it can be applied to any problem that induces a diagonal-block-banded matrix, rather than constraining ourselves to the ModalInterlace case.

@TSGut
Member

TSGut commented May 5, 2022

I suppose to answer this I'd need to have a better understanding of how ModalInterlace is meant to be used; the tests are a bit sparse, are there any more natural examples?

Lacking this, I'd also lean towards (1).

@dlfivefifty
Member Author

I guess the advantage of (1) is that it can be applied to any problem that induces a diagonal-block-banded matrix, rather than constraining ourselves to the ModalInterlace case.

Exactly. The negative is that you take banded matrices with simple rational formulae and nice sequential memory layout, rearrange them into a BandedBlockBandedMatrix, and then break them apart again by copying. Whether this extra rearranging has a significant impact on performance is not clear to me.
