Skip to content

Assurance Level #37

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
mkannwischer opened this issue May 15, 2024 · 20 comments
Closed

Assurance Level #37

mkannwischer opened this issue May 15, 2024 · 20 comments
Labels
documentation Improvements or additions to documentation

Comments

@mkannwischer
Copy link
Contributor

According to the PQCP charter, the goal for projects within PQCP is to
build high-assurance production-ready software implementations of forthcoming post-quantum cryptography standards, starting with the ML-KEM algorithm

During the first PQCP TSC meeting it was discussed that we should be more explicit on what we mean by high-assurance production-ready. This differs between different sub-projects of PQCP and it would be good to come up with some assurance levels to unify between different project.
For the next TSC meeting (May 23), each sub-project should document the corresponding interpretation of high-assurance production-ready and document the intended level of assurance.

Let's discuss here.

@hanno-becker
Copy link
Contributor

Ultimately, I would like to see at least:

  • C code is verified to not exhibit undefined behaviour (assuming to-be-documented preconditions)
  • ASM code is verified for functional correctness
  • Some argument is made why all code is constant-time

@hanno-becker
Copy link
Contributor

@mkannwischer @cothan Thoughts?

@hanno-becker hanno-becker added the documentation Improvements or additions to documentation label May 21, 2024
@mkannwischer
Copy link
Contributor Author

What do you mean by "ASM code is verified for functional correctness"? You mean by passing testvectors?
I think we need to distinguish between the ultimate goal and the initial goal for a first release.
Initially, I'd aim for no formal verification, i.e., likely the lowest assurance level of the PQCP
This would aim to

  • keygen/encaps/decaps tested against official testvectors and extended testvectors (supplied through a separate PQCP project)
  • Each function implemented in assembly is tested against testvectors
  • Make use of static and dynamic analysis to make sure there is no UB/out-of/bounds memory access and such
  • Automated tests for checking constant-time behavior (e.g., ct-grind or similar) - this may include scanning for instructions that have data-dependent timing
  • External code audits

All of the tests should be integrated into CI.
For the code audits there is going to be a process to go through in the PQCA. There is some budget for this in the PQCA and we need to talk to the TAC to get this started.

@hanno-becker
Copy link
Contributor

What do you mean by "ASM code is verified for functional correctness"? You mean by passing testvectors?

No, I meant formal verification against a specification on the basis of some architecture model.

I think we need to distinguish between the ultimate goal and the initial goal for a first release.

Agreed. I described above what I would like to see as the mid-term assurance level for this repository, but not necessarily as the baseline upon first release. We should clarify at the next TSC meeting what we're talking about.

@cothan
Copy link
Contributor

cothan commented May 21, 2024

I don't know much about formal verification, at our first release we can use some model checker to verify the code.
I'm learning how to use CBMC, but I'm not sure if it can check SIMD instructions. I will try soon.

@mkannwischer
Copy link
Contributor Author

CBMC would only be useful for the C sources, right?
I think there is still plenty of functions that are going to stay in C as they are not performance critical for which this is going to be useful. Those are likely the same functions that can be shared with other implementations within the PQCP.
In case we are considering to use CBMC, I think we should discuss this in the TSC as well to make sure everyone is on-board with that.

@hanno-becker
Copy link
Contributor

hanno-becker commented May 21, 2024

CBMC will be a useful tool for verifying the absence of certain classes of undefined behaviour in the C sources (only). One can also specify custom assertions and have CBMC attempt to prove those, but I am unsure if this would we practical for the functional specifications encountered in MLKEM, esp. modular arithmetic. See e.g. how in LibMLKEM the correctness of modular multiplication has to be deferred to interactive provers

@hanno-becker
Copy link
Contributor

In case we are considering to use CBMC, I think we should discuss this in the TSC as well to make sure everyone is on-board with that.

I don't object to bringing this up, but it's not urgent, and ultimately our choice, isn't it?

@cothan
Copy link
Contributor

cothan commented May 21, 2024

Honestly, I have no idea what can and what cannot work in formal verification for ML-KEM. So my opinion about high assurance tool set is naive. I can learn from your PRs.
Definitely, it CBMC isn't our ultimately choice, even the pure C implementation has to deal with modular arithmetic correctness so it may not be the only tool we use. But again, I have no idea and no background in formal. So @hanno-becker and @mkannwischer please take the lead.

@hanno-becker
Copy link
Contributor

hanno-becker commented May 22, 2024

We don't have to sort the how at this point, only agree on what we want to achieve.

In the interest of progress, I think @mkannwischer points above make sense for an initial pre-release. At the same time, I don't think we qualify as "high assurance" and "production-ready" if we don't handle functional correctness of at least the assembly.

@rod-chapman
Copy link
Contributor

I concur with Hanno. To expand a little, reasonable targets for formal (static) verification would be:

  1. Absence of undefined behaviour.
  2. Absence of all other memory- and type-safety defects, including arithmetic overflows.
  3. Absence of dependence on unspecified behaviour (e.g. no dependence on expression evaluation order).

These are basic targets that must come first before other, more advanced proofs could be attempted.

(PS.. allow me to introduce myself: I am Rod, colleague of Hanno at AWS. I am responsible for the LibMLKEM SPARK/Ada implementation and its proof.)

@mkannwischer
Copy link
Contributor Author

Thanks @rod-chapman!

Would you see these basic targets as required for any kind of deployment, or do you see value in having a fast implementations first and adding these (basic) proofs later?

@rod-chapman
Copy link
Contributor

Basic proofs of all the C code would be a good starting point, and should come first. Additional proofs of optimized assembly code can come later as and when needed.

(There are some additional full correctness proofs in my SPARK code, for example ... the correctness of "*" (mod Q) on Zq for example.)

@rod-chapman
Copy link
Contributor

One concern I have is that there appears to be repetition of C code across all our proposed repositories? Why? I see no need at all to have the same C code repeated and/or copied across several repo's. The non-time-critical stuff should identical for all platforms anyway.

@hanno-becker
Copy link
Contributor

hanno-becker commented May 23, 2024

@rod-chapman I agree that a shared C code base is preferred, but it is TBD if it can be achieved. For example, the AArch64 repository will require changes to the C code when introducing the batched Keccak API, which the maintainers of mlkem-c-generic may or may not want to upstream.

@rod-chapman
Copy link
Contributor

"Batched" or "not batched" is an implementation detail. Surely, the API should be the same regardless...

@hanno-becker
Copy link
Contributor

hanno-becker commented May 23, 2024

I thought you're talking about the C code, which is part of the implementation.

@mkannwischer
Copy link
Contributor Author

I agree that we want to share as much as possible between the C projects (embedded, generic, aarch64 for now). Maybe we even want to combine them into a single project at some point.

The reason for starting off with separate code bases was the different goals of the projects: In particular, the generic project wants to stay in sync with the Kyber teams reference implementation and they want to make as few changes as possible. That's why we started off our own copy and postponed the discussion of code sharing to a later point. (Since nothing has been merged into the generic project so far, I believe this was a good approach).

I'm happy to kick-off the discussion of code sharing between embedded and aarch64 projects now. Let's do that in a separate issue: #42

@hanno-becker
Copy link
Contributor

Related: #63

@mkannwischer
Copy link
Contributor Author

We've settled what we want to achieve in the long term:

  • All C code is verified to be memory and type safe using CBMC. This is already done for all C code in mlkem directory leaving only the FIPS202 code.
  • All assembly formally verified. Initial experiments are underway to verify AArch64 using HOL-Light.

This is reflected in the README and we'll update there if the status changes. Closing this now.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

4 participants