base64 decoding should not be "strict" #96

Closed
andir opened this issue Dec 7, 2021 · 5 comments

@andir
Contributor

andir commented Dec 7, 2021

I've encountered yet another mail where the base64 decoding using mailparse (with and without my recent change in #95) fails.

The reduced sample looks like this:

PC9odG1sPn==

If you decode that string with a non-strict decoder (that is, one that does not require "normalized" base64), the result is:

</html>

If you re-encode that string with Python, Ruby, or base64 on the CLI, you'll get:

PC9odG1sPg==

Which then decodes properly with the mailparse crate.

The way I am currently working around this (with the data_encoding crate) is by defining my own BASE64 decoder:

use lazy_static::lazy_static;

lazy_static! {
    // MIME base64 decoder that tolerates non-zero trailing bits in the last symbol.
    static ref BASE64_DECODER: data_encoding::Encoding = {
        let mut spec = data_encoding::BASE64_MIME.specification();
        spec.check_trailing_bits = false; // <- the important bit
        spec.encoding().expect("The encoding must be valid")
    };
}
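
To illustrate what the trailing-bits check rejects, here is a minimal std-only sketch of a base64 decoder with an optional strictness flag. It is not how mailparse or data_encoding implement decoding; the names `sextet` and `decode_base64` are hypothetical, and the flag name is borrowed from the spec field above. The last symbol before the padding contributes only 2 of its 6 bits to the final byte. In `Pg==` the 4 leftover bits of `g` are zero, while in `Pn==` the leftover bits of `n` are non-zero, which a strict decoder treats as an error even though the decoded byte is the same.

```rust
/// Map one character of the standard base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode base64 text. Padding is simply stripped (its length is not
/// validated in this sketch). When `check_trailing_bits` is true, input
/// whose final symbol carries non-zero unused bits is rejected.
fn decode_base64(input: &str, check_trailing_bits: bool) -> Option<Vec<u8>> {
    let symbols = input.trim_end_matches('=').as_bytes();
    let mut out = Vec::with_capacity(symbols.len() * 3 / 4);
    let mut acc: u32 = 0; // bits collected but not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `acc`
    for &c in symbols {
        acc = (acc << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    if bits == 6 {
        return None; // a lone dangling symbol cannot encode a byte
    }
    if check_trailing_bits && acc != 0 {
        return None; // strict mode: leftover bits must be zero
    }
    Some(out)
}
```

With this sketch, `decode_base64("PC9odG1sPn==", true)` returns `None`, while passing `false` yields the bytes of `</html>`, the same as decoding the normalized `PC9odG1sPg==`.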

I've come to believe that parsing mail with "strict" base64 parsers is just not a good idea. It might work in an ideal world but sadly I've received tons of mails with edge cases over the years :(

My ask for this issue is that we should probably switch to a non-strict decoder for mails. This is perhaps something that is better suited as part of the data_encoding library instead?

@staktrace
Owner

It seems that since data_encoding offers a MIME-compatible base64 decoder they might want to make this change in that decoder's spec. If they're not interested in making that change we can do it in mailparse.

@staktrace
Owner

@andir Did you file a PR against the data_encoding crate for this? If so please link it here; otherwise I'm going to close this issue due to lack of movement.

@wathiede
Contributor

wathiede commented Apr 8, 2024

I recently came across this issue while using mailparse with my personal email collection. Per #96 (comment), I filed ia0/data-encoding#102.

ia0 responded with three possible options, one of which is adding code to data-encoding. None of them leaves mailparse unchanged.

I'd like to resolve this issue, but since every option requires some change to mailparse, I'd like your advice on which one is preferable. I'm happy to submit a pull request to one or both repos, and would appreciate your guidance on the best path forward.

@staktrace
Owner

Hi @wathiede, thanks for following up on this! My personal preference would be to add the PERMISSIVE codec in the data-encoding crate and then update mailparse to use that. I'd be happy to take a PR to do that.

@staktrace
Owner

This was resolved with #126
