Protect capabilities from appearing in logs #559

Closed
meejah opened this issue Aug 26, 2021 · 4 comments · Fixed by #615
Labels
before-release Must be resolved before a first public release enhancement New feature or request

Comments

@meejah
Collaborator

meejah commented Aug 26, 2021

Capabilities are secret and should never be revealed in logs. Users cannot be expected to know that logs contain secret information that could give someone access to all their files (e.g. if they post them on a help forum or send logs to a provider).

One way to achieve this could be to create an object to wrap capabilities instead of using a byte-string. The default __str__ for such an object would output [REDACTED] or similar so that logs don't accidentally show capabilities.
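A minimal sketch of the wrapper idea, not magic-folder's actual implementation. The class name `Capability` and the accessor `danger_real_capability_string` are hypothetical names chosen for illustration.

```python
class Capability:
    """Hypothetical wrapper that keeps a capability string out of str()/repr()."""

    __slots__ = ("_secret",)

    def __init__(self, secret: str) -> None:
        self._secret = secret

    def __str__(self) -> str:
        return "[REDACTED]"

    # repr() is what containers and "%r" use, so redact that as well.
    __repr__ = __str__

    def danger_real_capability_string(self) -> str:
        """Explicit, easy-to-grep accessor for code that truly needs the secret."""
        return self._secret


cap = Capability("URI:CHK:not-a-real-capability")
message = f"uploading to {cap}"  # the secret never reaches the formatted string
```

Making the secret reachable only through a loudly named method means accidental formatting is safe, while deliberate use stays possible and easy to audit.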

Much of the Eliot logs, for example, are littered with capabilities. Currently it is up to "outside software" to ensure these are censored properly, which is obviously not ideal.

@meejah meejah added enhancement New feature or request before-release Must be resolved before a first public release labels Aug 26, 2021
@exarkun
Member

exarkun commented Aug 31, 2021

The Eliot-native way to prevent caps from ever appearing in the Eliot log would be to define an Eliot Field type for a capability and then use it for all capabilities being logged. This Field can define its own serialization logic which produces arbitrary values for the value supplied at the log site.
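As a rough sketch, the serialization logic such a Field would carry could be a plain function like the one below. The function name `serialize_capability` is hypothetical, and the assumption that a capability looks like `URI:<kind>:<secret...>` is a simplification for illustration. The wiring into Eliot is only indicated in a comment and should be checked against Eliot's documentation.

```python
def serialize_capability(cap: str) -> str:
    """Return a log-safe stand-in for a capability string.

    Assumption: capabilities look like "URI:<kind>:<secret...>", so only the
    leading "URI:<kind>" part is kept and everything after it is dropped.
    """
    parts = cap.split(":")
    if len(parts) >= 3 and parts[0] == "URI":
        return f"{parts[0]}:{parts[1]}:[REDACTED]"
    # Anything unrecognized is redacted entirely rather than guessed at.
    return "[REDACTED]"


# With Eliot, this function would be supplied as the Field's serializer,
# roughly: Field("capability", serialize_capability, "A Tahoe capability")
```

The log site keeps passing the real value; only the serialized form that reaches the log is altered.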

I'd carefully consider whether we never want the real capabilities to appear in logs or not. This could make debugging certain failures much more difficult - especially if different capabilities are indistinguishable from each other in the logs.

GridSync's "sanitization" logic replaces distinct sensitive values with distinct sanitized values - eg something like "cap1" becoming "sanitized1", "cap2" becoming "sanitized2", etc. This at least allows some interpretation of the information that depends on the value/identity of these things.
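The identity-preserving idea can be sketched as follows. This is not GridSync's code: the `Sanitizer` class and the `CAP_PATTERN` regular expression are assumptions made for illustration, and a real pattern would need to match Tahoe's actual capability formats.

```python
import re

# Assumption: anything starting with "URI:" followed by these characters is a
# capability. This is deliberately simplistic.
CAP_PATTERN = re.compile(r"URI:[A-Za-z0-9:_-]+")


class Sanitizer:
    """Replace each distinct capability with a distinct, stable placeholder."""

    def __init__(self) -> None:
        self._seen = {}  # real capability -> placeholder

    def _placeholder(self, match: re.Match) -> str:
        cap = match.group(0)
        if cap not in self._seen:
            self._seen[cap] = f"sanitized{len(self._seen) + 1}"
        return self._seen[cap]

    def sanitize(self, text: str) -> str:
        return CAP_PATTERN.sub(self._placeholder, text)


sanitizer = Sanitizer()
line1 = sanitizer.sanitize("read URI:CHK:aaa then URI:DIR2:bbb")
line2 = sanitizer.sanitize("retry URI:CHK:aaa")
```

Because the same `Sanitizer` instance is reused, the capability seen in `line1` maps to the same placeholder in `line2`, so a reader can still tell which log lines refer to the same object.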

GridSync also does this after the fact, though, which means the original values are potentially available. This has advantages and disadvantages of course. The advantage is that the original values may be essential to understanding the meaning of the logs or digging for further information. It is perfectly reasonable for someone to see these values for their own data (esp. if the person is a developer).

The disadvantage, of course, is the one given in the ticket description - anyone else who obtains the capabilities can read (and perhaps write) the associated data.

I suggest that the most we should hamper the system is to sanitize secrets by default (in an identity-preserving way - like GridSync does) but provide a switch which someone who wants more information can easily toggle to deactivate the sanitization logic.

GridSync already has such a switch exposed to users in the "export debug info" part of the UI. One idea that's attractive to me, then, is that this checkbox also controls how GridSync asks magic-folder (and tahoe-lafs!) for its Eliot logs. In turn, the APIs for getting these logs have two options - sanitized or not sanitized. Then they can produce the kind of logs that a user has indicated to GridSync (or another UI) that they want. I don't know if this is feasible, though, because I suppose many logs are collected before the GridSync "export debug info" UI is activated. And doing otherwise would require more log buffering inside magic-folder/tahoe-lafs...

So if that option is not feasible, then maybe a magic-folder run option that controls the behavior is the next best thing? And GridSync can pass the option when GridSync itself is started with some option (eg --debug) - or maybe there can be some other UI inside GridSync for controlling this behavior.
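A run option of this kind might look like the sketch below. The flag name `--include-secret-information` and the program name are made up for illustration and are not existing magic-folder options; the point is only that sanitization is the default and must be switched off explicitly.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="magic-folder-sketch")
    # Hypothetical flag: not an existing magic-folder option.
    parser.add_argument(
        "--include-secret-information",
        action="store_true",
        help="Write real capability strings to the logs (dangerous).",
    )
    return parser


def make_log_filter(include_secrets: bool):
    """Return a function applied to every capability before it is logged."""
    def log_filter(cap: str) -> str:
        return cap if include_secrets else "[REDACTED]"
    return log_filter


# Default invocation: secrets are sanitized.
default_args = build_parser().parse_args([])
default_filter = make_log_filter(default_args.include_secret_information)

# Explicit opt-in: secrets are passed through unchanged.
debug_args = build_parser().parse_args(["--include-secret-information"])
debug_filter = make_log_filter(debug_args.include_secret_information)
```

A front end such as GridSync could pass the flag through only when its own debug option is set.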

@meejah
Collaborator Author

meejah commented Aug 31, 2021

FWIW, I've personally never found it useful to use the capabilities from the logs.

The capabilities I have found useful for debugging are the "collective" or "personal" DMDs; from there, the Tahoe WebUI can be used to traverse everything else. (These can be obtained with magic-folder-api .. dump-state in plaintext already though)

Of course, there could be some use to knowing the individual capabilities ... but maybe we could enable such a "not-sanitized" feature then?

@exarkun
Member

exarkun commented Jan 28, 2022

> The capabilities I have found useful for debugging are the "collective" or "personal" DMDs; from there, the Tahoe WebUI can be used to traverse everything else. (These can be obtained with magic-folder-api .. dump-state in plaintext already though)

I think this doesn't account for two things. First, the Tahoe WebUI is fine for a developer doing local investigation using full client state but when a non-developer user of the software needs help it is not reasonable to expect them to grant this access to a developer to poke around in their state for them. Second, if there is a problem involving some other piece of data then without knowing the capability string for that data there is no way to know what you're looking for, even if you have access to the WebUI for interactive inspection.

I think seeing only [REDACTED] for every capability string from magic-folder is going to hamper debugging all issues that are encountered by someone who is not a developer and thus make user support dramatically more challenging.

If the essential argument against including any information about capabilities in the logs is that you haven't found them useful before, I think I've given reasonable arguments to the contrary - here and in my earlier comment above. In addition to those, the fact that Tahoe itself will at least include the first few bytes of a capability string in its logs and that GridSync has chosen to preserve the identity (not the value) of each capability it includes even in the sanitized logs also seem to suggest that others may have found the information useful at various times.

I suggest we have magic-folder mirror one of these approaches.

@meejah
Collaborator Author

meejah commented Jan 28, 2022

> If the essential argument against including any information about capabilities in the logs is that you haven't found them useful before,

No, the argument is that leaking them is catastrophic for user privacy.

Anyone seeing them "for debugging" should only get access with the explicit help of the person receiving the debugging help (e.g. "use this command with something obvious like --include-secret-information in it", not "send me the logs please").

Tahoe-LAFS logs WebUI requests with "[CENSORED]" for all capability-strings -- where do you see partial ones?


2 participants