Protect capabilities from appearing in logs #559
Comments
The Eliot-native way to prevent caps from ever appearing in the Eliot log would be to define an Eliot Field type for a capability and then use it for all capabilities being logged. This Field can define its own serialization logic, which produces arbitrary values for the value supplied at the log site.

I'd carefully consider whether we really want the real capabilities never to appear in logs. Hiding them entirely could make debugging certain failures much more difficult, especially if different capabilities are indistinguishable from each other in the logs. GridSync's "sanitization" logic replaces distinct sensitive values with distinct sanitized values, e.g. something like "cap1" becoming "sanitized1", "cap2" becoming "sanitized2", etc. This at least allows some interpretation of the information that depends on the value/identity of these things.

GridSync also does this after the fact, though, which means the original values are potentially available. This has advantages and disadvantages, of course. The advantage is that the original values may be essential to understanding the meaning of the logs or digging for further information. It is perfectly reasonable for someone to see these values for their own data (esp. if the person is a developer). The disadvantage is the one given in the ticket description: anyone else who obtains the capabilities can read (and perhaps write) the associated data.

I suggest that the most we should hamper the system is to sanitize secrets by default (in an identity-preserving way, like GridSync does) but provide a switch which someone who wants more information can easily toggle to deactivate the sanitization logic. GridSync already has such a switch exposed to users in the "export debug info" part of the UI. One idea that's attractive to me, then, is that this checkbox also controls how GridSync asks magic-folder (and tahoe-lafs!) for its Eliot logs. In turn, the APIs for getting these logs have two options: sanitized or not sanitized. Then they can produce the kind of logs that a user has indicated to GridSync (or another UI) that they want.

I don't know if this is feasible, though, because I suppose many logs are collected before the GridSync "export debug info" UI is activated. And doing otherwise would require more log buffering inside magic-folder/tahoe-lafs... So if that option is not feasible, then maybe a
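To make the identity-preserving idea concrete, here is a minimal sketch of a sanitizer that maps each distinct capability to a distinct, stable placeholder and can be switched off. It is not GridSync's or magic-folder's actual code: the `CapSanitizer` class, its `enabled` flag, and the rough `CAP_PATTERN` regular expression are all assumptions made for illustration.

```python
import re
from typing import Dict

# Assumption: a deliberately rough pattern for strings that look like
# Tahoe-LAFS capabilities ("URI:CHK:...", "URI:DIR2:..."); real code would
# need a stricter, format-aware matcher.
CAP_PATTERN = re.compile(r"URI:[A-Za-z0-9:_-]+")


class CapSanitizer:
    """Replace each distinct capability with a distinct, stable placeholder."""

    def __init__(self, enabled: bool = True) -> None:
        # Hypothetical switch: when False, text passes through unchanged.
        self.enabled = enabled
        self._placeholders: Dict[str, str] = {}

    def _placeholder(self, cap: str) -> str:
        # The same capability always maps to the same placeholder, so
        # readers of the log can still tell capabilities apart.
        if cap not in self._placeholders:
            self._placeholders[cap] = "sanitized{}".format(
                len(self._placeholders) + 1
            )
        return self._placeholders[cap]

    def sanitize(self, text: str) -> str:
        if not self.enabled:
            return text
        return CAP_PATTERN.sub(lambda m: self._placeholder(m.group(0)), text)
```

A function like `sanitize` could plausibly be used as the serialization logic of an Eliot Field, so that sanitizing happens at the log site rather than after the fact. The trade-off is that the original values are then unrecoverable unless the switch was off when the message was logged.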
FWIW, I've personally never found it useful to use the capabilities from the logs. The capabilities I have found useful for debugging are the "collective" or "personal" DMDs; from there, the Tahoe WebUI can be used to traverse everything else. (These can be obtained with Of course, there could be some use to knowing the individual capabilities ... but maybe we could enable such a "not-sanitized" feature then?
I think this doesn't account for two things. First, the Tahoe WebUI is fine for a developer doing local investigation using full client state, but when a non-developer user of the software needs help, it is not reasonable to expect them to grant a developer this access to poke around in their state for them. Second, if there is a problem involving some other piece of data, then without knowing the capability string for that data there is no way to know what you're looking for, even if you have access to the WebUI for interactive inspection. I think seeing only

If the essential argument against including any information about capabilities in the logs is that you haven't found them useful before, I think I've given reasonable arguments to the contrary, here and in my earlier comment above. In addition to those, two facts also seem to suggest that others may have found the information useful at various times: Tahoe itself will at least include the first few bytes of a capability string in its logs, and GridSync has chosen to preserve the identity (not the value) of each capability it includes even in the sanitized logs. I suggest we have magic-folder mirror one of these approaches.
No, the argument is that leaking them is catastrophic for user privacy. Anyone seeing them "for debugging" should only get access with the explicit help of the person receiving the debugging help (and like "use this command with something obvious like Tahoe-LAFS logs WebUI requests with "[CENSORED]" for all capability-strings -- where do you see partial ones?
Capabilities are secret and should never be revealed in logs. Users cannot be expected to know that logs contain secret information that could give someone access to all their files (e.g. if they post them on a help forum or send logs to a provider).
One way to achieve this could be to create an object to wrap capabilities instead of using a byte-string. The default `__str__` for such an object would output `[REDACTED]` or similar so that logs don't accidentally show capabilities.

Much of the Eliot logs, for example, are littered with capabilities. Currently it is up to "outside software" to ensure these are censored properly, which is obviously not ideal.
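The wrapper idea could look roughly like the sketch below. The `Capability` class name and its `reveal()` accessor are hypothetical, chosen only for illustration; nothing here is existing magic-folder or Tahoe-LAFS API.

```python
class Capability:
    """Wrap a capability so that accidental formatting never reveals it."""

    __slots__ = ("_secret",)

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def __str__(self) -> str:
        # Used by str(), "{}".format(...), f-strings and "%s" logging.
        return "[REDACTED]"

    def __repr__(self) -> str:
        # Also redact repr(), since containers and "%r" use it.
        return "[REDACTED]"

    def reveal(self) -> bytes:
        # Hypothetical explicit accessor: code that really needs the
        # secret has to ask for it by name.
        return self._secret
```

With this shape, a call such as `log.info("uploading %s", cap)` prints `[REDACTED]`, while code that must talk to Tahoe calls `cap.reveal()` deliberately. This only guards against accidental disclosure; any code path that logs the result of `reveal()` would still leak the capability.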