proposal: go/types: add Hasher{,IgnoreTags} types #69420
Comments
This type is often used as a set, as this convenience method prints
Those are principled suggestions, but in practice the value has always been either a non-nil pointer (in which case nil => missing) or |
Is there anything Put another way, if there were an appropriately-defined
package types
import "container/hashmap"
func MakeHashMap[V any]() hashmap.Map[Type, V] {
return hashmap.Make[Type, V](Hash)
} |
Very good question. Now that you mention it, no, the HashMap is completely generic other than the fact it assumes (K=types.Type, hash=types.Hash, eq=types.Identical). If container/hashmap existed this proposal could reduce to just the Hash function, and perhaps a convenience constructor:
func Hash(Type) uint
type HashMap[V any] = hashmap.Map[Type, V]
func NewHashMap[V any]() *HashMap[V] { return hashmap.New(Hash, Identical) } |
In light of @jimmyfrasche's idea, let's restrict this proposal to just the Hash function (the tricky part), with the expectation that a generic hash table (the easy part) will someday follow and that in the meantime it's easy enough for clients to write their own HashMap. |
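To illustrate the "easy part", here is a minimal sketch of a client-written map keyed by types.Type. Everything in it (TypeMap, NewTypeMap, coarseHash) is a hypothetical name invented for this example, and coarseHash is only a crude stand-in for the proposed hash function: it looks at the top-level shape of a type, which keeps it consistent with types.Identical but produces many collisions. It assumes Go 1.22 or later for types.Unalias.

```go
package main

import (
	"fmt"
	"go/types"
)

// coarseHash is a hypothetical stand-in for the proposed hash function.
// It only distinguishes the top-level shape of a type, so identical types
// always hash alike, at the cost of many collisions.
func coarseHash(t types.Type) uint64 {
	switch t := types.Unalias(t).(type) {
	case *types.Basic:
		return 1 + uint64(t.Kind())
	case *types.Pointer:
		return 101
	case *types.Slice:
		return 102
	default:
		return 0
	}
}

// entry is one key/value pair stored in a bucket.
type entry[V any] struct {
	key types.Type
	val V
}

// TypeMap maps types.Type keys to values, treating two keys as equal
// when types.Identical reports true.
type TypeMap[V any] struct {
	hash    func(types.Type) uint64
	buckets map[uint64][]entry[V]
}

// NewTypeMap returns an empty map that uses the given hash function.
func NewTypeMap[V any](hash func(types.Type) uint64) *TypeMap[V] {
	return &TypeMap[V]{hash: hash, buckets: make(map[uint64][]entry[V])}
}

// Set stores v under k, replacing the value of any identical key.
func (m *TypeMap[V]) Set(k types.Type, v V) {
	h := m.hash(k)
	for i, e := range m.buckets[h] {
		if types.Identical(e.key, k) {
			m.buckets[h][i].val = v
			return
		}
	}
	m.buckets[h] = append(m.buckets[h], entry[V]{k, v})
}

// Get returns the value stored under a key identical to k, if any.
func (m *TypeMap[V]) Get(k types.Type) (V, bool) {
	for _, e := range m.buckets[m.hash(k)] {
		if types.Identical(e.key, k) {
			return e.val, true
		}
	}
	var zero V
	return zero, false
}

// lookupDemo stores a value under one *types.Pointer and retrieves it
// through a distinct but identical *types.Pointer.
func lookupDemo() (string, bool) {
	m := NewTypeMap[string](coarseHash)
	m.Set(types.NewPointer(types.Typ[types.Int]), "pointer to int")
	return m.Get(types.NewPointer(types.Typ[types.Int]))
}

func main() {
	v, ok := lookupDemo()
	fmt.Println(v, ok) // prints "pointer to int true"
}
```

A built-in map[types.Type]V would miss this lookup, because the two *types.Pointer values are different pointers even though they denote identical types. |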
cc @jba, who is thinking about unordered maps. Our generic unordered and ordered maps should be aligned. |
This proposal has been added to the active column of the proposals project |
My only real question here is whether the Hash function needs to take a seed parameter to prevent flooding. As I mentioned above, I don't think hash flooding is a real concern because if you control its inputs, the type checker is already far more vulnerable than a hash table to DoS attacks. But what does concern me is ergonomics: the Hash function should interoperate with the proposed standard hash-based map, which may demand that hash functions accept a seed. Perhaps @ianlancetaylor and @prattmic can opine. |
I'm slightly inclined toward providing a seed in the style of the We can always use a per-process seed to complicate an attack, though that's certainly not as good as a per-table seed. If we do that instead of an explicit seed, I think we would have to document that the hash function is not consistent from process to process. One advantage of requiring an explicit seed is that it makes it clear in the API where the boundary around comparable hashes lies. |
The proposed API seems to imply that the hash is stable across process invocations. Which implies that changing the hash function (or the hashing process) in the future would be a breaking change. Do we want to get locked into that? The only advantage of a stable function that I can think of is that you could use the hash in process-external communication (like an on-disk cache or if you do something like running analyzers over the network). If we want that, we probably want the seed (if any) to not be opaque either. If we don't want to commit, another advantage of a seed argument (IMO) is that it makes this instability more obvious in the API. |
Good question. No, we certainly do not; but whether or not we have a seed parameter I think we can reserve the right to change the hash function arbitrarily with judicious documentation. |
This proposal is on hold until we resolve the fundamental question: |
As @adonovan mentioned above, we certainly want to be able to change the hash function, so we can't guarantee it will be stable across process invocations. Given that, I would argue we need to make it aggressively unstable, much like how we randomize map iteration order, so that people don't start to depend on any sort of stability. Certainly the minimum bar for that is a per-process invocation seed. |
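A rough sketch of what a per-process seed looks like with the existing hash/maphash package. It hashes plain strings rather than types, and processSeed, hashName and stableWithinProcess are hypothetical names for this example only; maphash.MakeSeed and maphash.String are the real standard library calls.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// processSeed is chosen once per process. Hashes derived from it are
// consistent within a run but differ between runs, so they cannot
// usefully be persisted or compared across processes.
var processSeed = maphash.MakeSeed()

// hashName hashes s under the per-process seed.
func hashName(s string) uint64 {
	return maphash.String(processSeed, s)
}

// stableWithinProcess reports whether hashing the same input twice in
// this process yields the same value.
func stableWithinProcess(s string) bool {
	return hashName(s) == hashName(s)
}

func main() {
	fmt.Println(stableWithinProcess("[]int"))
	fmt.Printf("%#x\n", hashName("[]int")) // value varies from run to run
}
```

Because the seed is random, any client that stores such a hash on disk or sends it to another process finds out quickly that it does not work, which is the point of making the function aggressively unstable. |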
Placed on hold. (pending outcome of #70471 --adonovan) |
Another design question: The implementation of typeutil.Hasher uses a map to memoize every type it has ever seen, ostensibly for performance (though in fact it is quite slow), but it occurs to me that this naturally causes it to return consistent hashes for |
[Never mind: the design of maphash.Comparable (discussion) allowing pointers tacitly admits that heap-allocated variables are non-moving.] |
FWIW, Joe Tsai asked a related question in #54670 (comment), with a short response from Austin in #54670 (comment) and then Austin's extended response in #54670 (comment). The conclusion there I think was that the arguments to Comparable escape (unless it can be proven it doesn't matter, like for strings I think). Those comments are probably worth a quick read if you haven't already (though TBH, I've lost the thread of this proposal a bit, so maybe those are only tangentially related to your most recent question.) |
@mvdan points out that the types.Hash function should ignore struct tags (and be documented to do so) so that the hash function works equally well with types.Identical and types.IdenticalIgnoreTags. (The lossiness of ignoring tags is very minor.) |
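The point about tags can be checked with a small sketch. Two struct types that differ only in a field tag are not Identical but are IdenticalIgnoreTags, so a hash that skips tags gives them the same value and stays consistent with both relations. The helpers newTagged, hashIgnoringTags and compare are hypothetical, and the hash is deliberately coarse (field count and names only); it is not the proposed algorithm.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
	"hash/maphash"
)

// newTagged builds struct{ Name string } with the given tag on its field.
func newTagged(tag string) *types.Struct {
	f := types.NewField(token.NoPos, nil, "Name", types.Typ[types.String], false)
	return types.NewStruct([]*types.Var{f}, []string{tag})
}

// hashIgnoringTags is a hypothetical, coarse struct hash. It mixes in only
// the field count and field names and never reads s.Tag(i).
func hashIgnoringTags(seed maphash.Seed, s *types.Struct) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	h.WriteByte(byte(s.NumFields()))
	for i := 0; i < s.NumFields(); i++ {
		h.WriteString(s.Field(i).Name())
		h.WriteByte(0) // separator between field names
	}
	return h.Sum64()
}

// compare reports, for two structs differing only in their tag, whether
// they are Identical, IdenticalIgnoreTags, and hash to the same value.
func compare() (identical, identicalIgnoreTags, sameHash bool) {
	a, b := newTagged(`json:"name"`), newTagged(`json:"n"`)
	seed := maphash.MakeSeed()
	return types.Identical(a, b),
		types.IdenticalIgnoreTags(a, b),
		hashIgnoringTags(seed, a) == hashIgnoringTags(seed, b)
}

func main() {
	fmt.Println(compare()) // prints "false true true"
}
```

Under types.Identical the two structs simply collide in one bucket and are told apart by the equality check, which is the minor lossiness mentioned above. |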
I would be fine with a I also think the hash algorithm should aim to not hash any pointers such as Another extremely similar use case is gopls's fingerprinting of types, which aims to stringify types such that the strings are equal iff the types are identical. It seems to me like that could also use a form of hashing. @adonovan also correctly points out that hashing some named types can be tricky, such as:
A naive hashing algorithm might hash both of these named types as |
@mvdan If you need the hash to be consistent over process executions, you might want to chime in on #70471 as well. Part of that issue is about using [edit] also, whoops, just realized that even here, we have explicitly discussed the question of hashes being consistent across process executions [/edit] |
It is not necessary to have two variants of the Hash function; one consistent with It is also not necessary for this proposal to promise anything more than consistency with Promising stability of hashes over time is unusual, unnecessary, and quite limiting. Gopls' fingerprint algorithm is a stable mapping from types to strings; I don't think it has any bearing on this proposal. |
Change https://go.dev/cl/657297 mentions this issue: |
This proposal has been added to the active column of the proposals project |
This is the current proposal:
package types // import "go/types"
// Hasher defines a hash function and equivalence relation for Types
// that is consistent with [Identical]. Hashers are stateless.
type Hasher struct{}
func (Hasher) Hash(h *maphash.Hash, t Type)
func (Hasher) Equal(x, y Type) bool
// HasherIgnoreTags defines a hash function and equivalence relation for Types
// that is consistent with [IdenticalIgnoreTags]. HasherIgnoreTags is stateless.
type HasherIgnoreTags struct{}
func (HasherIgnoreTags) Hash(h *maphash.Hash, t Type)
func (HasherIgnoreTags) Equal(x, y Type) bool |
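To show how a caller might drive an API of this shape, here is a sketch built around a local stand-in type. shapeHasher, sum64 and identicalSlicesAgree are hypothetical names; shapeHasher only mirrors the method set quoted above and is not the proposed implementation (it hashes just basic kinds, pointers and slices). The caller owns the maphash.Hash and therefore the seed. It assumes Go 1.22 or later for types.Unalias.

```go
package main

import (
	"fmt"
	"go/types"
	"hash/maphash"
)

// shapeHasher is a hypothetical stand-in with the same method set as the
// proposed Hasher. It is stateless and consistent with types.Identical,
// but far coarser than a real implementation would be.
type shapeHasher struct{}

// Hash writes a description of t's shape into the caller-supplied hash.
func (s shapeHasher) Hash(h *maphash.Hash, t types.Type) {
	switch t := types.Unalias(t).(type) {
	case *types.Basic:
		h.WriteByte('B')
		h.WriteByte(byte(t.Kind()))
	case *types.Pointer:
		h.WriteByte('P')
		s.Hash(h, t.Elem())
	case *types.Slice:
		h.WriteByte('S')
		s.Hash(h, t.Elem())
	default:
		h.WriteByte('?') // all other types share one coarse class
	}
}

// Equal reports whether x and y are identical types.
func (shapeHasher) Equal(x, y types.Type) bool { return types.Identical(x, y) }

// sum64 turns the Hash method into a single uint64 under the given seed.
func sum64(seed maphash.Seed, t types.Type) uint64 {
	var h maphash.Hash
	h.SetSeed(seed)
	shapeHasher{}.Hash(&h, t)
	return h.Sum64()
}

// identicalSlicesAgree checks that two distinct but identical slice types
// are Equal and hash to the same value under one seed.
func identicalSlicesAgree() bool {
	seed := maphash.MakeSeed()
	x := types.NewSlice(types.Typ[types.Int])
	y := types.NewSlice(types.Typ[types.Int])
	return shapeHasher{}.Equal(x, y) && sum64(seed, x) == sum64(seed, y)
}

func main() {
	fmt.Println(identicalSlicesAgree()) // prints "true"
}
```

Passing the *maphash.Hash in, rather than returning a uint, leaves the choice of seed to the caller (per table or per process), which is the ergonomic question raised earlier in the thread. |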
Based on the discussion above, this proposal seems like a likely accept. The proposal details are in #69420 (comment) |
[Edit: this proposal is now just for the hash function. The HashMap is another proposal.]
We propose to add the HashMap data type (a generic evolution of golang.org/x/tools/go/types/typeutil.Map) to the standard go/types package, with the following API:
Rescinded part of the proposal:
Some notes: