proposal: go/types: add Hasher{,IgnoreTags} types #69420

adonovan · 2024-09-12T13:55:27Z

[Edit: this proposal is now just for the hash function. The HashMap is another proposal.]

We propose to add the HashMap data type (a generic evolution of golang.org/x/tools/go/types/typeutil.Map) to the standard go/types package, with the following API:

// Updated Apr 9 2025
package types // import "go/types"

// Hasher defines a hash function and equivalence relation for Types
// that is consistent with [Identical]. Hashers are stateless.
type Hasher struct{}

func (Hasher) Hash(h *maphash.Hash, t Type)
func (Hasher) Equal(x, y Type) bool

// HasherIgnoreTags defines a hash function and equivalence relation for Types
// that is consistent with [IdenticalIgnoreTags]. HasherIgnoreTags is stateless.
type HasherIgnoreTags struct{}

func (HasherIgnoreTags) Hash(h *maphash.Hash, t Type)
func (HasherIgnoreTags) Equal(x, y Type) bool

Rescinded part of the proposal:

// HashMap[V] is a mapping from Type to values of type V.
// Map keys (types) are considered equivalent if they are Identical.
// As with map[K]V, a nil *HashMap is a valid empty map.
type HashMap[V any] struct { ... }

// All returns an iterator over the key/value entries of the map in undefined order.
func (m *HashMap[V]) All() iter.Seq2[Type, V]

// At returns the map entry for the given key. It returns zero if the entry is not present.
func (m *HashMap[V]) At(key Type) V

// Delete removes th//e entry with the given key, if any. It returns true if the entry was found.
func (m *HashMap[V]) Delete(key Type) bool

// Keys returns an iterator over the map keys in undefined order.
func (m *HashMap[V]) Keys() iter.Seq[Type]

// KeysString returns a string representation of the map's key set in unspecified order.
func (m *HashMap[V]) KeysString() string

// Len returns the number of map entries.
func (m *HashMap[V]) Len() int

// Set updates the map entry for key to value, and returns the previous entry, if any.
func (m *HashMap[V]) Set(key Type, value V) (prev V)

// String returns a string representation of the map's entries in unspecified order.
// Values are printed as if by fmt.Sprint.
func (m *HashMap[V]) String() string

Some notes:

This proposal supersedes proposal: x/tools/go/types/typeutil: add TypeMap[V] #67161, to x/tools/go/types/typeutil.Map.
It is generic, whereas typeutil.Map is not.
It uses idiomatic Go 1.23 iterators.
The typeutil.Hasher type and SetHasher method are not part of this proposal. They had no performance advantage, and the statefulness turned reads into writes. The hash recursion bottoms out at named types, so most types are shallow.
There is still no way to distinguish "m[k] is zero" from "missing"; this has never been a problem in practice.
The Hash function does not accept a seed and thus HashMap is vulnerable to flooding. But the type checker is vulnerable to various other performance problems if an attacker controls its input.

The text was updated successfully, but these errors were encountered:

gabyhelp · 2024-09-12T13:55:42Z

Related Issues and Documentation

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

ianlancetaylor · 2024-09-12T19:36:00Z

What's the point of KeysString?
If this is going to be an exported type, I think we should support some way to distinguish the zero value from the non-existent value. We can keep At for convenience, and add Contains. We can change Set to return an additional bool.

adonovan · 2024-09-12T19:58:23Z

What's the point of KeysString?

This type is often used as a set, as this convenience method prints {T1, ..., Tn} without all the informationlessT: true values.

If this is going to be an exported type, I think we should support some way to distinguish the zero value from the non-existent value. We can keep At for convenience, and add Contains. We can change Set to return an additional bool.

Those are principled suggestions, but in practice the value has always been either a non-nil pointer (in which case nil => missing) or true (in which case false => missing), so it has never been a problem. But uses of this type are few enough between that the convenience of At/Set either way is a minor consideration.

jimmyfrasche · 2024-09-12T20:22:08Z

Is there anything Type-specific about HashMap other than using Hash?

Put another way, if there were an appropriately-defined "container/hashmap", could you just have something like

package types

import "container/hashmap"

func MakeHashMap[V any]() hashmap.Map[Type, V] {
  return hashmap.Make[Type, V](Hash)
}

adonovan · 2024-09-12T20:39:16Z

Is there anything Type-specific about HashMap other than using Hash?

Very good question. Now that you mention it, no, the HashMap is completely generic other than the fact it assumes (K=types.Type, hash=types.Hash, eq=types.Identical). If container/hashmap existed this proposal could reduce to just the Hash function, and perhaps a convenience constructor:

func Hash(Type) uint

type HashMap[V any] = hashmap.Map[Type, V] 

func NewHashMap[V any]() *HashMap[V] { return hashmap.New(Hash, Identical) }

adonovan · 2024-09-16T16:26:29Z

In light of @jimmyfrasche's idea, let's restrict this proposal to just the Hash function (the tricky part), with the expectation that a generic hash table (the easy part) will someday follow and that in the meantime it's easy enough for clients to write their own HashMap.

adonovan · 2024-09-17T20:01:46Z

cc @jba, who is thinking about unordered maps. Our generic unordered and ordered maps should be aligned.

aclements · 2024-10-23T19:41:37Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.

adonovan · 2024-10-23T20:08:31Z

My only real question here is whether the Hash function needs to take a seed parameter to prevent flooding. As I mentioned above, I don't think hash flooding is a real concern because if you control its inputs, the type checker is already far more vulnerable than a hash table to DoS attacks. But what does concern me is ergonomics: the Hash function should interoperate with the proposed standard hash-based map, which may demand that hash functions accept a seed.

Perhaps @ianlancetaylor and @prattmic can opine.

aclements · 2024-11-06T19:09:01Z

I'm slightly inclined toward providing a seed in the style of the maphash package. That is, the seed is opaque and the only thing a caller can do is generate a new one. Partly this would be for consistency, partly it would be to preempt any question of flooding attacks on this hash.

We can always use a per-process seed to complicate an attack, though that's certainly not as good as a per-table seed. If we do that instead of an explicit seed, I think we would have to document that the hash function is not consistent from process to process. One advantage of requiring an explicit seed is that it makes it clear in the API where the boundary around comparable hashes lies.

Merovius · 2024-11-13T20:56:33Z

The proposed API seems to imply that the hash is stable across process invocations. Which implies that changing the hash function (or the hashing process) in the future would be a breaking change. Do we want to get locked into that?

The only advantage of a stable function that I can think of is that you could use the hash in process-external communication (like an on-disk cache or if you do something like running analyzers over the network). If we want that, we probably want the seed (if any) to not be opaque either.

If we don't want to commit, another advantage of a seed argument (IMO) is that it makes this instability more obvious in the API.

adonovan · 2024-11-13T23:33:12Z

The proposed API seems to imply that the hash is stable across process invocations. Which implies that changing the hash function (or the hashing process) in the future would be a breaking change. Do we want to get locked into that?

Good question. No, we certainly do not; but whether or not we have a seed parameter I think we can reserve the right to change the hash function arbitrarily with judicious documentation.

adonovan · 2024-11-20T20:32:38Z

This proposal is on hold until we resolve the fundamental question:

hash: standardize the hash function #70471

aclements · 2024-11-21T03:28:11Z

The proposed API seems to imply that the hash is stable across process invocations.

As @adonovan mentioned above, we certainly want to be able to change the hash function, so we can't guarantee it will be stable across process invocations.

Given that, I would argue we need to make it aggressively unstable, much like how we randomize map iteration order, so that people don't start to depend on any sort of stability. Certainly the minimum bar for that is a per-process invocation seed.

aclements · 2024-11-21T04:06:27Z

Placed on hold. (pending outcome of #70471 --adonovan)

adonovan · 2024-12-16T19:11:04Z

Another design question: The implementation of typeutil.Hasher uses a map to memoize every type it has ever seen, ostensibly for performance (though in fact it is quite slow), but it occurs to me that this naturally causes it to return consistent hashes for *types.Named values, which use pointer identity, without relying on the ability to hash a pointer. A simpler and faster implementation simply hashes the numeric value of the pointer, but this causes it to assume that the GC is non-moving. Are we ok with that? (Do we presume that a moving GC would necessarily provide a means like Java's identityHashCode to retrieve the hash derived from an object's original address?)

adonovan · 2024-12-16T19:18:26Z

Another design question: The implementation of typeutil.Hasher uses a map to memoize every type it has ever seen, ostensibly for performance (though in fact it is quite slow), but it occurs to me that this naturally causes it to return consistent hashes for *types.Named values, which use pointer identity, without relying on the ability to hash a pointer. A simpler and faster implementation simply hashes the numeric value of the pointer, but this causes it to assume that the GC is non-moving. Are we ok with that? (Do we presume that a moving GC would necessarily provide a means like Java's identityHashCode to retrieve the hash derived from an object's original address?)

[Never mind: the design of maphash.Comparable (discussion) allowing pointers tacitly admits that heap-allocated variables are non-moving.]

thepudds · 2024-12-16T19:20:38Z

FWIW, Joe Tsai asked a related question in #54670 (comment), with a short response from Austin in #54670 (comment) and then Austin's extended response in #54670 (comment). The conclusion there I think was that the arguments to Comparable escape (unless it can be proven it doesn't matter, like for strings I think).

Those comments are probably worth a quick read if you haven't already (though TBH, I've lost the thread of this propsoal a bit, so maybe those are only tangentially related to your most recent question.)

adonovan · 2025-01-18T02:31:27Z

@mvdan points out that the types.Hash function should ignore struct tags (and be documented to do so) that the hash function works equally well with types.Identical and types.IdenticalIgnoreTags. (The lossiness of ignoring tags is very minor.)

mvdan · 2025-01-21T23:39:13Z

I would be fine with a types.HashIgnoreTags function to mirror types.IdenticalIgnoreTags, for what it's worth. Or, if we think more knobs may be useful, perhaps types.Hash could take an options struct, although at this point in time it may be premature.

I also think the hash algorithm should aim to not hash any pointers such as *types.Named. The reason being that it's relatively common for Go tools built atop go/types to want to hash types and store/reuse that hash across processes. For example, https://github.com/burrowers/garble is a tool written to integrate with go build -toolexec, and by the nature of that, it gets executed once per package being compiled. Having consistent hashing of types across processes is very valuable for determinism and to allow using those hashes for on-disk caching.

Another extremely similar use case is gopls's fingerprinting of types, which aims to stringify types such that the strings are equal iff the types are identical. It seems to me like that could also use a form of hashing.

@adonovan also correctly points out that hashing some named types can be tricky, such as:

package p
func f() {
  { type A int32 }
  { type A int64 }
}

A naive hashing algorithm might hash both of these named types as p.A, when in fact they are not the same named type. One possible solution might be to as#teger elements to local types, such as p.f.1.A and p.f.2.A. I'd personally also be fine if the hash algorithm wasn't perfect in some rare edge cases, and that caused collisions - as long as we documented that.

Merovius · 2025-01-22T01:25:54Z

@mvdan If you need the hash to be consistent over process executions, you might want to chime in on #70471 as well. Part of that issue is about using maphash.Hash or maphash.Seed more or less mandatorily in the general hash interface, which would make such consistent hashes impossible. And given that this issue is on hold pending a decision on #70471, I'm not sure if that would make your use case impossible?

[edit] also, whoops, just realized that even here, we have explicitly discussed the question of hashes being consistent across process executions) [/edit]

adonovan · 2025-01-22T03:28:32Z

It is not necessary to have two variants of the Hash function; one consistent with IdenticalIgnoreTags is sufficient, and one less way to misuse the API.

It is also not necessary for this proposal to promise anything more than consistency with Identical{,IgnoreTags}. In particular, hashing pointers is fair game if Identical makes pointer comparisons, as it currently does. There is an open proposal (#57497) to reconsider that, though I'm not sure it will ever see the light of day. If it does, the behavior of the Hash function can be changed then without breaking anything.

Promising stability of hashes over time is unusual, unnecessary, and quite limiting.

Gopls' fingerprint algorithm is a stable mapping from types to strings; I don't think it has any bearing on this proposal.

gopherbot · 2025-03-12T22:17:46Z

Change https://go.dev/cl/657297 mentions this issue: go/types: Hasher, a hash function for Types

mvdan · 2025-03-26T22:53:49Z

Fair enough, thanks @adonovan. After skimming #69420, I agree with your conclusion regarding hash stability.

I guess what I would then want is a sibling API that exports gopls's fingerprint algorithm, but I may be the only such user besides gopls, so I'm fine with vendoring/copying for now.

aclements · 2025-04-02T20:44:04Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

aclements · 2025-04-09T16:51:06Z

Now that #70471 is accepted, I've reactivated this proposal.

@adonovan , care to post an updated API that follows #70471?

adonovan · 2025-04-09T17:20:25Z

This is the current proposal:

package types // import "go/types"

// Hasher defines a hash function and equivalence relation for Types
// that is consistent with [Identical]. Hashers are stateless.
type Hasher struct{}

func (Hasher) Hash(h *maphash.Hash, t Type)
func (Hasher) Equal(x, y Type) bool

// HasherIgnoreTags defines a hash function and equivalence relation for Types
// that is consistent with [IdenticalIgnoreTags]. HasherIgnoreTags is stateless.
type HasherIgnoreTags struct{}

func (HasherIgnoreTags) Hash(h *maphash.Hash, t Type)
func (HasherIgnoreTags) Equal(x, y Type) bool

aclements · 2025-04-16T18:45:03Z

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group

The proposal details are in #69420 (comment)

gopherbot added the Proposal label Sep 12, 2024

gopherbot added this to the Proposal milestone Sep 12, 2024

adonovan mentioned this issue Sep 12, 2024

proposal: x/tools/go/types/typeutil: add TypeMap[V] #67161

Closed

adonovan added this to Proposals Sep 12, 2024

adonovan moved this to Incoming in Proposals Sep 12, 2024

adonovan changed the title ~~proposal: go/types: add Hash function and HashMap[V] type~~ proposal: go/types: add Hash function Sep 16, 2024

adonovan mentioned this issue Sep 20, 2024

proposal: container/hash: Map, a generic hash table with custom hash function and equivalence relation #69559

Open

aclements moved this from Incoming to Active in Proposals Oct 23, 2024

adonovan mentioned this issue Nov 20, 2024

hash: standardize the hash function #70471

Open

aclements moved this from Active to Hold in Proposals Nov 21, 2024

aclements added the Proposal-Hold label Nov 21, 2024

aclements moved this from Hold to Active in Proposals Apr 2, 2025

aclements removed the Proposal-Hold label Apr 2, 2025

adonovan changed the title ~~proposal: go/types: add Hash function~~ proposal: go/types: add Hasher{,IgnoreTags} types Apr 9, 2025

aclements moved this from Active to Likely Accept in Proposals Apr 16, 2025

aclements added the Proposal-FinalCommentPeriod label Apr 16, 2025

gabyhelp mentioned this issue Apr 20, 2025

proposal: builtin: Standard library adds interface type for creating a hash #73453

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: go/types: add Hasher{,IgnoreTags} types #69420

proposal: go/types: add Hasher{,IgnoreTags} types #69420

adonovan commented Sep 12, 2024 •

edited

Loading

gabyhelp commented Sep 12, 2024

ianlancetaylor commented Sep 12, 2024

adonovan commented Sep 12, 2024

jimmyfrasche commented Sep 12, 2024

adonovan commented Sep 12, 2024 •

edited

Loading

adonovan commented Sep 16, 2024

adonovan commented Sep 17, 2024

aclements commented Oct 23, 2024

adonovan commented Oct 23, 2024 •

edited

Loading

aclements commented Nov 6, 2024

Merovius commented Nov 13, 2024

adonovan commented Nov 13, 2024

adonovan commented Nov 20, 2024

aclements commented Nov 21, 2024

aclements commented Nov 21, 2024 •

edited by adonovan

Loading

adonovan commented Dec 16, 2024 •

edited

Loading

adonovan commented Dec 16, 2024

thepudds commented Dec 16, 2024

adonovan commented Jan 18, 2025 •

edited

Loading

mvdan commented Jan 21, 2025

Merovius commented Jan 22, 2025 •

edited

Loading

adonovan commented Jan 22, 2025

gopherbot commented Mar 12, 2025

mvdan commented Mar 26, 2025

aclements commented Apr 2, 2025

aclements commented Apr 9, 2025

adonovan commented Apr 9, 2025 •

edited

Loading

aclements commented Apr 16, 2025

proposal: go/types: add Hasher{,IgnoreTags} types #69420

proposal: go/types: add Hasher{,IgnoreTags} types #69420

Comments

adonovan commented Sep 12, 2024 • edited Loading

gabyhelp commented Sep 12, 2024

ianlancetaylor commented Sep 12, 2024

adonovan commented Sep 12, 2024

jimmyfrasche commented Sep 12, 2024

adonovan commented Sep 12, 2024 • edited Loading

adonovan commented Sep 16, 2024

adonovan commented Sep 17, 2024

aclements commented Oct 23, 2024

adonovan commented Oct 23, 2024 • edited Loading

aclements commented Nov 6, 2024

Merovius commented Nov 13, 2024

adonovan commented Nov 13, 2024

adonovan commented Nov 20, 2024

aclements commented Nov 21, 2024

aclements commented Nov 21, 2024 • edited by adonovan Loading

adonovan commented Dec 16, 2024 • edited Loading

adonovan commented Dec 16, 2024

thepudds commented Dec 16, 2024

adonovan commented Jan 18, 2025 • edited Loading

mvdan commented Jan 21, 2025

Merovius commented Jan 22, 2025 • edited Loading

adonovan commented Jan 22, 2025

gopherbot commented Mar 12, 2025

mvdan commented Mar 26, 2025

aclements commented Apr 2, 2025

aclements commented Apr 9, 2025

adonovan commented Apr 9, 2025 • edited Loading

aclements commented Apr 16, 2025

adonovan commented Sep 12, 2024 •

edited

Loading

adonovan commented Sep 12, 2024 •

edited

Loading

adonovan commented Oct 23, 2024 •

edited

Loading

aclements commented Nov 21, 2024 •

edited by adonovan

Loading

adonovan commented Dec 16, 2024 •

edited

Loading

adonovan commented Jan 18, 2025 •

edited

Loading

Merovius commented Jan 22, 2025 •

edited

Loading

adonovan commented Apr 9, 2025 •

edited

Loading