diff --git a/design/44253-generic-array-sizes.md b/design/44253-generic-array-sizes.md new file mode 100644 index 00000000..9ea4e25b --- /dev/null +++ b/design/44253-generic-array-sizes.md @@ -0,0 +1,258 @@ +# Proposal: Generic parameterization of array sizes + +Author(s): Andrew Werner + +Last updated: March 16th, 2021 + +## Abstract + +With the type parameters generics proposal has been accepted, though not yet +fully specified or implemented, we can begin to talk about extension. [That +proposal][type parameters] lists the following omission: + +> No parameterization on non-type values such as constants. This arises most +obviously for arrays, where it might sometimes be convenient to write type +`Matrix[n int] [n][n]float64`. It might also sometimes be useful to specify +significant values for a container type, such as a default value for elements. + +This proposal seeks to resolve this limitation by (a) specifying when `len` can +be used as a compile-time constant and (b) adding syntax to specify constraints +for all arrays of a given type in type lists. + +## Background + +An important property of the generics proposal is that it enables the creation +of libraries of specialized container data structures. The existence of such +libraries will help developers write more efficient code as these data +structures will be able to allocate fewer object and provide greater access +locality. [This Google blog post][block based data structures] about block-based +C++ data drives home the point. + +The justification is laid out in the omission of the type parameter proposal. +The motivation that I've stumbled upon is in trying to implement a B-Tree +and allowing the client to dictate the degree of the node. + +One initial idea would be to allow the client to provide the actual array +which will be backing the data inside the node as a type parameter. This might +actually be okay in some data structure user cases but in a B-Tree it's bad +because we still would like to instantiate an array for the pointers and that +array needs to have a size that is a function of the data array. + +The proposal here seeks to make it possible for clients to provide default +values for array sizes of generic data structures in a way that is minimally +invasive to the concepts which go already has. The shorthand comment stated +in the Omission of the Type Parameter Proposal waves its hand at what feels +like a number of new and complex concepts for the language. + +## Proposal + +This proposals attempts to side-step questions of how one might provide a +scalar value in a type constraint by not ever providing a scalar directly. +This proposal recognizes that constants can be used to specify array lengths. +It also notes that the value of `len()` can be computed as a compile-time +constant in some cases. Lastly, it observes that type lists could be extended +minimally to indicate a constraint that a type is an array of a given type +without constraining the length of the array. + +### The vanilla generic B-Tree + +Let's explore the example of a generic B-Tree with a fixed-size buffer. Find +such an example [here][vanilla btree]. + +```go +// These constants are the wart. +const ( + degree = 16 + maxItems = 2*degree - 1 + minItems = degree - 1 +) + +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { + return &btree[K, V]{cmp: cmp} +} + +type btree[K, V any] struct { + cmp LessFn[K] + root *node[K, V] +} + +// ... + +type node[K, V any] struct { + count int16 + leaf bool + keys [maxItems]K + vals [maxItems]V + children [maxItems + 1]*node[K, V] +} +``` + +### Parameterized nodes + +Then we allow parameterization on the node type within the btree implementation +so that different node concrete types with different memory layouts may be +used. Find an example of this generalization +[here][parameterized node btree]. + +```go +type nodeI[K, V, N any] interface { + type *N + find(K, LessFn[K]) (idx int, found bool) + insert(K, V, LessFn[K]) (replaced bool) + remove(K, LessFn[K]) (K, V, bool) + len() int16 + at(idx int16) (K, V) + child(idx int16) *N + isLeaf() bool +} + +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { + type N = node[K, V] + return &btree[K, V, N, *N]{ + cmp: cmp, + newNode: func(isLeaf bool) *N { + return &N{leaf: isLeaf} + }, + } +} + +type btree[K, V, N any, NP nodeI[K, V, N]] struct { + len int + cmp LessFn[K] + root NP + newNode func(isLeaf bool) NP +} + +type node[K, V any] struct { + count int16 + leaf bool + keys [maxItems]K + vals [maxItems]V + children [maxItems + 1]*node[K, V] +} +``` + +This still ends up using constants and there's no really easy +way around that. You might want to parameterize the arrays into the node like +in [this example][bad parameterization btree]. This still +doesn't tell a story about how to relate the children array to the items. + +### The proposal to parameterize the arrays + +Instead, we'd like to find a way to express the idea that there's a size +constant which can be used in the type definitions. The proposal would +result in an implementation that looked like +[this][proposal btree]. + +```go + +// StructArr is a constraint that says that a type is an array of empty +// structs of any length. +type StructArr interface { + type [...]struct{} +} + +type btree[K, V, N any, NP nodeI[K, V, N]] struct { + len int + cmp LessFn[K] + root NP + newNode func(isLeaf bool) NP +} + +// NewBTree constructs a generic BTree-backed map with degree 16. +func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] { + const defaultDegree = 16 + return NewBTreeWithDegree[K, V, [defaultDegree]struct{}](cmp) +} + +// NewBTreeWithDegree constructs a generic BTree-backed map with degree equal +// to the length of the array used as type parameter A. +func NewBTreeWithDegree[K, V any, A StructArr](cmp LessFn[K]) OrderedMap[K, V] { + type N = node[K, V, A] + return &btree[K, V, N, *N]{ + cmp: cmp, + newNode: func(isLeaf bool) *N { + return &N{leaf: isLeaf} + }, + } +} + +type node[K, V any, A StructArr] struct { + count int16 + leaf bool + keys [2*len(A) - 1]K + values [2*len(A) - 1]V + children [2 * len(A)]*node[K, V, A] +} +``` +### The Matrix example + +The example of the omission in type parameter proposal could be achieved in +the following way: + +```go +type Dim interface { + type [...]struct{} +} + +type SquareFloatMatrix2D[D Dim] [len(D)][len(D)]float64 +``` + +### Summary + +1) Support type list constraints to express that a type is an array + + +```go +// Array expresses a constraint that a type is an array of T of any +// size. +type Array[T any] interface { + type [...]T +} +``` + +2) Support a compile-time constant expression for `len([...]T)` + +This handy syntax would permit parameterization of arrays relative to other +array types. Note that the constant expression `len` function on array types +could actually be implemented today using `unsafe.Sizeof` by a parameterization +over an array whose members have non-zero size. For example `len` could be +written as `unsafe.Sizeof([...]T)/unsafe.Sizeof(T)` so long as +`unsafe.Sizeof(T) > 0`. + +## Rationale + +This approach is simpler than generally providing a constant scalar expression +parameterization of generic types. Of the two elements of the proposal, neither +feels particularly out of line with the design of the language or its concepts. +The `[...]T` syntax exists in the language to imply length inference for array +literals and is not a hard to imagine concept. It is the deeper requirement to +make this proposal work. + +One potential downside of this proposal is that we're not really using the +array for anything other than its size which may feel awkward. For that reason +I've opted to use a constraint which forces the array to use `struct{}` values +to indicate that the structure of the elements isn't relevant. This awkwardness +feels justified to side-step introduces scalars to type parameters. + +## Compatibility + +This proposal is fully backwards compatible with all of the language and also +the now accepted type parameters proposal. + +## Implementation + +Neither of the two features of this proposal feel particularly onerous to +implement. My guess is that the `[...]T` type list constraint would be extremely +straightforward given an implementation of type parameters. The `len` +implementation is also likely to be straightforward given the existence of +both compile-time evaluation of `len` expressions on array types which exist +in the language and the constant nature of `unsafe.Sizeof`. Maybe there'd be +some pain in deferring the expression evaluation until after type checking. + +[type parameters]: https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md +[block based data structures]: https://opensource.googleblog.com/2013/01/c-containers-that-save-memory-and-time.html +[vanilla btree]: https://go2goplay.golang.org/p/A5auAIdW2ZR +[parameterized node btree]: https://go2goplay.golang.org/p/TFn9BujIlc3 +[bad parameterization btree]: https://go2goplay.golang.org/p/JGgyabtu_9F +[proposal btree]: https://go2goplay.golang.org/p/4o36RLxF73C \ No newline at end of file