design: add proposal for parameterized generic array sizes
ajwerner committed Mar 16, 2021
1 parent a384797 commit 8e3bc88
Showing 1 changed file with 258 additions and 0 deletions.
258 changes: 258 additions & 0 deletions design/44253-generic-array-sizes.md
@@ -0,0 +1,258 @@
# Proposal: Generic parameterization of array sizes

Author(s): Andrew Werner

Last updated: March 16th, 2021

## Abstract

Now that the type parameters generics proposal has been accepted, though not yet
fully specified or implemented, we can begin to talk about extensions. [That
proposal][type parameters] lists the following omission:

> No parameterization on non-type values such as constants. This arises most
obviously for arrays, where it might sometimes be convenient to write type
`Matrix[n int] [n][n]float64`. It might also sometimes be useful to specify
significant values for a container type, such as a default value for elements.

This proposal seeks to resolve this limitation by (a) specifying when `len` can
be used as a compile-time constant and (b) adding syntax to specify constraints
for all arrays of a given type in type lists.

## Background

An important property of the generics proposal is that it enables the creation
of libraries of specialized container data structures. The existence of such
libraries will help developers write more efficient code, as these data
structures will be able to allocate fewer objects and provide greater access
locality. [This Google blog post][block based data structures] about block-based
C++ data structures drives home the point.

The justification is laid out in the omission quoted above from the type
parameters proposal. The motivation that I've stumbled upon is in trying to
implement a B-Tree while allowing the client to dictate the degree of the node.

One initial idea would be to allow the client to provide, as a type parameter,
the actual array which will back the data inside the node. This might
actually be okay in some data structure use cases, but in a B-Tree it's bad
because we would still like to instantiate an array for the pointers, and that
array needs to have a size that is a function of the size of the data array.

The proposal here seeks to make it possible for clients to provide default
values for array sizes of generic data structures in a way that is minimally
invasive to the concepts which Go already has. The shorthand syntax suggested
in the omission from the type parameters proposal gestures at what feels
like a number of new and complex concepts for the language.

## Proposal

This proposal attempts to side-step questions of how one might provide a
scalar value in a type constraint by not ever providing a scalar directly.
This proposal recognizes that constants can be used to specify array lengths.
It also notes that the value of `len()` can be computed as a compile-time
constant in some cases. Lastly, it observes that type lists could be extended
minimally to indicate a constraint that a type is an array of a given type
without constraining the length of the array.

### The vanilla generic B-Tree

Let's explore the example of a generic B-Tree with a fixed-size buffer. Find
such an example [here][vanilla btree].

```go
// These constants are the wart.
const (
	degree   = 16
	maxItems = 2*degree - 1
	minItems = degree - 1
)

func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] {
	return &btree[K, V]{cmp: cmp}
}

type btree[K, V any] struct {
	cmp  LessFn[K]
	root *node[K, V]
}

// ...

type node[K, V any] struct {
	count    int16
	leaf     bool
	keys     [maxItems]K
	vals     [maxItems]V
	children [maxItems + 1]*node[K, V]
}
```

### Parameterized nodes

Next, we parameterize the btree implementation on the node type so that
different concrete node types with different memory layouts may be used. Find
an example of this generalization
[here][parameterized node btree].

```go
type nodeI[K, V, N any] interface {
	type *N
	find(K, LessFn[K]) (idx int, found bool)
	insert(K, V, LessFn[K]) (replaced bool)
	remove(K, LessFn[K]) (K, V, bool)
	len() int16
	at(idx int16) (K, V)
	child(idx int16) *N
	isLeaf() bool
}

func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] {
	type N = node[K, V]
	return &btree[K, V, N, *N]{
		cmp: cmp,
		newNode: func(isLeaf bool) *N {
			return &N{leaf: isLeaf}
		},
	}
}

type btree[K, V, N any, NP nodeI[K, V, N]] struct {
	len     int
	cmp     LessFn[K]
	root    NP
	newNode func(isLeaf bool) NP
}

type node[K, V any] struct {
	count    int16
	leaf     bool
	keys     [maxItems]K
	vals     [maxItems]V
	children [maxItems + 1]*node[K, V]
}
```

This still ends up using constants, and there is no easy way around that. You
might want to pass the arrays into the node as type parameters, as in
[this example][bad parameterization btree]. That still doesn't tell a story
about how to relate the size of the children array to the size of the items
arrays.
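To make the missing relationship concrete, here is a small sketch written in the syntax of Go as later released (1.18 and up), not the go2go prototype used by the linked example. The constraints `keyArrays` and `childArrays` and the function `consistent` are hypothetical. Because there is no way to say "an array of any length", the permitted lengths are enumerated, and the relationship between the two arrays can only be checked when the program runs.

```go
package main

import "fmt"

// keyArrays and childArrays are hypothetical constraints. The permitted
// array lengths have to be enumerated one by one.
type keyArrays interface {
	[3]int | [7]int
}

type childArrays interface {
	[4]*int | [8]*int
}

// consistent reports whether the children array is exactly one element
// longer than the keys array. For values whose type is a type parameter,
// len is not a constant expression, so the compiler cannot enforce this.
func consistent[KA keyArrays, CA childArrays]() bool {
	var keys KA
	var children CA
	return len(children) == len(keys)+1
}

func main() {
	fmt.Println(consistent[[3]int, [4]*int]()) // prints "true"
	fmt.Println(consistent[[3]int, [8]*int]()) // prints "false"
}
```

The mismatched instantiation compiles without complaint, which is exactly the problem: nothing in the type system ties the second array's size to the first.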

### The proposal to parameterize the arrays

Instead, we'd like to find a way to express the idea that there's a size
constant which can be used in the type definitions. The proposal would
result in an implementation that looked like
[this][proposal btree].

```go
// StructArr is a constraint that says that a type is an array of empty
// structs of any length.
type StructArr interface {
	type [...]struct{}
}

type btree[K, V, N any, NP nodeI[K, V, N]] struct {
	len     int
	cmp     LessFn[K]
	root    NP
	newNode func(isLeaf bool) NP
}

// NewBTree constructs a generic BTree-backed map with degree 16.
func NewBTree[K, V any](cmp LessFn[K]) OrderedMap[K, V] {
	const defaultDegree = 16
	return NewBTreeWithDegree[K, V, [defaultDegree]struct{}](cmp)
}

// NewBTreeWithDegree constructs a generic BTree-backed map with degree equal
// to the length of the array used as type parameter A.
func NewBTreeWithDegree[K, V any, A StructArr](cmp LessFn[K]) OrderedMap[K, V] {
	type N = node[K, V, A]
	return &btree[K, V, N, *N]{
		cmp: cmp,
		newNode: func(isLeaf bool) *N {
			return &N{leaf: isLeaf}
		},
	}
}

type node[K, V any, A StructArr] struct {
	count    int16
	leaf     bool
	keys     [2*len(A) - 1]K
	values   [2*len(A) - 1]V
	children [2 * len(A)]*node[K, V, A]
}
```
### The Matrix example

The `Matrix` example from the omission in the type parameters proposal could be
expressed in the following way:

```go
type Dim interface {
	type [...]struct{}
}

type SquareFloatMatrix2D[D Dim] [len(D)][len(D)]float64
```
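As a rough illustration of what an instantiation such as `SquareFloatMatrix2D[[3]struct{}]` would denote, the following sketch writes the equivalent type by hand in Go that compiles today. The names `dim3`, `matrix3`, and `identity3` are hypothetical and are not part of the proposal.

```go
package main

import "fmt"

// dim3 is a hypothetical Dim argument: an array of three empty structs.
type dim3 [3]struct{}

// matrix3 is the hand-written equivalent of SquareFloatMatrix2D[dim3].
// len(dim3{}) is a constant expression, so it is a valid array length.
type matrix3 [len(dim3{})][len(dim3{})]float64

// identity3 returns the 3x3 identity matrix.
func identity3() matrix3 {
	var m matrix3
	for i := range m {
		m[i][i] = 1
	}
	return m
}

func main() {
	m := identity3()
	fmt.Println(len(m), len(m[0])) // prints "3 3"
	fmt.Println(m)                 // prints "[[1 0 0] [0 1 0] [0 0 1]]"
}
```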

### Summary

1) Support type list constraints to express that a type is an array

```go
// Array expresses a constraint that a type is an array of T of any
// size.
type Array[T any] interface {
	type [...]T
}
```

2) Support a compile-time constant expression for `len([...]T)`

This handy syntax would permit parameterization of arrays relative to other
array types. Note that a constant `len` on array types could already be
approximated today using `unsafe.Sizeof`, by parameterizing over an array whose
elements have non-zero size. For example, `len` could be written as
`unsafe.Sizeof([...]T)/unsafe.Sizeof(T)` so long as `unsafe.Sizeof(T) > 0`.
This notation is informal, because `unsafe.Sizeof` takes an expression rather
than a type.

## Rationale

This approach is simpler than generally providing a constant scalar expression
parameterization of generic types. Of the two elements of the proposal, neither
feels particularly out of line with the design of the language or its concepts.
The `[...]T` syntax exists in the language to imply length inference for array
literals and is not a hard concept to imagine. It is the deeper requirement to
make this proposal work.

One potential downside of this proposal is that we're not really using the
array for anything other than its size, which may feel awkward. For that reason
I've opted to use a constraint which forces the array to use `struct{}` values
to indicate that the structure of the elements isn't relevant. This awkwardness
feels justified as a way to side-step introducing scalars as type parameters.

## Compatibility

This proposal is fully backwards compatible with the existing language and
with the now-accepted type parameters proposal.

## Implementation

Neither of the two features of this proposal feels particularly onerous to
implement. My guess is that the `[...]T` type list constraint would be extremely
straightforward given an implementation of type parameters. The `len`
implementation is also likely to be straightforward, given that the language
already evaluates `len` expressions on array types at compile time and that
`unsafe.Sizeof` is already a constant expression. There may be some pain in
deferring the expression evaluation until after type checking.

[type parameters]: https://go.googlesource.com/proposal/+/refs/heads/master/design/go2draft-type-parameters.md
[block based data structures]: https://opensource.googleblog.com/2013/01/c-containers-that-save-memory-and-time.html
[vanilla btree]: https://go2goplay.golang.org/p/A5auAIdW2ZR
[parameterized node btree]: https://go2goplay.golang.org/p/TFn9BujIlc3
[bad parameterization btree]: https://go2goplay.golang.org/p/JGgyabtu_9F
[proposal btree]: https://go2goplay.golang.org/p/4o36RLxF73C
