Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Rhyme Type Rules #50

Open
IsaacFleetwood opened this issue Oct 16, 2024 · 2 comments
Open

Rhyme Type Rules #50

IsaacFleetwood opened this issue Oct 16, 2024 · 2 comments
Labels
design Language design discussion

Comments

@IsaacFleetwood
Copy link
Contributor

I've put together a document of the current type rules, aswell as including some of the notes that we discussed in the meeting today. Every rule is labeled with a location, so we can identify the specific rule being discussed when talking about it.
TypingRules.pdf

There are a lot of notes regarding potential deviations in direction that could take place. They are labelled with a "TODO" in front.

The PDF does disclude the Any type. Given the requirement that a well-typed expression utilizing it must not error (as opposed to Unknown), this means that just about the only place where it can be used is in the General Stateful Operations, which

Aswell, we did discuss avoiding overflow by upcasting values, while the PDF runs under the assumption of preserving the input types when possible.

@TiarkRompf
Copy link
Member

TiarkRompf commented Oct 17, 2024

Very nice! Do you want to check in the source to the rhyme-papers repo?

Some thoughts right off the bat:

  • I think it's useful to emphasize the "types as sets of values" angle -- we can enumerate the values in the language, and think about singleton types (the types that contain just a single value) as the atomic building blocks of the type language
  • Starting from singletons, we can build types that describe larger sets of values, for example the union $\cup$ operation, with set union being the semantic interpretation of union types, and we also have types such as Int, Float, String, and their variants that describe specific sets of values.
  • Semantically, we can justify a subtyping relationship $A <: B$ as the reflection of subset inclusion back into the type language
  • Now where things get tricky is with Unknown: if unknown is supposed to be a legal type anywhere, it's semantic interpretation can't be a set of values -- it would have to be a subtype of all other types, and all other types would have to be a subtype of Unknown, so Unknown would simultaneously have to be the empty set and the set of all values, which would mean the entire system collapses and all types end up being equivalent
  • So for types like Unknown, one option is to settle for a weaker ordering relation $A \prec: B$ meaning A is convertible to B. This still means that every type is convertible to every other type (transitively via Unknown), so it seems useful to model errors specifically -- then the result becomes "every type is (transitively) convertible to every other type or it fails with an error".
  • Nothing and Error also don't strictly represent values -- this is a bit different in Rhyme than in JS, where "undefined" is a proper value, but we're treating "not producing a value" more as a property of evaluating an expression (i.e. an effect). For example, a type like Array[Int $\cup$ Error] doesn't seem to make a lot of sense. So it seems like a good idea to draw a distinction between "value types" and "expression types" and be clear about when we talk about which (one option in the doc is to use different meta variables, e.g. reserve S,T,U for value types and use A,B,C or P,Q,R for expression types)
  • Semantically, we're often thinking about objects as a kind of materialized function, so it would be useful to think about object types as function types $A \to B$. It seems like the "keyof" operator from TypeScript maps quite nicely to a dependent type "dom(f) <: A" that represents the domain of a function. It may still be useful to keep these object-as-function types separate from the function types assigned to UDFs imported from JS.
  • We should think about ways to include information about data structure representation in object types, e.g. if an object is a a plain JS object, a regular JS array, a JS Typed Array, a CSR compressed sparse matrix/vector, a view into a dense tensor, a red-black tree, etc.
  • We need to think about if we need intersection types. One could potentially assign type "*A: dom(data) $\cap$ dom(other)" when we have filter expressions "data.*A" and "other.*A". A question that arises is what should happen when this intersection is empty -- especially how this will relate to Nothing.
  • When specifying type systems, one typically tries to distinguish rules that are built into the subtyping or type assignment judgment, and derived rules that are "admissible", i.e. can be proved as lemmas from the base rules. Examples would be reflexivity or transitivity of subtyping (typically, one tries to only include reflexivity for base types in the judgment and prove the rest as lemmas), or in your doc, rule 1.3 would probably be a base rule (the definition of type equivalence) and 1.4 would be proved from that via inversion. Likewise, if you have 2.1 and a symmetric version for S as base rules, then 2.2-2.4 can likely be proved from those.

Edit: other things to consider for interaction of stateful ops with Nothing/Error:

  • We could have two versions of e.g. "sum", the regular one that always produces 0 even if no entry was produced, and a variant "sum?" that would return Nothing in that case
  • What should be the result of "sum data.*A" with input "{A: 7, b: "not a number", c: { nested: "object" }}"? Right now we're mapping most type mismatches to Nothing, which (if applied consistently) would mean we should return 7. This seems like a useful pattern to easily filter out all non-numeric entries. With the switch to more type safety and explicit errors the consistent result would be to have the sum op throw an error. It seems like we could still get the filtering result if we were to use an explicit check or conversion op -- either "sum (isInt? data.*A) & data.*A" or, probably nicer, "sum (toInt? data.*A)". The "toInt?" operator would return Nothing if the argument is not an int, and we can have a variant "toInt" that would throw an error.
  • We need a way to "observe" Nothing, e.g. "a ?? b" as in JS -- the semantics will be subtly different though, because we don't really treat undefined as a value. It may need to be stateful to make "a.*A ?? b" work.
  • Likewise, it seems useful to add an error recovery mechanism. e.g. "try (sum data.*A)" that turns Error into Nothing. In combination, "try e ?? v" would turn Error into a proper value. An explicit "error" op to cause errors/exceptions also seems useful.
  • Related: we also have to find a way to do outer joins, e.g. addition of two sparse vectors a and b where we want the union of the index spaces not the intersection -- we're still accessing both objects using the same variable "*A" to create enclosing loops but now we want to also hit cases where either a.*A or b.*A is Nothing. I imagine we could write something like "a.*A ?? 0 + b.*A ?? 0" similar to JS, but we'll need to figure out the details -- right now "a.*A" will never produce Nothing, so the "??" won't ever trigger.

@TiarkRompf TiarkRompf added the design Language design discussion label Oct 17, 2024
@IsaacFleetwood
Copy link
Contributor Author

Here's the updated type rules that I made from some of the things we have talked about in our meetings. It now includes definitions of Any and Never, Intersection rules, compile-time guarantees for non-empty intersections, and then, importantly, separation of Nothing and Error from value types.
Aswell, it includes an explicit way of determining the types of filter variables, including being able to do both inner joins and left/right outer joins (full outer joins to be determined).
TypingRules.pdf
Feel free to give comments as desired on it.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
design Language design discussion
Projects
None yet
Development

No branches or pull requests

2 participants