Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

The initial PR. #1

Open
wants to merge 46 commits into
base: main
Choose a base branch
from
Open

The initial PR. #1

wants to merge 46 commits into from

Conversation

scott-christopher
Copy link
Member

This PR is largely open for discussion and I expect it to be a bit of a living document while we carve out the final result.

This change effectively introduces some new immutable set data types for Sanctuary that can be available for both internal and external use.

There are two main variants of implementation included here:

  1. An AVL balanced tree for sets of ordered values – https://arxiv.org/abs/1602.02120
  2. A Patricia trie specialised for sets containing int32 values – http://ittc.ku.edu/~andygill/papers/IntMap98.pdf

Both of these appear to be performant enough (the IntSet performs particularly well for operations like union), with the performance largely affected by the ability to determine a consistent order for the stored values.

The Ordered constraint meant that an ordering scheme was necessary to consistently sort values like arrays and objects which don't exhibit a natural order. Given this may not always be necessary, I have created separated derivations of the AVL trees depending on whether a simple and faster primitive comparison can be performed, or the more involved Boxed version which will recurse into arrays and object values if necessary.

The complexity in the object and array comparison within the BoxedSet is particularly unfriendly, so I encourage people to cast a critical eye over those for bugs, improvements and optimisations in particular.

The implementations have largely been focussed on the "make it work" stage so far, so suggestions for the subsequent "make it right" and "make it fast" are very much welcome.

Documentation is also non-existent at this stage, so I'd welcome volunteers to help out here if possible as my available time will be limited over the coming weeks (more than happy to answer any questions though).

I also haven't plugged in sanctuary-def in any form yet, as I wasn't sure how/where this plans to sit amongst everything else yet. Similarly, I don't hold any strong opinions about folder/code structure, so feel free to cast judgement.

Related: sanctuary-js/sanctuary#233

@davidchambers
Copy link
Member

The property-based tests are very nice!

@EvanBurchard
Copy link

It looks like the contribution guidelines reference the readme being generated by index.js. If someone is open to starting that file and documenting one or two functions (and giving some indication for how the readme is generated), I'm happy to fill in some gaps to push the docs along.

@davidchambers
Copy link
Member

Thank you for volunteering, @EvanBurchard. :)

The process by which the documentation is generated is beautiful. It starts with a clean description format free of JSDoc-style cruft:

  //# min :: Ord a => a -> a -> a
  //.
  //. Returns the smaller of its two arguments.
  //.
  //. Strings are compared lexicographically. Specifically, the Unicode
  //. code point value of each character in the first string is compared
  //. to the value of the corresponding character in the second string.
  //.
  //. See also [`max`](#max).
  //.
  //. ```javascript
  //. > S.min(10, 2)
  //. 2
  //.
  //. > S.min(new Date('1999-12-31'), new Date('2000-01-01'))
  //. new Date('1999-12-31')
  //.
  //. > S.min('10', '2')
  //. '10'
  //. ```
  function min(x, y) {
    return x < y ? x : y;
  }
  S.min = def('min', {a: [Ord]}, [a, a, a], min);

A block begins with a //#-style comment containing the function's name and type signature. This is followed by any number of //.-style comments containing arbitrary Markdown. Finally, we have the function's implementation. The //# line should be as long as necessary; all other lines should be wrapped at 79 characters.

The JavaScript file is given to Transcribe to generate the Markdown readme. This is described in the project's makefile:

README.md: index.js
    $(TRANSCRIBE) \
      --heading-level 4 \
      --url 'https://github.com/sanctuary-js/sanctuary/blob/v$(VERSION)/{filename}#L{line}' \
      -- $^ \
    | LC_ALL=C sed 's/<h4 name="\(.*\)#\(.*\)">\(.*\)\1#\2/<h4 name="\1.prototype.\2">\3\1#\2/' >'$@'

The readme is updated via a prepublish script each time xyz is used to release a new version of the package. As a result, the readme should not be modified by hand, nor is it necessary to run make README.md manually after making changes to the source file.

The result can be seen at https://github.com/sanctuary-js/sanctuary#min. Note that Transcribe turns the //# line into a heading (as specified by --heading-level) which links to the corresponding line in the source code (as specified by --url).

The code examples should be in doctest format. The correctness of the examples is checked when make test is run (as it will be automatically whenever a pull request is opened).

Finally, a custom script is used to generate an HTML file from the Markdown file. The result can be seen at https://sanctuary.js.org/#min. Note that the code examples are editable in this context!

I plan to reuse the script to generate an HTML file for this project. It will be available at https://sanctuary.js.org/sanctuary-set/.

@faceyspacey
Copy link

@scott-christopher do you have any references to algorithms you were thinking of for non-sets? Out of curiosity, why did you go for sets first? The needs of a current project??

@davidchambers
Copy link
Member

Out of curiosity, why did you go for sets first?

One major distinction between Ramda and Sanctuary is the appropriateness of the data types used. Since Ramda is limited to built-in types, the types of several functions are approximations:

head :: Array a -> a?
match :: RegExp -> String -> Array String?
intersection :: Array a -> Array a -> Array a

The Sanctuary team, on the other hand, considers descriptive, restrictive types to be so important that we're willing to define our own types in cases where the language does not provide anything suitable. This would ideally enable the following:

head :: Array a -> Maybe a
match :: NonGlobalRegExp -> String -> Maybe { match :: String, groups :: Array (Maybe String) }
intersection :: Set a -> Set a -> Set a

There are two problems with approximations such as using Array a in place of Set a:

  • invariants are conveyed through documentation rather than encoded in the data types (we may promise that an array contains no duplicate values); and

  • the approximation may convey too much information (perhaps forcing us to specify an arbitrary but deterministic ordering for a collection which is conceptually unordered).

Once we have access to a Set type we'll be able to update functions such as keys, values, and pairs, and add Set-specific functions such as intersection and union.


const BaseSet = require('./BaseSet');

// TODO: Propose addition of Ord to FL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/FLSet.js Outdated

// TODO: Propose addition of Ord to FL.
module.exports = BaseSet(function compare(x, y) {
return x['fantasy-land/compare'](y);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

return x['fantasy-land/lte'](y) ? -1 : y['fantasy-land/lte'](x) ? 1 : 0

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants