(WIP) Kappa5 #13

Frando · 2019-11-07T10:44:52Z

This PR pulls in the current state of my kappa5 branch. It has been discussed a bit already: it changes kappa-core to be dependency-free (it just connects sources to views), which should make it much easier to support flexible indexing flows and scenarios. It also includes sources for hypercore and multifeed (and hyperdrives!).

This is not completely ready yet, I think, but I wanted to put it up for review and discussion, and to agree on the best way forward.

README with the new API

Open questions and missing features

State handling: In current kappa, the views need to provide storeState/fetchState handlers if they want to persist the indexer's state. I don't think this should stay with the view, as a view can now have many sources. There are two parts of state here: 1) the view version, to trigger a rebuild on a version change, and 2) the indexing progress. For 2), a more complex (sparse) indexer/source would track the progress on its own (e.g. in bitfields), while a simple source could still make use of a buffer to store its state.

So my current thinking is: have the kappa track the state for view versions, plus a buffer per flow (source instance + view combo), in memory by default. And allow supplying storeState (key, state, cb) and fetchState (key, cb) opts to the kappa-core. We could then also ship e.g. a simple implementation with tinybox, so that only a random-access-storage instance would have to be passed into the kappa for persistence.
Naming: Do people like the term source here? I was wondering whether indexer would be better, but I am not sure. The createSource function does create a source for the kappa, which usually is a function that indexes a set of feeds or other data structures. So creating a source for the kappa usually does not create data structures, only the function that indexes them.
Backwards compatibility: Currently there is the kappaClassic function in index.js that wires the new kappa-core together in a way that is API-compatible with the current kappa-core. I mostly did this for testing; it passes the cabal-core tests. However, this is based on current multifeed, which means hypercore 7. So actually, I'd propose to not have that, and to make this a backwards-incompatible change.
What to include in kappa-core? Should kappa-core be just the kappa-core, or also include a set of useful sources? (the modules in /sources)
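
To make the state-handling proposal above a bit more concrete, here is a minimal sketch of an in-memory default with the storeState (key, state, cb) / fetchState (key, cb) shape. The names createMemoryState and flowKey, and the key format, are assumptions made for illustration; they are not part of kappa-core.

```javascript
// Hypothetical helper (not part of kappa-core): an in-memory state store
// exposing the storeState(key, state, cb) / fetchState(key, cb) shape.
function createMemoryState () {
  const states = new Map()
  return {
    storeState (key, state, cb) {
      // Copy the buffer so later mutations by the caller are not observed.
      states.set(key, Buffer.from(state))
      cb(null)
    },
    fetchState (key, cb) {
      const state = states.get(key)
      // Unknown keys yield null, meaning "nothing indexed yet".
      cb(null, state ? Buffer.from(state) : null)
    }
  }
}

// One key per flow (source instance + view). The key format is an assumption.
function flowKey (viewName, sourceName) {
  return viewName + '!' + sourceName
}

// Usage: persist a sequence number as the indexing progress of one flow.
const stateStore = createMemoryState()
const key = flowKey('spatial', 'hypercore')
const progress = Buffer.alloc(4)
progress.writeUInt32BE(42, 0)

let restored = null
stateStore.storeState(key, progress, (err) => {
  if (err) throw err
  stateStore.fetchState(key, (err, state) => {
    if (err) throw err
    restored = state.readUInt32BE(0)
    console.log('restored progress:', restored)
  })
})
```

A persistent variant would keep the same two functions and swap the Map for a store backed by random-access-storage.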

hackergrrl · 2019-11-27T00:56:24Z

This is super cool @Frando. I'm still thinking through everything, but here are some initial questions:

How important is supporting multiple sources flowing to a view? I'd be interested to hear about specific use cases you have in mind. If it's not strictly needed, I think dropping it could simplify the code a fair bit.
This new setup assumes that sources will store their own state (what they've indexed / not indexed). This would have to happen in their pull() implementation. However, this is a bit tricksy, since the source would only be able to store state during the part of control flow where the view has not yet indexed the messages. This makes it hard for a source to update its state right after the view processes a batch. I think it'd be cool to see an example of a source that uses disk storage, to look at together & think through the implications of this.
Since we could imagine certain sources wanting to manage their own state in a special way (like a sparse indexer using a bitfield-db), maybe the pull() api can just be pull(next), and we assume it manages itself.
I like the idea of having a source manage both its own state and its version info. That way fewer storages need to be specified. Actually then the kappa-core instance wouldn't store any state! So you could do, say:

var kappa = require('kappa-core')
var hsource = require('kappa-source-hypercore')
var tinybox = require('tinybox')
var raf = require('random-access-file')
var bkdview = require('kappa-view-bkd')
var level = require('level')

var core = kappa()
var src = hsource(tinybox(raf))  // stores `version` and `state` in a random-access-* store
var view = bkdview(level('./foo'))  // store just the spatial database details
core.use('spatial', src, view)  // hooks up the source and view instances

core.api.spatial.query([-40,40,-80,80], (err, res) => { /*...*/ })
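
As a rough illustration of a source that keeps both its version and its indexing state, here is a sketch over a plain in-memory array. Everything in it is hypothetical: createArraySource is an invented name, the storage argument is a plain object standing in for something like tinybox, and the batch size of 2 is arbitrary. Note that this simple version advances its cursor inside pull(), before the view has processed the batch, which is exactly the timing concern raised above.

```javascript
// Hypothetical source over an in-memory array. `storage` is a plain object
// standing in for a persistent store such as tinybox.
function createArraySource (log, storage, viewVersion) {
  // A changed view version invalidates stored progress and forces a rebuild.
  if (storage.version !== viewVersion) {
    storage.version = viewVersion
    storage.cursor = 0
  }
  return {
    // pull(next): the source decides what to hand out; the core holds no state.
    pull (next) {
      const batch = log.slice(storage.cursor, storage.cursor + 2)
      storage.cursor += batch.length
      next(batch)
    }
  }
}

const log = ['a', 'b', 'c']
const storage = {}
const seen = []

const src = createArraySource(log, storage, 1)
src.pull((batch) => seen.push(...batch)) // hands out a, b
src.pull((batch) => seen.push(...batch)) // hands out c

// Re-creating the source with the same version resumes where it left off,
// while a new view version resets the cursor.
createArraySource(log, storage, 1)
const cursorAfterResume = storage.cursor
createArraySource(log, storage, 2)
const cursorAfterRebuild = storage.cursor
console.log(seen, cursorAfterResume, cursorAfterRebuild)
```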

hackergrrl · 2019-11-27T00:56:51Z

btw, kappa-core is on hypercore-protocol@7 now! 🎸

Frando · 2019-12-03T13:48:43Z

So, now in some more words. I agree with noffle's remarks!

multiple sources: I do need them, or at least think they are something that deserves a proper abstraction. For example, multifeed-index actually is a set of hypercore sources. But as long as it can be done cleanly in a multisource module, it's all fine
adding a callback to signal back to the source that the view has completed indexing is good, yes
and so is removing the state storage from kappa and moving it towards the sources. I might want to add a SimpleStatefulSource or similar so that the same code does not have to be rewritten each time (similar to kappa-view-level)
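
One way the completion callback could look, as a sketch only: the source hands out a batch together with an acknowledgement function and commits its cursor only once the view reports success. The names createAckSource, onindexed and runOnce are assumptions for illustration, as is the batch size of 2.

```javascript
// Hypothetical source: the cursor is committed only in the acknowledgement
// callback, i.e. after the view reports that it has indexed the batch.
function createAckSource (log, storage) {
  storage.cursor = storage.cursor || 0
  return {
    pull (next) {
      const end = Math.min(storage.cursor + 2, log.length)
      const batch = log.slice(storage.cursor, end)
      // Second argument: called by the flow once the view is done with `batch`.
      next(batch, function onindexed (err) {
        if (!err) storage.cursor = end
      })
    }
  }
}

// Hypothetical flow step: pull a batch, let the view index it, then acknowledge.
function runOnce (source, view, cb) {
  source.pull((batch, onindexed) => {
    view.map(batch, (err) => {
      onindexed(err)
      cb(err)
    })
  })
}

const storage = {}
const source = createAckSource(['a', 'b', 'c'], storage)
const indexed = []
const okView = { map (batch, cb) { indexed.push(...batch); cb(null) } }
const failingView = { map (batch, cb) { cb(new Error('index failed')) } }

runOnce(source, okView, () => {}) // cursor moves to 2
const cursorAfterOk = storage.cursor
runOnce(source, failingView, () => {}) // cursor stays put, the batch is retried later
const cursorAfterFail = storage.cursor
console.log(indexed, cursorAfterOk, cursorAfterFail)
```

Because the cursor only moves on success, a crash or error between pull and indexing leads to a repeated batch rather than a skipped one.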

I started to update Kappa5 based on these observations. Before I continue, I'd like us to agree on the end result so that it's not too much work rewriting things again.

Currently, the simplest example would look like this:

https://gist.github.com/Frando/21bc9e796544692b51de7e85edd1983a

Things to note:

Each use call creates a Flow, which is the combination of source + view. This makes things explicit, which is good. It's up to the consumer to create a source several times if the same source is needed for several views (that's how it always was, it just happened inside kappa-core). I think I like this, and it also opens the door to possibly optimizing for "one source for many views" scenarios
Both views and sources can expose an api. The view's api is mounted on kappa.api, the source's api on kappa.api.source. Is this good, or should this be structured differently?
One thing I'm still not totally sure about is how the source can talk to its flow to request that pull be called again. Right now I pass the flow object into the open method, where it can be stored, so that when the source has incoming messages it can call flow.update to signal that its pull method should be called. Before (in the current kappa5 branch), it was passed into the constructor (there, the createSource constructor is called by kappa-core; now a constructed source is passed in by the consumer, which is nice because it's the same as with views).
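
The wiring described in these notes could be sketched roughly as follows. This is not the code from the gist or the branch: the Flow and Kappa classes here are stripped-down stand-ins, createPushSource is an invented example source, and API mounting is left out entirely. It only shows a source storing the flow it receives in open() and calling flow.update() when new messages arrive.

```javascript
// Hypothetical minimal core: each use() call creates one Flow (source + view).
class Flow {
  constructor (name, source, view) {
    this.name = name
    this.source = source
    this.view = view
    this.running = false
    this.pending = false
    // The source receives its flow so it can request pulls later.
    if (source.open) source.open(this)
  }

  // Called by the source whenever it has new messages.
  update () {
    if (this.running) { this.pending = true; return }
    this.running = true
    this.source.pull((messages) => {
      const done = () => {
        this.running = false
        // Re-run if update() was called while a batch was being processed.
        if (this.pending) { this.pending = false; this.update() }
      }
      if (!messages.length) return done()
      this.view.map(messages, done)
    })
  }
}

class Kappa {
  constructor () { this.flows = {} }
  use (name, source, view) {
    this.flows[name] = new Flow(name, source, view)
    return this.flows[name]
  }
}

// A push-style source: stores the flow in open(), calls flow.update() on append.
function createPushSource () {
  let flow = null
  const queue = []
  return {
    open (f) { flow = f },
    pull (next) { next(queue.splice(0, queue.length)) },
    append (msg) { queue.push(msg); if (flow) flow.update() }
  }
}

const kappa = new Kappa()
const source = createPushSource()
const collected = []
kappa.use('collect', source, {
  map (messages, cb) { collected.push(...messages); cb() }
})
source.append('hello')
source.append('world')
console.log(collected) // [ 'hello', 'world' ]
```

Passing the flow into open() rather than the constructor keeps source construction in the consumer's hands, which matches how views are passed in.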

Frando · 2019-12-05T10:28:41Z

I started updating the API after the discussions.

See https://github.com/Frando/kappa-core/tree/kappa5-new for now. Most tests are updated and pass.

Frando · 2019-12-07T15:58:46Z

This is continued in #14.

Frando and others added 30 commits October 22, 2019 13:45

Move index.js to kappa-old.js

7991a5d

Add new Kappa!

3442666

This is a dependency-free version of the kappa core idea. It connects sources to views and tracks state. Next commit will add a kappaClassic wrapper that support multifeed by default.

Add kappaClassic wrapper and source handlers

9472c73

Add thunky dependency

8160fb5

Fixes

2ae73cb

Add example as test

f298710

Cleanup

404c78f

Docs

e029dff

Rename method

f5aa224

Add flowchart

44cdd21

Minor changes to wiring

87a4a46

Add a hyperdrive source handler & test

a21b44c

Enable all tests

400d16b

Use svg for flow graph

f4ae334

Added kappa-graph.svg

a58dffc

Use svg for flow graph

03665c1

Merge branch 'kappa5' of github.com:Frando/kappa-core into kappa5

6eb507c

Remove png graph

7203eda

Default opts, and cleanup

8600c9d

Remove subflows for now

1a281b8

Fix hyperdrive, improve tests

c996fe6

Tests & cleanup

c32347e

Lint standard

11cfe8c

Make methods public

0046f80

Cleanup open & ready

d81e5ef

Also removes a "parent flow" leftover of the removed subflows pattern.

Better multifeed backward compatibility

4f21708

docs

6605098

context for views

30753f0

bump corestore to 4.0.0

e8cc5b1

add corestore source

ed0f44a

ameba23 and others added 8 commits November 7, 2019 17:18

add test for corestore source

967a3a3

add default error handler for hypercore source

37cecec

valueencoding = json in corestore test

8516f33

function for corestore constructor

f08b157

Add transform opt

d075ab4

Add stacked views.

53e908a

clearIndex for stacked views

6a3ff63

Merge remote-tracking branch 'cobox/corestoreSource' into kappa5

a615575

hackergrrl mentioned this pull request Nov 19, 2019

Community Question: Should multifeed use decentstack under the hood? kappa-db/multifeed#34

Closed

Frando mentioned this pull request Dec 7, 2019

WIP: kappa-next #14

Open

Frando closed this Dec 7, 2019

m4gpi mentioned this pull request Jan 15, 2020

Support for Frando's kappa next kappa-db/kappa-view-query#3

Open
