This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Add a WithTime node #707

Open
johnynek opened this issue Jan 12, 2017 · 6 comments

Comments

@johnynek
Collaborator

Summingbird internally keeps a timestamp for all values, but we don't expose that to the user. It is a pain to always thread it through. We could add it back by adding a new node:

case class WithTime[P, T](p: Producer[P, T]) extends Producer[P, (T, Timestamp)]
case class ValueWithTime[P, K, V](p: Producer[P, (K, V)]) extends Producer[P, (K, (V, Timestamp))]

Then, at plan time, we can just treat this like a map that adds the timestamp, which is already known at that point.

This would clean up some internal APIs we have if Summingbird supported it, and it would also close #688, since we can always recover the timestamp at any point.
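The plan-time treatment described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `Producer` here drops the platform type parameter `P`, `Source`, `Mapped`, and the `planned` method are hypothetical names, and "planning" is reduced to evaluating an in-memory list. It is not Summingbird's actual planner.

```scala
// Toy model, NOT Summingbird's real types: each value already travels
// with a timestamp internally, as described in the issue.
final case class Timestamp(milliSinceEpoch: Long)

sealed trait Producer[T] {
  // Hypothetical stand-in for planning: evaluate to timestamped values.
  def planned: List[(Timestamp, T)]
  def map[U](fn: T => U): Producer[U] = Mapped(this, fn)
  def withTime: Producer[(T, Timestamp)] = WithTime(this)
}

// Hypothetical source node holding events with their internal timestamps.
final case class Source[T](events: List[(Timestamp, T)]) extends Producer[T] {
  def planned: List[(Timestamp, T)] = events
}

// An ordinary map keeps the internal timestamp and transforms the value.
final case class Mapped[T, U](p: Producer[T], fn: T => U) extends Producer[U] {
  def planned: List[(Timestamp, U)] =
    p.planned.map { case (ts, t) => (ts, fn(t)) }
}

// WithTime behaves like a map that copies the internal timestamp
// into the user-visible value.
final case class WithTime[T](p: Producer[T]) extends Producer[(T, Timestamp)] {
  def planned: List[(Timestamp, (T, Timestamp))] =
    p.planned.map { case (ts, t) => (ts, (t, ts)) }
}
```

In this sketch, `Source(events).map(f).withTime` yields values paired with the same timestamp the source attached, without the user having threaded it through `f`.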

@johnynek
Collaborator Author

@pankajroark what do you think of this? I can work on adding it if we can ever get our tests to not OOM.

@oscar-stripe
Contributor

ping on this @ttim ?

We need the time in user land a lot at Stripe. I can possibly find time to work on this unless you see any blockers.

@ttim
Collaborator

ttim commented May 31, 2017

@johnynek it introduces (conceptually) a notion of time into the core platform.

Pros: it's already the case, and this makes everything more consistent.
Cons:

  • We need to change the memory platform (not a big issue, let's assume a 0 timestamp for the memory platform in the beginning)
  • I had some thoughts on how to integrate tsar functionality into SB. For example, you can treat sumByKey as something which does aggregation over different ranges of time

In general I like the idea of putting time into the core and building everything else around it.

@pankajroark
Contributor

pankajroark commented Jun 1, 2017

Will this mean that users will be able to specify a Summingbird job without time? That may not be a bad idea, because it would support online-only use cases more efficiently. Right now users fix the timestamp for that.

Or are these nodes solely aimed at extracting the time, which is otherwise hidden? The API seems a bit magical in that case. It would be great if you could give an example.

@oscar-stripe
Contributor

@pankajroark I don't think it helps you run without time as I am conceiving this. It is almost the opposite: you have to be able to give a time for each event.

So, what I want is this: we have a system similar to tsar which aggregates keys into many buckets. To do this, we need to know the time. We currently carry a copy of the time around in the value. That is a waste, since internally Summingbird knows the time. The .withTime method would copy the internal time of the event out so we could bucket without carrying that copy everywhere (which is especially painful across store/sumByKey boundaries).

@pankajroark
Contributor

Even though the extraction of time out of nothing seems a bit magical to me, I realize the practical utility. I'm on board.
