Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

make Channel lifecycle statemachine explicit #220

Merged
merged 1 commit into from
Mar 29, 2018

Conversation

weissi
Copy link
Member

@weissi weissi commented Mar 22, 2018

@normanmaurer / @Lukasa this is a preview of what I'm working on. It passes tests, also happy with some monkey channel test which previously used to crash stuff a lot. But we should have a chat what we want to do in which situations. Also CC @vlm .

Motivation:

We had a lot of problems with the Channel lifecycle statemachine as it
wasn't explicit, this fixes this. Additionally, it asserts a lot more.

Modifications:

  • made Channel lifecycle statemachine explicit
  • lots of asserts

Result:

  • hopefully the state machine works better
  • the asserts should guide our way to work on in corner cases as well

@weissi weissi requested review from normanmaurer and Lukasa March 22, 2018 18:05
@weissi weissi added the 🔨 semver/patch No public API change. label Mar 22, 2018
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, the general shape of this is really good. Some notes inline.

self.isActiveAtomic.store(false)
default:
()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the reason this conditional is a bit complex is to avoid doing too many atomic ops?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa indeed, also don't think it's too complex tbh

}
}

@inline(__always) // so we're not actually returning a closure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to confirm that the compiler actually gets this right.

Presumably the other option is to pass a collection of non-escaping closures into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanmaurer / @Lukasa the problem here is that the compiler might change at any point in time and there's nothing that we can do to make sure that it's inlined unfortunately.

And passing the closures in is just so ugly, no?


switch (self.currentState, event) {
// origin: .neverRegistered
case (.neverRegistered, .activate):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we intend to allow activating an unregistered socket? I think I'd kinda prefer we didn't.

self.badTransition(event: event)

case (.registeredNeverActivated, .register):
self.badTransition(event: event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mild nit, but it may be a bit clearer to read this state machine if you fold all the badTransition cases for a given state together, e.g

case (.registeredNeverActivated, .deactivate),
     (.registeredNeverActivated, .register):
    self.badTransition(event: event)

enum State {
case neverRegistered
case registeredNeverActivated
case registered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, do we want allow a channel to go registered -> active -> registered -> active? I'd be inclined to want Channels to be single-use constructs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa I think you are right... it should be single use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

}
}

internal var isOpen: Bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such a weird property. I wonder if we really need it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa we use it all over the place to check if a channel has not been closed. Sure, I could introduce a isClosed property but then we'd need to guard !...isClosed which gives us the inversion always...

self._pipeline = ChannelPipeline(channel: self)
self.lifecycleManager = SocketChannelLifecycleManager(eventLoop: self.eventLoop,
channelPipeline: ChannelPipeline(channel: self),
isActiveAtomic: Atomic(value: false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a reference cycle here, right? lifecycleManager holds a reference to self, meaning self holds a cyclic reference to self.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa no because ChannelPipeline holds Channel as an unowned reference. It's totally non-obvious but has always been like this.

@@ -373,6 +561,11 @@ class BaseSocketChannel<T: BaseSocket>: SelectableChannel, ChannelCore {
return
}

guard self.isActive else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably check the lifecycleManager directly, as it's definitionally called from on the event loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, yes, forgot to record that change

@@ -410,7 +603,12 @@ class BaseSocketChannel<T: BaseSocket>: SelectableChannel, ChannelCore {

self.markFlushPoint(promise: nil)

guard self.isActive else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note, we're on the event loop and can avoid the atomic.

if !self.neverRegistered {
pipeline.fireChannelUnregistered0()
}
self.lifecycleManager.moveState(event: .close, promise: p)()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to wrap these in convenience methods that let the states and inputs become private to lifecycleManager? e.g. self.lifecycleManager.close().

@@ -278,6 +278,9 @@ public enum ChannelError: Error {

/// A `DatagramChannel` `write` was made with an address that was not reachable and so could not be delivered.
case writeHostUnreachable

/// An operation that was inappropriate given the current `Channel` state was attempted.
case inappropriateOperationForState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need adding to the Equatable conformance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ianpartridge indeed

@weissi weissi force-pushed the jw-channel-lifecycle-statemachine branch 2 times, most recently from 3f920cb to f1c810e Compare March 26, 2018 15:47
Copy link
Member

@normanmaurer normanmaurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few style changes... Also did you run a benchmark to verify how much more costly this is with all the closures ?

}

@inline(__always)
internal mutating func register(promise: EventLoopPromise<()>?) -> (() -> Void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi Please change to use EventLoopPromise<Void> to be consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, will do

}

@inline(__always)
internal mutating func close(promise: EventLoopPromise<()>?) -> (() -> Void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi Please change to use EventLoopPromise<Void> to be consistent

}

@inline(__always)
internal mutating func activate(promise: EventLoopPromise<()>?) -> (() -> Void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi Please change to use EventLoopPromise<Void> to be consistent


// MARK: private API
@inline(__always) // so we're not actually returning a closure
private mutating func moveState(event: Event, promise: EventLoopPromise<()>?) -> (() -> Void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi Please change to use EventLoopPromise<Void> to be consistent

}

// MARK: private API
@inline(__always) // so we're not actually returning a closure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify this is to safe the heap allocation ? If so can you just add this to the comment ?

// MARK: API
internal init(eventLoop: EventLoop,
channelPipeline: ChannelPipeline,
isActiveAtomic: Atomic<Bool>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're injecting this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh thanks, no there isn't anymore

Copy link
Member Author

@weissi weissi Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa now found the reason why this was injected: lifecycle manager is of course held mutable by the BaseSocketChannel. But we need to get to the atomic from any thread. Therefore the atomic needs to be stored on the BaseSocketChannel in a let and passed to the lifecycle manager. Sorry I did forget about that. Patch coming.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--> #294


// MARK: private API
@inline(__always) // so we're not actually returning a closure
private mutating func moveState(event: Event, promise: EventLoopPromise<()>?) -> (() -> Void) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still sure we don't heap-allocate a closure here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me read some SIL

Copy link
Member Author

@weissi weissi Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa ok I had a look at this and it seems totally fine. SIL didn't work as the compiler kept crashing, looked at the assembly instead. To make it easier I planted to bogus fcntl calls in there just so I know where we are:

screen shot 2018-03-27 at 11 56 10 am

and then I traced them back in the assembly which you can find below. I marked the beginning and the end of the interesting section as well as all the CALL instructions (which could lead to allocations):

screen shot 2018-03-27 at 11 52 56 am

as you can see all the calls are either retain or release, some optional projections or the call to badTransition which is expected. Do you agree that this is fine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hang on, something's not quite right! I can't see the calls to pipeline.becomeActive and so either. LOoking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's slightly more complicated but the argument holds. Let me try to paste the right way through the code here:

part 1

part 2

part 3

part 4

part 5

part 6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about that @Lukasa but this is the best I could do but there you can see quite well a run through of what happens:

  • the beginning (fcntl marker call)
  • fulfill the promise
  • fire channelActive
  • the end (fcntl marker call)

all the CALL instructions are highlighted and none of them should allocate. So it's all good I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, in step 4 we can also see the atomic store (to mark the channel as active) which is also important but I forgot to annotate.

}

// Call before triggering the close of the Channel.
pipeline.fireChannelReadComplete0()
if self.lifecycleManager.isActive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? It's not clear to me that the contract of readComplete requires that it only fire on active channels. If it does, is that contract in tension with the contract that readComplete always fires after channelRead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general it should only fire when the channel is active.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa, chatted to @normanmaurer and added a test which always gets into this situation by doing

  1. one successful read
  2. one read that fails with ECONNRESET
  3. in the errorCaught calls ctx.close to close the channel
  4. sees channelInactive (from the close)
  5. does not ever see channelReadComplete.

@weissi weissi force-pushed the jw-channel-lifecycle-statemachine branch 2 times, most recently from 9f109d8 to 87ba71c Compare March 27, 2018 18:04

// this is called from Channel's deinit, so don't assert we're on the EventLoop!
internal var canBeDestroyed: Bool {
if case .closed = self.currentState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified as:

return self.currentState == .closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @normanmaurer it can indeed now. Used to have more states :)

/// until the Channel is closed.
internal var isOpen: Bool {
assert(self.eventLoop.inEventLoop)
if case .closed = self.currentState {
Copy link
Member

@normanmaurer normanmaurer Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do:

return  self.currentState != .closed

internal let isActiveAtomic = Atomic(value: false)
// these are only to be accessed on the EventLoop
internal let channelPipeline: ChannelPipeline
private let eventLoop: EventLoop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we remove this and just use channelPipeline.eventLoop when needed ?

private mutating func doStateTransfer(newState: State, promise: EventLoopPromise<Void>?, _ callouts: @escaping (ChannelPipeline) -> Void) -> (() -> Void) {
self.currentState = newState

let pipeline = self.channelPipeline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we store the pipeline here first ? So we not escape self ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanmaurer yes so that self isn't captured. I can remove now as the @inline(_always) should make the whole closure go away anyway. Want me to?

return false
case .activated:
return true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use?:

return self.currentState == .activated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, same here, used to be more states

@weissi weissi added 🆕 semver/minor Adds new public API. and removed 🔨 semver/patch No public API change. labels Mar 28, 2018
@weissi weissi force-pushed the jw-channel-lifecycle-statemachine branch 3 times, most recently from 985466d to 20ea88f Compare March 28, 2018 16:08
@normanmaurer
Copy link
Member

@weissi please fix the test compile error.

@weissi weissi force-pushed the jw-channel-lifecycle-statemachine branch from 20ea88f to 7abbb03 Compare March 29, 2018 09:58
@weissi
Copy link
Member Author

weissi commented Mar 29, 2018

@normanmaurer sorry, should be fixed

@normanmaurer
Copy link
Member

@weissi can you also run some benchmarks with this ?

@weissi
Copy link
Member Author

weissi commented Mar 29, 2018

@normanmaurer the same:

before:

GO
Destination: [127.0.0.1]:9999
Interface lo0 address [127.0.0.1]:0
Using interface lo0 to connect to [127.0.0.1]:9999
Ramped up to 10 connections.
Total data sent:     32351.8 MiB (33923309052 bytes)
Total data received: 32347.9 MiB (33919252327 bytes)
Bandwidth per channel: 2713.545⇅ Mbps (339193.1 kBps)
Aggregate bandwidth: 13566.911↓, 13568.534↑ Mbps
Packet rate estimate: 1208692.7↓, 1194076.8↑ (11↓, 31↑ TCP MSS/op)
Test duration: 20.0012 s.

after:

GO
Destination: [127.0.0.1]:9999
Interface lo0 address [127.0.0.1]:0
Using interface lo0 to connect to [127.0.0.1]:9999
Ramped up to 10 connections.
Total data sent:     32794.0 MiB (34387041676 bytes)
Total data received: 32788.0 MiB (34380754909 bytes)
Bandwidth per channel: 2750.181⇅ Mbps (343772.6 kBps)
Aggregate bandwidth: 13749.648↓, 13752.162↑ Mbps
Packet rate estimate: 1217976.4↓, 1214771.9↑ (11↓, 32↑ TCP MSS/op)
Test duration: 20.0039 s.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, ship it.

@Lukasa Lukasa added this to the 1.4.0 milestone Mar 29, 2018
Motivation:

We had a lot of problems with the Channel lifecycle statemachine as it
wasn't explicit, this fixes this. Additionally, it asserts a lot more.

Modifications:

- made Channel lifecycle statemachine explicit
- lots of asserts

Result:

- hopefully the state machine works better
- the asserts should guide our way to work on in corner cases as well
@weissi weissi force-pushed the jw-channel-lifecycle-statemachine branch from 5cc3082 to b1375ac Compare March 29, 2018 17:46
@normanmaurer normanmaurer merged commit 2374421 into apple:master Mar 29, 2018
@normanmaurer
Copy link
Member

@weissi thanks ... merged!

@weissi weissi deleted the jw-channel-lifecycle-statemachine branch April 2, 2018 11:23
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants