Discussion: data/stream/rpc reliability #702

Draft
wants to merge 5 commits into
base: main
from

pblazej
Contributor

@pblazej pblazej commented May 16, 2025

Problem

Our data-related APIs do not validate the state of their destinationIdentity (or only do so indirectly), which seriously impacts the reliability of such channels. This shifts the responsibility for handling these state transitions onto the consumer, who has to respond to the proper Participant lifecycle events, etc.

Reasoning about the order of events is impossible without server-side knowledge, and handling all the combinations is highly impractical and counter-intuitive (e.g. a participant is already subscribed to a track, but cannot receive data from a data channel).

Consumers shouldn't be exposed to transport internals and their distributed nature. In most practical applications, we can just assume that if the Participant.Identity is returned somewhere, it's also "valid" for communication (regardless of the channel).

Swift Status Quo

Sending messages to inactive/non-existing participants will lead to:

  • 🟡 room.localParticipant.performRpc - will time out after 10s, although the API suggests more specific errors:
    public enum BuiltInError {
        case connectionTimeout
        case responseTimeout // this is triggered
        case recipientDisconnected
        case sendFailed
        case recipientNotFound
  • 🔴 room.localParticipant.publish(data:) - only checks publisher's own transport via publisherTransportConnectedCompleter - will return immediately
  • 🔴 room.localParticipant.sendText and other streams - will just return immediately

Solution

Publisher

Always check the local transport first, await it if needed, and reuse this logic across the 3+ APIs. This change is purely internal and should not break any existing behavior.
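
A minimal sketch of what such a shared publisher-side check could look like. `ConnectionGate` and `sendWhenReady` are hypothetical names used only for illustration; they are not part of the SDK, and the real implementation relies on publisherTransportConnectedCompleter rather than this actor. Timeouts and cancellation are deliberately left out.

```swift
// Hypothetical helper, not SDK code: a one-shot gate that suspends callers
// until the local (publisher) transport is reported as connected.
actor ConnectionGate {
    private var isOpen = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    // Called once the publisher transport is connected; resumes all waiters.
    func open() {
        isOpen = true
        for waiter in waiters { waiter.resume() }
        waiters.removeAll()
    }

    // Returns immediately if already open, otherwise suspends until open().
    func wait() async {
        if isOpen { return }
        await withCheckedContinuation { waiters.append($0) }
    }
}

// One entry point that every data API (packets, streams, RPC) could share:
// wait for the local transport first, then run the actual send.
func sendWhenReady<T>(
    gate: ConnectionGate,
    _ operation: () async throws -> T
) async rethrows -> T {
    await gate.wait()
    return try await operation()
}
```

Routing all send paths through one helper like this would keep the "check, then await" behavior identical across the APIs instead of re-implementing it per call site.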

Receivers

Add public option(s) to wait for the receivers with a consistent timeout, and reuse this logic across the same 3+ APIs. Technically this is not very hard, as all the APIs are already async. Callers who opt in would get new errors.

RPC example attached.

Questions

  • Which modes do we really need?

    • The most practical one is just waitUntilActive (mirroring the publisher side). Forming a Participant.Identity from a String should not be that common, so we can assume the Participant already exists somewhere among allParticipants. If it does not, just throw immediately.
  • "As soon as someone joins"

    • To handle that implicitly, we'd need an extremely long timeout. This case could instead be covered by a "joined event" → waitUntilActive sequence.
  • How to handle multiple destinations (any/all = optimistic/pessimistic)?

    • Can we even predict the use cases here?
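
To make the waitUntilActive mode above more concrete, here is a rough sketch of the fail-fast decision for a single destination. `ParticipantState`, `Readiness`, `DestinationError` and `checkDestination` are hypothetical names for illustration only; the error case names simply echo recipientNotFound and recipientDisconnected from the BuiltInError excerpt, and the dictionary stands in for a lookup among allParticipants.

```swift
// Hypothetical types for illustration; not the SDK's actual definitions.
enum ParticipantState { case joining, active, disconnected }

enum DestinationError: Error, Equatable {
    case recipientNotFound(String)
    case recipientDisconnected(String)
}

enum Readiness: Equatable {
    case ready      // safe to send right away
    case mustWait   // known participant, not active yet: wait with a timeout
}

// Decides what to do with one destination before sending:
// unknown or disconnected identities throw immediately, known-but-inactive
// ones are reported as "must wait", active ones are ready.
func checkDestination(
    _ identity: String,
    in participants: [String: ParticipantState]
) throws -> Readiness {
    guard let state = participants[identity] else {
        throw DestinationError.recipientNotFound(identity)
    }
    switch state {
    case .active: return .ready
    case .joining: return .mustWait
    case .disconnected: throw DestinationError.recipientDisconnected(identity)
    }
}
```

Under this reading, only the `.mustWait` branch needs the timeout at all, which keeps the common "already active" path free of extra latency.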

@pblazej
Contributor Author

pblazej commented May 16, 2025

Per our discussion with @lukasIO the bare minimum:

  • wait until the publisher is active (that's more of an SDK bug fix than an actual change)
  • wait until the receivers are active
    • wait concurrently for each and every receiver → send
    • fail/throw if no one activates before the timeout
    • succeed if at least one activates within the timeout
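
One possible reading of the list above, sketched with a task group. All names here (`Identity`, `NoActiveReceivers`, `awaitActiveReceivers`, the `waitUntilActive` closure) are assumptions made for illustration and are not SDK API. The sketch waits until every receiver has resolved or the timeout fires, whichever comes first, and then returns the receivers that became active; sending as soon as the first receiver activates would be an equally valid interpretation.

```swift
// Hypothetical stand-ins for illustration; these are not LiveKit SDK types.
struct Identity: Hashable, Sendable { let value: String }
struct NoActiveReceivers: Error {}

enum Outcome: Sendable { case active(Identity), inactive, timedOut }

// Waits for all receivers concurrently. Returns the ones that became active
// before the timeout; throws if none did. `waitUntilActive` must honor task
// cancellation, otherwise the group cannot finish at the timeout.
func awaitActiveReceivers(
    _ identities: [Identity],
    timeoutSeconds: Double,
    waitUntilActive: @escaping @Sendable (Identity) async throws -> Void
) async throws -> [Identity] {
    guard !identities.isEmpty else { throw NoActiveReceivers() }

    let active = await withTaskGroup(of: Outcome.self) { group -> [Identity] in
        for identity in identities {
            group.addTask {
                do {
                    try await waitUntilActive(identity)
                    return .active(identity)
                } catch {
                    return .inactive  // failed or cancelled
                }
            }
        }
        // A single timer task shared by all receivers.
        group.addTask {
            try? await Task.sleep(nanoseconds: UInt64(timeoutSeconds * 1_000_000_000))
            return .timedOut
        }

        var active: [Identity] = []
        var pending = identities.count
        for await outcome in group {
            switch outcome {
            case .active(let identity):
                active.append(identity)
                pending -= 1
            case .inactive:
                pending -= 1
            case .timedOut:
                pending = 0
            }
            if pending == 0 {
                group.cancelAll()  // stops the timer or the remaining waiters
                break
            }
        }
        return active
    }

    guard !active.isEmpty else { throw NoActiveReceivers() }
    return active
}
```

The caller would then send only to the returned identities, so a single slow or absent receiver delays the send by at most the timeout and does not fail it.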

@pblazej
Contributor Author

pblazej commented May 20, 2025

@bcherry @lukasIO we can introduce waitUntilActive() for the local participant, but:

  • it must be called explicitly (no implicit benefits)
  • it does not solve the problem of [Participant.Identity], i.e. multiple receivers, unless we choose a convention here (e.g. don't wait for all of them)

@lukasIO
Contributor

lukasIO commented May 20, 2025

what's the benefit on having waitUntilActive on the local participant?

@pblazej
Contributor Author

pblazej commented May 21, 2025

what's the benefit on having waitUntilActive on the local participant?

No benefit, apart from what's actually implemented for data packets:

    func send(dataPacket packet: Livekit_DataPacket) async throws {
        func ensurePublisherConnected() async throws {

For remotes, as I said, doing waitUntilActive() on a single Participant.Identity is also cumbersome, as all the APIs are based on [Participant.Identity], so the consumer needs to do the heavy lifting (fork, async stuff, etc.).

2 participants