Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix configuration issue check for decoder protocol #3764

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Outfluencer
Copy link
Collaborator

@Outfluencer Outfluencer commented Nov 30, 2024

With this pr we keep track of the configuration state as we get wrong results in the server connector if we dont.
i tried to run the code that is causing the issue in the connecting players eventloop but this did not fixed the issue so i used AtomicOperations and it worked

This is the error that occours without the fix

16:59:48 [SEVERE] [Outfluencer] -> UpstreamBridge - encountered exception
io.netty.handler.codec.EncoderException: java.lang.IllegalArgumentException: Cannot get ID for packet class net.md_5.bungee.protocol.packet.PlayerListHeaderFooter in phase CONFIGURATION with direction TO_CLIENT
	at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:125)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:893)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:956)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:982)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:950)
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1019)
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:310)
	at net.md_5.bungee.netty.ChannelWrapper.write(ChannelWrapper.java:103)
	at net.md_5.bungee.UserConnection$1.sendPacket(UserConnection.java:155)
	at net.md_5.bungee.UserConnection.lambda$sendQueuedPackets$1(UserConnection.java:217)
	at net.md_5.bungee.netty.ChannelWrapper.scheduleIfNecessary(ChannelWrapper.java:240)
	at net.md_5.bungee.UserConnection.sendQueuedPackets(UserConnection.java:208)
	at net.md_5.bungee.connection.UpstreamBridge.handle(UpstreamBridge.java:374)
	at net.md_5.bungee.protocol.packet.FinishConfiguration.handle(FinishConfiguration.java:35)
	at net.md_5.bungee.netty.HandlerBoss.channelRead(HandlerBoss.java:128)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:289)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalArgumentException: Cannot get ID for packet class net.md_5.bungee.protocol.packet.PlayerListHeaderFooter in phase CONFIGURATION with direction TO_CLIENT
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:463)
	at net.md_5.bungee.protocol.Protocol$DirectionData.getId(Protocol.java:992)
	at net.md_5.bungee.protocol.MinecraftEncoder.encode(MinecraftEncoder.java:27)
	at net.md_5.bungee.protocol.MinecraftEncoder.encode(MinecraftEncoder.java:10)
	at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:107)
	... 48 more

My testplugin to replicate the race condition
You need 2 backends names lobby and lobby2


import net.md_5.bungee.api.ProxyServer;
import net.md_5.bungee.api.chat.TextComponent;
import net.md_5.bungee.api.config.ServerInfo;
import net.md_5.bungee.api.connection.ProxiedPlayer;
import net.md_5.bungee.api.event.ChatEvent;
import net.md_5.bungee.api.event.ServerKickEvent;
import net.md_5.bungee.api.event.ServerSwitchEvent;
import net.md_5.bungee.api.plugin.Listener;
import net.md_5.bungee.api.plugin.Plugin;
import net.md_5.bungee.event.EventHandler;

import java.util.concurrent.TimeUnit;

public final class MoveToFallback extends Plugin implements Listener {

    ServerInfo info;
    ServerInfo info2;

    @Override
    public void onEnable() {
        getProxy().getPluginManager().registerListener(this, this);
    }

    @EventHandler
    public void onServerKick(ServerKickEvent event) {
        event.setCancelled(true);
    }

    @EventHandler
    public void onChar(ChatEvent event) {
        if (event.getMessage().startsWith("bert")) {
            info = ProxyServer.getInstance().getServers().get("lobby");
            info2 = ProxyServer.getInstance().getServers().get("lobby2");
            bert((ProxiedPlayer) event.getSender());
        }
    }

    final int delay = 300;

    public void bert(ProxiedPlayer player) {
        if (player.isConnected())
            ProxyServer.getInstance().getScheduler().schedule(this, () -> {
                player.connect(info, (aBoolean, throwable) -> bert2(player));
            }, delay, TimeUnit.MILLISECONDS);
    }

    public void bert2(ProxiedPlayer player) {
        if (player.isConnected())
            ProxyServer.getInstance().getScheduler().schedule(this, () -> {
                player.connect(info2, (aBoolean, throwable) -> bert(player));
            }, delay, TimeUnit.MILLISECONDS);
    }

    @EventHandler
    public void onServerSwitch(ServerSwitchEvent event) {
        ProxiedPlayer player = event.getPlayer();
        player.setTabHeader(new TextComponent("" + System.currentTimeMillis()), new TextComponent("b " + System.currentTimeMillis()));
    }
}

@Outfluencer Outfluencer requested a review from md-5 November 30, 2024 18:06
@md-5
Copy link
Member

md-5 commented Dec 1, 2024

This does not seem right - can you explain why the issue arises and this fixes it?

@Outfluencer
Copy link
Collaborator Author

Outfluencer commented Dec 1, 2024

i am not really sure

it arises if you switch servers quickly

i looked at it again and i debugged the encoder and decoder protocols
The problem is that user.unsafe().sendPacket( new StartConfiguration() ); is called wrongly i guess and changes the prtocol
the sync stuff is jst fixing it because it doesnt allow the 2. startconfig to be sent. but thats not optimal

we are currently always have the encoder protocol set to GAME if the part of the code is called and the decoder protocol is GAME if it works and config if it does not so i think we dont need to check if the encoder protocol is not config, we need to check if the decoder protocol is.

@Outfluencer
Copy link
Collaborator Author

I tested this and it works for me like this without any issues

@Outfluencer
Copy link
Collaborator Author

Outfluencer commented Dec 1, 2024

@BoomEaro is there a reason you added the check for the encoder protocol?
should we maybe check for both encoder and decoder protocol?

@Outfluencer Outfluencer changed the title Fix configuration race condition that caused disconnects Fix configuration issue check for decoder protocol Dec 1, 2024
@Janmm14
Copy link
Contributor

Janmm14 commented Dec 1, 2024

can you explain why the issue arises and this fixes it?

explain why the issue arises pls, outfluencer.

Is there a particular reason why you cancel the ServerKickEvent in your example plugin?
How did you set up priorities in the bungee config?

.

What I have to say for this from a purely I-watched-at-the-code standpoint:
I do not see how there is a need for an atomic value as we should be always on the user's event loop.

What is happening which causes, the configuration restarting before FinishConfiguration packet of the server arrives?

As there seem to be seemingly interweaving connection changes somehow, I do not get how skipping configuration starts could keep connections working correctly.

If we are inside configuration mode already, we cannot simply decide now to connect to another server - we need to restart the configuration phase for the new server (according to wiki.vg it seems we have to at least watch whether "clientbound known packs" has been sent already and send it if its not sent already so we can finish config phase and start configuration phase again).

As far as I see, current bungee also does not attempt to reset configuration state - it expects configuration state to either not really start at all (i.e. no connection to target server), or finish through or hope that it all goes well when target server changes, not just because multiple connection attempts are made at the same time, but also when a kick happens.
or does the client somehow reset its configuration state or allows overriding its config (testing this might be hard i guess)?

And the code for handling connection loss in ServerConnector is also not really player-friendly - in this case it seems chanches are great in kicking the player instead of trying to send him back to the previous server.

My final list for 1.20.2+ clients:

  • Don't start connect0 attempt to different server while previous is still in configuration phase (keep only newest connection request?)
  • Add graceful configuration phase re-starting, if needed by client
  • Should we abort server switching in-progress when newer connection request exists, or only if its failing?
  • Do we have to prevent alternating connection attempts if we are aborting server switches while they're running (and is there sufficient log output to detect such a thing)?
  • What ServerConnectRequest.Result do we present if we abort a connection request due to a newer one?

.
.
Why does existing code scheduleIfNeccessary inside UserConnection#sendQueuedPackets, right now its only called by UpstreamBridge's handle(FinishConfiguration)?

@Leymooo
Copy link

Leymooo commented Dec 1, 2024

I think the main issues is how UpstreamBridge#FinishConfiguration is handled. I think there can be multiple race conditions in how bungee handles server switching and configure states.

In older version server switching was very simple, wait for Login packet from new server -> send cleanup packets, send change dimension screen packets, close connection to the old server, start passing all new packets to client from a new server. You do all this stuff in single batch (in a single packet handle call). Also event order is ServerConnectedEvent then ServerSwitchEvent.

In new versions it very different, and too many edge cases when something can go wrong, because you need to wait for client responses, to continue server switching procedure.

CEP = Bungee->Client encode protocol
CDP = Client -> Bungee decode protocol
SEP = Bungee -> Server encode protocol
SDP = Server -> Bungee decode protocol

  1. Wait for LoginSuccess packet from server -> close connection to old server, send StartConfiguration to client(changes CEP= Configuration), start passing packets from new server to client (if any) (Calls ServerSwitchEvent)

  2. Wait for StartConfiguration from client (changes CDP=Configuration) -> send LoginAcknowledged to new server (changes SEP,SDP=Configuration) to continue configuring

  3. Wait for FinishConfiguration from server(Changes SDP=Game) -> propage packet to client (Changes CEP=Game)

  4. Wait for FinishConfiguration from client(Changes CDP=Game) -> flushes delay queue, propagate packet to server (Changes SEP = game)

  5. Wait for Login packet from server -> send cleanup packets and dimension change screen, continue to forward all packets from server to client. Calls ServerConnectedEvent.

Final event order is ServerSwitchEvent then ServerConnectedEvent (vs ServerConnectedEvent -> ServerSwitchEvent on legacy versions)

As you can see there is too many places when edge case race condition can happen.

For example, what it we start to connect to a new server when we are waiting for StartConfiguration from client (2), so when we will start to handle LoginSuccess from server, CEP will already in Confuguration state and will get Cannot get ID for packet (this is answer for @BoomEaro is there a reason you added the check for the encoder protocol?)

What if we recieve StartConfiguration from client and did remote server configure(UpstreamBridge#configureServer) and then started to connect player to new server, so we never got FinishConfiguration from old server, and also did not send StartConfiguration to client because were already in CONFIGURATION, so transition to new server will never happen, because UpstreamBridge#configureServer will never called.

What if we recieve FinishConfiguration from server and forward it to client, but while waiting FinishConfiguration packet from client , we connect player to another server? Then in time when we recieve FinishConfiguration from client, CEP will be again in CONFIGURATION state(1), and we will flush delayed packets while being in transit to new server.
In this case i think FinishConfiguration will also leak to backend server, while backend being in LOGIN protocol, so we will get Cannot get ID for packet when forwaring this packet to backend server

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants