Theia 1.23: socket.io upgrade requires stickiness on load balancer

We have a Theia-based application which is multi-user (I know it’s not recommended!) and we deploy multiple instances of Theia behind a load balancer.

When we upgraded to Theia 1.23, our deployed environment started to break and we had to roll back.

I was seeing the following periodic error in the console (polling):

{ code: 1, message: "Session ID unknown" }

I have looked at the Changelog, and didn’t see anything there that could impact us. But investigating further and looking at the error, I found out that the websocket layer has been refactored to use Socket.io in version 1.23.

Then, looking at the Socket.io documentation, I found that Socket.io requires you to set up sticky sessions on the load balancer. Here is the doc.

So to fix the issue, we just had to enable sticky sessions on our load balancer, and that did the trick.
We could then upgrade to Theia 1.23.

In case you’re using an Elastic Load Balancer on AWS like we do, here is the doc.
You do it on the EC2 Target Group; I’m attaching our settings in case this helps others.
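
For anyone who scripts this instead of using the console, here is a rough sketch of the target group attributes involved. It assumes an Application Load Balancer with load-balancer-generated cookies (`lb_cookie`). The helper name `stickinessAttributes` and the one-day duration are illustrative only and are not necessarily the settings in the attachment.

```typescript
// Shape of a target group attribute as a key/value pair of strings.
interface TargetGroupAttribute {
  Key: string;
  Value: string;
}

// Hypothetical helper: builds the attributes that turn on cookie-based
// stickiness for a target group. The keys are the documented ELBv2
// target group attribute names; the function itself is only a sketch.
function stickinessAttributes(durationSeconds: number): TargetGroupAttribute[] {
  // The lb_cookie duration is documented as 1 second to 1 week.
  if (!Number.isInteger(durationSeconds) || durationSeconds < 1 || durationSeconds > 604800) {
    throw new RangeError("durationSeconds must be an integer between 1 and 604800");
  }
  return [
    { Key: "stickiness.enabled", Value: "true" },
    { Key: "stickiness.type", Value: "lb_cookie" },
    { Key: "stickiness.lb_cookie.duration_seconds", Value: String(durationSeconds) },
  ];
}

// Illustrative value: pin each client to one target for one day.
const attributes = stickinessAttributes(86400);
console.log(attributes);
```

A list like this is what the ModifyTargetGroupAttributes API call expects, so it could be handed to the AWS SDK or translated into the equivalent CLI or infrastructure-as-code settings.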

I think it would be a good idea to mention the upgrade of the websocket layer to use Socket.io as a potential breaking change in the Changelog. But I’ll leave this to you.

We did include it as part of the changelog, under breaking changes, but it can be sneaky.

We should add what you just found out with regards to load balancing in the migration guide.

@drochgenius can you open a PR doing this?

Sorry, not sure why I missed the mention in the changelog.
Sure, I’ll open a PR for the migration guide.

Here is the PR: https://github.com/eclipse-theia/theia/pull/10838

But don’t websockets need to be sticky in any case? I mean: you can’t just switch to a different back end in the middle of a socket connection, right? Do we understand why we need the flag now?

I can confirm that prior to the Socket.io migration, the previous implementation of Theia websockets was resilient to load balancing. We’ve been scaling Theia that way for 3 years without any issue. That’s why we never thought of turning on sticky sessions on the load balancer. As of version 1.23, it just doesn’t work anymore without sticky sessions; the Theia app doesn’t even load. That’s why I think it’s a good idea to flag it now.

I don’t have much experience with implementing web socket connections myself, but normally, you are right, I would think web sockets would require stickiness.

Maybe socket.io is hitting the fallback case where the old implementation wasn’t? Is that something we should report to socket.io? I would expect stickiness for websockets to be handled automatically in the load balancer, since it’s a standard protocol. @msujew do you know more?

@tsmaeder there is no real “fallback case”. Socket.io initiates the connection with an HTTP handshake and only after that switches to a websocket connection. If that gets rejected, it simply continues with the existing HTTP connection. It’s kind of built into the protocol.
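
To make the consequence concrete, here is a toy simulation of why an HTTP-based handshake followed by further HTTP requests breaks without stickiness. It is only a sketch under stated assumptions: `Backend`, `roundRobin`, `sticky` and `connectAndPoll` are made-up names, and none of this is Socket.io or Theia code. The only thing taken from the thread is the shape of the error.

```typescript
// Toy model: each backend keeps its own in-memory session table,
// so a session created on one backend is unknown to the others.
type Reply = { ok: true } | { code: 1; message: "Session ID unknown" };

class Backend {
  private sessions = new Set<string>();
  private name: string;

  constructor(name: string) {
    this.name = name;
  }

  // Handshake: create a session ID that only this backend knows about.
  handshake(): string {
    const sid = `${this.name}-${this.sessions.size + 1}`;
    this.sessions.add(sid);
    return sid;
  }

  // Follow-up request carrying the session ID from the handshake.
  poll(sid: string): Reply {
    return this.sessions.has(sid)
      ? { ok: true }
      : { code: 1, message: "Session ID unknown" };
  }
}

type Router = (clientId: string) => Backend;

// Load balancer without stickiness: every request goes to the next backend.
function roundRobin(backends: Backend[]): Router {
  let next = 0;
  return () => backends[next++ % backends.length];
}

// Load balancer with stickiness: a client stays on its first backend.
function sticky(backends: Backend[]): Router {
  const pinned = new Map<string, Backend>();
  let next = 0;
  return (clientId) => {
    let backend = pinned.get(clientId);
    if (!backend) {
      backend = backends[next++ % backends.length];
      pinned.set(clientId, backend);
    }
    return backend;
  };
}

// One handshake followed by a number of follow-up requests.
function connectAndPoll(route: Router, clientId: string, polls: number): Reply[] {
  const sid = route(clientId).handshake();
  const replies: Reply[] = [];
  for (let i = 0; i < polls; i++) {
    replies.push(route(clientId).poll(sid));
  }
  return replies;
}

const withoutStickiness = connectAndPoll(
  roundRobin([new Backend("a"), new Backend("b")]), "client-1", 3);
const withStickiness = connectAndPoll(
  sticky([new Backend("a"), new Backend("b")]), "client-1", 3);

console.log(withoutStickiness); // some replies are "Session ID unknown"
console.log(withStickiness);    // every reply is { ok: true }
```

In this model, a connection that is a single long-lived websocket from the start would never hit the second backend, which is consistent with the old implementation having worked behind a plain load balancer.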
