Now that it's time to think about implementation, one obvious question is how we can implement realtime communication over the web.
The HTTP protocol is used by web applications as their primary communication method.
It works on a request-response model, which is simply: the client sends a request and the server sends back a response.
This works great for many things on the web like submitting forms, fetching specific data, creating posts on social media platforms etc.
However, this request-response model means HTTP is a one-way communication protocol. Only the client can initiate communication with the server. As you can probably imagine, this creates a problem when trying to build realtime features.
Realtime features require the server to be able to send data to the client without the client asking.
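For example, take a chat application like WhatsApp: User A sends a message to User B, and the server needs to deliver that message to User B straight away.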
But here's the issue: with HTTP, only clients can initiate communication. That means User B won't know there's a new message unless they actively send a request to check. Even then, how do they know when to send the request?
There are a few solutions to this: Polling, Long Polling and WebSockets.
Polling is where the client simply sends a request at regular intervals to check for updates.
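To make this concrete, here is a minimal sketch of a polling loop. It is only an illustration: `FakeServer` and `check_messages` are hypothetical stand-ins for a real HTTP endpoint, and the interval is shortened so the example finishes quickly.

```python
import time
from typing import Callable, List, Optional


def poll(fetch_updates: Callable[[], Optional[str]],
         interval_seconds: float,
         max_requests: int) -> List[str]:
    """Call fetch_updates() at a fixed interval and collect any updates."""
    received: List[str] = []
    for _ in range(max_requests):
        update = fetch_updates()      # one full request-response cycle
        if update is not None:
            received.append(update)
        time.sleep(interval_seconds)  # wait before asking again
    return received


class FakeServer:
    """Hypothetical stand-in for an HTTP endpoint: only the 4th request has news."""

    def __init__(self) -> None:
        self.requests_seen = 0

    def check_messages(self) -> Optional[str]:
        self.requests_seen += 1
        return "Hello from User A" if self.requests_seen == 4 else None


server = FakeServer()
messages = poll(server.check_messages, interval_seconds=0.01, max_requests=5)
print(messages, server.requests_seen)
```

In this run the client makes five requests to receive a single message, which gives a feel for the trade-off involved.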
This is simple to implement, but it is very wasteful: most requests come back with nothing new, yet each one still costs a full HTTP round trip.
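Long Polling is an improvement on Polling: the client sends a request, and the server holds it open until it has new data to send. Once the response arrives, the client immediately sends the next request.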
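The idea can be simulated in a single process. The sketch below is not a real HTTP server: `pending`, `handle_long_poll` and `long_poll_client` are hypothetical names, and a blocking queue read stands in for the server holding a request open.

```python
import queue
import threading
import time
from typing import List, Optional

# Hypothetical in-process stand-in for the server's store of undelivered messages.
pending: "queue.Queue[str]" = queue.Queue()


def handle_long_poll(timeout_seconds: float) -> Optional[str]:
    """Hold the 'request' open until a message arrives or the timeout expires."""
    try:
        return pending.get(timeout=timeout_seconds)
    except queue.Empty:
        return None  # nothing arrived; the client would simply ask again


def long_poll_client(rounds: int, timeout_seconds: float) -> List[str]:
    received: List[str] = []
    for _ in range(rounds):
        message = handle_long_poll(timeout_seconds)  # blocks while waiting
        if message is not None:
            received.append(message)
        # The response ended this "request"; the next loop iteration starts a new one.
    return received


# Simulate User A sending a message shortly after User B starts waiting.
threading.Timer(0.05, pending.put, args=["Hello from User A"]).start()
start = time.monotonic()
messages = long_poll_client(rounds=1, timeout_seconds=2.0)
elapsed = time.monotonic() - start
print(messages)
```

The client gets the message as soon as it is available rather than at the next polling tick, but it still needs a fresh request for every update.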
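This reduces the number of connections made compared with Polling but still has issues: every update ends the current request, so the client has to open a new one each time, which adds overhead.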
Despite this, long polling can be useful:
Why exactly this is the case will make more sense when we talk about WebSockets...
WebSockets are the ideal solution for applications that need true, two-way communication between the client and server, which makes them perfect for applications with crucial realtime features like Mini Kahoot.
WebSockets offer a different paradigm to what we have seen with the polling techniques. WebSockets establish a single, long-lived connection between the client and server. Once the connection is established, data can flow freely in both directions at any time, without the need for repeated requests.
WebSockets improve on the polling techniques from before, effectively solving:
This long-lived connection is different from what happens in Long Polling. Something that confused me when I first learned about WebSockets was how they could be more efficient when both approaches hold long connections.
In Long Polling, the connection closes as soon as the server responds with data (which follows the standard behavior of HTTP). A new connection must then be created for the next update, introducing overhead.
In contrast, WebSockets use a separate, lightweight protocol that keeps the connection open even after data is sent, only closing when the client or server explicitly decides to end it.
Let's dive into exactly how WebSockets provide true, bidirectional communication over the web.
The aim of this handshake is to upgrade the existing HTTP connection to a WebSocket connection.
WebSockets start as a standard HTTP request and response. In this interaction, the client asks to open a WebSocket connection and, if it is able to, the server responds, successfully completing the handshake.
The handshake follows these steps:
1. Client Request
The client sends an HTTP GET request to a WebSocket URI. WebSocket URIs look the same as HTTP URIs, except they begin with ws: or wss: (secure WebSocket) instead of http: or https:.
The request includes the following headers:
Connection: Upgrade
Indicates that the client wants to switch to a different protocol from HTTP.
Upgrade: websocket
The Upgrade header in general is used to switch the connection to the given protocol. It can also be a list of protocols, given in decreasing order of preference for the protocol switch.
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
The Sec-WebSocket-Key is a random 16-byte value that has been base64-encoded. Its use will be explained later.
Sec-WebSocket-Version: 13
Currently, the only accepted version of the WebSocket protocol is 13. No other version will work.
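As a rough sketch of what the client sends, the request could be assembled like this. The host `example.com` and path `/game` are made up for illustration, and the `Host` header is included because HTTP/1.1 requires it; a real client library would build this for you.

```python
import base64
import os


def build_handshake_request(host: str, path: str) -> str:
    """Return the text of a WebSocket opening handshake request."""
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # HTTP header lines are separated by CRLF and the header block ends with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical host and path, used only for this example.
request = build_handshake_request("example.com", "/game")
print(request)
```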
2. Server Response
The server response is an HTTP 101 Switching Protocols response and includes the following headers:
Connection: Upgrade
Confirms that the connection has been upgraded.
Upgrade: websocket
Confirms that the connection has been upgraded to the WebSocket protocol.
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
This value is computed by appending the fixed string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the key received from the client, taking the SHA-1 hash of the result and then Base64-encoding that hash. The client performs the same algorithm on the key it sent earlier and checks that the computed values match. This prevents malicious users from tricking servers into treating non-WebSocket connections as WebSocket connections, which could lead to unpredictable behaviour.
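The calculation is short enough to check by hand. Here is a small sketch using Python's standard library; the function name `compute_accept` is my own, and the key is the sample value from the request headers above.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket protocol (RFC 6455).
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    combined = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(combined).digest()          # raw 20-byte SHA-1 hash
    return base64.b64encode(digest).decode("ascii")   # Base64 of the raw bytes


accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the Base64 step is applied to the raw hash bytes, not to the hexadecimal string form of the hash.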
After the server response, the handshake is complete and the client and server have agreed to use the existing TCP/IP connection established for the HTTP request as a WebSocket connection. Data can flow both ways in this connection via a simple framed protocol.
In the WebSocket protocol, data is split into frames which can be sent by both the client and the server.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
This is what a WebSocket frame looks like, taken from RFC 6455 (where you can find the full details).
The important parts of the frame to highlight are the FIN bit, the opcode, the MASK bit and masking key, the payload length and the payload data.
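To see how these fields fit together, here is a rough sketch that encodes and decodes a single, unfragmented frame. `encode_frame` and `decode_frame` are names I made up, and the sketch ignores fragmentation, extensions and error handling, so treat it as an illustration rather than a usable implementation. Per RFC 6455, frames sent from client to server are masked, which is why masking is on by default here.

```python
import os
import struct
from typing import Optional, Tuple


def encode_frame(payload: bytes, opcode: int = 0x1, mask: bool = True,
                 masking_key: Optional[bytes] = None) -> bytes:
    """Build a single, unfragmented WebSocket frame (FIN = 1)."""
    header = bytes([0x80 | opcode])          # FIN bit set, RSV1-3 clear
    mask_bit = 0x80 if mask else 0x00
    length = len(payload)
    if length < 126:                         # fits in the 7-bit length field
        header += bytes([mask_bit | length])
    elif length < 65536:                     # 126 means: 16-bit length follows
        header += bytes([mask_bit | 126]) + struct.pack("!H", length)
    else:                                    # 127 means: 64-bit length follows
        header += bytes([mask_bit | 127]) + struct.pack("!Q", length)
    if not mask:
        return header + payload
    key = masking_key if masking_key is not None else os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return header + key + masked


def decode_frame(frame: bytes) -> Tuple[bool, int, bytes]:
    """Return (fin, opcode, payload) for one complete frame."""
    fin = bool(frame[0] & 0x80)
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack("!H", frame[offset:offset + 2])
        offset += 2
    elif length == 127:
        (length,) = struct.unpack("!Q", frame[offset:offset + 8])
        offset += 8
    key = b""
    if masked:
        key = frame[offset:offset + 4]
        offset += 4
    payload = frame[offset:offset + length]
    if masked:                               # XOR with the key again to unmask
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload


# A masked text frame (opcode 0x1) carrying "Hello", with a fixed key for repeatability.
frame = encode_frame(b"Hello", masking_key=bytes.fromhex("37fa213d"))
print(frame.hex())          # 818537fa213d7f9f4d5158
print(decode_frame(frame))  # (True, 1, b'Hello')
```

The fixed masking key is only there to make the output repeatable; a real client picks a fresh random key for every frame.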
A close frame (opcode 0x8) is sent to close a connection. If either side of a connection receives a close frame, it must send a close frame in response, unless it has already sent one. Once the close frame has been received by both parties, the server initiates closing the TCP connection.
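For illustration, a close frame can be built in a few lines. This is a sketch with names of my own choosing (`build_close_frame`, `is_close_frame`); it produces an unmasked frame, as a server would send, whose payload starts with a 2-byte status code (1000 is the code for a normal closure in RFC 6455) followed by an optional reason.

```python
import struct

CLOSE_OPCODE = 0x8
NORMAL_CLOSURE = 1000  # status code for a normal closure in RFC 6455


def build_close_frame(code: int = NORMAL_CLOSURE, reason: str = "") -> bytes:
    """Build an unmasked close frame, as a server would send it."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    # First byte: FIN bit plus the close opcode. Second byte: payload length.
    return bytes([0x80 | CLOSE_OPCODE, len(payload)]) + payload


def is_close_frame(frame: bytes) -> bool:
    """Check the opcode in the low 4 bits of the first byte."""
    return (frame[0] & 0x0F) == CLOSE_OPCODE


# "game over" is a made-up reason string for this example.
close = build_close_frame(reason="game over")
print(close.hex(), is_close_frame(close))
```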
That was an overview of how the WebSocket protocol works! In practice, you don't have to worry about these granular details as there are many libraries available that take care of handling these connections for us.
If the connection no longer uses HTTP, how is data sent between the client and server?
Instead of requests, messages are passed between parties through events. Each message over a WebSocket connection is structured as an event, identified with a unique name with relevant data. The receiving party listens for specific event names, allowing it to handle incoming data appropriately based on the type of event received.
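A simple way to picture this is a small router that maps event names to handler functions. The sketch below assumes messages are JSON objects with `event` and `data` fields; that message shape, the `EventRouter` class and the `answer_submitted` event name are all hypothetical, and real libraries define their own formats.

```python
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventRouter:
    """Routes incoming messages to handlers registered by event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name] = handler

    def dispatch(self, raw_message: str) -> bool:
        """Parse one message and call its handler. Returns False if none is registered."""
        message = json.loads(raw_message)
        handler = self._handlers.get(message["event"])
        if handler is None:
            return False
        handler(message.get("data"))
        return True


def make_message(event_name: str, data: Any) -> str:
    """Serialise an event in the assumed {"event": ..., "data": ...} shape."""
    return json.dumps({"event": event_name, "data": data})


answers: List[Any] = []
router = EventRouter()
router.on("answer_submitted", answers.append)  # hypothetical event name

handled = router.dispatch(make_message("answer_submitted", {"player": "B", "choice": 2}))
ignored = router.dispatch(make_message("unknown_event", None))
print(answers, handled, ignored)
```

The receiving side only reacts to the event names it has registered, which is what lets both client and server handle different kinds of data over the same connection.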
Events are classed into two types:
Mini Kahoot requires WebSockets for the following features:
Now equipped with the knowledge of how realtime communication works over the web, the next posts will cover the database design, backend REST API design and WebSocket server design to handle the realtime features.