Styx Data Path
This document describes the Styx forwarding path for HTTP requests. It is written for the benefit of new developers.
Specifically we will cover the following:
- Styx live request/response messages
- Content body publishers
- Http Pipeline Handler
Styx is a cut-through proxy (https://en.wikipedia.org/wiki/Cut-through_switching). It begins forwarding the received message as soon as the HTTP headers are received. It won’t wait for the message body. The body will stream through later as soon as it is received.
The Styx LiveHttpRequest and LiveHttpResponse message types are designed for cut-through proxying. They expose all HTTP headers, but the content is available asynchronously via a ByteStream publisher:
class LiveHttpRequest {
..
public ByteStream body() { ... }
..
}
The ByteStream is just a Reactive Streams Publisher:
public class ByteStream implements Publisher<Buffer> { … }
A LiveHttpRequest propagates synchronously through Styx Core. It is passed on the call stack, from handler to handler, and from interceptor to interceptor. In the opposite direction, a LiveHttpResponse comes back as an asynchronous reactive Publisher event.
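The asymmetry between the two directions can be sketched in plain Java. This is only an illustration and not the Styx API: SketchHandler, SketchInterceptor and SketchPipeline are hypothetical names, requests and responses are plain strings, and a CompletableFuture stands in for the asynchronous response event.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical handler: receives the request on the call stack and
// returns the response as an asynchronous event.
interface SketchHandler {
    CompletableFuture<String> handle(String request);
}

// Hypothetical interceptor: sees the request on the way in and may
// transform the response on the way back.
interface SketchInterceptor {
    CompletableFuture<String> intercept(String request, SketchHandler next);
}

class SketchPipeline implements SketchHandler {
    private final List<SketchInterceptor> interceptors;
    private final SketchHandler terminal;

    SketchPipeline(List<SketchInterceptor> interceptors, SketchHandler terminal) {
        this.interceptors = interceptors;
        this.terminal = terminal;
    }

    @Override
    public CompletableFuture<String> handle(String request) {
        // The request travels synchronously down the chain, one call per interceptor.
        return chain(0).handle(request);
    }

    // Builds the handler that starts at the interceptor with the given index.
    private SketchHandler chain(int index) {
        if (index == interceptors.size()) {
            return terminal;
        }
        return request -> interceptors.get(index).intercept(request, chain(index + 1));
    }
}
```

By the time handle() returns, the request has already reached the terminal handler, while the returned future may still be incomplete. Each interceptor attaches its response transformation to that future instead of blocking on it.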
Both the request and the response body always flow through asynchronously via a ByteStream publisher.
ByteStream must support flow control. Without flow control, Styx would ingest too much data at once and run out of memory. We will discuss flow control in depth later in this document.
A ByteStream is a Reactive Streams Publisher, and as such it supports non-blocking back pressure.
There are other good sources of information about the reactive back pressure protocol, and I won’t go into details here. But in a nutshell:
- The Subscriber requests N events.
- The Publisher will emit up to the requested N events, and never more than the requested N events.
- When the Publisher produces or receives more than N events, it must discard, queue, or otherwise stop emitting the excess events until the subscriber requests more.
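The three rules above can be sketched with the JDK's java.util.concurrent.Flow interfaces, which mirror the Reactive Streams ones. QueueingPublisher and CollectingSubscriber are hypothetical names used only for this illustration, and the sketch assumes single-threaded use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Flow;

// Hypothetical publisher that queues excess items and never emits
// more than the subscriber has requested. Not thread safe.
class QueueingPublisher implements Flow.Publisher<String> {
    private final Queue<String> queue = new ArrayDeque<>();
    private Flow.Subscriber<? super String> subscriber;
    private long demand; // events requested but not yet emitted

    @Override
    public void subscribe(Flow.Subscriber<? super String> s) {
        subscriber = s;
        s.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { demand += n; drain(); }
            @Override public void cancel() { subscriber = null; }
        });
    }

    // Called by the data source. Items beyond the current demand wait in the queue.
    void offer(String item) {
        queue.add(item);
        drain();
    }

    int queued() {
        return queue.size();
    }

    private void drain() {
        while (subscriber != null && demand > 0 && !queue.isEmpty()) {
            demand--;
            subscriber.onNext(queue.poll());
        }
    }
}

// Hypothetical subscriber that records what it receives and leaves
// the request(n) calls to the caller.
class CollectingSubscriber implements Flow.Subscriber<String> {
    final List<String> received = new ArrayList<>();
    Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
    @Override public void onNext(String item) { received.add(item); }
    @Override public void onError(Throwable t) { }
    @Override public void onComplete() { }
}
```

If three items are offered and the subscriber then calls request(2), only the first two are delivered. The third stays queued until the next request.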
Within Styx there are a few ByteStream consumers with different back pressure characteristics. But because the ByteStream is consumable with standard Reactive Streams APIs, we aim to keep the underlying implementation flexible enough to cater for different use cases. The consumers within Styx are:
- HttpResponseWriter - Consumes the response body in order to proxy it. That is, to send it across the network to the remote recipient.
- HttpRequestOperation - As above, but for the request body.
- HTTP Content Aggregator - Aggregates the entire request/response body into one single buffer. It therefore disables back pressure and consumes all the content at once.
- Intermediate handlers and interceptors. These are usually neutral from a back pressure point of view.
When the ByteStream is sourced from a Netty channel (or more generally from a network socket), it can't discard user data packets. Instead it queues all excess data until its consumer (the Reactive Streams Subscriber) is able to accept more.
Queuing becomes a problem when the upstream or the downstream is significantly faster than the other. Consider a slow client downloading a large file, say 1GB, from a fast server. The file would end up sitting in the Styx receive queue until the client gradually consumes it. We can't take many such clients before running out of memory:
- Styx will exhaust Netty direct memory.
- Or it will exhaust JVM heap. Normally we need less direct memory than heap.
Styx implements Flow Control to prevent this problem. Flow control adjusts the upstream and downstream speeds to minimise the need for queuing. This is how it works.
- The ByteStream data producer queues all excess data packets.
- It suspends the Netty channel by setting its autoRead property to false. This stops Netty from issuing more channelRead events until we ask for more. This in turn fills the TCP receive window, and eventually slows down the remote sender.
- Reactive request()s from a subscriber dequeue some of the queued content. When the queue is exhausted, Styx asks Netty for more data.
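The interplay between the queue, the autoRead flag and the subscriber's requests can be modelled without Netty. In this sketch, FakeChannel is a hypothetical stand-in that only records the autoRead flag and counts explicit read requests, and FlowControlledProducer is a hypothetical name. It is a single-threaded simplification, not the actual Styx implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical stand-in for a Netty channel. It only models the
// autoRead flag and counts explicit requests for more data.
class FakeChannel {
    boolean autoRead = true;
    int readRequests = 0;

    void read() {
        readRequests++;
    }
}

// Hypothetical producer: queues incoming chunks, delivers them only
// against subscriber demand, and asks the channel for more data once
// the queue is exhausted. Not thread safe.
class FlowControlledProducer {
    private final FakeChannel channel;
    private final Consumer<String> downstream;
    private final Queue<String> queue = new ArrayDeque<>();
    private long demand;

    FlowControlledProducer(FakeChannel channel, Consumer<String> downstream) {
        this.channel = channel;
        this.downstream = downstream;
        // Stop the channel from pushing data on its own initiative.
        channel.autoRead = false;
    }

    // Invoked for every chunk the channel delivers.
    void channelRead(String chunk) {
        queue.add(chunk);
        drain();
    }

    // Invoked when the subscriber issues a reactive request(n).
    void request(long n) {
        demand += n;
        drain();
    }

    private void drain() {
        while (demand > 0 && !queue.isEmpty()) {
            demand--;
            downstream.accept(queue.poll());
        }
        // Queue exhausted but the subscriber wants more: ask the channel to read.
        if (demand > 0 && queue.isEmpty()) {
            channel.read();
        }
    }
}
```

The channel is asked to read only when the subscriber has unmet demand, so the amount of queued data is bounded by what arrived before the subscriber fell behind.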
Ultimately the ByteStream subscriber uses back pressure requests to control the rate at which the publisher produces events.
The Styx HttpResponseWriter (and the request writer) consumes the ByteStream one buffer at a time. It requests the next one only after the preceding chunk has been successfully written to the Netty channel.
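A subscriber with this one-at-a-time behaviour could look roughly like the sketch below. OneAtATimeWriter is a hypothetical name, the write function stands in for an asynchronous channel write, and the java.util.concurrent.Flow interfaces are used in place of the Reactive Streams ones. It illustrates the request pattern only and is not the actual HttpResponseWriter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.function.Function;

// Hypothetical writer: requests a single chunk, and requests the next
// one only after the previous write has completed successfully.
class OneAtATimeWriter implements Flow.Subscriber<String> {
    // Assumed asynchronous write operation, for example a channel write.
    private final Function<String, CompletableFuture<Void>> write;
    private Flow.Subscription subscription;

    // Completes when the whole stream has been written, or fails on error.
    final CompletableFuture<Void> done = new CompletableFuture<>();

    OneAtATimeWriter(Function<String, CompletableFuture<Void>> write) {
        this.write = write;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1); // ask for the first chunk only
    }

    @Override
    public void onNext(String chunk) {
        write.apply(chunk).whenComplete((ignored, error) -> {
            if (error != null) {
                subscription.cancel();
                done.completeExceptionally(error);
            } else {
                subscription.request(1); // previous chunk is out, ask for the next
            }
        });
    }

    @Override
    public void onError(Throwable t) {
        done.completeExceptionally(t);
    }

    @Override
    public void onComplete() {
        done.complete(null);
    }
}
```

Because demand never exceeds one, a slow remote recipient delays the next request(1), and that delay propagates back to the publisher.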
The ByteStream is agnostic about its underlying data source. You can create a ByteStream from a Java String like so:
ByteStream.from("Hello!", UTF_8);
Or from another Publisher:
Publisher<Buffer> p = createPublisher(...);
new ByteStream(p);
The FlowControllingPublisher and FlowControllingHttpContentProducer classes implement a reactive Publisher to source ByteStream content from a Netty channel. The former is just a shallow shim to implement the Publisher interface, while the latter does all the heavy lifting.
The primary purpose of FlowControllingHttpContentProducer is to implement non-blocking back pressure, and to enforce the Reactive Streams event order. It also mediates between asynchronous event sources:
- Netty pipeline - produces content data (Buffer) events on the Netty IO thread.
- Content subscriber - generates reactive back pressure requests, and runs on a separate thread.
As it publishes "real time" networking events, it has some additional considerations:
- It cannot lose any user data. All content chunks are queued until a subscriber subscribes.
- It allows only one subscription. There is little point having a 2nd subscription half way through the stream.
- It has to deal with network errors, timeouts etc.
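These constraints can be illustrated with a small single-threaded sketch. SingleSubscriptionSource and RecordingSubscriber are hypothetical names, and delivering queued chunks before an error is a choice made for this sketch, not necessarily what Styx does. A real implementation also has to coordinate the Netty IO thread with the subscriber thread, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Flow;

// Hypothetical content source: keeps every chunk until it is consumed,
// accepts a single subscriber, and forwards failures. Not thread safe.
class SingleSubscriptionSource implements Flow.Publisher<String> {
    private final Queue<String> pending = new ArrayDeque<>();
    private Flow.Subscriber<? super String> subscriber;
    private Throwable failure;
    private boolean terminated;
    private long demand;

    // Chunks arriving before the subscriber, or faster than it, are queued.
    void onChunk(String chunk) {
        pending.add(chunk);
        drain();
    }

    // A network error or a timeout ends the stream.
    void onFailure(Throwable cause) {
        failure = cause;
        drain();
    }

    @Override
    public void subscribe(Flow.Subscriber<? super String> s) {
        if (subscriber != null) {
            // Reject a second subscription without disturbing the first one.
            s.onSubscribe(new Flow.Subscription() {
                @Override public void request(long n) { }
                @Override public void cancel() { }
            });
            s.onError(new IllegalStateException("Only one subscription is allowed"));
            return;
        }
        subscriber = s;
        s.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { demand += n; drain(); }
            @Override public void cancel() { terminated = true; pending.clear(); }
        });
    }

    private void drain() {
        if (subscriber == null || terminated) {
            return;
        }
        while (demand > 0 && !pending.isEmpty()) {
            demand--;
            subscriber.onNext(pending.poll());
        }
        // Sketch policy: report the failure once the queued chunks are delivered.
        if (failure != null && pending.isEmpty()) {
            terminated = true;
            subscriber.onError(failure);
        }
    }
}

// Hypothetical subscriber that records the signals it receives.
class RecordingSubscriber implements Flow.Subscriber<String> {
    final List<String> events = new ArrayList<>();
    Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
    @Override public void onNext(String item) { events.add("next:" + item); }
    @Override public void onError(Throwable t) { events.add("error:" + t.getMessage()); }
    @Override public void onComplete() { events.add("complete"); }
}
```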
We implemented FlowControllingHttpContentProducer using a Finite State Machine (FSM).
An FSM is a great tool for managing complexity in an asynchronous environment. It makes it easy to reason about the content stream state, which would be notoriously difficult otherwise. Perhaps most importantly, we can enumerate all possible state/transition pairs to ensure that every scenario is covered.
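As an illustration of enumerating state/transition pairs, here is a minimal table-driven state machine. The state and event names are hypothetical and are not taken from the actual FlowControllingHttpContentProducer design. The point is only that a transition table can be checked mechanically for completeness.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical states and events, chosen for illustration only.
enum ProducerState { BUFFERING, BUFFERING_ENDED, STREAMING, COMPLETED, TERMINATED }
enum ProducerEvent { SUBSCRIBED, CHUNK, END_OF_CONTENT, ERROR }

class ContentFsm {
    private static final Map<ProducerState, Map<ProducerEvent, ProducerState>> TABLE =
            new EnumMap<>(ProducerState.class);

    private static void on(ProducerState from, ProducerEvent event, ProducerState to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<ProducerEvent, ProducerState>(ProducerEvent.class))
             .put(event, to);
    }

    static {
        on(ProducerState.BUFFERING, ProducerEvent.SUBSCRIBED, ProducerState.STREAMING);
        on(ProducerState.BUFFERING, ProducerEvent.CHUNK, ProducerState.BUFFERING);
        on(ProducerState.BUFFERING, ProducerEvent.END_OF_CONTENT, ProducerState.BUFFERING_ENDED);
        on(ProducerState.BUFFERING, ProducerEvent.ERROR, ProducerState.TERMINATED);
        // Simplification: queued content is treated as flushed at subscription time.
        on(ProducerState.BUFFERING_ENDED, ProducerEvent.SUBSCRIBED, ProducerState.COMPLETED);
        // Content after the end of the message is treated as a protocol error.
        on(ProducerState.BUFFERING_ENDED, ProducerEvent.CHUNK, ProducerState.TERMINATED);
        on(ProducerState.BUFFERING_ENDED, ProducerEvent.END_OF_CONTENT, ProducerState.BUFFERING_ENDED);
        on(ProducerState.BUFFERING_ENDED, ProducerEvent.ERROR, ProducerState.TERMINATED);
        // A second subscription attempt leaves the ongoing stream untouched.
        on(ProducerState.STREAMING, ProducerEvent.SUBSCRIBED, ProducerState.STREAMING);
        on(ProducerState.STREAMING, ProducerEvent.CHUNK, ProducerState.STREAMING);
        on(ProducerState.STREAMING, ProducerEvent.END_OF_CONTENT, ProducerState.COMPLETED);
        on(ProducerState.STREAMING, ProducerEvent.ERROR, ProducerState.TERMINATED);
        // Final states absorb every event.
        for (ProducerEvent event : ProducerEvent.values()) {
            on(ProducerState.COMPLETED, event, ProducerState.COMPLETED);
            on(ProducerState.TERMINATED, event, ProducerState.TERMINATED);
        }
    }

    private ProducerState state = ProducerState.BUFFERING;

    ProducerState state() {
        return state;
    }

    // Applies the event and returns the new state.
    ProducerState fire(ProducerEvent event) {
        ProducerState next = TABLE.get(state).get(event);
        if (next == null) {
            throw new IllegalStateException("No transition for " + state + " on " + event);
        }
        state = next;
        return next;
    }

    // True when every state/event pair has a defined transition.
    static boolean coversAllPairs() {
        for (ProducerState s : ProducerState.values()) {
            for (ProducerEvent e : ProducerEvent.values()) {
                if (!TABLE.containsKey(s) || !TABLE.get(s).containsKey(e)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A unit test that asserts coversAllPairs() would fail as soon as a new state or event is added without deciding how it interacts with the existing ones.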
Other important scenarios to consider are:
- Subscriber times out
- Network times out
- Threading model, and lock free operation
- TCP connection closure
The detailed design for the finite state machine is described here: https://github.com/HotelsDotCom/styx/wiki/Flow-Controlling-Content-Producer