Wednesday, August 30, 2023

Timeouts In Envoy

Just a quick summary post. Envoy lets you configure various timeouts that tune the behavior of HTTP as well as general TCP traffic. Here is a summary of a few common ones.

Why am I writing this? Like with a lot of Envoy documentation, all of this is documented, but not in a shape that is easy and quick to grok. Note that you need to understand Envoy and the structure of its configuration to fully understand some of the details referred to, but you can form an idea even without it if you understand TCP and HTTP in general.

There are chiefly three kinds of timeouts:

  1. Connection timeout: how long may it take to establish a connection?
  2. Idle timeout: how long may a connection stay open without activity?
  3. Max duration: after how long is the connection closed, irrespective of whether it is active or not? This is often disabled by default.

These generally apply to both downstream and upstream connections, and are configured either under a listener (in the HTTP Connection Manager or the TCP proxy) or in a cluster.

Connection timeouts

How long does it take to establish a connection?

This is a general scenario that applies to both plain TCP and HTTP connections. There is also an HTTP analog in the form of a stream timeout, or the time it takes to establish an HTTP/2 or HTTP/3 stream.
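For the upstream side, the knob for this is the cluster's `connect_timeout`. Here is a minimal sketch; the cluster name `backend`, the host `backend.internal`, and the 2s value are all made up for illustration:

```yaml
clusters:
- name: backend                  # hypothetical cluster name
  type: STRICT_DNS
  connect_timeout: 2s            # give up if the upstream TCP connection is not established in 2s
  load_assignment:
    cluster_name: backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: backend.internal   # hypothetical upstream host
              port_value: 8080
```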

A very HTTP-specific timeout is: How long would the proxy wait for an upstream to return a complete response once the entire request has been received from the downstream?

This is called the route timeout. It is set at the route level and defaults to 15s, so it can be tuned for individual routes.
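In configuration terms, the route timeout is the `timeout` field of a route's action. A rough sketch, assuming a hypothetical `backend` cluster and a slow `/reports` path that needs more headroom than the other routes:

```yaml
route_config:
  name: local_routes             # hypothetical name
  virtual_hosts:
  - name: default
    domains: ["*"]
    routes:
    - match: { prefix: "/reports" }
      route:
        cluster: backend
        timeout: 60s             # this route gets a longer timeout than the default
    - match: { prefix: "/" }
      route:
        cluster: backend         # no timeout set, so the 15s default applies
```

Setting `timeout: 0s` on a route disables the route timeout for it, which is sometimes done for long-lived streaming responses.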

Idle timeouts

How long can a connection stay idle without traffic in either direction?

Again, a general scenario that applies to both plain TCP and HTTP connections. With HTTP/2 and above, a connection counts as idle only if it has had no active streams for a certain period. There is also an HTTP analog for streams, in the form of an idle timeout for individual streams. The stream idle timeout can also be overridden at the HTTP route level.
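For downstream HTTP traffic, these settings live in the HTTP Connection Manager. The fragment below is a sketch of one filter chain, not a full listener; the `backend` cluster, the `/events` path, and all durations are illustrative assumptions:

```yaml
filter_chains:
- filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
      stat_prefix: ingress_http
      common_http_protocol_options:
        idle_timeout: 600s         # connection idle timeout: no active streams for 10 minutes
      stream_idle_timeout: 120s    # stream idle timeout: no activity on a stream for 2 minutes
      route_config:
        virtual_hosts:
        - name: default
          domains: ["*"]
          routes:
          - match: { prefix: "/events" }
            route:
              cluster: backend     # hypothetical cluster name
              idle_timeout: 900s   # per-route override of the stream idle timeout
      http_filters:
      - name: envoy.filters.http.router
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

For plain TCP, the TCP proxy filter has its own `idle_timeout` field that plays the role of the connection idle timeout.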

Here is another one. How long should a connection from the proxy to an upstream remain intact if there are no corresponding connections from a downstream to the proxy?

This is called the TCP protocol idle timeout. It is only available for plain TCP, and is in fact a variation of the idle timeout.
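To make the difference concrete, here is a sketch of the two TCP idle timeouts side by side. As far as I can tell, the upstream one is set through the cluster's TCP protocol options extension; the `backend` cluster name and the durations are assumptions. First, the listener side, where idleness means no bytes flowing in either direction:

```yaml
- name: envoy.filters.network.tcp_proxy
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
    stat_prefix: tcp_backend
    cluster: backend               # hypothetical cluster name
    idle_timeout: 600s             # close after 10 minutes without traffic
```

Then the cluster side, where idleness means the pooled upstream connection has no downstream connection attached to it:

```yaml
clusters:
- name: backend
  connect_timeout: 2s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.tcp.v3.TcpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.tcp.v3.TcpProtocolOptions
      idle_timeout: 300s           # close pooled upstream connections unused for 5 minutes
```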

Max duration

How long can a connection remain established at all, irrespective of whether there is traffic or not? This is disabled by default. For HTTP it is set through max_connection_duration; the TCP proxy has a comparable max_downstream_connection_duration setting, but only for downstream connections. Even when enabled, if there are active streams, those are drained before the connection is terminated. This can be useful when we want to avoid stickiness, or when upstream addresses have changed and clients need to reconnect even though the older endpoints have not gone away. There is an HTTP analog for streams in the form of a maximum stream duration, which can also be overridden at the HTTP route level.
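A sketch of how the HTTP variants might look. The first fragment sits inside the HTTP Connection Manager's typed_config and covers downstream connections and streams; the `backend` cluster, the `/export` path, and the durations are illustrative assumptions:

```yaml
common_http_protocol_options:
  max_connection_duration: 1800s   # drain and close downstream connections after 30 minutes
  max_stream_duration: 300s        # cap any single stream at 5 minutes
route_config:
  virtual_hosts:
  - name: default
    domains: ["*"]
    routes:
    - match: { prefix: "/export" }
      route:
        cluster: backend
        max_stream_duration:
          max_stream_duration: 900s   # per-route override for long exports
```

The second fragment applies the same idea to upstream connections, through the cluster's HTTP protocol options:

```yaml
clusters:
- name: backend
  connect_timeout: 2s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      common_http_protocol_options:
        max_connection_duration: 1800s   # recycle upstream connections every 30 minutes
      explicit_http_config:
        http2_protocol_options: {}
```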

There are a few other timeouts with specific uses available, but the above is a good summary.

