Monday, November 13, 2023

Writing an Envoy Filter like a mere mortal (not a Ninja)

This article, like its predecessor Quickly Building Envoy Proxy, is an attempt to document what should have been widely documented but isn't. Serious open source communities sometimes function in an elitist way, perhaps to keep the entry bar high. Maybe that's why they consciously avoid focusing on documenting the basic mechanisms that people need in order to work with the code base. But for commoners like me, this lack of documentation becomes motivation to figure things out and write about them, with the hope that someone else finds it easier.

If you're building Envoy from source code, then maybe you've got reason to modify Envoy source code, and one of the most common things that people need to do with the Envoy code base is to write new filters.

What are filters?

You possibly already know this, but filters are pluggable code that you can run within Envoy to customize how you process incoming requests, what responses you send, etc. When a request enters Envoy, it enters through a listener (a listener port). This request then goes through a filter chain - which might be chosen from among multiple filter chains based on some criteria - like the source or destination address or port. Each filter chain is a sequence of filters - logic that runs on the incoming request and perhaps modifies it or determines whether it would be processed further and how.

Filters can be written in C++, Lua, Wasm (and therefore any language that compiles to Wasm), and apparently in Go and Rust too. I know precious little about the last two, but they sound interesting. Lua filters are quite limited in many ways, so I decided not to focus on them. Native C++ filters seem adequate for most purposes, so this post is about them.

What are the different types of filters?

There are different kinds of filters. Listener filters are useful if some actions need to be performed while accepting connections. Then there are network filters, which operate at the L4 layer on requests and responses. Two such filters are HTTP Connection Manager (HCM), which is used to process HTTP traffic, and TCP Proxy, which is used to route generic TCP traffic. Within HCM, it is possible to load a further class of filters which are aware of the HTTP protocol - these are called HTTP filters and operate at the L7 layer. In this article, we will focus on L4 filters, i.e. network filters.

Where does one write filters?

Filter writing seems to have gotten a wee bit easier over the successive Envoy versions and seemed somewhat agreeable when I tried it on version 1.27.x. One can write a filter in a separate repo and include all of the Envoy code as a sub-module. However, I wanted to write an in-tree filter - just like the several filters already part of the code base.

How to write your first filter?

We will write a network filter which will log the destination IP of a request as it was received, and then forward the request to some destination based on a mapping of destination IPs to actual target addresses. This is actually useful functionality which isn't available out of the box from Envoy, but requires just a small amount of filter code to get going. We will call this filter the Address Mapper filter. So what do we need?

The config proto

Most filters need some configuration. In our case, the filter takes a map of IP addresses to cluster names - Envoy clusters representing one or more endpoints where the traffic could be sent. So essentially, we are looking for a map<string, string> as input to the filter. However, to keep things type-safe, Envoy needs to know the exact type of the input config, so we must define a protobuf message describing it. Since this is a network filter named address_mapper, we create a directory named address_mapper under api/envoy/extensions/filters/network. One further sub-directory under it, v3, holds the actual protos. v3 represents the current generation of Envoy's config API - v1 and v2 are obsolete versions. The proto file, address_mapper.proto, is placed under v3 and has the following content.

syntax = "proto3";

package envoy.extensions.filters.network.address_mapper.v3;

import "udpa/annotations/status.proto";

option java_package = "io.envoyproxy.envoy.extensions.filters.network.address_mapper.v3";
option java_outer_classname = "AddressMapper";
option java_multiple_files = true;
option go_package = "github.com/envoyproxy/go-control-plane/envoy/extensions/filters/network/address_mapper/v3;address_mapperv3";
option (udpa.annotations.file_status).package_version_status = ACTIVE;

// [#protodoc-title: Address mapper]
// Address mapper :ref:`configuration overview <config_network_filters_address_mapper>`.
// [#extension: envoy.filters.network.address_mapper]

message AddressMapper {
  // address_map is expected to contain a 1:1 mapping of
  // IP addresses to other IP addresses or FQDNs.
  map<string, string> address_map = 1;
}

We must also create a Bazel BUILD file in the same directory - and that's the limit of what I am qualified to say about these abominations used to build the whole Envoy binary and its various parts. So here it is:

# DO NOT EDIT. This file is generated by tools/proto_format/proto_sync.py.

load("@envoy_api//bazel:api_build_system.bzl", "api_proto_package")

licenses(["notice"])  # Apache 2

api_proto_package(
    deps = ["@com_github_cncf_udpa//udpa/annotations:pkg"],
)

If at this time you want to interject profanities about Bazel (or at any other time), you know it is wrong.

Next, you need to link your proto into the build chain by making entries inside api/BUILD and api/versioning/BUILD. Add the following entry under the v3_protos library in api/BUILD, and under active_protos in api/versioning/BUILD.

"//envoy/extensions/filters/network/address_mapper/v3:pkg",

We must also create a type URL that Envoy recognizes, so that it can instantiate a config message of the correct type. To do this we create an entry for the AddressMapper message inside source/extensions/extensions_metadata.yaml.

envoy.filters.network.address_mapper:
  categories:
  - envoy.filters.network
  security_posture: robust_to_untrusted_downstream_and_upstream
  status: stable
  type_urls:
  - envoy.extensions.filters.network.address_mapper.v3.AddressMapper

This introduces the new filter, and a type URL for the config proto on the last line. We must also tell Bazel where the source code for the new filter is present. To do this we edit source/extensions/extensions_build_config.bzl creating the following entry in the network filters section:

"envoy.filters.network.address_mapper":                       "//source/extensions/filters/network/address_mapper:config",

Envoy must also recognize the fully-qualified string representing the new network filter we are going to create. Because it is a network filter, we add it in source/extensions/filters/network/well_known_names.h. Inside the class NetworkFilterNameValues, we add the following const member.

// Address mapper filter
const std::string AddressMapper = "envoy.filters.network.address_mapper";

The filter logic

We must add the filter logic somewhere. To do this, we create a new directory called address_mapper under source/extensions/filters/network/. We first add the AddressMapperFilter filter class definition in address_mapper.h, along with an AddressMapperConfig class which wraps the config message passed via the Envoy config. These all live in the Envoy::Extensions::NetworkFilters::AddressMapper namespace.

class AddressMapperConfig {
public:
  AddressMapperConfig(const FilterConfig& proto_config);

  absl::string_view getMappedAddress(const absl::string_view& addr) const;

private:
  absl::flat_hash_map<std::string, std::string> addr_map_;
};
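
The corresponding implementation goes in address_mapper.cc. Here is a minimal sketch, assuming FilterConfig is an alias for the generated envoy::extensions::filters::network::address_mapper::v3::AddressMapper proto class: the constructor copies the proto map into the hash map, and getMappedAddress looks up an address, returning an empty string view when there is no mapping.

AddressMapperConfig::AddressMapperConfig(const FilterConfig& proto_config) {
  // Copy the proto map into an absl::flat_hash_map for fast lookups.
  for (const auto& entry : proto_config.address_map()) {
    addr_map_.emplace(entry.first, entry.second);
  }
}

absl::string_view AddressMapperConfig::getMappedAddress(const absl::string_view& addr) const {
  auto it = addr_map_.find(addr);
  return it != addr_map_.end() ? absl::string_view(it->second) : absl::string_view();
}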

The filter takes a shared_ptr to the above config class.

using AddressMapperConfigPtr = std::shared_ptr<AddressMapperConfig>;

class AddressMapperFilter : public Network::ReadFilter, Logger::Loggable<Logger::Id::filter> {
public:
  AddressMapperFilter(AddressMapperConfigPtr config);

  // Network::ReadFilter
  Network::FilterStatus onData(Buffer::Instance&, bool) override {
    return Network::FilterStatus::Continue;
  }

  Network::FilterStatus onNewConnection() override;

  void initializeReadFilterCallbacks(
          Network::ReadFilterCallbacks& callbacks) override {
    read_callbacks_ = &callbacks;
  }

private:
  Network::ReadFilterCallbacks* read_callbacks_{};
  AddressMapperConfigPtr config_;
};

The implementation of the onNewConnection method is in the address_mapper.cc file. For example, we can get the original destination address like this.

Network::Address::InstanceConstSharedPtr dest_addr =
      Network::Utility::getOriginalDst(const_cast<Network::Socket&>(read_callbacks_->socket()));

We can then map this address to the target cluster, etc.
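
Putting it together, here is a rough sketch of what onNewConnection could look like. It assumes the mapped values in the config are cluster names, and uses the TcpProxy::PerConnectionCluster filter-state object (discussed again in the configuration section below) to tell the TCP proxy filter later in the chain which cluster to forward the connection to.

Network::FilterStatus AddressMapperFilter::onNewConnection() {
  Network::Address::InstanceConstSharedPtr dest_addr =
      Network::Utility::getOriginalDst(const_cast<Network::Socket&>(read_callbacks_->socket()));
  if (dest_addr && dest_addr->ip()) {
    // Look up the original destination IP in the configured map.
    absl::string_view cluster = config_->getMappedAddress(dest_addr->ip()->addressAsString());
    if (!cluster.empty()) {
      ENVOY_LOG(debug, "address_mapper: mapping {} to cluster {}", dest_addr->asString(), cluster);
      // Record the target cluster as per-connection filter state, which the
      // TCP proxy filter reads to pick the upstream cluster.
      read_callbacks_->connection().streamInfo().filterState()->setData(
          Envoy::TcpProxy::PerConnectionCluster::key(),
          std::make_unique<Envoy::TcpProxy::PerConnectionCluster>(cluster),
          StreamInfo::FilterState::StateType::Mutable,
          StreamInfo::FilterState::LifeSpan::Connection);
    }
  }
  return Network::FilterStatus::Continue;
}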

Someone has to instantiate this filter and pass it the correct type of argument (AddressMapperConfigPtr). That responsibility falls to the glue code, or filter factory, which we look at next.

The glue code

We define the config factory (AddressMapperConfigFactory) class inside the config.h header in the filter directory. These are all inside the Envoy::Extensions::NetworkFilters::AddressMapper namespace.

class AddressMapperConfigFactory
    : public Common::FactoryBase<
       envoy::extensions::filters::network::address_mapper::v3::AddressMapper> {
public:
  AddressMapperConfigFactory() : FactoryBase(NetworkFilterNames::get().AddressMapper) {}

  /* ProtobufTypes::MessagePtr createEmptyConfigProto() override; */
  std::string name() const override { return NetworkFilterNames::get().AddressMapper; }

private:
  Network::FilterFactoryCb createFilterFactoryFromProtoTyped(
      const envoy::extensions::filters::network::address_mapper::v3::AddressMapper& proto_config,
      Server::Configuration::FactoryContext&) override;
};

We now add the implementation for createFilterFactoryFromProtoTyped, which is the entry point for filter instantiation.

Network::FilterFactoryCb AddressMapperConfigFactory::createFilterFactoryFromProtoTyped(
    const envoy::extensions::filters::network::address_mapper::v3::AddressMapper& proto_config,
    Server::Configuration::FactoryContext&) {

  AddressMapperConfigPtr filter_config = std::make_shared<AddressMapperConfig>(proto_config);
  return [filter_config](Network::FilterManager& filter_manager) -> void {
    filter_manager.addReadFilter(std::make_shared<AddressMapperFilter>(filter_config));
  };  
}

Given the protobuf input from the configuration, this code returns a callback that creates an instance of the actual filter, initialized with this config, and adds it to the filter chain.
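
The factory also needs to be registered with Envoy so that the filter name can be resolved to it when the configuration is loaded. Following the pattern used by the in-tree network filters, a static registration in config.cc does this (a sketch; the exact form should match what the other network filters in your Envoy version use):

// Register the factory so that Envoy can look it up by filter name.
REGISTER_FACTORY(AddressMapperConfigFactory,
                 Server::Configuration::NamedNetworkFilterConfigFactory);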

How to compile your first filter?

You need to ensure that your filter code is included in the build. Create the BUILD file in your filter directory with the following content.

load(
    "//bazel:envoy_build_system.bzl",
    "envoy_cc_extension",
    "envoy_cc_library",
    "envoy_extension_package",
)

licenses(["notice"])  # Apache 2

envoy_extension_package()

envoy_cc_library(
    name = "address_mapper",
    srcs = ["address_mapper.cc"],
    hdrs = ["address_mapper.h"],
    deps = [
        "//envoy/network:connection_interface",
        "//envoy/network:filter_interface",
        "//source/common/common:assert_lib",
        "//source/common/common:minimal_logger_lib",
        "//source/common/tcp_proxy",
        "//source/common/protobuf:utility_lib",
        "//source/common/network:utility_lib",
        "@envoy_api//envoy/extensions/filters/network/address_mapper/v3:pkg_cc_proto",
    ],
    alwayslink = 1,
)

envoy_cc_extension(
    name = "config",
    srcs = ["config.cc"],
    hdrs = ["config.h"],
    deps = [
        ":address_mapper",
        "//envoy/registry",
        "//envoy/server:filter_config_interface",
        "//source/extensions/filters/network/common:factory_base_lib",
        "//source/extensions/filters/network:well_known_names",
        "@envoy_api//envoy/extensions/filters/network/address_mapper/v3:pkg_cc_proto",
    ],
)

The exact dependencies listed depend on what you need to call from within your filter code (something we haven't yet shown). For example, the protobuf utility_lib or network utility_lib are listed, as is the network connection_interface.

The previous article in this series already shows how to build Envoy. That is all you need to do to build Envoy with this filter enabled. One handy ability is to build Envoy with debug symbols. This is quite easy:

bazel build envoy -c dbg

The binary is created under ./bazel-bin/source/exe/envoy-static.

Configuring Envoy to run your filter

In our case, we want the filter to be put before a TCP Proxy filter. So the config should look like this:

            - name: envoy.filters.network.address_mapper
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.address_mapper.v3.AddressMapper
                address_map:
                  "169.254.1.2": "cluster_1"
                  "169.254.1.3": "cluster_2"
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                ...

The assumption is that the clusters cluster_1 and cluster_2 are defined elsewhere in the config. Our filter checks whether the original destination IP of the incoming connection matches one of the IPs listed in the address map, and if it does, it sets a filter-state object (TcpProxy::PerConnectionCluster) on the connection's StreamInfo that tells the ensuing TCP proxy filter to forward the traffic to the mapped cluster.

Conclusion

There are lots of gaps in this article (because it was hurriedly written), but refer to existing filter code to fill those gaps in. It should be fairly straightforward.




Monday, October 02, 2023

Quickly Building Envoy Proxy

Building Envoy isn't all that hard. We have to use Bazel / Bazelisk for the process. Here are the steps summarized for quick reference:

cd ~/Downloads
wget https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64
sudo mv ~/Downloads/bazelisk-linux-amd64 /usr/local/bin/bazel
sudo chmod +x /usr/local/bin/bazel

Install / upgrade some local packages:
sudo apt install autoconf libtool curl patch python3-pip unzip virtualenv

Download Envoy source code:
mkdir -p github.com/envoyproxy
cd github.com/envoyproxy
git clone https://amukherj@github.com/envoyproxy/envoy

Download and install clang+llvm:
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
tar -xf clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz -C ~/devel/tools
ln -s clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04 ~/devel/tools/clang+llvm

Install additional go utilities:
go install github.com/bazelbuild/buildtools/buildifier@latest
export BUILDIFIER_BIN=/home/amukher1/devel/go/bin/buildifier
go install github.com/bazelbuild/buildtools/buildozer@latest
export BUILDOZER_BIN=/home/amukher1/devel/go/bin/buildozer

Build the code. This step can take well over an hour, depending on your machine resources.
bazel build envoy



Monday, September 18, 2023

My Driving Principles

The other day, I was thinking of things I heard or read that left a lasting impression on me and changed how I approached life. In most cases, I remember when I heard or read them, who said it or where it was written, and in some cases, why it had an impression on me.

  1. Pride comes before a fall. Aka, don't be a narcissist.
  2. We don't deserve all the credit for our successes, nor all the blame for our failures. So judge kindly.
  3. Doing the same thing over and over again expecting different results is tantamount to insanity.
  4. Patience is godliness.
  5. The difference between a job done well and a shoddy job is often only a little extra time, effort, and care.
  6. The wise resist pleasure while the fool becomes its slave. Without moderation, the greatest pleasures fade.
  7. Success comes to those who do what they must, even if they don't feel like doing it. Every single day.
  8. You never forget what you learn by teaching others.
  9. Always learn from your mistakes, but try as much to learn from the mistakes of others.
  10. To live is to learn every single day.
  11. You can't always repay kindness, but pay it forward to someone else.
  12. Being able to accommodate someone's imperfection and not making them feel bad about it is a great virtue. (I can't always practice this in my inner circle and this has been the most challenging principle for me.)

There might be a few others, but these are pretty much the principles that have defined and shaped how I think. In most respects, I am a work-in-progress in the light of these principles - but they give me direction.


Sunday, September 17, 2023

Go versioning and modules

Go has evolved into the de facto language for building control plane and management plane services, and along the way the Go tooling has picked up quite a few semantics around versioning and dependency management. This is my attempt at a round up of it all, leaving the details to the fantastic articles linked at the end of this post.

Package versions

If you maintain a Go package on a git repo that other Go programs import, then it is strongly recommended to have a notion of versioning and release defined for your package and repo.

  • The standard practice is to use semver (semantic versioning) of the form <MAJOR_VER>.<MINOR_VER>.<PATCH_VER>.
  • An optional pre-release version tag can be suffixed using a hyphen, such as 1.2.0-alpha or 1.2.9-beta.2. A pre-release version is considered an unstable version.
  • Major versions indicate API generations.
    • Major version 0 indicates evolution and potentially unstable interfaces and implementations.
    • Major version 1 indicates that the API has stabilized, although additional interfaces could be added.
    • Further major version updates are mandated if and only if there are breaking changes in the API.
  • Minor versions indicate API and implementation progression that maintain backward compatibility.
  • Patch versions indicate bug fixes and improvements without API changes.
A point to note is that major version 0 is treated a bit differently. You could have breaking changes between 0.1 and 0.2, for example, and backward compatible API changes between 0.1.1 and 0.1.2. This is unlike say 1.1 and 1.2, which cannot have breaking changes between them, and 1.1.1 and 1.1.2 which cannot have API changes (even backward-compatible ones) between them.

For reasons that would become clear shortly, it is important to tag specific commits with these versions in the repo, using git tags. This allows the git repo containing your Go code to participate in versioned dependency management that go tools support. The convention for tags is the letter v followed by the version string.

Modules and packages

Modules are now the standard way for doing dependency management in Go. A module is just a directory structure with Go source code under it, and a go.mod file at the root of the directory structure acting as a manifest. The go.mod file contains the module path name that would be used to address the module, and the Go version used to generate the module. In addition, it contains a list of external packages that the code inside the module depends on, and their versions. Three things are important:
  • Code inside and outside the module should import packages inside the module using the module path and the path of the package within the module relative to the module root.
  • Go tooling automatically determines a module version to use for each package dependency.
    • If available, the latest tagged stable version is used. Here stable refers to a version without a pre-release suffix.
    • If not, then the latest pre-release (i.e. unstable) version is used.
    • If not, then the latest untagged version is used. A pseudo-version string is generated for this, of the form v0.0.0-<yyyymmddhhmmss>-<commit_hash>. This is why it is better to have tagged versions indicating stability and backward compatibility.
  • Once a dependency of a module on a specific version of another package is established, it would not be automatically changed.
There are several important commands related to go modules that we need to know, and use as needed.

To create a module in a directory, we must run the following at the module root:
go mod init <module_path>
This generates the go.mod file.

To update module dependencies, including cleaning up obsolete dependencies, we must run the following at the module root:
go mod tidy
This updates the dependencies in the go.mod file.

By default, just building code within a package also updates the dependencies inside go.mod. However it does not clean up obsolete dependencies. We can also add a new dependency explicitly, without running a build, using:
go get <package>

To update to the latest minor or patch version of a dependency, we can run:
go get -u <package>
Or, to specifically upgrade only the patch version without upgrading the minor version:
go get -u=patch <package>
One can also upgrade to a specific minor / patch version:
go get <package>@<semver>
You'd need this when you want to test your code against a pre-release version of a package (that has used some discipline to also define and tag stable versions).

Major version upgrades

A major version upgrade for a Go package should be rare, and it would typically be rarer after a couple of major version upgrades. Why? Because a major version upgrade typically represents a new API and a new way of doing things. It necessarily breaks the older API. It means that the older API must continue to be supported for a decent time for clients to move to the newer version (unless for some specific or perverse reason you can leave your clients in the lurch). In that way, Go tooling treats v2 and beyond differently from v0 and v1.

Essentially, your code could link against two different major versions of a given package. This is not possible with two different minor or patch versions under the same major version of a given package. This allows parts of your code to start using a newer major version of a dependency without all the code moving at the same time. This may not always be technically feasible, but when it is, it is convenient.

  • The package maintainer would typically create their new major version package in a separate subdirectory of the older package, and name it v2 or v3, or so on, as a convention.
    • The code for the new major version could be a copy of the old code that is then changed, or a complete rewrite, or some mix of the two. Internally, the code for the new major version may continue to call older code that is still useful. These details are hidden from the client.
  • The client would import the package with the new version by including the /v2 or /v3 etc. version directory suffix in the package path.
    • Usually v0 and v1 do not require a suffix. From v2 onwards, the suffix is required.
    • If two different major versions are imported, a different explicit alias is used for the higher version. For example, mypkg and mypkgV2.
    • Going ahead, at some point if all dependencies on v0/v1 are removed, the mypkgV2 alias can be removed as well and Go compiler would import the mypkg/v2 package with the alias mypkg automatically.

Private repos

<Stuff to cover about the GOPRIVATE environment variable, personal access tokens, etc.>

References

The sequence of articles starting here is actually all you need and more.

  1. Using Go Modules
  2. Go Workspace



Wednesday, August 30, 2023

Timeouts In Envoy

Just a quick summary post. Envoy allows configuring various timeouts that allow tweaking the behavior of HTTP as well as general TCP traffic. Here is a summary of a few common ones. 

Why am I writing this? Like with a lot of Envoy documentation, all of this is documented, but not in a shape that is easy and quick to grok. Note that you need to understand Envoy and the structure of its configuration to fully understand some of the details referred to, but you can form an idea even without it if you understand TCP and HTTP in general.

There are chiefly three kinds of timeouts:

  1. Connection timeout: how long does it take to establish a connection?
  2. Idle timeout: how long does the connection exist without activity?
  3. Max duration: after how long is the connection broken irrespective of whether it is active or not? This is often disabled by default.

These generally apply to both downstream and upstream connections, and are configured either under a listener (in HTTP Connection Manager or TCP proxy) or in a cluster.

Connection timeouts

How long does it take to establish a connection?

This is a general scenario which can apply to either plain TCP or HTTP connections. There is also an HTTP analog in the form of a stream timeout - the time it takes to establish an HTTP/2 or HTTP/3 stream.

A very HTTP-specific timeout is: How long would the proxy wait for an upstream to start responding after completely sending an HTTP request to it?

This is called a route timeout, which is set at the route level and defaults to 15s. It can of course be overridden for individual routes.

Idle timeouts

How long can a connection stay idle without traffic in either direction?

Again, a general scenario that could apply to either plain TCP or HTTP connections. With HTTP/2 and above, idleness would require no active streams for a certain period. There is also an HTTP analog for streams in the form of an idle timeout for individual streams. These can also be overridden at the HTTP route level.

Here is another one. How long should a connection from the proxy to an upstream remain intact if there are no corresponding connections from a downstream to the proxy?

This is called the TCP protocol idle timeout; it is only available for plain TCP and is in fact a variation of the idle timeout.

Max duration

How long can a connection remain established at all, irrespective of whether there is traffic or not? This is normally disabled by default. It is not available for plain TCP, only for HTTP. Even when enabled, if there are active streams, those are drained before the connection is terminated. This may be useful when we want to avoid stickiness, or when upstream addresses have changed and reconnection is needed without the older endpoints going away. There is an HTTP analog for maximum stream duration. These can also be overridden at the HTTP route level.

There are a few other timeouts with specific uses available, but the above is a good summary.




Friday, August 25, 2023

All about JWKS or JSON Web Key Sets

What are JSON Web Key Sets?

Refer to this to understand how it looks: https://auth0.com/docs/secure/tokens/json-web-tokens/json-web-key-set-properties

In addition, refer to this: https://redthunder.blog/2017/06/08/jwts-jwks-kids-x5ts-oh-my/.

Besides, here are some handy commands.

First up, to get the public key from the cert, run:

openssl x509 -pubkey -noout -in <cert_file>

To generate the value of n, run:

openssl rsa -pubin -modulus -noout < public.key

Finally, to get the exponent (e), run:

openssl rsa -pubin -inform PEM -text -noout < public.key

The kid field needs to be some value that uniquely identifies which key was used to sign the token. x5t is the SHA-1 thumbprint of the leaf cert, but it is optional and can be skipped.

What good are they?

They are used to put together multiple cert bundles, which can be used to validate auth tokens such as JWS tokens. Many systems, including Envoy, take the bundle in JWKS format, and this also works well with SPIFFE/SPIRE type systems.




Sunday, June 11, 2023

Configuring Calico CNI with VPP Dataplane for Kubernetes

This is a quick run-down of how to configure Calico for pod networking in a Kubernetes cluster. Calico comes in several flavors, and we look at Calico with the VPP data plane, as opposed to classic Calico. One reason for looking at this option is to be able to use encryption at the IP layer using IPsec, which is supported by VPP but not by classic Calico.

Installing Kubernetes

This article doesn't cover how to install Kubernetes - there are several guides for doing that, including this one. Once you have installed Kubernetes on your cluster nodes, and the nodes have all joined the cluster, it is time to install the CNI plugin to allow pods to communicate across nodes. However, there are a few things that need to be ensured even while configuring Kubernetes, before we actually get to installing the CNI plugin.

Calico by default uses the subnet 192.168.0.0/16 for the pod network. It's good to use this default if you can, but if you cannot, choose the subnet you want to use. Then set Kubernetes up with the alternative CIDR you have in mind. If you use kubeadm to set up Kubernetes, then use the --pod-network-cidr command-line option to specify this CIDR. Here is an example command-line to do this on the (first) master node:

    kubeadm init --apiserver-advertise-address=10.20.0.7 --control-plane-endpoint=10.20.0.7:6443 --pod-network-cidr=172.31.0.0/16

The output of this command would contain the kubeadm command-line to run on the other nodes to register them with the master. At this point, running kubectl get nodes would list the cluster nodes but they would be shown in the NotReady state. To change that, we would need to install Calico.

Installing Calico

The page here already summarizes the process of installing Calico with VPP quite well, but there are a few things that need to be called out.

Hugepages and vfio-pci

This step is required only if you want to choose a specific VPP driver that would be used to drive the physical interface, namely virtio, dpdk, rdma, vmxnet3 (VMware), or avf (certain Intel drivers). If this is not explicitly chosen but left to the default, even then these settings can improve performance, but the memory requirements would typically be higher per node.

On each cluster node, create a file called /etc/sysctl.d/calico.conf and add the following content.

vm.nr_hugepages = 512

Then run:

    sudo sysctl -p /etc/sysctl.d/calico.conf

Similarly, on each cluster node create a file called /etc/modules-load.d/calico-vfio-pci.conf, and put the following line in it.

vfio-pci

On CentOS / RedHat, this should be vfio_pci instead. Then run:

    modprobe vfio-pci   # or vfio_pci on CentOS / RedHat

Finally, reboot the node.

Customizing the installation config

Create the Tigera operator:

    kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.6/manifests/tigera-operator.yaml

Download the installation-default.yaml file and modify it as suggested below:

    wget https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/calico/installation-default.yaml

In this file, there would be two objects listed. An Installation object, and an APIServer object. We would only edit the manifest of the first. Under the spec.calicoNetwork sub-object of Installation, add the ipPools attribute as shown below:

spec:
  calicoNetwork:
    linuxDataplane: VPP
    ipPools:
    - cidr: 172.31.0.0/16    # or whatever you chose for your pod network CIDR
      encapsulation: VXLAN   # or IPIP
      natOutgoing: Enabled

While not commonly required, there is an option to override the image prefix used, in order to download images from non-default image registries / mirrors. This came in handy for me because I could use a corporate mirror instead of the default docker.io which had strict rate limits imposed. Use it thus:

spec:
  imagePrefix: some-prefix-ending-in-fwd-slash

Then apply this edited manifest:

    kubectl create -f installation-default-edited.yaml

This would create a number of pods, including the calico API controller, calico node daemonset pods, calico typha daemonset pods, etc. The calico node daemonset pods would not be up though till the VPP dataplane is installed.

Installing VPP

To install VPP, you have to use one of two manifests, depending on whether you configured hugepages (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp.yaml) or not (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp-nohuge.yaml). Download the appropriate YAML, and then make the following edit if needed.

The vpp_dataplane_interface attribute should be set to the name of the NIC that would be used for node-to-node communication. By default it is set to eth1, but if that's not the interface that would be used on your node (e.g. on my cluster, I am using eth0), then set this appropriately.

Then apply:

    kubectl apply -f calico-vpp.yaml   # or calico-vpp-nohuge.yaml

This would install the calico-vpp-dataplane daemonset on the cluster nodes. If all went well, then all the pods related to calico, and the core-dns pods should be up and running in a few minutes.

Enabling IPsec

For this, the instructions here are already adequate. You need to create a secret and put a pre-shared key in it:

    kubectl -n calico-vpp-dataplane create secret generic \
    calicovpp-ipsec-secret \
    --from-literal=psk="$(dd if=/dev/urandom bs=1 count=36 2>/dev/null | base64)"

Then patch the calico-vpp-node daemonset with the ipsec configuration:

    kubectl -n calico-vpp-dataplane patch daemonset calico-vpp-node \
    --patch "$(curl https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/components/ipsec/ipsec.yaml)"


Packet Tracing and Troubleshooting

WIP, but this is where the docs are sketchy.



Saturday, April 22, 2023

Go virtualenvs (sort of)

Well, how do you run multiple versions of the Go compiler on the same sandbox? This isn't quite the same as different virtualenvs for Python - that is a run-time construct while this is a pure compile-time mechanism - but the objectives are analogous. On a given machine, I want to be able to write Go code and build it using different versions of the Go compiler.

Assuming you already have some version of Go compiler installed, and you've set GOPATH (and GOROOT, GOBIN, etc.), here is a way to deploy additional versions of the compiler.

go install golang.org/dl/go1.17.12@latest

The above command downloads an installer binary for Go version 1.17.12 (just a random version) and places it under $GOBIN/. If you now want to install go 1.17.12, you have to run the following command.

$GOBIN/go1.17.12 download

This installs go 1.17.12 side-by-side with other versions of Go that might already be present on the box. Now, run the following command to determine the installation location.

$GOBIN/go1.17.12 env GOROOT

Use the GOROOT for the go1.17.12 installation (or whatever your chosen version is) to set environment variables. Maybe define a .goenv1.17.12 script that you can source in the shell. Each time you want to switch to this version of the Go compiler, source this script. You would need to keep your sources separate for each version I think - I am not sure if you can switch between Go versions on the same repo.



Thursday, April 06, 2023

Distributed Tracing: A basic setup on Kubernetes

If you're trying to quickly get up to speed with distributed tracing and want to try it out in a Kubernetes environment, this post will help you set up the architectural pieces and try to see tracing in action.

Architecture

We would be running a Jaeger collector back-end that would collect all traces from everywhere. This could run outside Kubernetes too, as long as its ports are accessible from within the Kubernetes pods. Workloads generating traces would be simulated using pods running otel-cli. Each Kubernetes node would also run an OTel agent. The pods would send the traces to the agent on the local node, which in turn would forward them to the Jaeger collector.

Deployment

Deploy a recent version (>=1.35) of the Jaeger all-in-one collector for the back end, on a machine that is accessible to all the Kubernetes clusters that would be producing traces. The version is important because we want Jaeger to be capable of accepting the OTLP payload that OTel libraries and agents emit. By default, it uses in-memory storage for keeping traces - they won't be persistent.

docker run --name jaeger   -e COLLECTOR_OTLP_ENABLED=true   -p 16686:16686   -p 4317:4317   -p 4318:4318   jaegertracing/all-in-one:1.35

On the Kubernetes clusters where you want to run applications that generate traces, deploy OTel agent daemon sets using the following manifest.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp:
        endpoint: "192.168.219.1:4317"
        tls:
          insecure: true
        sending_queue:
          num_consumers: 4
          queue_size: 100
        retry_on_failure:
          enabled: true
    processors:
      batch:
      memory_limiter:
        # 80% of maximum memory up to 2G
        limit_mib: 400
        # 25% of limit up to 2G
        spike_limit_mib: 100
        check_interval: 5s
    extensions:
      zpages: {}
      memory_ballast:
        # Memory Ballast size should be max 1/3 to 1/2 of memory.
        size_mib: 165
    service:
      extensions: [zpages, memory_ballast]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-agent-config.yaml"
        image: otel/opentelemetry-collector:0.75.0
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 4317 # Default OpenTelemetry receiver port.
          hostPort: 4317
        - containerPort: 8888  # Metrics.
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: /conf
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol

In the above, the address 192.168.219.1 is where the Jaeger all-in-one collector is running on my setup. Yours would be different.

Finally deploy your application which produces traces using OTel libraries and configure it to send the traces to the local node IP on port 4317. This would send the traces to the OTel agent daemon set. This section will be expanded to add Golang code samples using the OTel SDK. For now skip to the next section to see how you can test trace generation using a CLI tool.

Trying it out

Install the otel-cli in your Go build environment:

go install github.com/equinix-labs/otel-cli@latest

This would put the otel-cli binary under your $GOPATH/bin. Put this binary inside a container image, and create a pod from that image that periodically runs the following commands:

$ export OTEL_EXPORTER_OTLP_ENDPOINT=<IP>:4317
$ otel-cli exec --service my-service --name "curl google" curl https://google.com 

If you point your browser to <IP>:16686, where <IP> is the address of the machine running the Jaeger all-in-one collector, you should be able to see Jaeger UI and look up traces generated by your service.




Saturday, March 25, 2023

Setting up SPIRE for your Kubernetes cluster

About SPIFFE and SPIRE

If you're reading this, you likely already know what SPIFFE and SPIRE are. But in case you don't, here is a really short summary: SPIFFE (Secure Production Identity Framework for Everyone) is a specification, and SPIRE (SPIFFE Runtime Environment) is an implementation of that specification, for securely issuing identities to workloads running in different compute environments, and for managing these identities (refreshing, revoking, etc.). Why is it useful? Well, it lets your services, such as pods running in Kubernetes, have their own certificates and signed JWTs, which are automatically refreshed, and which they can use to authenticate themselves to other services and communicate with them securely. For example, the certificates could be used to create mTLS connections with other workloads, or signed tokens could be used as a proof-of-possession for authentication.

One key problem of securely issuing identities is the security of the initial handshake, for the initial request asking for identity. SPIRE solves this in a novel way using agents that are capable of querying the compute environment about the workloads requesting identities, and then issuing the identities only if these workloads satisfy certain criteria. By keeping the agents local to the node where the workloads run, concerns about the initial secrets are addressed. Of course there is a lot more to it, and the right place to head to for more details is here.

Motivation

I had trouble wrapping my head around exactly what was going on and I still have many questions about it, but I figured that the best way to learn about this was to try it out. The purpose of this post was to document the steps for doing so, focusing on deploying SPIRE for workloads on a Kubernetes cluster. It's not particularly hard to do this by following the official documentation, but this is a more linear version of it, focusing on a specific and commonly useful scenario. So I hope to make it a wee bit easier with this post.

Architecture

We will build a setup that can serve one or more Kubernetes clusters. In order to support this, we would use a relatively recent Kubernetes version (1.22+) that supports projected service account tokens (PSATs).

We will deploy a single SPIRE server running outside the k8s cluster(s). In each cluster (we will use only one), we will deploy a SPIRE agent daemonset. Each SPIRE agent instance would connect to the SPIRE server (either securely or insecurely) at bootstrap and receive the node identity. The SPIRE server and each SPIRE agent instance would securely connect to the kube API of each k8s cluster to query k8s metadata about nodes and workloads.

Deploying the SPIRE Server

Deploy the server on a Linux box which can access your k8s cluster API. For example, I am using my Ubuntu laptop which acts as the host for my k8s VMs.

Create the spire user and spire group. 

$ sudo groupadd spire
$ sudo useradd -g spire -s /bin/false -M -r spire

Download the binary bundle and copy it to /opt/spire.

$ wget https://github.com/spiffe/spire/releases/download/v1.6.1/spire-1.6.1-linux-x86_64-glibc.tar.gz
$ tar xfz spire-1.6.1-linux-x86_64-glibc.tar.gz
$ sudo cp -r spire-1.6.1/ /opt/spire/
$ sudo find /opt/spire -type d -exec chmod 755 {} \;
$ sudo chmod 755 /opt/spire/bin/spire-server
$ sudo chown spire:spire /opt/spire/conf/server/server.conf
$ sudo ln -s /opt/spire/bin/spire-server /usr/bin/spire-server

Edit the server configuration present at /opt/spire/conf/server/server.conf thus.

server {
    bind_address = "192.168.219.1"
    bind_port = "8081"
    trust_domain = "everett.host"

    data_dir = "/var/opt/spire/data/server"
    log_level = "DEBUG"
    log_file = "/var/opt/spire/log/server.log"
    ca_ttl = "168h"
    default_x509_svid_ttl = "28h"
}

Set the bind_address to an address that is accessible from the k8s nodes. In my case, it is the VirtualBox host network IP of the Ubuntu host, which happens to be the gateway IP for that network. Keep the bind_port as 8081 or a different port above 1023 if you know 8081 conflicts with another application.

Set the trust_domain to a unique string - it need not be DNS resolvable. In my case, the hostname of the Ubuntu laptop where the SPIRE server is running is everett, so I set trust_domain to everett.host. Also ensure that ca_ttl is at least six times the value in default_x509_svid_ttl.

The data_dir directory specifies a directory under which most application data would be persisted. Likewise, the log_file directive specifies the log file name. I made sure I ran the following commands to make these directories accessible.

$ sudo mkdir -p /var/opt/spire/data
$ sudo mkdir -p /var/opt/spire/log
$ sudo chown -R spire:spire /var/opt/spire/


In the plugins section of the same file, update the DataStore, KeyManager, UpstreamAuthority, and NodeAttestor plugins.

For the SQL data store,  keep the database_type as the default sqlite3, and update the connection string to point to the location under data_dir where the data files would be stored, as shown below.

plugins {
    DataStore "sql" {
        plugin_data {
            database_type = "sqlite3"
            connection_string = "/var/opt/spire/data/server/datastore.sqlite3"
        }
    }

For the key manager disk plugin, specify the location where the server private keys would be kept as shown. Here we use an unencrypted directory store for our purposes, but for better security one should use more secure secret stores or key management systems.

     KeyManager "disk" {

        plugin_data {
            keys_path = "/var/opt/spire/data/server/keys.json"
        }
    }

Add the UpstreamAuthority stanza if it's not present, or configure it as shown below. It configures the X509 certificate and private key used by the server to issue certificates. You need to know what you're doing, but you can read the nifty script at this site, then tweak it as needed and use that to generate a root cert and key pair, and a leaf cert and key from it. Rename the root cert and key to bootstrap.crt and bootstrap.key, copy them over to /var/opt/spire/data/server, and remember to update their permissions so that they are accessible by the spire user and spire group.

    UpstreamAuthority "disk" {
        plugin_data {
            key_file_path = "/var/opt/spire/data/server/bootstrap.key"
            cert_file_path = "/var/opt/spire/data/server/bootstrap.crt"
        }
    }

Finally, for each k8s cluster that this SPIRE server needs to serve, ensure a section is available as shown below. The kube_config_file points to the location of the k8s config used to access the cluster's kube-api. Ensure that this file is copied from the k8s cluster to the location listed here. The cluster identifier in this case is e1, which is arbitrarily chosen - you can give it any name but make sure to use the same name to refer to it elsewhere too. The service_account_allow_list lists the service accounts from the e1 cluster that are allowed to connect to the SPIRE server. The SPIRE server would validate the service account token by using the k8s Token Review API on the cluster e1.

    NodeAttestor "k8s_psat" {
        plugin_data {
            clusters = {
                "e1" = {
                    service_account_allow_list = ["spire:spire-agent"]
                    kube_config_file = "/home/amukher1/.kube/config"
                }
            }
        }
    }

Ensure that the spire user has read access to the path listed for kube_config_file.

Next, create a systemd module for automatically starting and stopping the SPIRE server on this node. Create the file /etc/systemd/system/spire-server.service and set its content to the following.

[Unit]
Description=SPIRE Server

[Service]
User=spire
Group=spire
ExecStart=/usr/bin/spire-server run -config /opt/spire/conf/server/server.conf

[Install]
WantedBy=multi-user.target


Then enable and start the service, and check its status using:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now spire-server
$ sudo systemctl status spire-server

In case you are running this on a host that runs guest VMs in VirtualBox, and the SPIRE server listens on an IP of the host-only network that the VMs too are a part of, then that network would not be up till you bring up your first VM. Till such time, the SPIRE server could fail to start as it is unable to bind to an address off of that network. This is the reason I have not configured the service to restart automatically. Instead, you can just manually start it using the following command, once you've started the first VM:

$ systemctl start spire-server


Deploying the SPIRE Agent

The agent must be deployed as a daemonset on each k8s cluster that you want to manage via this server. We would need certain manifest yamls for deploying the agent. We could get this from the source bundle, downloaded from the page here.

$ wget https://github.com/spiffe/spire/archive/v1.6.1.tar.gz
$ tar -xf v1.6.1.tar.gz spire-1.6.1/test/integration/suites/k8s/conf/agent/spire-agent.yaml --strip-components=7

Then edit the file spire-agent.yaml as below.

In the spire-agent ConfigMap manifest inside this file, edit the contents of agent.conf, setting the following keys:

server_address = "<SPIRE_server_addr>"

This should be the IP or the FQDN where the SPIRE server is accessible - typically the same as the bind_address in the server config, or an FQDN resolving to it.

Also set the trust_domain to the same value as set for the server:

trust_domain = "everett.host"

Make a note of the trust_bundle_path attribute. This is the location within the SPIRE agent pod where the CA cert bundle of the SPIRE server should be mounted. It is okay to keep it set to its default value of /run/spire/bundle/bundle.crt. On first run you may want to comment out this attribute and instead add the following.

insecure_bootstrap = true

Within the NodeAttestor stanza, set the cluster attribute to e1.

      NodeAttestor "k8s_psat" {
        plugin_data {
          cluster = "e1"
        }
      }

In the WorkloadAttestor stanza, set the directive skip_kubelet_verification to false, and set the kubelet_ca_path attribute to the location of the CA cert for this k8s cluster as shown below.

      WorkloadAttestor "k8s" {
        plugin_data {
          ...
          # skip_kubelet_verification = false
          kubelet_ca_path = "/run/spire/bundle/kubelet-ca.crt"
        }

We will ensure that the k8s cluster CA cert is mounted at the location pointed at by kubelet_ca_path, via a ConfigMap.

Further down in the manifest, edit the containers section of the DaemonSet spec, updating the SPIRE agent image location, and image pull policy.

     containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.6.1
          imagePullPolicy: IfNotPresent
 

Copy the CA cert of your k8s cluster from /etc/kubernetes/pki/ca.crt to the local directory, naming the target file kubelet-ca.crt.

If you did not set insecure_bootstrap to true earlier, then retrieve the CA cert bundle for the SPIRE server that was set aside, and copy it to some location from where you can create ConfigMaps on this cluster. Rename the file to bundle.crt.

Then run the following command, including the bundle.crt only if you copied it.

$ kubectl create configmap spire-bundle -n spire --from-file=kubelet-ca.crt --from-file=bundle.crt

Finally, apply this edited manifest on your k8s cluster. 

$ kubectl apply -f spire-agent.yaml

If all goes well, you should have a working SPIRE installation on your k8s cluster. You can verify that the SPIRE agent daemonset has pods running on each worker node of your cluster by running the following command.

$ kubectl get pods -n spire

Make sure that the pods are running and ready. Also run the following commands on the SPIRE server to verify that the SPIRE agents on your nodes have been attested and received SPIFFE ids. You can get agent SPIFFE ids from the output of the first command.

$ spire-server agent list
$ spire-server agent show -spiffeID <spiffe_ID>


Cleaning up

While experimenting with the setup, one would likely need to clean up and recreate the setup several times. When doing that, it's good to follow a certain discipline. With all nodes of the cluster up, run the following commands.

$ kubectl delete -f spire-agent.yaml -n spire
$ kubectl delete configmap spire-bundle -n spire
$ kubectl delete ns spire
$ kubectl get all,configmap,sa -n spire

Other than the default service account and a ConfigMap called kube-root-ca.crt, all other resources should be deleted. Still check for any stray pods:

$ kubectl get pods -n spire


Using SPIRE

In this section, we shall see how to use SPIRE to have identities issued to workloads running within your Kubernetes cluster. 

Architecture

We would deploy a pod on Kubernetes to fetch its identity bundle from the SPIRE agent running locally on that node. It would connect to the agent over a Unix domain socket mounted from a host path. We must also create registration entries for the workloads. Typically we would want to do this by defining criteria for choosing a workload, using selectors. We want to create the rules such that given a pod running a particular image X and with a particular label app=Y, we would always issue it the same identity no matter which worker node it runs on. To do this, the parent spiffe ID of the registration entry cannot be that of a single worker node, but of an alias to all the worker nodes. The following section describes these in detail.

Process

Run the command below on the server to create a node alias SPIFFE id that applies to all worker nodes of the cluster. This will allow us to create registration entries for workloads that would give them the same identity based on their image and a label, irrespective of which cluster node they run on.

/opt/spire/bin/spire-server entry create -node -spiffeID spiffe://everett.host/ns/spire/sa/spire-agent/cluster/e1 -selector k8s_psat:cluster:e1 -selector k8s_psat:agent_ns:spire -selector k8s_psat:agent_sa:spire-agent

Next, create a binary using the following Go code. This binary fetches its identity and bundles from the SPIRE agent running locally on a worker.

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "strings"
    "time"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/svid/jwtsvid"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

const (
    socketPath = "unix:///tmp/spire-agent/api.sock"
)

func main() {
    ctx := context.Background()
    // Fetch the SVIDs in a loop, once every minute.
    for {
        if err := run(ctx); err != nil {
            log.Fatal(err)
        }
        time.Sleep(60 * time.Second)
    }
}

func run(ctx context.Context) error {
    // Set a timeout to prevent the request from hanging if this workload is not properly registered in SPIRE.
    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    client := workloadapi.WithClientOptions(workloadapi.WithAddr(socketPath))

    // Create an X509Source struct to fetch the trust bundle as needed to verify the X509-SVID presented by the server.
    x509Source, err := workloadapi.NewX509Source(ctx, client)
    if err != nil {
        fmt.Printf("unable to create X509Source: %v", err)
        return fmt.Errorf("unable to create X509Source: %w", err)
    }
    defer x509Source.Close()

    fmt.Printf("Received trust budle: %v", x509Source)
    serverID := spiffeid.RequireFromString("spiffe://example.org/server")

    // By default, this example uses the server's SPIFFE ID as the audience.
    // It doesn't have to be a SPIFFE ID as long as it follows the JWT-SVID guidelines (https://github.com/spiffe/spiffe/blob/main/standards/JWT-SVID.md#32-audience)
    audience := serverID.String()
    args := os.Args
    if len(args) >= 2 {
        audience = args[1]
    }

    // Create a JWTSource to fetch JWT-SVIDs
    jwtSource, err := workloadapi.NewJWTSource(ctx, client)
    if err != nil {
        fmt.Printf("unable to create JWTSource: %v\n", err)
        return fmt.Errorf("unable to create JWTSource: %w", err)
    }
    defer jwtSource.Close()

    // Fetch a JWT-SVID and set the `Authorization` header.
    // Alternatively, it is possible to fetch the JWT-SVID using `workloadapi.FetchJWTSVID`.
    svid, err := jwtSource.FetchJWTSVID(ctx, jwtsvid.Params{
        Audience: audience,
    })
    if err != nil {
        fmt.Printf("unable to fetch SVID: %v\n", err)
        return fmt.Errorf("unable to fetch SVID: %w", err)
    }
    fmt.Printf("Received JWT svid: %v", svid)
    return nil
}
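
The example above stops at printing the SVID. If you also want the workload to present the token to a server, a hypothetical continuation of run (replacing the final Printf and return, and re-adding "net/http" to the import list) could look like the sketch below; the server URL is a placeholder and not part of the setup described here.

    // Hypothetical continuation: present the JWT-SVID as a bearer token.
    serverURL := "http://localhost:8080/" // placeholder URL
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, serverURL, nil)
    if err != nil {
        return fmt.Errorf("unable to create request: %w", err)
    }
    req.Header.Set("Authorization", "Bearer "+svid.Marshal())

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return fmt.Errorf("request failed: %w", err)
    }
    defer resp.Body.Close()
    fmt.Printf("server responded with %s\n", resp.Status)
    return nil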

I named this binary id-client, created a Docker image tagged amukher1/id-client, and pushed it to Docker Hub. You can use a name of your choice. Deploy this binary with the following manifest:

apiVersion: v1
kind: Pod
metadata:
  name: id-client
  labels:
    app: id-client
spec:
  containers:
  - name: id-client
    image: amukher1/id-client
    volumeMounts:
    - name: spire-agent-socket
      mountPath: /tmp/spire-agent
      readOnly: false
  volumes:
  - name: spire-agent-socket
    hostPath:
      path: /run/spire/agent-sockets
      type: DirectoryOrCreate

Make a note of the full image reference of the container image, including its digest. I used the following command and copied the first value inside the RepoDigests array:

docker inspect amukher1/id-client

Now create a registration entry for a workload matching this binary, using the node alias as the parent SPIFFE ID.

/opt/spire/bin/spire-server entry create -spiffeID spiffe://everett.host/image/id-client -parentID spiffe://everett.host/ns/spire/sa/spire-agent/cluster/e1 -selector k8s:pod-image:docker.io/<image tag> -selector k8s:pod-label:app:id-client

With the above steps, you should be able to fetch X.509 certificates as well as JWT tokens for your workloads.
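
For completeness, here is a minimal sketch of what a receiving service could do with such a JWT token. This is not part of the steps above; it assumes the service also has access to a SPIRE agent socket at the same path, and the token itself is a placeholder.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/spiffe/go-spiffe/v2/svid/jwtsvid"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx := context.Background()

    // A JWTSource also acts as a bundle source for validating incoming JWT-SVIDs.
    jwtSource, err := workloadapi.NewJWTSource(ctx,
        workloadapi.WithClientOptions(workloadapi.WithAddr("unix:///tmp/spire-agent/api.sock")))
    if err != nil {
        log.Fatalf("unable to create JWTSource: %v", err)
    }
    defer jwtSource.Close()

    token := "..."                            // the JWT-SVID presented by the caller (placeholder)
    audience := "spiffe://example.org/server" // must match the audience the caller requested

    // Verify the token's signature against the trust bundle and check the audience claim.
    svid, err := jwtsvid.ParseAndValidate(token, jwtSource, []string{audience})
    if err != nil {
        log.Fatalf("invalid JWT-SVID: %v", err)
    }
    fmt.Printf("caller identity: %s\n", svid.ID)
}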


Read more!

Tuesday, February 28, 2023

Routing TCP over Envoy using HTTP CONNECT

Tunneling TCP traffic using the L4 proxy capabilities of Envoy works well, but due to the nature of TCP, very little metadata useful for routing can be propagated via the TCP protocol itself. Using the HTTP CONNECT verb, however, it is possible to instruct a proxy to tunnel the subsequent data as raw TCP to some target without interpreting it as HTTP or some other L7 protocol. The way it works is listed below:

  1. A caller A wants to send some TCP traffic to a service B.
  2. The caller A calls some proxy P by sending it an HTTP request whose request line is: CONNECT <address_of_B>:<port> HTTP/1.1 (the request target of a CONNECT is the authority of B, i.e. host:port, not a full URL).
  3. The proxy P opens a TCP connection to that address and port, and keeps the connection from A open.
  4. The caller A then uses its connection to P to stream the TCP payload it needs to send to B. P relays this traffic to B.
In the above, P is said to terminate the HTTP CONNECT. Equally well, P could be configured to propagate the HTTP CONNECT instead of terminating it, forwarding everything it receives to the next proxy in the chain (an upstream, in Envoy terminology). The final proxy in the chain would then terminate the HTTP CONNECT and forward the traffic to the target.
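
To make the handshake concrete, here is a minimal Go sketch of steps 2 through 4. The proxy and target addresses are placeholders, and B is assumed to be an SSH server only so that there is something to read back once the tunnel is up.

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
    "net/http"
)

func main() {
    // Step 2: A opens a TCP connection to the proxy P and asks it to tunnel to B.
    conn, err := net.Dial("tcp", "proxy.example:9000") // placeholder proxy address
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // The request target of the CONNECT is the authority (host:port) of B.
    fmt.Fprint(conn, "CONNECT b.example:22 HTTP/1.1\r\nHost: b.example:22\r\n\r\n")

    // Step 3: P answers with a 2xx response once its TCP connection to B is open.
    br := bufio.NewReader(conn)
    resp, err := http.ReadResponse(br, &http.Request{Method: http.MethodConnect})
    if err != nil {
        log.Fatalf("reading CONNECT response: %v", err)
    }
    if resp.StatusCode != http.StatusOK {
        log.Fatalf("CONNECT refused: %s", resp.Status)
    }

    // Step 4: from here on, the same connection is a raw TCP pipe to B.
    // If B is an SSH server, it speaks first and sends its banner.
    banner, err := br.ReadString('\n')
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("received from B:", banner)
}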

The elegance in this approach is that by encapsulating the request in an HTTP shim, we open up HTTP headers as a mechanism for specifying routing directives that the intermediate proxies can use. If the caller uses TLS, it can set the server name (SNI) in the TLS ClientHello, and the proxies can route on that. The actual target address of B need not even be routable from A - it only needs to be routable from the final proxy in the chain (the one that terminates the HTTP CONNECT). With HTTP/2's extended CONNECT, the request can even carry a URL path; I'm not sure, but perhaps that path too could be used for routing purposes just as with regular requests. With HTTP/2, multiple TCP streams can also be multiplexed over a single connection, achieving better resource usage and better latencies when reusing connections from a pool.

The obvious downside is that the caller A would need to know the mechanics of HTTP CONNECT and have a dependency on it. But this is a small price to pay for not having to deal with TCP routing.

Envoy has supported HTTP CONNECT for a few years now (possibly since 1.14.x). Here is a small sample configuration which uses two Envoys, one propagating the HTTP CONNECT and the other terminating it, to route TCP traffic to a destination.

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route
            virtual_hosts:
            - name: connect_tcp
              domains: ["fubar.xyz:1234"]
              routes:
              - match: { headers: [{name: ":authority", suffix_match: ":1234"}], connect_matcher: {} }
                route: { cluster: ssh, upgrade_configs: [{upgrade_type: "CONNECT", connect_config: {}}] }
              - match: { prefix: "/" }
                route: { cluster: ssh }
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: ssh }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          upgrade_configs:
          - upgrade_type: CONNECT
          http2_protocol_options:
            allow_connect: true
  - name: listener_1
    address:
      socket_address: { address: 127.0.0.1, port_value: 9000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_route1
            virtual_hosts:
            - name: connect_fwd
              domains: ["fubar.xyz", "fubar.xyz:*"]
              routes:
              - match: { connect_matcher: {} }
                #### route: { cluster: conti, upgrade_configs: [{upgrade_type: "websocket"}]}
                route: { cluster: conti, timeout: "0s"}
              - match: { prefix: "/" }
                route: { cluster: conti1 }
            - name: local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: conti1 }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          upgrade_configs:
          - upgrade_type: CONNECT
          http2_protocol_options:
            allow_connect: true
  clusters:
  - name: ssh
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: ssh
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 22
  - name: conti
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: conti
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 10000
  - name: conti1
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: conti1
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8000
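
As a quick smoke test (assuming an SSH server is listening on 127.0.0.1:22, as the ssh cluster suggests), a client like the CONNECT sketch in the earlier section can be pointed at 127.0.0.1:9000 with a request line of CONNECT fubar.xyz:1234 HTTP/1.1. The first Envoy forwards the CONNECT to the second, which terminates it and tunnels the bytes to port 22, so the SSH banner should come back over the tunnel.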
(More explanation to follow.)

Read more!

Tuesday, February 14, 2023

Charting Your Course Through the Knowledge Economy

This is the age of the knowledge economy and the age of the autodidact. Formal education matters less than skills and the ability to think deeply, and to apply one's understanding of an area. Very little of the learning we need to do from here on would be formulaic. Multiple levels of formulaic thinking, with sophisticated patterns, would be delegated to machines, and human intellectual power would repeatedly be summoned to solve hard problems. But that journey of a thousand miles would still need to start with a single step. This article is about building the intellectual discipline to absorb and produce knowledge, and hopefully, some wisdom too.

Learning discipline for the attention challenged

How do you study deeply when you have attention span issues (like most of the early 21st-century workforce) and need to understand and remember quickly? I don't have an answer; I am seeking answers myself. Two things I can think of, though, are:

  1. Drawing diagrams with labels, *on paper*, illustrating the main ideas, if that's feasible.
  2. Avoiding phone, social media, and similar modern day "necessities" at all costs at study time.
There is a third thing, popularized by the course / book on learning how to learn, though it is possibly well-known anyway:
  3. Recap what you read later (maybe the next morning, or the same night) using the drawing, and perhaps write notes. And then, of course, revisit these notes in about a week.

Building a reading list

If you're like me, you'd buy books, but read only a small subset of them. It's not inherently problematic and there is some evidence to say that having books you haven't yet gotten to reading might be a good thing - but it can only be so as long as you read with some regularity and burn down a reading list. Which means sporadic reading is not such a great idea.

An even harder problem presents itself with e-books. We download a great e-book that we always wanted to read and then forget about it. The greatest challenge with e-books is that they are not in front of your eyes, on your bookshelf or table, constantly reminding you of their existence. How does one remember them, and then make a mental note of a plan to read them? Again I don't know the answer, but here is something I can think of.
  1. Have a reading list, or rather a few reading lists.
    1. For example have a fiction reading list, a self-help / mgmt reading list (if that's your thing), and then stuff that is specific to your domain. If you're into software engineering for example, you would likely have lots to catch up on in various areas: distributed systems, security, concurrency, networking, data structures, operating systems. It's fine to have a list in each area - but make it a really short list.
    2. Curate the list. Especially for technical subjects, find those books which would help you learn faster and get to the next level. Be prepared to churn these lists, replacing some choices with others as you figure what works and what does not.
  2. Devote some time each day, even if that's 30 minutes, to reading. Identify ahead of time what you would be reading. Make a couple of hours maybe during weekends and holidays, when you can.
  3. Keep the books from your reading list close at hand.
  4. By all means, catalog your e-books (and physical books too, but especially e-books) in an online catalog such as Goodreads or LibraryThing or some such. And then gawk at your own collection fishing for the next book ideas at least once every week.
    1. This requires discipline - every time you download a book, you have to put an entry in your online catalog. But then, as Euclid reportedly said, there's no royal road to geometry - nor to deep reading, if I may add.
There is no reading without writing. Taking notes and thinking about what you read make all the difference. Finding special interest groups / meetups in your locality that discuss what they are reading is a fantastic way to exercise your reading muscle.

Creating a body of work

This is perhaps the most important topic, and the one that requires maximum discipline. A body of work typically means a set of documents of some manner. This could be a set of academic papers, books or monographs, useful blog articles, instructional videos, significant long-term contributions to one or more open source (or closed source) software projects, etc., or some combination thereof. It could even be photographs, paintings, or performances as an actor, a musician, or in some other performing art. However, here I would mostly focus on the former kind because this article lacks the space or the scope to talk about artistic creativity.

Your body of work is really a document of what you've built with your intellect. Instead of getting too prescriptive, I would like to focus on four aspects that are essential.
  1. Have a vision of what purpose your body of work serves.
    1. Maybe it teaches people difficult concepts in a simple way.
    2. Maybe it reveals new insights about some subject.
    3. Maybe it's an aid for other teachers, or researchers.
  2. Identify the form-factor of your body of work.
    1. Would it be a series of blog articles?
    2. Would it be a book, or a book series?
    3. Would it be code in a set of repositories?
  3. Identify how you would build that body of work.
    1. Identify key concepts and notions, insights or even discoveries you've made, and start documenting them on an ad hoc basis.
    2. Periodically collate your notes and documents, and start building rough drafts of your book / blog / video, whatever.
    3. Publish, share with your consumers, and seek early feedback. Even when you want to directly earn money for your publications, this is still viable as lots of highly qualified people would happily read and review your work for free or for a small fee.
  4. Make sure you are making progress on building this, every week, month, and year.
    1. Progress can be slow, especially in the beginning. Speed (like in running, playing an instrument, or making money) is usually something you leave for later.
This is the undertaking that is the hardest to start on, but has the maximum bang for the buck. Having a vision, and having the discipline to pursue it are the key to achieving these goals.

Postscript

There are many other aspects that I haven't spoken of here. Building focus through mindfulness techniques, building a general outlook on life that is conducive to focus and attention (stoicism, anyone?), and the ability to think critically as well as pragmatically, are all vital ingredients of a solid knowledge career. The above article is, in some sense, more about logistics than principles. But hopefully the techniques give you a useful blueprint to follow.


Read more!