Thursday, April 06, 2023

Distributed Tracing: A basic setup on Kubernetes

If you're trying to quickly get up to speed with distributed tracing and want to try it out in a Kubernetes environment, this post will help you set up the architectural pieces and see tracing in action.

Architecture

We'll run a Jaeger collector back end that collects traces from everywhere. It could run outside Kubernetes too, as long as its ports are reachable from within the Kubernetes pods. Workloads generating traces are simulated using pods running otel-cli. Each Kubernetes node also runs an OTel agent. The pods send their traces to the agent on the local node, which in turn forwards them to the Jaeger collector.

Deployment

Deploy a recent version (>=1.35) of the Jaeger all-in-one collector for the back end, on a machine that is reachable from all the Kubernetes clusters that will produce traces. The version is important because we want Jaeger to accept the OTLP payloads that OTel libraries and agents emit, and native OTLP support arrived in 1.35. By default it uses in-memory storage for traces, so they are not persisted across restarts.

docker run --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.35

On the Kubernetes clusters where you want to run applications that generate traces, deploy the OTel agent as a DaemonSet using the following manifest.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp:
        endpoint: "192.168.219.1:4317"
        tls:
          insecure: true
        sending_queue:
          num_consumers: 4
          queue_size: 100
        retry_on_failure:
          enabled: true
    processors:
      batch:
      memory_limiter:
        # 80% of maximum memory up to 2G
        limit_mib: 400
        # 25% of limit up to 2G
        spike_limit_mib: 100
        check_interval: 5s
    extensions:
      zpages: {}
      memory_ballast:
        # Memory Ballast size should be max 1/3 to 1/2 of memory.
        size_mib: 165
    service:
      extensions: [zpages, memory_ballast]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-agent-config.yaml"
        image: otel/opentelemetry-collector:0.75.0
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 4317 # Default OpenTelemetry receiver port.
          hostPort: 4317
        - containerPort: 8888  # Metrics.
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: /conf
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol

In the manifest above, 192.168.219.1 is the address where the Jaeger all-in-one collector is running in my setup; substitute your own. Note the hostPort on 4317, which is what lets pods reach the agent via the node's IP.

Finally, deploy your application that produces traces using the OTel libraries, and configure it to send them to the local node IP on port 4317. This delivers the traces to the node's OTel agent. If you don't have an instrumented application handy, the next section shows how to test trace generation using a CLI tool.
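As a sketch of what the application side looks like with the OpenTelemetry Go SDK, the following sets up an OTLP/gRPC exporter and emits one span. The service name, span names, and the NODE_IP placeholder are illustrative; in a real pod you'd inject the node IP via the downward API. This won't do anything useful without a collector listening, so treat it as wiring, not a finished program:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
)

func main() {
	ctx := context.Background()

	// Exporter pointing at the node-local OTel agent (placeholder address).
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("NODE_IP:4317"),
		otlptracegrpc.WithInsecure(), // agent config above uses insecure TLS
	)
	if err != nil {
		log.Fatalf("creating exporter: %v", err)
	}

	// Batch spans and tag them with a service name so they are
	// searchable in the Jaeger UI.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceName("my-service"),
		)),
	)
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Emit a single span around some simulated work.
	_, span := otel.Tracer("example").Start(ctx, "do-work")
	time.Sleep(100 * time.Millisecond)
	span.End()
}
```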

Trying it out

Install otel-cli in your Go build environment:

go install github.com/equinix-labs/otel-cli@latest

This puts the otel-cli binary under your $GOPATH/bin. Package this binary into a container image, and create a pod from that image that periodically runs the following commands, where <IP> is the address of the node the pod is running on (injectable via the Kubernetes downward API):

$ export OTEL_EXPORTER_OTLP_ENDPOINT=<IP>:4317
$ otel-cli exec --service my-service --name "curl google" curl https://google.com

If you point your browser at <IP>:16686, where <IP> is the address of the machine running the Jaeger all-in-one collector, you should see the Jaeger UI and be able to look up the traces generated by your service.
