Sidecar container lifecycle changes in Kubernetes 1.18
Tuesday, February 4th, 2020
Update: Looks like the solution described below isn't going to land in Kubernetes after all. The developers went back to the drawing board and they will try to come up with a solution that's best for everyone.
The sidecar concept in Kubernetes is getting more and more popular, and for good reason. In the container world, it's a common principle that a container should address a single concern only, but do it well. The sidecar pattern helps achieve this by decoupling the main business logic from supplementary tasks that extend the original functionality.
In Kubernetes, a pod is a group of one or more containers with shared storage and network. A sidecar is a utility container in a pod that's loosely coupled to the main application container.
Perhaps the best-known use case of sidecars is proxies in a service mesh architecture, but there are other examples, including log shippers, monitoring agents or data loaders.
Sidecars have been used for a long time in Kubernetes, but the pattern was not supported as a built-in feature. It was only a nominal distinction, and sidecar containers were basically regular containers in a pod. But as more applications embraced the pattern, more issues turned up. Soon it became clear that Kubernetes should formally identify sidecars and handle the lifecycle of such containers differently.
At Banzai Cloud we've been working with sidecar containers in both our container management platform for hybrid clouds, Pipeline, and our service mesh product, Backyards (now Cisco Service Mesh Manager), and we have seen and implemented all kinds of workarounds to manage the lifecycle dependencies of containers within a pod.
From Kubernetes 1.18, containers can be marked as sidecars, so that they start up before normal containers and shut down after all other containers have terminated. They still behave as normal containers; the only difference is in their lifecycle.
All problems with sidecar containers are related to container lifecycle dependencies. As Kubernetes didn't distinguish between the containers in a pod, there was no way to control which container starts first or terminates last. But a properly running sidecar container is often a requirement for the main application container to behave correctly.
Let's take a look at an Istio service mesh example. An Envoy sidecar proxies all incoming and outgoing traffic to the application container, so until it's up and running, the application may fail to send or receive traffic. Readiness probes don't help if the application is trying to talk outbound. It's not always a breaking issue, because the container will probably be able to recover, but you'll likely see error messages in the logs, or a CrashLoopBackOff when the application container fails to start. In some cases, when the application is not resilient to restarts, or some startup scripts are not idempotent, it may not start at all.
This issue is quite easy to solve with an ugly workaround: adding a delay of a few seconds to the application container's startup script. This is the solution recommended by Istio as well, but as said in this issue, it is more of a hack, and it is quite painful to change every developer container in the mesh.
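The fixed delay can be made a little less arbitrary by polling until the sidecar reports ready. The following is only a minimal Python sketch of that idea, not Istio's recommended script: `wait_until_ready` and its parameters are hypothetical names, and the readiness check is passed in as a callable so that no particular proxy endpoint is assumed.

```python
import time


def wait_until_ready(is_ready, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the sidecar became ready, False on timeout.
    All names here are hypothetical and exist only for this sketch.
    """
    deadline = clock() + timeout
    while True:
        try:
            if is_ready():
                return True
        except OSError:
            # Connection refused and similar errors mean "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

An application entrypoint could call this with a check that probes the proxy, and only then start the real process. It is still a workaround, because every application image in the mesh has to carry it.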
With the Kubernetes Sidecar feature, the pod startup lifecycle will change: sidecar containers will start after the init containers have finished, and normal containers will only be started once the sidecars become ready. This ensures that sidecars are up and running before the main processes start.
If a Kubernetes Job has a sidecar container, it will carry on running even after the primary container finishes, and the job itself will never reach completed status. This problem is a bit harder to work around than the previous one, because the only way to overcome it is to somehow send a signal to the sidecar container to exit when the main process finishes.
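The startup order described above can be written down as a tiny model. This Python sketch is illustrative only and is not how the kubelet is implemented; the function name, the `"init"`, `"sidecar"` and `"normal"` labels, and the `setup` container are all invented for the example.

```python
def startup_phases(containers):
    """Group container names into ordered startup phases.

    `containers` is a list of (name, kind) pairs, where kind is
    "init", "sidecar" or "normal" (labels invented for this sketch).
    Each phase only begins once the previous one has finished
    (init containers) or become ready (sidecars).
    """
    order = ("init", "sidecar", "normal")
    phases = []
    for kind in order:
        names = [name for name, k in containers if k == kind]
        if names:
            phases.append((kind, names))
    return phases


pod = [("bookings", "normal"), ("istio-proxy", "sidecar"), ("setup", "init")]
print(startup_phases(pod))
```

Whatever order the containers are listed in, the proxy comes before the application container.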
This workaround comes with a few issues: it means extending all jobs with custom logic and somehow synchronising between containers, either through a shared scratch volume or through some ad-hoc solution like Envoy's /quitquitquit endpoint. It's an even more complex issue with third-party containers. In that case you'll probably need some kind of wrapper script that can be passed to the container as an entrypoint, but it feels like a hack, and it isn't always feasible with minimal containers without a shell.
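To make the shared scratch volume workaround concrete, here is a rough Python sketch of such a wrapper. It assumes that both containers mount the same volume and that the main container creates a sentinel file when it finishes. The function names and the sentinel path are hypothetical.

```python
import os
import signal
import subprocess
import time


def mark_done(sentinel_path):
    """Called by the main container when its work has finished."""
    open(sentinel_path, "w").close()


def run_sidecar_until_done(sidecar_cmd, sentinel_path, poll_interval=1.0):
    """Run the sidecar process and stop it once the sentinel file appears.

    Returns the sidecar's exit code (negative if it was killed by a signal).
    """
    proc = subprocess.Popen(sidecar_cmd)
    try:
        while proc.poll() is None:
            if os.path.exists(sentinel_path):
                # The main container is done: ask the sidecar to exit.
                proc.send_signal(signal.SIGTERM)
                proc.wait()
                break
            time.sleep(poll_interval)
    finally:
        # Never leave the sidecar behind if the wrapper itself is interrupted.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
    return proc.returncode
```

Even this small wrapper needs an interpreter in the sidecar image and a change to every job, which is exactly the overhead described above.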
From Kubernetes 1.18, if all normal containers have reached a terminal state (for example, having exited under restartPolicy=Never), then all sidecar containers will be sent a SIGTERM.
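The "all normal containers have reached a terminal state" rule can be sketched as a small predicate. This is an illustration rather than the actual kubelet logic. It assumes the usual restart policy semantics: with Never any exit is final, with OnFailure only a zero exit code is final, and with Always an exited container is restarted. The function names are hypothetical.

```python
def is_terminal(exit_code, restart_policy):
    """Return True if a container with this exit code will not run again.

    exit_code is None while the container is still running.
    """
    if exit_code is None:
        return False
    if restart_policy == "Never":
        return True
    if restart_policy == "OnFailure":
        return exit_code == 0
    return False  # "Always": exited containers are restarted


def sidecars_should_stop(normal_exit_codes, restart_policy):
    """True once every normal container has reached a terminal state."""
    return bool(normal_exit_codes) and all(
        is_terminal(code, restart_policy) for code in normal_exit_codes
    )
```

For a Job pod with restartPolicy=Never, `sidecars_should_stop([0], "Never")` is true as soon as the primary container exits, which is the point at which the sidecars would receive their signal.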
Pod shutdowns have a similar problem to pod startups. If the sidecar terminates before the primary process, it can cause a large number of errors during the graceful teardown of the main application. During a graceful shutdown, applications can execute some kind of cleanup logic, like closing long-lived connections, rolling back transactions, or saving state to an external store like S3. If the sidecar is killed first, it can prevent the cleanup logic from running properly.
A good example of that is an issue reported in the Argo project. Argo attempts to store container logs in S3, but fails to do so if the istio-proxy is killed first, since all traffic should flow through the proxy.
The solution for this kind of problem is similar to the startup issue. The pod termination lifecycle will be changed to send a SIGTERM to all normal containers first and, once all of them have exited, send a SIGTERM to all sidecar containers. If normal containers don't exit before the TerminationGracePeriod expires, they are sent a SIGKILL as before, but SIGTERM will be sent to sidecars only after that. PreStop hooks are also sent to sidecars.
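The termination order can be summarised as a sequence of signals. The Python sketch below is a simplified model under stated assumptions, not the kubelet's implementation: it ignores PreStop hooks and timing, and only records which container receives which signal, and in what order. The function and parameter names are invented for the example.

```python
def shutdown_events(normal, sidecars, exits_in_time):
    """Return ordered (signal, container) events for a pod shutdown.

    normal, sidecars: lists of container names.
    exits_in_time: names of normal containers that exit on SIGTERM
    before the termination grace period expires.
    """
    # Step 1: normal containers are asked to stop first.
    events = [("SIGTERM", name) for name in normal]
    # Step 2: normal containers that outlive the grace period are killed.
    events += [("SIGKILL", name) for name in normal if name not in exits_in_time]
    # Step 3: only once every normal container is gone do sidecars get SIGTERM.
    events += [("SIGTERM", name) for name in sidecars]
    return events


print(shutdown_events(["bookings"], ["istio-proxy"], {"bookings"}))
```

In both the graceful and the forced case the proxy is signalled last, so the application's cleanup traffic can still flow through it.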
Labelling a container as a sidecar will be as easy as setting the container.lifecycle.type entry in the PodTemplate spec. The type can be set to Sidecar; if it's not set, the container is of course treated as a normal container by default.
apiVersion: v1
kind: Pod
metadata:
  name: bookings-v1-b54bc7c9c-v42f6
  labels:
    app: demoapp
spec:
  containers:
  - name: bookings
    image: banzaicloud/allspark:0.1.1
    ...
  - name: istio-proxy
    image: docker.io/istio/proxyv2:1.4.3
    lifecycle:
      type: Sidecar
    ...
In Kubernetes 1.18, this feature will be behind a feature gate, as usual with new features that come with API changes, so you'll need to enable it on the API server explicitly.
Once it lands in Kubernetes 1.18, this feature will help overcome many existing issues with sidecars. If you want to track the progress on the original issue, go to GitHub, or if you're keen to learn the details you can read the full feature proposal here.
If you'd like to give it a try and need a Kubernetes cluster (once 1.18 is released), try PKE, our lightweight, super-easy-to-install, CNCF-certified Kubernetes distribution.