At Banzai Cloud we deploy and run containerized applications on Pipeline, our PaaS. Those of you who (like us) run Java applications inside Docker have probably already come across the problem of the JVM inaccurately detecting the available memory when running inside a container. Instead of the memory available to the Docker container, the JVM sees the available memory of the host machine. This can lead to applications being killed whenever they try to use an amount of memory that exceeds the limit of the Docker container.
The incorrect detection of available memory by JVMs is associated with Linux tools/libs that were created to return system resource information (e.g. /proc/meminfo, /proc/vmstat) before cgroups even existed. These return the resource information of the host (whether that host is a physical or virtual machine).
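To see this mismatch directly, the short sketch below (a hypothetical helper, not part of the original post's application) prints what the JVM believes its limits are next to the limit recorded by cgroups; it assumes the cgroup v1 path /sys/fs/cgroup/memory/memory.limit_in_bytes, which is where Docker exposed the memory limit at the time.
ContainerMemoryCheck.java
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerMemoryCheck {
    private static final long ONE_MB = 1024 * 1024;

    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        // What the JVM sees -- on a non-container-aware JVM these reflect the host
        System.out.println("JVM max heap:        " + rt.maxMemory() / ONE_MB + "MB");
        System.out.println("JVM processors:      " + rt.availableProcessors());

        // What cgroups actually allow the container to use (cgroup v1 path)
        String limit = new String(Files.readAllBytes(
                Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes"))).trim();
        System.out.println("cgroup memory limit: " + Long.parseLong(limit) / ONE_MB + "MB");
    }
}
Run inside a memory-limited container, the first line typically shows a heap sized from the host's RAM, while the last line shows the much smaller cgroup limit.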
Let's explore this process in action by observing how a simple Java application allocates a percentage of memory while running inside a Docker container. We're going to deploy the application as a Kubernetes pod (using Minikube) to illustrate how the issue is also present on Kubernetes, which is unsurprising, since Kubernetes uses Docker as a container engine.
package com.banzaicloud;

import java.util.Vector;

public class MemoryConsumer {
    private static float CAP = 0.8f; // 80%
    private static int ONE_MB = 1024 * 1024;
    private static Vector cache = new Vector();

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxMemBytes = rt.maxMemory();
        long usedMemBytes = rt.totalMemory() - rt.freeMemory();
        long freeMemBytes = rt.maxMemory() - usedMemBytes;
        int allocBytes = Math.round(freeMemBytes * CAP);

        System.out.println("Initial free memory: " + freeMemBytes / ONE_MB + "MB");
        System.out.println("Max memory: " + maxMemBytes / ONE_MB + "MB");
        System.out.println("Reserve: " + allocBytes / ONE_MB + "MB");

        for (int i = 0; i < allocBytes / ONE_MB; i++) {
            cache.add(new byte[ONE_MB]);
        }

        usedMemBytes = rt.totalMemory() - rt.freeMemory();
        freeMemBytes = rt.maxMemory() - usedMemBytes;
        System.out.println("Free memory: " + freeMemBytes / ONE_MB + "MB");
    }
}
We use a Dockerfile to create the image that contains the jar built from the Java code above. We need this Docker image in order to deploy the application as a Kubernetes pod.
Dockerfile
FROM openjdk:8-alpine
ADD memory_consumer.jar /opt/local/jars/memory_consumer.jar
CMD java $JVM_OPTS -cp /opt/local/jars/memory_consumer.jar com.banzaicloud.MemoryConsumer
docker build -t memory_consumer .
Now that we have the Docker image, we need to create a pod definition to deploy the application to Kubernetes:
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
  restartPolicy: Never
This pod definition ensures that the container is scheduled onto a node that has at least 64MiB of allocatable memory and that it will not be allowed to use more than 256MiB of memory.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
Output of the pod:
$ kubectl logs memory-consumer
Initial free memory: 877MB
Max memory: 878MB
Reserve: 702MB
Killed
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 OOMKilled 0 1m
The Java application running inside the container detected 877MB of free memory and consequently attempted to reserve 702MB of it. Since we had limited the maximum memory usage to 256MB, the container was killed.
To avoid this outcome, we need to tell the JVM the correct maximum amount of memory it can reserve. We do that via the -Xmx option, so we modify our pod definition to pass an -Xmx setting to the Java application in the container through the JVM_OPTS env variable.
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
    env:
    - name: JVM_OPTS
      value: "-Xms64M -Xmx256M"
  restartPolicy: Never
$ kubectl delete pod memory-consumer
pod "memory-consumer" deleted
$ kubectl get po --show-all
No resources found.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
$ kubectl logs memory-consumer
Initial free memory: 227MB
Max memory: 228MB
Reserve: 181MB
Free memory: 50MB
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 Completed 0 1m
This time the application ran successfully; the JVM used the maximum heap size we passed via -Xmx256M, so the application did not hit the memory: "256Mi" limit specified in the pod definition.
While this solution works, it requires that the memory limit be specified in two places: once as a limit for the container (memory: "256Mi"), and once again in the option passed to the JVM (-Xmx256M). It would be much more convenient if the JVM could accurately detect the maximum amount of available memory based on the memory: "256Mi" setting alone, wouldn't it?
Well, there's a change in Java 9 that makes the JVM Docker (cgroup) aware, and it has been backported to Java 8 (available since 8u131 behind experimental flags).
In order to make use of this feature, our pod definition has to look like this:
memory-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-consumer
spec:
  containers:
  - name: memory-consumer-container
    image: memory_consumer
    imagePullPolicy: Never
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "256Mi"
    env:
    - name: JVM_OPTS
      value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 -Xms64M"
  restartPolicy: Never
$ kubectl delete pod memory-consumer
pod "memory-consumer" deleted
$ kubectl get pod --show-all
No resources found.
$ kubectl create -f memory-consumer.yaml
pod "memory-consumer" created
$ kubectl logs memory-consumer
Initial free memory: 227MB
Max memory: 228MB
Reserve: 181MB
Free memory: 54MB
$ kubectl get po --show-all
NAME READY STATUS RESTARTS AGE
memory-consumer 0/1 Completed 0 50s
Please note the -XX:MaxRAMFraction=1 option, through which we tell the JVM what fraction of the available memory it may use as the max heap size (with a value of 1, the whole memory limit). Having a max heap size that takes the available memory limit into account, whether set explicitly through -Xmx or derived dynamically with UseCGroupMemoryLimitForHeap, is important: it lets the JVM know when memory usage is approaching the limit so that it can free up space through garbage collection. If the max heap size is wrong (exceeds the available memory limit), the JVM may blindly run into the limit without trying to free up memory, and the process will be OOMKilled.
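As a rough back-of-the-envelope check (an illustrative sketch, not part of the original post), the max heap the JVM picks with UseCGroupMemoryLimitForHeap is approximately the cgroup memory limit divided by MaxRAMFraction; the hypothetical MaxRamFractionMath class below simply works through that arithmetic for the 256MiB limit used above.
MaxRamFractionMath.java
public class MaxRamFractionMath {
    public static void main(String[] args) {
        // The "memory: 256Mi" limit from the pod definition above
        long containerLimitMb = 256;
        // With -XX:+UseCGroupMemoryLimitForHeap the max heap is roughly
        // the cgroup limit divided by MaxRAMFraction (the JDK 8 default is 4)
        for (int maxRamFraction : new int[] {1, 2, 4}) {
            System.out.println("-XX:MaxRAMFraction=" + maxRamFraction
                    + " -> max heap ~" + (containerLimitMb / maxRamFraction) + "MB");
        }
    }
}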
A java.lang.OutOfMemoryError is a different failure mode: it indicates that the max heap size is not big enough to hold all live objects in memory. If that's the case, the max heap size needs to be increased via -Xmx or, if UseCGroupMemoryLimitForHeap is being used, via the memory limit of the container.
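To see that failure mode in isolation, a throwaway class like the hypothetical HeapExhaustion below (an illustrative sketch, not part of the original application) can be run with a deliberately small heap, for example java -Xmx64M HeapExhaustion, inside a container whose memory limit is higher: the JVM then fails with java.lang.OutOfMemoryError: Java heap space instead of the container being OOMKilled.
HeapExhaustion.java
import java.util.ArrayList;
import java.util.List;

public class HeapExhaustion {
    public static void main(String[] args) {
        List<byte[]> chunks = new ArrayList<>();
        try {
            while (true) {
                chunks.add(new byte[1024 * 1024]); // keep allocating 1MB blocks
            }
        } catch (OutOfMemoryError e) {
            int allocatedMb = chunks.size();
            chunks.clear(); // drop the references so printing can still allocate
            System.out.println("Hit the heap limit after ~" + allocatedMb + "MB: " + e);
        }
    }
}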
The use of cgroup limits by the JVM is extremely practical when running JVM-based workloads on k8s. We touched on this subject in our Apache Zeppelin notebook post, which further highlights the benefits of a correctly sized JVM configuration.