Published on 04/22/2018
Last updated on 03/21/2024

Advanced logging on Kubernetes


Update: Logging operator v3 (released March, 2020)

We're constantly improving the logging-operator based on feature requests from our ops team and our customers. The main features of version 3.0 are:
  • Log routing based on namespaces
  • Excluding logs
  • Select (or exclude) logs based on hosts and container names
  • Logging operator documentation is now available on the Banzai Cloud site.
Check The Kubernetes logging operator reloaded post for details.
In this blog we'll continue our series about Kubernetes logging, covering some advanced techniques and visualizations for the collected logs. Just to recap: with our open source PaaS, Pipeline, we monitor and collect/move a large number of logs for the distributed applications we push to Kubernetes. We put a lot of effort into monitoring large and federated clusters, and into automating these tasks with Pipeline, so that our users receive out-of-the-box monitoring and log collection for free. This blog builds on previous posts, so you may want to read those as well.
Logging series:
  • Centralized logging under Kubernetes
  • Secure logging on Kubernetes with Fluentd and Fluent Bit
  • Advanced logging on Kubernetes

Collecting logs

Logs are usually collected for debugging and auditing purposes. However, since the emergence of microservices and containerization, it's become increasingly time-consuming to manually check and analyze logs. This is one of the main reasons why structured logging is becoming an industry standard. Nearly every language has its own logging library to help developers create clean, well-structured logs. Under ideal circumstances, you design logging in the project's design phase, the same way you architect its other components. Because we'll be focusing on Kubernetes and Fluentd, there are several logging libraries that can help us.
Logging libraries with Fluentd support:
  • Go - Fluent (official)
  • Go - Logrus
  • Python - Fluent (official)
  • Java - Fluent (official)
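When an application already emits structured JSON to stdout, the collector side needs almost no work. As a minimal sketch (the myapp.** tag pattern and the log key name are assumptions for illustration, not part of the original setup), a single parser filter turns each line into real fields:

<filter myapp.**>
  @type parser
  # the records are already structured, so the built-in JSON parser is enough
  format json
  # parse the container's log field, which holds the raw line
  key_name log
</filter>

Everything downstream (routing, storage, dashboards) can then work with fields instead of raw strings.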
Needless to say, we often don't have the luxury of rewriting a legacy application to produce structured logs - this is when a little tooling comes in handy.

Parsing logs

Sometimes we can't change the logging framework of our application, and we have to work with plain text messages on standard output. In that case we need to parse the logs ourselves. Parsers can be custom plugins, like the MySQL slow-query log parser, or a simple regexp, like the Apache access log parser. Note that Fluentd and Fluent Bit can parse raw inputs as well. To get the hang of it, you can start with some of the official options. Example Apache parse config:
expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
The problem now is how to identify the different kinds of log messages; applying a regexp to every message would cause enormous overhead. Instead, we'll modify the metadata to route different types of messages to different filters.
In the Fluent ecosystem, log messages are routed by their Tag label. Tags are dot separated hierarchical metadata labels.
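To make the hierarchy concrete, here is a small illustrative sketch (the tag names are invented): wildcard match directives select whole subtrees of tags, so routing decisions never need to inspect the message body.

# send everything tagged apache.<anything> to stdout for inspection
<match apache.**>
  @type stdout
</match>

# silently drop everything else
<match **>
  @type null
</match>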

Filtering messages based on app labels

Kubernetes helps us filter messages based on app labels; because we're running Pods on our cluster, we can attach metadata to containers. One of the most common labels is app or application. Since fluent-bit enriches log messages with Kubernetes metadata, we can use that to identify our logs. Changing the routing tags of Fluentd messages can be handled with a great plugin called rewrite_tag_filter. Make sure you name your tags carefully to build efficient log flows. Naming depends on your specific needs, but an acceptable generic template is $app_name.$cluster_name.$other_information.
Global variables are a useful Helm feature; you can read about them here.
Here's an example of a relabel config that uses the app label and cluster global variable:
<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    key $['kubernetes']['labels']['app']
    pattern ^(.+)$
    tag $1.{{ .Values.global.cluster }}
  </rule>
</match>
Example raw tag

var.log.containers.monitor-apache-5b5b9dd84c-g2gx8_default_prometheus-apache-3d1c87b16947cdd686f5964ca668041eefe4f8c0b88bc21a77a52a6b7abd1b0f.log

Example rewritten tag

apache.test.kubernetes
Good, now we have a nice, clean tag, and we can apply our parsing configuration.
<filter apache.*>
  @type parser
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
  key_name log
</filter>
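With this filter in place, a raw access-log line is replaced by a structured record along these lines (the field values below are invented for illustration), and the time field is promoted to the event timestamp using the time_format above:

{"host":"10.0.0.1","user":"-","method":"GET","path":"/index.html","code":"200","size":"2326"}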

Storing logs

After we've parsed the logs, we can ingest them into an analytics engine like Elasticsearch. There are several benefits to storing logs this way: you can create dashboards (with Kibana) for huge amounts of data, and with simple queries you can check the logs from any pod, or even from multiple clusters. If you don't want to analyze your logs, you can simply use a file or S3 output.
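For comparison, a minimal file output sketch (the tag pattern and path are assumptions for illustration) looks like this:

<match apache.**>
  @type file
  # write gzip-compressed chunks under this path
  path /var/log/fluent/apache
  compress gzip
</match>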
Elasticsearch is a distributed, RESTful search and analytics engine.
We used a custom Helm chart from Lalamove when we installed Elasticsearch.
[~]$ git clone https://github.com/lalamove/helm-elasticsearch
[~]$ helm install helm-elasticsearch
Note that to use Elasticsearch for real-life workloads, the default resource limits must be significantly increased: at least a 1 GB heap size with a 2 GB memory limit.
In the default configuration there are three master, two data, and two client pods. You can fine-tune these settings as needed. For testing purposes, one master, one client, and two data nodes are sufficient.
[~]$ kubectl get po
NAME                                  READY   STATUS    RESTARTS   AGE
elasticsearch-client-654d4894-6nlvq   1/1     Running   0          2m
elasticsearch-client-654d4894-wwk46   1/1     Running   0          2m
elasticsearch-data-0                  1/1     Running   0          4m
elasticsearch-data-1                  1/1     Running   0          4m
elasticsearch-master-0                1/1     Running   0          6m
elasticsearch-master-1                1/1     Running   0          7m
elasticsearch-master-2                1/1     Running   0          6m
After a successful deployment you can check the cluster status via the REST API.
[~]$ curl http://elasticsearch:9200/_cluster/health

{
  "cluster_name": "quarrelsome-elk",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 7,
  "number_of_data_nodes": 2,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}
If we've configured our Fluentd output to Elasticsearch, we can check the incoming indices as well.
[~]$ curl http://elasticsearch:9200/_cat/indices?v

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2018.04.21 QgGbsv3AQMSwVb2hsOzW-Q   5   1       3873            0      4.1mb          2.1mb
green  open   logstash-2018.04.22 CLc8clbnQcW5hBDxrsXNNA   5   1     129811         2827    102.3mb         56.2mb
green  open   .kibana             Y3y0eh7qQ5iBQ92oRk03rw   1   1          2            0     12.4kb          6.2kb
Aggregating different log types into the same index can cause problems, so we recommend naming indices after the log type.
Here is an example Fluentd configuration with an Elasticsearch output that uses the log type (the tag) as the index prefix.
<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix ${tag}
  <buffer tag>
    @type file
    path /var/log/fluentd-buffers/elasticsearch.all.buffer
    flush_mode interval
    flush_interval 5s
    flush_thread_count 3
    retry_max_interval 60
    retry_forever true
    retry_type exponential_backoff
    chunk_limit_size 20
    queue_limit_length 5
    overflow_action block
  </buffer>
</match>
Unfortunately, we don't have the space to introduce you to every feature of Elasticsearch, but there are plenty of useful How-tos on its official page.

Kibana

There is a great open source tool from Elastic, named Kibana, that allows you to visualize your data. With it, you can spot anomalies and trends in a user-friendly way. Setting up Kibana from the stable Helm repository is easy.
[~]$ helm install stable/kibana --name kibana --set env.ELASTICSEARCH_URL=http://elasticsearch:9200

Monitoring legacy applications

Most of the applications we push to Kubernetes with Pipeline are legacy applications that are not monitoring friendly. These are mostly applications for which you can't create custom metric endpoints for a Prometheus exporter, but whose log messages contain all the necessary information. To expose this information to Prometheus, you can use the prometheus plugin for Fluentd (which we also use to monitor our logging deployments); custom metrics can then be set up via its filter plugin. Here is a simple example filter that counts response codes by method type:
<filter message>
  @type prometheus
  <metric>
    name response_code_count
    type counter
    desc The total number of requests labeled by status code and method
    <labels>
      method ${method}
      code ${code}
    </labels>
  </metric>
</filter>
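The filter above only defines and increments the counter; to actually expose it, the plugin's prometheus input also has to be enabled, so Prometheus can scrape the metrics over HTTP. A minimal sketch (the values shown are the plugin's defaults, written out explicitly):

<source>
  @type prometheus
  # serve metrics in Prometheus exposition format on 0.0.0.0:24231/metrics
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>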
Example metrics output
# TYPE response_code_count counter
# HELP response_code_count The total number of requests labeled by status code and method
response_code_count{code="200",method="GET"} 6
response_code_count{code="201",method="DELETE"} 1
That's it. Afterward, you can use this data the same way you might use exporter output. You can even create dashboards and alerts like we did in our Prometheus related posts.

Conclusion

The emerging Kubernetes ecosystem has made it much easier to handle monitoring and logging in a standardized way. In this post we've tried to demonstrate the power of these tools without modifying a single bit of your application. We hope you enjoyed it, and we'll continue our efforts to provide out-of-the-box solutions for your Kubernetes deployments. Charts and configurations are available in our Helm chart repository:
http://kubernetes-charts.banzaicloud.com/
We also use our own container images with plugins pre-installed:
  • Fluent Bit - upstream master branch build
  • Fluentd - stable build with plugins