Understanding MySQL’s InnoDB buffer pool

For many years now, I’ve heard a lot about InnoDB’s famous buffer pool: what size to set for the innodb_buffer_pool_size parameter, what it contains, and how it can be warmed up.

I’ve also been told several times that “once your tables are in memory, all queries are super fast as there is no more disk I/O”.

All these statements led me to write this blog post to (try to) explain how InnoDB’s buffer pool works and what it contains.

Buffer pool content

InnoDB data is stored in 16 KB pages (blocks), either on disk (ibdata files) or in memory (buffer pool). Each of these pages may contain one or more rows.

The buffer pool is basically a cache for these pages: once a page’s content is requested by a query, the page is cached in the buffer pool.

You may be wondering what kind of data is stored in these pages. Short answer: indexes (from a size point of view).

MySQL offers a very useful INFORMATION_SCHEMA database, which, since version 5.5, contains a table named innodb_buffer_page. This table holds one record (row) per page in the buffer pool, including interesting data such as what the page contains. Take a look at MySQL’s official documentation of this table if you want details on its columns and data.

Now let’s have a little fun with this table.

Number of pages in buffer pool

Query:

select count(*) from information_schema.innodb_buffer_page;

Output:

+----------+
| count(*) |
+----------+
| 262142   |
+----------+

We’ve got 262 142 pages in our buffer pool × 16 KB per page = 4 194 272 KB. This value matches the innodb_buffer_pool_size parameter.

Page types in buffer pool

Query:

select
page_type as Page_Type,
sum(data_size)/1024/1024 as Size_in_MB
from information_schema.innodb_buffer_page
group by page_type
order by Size_in_MB desc;

Result:

+-------------------+--------------+
| Page_Type         | Size_in_MB   |
+-------------------+--------------+
| INDEX             | 158.66378689 |
| UNKNOWN           | 0.00000000   |
| TRX_SYSTEM        | 0.00000000   |
| SYSTEM            | 0.00000000   |
| FILE_SPACE_HEADER | 0.00000000   |
| IBUF_BITMAP       | 0.00000000   |
| EXTENT_DESCRIPTOR | 0.00000000   |
| ALLOCATED         | 0.00000000   |
| INODE             | 0.00000000   |
| BLOB              | 0.00000000   |
| UNDO_LOG          | 0.00000000   |
| IBUF_FREE_LIST    | 0.00000000   |
| IBUF_INDEX        | 0.00000000   |
+-------------------+--------------+

As you can see, virtually only INDEX pages are cached in the buffer pool.

Some quick explanations about the most important page types:

  • INDEX: B-Tree index
  • IBUF_INDEX: Insert buffer index
  • UNKNOWN: not allocated / unknown state
  • TRX_SYSTEM: transaction system data

But where the heck is the table rows’ data? In an index! The clustered index, which is almost always based on the table’s primary key (internally generated if missing), stores row data in its leaves. As nodes/leaves are sorted by their primary key value, it is recommended to use an auto-increment, or at least an always-increasing, primary key value.

Note that (except for full scans) this index needs to be traversed using the primary key(s) of the row(s) we want to retrieve.

Also, the primary key is always stored in secondary indexes, so InnoDB can look up the requested rows’ data in the clustered index.

Buffer pool usage per index

Query:

select
table_name as Table_Name, index_name as Index_Name,
count(*) as Page_Count, sum(data_size)/1024/1024 as Size_in_MB
from information_schema.innodb_buffer_page
group by table_name, index_name
order by Size_in_MB desc;

Result:

+--------------------------------------------+-----------------+------------+-------------+
| Table_Name                                 | Index_Name      | Page_Count | Size_in_MB  |
+--------------------------------------------+-----------------+------------+-------------+
| `magento`.`core_url_rewrite`               | PRIMARY         |       2829 | 40.64266014 |
| `magento`.`core_url_rewrite`               | FK_CORE_URL_... |        680 |  6.67517281 |
| `magento`.`catalog_product_entity_varchar` | PRIMARY         |        449 |  6.41064930 |
| `magento`.`catalog_product_index_price`    | PRIMARY         |        440 |  6.29357910 |
| `magento`.`catalog_product_entity`         | PRIMARY         |        435 |  6.23898315 |
+--------------------------------------------+-----------------+------------+-------------+

We can see here the clustered (PRIMARY) indexes holding rows data.

InnoDB buffer pool size

So how much memory should we allocate in the buffer pool size setting? I recommend the following rule of thumb:

rows data size + indexes size (excl. clustered) + 20%

Although it might be a “safer” option to calculate this size based on the whole data set, you might prefer to calculate it based on your working set size, which corresponds to the frequently used data (do you really need those old log rows in memory?).

This is especially useful when your whole dataset won’t fit into memory.
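As a starting point, the total data and index size of your InnoDB tables can be obtained from INFORMATION_SCHEMA; here is a minimal sketch of the rule of thumb above (assuming you size for the whole data set rather than the working set):

-- rows data (clustered index) + secondary indexes + 20% headroom
select round(sum(data_length + index_length) * 1.2 / 1024 / 1024) as buffer_pool_MB
from information_schema.tables
where engine = 'InnoDB';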

InnoDB buffer pool usage examples

Buffer pool size vs. disk reads

1024MB IDB buffer pool

In the above graph, we can clearly see the impact on disk reads (status variable Innodb_buffer_pool_reads), which get closer to 0 as the buffer pool size grows.

On the contrary, in the graph below, with a buffer pool smaller than the working set, disk reads remain at the same level, as old pages are constantly evicted from the buffer pool to make room for new ones.

256MB IDB buffer pool

 

Quite obviously, buffer pool size also has a significant impact on performance (~ 1024MB data set):

256MB vs. 1024MB IDB buffer pool

InnoDB buffer pool warmup

After restarting MySQL, for instance, it may take a while under a regular workload for your entire working set to be loaded back into the buffer pool.

“Manual” warmup

Running SELECT queries against your InnoDB tables will load necessary pages into memory (the buffer pool).

Therefore, SELECT COUNT(*) may be very useful, as it will load the whole clustered index into memory (or at least as much of it as fits).

Secondary indexes may be loaded into memory with simple queries, for instance by adding a “catch-all” (e.g. <> 0) WHERE clause on the first column of an index. Using a given index can be forced if needed, see MySQL index hints.
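A minimal sketch of such warmup queries (the database, table, column and index names below are made up for the example):

-- load the clustered index (rows data) of a table
select count(*) from mydb.orders;

-- load a secondary index, forcing its use with an index hint
-- (idx_customer is assumed to start with the customer_id column)
select count(*) from mydb.orders force index (idx_customer) where customer_id <> 0;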

Dump & restore

For those who use recent versions of MySQL (5.6+), Percona Server (5.5.10+) or MariaDB (10.0+), automatic buffer pool content dump at shutdown and restore on startup can be enabled.

In MySQL 5.6+, define the following configuration variables (a my.cnf sketch follows the list):

  • innodb_buffer_pool_dump_at_shutdown=ON
  • innodb_buffer_pool_load_at_startup=ON
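For instance, in your my.cnf (assuming the usual [mysqld] section):

[mysqld]
# dump the buffer pool content (page references) at shutdown ...
innodb_buffer_pool_dump_at_shutdown = ON
# ... and load it back at startup
innodb_buffer_pool_load_at_startup = ON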

What is Docker and how to use it ?

I must admit that it took me some time to really understand what Docker is and the concepts behind it.

Docker logo

This post’s goal is to share my experience with people who heard of Docker and would like to know more about it, or understand it better.

Docker’s website describes it as “an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.”.

This may be a bit abstract at first, so I’ll try to explain what Docker is (and is not) from my developer point of view :

  • Docker is built on top of LXC, and therefore runs containers, not VMs as VirtualBox does, for instance

  • Docker containers are made of portable “images”, similar to LXC/VZ templates, but much more powerful (versioning, inheritance …)

  • Docker “images” can easily be created via Dockerfiles, which define the base image and the steps to run in order to create your image

  • Docker allows you to run multiple instances of your container without needing to copy the image (base system) files

  • The Docker daemon (which manages / runs LXC containers) provides a REST API used by the Docker CLI utility … but this REST API can be used by any application (read the doc here)

  • Docker runs on virtually all operating systems (Linux, Mac OS, Windows …) and platforms (Google Cloud Platform, Amazon EC2) : read more about installing Docker

Oh, and I forgot to tell you Docker is developed in Go (see sources on GitHub) !

Let’s run our first containers

Once you’ve installed Docker on your computer, you can now create your first container (check that docker -d is running) :

$ sudo docker run ubuntu /bin/echo hello world

If you see “hello world”, that’s it !

So, you may be wondering, what is this docker run command actually doing ?

First, it downloads the necessary Ubuntu image (keep this keyword in mind) to run your container.

Then, it creates a container from the Ubuntu image, “starts” it, runs the supplied command (/bin/echo hello world) and prints the output, before “stopping” it.

Note that Docker actually did a lot more, but let’s keep it simple for now (see “Run ‘hello world‘” section here).

The docker run command accepts many command-line options ; let’s have a quick look at some very useful ones :

  • -d : run container in detached mode, printing the resulting container id

  • -i : keep stdin open, e.g. when you want to take control of a shell within the container

  • -t : allocate a pseudo-tty (useful when running a shell)

  • -name : specify container’s name ; usage : -name myubuntu

  • -w : set current working directory inside the container (create it if not present) ; usage : -w /some/path/

  • -expose : expose a specified port in container’s network interface ; usage : -expose <port-number>

  • -p : forward a port from host to container ; usage : -p 127.0.0.1:8080:80 (will bind local port 8080 to container’s port 80)

Running your container with a shell command as argument, in detached mode, will give a behavior similar to a regular LXC container :

$ sudo docker run -d -i -t ubuntu /bin/bash

This command should output the newly created container id. You may also get it (in its short version) by running sudo docker ps (only active containers are shown, use the -a option to show all) :

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ecd101259f8f        ubuntu:12.04        /bin/bash           2 seconds ago       Up 2 seconds                            focused_franklin

The “focused_franklin” container has been up and running Ubuntu 12.04 for 2 seconds ! Try to attach to it :

$ sudo docker attach <your-container-id>

Install the “ping” utility (yes, the image really is a bare Ubuntu) by running apt-get update && apt-get install ping net-tools, and try to ping www.google.com. Wow, networking is already fully working, and you can see the details with a simple ifconfig.

Exit (and stop) the container by pressing CTRL+D or typing exit.

You might also want to get some more information about any of your containers or images by running :

$ sudo docker inspect <container-id>

Cleaning up

If you wish to clean / remove all the Docker containers you’ve created, run this line (this will remove all your existing containers) :

$ sudo docker ps -a | grep ^[0-9a-f] | cut -d " " -f 1 | xargs sudo docker rm

Images, containers …

So far, you may still be wondering what’s the difference between images and containers (see this Wikipedia article about OS-level virtualization if you are not familiar with containers).

In my opinion, images are what make Docker a really powerful and interesting tool.

Think of Docker images as the evolved version of LXC/OpenVZ templates : an image is a read-only layer, containing for instance an operating system with some applications, on top of which your container will be built (with its own read-write layer).

A very important thing to keep in mind : images can never be altered. Each change made to an image actually creates a new image that references the one it was built from.

This offers a nice inheritance system : you can create & maintain images that reference each other. For example, this Elasticsearch image in the Docker index references a Java image, which references a custom Ubuntu image, on top of Docker’s base Ubuntu image. See the power of Docker ?

You can even visualize this with a little help from Graphviz’s dot utility ; I pulled both the ubuntu:latest and debian:wheezy images, and here is the image graph that I get :

See, even the Debian and Ubuntu images reference the same base image !

Layers

Let’s say you create an image based on Ubuntu and add (install) PHP in it : a new image will be created with the base Ubuntu image as its parent (see explicit schemas at Docker.io).

Any container using this newly created image will therefore be made of 3 layers :

  • Container’s read-write layer

  • “PHP install” image’s read-only layer

  • Ubuntu image’s read-only layer

This stacking of read-only and read-write layers is handled by a union file system.

Within the container, the image’s files are never modified : any change actually creates a copy of the modified file(s) inside the container’s read-write layer.

You can use the docker diff command to view added (A) / changed (C) / deleted (D) files in your container’s read-write layer, compared with its image. Here’s an example :

$ sudo docker diff <container-id>
C /dev
A /dev/kmsg
C /tmp
C /tmp/hsperfdata_root
C /usr
C /usr/share
C /usr/share/elasticsearch
A /usr/share/elasticsearch/data
A /usr/share/elasticsearch/data/elasticsearch
A /usr/share/elasticsearch/data/elasticsearch/nodes
A /usr/share/elasticsearch/data/elasticsearch/nodes/0
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/_state
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/_state/global-3
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/_state
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/_state/state-4
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/index
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/index/_checksums-1391120105322
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/index/segments.gen
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/index/segments_4
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/translog
A /usr/share/elasticsearch/data/elasticsearch/nodes/0/indices/index/0/translog/translog-1391119827027
...

Creating an image with container’s changes

You made changes in your container and would like to save the result as an image ? Nothing simpler, thanks to the docker commit command.
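For instance (the repository name below is just a placeholder) :

$ sudo docker commit <container-id> myrepository/myimage

The resulting image can then be used like any other, e.g. with docker run.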

Tags

Last but not least, images have tags, allowing you to identify a specific version of an image.

For instance, base Ubuntu image has 3 tags (versions) available : latest (default), quantal (12.10) and precise (12.04).

You can specify which tag you want to use with the image:tag syntax (e.g. docker pull ubuntu:latest).

Docker images index

When you begin using Docker, you’ll pull (more on the docker pull command) images from the main Docker index, browsable through https://index.docker.io/ (and the docker search command).

You can also use private repositories (on top of your own/private Docker registry) : read more about private repositories in the official documentation.

The Dockerfile

You have several ways to build your custom image. I already talked about the “docker commit” way (see above), so let’s talk about the Dockerfile way.

A Dockerfile is simply a set of instructions to run over a base image. Docker processes your Dockerfile step by step, resulting in a final image.

Note that an intermediate image is built after each instruction is run, so you can revert to any step of the build process.

Elasticsearch Dockerfile sample

Let’s take a sample Dockerfile that simply installs and runs Elasticsearch on a base Debian Wheezy image ; I commented each line :

# You must specify a base image ; in this example,
# tag wheezy of image debian will be pulled
# from Docker’s public repository
FROM        debian:wheezy
# Who maintains this Dockerfile
MAINTAINER  Michael BOUVY <michael.bouvy@gmail.com>
# Each RUN command creates a new version of your image.
# We use it here to install necessary tools / applications
# needed by Elasticsearch
RUN         apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y install adduser openjdk-7-jre-headless
# Download the Elasticsearch DEB file into /tmp
# Prefer using this method instead of downloading
# (ie. with wget) the file, so Docker can cache it
ADD        https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.10.deb /tmp/elasticsearch.deb
# And install it with RUN command
RUN         dpkg -i /tmp/elasticsearch.deb
# Expose port of container
EXPOSE      9200
# Default command to run within container
CMD         ["/usr/share/elasticsearch/bin/elasticsearch", "-f"]

The resulting image is available in Docker Index and the source Dockerfile on GitHub.

As you can see, each line is a pair made of an instruction (RUN, CMD, etc.) and its arguments.

Let’s now build an image from our newly created Dockerfile :

$ sudo docker build -t elasticsearch .

And run it (in detached mode) :

$ sudo docker run -d elasticsearch

Note the container ID, and get its IP address :

$ sudo docker inspect <container-id> | grep IPAddr | cut -d "\"" -f 4

Try to reach http://<container-ip-address>:9200 in your browser, you should see Elasticsearch’s output.

Ship logs to Logstash with Lumberjack / Logstash Forwarder

In my previous post, I explained how to set up Logstash instances on your servers, acting as logs data shippers.

However, as you may already have noticed, Logstash instances have a non-negligible memory footprint on your servers, preventing their use where memory is limited. Furthermore, you must have Java installed on each platform you want to run Logstash on.

This is where Logstash Forwarder (formerly Lumberjack) becomes interesting : this small tool, developed in Go, allows you to securely ship compressed logs data (to a Logstash “indexer” for instance), with minimal resource usage, using the Lumberjack protocol.

I’ve hence decided to replace all of the Logstash “shipper” instances with Logstash Forwarder. This also means no longer using Redis as a logs data broker, as Logstash Forwarder won’t talk to Redis (no encryption support). As a consequence, if your Logstash indexer stops running, you may lose data once Logstash Forwarder’s spool max size is reached.

Installing Logstash Forwarder

To install Logstash Forwarder on your log shippers, we’ll need to compile it from sources : the full procedure is very well described in the project’s readme. I strongly recommend that you compile it once and make a package (either RPM or DEB) so you can easily deploy it on all of your other servers.

Init script

Once installed from a package, Logstash Forwarder is located in /opt/logstash-forwarder. We’ll use the init script available in LF’s repository to handle startup :

$ cd /etc/init.d/
$ sudo wget https://raw.github.com/elasticsearch/logstash-forwarder/master/logstash-forwarder.init -O logstash-forwarder
$ sudo chmod +x logstash-forwarder
$ sudo update-rc.d logstash-forwarder defaults

SSL certificate generation

First of all, we’ll need to generate an SSL certificate that will be used to secure communications between your shippers and your indexer :

$ openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout logstash-forwarder.key -out logstash-forwarder.crt

Now move the freshly created logstash-forwarder.key into /etc/ssl/private/ and logstash-forwarder.crt into /etc/ssl/certs/. Note that you’ll need both of these files on each of your shippers and on the indexer.
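Something like this (paths taken from above ; run on the machine where you generated the files, then copy them to the others) :

$ sudo mv logstash-forwarder.key /etc/ssl/private/
$ sudo mv logstash-forwarder.crt /etc/ssl/certs/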

Configuration

We’re now ready to configure Logstash Forwarder : the config file is in JSON format, and will preferably be saved as /etc/logstash-forwarder (yes, it’s a file), as it’s the location defined in the init script we installed above.

Note : if you need to override any of the init script’s parameters (e.g. config file location), create a file /etc/default/logstash-forwarder and set your custom parameters there.

In the following configuration example, we’ll assume you want to track your iptables and Apache logs data, and that your indexer’s IP is 10.0.0.5 :

{
  "network": {
    "servers": [ "10.0.0.5:5043" ],
    "ssl certificate": "/etc/ssl/certs/logstash-forwarder.crt",
    "ssl key": "/etc/ssl/private/logstash-forwarder.key",
    "ssl ca": "/etc/ssl/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/syslog" ],
      "fields": { "type": "iptables" }
    },
    {
      "paths": [ "/var/log/apache2/*access*.log" ],
      "fields": { "type": "apache" }
    }
  ]
}

iptables logs filtering

To avoid processing and transmitting all of syslog’s file data to our indexer, I recommend filtering your iptables log entries so that they end up in a separate file.

First of all, you need a specific criterion to filter on ; you may simply add “IPTABLES” to the log-prefix value of your iptables log rules, so it looks something like :

/sbin/iptables -A LogAndDrop -p tcp -j LOG --log-prefix "IPTABLES RULE 1 -- DROP" --log-level=info

If using rsyslog, you’ll have to create an iptables.conf file in /etc/rsyslog.d/ (usually all files in this directory are read by rsyslog) and set up a very basic filtering rule :

if $programname == 'kernel' and $msg contains 'IPTABLES' then /var/log/iptables.log

Restart rsyslog. You can now replace the iptables log file path in your Logstash Forwarder config file.
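The corresponding entry in the files section then becomes (a sketch based on the configuration example above) :

    {
      "paths": [ "/var/log/iptables.log" ],
      "fields": { "type": "iptables" }
    }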

Indexer side : Logstash configuration

Next step, edit the config of Logstash on your indexer server, and add the following input :

lumberjack {
  port => 5043
  type => "logs"
  ssl_certificate => "/etc/ssl/certs/logstash-forwarder.crt"
  ssl_key => "/etc/ssl/private/logstash-forwarder.key"
}

Also add these filters to extract fields from logs data :

filter {
  if [type] == "apache" {
    grok {
      pattern => "%{COMBINEDAPACHELOG}"
    }
  }

  if [type] == "iptables" {
    grok {
      patterns_dir => "/usr/share/grok/patterns/iptables"
      pattern => "%{IPTABLES}"
    }
  }
}

You may also want logs data not to be stored in Elasticsearch when the grok patterns didn’t match. In this case, add the following in the output section, surrounding your output plugins (elasticsearch for instance) :

if !("_grokparsefailure" in [tags]) {
  elasticsearch { bind_host => "10.0.0.5" }
}

Collect & visualize your logs with Logstash, Elasticsearch & Redis

Update of December 6th : although Logstash does the job as a log shipper, you might consider replacing it with Lumberjack / Logstash Forwarder, which needs far fewer resources, and keep Logstash on your indexer to collect, transform and index your logs data (into Elasticsearch) : check out my latest blog post on the topic.

Kibana Dashboard

Even if you manage a single Linux server, you probably already know how hard it is to keep an eye on what’s going on with it, and especially to track logs data. And this becomes even worse when you have several (physical or virtual) servers to administrate.

Logstash logo

Although Munin is very helpful for monitoring various metrics from my servers / VMs, I felt the need for something more, a bit less static / more interactive.

There are 3 kinds of logs I especially wanted to track :

  • Apache 2 access logs
  • iptables logs
  • Syslogs

After searching around on the internet for a great tool that would help me, I read about the open source log management tool Logstash, which seemed to perfectly suit a (major) part of my needs : logs collecting / processing.

For the purpose of this post, I will take the following network architecture and assume that I want to collect my Apache, iptables and system logs from servers 1/2/3 (“shippers”) on server 4 (“indexer”) and visualize them :

Logstash architecture

As you can see, I am using 4 complementary applications, the role of each one being :

  • Logstash : logs collector, processor and shipper (to Redis) on log “shippers” 1-3 ; logs indexer on server 4 (reads from Redis, writes to Elasticsearch)
  • Redis : logs data broker, receiving data from log “shippers” 1-3
  • Elasticsearch : logs data persistent storage
  • Kibana : (time-based) logs data visualization (graphs, tables, etc.)

Installation

As shown on the schema above, I will describe how to install all of Logstash + Redis + Elasticsearch + Kibana on the same “indexer” server. You may want to separate these on different servers for any reason, just set the correct IPs / hostnames accordingly, in the examples below.

Redis


First of all, let’s install Redis on our indexer server (right, that’s #4 on the schema). As the versions of Redis available in Linux distribution repositories are not up to date, we’ll download the latest stable release from Redis’ website :

$ sudo aptitude install gcc
$ wget http://download.redis.io/releases/redis-2.6.16.tar.gz
$ tar xzf redis-2.6.16.tar.gz
$ cd redis-2.6.16
$ make MALLOC=libc
$ sudo cp src/redis-server /usr/local/bin/
$ sudo cp src/redis-cli /usr/local/bin/

Launch redis-server (sudo redis-server), then try to ping Redis to check that the server is working :

$ redis-cli ping

If you get a PONG reply, your Redis server works fine. You might want to install Redis more properly, if so, follow this excellent guide at Redis.io.

You’re now ready to ship logs data from your servers to Redis. Note that Redis listens on its default port (tcp/6379) and accepts incoming connections from any IP :

$ netstat -tanpu|grep redis
tcp   0   0   0.0.0.0:6379   0.0.0.0:*   LISTEN   16955/redis-server

Logstash (shippers)

You will need to set up an instance of Logstash on each of the servers you want to collect data from ; it will act as a “logs shipper”.

Open a shell on one of the servers you want to collect log data from, and download Logstash.

$ sudo mkdir /opt/logstash /etc/logstash
$ cd /opt/logstash
$ sudo wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.2-flatjar.jar

Create a Logstash config file (e.g. logstash-test.conf) in /etc/logstash :

input { stdin { } }
output { stdout { codec => rubydebug } }

Now launch the Logstash agent and type something ; you should get something like this :

$ java -Xmx256m -jar logstash-1.2.2-flatjar.jar agent -f logstash-test.conf
hello world
{
  "message" => "hello world",
  "@timestamp" => "2013-11-17T18:35:56.672Z",
  "@version" => "1",
  "host" => "myhostname"
}

Logstash works fine ; let’s now configure it to work with our previously installed Redis instance. Create a new config file (e.g. logstash-redis.conf) :

input { stdin { } }
output {
  stdout { codec => rubydebug }
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

You’ll of course need to replace “10.0.0.5” with the IP of the server Redis is running on.

Launch the Logstash agent with logstash-redis.conf as config file and type something as above. Then, on your indexer server (where Redis is installed), launch redis-cli :

redis 127.0.0.1:6379> LLEN logstash
(integer) 1
redis 127.0.0.1:6379> LPOP logstash
"{\"message\":\"hello redis\",\"@timestamp\":\"2013-11-17T20:35:13.910Z\",\"@version\":\"1\",\"host\":\"myhostname\"}"

Here it is : our message was transmitted by Logstash to our Redis server. You’ve probably noticed that Logstash added a few fields to our initial (minimalistic) data (@timestamp, @version and host).

Now that we’ve got Logstash able to send data to Redis, we can begin processing our Apache 2 and iptables logs.

Apache 2 logs processing

Create a new config file (e.g. logstash-apache.conf) in /etc/logstash :

input {
  file {
    path => "/var/log/apache2/*access.log"
    type => "apache"
  }
}

filter {
  if [type] == "apache" {
    grok {
      pattern => "%{COMBINEDAPACHELOG}"
    }
  }
}

output {
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

This config is quite self-explanatory ; a few things to note though :

  • type => “apache” allows us to use conditionals further on
  • pattern => “%{COMBINEDAPACHELOG}” is a built-in regex-like pattern used to match our Apache log lines and extract fields (request, host, response, etc.)

Launch the Logstash agent, and you’re done. It’s that simple ! You should now see the logstash list count grow in Redis (LLEN logstash) as your Apache gets hits.
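For instance (assuming you saved the config above as logstash-apache.conf, and that your Redis server is 10.0.0.5 as in the examples) :

$ java -Xmx256m -jar logstash-1.2.2-flatjar.jar agent -f /etc/logstash/logstash-apache.conf
$ redis-cli -h 10.0.0.5 LLEN logstash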

iptables logs processing

There is no built-in grok pattern available to extract data from iptables logs, but there’s one available in Logstash’s cookbook config snippets.

Create a directory where you will keep your custom grok patterns (e.g. /usr/share/grok/patterns) and create a new file called iptables :

# Source : http://cookbook.logstash.net/recipes/config-snippets/
NETFILTERMAC %{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}
ETHTYPE (?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))
IPTABLES1 (?:IN=%{WORD:in_device} OUT=(%{WORD:out_device})? MAC=%{NETFILTERMAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip}.*(TTL=%{INT:ttl})?.*PROTO=%{WORD:proto}?.*SPT=%{INT:src_port}?.*DPT=%{INT:dst_port}?.*)
IPTABLES2 (?:IN=%{WORD:in_device} OUT=(%{WORD:out_device})? MAC=%{NETFILTERMAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip}.*(TTL=%{INT:ttl})?.*PROTO=%{INT:proto}?.*)
IPTABLES (?:%{IPTABLES1}|%{IPTABLES2})

You’ll also need to declare this directory in Logstash’s config file (see below). Now let’s process our iptables logs, create or edit a logstash config file :

input {
  file {
    path => [ "/var/log/syslog" ]
    type => "iptables"
  }
}

filter {
  if [type] == "iptables" {
    grok {
      patterns_dir => "/usr/share/grok/patterns/iptables"
      pattern => "%{IPTABLES}"
    }
  }
}

output {
  # Check that the processed line matched against grok iptables pattern
  if !("_grokparsefailure" in [tags]) {
    redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
  }
}

Actually, despite the very useful Grok Debugger, I couldn’t get this pattern working. Plus, you will have to guess one way or another whether the log line is a REJECT, DROP, ACCEPT or whatever.

To make this simpler, you may use iptables rules like this :

iptables -N LogAndDrop
iptables -A LogAndDrop -p tcp -j LOG --log-prefix "RULE 1 -- DROP " --log-level=info
iptables -A LogAndDrop -j DROP

You can also create similar rules for REJECT / ACCEPT following this example.

The good thing is that your iptables log lines will now be prefixed with “DROP” (or REJECT / ACCEPT), allowing you to process these log lines in a different way, for instance measuring ACCEPT vs. DROP/REJECT counts. Here is the grok pattern you can use :

IPTABLES (.*RULE \d? -- (%{WORD:action})?.*SRC=(%{IP:src_ip}).*DST=(%{IP:dst_ip}).*PROTO=(%{WORD:protocol}).*SPT=%{INT:src_port}?.*DPT=%{INT:dst_port}?.*)

The following fields will be extracted from your iptables logs :

  • action = depending on what you set in your custom iptables rules, may be REJECT, DROP, ACCEPT …
  • src_ip = source IP address
  • dst_ip = destination IP address
  • protocol = protocol (TCP, UDP, ICMP, etc.)
  • src_port = source port number
  • dst_port = destination port number

You’ll probably notice that not all the data available in the logs is extracted ; feel free to adapt the grok pattern to your specific needs.

Note that if you decide to create a “log & accept” iptables rule, it’s definitely NOT a good idea to systematically use it instead of the regular ACCEPT one. You’d rather use it to track connections from specific IP address ranges, for example.

system logs (syslog) processing

Edit your existing one or create a new Logstash config file :

input {
  file {
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
    type => "syslog"
  }
}

output {
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

As each log line may have a different format, they will each be stored “as is” in the “message” field in Elasticsearch. Anyway, this will not prevent you from analyzing this data (for example getting the number of (un)successful authentications from auth.log).

Elasticsearch

Thanks to the Debian package available on Elasticsearch’s official download page, only a few command lines are needed to get it up and running :

$ sudo aptitude install openjdk-7-jre-headless
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.deb
$ sudo dpkg -i elasticsearch-0.90.7.deb

Elasticsearch should have started automatically ; open your browser and reach http://yourhostname:9200/. If everything went fine, you should get a JSON response looking like this :

{
  "ok": true,
  "status": 200,
  "name": "Alibar",
  "version": {
    "number": "0.90.7",
    "build_hash": "36897d07dadcb70a865b7f149e645ed3d44eb5f2",
    "build_timestamp": "2013-11-13T12:06:54Z",
    "build_snapshot": false,
    "lucene_version": "4.5.1"
  },
  "tagline": "You Know, for Search"
}

If necessary, you can tune Elasticsearch’s run parameters in /etc/default/elasticsearch and configuration parameters in /etc/elasticsearch/[elasticsearch,logging].yml.

Note for OpenVZ users

After (too many) hours of searching and trying various configurations, I still couldn’t get Elasticsearch running in an OpenVZ Debian container ; or more precisely, it wouldn’t listen for incoming (HTTP) connections on its default port 9200 (the process was visible, but nothing showed up with netstat).

It actually seems to be a common issue with Java running in an OpenVZ container, and I finally found a solution in this post from OpenVZ forums.

In short, edit your CT config file (usually /etc/vz/conf/CTID.conf), comment out the CPUS line and add a CPULIMIT line as follows :

CPUUNITS="1000"
# CPUS="1"
CPULIMIT="100"

(Re)start your container, Elasticsearch should now work fine.

Logstash (indexer)

Thanks to a comment from DJP78, I realized that I forgot to explain how to configure Logstash on the indexer side : pulling logs data from Redis and storing them into Elasticsearch.

Here is the Logstash config you can use (note that I also process local [indexer] system logs) :

input {
  file {
    type => "syslog"
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
output {
  elasticsearch { bind_host => "127.0.0.1" }
}

You can check whether Logstash is correctly doing its job on the indexer by either watching the list size decrease in Redis (redis-cli, then LLEN logstash) or searching your Elasticsearch index via an HTTP GET request : http://yourElasticSearchHostname:9200/_search?q=_index%20like%20logstash%25&sort=@timestamp:desc.
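For instance, from the indexer itself (a quick sketch ; the Elasticsearch URI search below simply returns the latest indexed documents) :

$ redis-cli LLEN logstash
$ curl 'http://127.0.0.1:9200/_search?q=*:*&sort=@timestamp:desc&size=5'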

Kibana

Finally, let’s install Kibana. Kibana is a modern & dynamic (AngularJS based) frontend for Logstash / Elasticsearch, allowing you to build charts, tables, etc. from your collected logs data.


All you need to use Kibana is an HTTP web server and access to Elasticsearch’s port 9200 (from your browser).

Its installation is quite straightforward :

$ sudo aptitude install git
$ cd /var/www
$ git clone https://github.com/elasticsearch/kibana.git kibana

Now open http://yourhostname/kibana/ in your browser. Tada !

Note that if Elasticsearch is not installed on the same server (or available through the same hostname) as Kibana, you’ll need to configure its hostname (and possibly port) in config.js at Kibana’s root.

On first launch, Kibana offers to set you up with a “Logstash dashboard” : click on the link. You can now see your logs data in a table ; try to activate some useful fields in the left column, or create your first graph :-).

tl;dr

  • Download Logstash on all your “shippers” and your “indexer”
  • Install and launch Redis on your “indexer”
  • Install and launch Elasticsearch on your “indexer”
  • Clone Kibana git repository on your “indexer” in /var/www
  • Create Logstash config files for your shippers and indexer (see above), launch all Logstash instances

Varnish : use multiple backends depending on host / URL

With the lack of public IPv4 addresses and the growth of virtualization, one of the common solutions consists in using reverse proxies to distribute the traffic of a physical host, which has a public IPv4 address, between multiple virtual machines (VMs), as shown in the schema below :

Varnish_front

In this example, we will assume that each VM hosts an HTTP server (Apache, lighttpd, nginx, etc.) listening on port 80, and that :

  • www.myhost1.com should use VM1
  • www.myhost2.com should use VM2
  • www.myhost3.com should use VM3

Varnish

Varnish, one of the best known reverse proxy engines (actually a caching HTTP reverse proxy), can be very useful in such a situation, by selecting a specific backend (i.e. a VM) depending on the requested hostname.


I’m not going to explain in this post how to build the whole Varnish VCL file, but you’ll find very useful resources on Varnish’s wiki :  Varnish Default VCL Example.

Defining multiple backends

First of all, you will need to define the different backends that Varnish will rely on. In the example above, we have 3 VMs, each with a private IPv4 address in the 10.0.0.0/24 range.

At the beginning of your VCL, set your backends :

backend vm1 {
    .host = "10.0.0.11";
    .port = "80";
    .connect_timeout = 6000s;
    .first_byte_timeout = 6000s;
    .between_bytes_timeout = 6000s;
}

backend vm2 {
    .host = "10.0.0.12";
    .port = "80";
    .connect_timeout = 6000s;
    .first_byte_timeout = 6000s;
    .between_bytes_timeout = 6000s;
}

backend vm3 {
    .host = "10.0.0.13";
    .port = "80";
    .connect_timeout = 6000s;
    .first_byte_timeout = 6000s;
    .between_bytes_timeout = 6000s;
}

Using the appropriate backend

To define which backend (local HTTP server) should be used by Varnish to respond to HTTP requests, we will set a few custom rules in the vcl_recv section of our VCL config file :

# Default backend is set to VM1
set req.backend = vm1;

if (req.http.host == "www.myhost2.com") {
    set req.backend = vm2;
}

if (req.http.host == "www.myhost3.com") {
    set req.backend = vm3;
}
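As this post’s title also mentions URL-based routing, here is a minimal sketch of how a backend could be selected from the requested URL as well (the /static/ prefix is just an example) :

# Serve all static assets from VM3, whatever the host
if (req.url ~ "^/static/") {
    set req.backend = vm3;
}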

Now restart Varnish and try to connect to one of the 3 hostnames : you should be forwarded to the appropriate backend.

NB : this post only covers HTTP reverse proxying, as Varnish does not handle HTTPS. If you need HTTPS, there are 2 options : using Apache as an HTTPS reverse proxy (in French), or combining Varnish with another server such as Pound, which will handle the SSL communication establishment.

EEPROM advanced usage on Arduino Uno / ATMega328

Storing values in your microcontroller’s RAM is easy and fast : simply define any variable and assign it a value, and you’re done.

But what happens if your µC is reset ? You see where this is going, right ? EEPROM (Electrically-Erasable Programmable Read-Only Memory) is a persistent memory that allows you to store up to 1024 bytes (1 kilobyte) in your microcontroller, even when it’s turned off.

Arduino offers a native EEPROM library that allows us to easily deal with the EEPROM of the ATMega328 (or whatever Atmel µC your Arduino is running).

Writing and reading values is as simple as the following :

#include <EEPROM.h> // Arduino's native EEPROM library

int addr = 1;
byte myValue = 42;
byte readValue = 0;

void setup() {
  Serial.begin(9600);
  EEPROM.write(addr, myValue);
}

void loop() {
  readValue = EEPROM.read(addr); // => myValue
  Serial.println(readValue);
  delay(1000);
}

That said, I guess you’re now telling yourself that storing bytes might be quite limiting, and that you’d also like to store ints, chars, or whatever.

Remember, ATMega328 (used in the Arduino Uno) is based on Atmel’s AVR architecture, which allows us to use AVR LibC EEPROM functions, of which :

  • void eeprom_write_word (uint16_t *__p, uint16_t __value)
  • void eeprom_write_float (float *__p, float __value)
  • void eeprom_write_block (const void *__src, void *__dst, size_t __n)

As you probably already know, int variables are stored on 2 bytes, and are hence 16 bits long. Also good to know, the type uint16_t is actually (and basically) … an unsigned int !

Value from analog reading storage example

Let’s say you’d like to store into your EEPROM a value read from one of your analog inputs, which will be an integer between 0 and 1023 (10-bit ADC) :

#include <avr/interrupt.h>
#include <avr/eeprom.h>

int addr = 1;
int sensorValue = 0;
int readValue = 0;

void setup() {
 Serial.begin(9600);
 while (!eeprom_is_ready()); // Wait for EEPROM to be ready
 cli();
 eeprom_write_word((uint16_t*)addr, sensorValue); // Let's initialize our value into EEPROM
 sei();
}

void loop() {
 sensorValue = analogRead(A0); // Value between 0 and 1023

 while (!eeprom_is_ready());
 cli();
 if(eeprom_read_word((uint16_t*)addr) != sensorValue) {
  eeprom_write_word((uint16_t*)addr, sensorValue);
 }
 sei();

 while (!eeprom_is_ready());
 cli();
 readValue = eeprom_read_word((uint16_t*)addr); // => sensorValue
 sei();

 Serial.print("Sensor value = ");
 Serial.println(readValue);
 delay(1000);
}

Before you begin

  • Always check that EEPROM is ready before reading/writing to it (eeprom_is_ready function)
  • EEPROM storage can handle a limited number of erase / write cycles : about 100,000 according to Atmel’s specifications
  • Always prefer the “update” functions rather than the “write” ones, as “update” first checks whether the stored data differs from the new data, and erases / writes only if it has changed. Edit on June 19th : update functions are not implemented in Arduino’s EEPROM library ; however, you may (as updated in the example above) check first whether the value has changed before writing, or even use the EEPROMex alternative EEPROM Arduino library.
  • Read / write operations on EEPROM should never be interrupted : you should always disable/clear interrupts (cli()) before any operation and re-enable/set interrupts afterwards (sei()).

1 KB’s not enough ? External I2C EEPROM

One kilobyte of EEPROM storage may not be enough for your project : there are several external EEPROM chips available, such as Microchip’s 24AA256, a 32 KB I2C-enabled EEPROM.
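As a teaser, here is a minimal sketch of how such a chip could be driven with Arduino’s Wire library (assumptions : the EEPROM’s address pins are tied to ground, hence I2C address 0x50, and only byte-by-byte access is shown, leaving page writes aside) :

#include <Wire.h>

const byte EEPROM_I2C_ADDR = 0x50; // 24AA256 with A0/A1/A2 tied to GND (assumption)

void extEepromWrite(unsigned int memAddr, byte value) {
  Wire.beginTransmission(EEPROM_I2C_ADDR);
  Wire.write((byte)(memAddr >> 8));   // memory address, high byte
  Wire.write((byte)(memAddr & 0xFF)); // memory address, low byte
  Wire.write(value);
  Wire.endTransmission();
  delay(5); // let the chip finish its internal write cycle
}

byte extEepromRead(unsigned int memAddr) {
  Wire.beginTransmission(EEPROM_I2C_ADDR);
  Wire.write((byte)(memAddr >> 8));
  Wire.write((byte)(memAddr & 0xFF));
  Wire.endTransmission();
  Wire.requestFrom(EEPROM_I2C_ADDR, (byte)1);
  return Wire.available() ? Wire.read() : 0;
}

void setup() {
  Serial.begin(9600);
  Wire.begin();
  extEepromWrite(0, 42);
  Serial.println(extEepromRead(0)); // should print 42
}

void loop() {}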

Arduino / ATMega and interrupts

As you certainly already know, Arduino boards (Uno, Mega, Due, etc.) allow us to handle interrupts.

The Arduino Uno (based on Atmel’s ATMega328 microcontroller) can handle two external interrupts on its pins INT0 and INT1, mapped to Arduino’s D2 and D3 (respectively pins 4 and 5 of the ATMega328 in PDIP package).

External interrupt pins on an Arduino Uno

Interesting fact : the ATMega328 (and therefore the Arduino Uno) can handle pin change interrupts on 20 of its pins ; however, handling these interrupts is not as simple as it is with external ones : you need to determine which pin generated the interrupt, for which reason, etc. Good thing, an Arduino library exists to help us handle these interrupts : arduino-pinchangeint.

Interrupts can be triggered in 4 modes :

  • LOW : pin is in a low state
  • RISING : pin state goes from low to high
  • FALLING : pin state goes from high to low
  • CHANGE : pin state changes

One line of code is enough to “listen” for an interrupt on Arduino ; for example, on pin INT0 (which is D2), we attach an interrupt that will call the method “myInterrupt” when the pin’s state goes from LOW to HIGH :

attachInterrupt(0, myInterrupt, RISING);

Please notice that although the Arduino pin is “D2”, we define here pin “0” which is the interrupt pin number (0 for INT0 / D2, 1 for INT1 / D3).

We now define the method that will be called by the interrupt (this method takes no argument and returns nothing) :

void myInterrupt() {
  // do something ...
}

A few limitations

As interrupts are based on your microcontroller’s timers, the delay() method won’t work and millis() won’t increment within the method attached to an interrupt.

More generally, it is not recommended to run time-based operations within your interrupt handlers, as they may hang your µC ; for example, serial data transmission (UART), I2C, etc.

Best practices

Using interrupts is very useful to detect user actions, such as a button press or a keypad entry, or even to detect a fast state change (e.g. an infrared signal being cut), without having to constantly poll a pin’s state.

Basically, a method attached to an interrupt should be as short and fast as possible : a good practice consists in using the interrupt to set a flag in a variable (declared as “volatile”). The execution of the matching action is then done within the main loop.

For example :

volatile int change = 0;

void setup() {
  attachInterrupt(0, myInterrupt, RISING);
}

void loop() {
  if(change == 1) {
    // do something ...
    change = 0;
  }
}

void myInterrupt() {
  change = 1;
}


Standalone Arduino (ATMega328) on a breadboard

The Arduino Uno board is really amazing for prototyping, but you will soon need to build your own Arduino-like “board” based on an ATMega328 microcontroller, especially for projects where available space is limited and you can’t fit an Uno with its shields.

The following Fritzing schema shows how to wire an ATMega328 (with Arduino bootloader installed) on a breadboard (click on schema for full-size image) :

Standalone Arduino on a breadboard

Parts list

These are the components you’ll need :

All these components cost no more than US$ 7.50. We can do even better and reduce the bill by US$ 1.20 by flashing a bare ATMega328 (US$ 4.30), and by another US$ 0.50 by using a ceramic resonator so the two 22 pF caps are not needed : that’s as low as US$ 5.80 for an Arduino Uno clone !

Not mandatory but nice to have : an additional ~220 Ω resistor and an LED will be useful for testing purposes.

If you do not have one yet, you’ll also need a 5V USB to Serial converter (FTDI cable, US$ 17.95, or breakout board, US$ 14.95), which you can wire as shown on the schema :

  • VCC to VCC (+5V)
  • GND to GND
  • RX to TX (pin 3)
  • TX to RX (pin 2)
  • DTR to Reset via a 100 nF (0.1 µF) capacitor

ATMega328 Arduino Uno pinout

You will probably be using Arduino’s native functions to deal with I/O pins, so here is a table mapping the ATMega328’s pins to Arduino’s :

ATMega328 Arduino pinout(Schema from Sparkfun)

Arduino : howto master to master I2C

I’ve been working for many weeks (months, actually) on designing a home automation “box” project, and could hardly find a way to get my Arduinos (actually bare ATMega328s + Arduino’s Optiboot bootloader) communicating together without having the (physical) master continuously poll all the other µCs.

After lots of googling, I finally got an answer thanks to the official Arduino Forum : I should use multi-master I2C to allow all my Arduinos to talk to each other (and, in doing so, interrupt the receivers, just as would be the case with slaves).

I2C logo

Doing this is in fact pretty simple : you only need to use Arduino’s official Wire (I2C) library as follows.

Master #1

#include <Wire.h>

#define I2C_ADDRESS_OTHER 0x2
#define I2C_ADDRESS_ME 0x1

void setup() {
 Serial.begin(9600);
 Wire.begin(I2C_ADDRESS_ME);
 Wire.onReceive(receiveI2C);
}

void loop() {
 delay(5000);
 Wire.beginTransmission(I2C_ADDRESS_OTHER);
 Wire.write("hello world from 0x1 to 0x2");
 Wire.endTransmission();
}

void receiveI2C(int howMany) {
 while (Wire.available() > 0) {
  char c = Wire.read();
  Serial.print(c);
 }
 Serial.println();
}

Master #2

#include <Wire.h>

#define I2C_ADDRESS_OTHER 0x1
#define I2C_ADDRESS_ME 0x2

void setup() {
 Serial.begin(9600);
 Wire.begin(I2C_ADDRESS_ME);
 Wire.onReceive(receiveI2C);
}

void loop() {
 delay(5000);
 Wire.beginTransmission(I2C_ADDRESS_OTHER);
 Wire.write("hello world from 0x2 to 0x1");
 Wire.endTransmission();
}

void receiveI2C(int howMany) {
 while (Wire.available() > 0) {
  char c = Wire.read();
  Serial.print(c);
 }
 Serial.println();
}

That’s all ! Now connect the I2C pins (A4 [SDA] and A5 [SCL]) of both Arduino Uno boards together, not forgetting pull-up resistors (1.2 kΩ is fine) on both SDA & SCL.

You should then see “hello world” messages sent through serial on both of your Arduino masters.

Graph your sensors data with RRDtool

RRDtool logo

As I am, you may be using your Arduino to gather data from sensors. These could be analog sensors (temperature, humidity, light, infrared receiver) or digital sensors (door switch (hall effect), mechanical switch, etc.).

I chose to store all these data in a MySQL database (over a network connection) so I can process them later as I wish. Of course, depending on how often you poll your sensors, you may quickly end up with hundreds of thousands or even millions of entries in your database.

After giving a try to Amcharts, a Javascript graphing library, I eventually decided to use RRDtool to graph my data. That choice allows me to focus exclusively on graph generation, without worrying about how data is fetched, whether it is time based or not, etc.

In this article, I’ll be specifically covering temperature databases / graphing, as these are the most common (and useful) data collected.

Creating the Round Robin Database

In your favorite terminal, type the following (in your working directory, e.g. your home directory) :

rrdtool create temperatures.rrd \
--step 300 \
-b 123456789 \
DS:temp1:GAUGE:600:0:50 \
DS:temp2:GAUGE:600:0:50 \
RRA:MAX:0.5:1:288

Let’s explain the above command line :

  • rrdtool create temperatures.rrd : create database in file temperatures.rrd
  • --step 300 : we expect at most 300 seconds (5 minutes) between each value
  • -b 123456789 : (optional) useful if you plan to insert older values (with a timestamp in the past) ; otherwise, rrdtool will only accept values newer than the database creation date (replace 123456789 with the oldest timestamp of your data)
  • DS:temp1:GAUGE:600:0:50 :
    • add a data source called temp1 which has values between 0 and 50 (large enough for inside temp in degrees Celsius) ;
    • GAUGE means this is absolute data that shouldn’t be modified in any way by rrdtool, which is the best option for temperatures ;
    • 600 is the heartbeat timeout value in seconds : if no data is added within this interval, an unknown value will be inserted (which will show up as a blank on your graphs)
  • DS:temp2:GAUGE:600:0:50 : same thing as temp1 data source, you may add as many as needed
  • RRA:MAX:0.5:1:288 : Round Robin Archive, we define how much data we will store and for how long ;
    • MAX means that only one value (the maximal) should be taken if several are available ;
    • 0.5 should be kept as is (internal resolution) ;
    • 1 specifies that only one step is necessary to store final value, no average is made ;
    • 288 is the number of steps that we will store in our database ; in our case, 288 × 300 = 86400 seconds = 1 day ; you may for example set this value to 2016 (7 days)

You should now have a file called temperatures.rrd in your current directory.

Adding data to the rrd database (rrdtool update)

This step is the easiest one. In your terminal, type :

rrdtool update temperatures.rrd N:22:23

A few explanations of this command line :

  • temperatures.rrd : database we’re adding data to
  • N:22:23 :
    • N : use the current time as timestamp ; you may specify here any unix timestamp you want (as long as it’s after the creation or start date of your rrd database)
    • 22 : value for the first data source (temp1)
    • 23 : value for the second data source (temp2)

Note that you may also use the -t modifier to specify the data sources you’re supplying data for. For more details, you can refer to the official documentation of rrdupdate.

To keep your rrd database up to date, the best option is to set up a cron task that will update it every 5 minutes with the latest data available.
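For instance, a crontab entry along these lines (the update script and its path are hypothetical ; it would fetch the latest values, e.g. from MySQL, and call rrdtool update) :

# m h dom mon dow command
*/5 * * * * /usr/local/bin/update-temperatures.sh >/dev/null 2>&1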

Let’s make it to the next step : graphing the data !

Generating your graph

This part is where all the magic happens.

rrdtool graph temp_graph.png \
-w 785 -h 120 -a PNG \
--slope-mode \
--start -604800 --end now \
--vertical-label "temperature (°C)" \
DEF:temp1=temperatures.rrd:temp1:MAX \
DEF:temp2=temperatures.rrd:temp2:MAX \
LINE1:temp1#ff0000:"temp 1" \
LINE1:temp2#0000ff:"temp 2"

Explanations of the above command line :

  • rrdtool graph temp_graph.png : generate the graph in a file called temp_graph.png in the current directory
  • -w 785 -h 120 -a PNG : width 785, height 120, PNG format
  • --slope-mode : smooth line
  • --start -604800 --end now : the graph begins 7 days ago (604800 seconds) and ends now
  • --vertical-label "temperature (°C)" : vertical axis label
  • DEF:temp1=temperatures.rrd:temp1:MAX : we are using temp1 data source from temperatures.rrd
  • DEF:temp2=temperatures.rrd:temp2:MAX : same as temp1
  • LINE1:temp1#ff0000:"temp 1" : draw temp1 as a red line with the "temp 1" label
  • LINE1:temp2#0000ff:"temp 2" : draw temp2 as a blue line with the "temp 2" label

If you’re working on a remote server, copy your newly created graph to a directory in your webserver’s path and reach it via its URL. Tada !

Here are my 4 temp sensors values graphed :

Temperatures RRDtool graph

Thanks to this tutorial from Calomel.org for helping me understand RRDtool better, and of course, do not hesitate to have a look at the official RRDtool documentation.