Collect & visualize your logs with Logstash, Elasticsearch & Redis

Update, December 6th: although Logstash does the job as a log shipper, you might consider replacing it with Lumberjack / Logstash Forwarder, which needs far fewer resources, and keeping Logstash on your indexer to collect, transform and index your log data (into Elasticsearch): check out my latest blog post on the topic.

Kibana Dashboard

Even if you manage a single Linux server, you probably already know how hard it is to keep an eye on what’s going on with it, especially when it comes to tracking log data. And it only gets worse when you have several (physical or virtual) servers to administer.

Although Munin is very helpful for monitoring various metrics from my servers / VMs, I felt the need for something more, a bit less static and more interactive.

There are 3 kinds of logs I especially wanted to track:

  • Apache 2 access logs
  • iptables logs
  • Syslogs

After searching around on the internet for a tool that would help me, I read about the open source log management tool Logstash, which seemed to perfectly suit a (major) part of my needs: log collection and processing.

For the purpose of this post, I will take the following network architecture and assume that I want to collect my Apache, iptables and system logs from servers 1/2/3 (“shippers”) on server 4 (“indexer”) and visualize them:

Logstash architecture

As you can see, I am using 4 complementary applications, the role of each being:

  • Logstash: log collector, processor and shipper (to Redis) on “shippers” 1-3; log indexer on server 4 (reads from Redis, writes to Elasticsearch)
  • Redis: log data broker, receiving data from “shippers” 1-3
  • Elasticsearch: log data persistent storage
  • Kibana: (time-based) log data visualization (graphs, tables, etc.)

Installation

As shown on the schema above, I will describe how to install Logstash, Redis, Elasticsearch and Kibana all on the same “indexer” server. You may want to split these across different servers for any reason; if so, just set the correct IPs / hostnames accordingly in the examples below.

Redis

First of all, let’s install Redis on our indexer server (right, that’s #4 on the schema). As the versions of Redis available in Linux distributions’ repositories are not up to date, we’ll download and build the latest stable release from Redis’ website:

$ sudo aptitude install gcc make
$ wget http://download.redis.io/releases/redis-2.6.16.tar.gz
$ tar xzf redis-2.6.16.tar.gz
$ cd redis-2.6.16
$ make MALLOC=libc
$ sudo cp src/redis-server /usr/local/bin/
$ sudo cp src/redis-cli /usr/local/bin/

Launch redis-server (sudo redis-server), then try to ping Redis to check that the server is working:

$ redis-cli ping

If you get a PONG reply, your Redis server works fine. You might want to install Redis more properly; if so, follow the excellent installation guide at Redis.io.

You’re now ready to ship log data from your servers to Redis. Note that Redis listens on its default port (tcp/6379) and accepts incoming connections from any IP:

$ netstat -tanpu|grep redis
tcp   0   0   0.0.0.0:6379   0.0.0.0:*   LISTEN   16955/redis-server

Logstash (shippers)

You will need to set up an instance of Logstash on each server you want to collect data from; it will act as a “log shipper”.

Open a shell on one of the servers you want to collect log data from, and download Logstash:

$ sudo mkdir /opt/logstash /etc/logstash
$ cd /opt/logstash
$ sudo wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.2-flatjar.jar

Create a test config file in /etc/logstash (e.g. logstash-test.conf, used below):

input { stdin { } }
output { stdout { codec => rubydebug } }

Now launch the Logstash agent and type something; you should get an output like this:

$ java -Xmx256m -jar logstash-1.2.2-flatjar.jar agent -f /etc/logstash/logstash-test.conf
hello world
{
  "message" => "hello world",
  "@timestamp" => "2013-11-17T18:35:56.672Z",
  "@version" => "1",
  "host" => "myhostname"
}

Logstash works fine; let’s now configure it to work with our previously-installed Redis instance. Create a new config file (e.g. logstash-redis.conf):

input { stdin { } }
output {
  stdout { codec => rubydebug }
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

You’ll of course need to replace “10.0.0.5” with the IP of the server your Redis instance is running on.

Launch the Logstash agent with logstash-redis.conf as its config file and type something as above.
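
The shipper-side command is the same as before, only pointing at the new config file (path assumed):

$ java -Xmx256m -jar logstash-1.2.2-flatjar.jar agent -f /etc/logstash/logstash-redis.conf
hello redis

Then, on your indexer server (where Redis is installed), launch redis-cli: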

redis 127.0.0.1:6379> LLEN logstash
(integer) 1
redis 127.0.0.1:6379> LPOP logstash
"{\"message\":\"hello redis\",\"@timestamp\":\"2013-11-17T20:35:13.910Z\",\"@version\":\"1\",\"host\":\"myhostname\"}"

Here it is: our message was transmitted by Logstash to our Redis server. You’ve probably noticed that Logstash added a few fields to our initial (minimalistic) data (@timestamp, @version and host).

Now that we’ve got Logstash able to send data to Redis, we can begin processing our Apache 2 and iptables logs.

Apache 2 logs processing

Create a new config file in /etc/logstash:

input {
  file {
    path => "/var/log/apache2/*access.log"
    type => "apache"
  }
}

filter {
  if [type] == "apache" {
    grok {
      pattern => "%{COMBINEDAPACHELOG}"
    }
  }
}

output {
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

This config is quite self-explanatory; a few things worth noting though:

  • type => “apache” allows us to use conditionals further on
  • pattern => “%{COMBINEDAPACHELOG}” is a built-in regex-like pattern used to match our Apache log lines and extract their fields (request, host, response, etc.)

Launch the Logstash agent, and you’re done. It’s that simple! You should now see the logstash list count grow in Redis (LLEN logstash) as your Apache gets hits.
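
On the indexer, a quick check from redis-cli might look like this (the count will of course vary):

redis 127.0.0.1:6379> LLEN logstash
(integer) 42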

iptables logs processing

There is no built-in grok pattern available to extract data from iptables logs, but there’s one available in Logstash’s cookbook config snippets.

Create a directory where you will keep your custom grok patterns (e.g. /usr/share/grok/patterns) and create in it a new file called iptables:

# Source : http://cookbook.logstash.net/recipes/config-snippets/
NETFILTERMAC %{COMMONMAC:dst_mac}:%{COMMONMAC:src_mac}:%{ETHTYPE:ethtype}
ETHTYPE (?:(?:[A-Fa-f0-9]{2}):(?:[A-Fa-f0-9]{2}))
IPTABLES1 (?:IN=%{WORD:in_device} OUT=(%{WORD:out_device})? MAC=%{NETFILTERMAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip}.*(TTL=%{INT:ttl})?.*PROTO=%{WORD:proto}?.*SPT=%{INT:src_port}?.*DPT=%{INT:dst_port}?.*)
IPTABLES2 (?:IN=%{WORD:in_device} OUT=(%{WORD:out_device})? MAC=%{NETFILTERMAC} SRC=%{IP:src_ip} DST=%{IP:dst_ip}.*(TTL=%{INT:ttl})?.*PROTO=%{INT:proto}?.*)
IPTABLES (?:%{IPTABLES1}|%{IPTABLES2})

You’ll also need to declare this directory in Logstash’s config file (see below). Now let’s process our iptables logs; create or edit a Logstash config file:

input {
  file {
    path => [ "/var/log/syslog" ]
    type => "iptables"
  }
}

filter {
  if [type] == "iptables" {
    grok {
      patterns_dir => "/usr/share/grok/patterns"
      pattern => "%{IPTABLES}"
    }
  }
}

output {
  # Check that the processed line matched against grok iptables pattern
  if !("_grokparsefailure" in [tags]) {
    redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
  }
}

Actually, despite the very useful Grok Debugger, I couldn’t get this pattern to work. Plus, you will have to guess one way or another whether the log line is a REJECT, DROP, ACCEPT or whatever.

To make this simpler, you may use iptables rules like these:

iptables -N LogAndDrop
iptables -A LogAndDrop -p tcp -j LOG --log-prefix "RULE 1 -- DROP " --log-level=info
iptables -A LogAndDrop -j DROP

You can also create similar chains for REJECT / ACCEPT following this model. To use such a chain, jump to it from your regular rules, as in the example below.
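
A minimal (hypothetical) rule using the chain, logging and dropping incoming SSH connections from a given subnet:

iptables -A INPUT -s 203.0.113.0/24 -p tcp --dport 22 -j LogAndDrop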

The good thing is that your iptables log lines will now be prefixed with “DROP” (or REJECT / ACCEPT), allowing you to process these log lines differently, for instance measuring ACCEPT vs. DROP/REJECT counts. Here is the grok pattern you can use:

IPTABLES (.*RULE \d? -- (%{WORD:action})?.*SRC=(%{IP:src_ip}).*DST=(%{IP:dst_ip}).*PROTO=(%{WORD:protocol}).*SPT=%{INT:src_port}?.*DPT=%{INT:dst_port}?.*)

The following fields will be extracted from your iptables logs (see the sample log line after the list):

  • action = depending on what you set in your custom iptables rules, may be REJECT, DROP, ACCEPT …
  • src_ip = source IP address
  • dst_ip = destination IP address
  • protocol = protocol (TCP, UDP, ICMP, etc.)
  • src_port = source port number
  • dst_port = destination port number
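
For illustration, a made-up log line produced by the DROP rule above would look like this and match the pattern:

Nov 17 21:03:44 myhostname kernel: RULE 1 -- DROP IN=eth0 OUT= MAC=00:0c:29:aa:bb:cc:00:50:56:dd:ee:ff:08:00 SRC=203.0.113.12 DST=10.0.0.5 LEN=60 TTL=53 PROTO=TCP SPT=51123 DPT=22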

You’ll probably notice that not all the data available in the logs is extracted; feel free to adapt the grok pattern to your specific needs.

Note that if you decide to create a “log & accept” iptables chain, it’s definitely NOT a good idea to systematically use it instead of the regular ACCEPT target. Rather, use it to track connections from specific IP address ranges, for example.

System logs (syslog) processing

Edit your existing Logstash config file or create a new one:

input {
  file {
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
    type => "syslog"
  }
}

output {
  redis { host => "10.0.0.5" data_type => "list" key => "logstash" }
}

As each log line may have a different format, they will each be stored “as is” in the “message” field in Elasticsearch. Anyway, this will not prevent you from analyzing this data (for example, getting the number of (un)successful authentications from auth.log).
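
If you want at least the standard syslog fields (timestamp, host, program, pid) extracted, you can add a grok filter; a minimal sketch, assuming the built-in SYSLOGLINE pattern suits your log format:

filter {
  if [type] == "syslog" {
    grok {
      pattern => "%{SYSLOGLINE}"
    }
  }
}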

Elasticsearch

Thanks to a Debian package available on Elasticsearch’s official download page, a few commands are enough to get it up and running:

$ sudo aptitude install openjdk-7-jre-headless
$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.deb
$ sudo dpkg -i elasticsearch-0.90.7.deb

Elasticsearch should have started automatically; open your browser and reach http://yourhostname:9200/. If everything went fine, you should get a JSON response looking like this:

{
  "ok": true,
  "status": 200,
  "name": "Alibar",
  "version": {
    "number": "0.90.7",
    "build_hash": "36897d07dadcb70a865b7f149e645ed3d44eb5f2",
    "build_timestamp": "2013-11-13T12:06:54Z",
    "build_snapshot": false,
    "lucene_version": "4.5.1"
  },
  "tagline": "You Know, for Search"
}
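
The same check can be done from a shell, e.g. curl http://localhost:9200/.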

If necessary, you can tune Elasticsearch’s run parameters in /etc/default/elasticsearch and configuration parameters in /etc/elasticsearch/[elasticsearch,logging].yml.
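
For instance, to give Elasticsearch a bigger JVM heap (value purely illustrative, adjust to your available RAM), you could set in /etc/default/elasticsearch:

ES_HEAP_SIZE=512m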

Note for OpenVZ users

After (too many) hours of searching and trying various configurations, I still couldn’t get Elasticsearch running in an OpenVZ Debian container; or more precisely, it wouldn’t listen for incoming (HTTP) connections on its default port 9200 (visible process, but nothing in netstat).

It actually seems to be a common issue with Java running in an OpenVZ container, and I finally found a solution in this post from OpenVZ forums.

In short, edit your CT config file (usually /etc/vz/conf/CTID.conf), comment out the CPUS line and add a CPULIMIT line as follows:

CPUUNITS="1000"
# CPUS="1"
CPULIMIT="100"

(Re)start your container; Elasticsearch should now work fine.

Logstash (indexer)

Thanks to a comment from DJP78, I realized that I had forgotten to explain how to configure Logstash on the indexer side: pulling log data from Redis and storing it into Elasticsearch.

Here is the Logstash config you can use (note that I also process the local [indexer] system logs):

input {
  file {
    type => "syslog"
    path => [ "/var/log/*.log", "/var/log/messages", "/var/log/syslog" ]
  }
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
output {
  elasticsearch { bind_host => "127.0.0.1" }
}

You can check that Logstash is correctly doing its job on the indexer by either watching the list size decrease in Redis (redis-cli, then LLEN logstash) or querying your Elasticsearch indices over HTTP, for instance http://yourElasticSearchHostname:9200/logstash-*/_search?sort=@timestamp:desc.
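
From a shell on the indexer, these checks might look like this (output illustrative):

$ redis-cli LLEN logstash
(integer) 0
$ curl 'http://localhost:9200/logstash-*/_search?sort=@timestamp:desc&size=1'

An empty (or shrinking) Redis list combined with search hits in Elasticsearch means events are flowing all the way through.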

Kibana

Finally, let’s install Kibana. Kibana is a modern & dynamic (AngularJS-based) frontend for Logstash / Elasticsearch, allowing you to build charts, tables, etc. from your collected logs data.

All you need to run Kibana is an HTTP web server and access to Elasticsearch’s port 9200 (from your browser).

Its installation is quite straightforward:

$ sudo aptitude install git
$ cd /var/www
$ git clone https://github.com/elasticsearch/kibana.git kibana

Now open http://yourhostname/kibana/ in your browser. Tada !

Note that if Elasticsearch is not installed on the same server (or available through the same hostname) as Kibana, you’ll need to configure its hostname (and possibly port) in config.js at Kibana’s root, as shown below.
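
The relevant line in config.js looks like this (hostname assumed, adapt to your setup):

elasticsearch: "http://yourElasticSearchHostname:9200",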

On first launch, Kibana offers to load a “Logstash dashboard”; click on the link. You can now see your logs data in a table; try to activate some useful fields in the left column, or create your first graph :-).
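
You can also filter events from Kibana’s query box using Lucene syntax; for instance, a (hypothetical) query like type:"apache" AND response:404 would keep only Apache 404 hits.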

tl;dr

  • Download Logstash on all your “shippers” and your “indexer”
  • Install and launch Redis on your “indexer”
  • Install and launch Elasticsearch on your “indexer”
  • Clone Kibana git repository on your “indexer” in /var/www
  • Create Logstash config files for your shippers and indexer (see above), launch all Logstash instances

About 

Freelance PHP Symfony2 & Magento developer, passionate about programming and electronics.

Comments

  • Ccryshna

    Can I use it for a production environment? Server: Ubuntu based? Awaiting your response, thanks in advance.

    • Bruno Andrade

      yes, you can use it in your production environment but you need to keep two things in mind… the quantity of logs your environment generates, and buffering in the Logstash shippers.
      ElasticSearch is written in Java, so if your infrastructure generates a massive quantity of data/logs, you’ll get performance issues; but that can be fixed by increasing the JVM heap size and adding more ElasticSearch nodes.

  • DJP78

    Nice article, but I think you missed one Logstash instance: the one which pulls logs from Redis and pushes them to Elasticsearch!

    • Michael BOUVY

      Indeed, you’re totally right. I just added the missing part to the post 😉

      • DJP78

        Oh, many thanks. Sounds good to me now.

        I wonder what the purpose is of putting Redis between the Logstash shipper (remote) and ES. What are the pros/cons of doing directly RSYSLOG (remote) -> Logstash (codec syslog) -> ES?

        • Michael BOUVY

          Actually, Apache2 logs are not syslogs, and it does not seem to be that easy to send them to a remote syslog server.

          Anyway, your point is right, we could definitely use Logstash’s syslog input plugin to receive syslogs from (in my example) shippers, without needing to run a Logstash instance on each shipper.

  • Very useful. Thank you so much!

    • renu

      Had you configured it on your server…?

  • Pingback: Links & reads for 2013 Week 47 | Martin’s Weekly Curations

  • Pingback: Ship logs to Logstash with Lumberjack / Logstash Forwarder | Michael Bouvy

  • Colin Burnell

    Hi, I have tried to install, although I have had problems at the first hurdle installing Redis. Can anyone assist? Regards

    Colin

  • Kushal

    Hello,

    I am able to receive logs over TCP in the Logstash server and also able to dump them into Elasticsearch.

    Can you guide me on how I can filter the received logs and add them to a specific index and type, so that I can search them in Elasticsearch?

    Also, I need your help to create a geoip point in Elasticsearch by filtering logs.

    Thanks .

  • wolkus

    Thanks for the excellent article on Logstash and ES. I had a problem in processing logs. In my environment I am downloading logs into a folder which the Logstash input plugin is set to watch. I have a cron job that deletes older logs from the folder. Now what is happening is that some of the log files are not read by Logstash. Can you help me solve this?

  • Dowwie

    Since both Redis and ES are key/value NoSQL data stores, isn’t it redundant to have both in the stack? Did you look into consolidating to one?

    • Casper

      Redis feeds input into logstash, not ES directly. It’s useful to be able to restart logstash without losing data.

  • Rico Lelina

    I’m new to logstash and ElasticSearch and a little confused. We have an existing app that sends metrics data to Solr. We now need to replace Solr with ElasticSearch via logstash. I was thinking there would be a Java or REST API for logstash so I can just make a Java call or REST/HTTP call to send data. Is there such a thing?

  • Marcel van den Brink

    Can you elaborate on the hardware requirements? What can you advise… or what did you use?

  • JD

    Can you help me reading events from places.sqlite using the sqlite plugin? Usually the problem is the locking nature of .sqlite database.

  • tdecaux

    Pay attention: the timestamp indexed in ES is not the log event date/time BUT the Logstash fetch date/time… Also, the default mapping is poor; you have to write your own (bytes sent => integer, response status => integer, timestamp => date, etc.), else it’s useless. Also, the default analyzer may not suit your needs, so you can customize it.

  • Pingback: Implementando a stack ELK (ElasticSearch, Logstash e Kibana) no CentOS | Ricardo Martins

  • Ryan

    Quick question:

    If I had to ship syslog as well as Apache logs from a single host to a broker, do I need to create multiple Logstash configuration files on that host under /etc/logstash? Ex: /etc/logstash/logstash-Apache.conf & /etc/logstash/logstash-syslog.conf?

    Regarding Elasticsearch in your setup, did you rely on the default mapping or did you have to customize or create a custom mapping?

  • Thank you so much for such a nice write-up! Do ur best!

  • cyqui

    Wow, another nice blog post. Gonna definitely refer to it later and give a try to this complete recipe.

  • Mahendiran Vel

    Hi all,

    How can I get the response time from the Apache logs?

  • Ding Lei

    Hi, Could you explain more about why put the Redis between forwarder and Logstash?

  • David Li

    Hello Michael, great work! Just one question: what is the purpose of Redis? If I want to use one instance of Logstash as processor and shipper, and the other one as indexer like you did, can I ship JSON from one Logstash instance to the other directly? If yes, do you have any suggestions on how to do that? Thanks!

  • Andrés Álvarez

    Awesome post! Extremely helpful.

    I was wondering if it’s possible to make an architecture that removes the Logstash shippers and makes the Logstash agent on the indexer server collect the logs from the other machines where the shippers used to be? This way it would not be necessary to install a Logstash instance on every other machine.

  • renu

    I just want to reload persistence logs in Logstash; could you help me out with it?

  • Robin Ersek-Obadovics

    If you find Logstash isn’t working out for you for any reason, you may still consider using NXLog – https://nxlog.co/products/nxlog-community-edition – the open source log management software, which is available as a free download. It scales well and provides high-performance even when running on thousands of servers. As a multi-platform tool it also allows log collection from Linux, Windows, Android and more operating systems. A great alternative, especially if you are looking for such features.