Setting up fluentd elasticsearch and Kibana on azure

A decent way too go through log files can come handy in various ways. From debugging bugs, investigating problems on production environment to figuring out where performance bottle necks are. For simple cases when there is only one log file involved you can go very far with using grep/awk/sed magic or any decent editor capable of handling regex in large files.

It becomes tricky however when your logs are scattered through several files, maybe use different log format or stamp entries with different timezones. Fortunately there are services like Splunk, Loggly and many others that take away great amount of pain involved in aggregating distributed log entries and searching through them.

I was searching once for a free alternative that I could quickly get up and running in our test environment where we had 10+ machines running 20+ web sites or services that we’re all part of a single product. That’s when I found out about Kibana, logstash and how they make use of elasticsearch.

Back then I considered using fluentd instead of logstash however most of the machines we were running windows and that is not where fluentd shines. Nowadays I’m using linux vm mostly that’s why I decided to use fluentd.

Setup a vm for elasticsearch

We need a machine that will store our aggregated log entries so let’s create one using Azure Cross-Platform Command-Line interface. After installing it with npm install azure-cli -g we need to introduce ourself to it. The simplest way to do this is to issue azure account download which will open a browser to download your profile: After that you’ll need to import the downloaded file with npm account import /path/to/your/credentials.publishsettings.

I’m used to ubuntu so let’s find an machine image to use:

azure vm image list | grep Ubuntu-14
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140122.1-alpha2-en-us-30GB                          Public    Linux  
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140226.1-beta1-en-us-30GB                           Public    Linux  
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140326-beta2-en-us-30GB                             Public    Linux  
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140414-en-us-30GB                                   Public    Linux  
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140416.1-en-us-30GB                                 Public    Linux  
data:    b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140528-en-us-30GB                                   Public    Linux

The last one looks like most recent so we’ll used that. Next we need to create a virtual machine. We can use either password authentication (which isn’t recommended) or a ssh certificate. To use the latter we need to create a certificate key pair first. Openssl command line tool will do it for us:

openssl req -x509 -nodes -days 730 -newkey rsa:2048 -keyout ~/.ssh/mymachine.key -out ~/.ssh/mymachine.pem

It’s now time to issue azure to create a new instance of Ubuntu machine for us:

azure vm create machine-dns-name b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140528-en-us-30GB machineuser \
  --vm-size extrasmall \
  --location 'North Europe' \
  --ssh 22 --no-ssh-password --ssh-cert ~/.ssh/mymachine.pem

info:    Executing command vm create
+ Looking up image b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04-LTS-amd64-server-20140528-en-us-30GB
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info:    vm create command OK

We can find out more details about our newly created machine with: azure vm list --dns-name machine-dns-name --json. Our machine is up and running to so we can log into it using ssh (before you do that be sure to change .key file permission to 0600 so that only you can access it): ssh -i ~/.ssh/mymachine.key [email protected]

Install elasticsearch

First let’s download elasticsearch and extract it to our newly created vm:

curl https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.0.zip -o elasticsearch-1.2.0.zip
unzip elasticsearch-1.2.0.zip

To run elasticsearch we’ll need java in version at least 1.7:

sudo apt-get update
sudo apt-get install default-jre

Now we can finally run elasticsearch:

./elasticsearch-1.2.0/bin/elasticsearch

If we want to run elasticsearch as a daemon we can pass -d to the script. (see below for init.d script)

We also need to open port for elasticsearch:

azure vm endpoint create --endpoint-name elasticsearch machine-dns-name 9200 9200

We only wan’t our machines to be able to push logs to elasticsearch that’s why we need to add ACL rules to the endpoint we created above. Unfortunately I couldn’t find a way to do this through CLI interface so we need to resort to web interface.

Install Kibana

Kibana is a web application designed to perform elasticsearch searches and display them using neat interface. It communicates with elasticsearch directly that’s why it’s not really suited to be deployed to publicly accessible place. Fortunately there is kibana-proxy which provides authentication on top of it. To run kibana-proxy we’ll need node.js let’s install both of them:

sudo apt-get update
sudo apt-get install git
sudo apt-get install nodejs
sudo apt-get install npm

git clone https://github.com/hmalphettes/kibana-proxy.git
cd kibana-proxy
git submodule init
git submodule update
npm install
node app.js &

kibana-proxy is now running but in order to access it from outside we need to open firewall port:

azure vm endpoint create --endpoint-name kibana-proxy machine-dns-name 80 3003

Now when you point your browser to http://machine-dns-name/ you should see kibana welcome screen.

Restrict access to kibana-proxy

You probably don’t want anyone to be able to see your logs that’s why we need to setup authentication. kibana-proxy implements this through Google OAuth - so let’s enable it for our installation. First we’ll need APP_ID (Client ID) and APP_SECRET (Client secret) so let’s go to google developer console and create a new project. Next we need to enable OAuth for our project in APIs & auth > Credentials:

Make sure AUTHORIZED REDIRECT URI looks like http://machind-dns-name.cloudapp.net/auth/google/callback. For more secure setup you should be using https but for simplicity we’ll skip that.

Now we need to start kibana-proxy again this time passing it APP_ID, APP_SECRET along with AUTHORIZED_EMAILS through environment variables (kibana-proxy-start.sh):

export APP_ID=fill_me_appid
export APP_SECRET=fill_me_app_secret
export AUTHORIZED_EMAILS=my_email_list
exec /usr/bin/nodejs app.js

Now when you go to `http://machine-dns-name/` you should get redirect to Google authorization page.

We want to run the kibana-proxy continuously - the simplest way to achieve that is to make use of forever module. However in this guide we’ll use upstart scripts described below.

Fluentd installation

To push log entries to our elasticsearch instance we need fluentd agent installed on the machine we want to gather logs from. Fluentd agent comes in 2 forms: fluentd gem if you wish to have more control features and updates and td-agent if you care for stability over new features. We’ll go with fluentd gem for this guide.

Since fluentd is implemented in Ruby with most performance critical parts written in C. To install it we need ruby first but obtaining it with rvm is a piece of cake:

curl -sSL https://get.rvm.io | bash -s stable
source ~/.rvm/scripts/rvm
rvm install 2.1.1
sudo gem install fluentd --no-ri --no-rdoc
sudo fluentd -s

Now we have fluentd installed with sample configuration available in /etc/fluentd/fluent.conf.

Since we would like our log entries to be pushed to elasticsearch we need a plugin for that too:

gem install fluent-plugin-elasticsearch
gem install fluent-plugin-tail-multiline

Let’s edit fluent.confg file so that our agents reads sample log files from rails app and forwards them to elasticsearch instance:

<source>
  type tail_multiline
  format /^(?<level_short>[^,]*), \[(?<time>[^\s]*) .(?<request_id>\d+)\]\s+(?<level>[A-Z]+)[-\s:]*(?<message>.*)/
  path /path/to/log/file/staging.log
  pos_file /path/to/positions/file/fluentd_tail.pos
  tag app_name.environment
</source>

<match app_name.**>
  type elasticsearch
  host machine-dns-name.cloudapp.net
  logstash_format true
  flush_interval 10s # for testing
</match>

The format we see above will following line of Rails log file:

I, [2014-05-22T12:24:44.434963 #11837]  INFO -- : Completed 200 OK in 884ms (Views: 395.7ms | ActiveRecord: 115.0ms)

and break it down to following groups/fields:

Key	Value
time	2014/05/22 12:24:44
level_short	I
request_id	11837
level	INFO
message	Completed 200 OK in 884ms (Views: 395.7ms \| ActiveRecord: 115.0ms)

Note that although builtin in_tail plugin supports multiline log entries I either don’t completely understand how to use it or there is no way to have full stack trace logged as a single message. Moreover tail_multiline doesn’t seem to support multiple input paths that’s why in order to monitor multiple files you would have to duplicate the source section.

The only thing left is to start fluentd and watch our logs

rvmsudo fluetnd -d

Upstart for fluentd, elasticsearch and kibana-proxy

The final thing we need to do is to setup above components to be run as services. Since we’re using Ubuntu for our servers we’ll use upstart congiurations for that.

Upstart for fluentd

Since we have installed fluentd in ubuntu user rvm but we wan’t we have to generate a wrapper script for it

rvm wrapper default fluentd

and then use generated wrapper ~/.rvm/wrappers/default/fluentd in our upstart script. Let’s create our upstart script and put it inside /etc/init/fluetnd.conf

description "Fluentd"
start on runlevel [2345] # All except below
stop on runlevel [016] # Halt, Single-User, Reboot
respawn
respawn limit 5 60 # Restart if ended abruptly
exec sudo -u ubuntu /home/ubuntu/.rvm/wrappers/default/fluentd

Upstart for elasticsearch

Elasticsearch already comes with a script for starting it in properly configured jvm we’ll utilize it in our upstart config /etc/init/elasticsearch.conf

description "Elasticsearch"
start on runlevel [2345] # All except below
stop on runlevel [016] # Halt, Single-User, Reboot
respawn
respawn limit 5 60 # Restart if ended abruptly
setuid ubuntu
exec /path/to/your/elasticsearch/bin/elasticsearch

Upstart for kibana-proy

With the kibana-proxy-start.sh the upstart configuration for kibana-proxy is dead simple:

description "kibana-proxy"
start on runlevel [2345] # All except below
stop on runlevel [016] # Halt, Single-User, Reboot
respawn
respawn limit 5 60 # Restart if ended abruptly
chdir /where/you/installed/kibana-proxy
setuid ubuntu
exec ./kibana-proxy-start.sh

Running services

Once you have your upstart scripts configuration ready you can now start services with a descriptive syntax:

sudo service elasticsearch start

I you’re having trouble getting upstart to work you can always check its configuration with:

initctl check-config

To find out why a particular service won’t start check out

dmesg | grep elasticsearch

Finally you can find you log files in /var/log/upstart/servicename.*

Now if you have setup everything correctly and of course you have some log messages generated by apps when you navigate to http://machine-dns-name.cloudapp.net/index.html#/dashboard/file/logstash.json you should see a nice dashboard filling up with log entries: kibana in action

Published 8 Jun 2014

TDD fan eager to learn new things. Always focused, always eager to ask why?Piotr Mionskowski