Logging Aggregation

The following is outdated, and is kept for archival purposes. If logging aggregation is something that we want to set up again (and it likely is), feel free to refer to this documentation. This setup, however, has some flaws:

  • Graylog2 requires that you predefine how you want to tokenize log entries. That is, Graylog2 doesn’t support free-form text search. This is problematic since we’d need to define what we want to look for before we look for it.
  • Graylog2 has no good (existing) plugins to ingest logs. There are two well-supported options that I found: the graylog2 ingestor and fluentd. The graylog2 ingestor is written in Java, and so requires a JVM. This is problematic for a low-powered device like a raspberry pi, which doesn’t have much RAM. The other option, fluentd, is written in Ruby. However, it can only be installed with its own bundled Ruby installation, which weighs in at > 250 MB.

The ACM used to use Graylog2 and Fluentd to aggregate log files.

Setting up machines to submit logs

Note: as of this writing, td-agent is 215 MB. There are lighter setups which can be used if need be (I think). But probably don’t use these instructions on a beaglebone.

Run curl https://packages.treasuredata.com/GPG-KEY-td-agent | apt-key add - and echo "deb http://packages.treasuredata.com/2/debian/jessie/ jessie contrib" > /etc/apt/sources.list.d/treasure-data.list, then apt-get update && apt-get install td-agent. Note: you may have to insert [arch=amd64] after deb if apt-get update fails.
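For reference, the resulting sources list entry should look something like this (the second line is the variant to try if apt-get update complains about architectures):

```
# /etc/apt/sources.list.d/treasure-data.list
deb http://packages.treasuredata.com/2/debian/jessie/ jessie contrib

# or, if apt-get update fails:
deb [arch=amd64] http://packages.treasuredata.com/2/debian/jessie/ jessie contrib
```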

Immediately run service td-agent stop, since the default configuration listens on various ports. And that’s bad.

As root, run td-agent-gem install fluent-plugin-secure-forward.
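Once the plugin is installed, the client’s /etc/td-agent/td-agent.conf needs a source to read from and a secure_forward match. A minimal sketch follows — the hostnames, tag, and tailed file are placeholders, the shared key must match the one on the receiving server, and parameter names have varied between plugin versions, so double-check them against the secure-forward docs:

```
# /etc/td-agent/td-agent.conf (client sketch; hostnames and paths are placeholders)
<source>
  type tail
  path /var/log/syslog
  pos_file /var/log/td-agent/syslog.pos
  tag system.syslog
  format syslog
</source>

<match system.**>
  type secure_forward
  shared_key OUR_SHARED_SECRET        # must match the receiving server's key
  self_hostname client1.acm.example
  secure yes
  <server>
    host logs.acm.example             # the receiving fluentd server
    port 24284
  </server>
</match>
```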

Setting up the receiving server

There are only three ports which your server should be listening on for logging:
80 (to redirect to HTTPS), 443 (for the web interface), and 24284 (for fluentd, explained later).

There are auxiliary services that will need to be installed in order to get the logging system working. Many of these services will try to bind to 0.0.0.0. Don’t let them! Many of these services DO NOT provide authentication, and so it is vital that they are bound to 127.0.0.1, so that they are inaccessible from the outside world.

As of this writing, everything needed to install graylog is packaged for debian, though you’ll need to fetch some other repos.

See:

  • https://www.elastic.co/guide/en/elasticsearch/reference/1.7/setup-repositories.html
  • http://www.fluentd.org/guides/recipes/graylog2
  • http://docs.graylog.org/en/1.3/pages/installation/operating_system_packages.html#debian-8

Note: graylog requires elasticsearch 1.7, NOT elasticsearch 2. The fluentd.org page says that you need to install .deb files manually. Don’t do that! There are official elasticsearch repos for debian jessie. Use them.

For security purposes, it’s best to install services one at a time, so that each one can be secured in turn.

When installing fluentd, be sure to remove the default configuration, since it will listen on ports we don’t want it listening on. Fluentd is set up using the secure forwarding plugin (http://docs.fluentd.org/articles/forwarding-over-ssl). It listens over TLS for log messages. Be sure to set up a secret key, which must be shared among all machines that send logging data, but try to keep the key secret, if possible (i.e. don’t stick it in the metapackages). The key ensures that someone can’t simply toss data at our fluentd instance and pretend that it’s coming from our machines. You also need to hand fluentd a (signed) SSL cert, so that servers sending data to the fluentd receiver can’t be man-in-the-middle’d.
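The server side of that setup might look something like the following — a sketch only; the hostname and cert paths are placeholders, and the exact TLS parameter names (cert_path, private_key_path, etc.) have changed between plugin versions, so verify them against the secure-forward documentation:

```
# /etc/td-agent/td-agent.conf (server sketch; hostname and paths are placeholders)
<source>
  type secure_forward
  port 24284                          # the one fluentd port exposed publicly
  shared_key OUR_SHARED_SECRET        # same key as on every sending machine
  self_hostname logs.acm.example
  secure yes
  cert_path /etc/td-agent/ssl/logs.acm.example.crt
  private_key_path /etc/td-agent/ssl/logs.acm.example.key
</source>
```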

Mongo and elasticsearch can both be made to listen only on 127.0.0.1 by changing their configuration files. Be careful with elasticsearch, since it will try to do service discovery at 0.0.0.0, so set network.host: 127.0.0.1, instead of just one of the IPs. Additionally, disable elasticsearch service discovery: we can point graylog directly at our elasticsearch instance, since we’re not using elasticsearch in a high-availability configuration. Elasticsearch’s cluster_name should also be changed to graylog2.
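Concretely, the relevant settings might look like this (file paths are those of the stock debian packages; the multicast setting is the elasticsearch 1.x way to disable discovery — double-check it against your version’s docs):

```
# /etc/elasticsearch/elasticsearch.yml
cluster.name: graylog2
network.host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false

# /etc/mongodb.conf
bind_ip = 127.0.0.1
```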

Graylog has two components: server and web. Server is the API that does all the data processing; web is the user interface that communicates with the programmatic interface exposed by server.

In order to make graylog-web listen on 127.0.0.1, you need to modify an environment variable in /etc/default/graylog-web. I know, kinda derpy.
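Something along these lines — the variable name below is a guess from memory, so check the actual names in /etc/default/graylog-web before trusting it:

```
# /etc/default/graylog-web (hypothetical variable name — verify against the real file)
GRAYLOG_WEB_HTTP_ADDRESS="127.0.0.1"
```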

Now, you should set up apache or nginx (preferably nginx, because Russians are cool) to terminate the SSL on the web interface, and proxy a public HTTPS server (on ports 443 and 80) to graylog-web at localhost:9000.
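A minimal nginx sketch of that termination-and-proxy setup, assuming the server name and cert paths below (all placeholders):

```
# /etc/nginx/sites-available/graylog (sketch; server_name and cert paths are placeholders)
server {
    listen 80;
    server_name logs.acm.example;
    return 301 https://$host$request_uri;   # redirect plain HTTP to HTTPS
}

server {
    listen 443 ssl;
    server_name logs.acm.example;
    ssl_certificate     /etc/ssl/certs/logs.acm.example.crt;
    ssl_certificate_key /etc/ssl/private/logs.acm.example.key;

    location / {
        proxy_pass http://127.0.0.1:9000;   # graylog-web, bound to localhost
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```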
