Musings++

Something something programming

API Logging with Graylog2 - PHP Logging

This is Part 2 of the API Logging with Graylog2 series. View Part 1

Now that you have the backend components configured for logging, it's time to set up and configure GELF - the Graylog Extended Log Format.

Step 1: Composer

GELF support for PHP is only available via Composer. The installation instructions are pretty straightforward, so I won't attempt to go into too much detail - Composer does an excellent job of covering the basics. Composer Installation Instructions.

Once Composer is installed, you'll need to set up your composer.json file as follows:

{
    "require": {
        "graylog2/gelf-php": "0.1.*"
    }
}

Then just run composer install or composer update if you already had a composer.json file.

What this will do is grab the gelf-php libs and toss them into a ./vendor/ directory wherever the composer.json file exists. It will also configure an autoloader so that you don't have to figure out what files to include.
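For example, before using any of the Gelf classes, your script just needs to pull in that autoloader (the path below assumes the script lives next to composer.json):

<?php
// Composer's generated autoloader - makes the Gelf\* classes available
// without any manual require/include statements.
require __DIR__ . '/vendor/autoload.php';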

Step 2: Log

Now that we've got the library where we want it, we can go ahead and start the logging!

// Point a UDP transport at the Graylog2 server
// (it defaults to the standard GELF UDP port, 12201)
$transport = new Gelf\Transport\UdpTransport('127.0.0.1');
$publisher = new Gelf\Publisher();
$publisher->addTransport($transport);

// Build a message and send it off
$message = new Gelf\Message();
$message->setShortMessage('some log message')
        ->setLevel(6); // 6 = informational on the syslog scale

$publisher->publish($message);

That's all there is to logging to Graylog2. However, there are a lot more things that you can add to your message to give your log a bit more substance.

Customizing your message attributes

One of the things that isn't really documented very well (with the library at least) is what exactly can constitute the message. The Message object includes a few additional methods that you can use to get the most out of Graylog2.

  • setShortMessage - Just a short descriptive message about the log.
  • setFullMessage - This is where you could include any backtraces or additional dumps.
  • setLevel - The "severity" of the log. It follows the standard syslog levels.
  • setAdditional - This method accepts two args, the first being a custom key, the second being the value. This is a neat way of adding new information to your log (API keys, for example) - see the sketch below.
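To give a rough idea of how these fit together, here's a quick sketch that builds on the $publisher from the earlier snippet. The "api_key" and "execution_time" keys are just example custom attributes I've made up for illustration:

$message = new Gelf\Message();
$message->setShortMessage('API request handled')
        ->setLevel(6); // 6 = informational on the syslog scale

// A fuller dump for the detailed view, e.g. a backtrace
$message->setFullMessage(print_r(debug_backtrace(), true));

// Custom attributes - the keys here are just examples
$message->setAdditional('api_key', 'abc123');
$message->setAdditional('execution_time', 0.42);

$publisher->publish($message);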

Personally, I think Graylog2 is a phenomenal way to achieve a proper logging system - something that is often overlooked when you're in the "app dev" phase. I've talked about planning before, in Lines of code as a metric, and logging is definitely one of the most easily overlooked features - even though it's super easy to add from the beginning. Logging gives you not just a way to track errors, but also a way to track progress. Imagine tracking your API usage with Graylog2 and watching the requests/hour steadily rise. And then, because you thought about logging from the beginning, you can easily display the additional attributes "api_key" and "execution_time" that you've been logging to keep a better eye on your server.


API Logging with Graylog2 - Server Setup

This is a two-part piece. Normally, I try and stay away from these, but the setup process can be a little long. The second piece will go live tomorrow and will contain information about how to interact with the system that you set up.

I recently hit a rather interesting problem and I thought I'd share some of my research around the matter. I was tasked with a simple feature:

Add additional metrics to an API logging mechanism

While it sounds simple enough, the additional metrics would cause our current table structure (we were logging in MySQL) some issues - adding tons of rows and causing our API logging table to grow so large that we were worried it would become unusable. So, as often happens when a developer is tasked with a "simple" task, the quick feature billowed into a rather monstrous one.

How do we overhaul our current logging infrastructure so that we can:
1. Add the new metrics that we want to track
2. Never run into this issue again

After a bit of thought, we decided to spend the time working on our API logging system with the idea that we could roll out more logging to the rest of the site. Instead of just cobbling something together using our MySQL instance, we decided to move our logging infrastructure to something a little more robust - namely Graylog2.

Before we go any further I just want to point out that this tutorial is NOT for getting a production ready variant of Graylog2 running. There is a LOT more configuration that should go into getting this running in a production environment. This is exclusively a local testing environment to see what all the hubbub with Graylog2 is about.

Graylog2

If you've never heard of Graylog2, then you'd better be prepared to have your socks knocked off.

Installation

The installation for Graylog2 is a little more involved if you've only been used to getting LAMP/LEMP stacks running. There are config files that need configuring, applications that need installing and even a bit of finger-crossing. To make things a little easier for myself, I've opted to do the full Graylog2 install in a Vagrant VM. If you've never used Vagrant before, I recommend you check it out - it's very easy to get started with.

Step 1: Oracle JRE

If you get to this point and you're going to argue with me over the Oracle JRE vs OpenJDK, then just use whichever you'd like. To be honest, I couldn't care less, and I don't think either of the following apps will either. The only reason I'm installing Oracle Java is because, out of the two, I'd rather go with the official Java for something like this.

This is super easy thanks to the webupd8team's ppa.

sudo add-apt-repository ppa:webupd8team/java  
sudo apt-get update  
sudo apt-get install oracle-java7-installer  

If you get an error saying that "add-apt-repository" is not a valid command, just install the python-software-properties and software-properties-common packages. Then go back and add the repo.

sudo apt-get install python-software-properties software-properties-common  

The Java installer will take a bit of time to get through, and during it you'll have to accept a couple license agreements to continue.

Step 2: Grab MongoDB

Installing MongoDB is pretty simple, especially considering the Mongo team gives us an official repo. They have full instructions for this at the following URL: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian/

Step 3: Elasticsearch

Once you have Java set up, you'll want to download the right version of Elasticsearch. Graylog2 relies on Elasticsearch but it requires a very specific version of it. If you have the wrong version, nothing will work and you'll end up with errors about Graylog2 not being able to read the entire message. It's a confusing error but it has to do with the way Graylog2 and Elasticsearch exchange information.

For this little walkthrough, I'm setting up the latest stable version of the Graylog2 server, version 0.20.1. This requires version 0.90.10 of Elasticsearch, so go ahead and do the following:

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.10.tar.gz  
tar zxf elasticsearch-0.90.10.tar.gz  
cd elasticsearch-0.90.10  

Now you'll want to edit the Elasticsearch config file located here: config/elasticsearch.yml.

You'll want to configure it as follows:

cluster.name: elasticsearch  
transport.tcp.port: 9300  
http.port: 9200  

These are just the default settings; I've simply uncommented them in the configuration file by deleting the #.

Once that's done, you can start up Elasticsearch by running:

./bin/elasticsearch -f

Step 4: Graylog2-server

Next you'll need to grab the Graylog2-server component. This is what actually handles the logging mechanism. Below, we'll download the version of the server we want, extract the tar.gz and move into that directory. Then we'll simply copy over the Graylog configuration file to its appropriate place and pop in to edit it.

wget https://github.com/Graylog2/graylog2-server/releases/download/0.20.1/graylog2-server-0.20.1.tgz  
tar zxf graylog2-server-0.20.1.tgz  
cd graylog2-server-0.20.1  
sudo cp graylog2.conf.example /etc/graylog2.conf  
sudo vim /etc/graylog2.conf  

All we really want to do here is configure the password_secret, which is a salt that we'll be using to hash passwords. The configuration file has an example command you can use to generate a salt.

Next we want to set the root_password_sha2 hash. This, again, is pretty straightforward, and the configuration file shows you how to do this in the comments right above where you'd be setting these values.
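The gist of it: generate a long random string for password_secret and a SHA-256 hash of your chosen admin password for root_password_sha2. Something along these lines (roughly what the config comments suggest) works:

# generate a 96 character salt for password_secret (install pwgen if you don't have it)
pwgen -N 1 -s 96

# generate the hash for root_password_sha2 from whatever admin password you want
echo -n yourpassword | shasum -a 256

Paste the output of each command into password_secret and root_password_sha2 respectively.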

Next you'll want to scroll down to around line 70, uncomment elasticsearch_cluster_name, and ensure that it is set to elasticsearch (or something else if you changed the name of the cluster in the elasticsearch.yml file).
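After uncommenting it, the line should read:

elasticsearch_cluster_name = elasticsearch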

With that done, it's time to start up the Graylog2 server for the first time! Graylog recommends that you start it as follows the first time:

sudo java -jar graylog2-server.jar --debug  

Eventually you'll see a line saying that it started, after which the output goes quiet. At this point you can kill it and use the start script: ./bin/graylog2ctl start

Step 5: Graylog2-webserver

Finally, there's the pretty sweet Graylog2 web interface. This is what lets us configure the inputs on the server and allows us to actually SEE the data that we're logging.

wget https://github.com/Graylog2/graylog2-web-interface/releases/download/0.20.1/graylog2-web-interface-0.20.1.tgz  
tar zxf graylog2-web-interface-0.20.1.tgz  
cd graylog2-web-interface-0.20.1  
vim conf/graylog2-web-interface.conf  

At this point, we've been working entirely from our local server with a single instance of Elasticsearch running, so graylog2-server.uris should be set to http://127.0.0.1:12900.

Then you just need to set an application.secret, which is just a random salt that will be used for crypto functions.
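For reference, the two settings in conf/graylog2-web-interface.conf end up looking something like this (the secret below is just a placeholder - generate your own long random string, e.g. with the same pwgen command from the server setup):

graylog2-server.uris="http://127.0.0.1:12900/"
application.secret="<your own long random string>"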

Now you just need to start the server!

./bin/graylog2-web-interface

If all goes well, you should see a "Listening for HTTP" message along with the port the web server is running on. Now just point your browser over to http://localhost:9000 and log in with the username admin and whatever password you configured during your Graylog2 server setup.

Take a look around the interface and under System/Inputs (http://localhost:9000/system/inputs) you'll want to add a new GELF/UDP Input. You can leave the defaults as is, and just give it a name so you know what it is.

With that done, you are now ready to start logging to your Graylog2 server with their GELF protocol.

In the next part, I'll cover how you can start logging to your Graylog2 instance from PHP.


Proxies 101

At its core, a proxy is a service that is designed to act as a "middle man".
That is, if there are two parties (Website, You) that are trying to communicate
neither of you talk to each other directly. Instead, you talk to a "proxy",
which then relays your information to the website. In return, the website
talks to the proxy, which then relays the information back to you.

Understanding Proxies

A basic conversation between YOU and MY WEBSITE, if you connect through a proxy,
might look something like this.

  1. You tell the proxy: "connect to http://xangelo.ca"
  2. The proxy connects to http://xangelo.ca and requests the home page
  3. I give the proxy the home page
  4. The proxy gives you the home page.

Pretty simple right?

Of course, the implications of this are enormous! I'll touch on this more later
on as there are a LOT of benefits of running behind a proxy.

Of course, not all proxies are made equal, and there are many different reasons
to use one.

Types of proxies

Proxies can be broken down depending on what layer of the OSI model
they function on. Each has its benefits and drawbacks, and depending on what
you are using a proxy for, one might be better than the other.

Layer 4 Proxies

If you're not familiar with Layer 4 of the OSI model, it's the "Transport"
layer. At this layer, the proxy doesn't see a "URL", but instead it sees IP
addresses.

Proxies are faster if they function at this stage, because as soon as they know
the IP address of what you're trying to get to, they can forward the traffic
there.

Layer 7 Proxies

This layer is the "Application" layer. At this layer, the proxy can see the
specific URL you are trying to access and even the content that is flowing
back and forth.

Forward and Reverse Proxies

In addition, you may hear about "Reverse" and "Forward" proxies. The thing is,
apart from a bit of technical jargon, these proxies function almost exactly
the same way. A forward proxy "forwards" your requests to the internet. A
"reverse" proxy forwards requests FROM the internet to a series of recipients.

Generally forward proxies have a single user that is making the request, whereas
a reverse proxy has multiple recipients that will handle the request from the
internet.

The benefits of a proxy

Personal Anonymity

By placing a proxy between you and a website, the website won't know that
YOU are accessing it. It will think the proxy is. A common use for this scenario
is to have a proxy in one country that you access. Any websites you visit
through the proxy will think you are coming from Country A, when you actually
reside in Country B.

This idea is the basis of technologies like TOR. Instead of connecting directly
to a website, you connect through what is essentially a series of proxies
and the website you are attempting to connect to will only see the details of
the LAST proxy. Of course, your traffic is still traceable through the proxy
chain, but every hop makes it a LOT harder.

Load Balancing

Load balancing is exactly what it sounds like. Imagine you're given a 300 pound
barbell. Instead of holding it in one hand, you use both your hands. This way,
you're not putting unnecessary stress on one hand - you're balancing the load
across both hands.

When it comes to web services, we "Load balance" to direct traffic between
multiple servers so that no single server is trying to handle everything. This allows
us to have smaller servers, but also to have redundancy. If one server is not
functional, we can still serve our users because we have another one.

If you have a fairly simple web app (the entirety can run on a single server),
you can put multiple copies of it behind a proxy and use a Layer 4 Reverse
Proxy to balance traffic between them.

If you have a complicated web application, you can have a single URL, and then
using a Layer 7 Reverse Proxy you can route the user to different components
of your application. If you're running a "Service Oriented Architecture" system, a Layer 7
Reverse Proxy is pretty essential.

Security

Today, everyone runs an anti-virus to protect themselves from malware and
viruses. However, when you run a company, buying anti-virus and managing it for
hundreds of users is a LOT of cost. It would be better if you could do it all
in one place.

By routing all requests through our proxy server, we can scan the content that
websites send back and run it through virus detection software. This way we can
ensure that employees aren't getting viruses and malware sent back to them.

Proxies are, in addition, what power the black/white listing functionalities
that you find at various companies. With a blacklist, you can enter a URL or
IP address (Layer 7 or 4 respectively) and users that attempt to access those
websites are stopped.

Data Leakage Prevention

In addition to stopping malware and viruses from getting into a company, a
proxy can stop data from leaving a company as well. In large corporations,
data theft is a big issue, and so is being able to detect data loss. By
passing data leaving the company through a DLP solution (part of which includes
a proxy), you can stop data leakage before it happens.