Logstash: Geoip resolution using a REST API

Consistent geoip lookups in microservice environments

Logstash comes pre-packaged with a geoip filter plugin and a Maxmind Geolite2 database. But when running Logstash in a container environment, updating the bundled database can be a hassle. If you need geoip lookups in other systems as well, a REST service is the perfect fit to deliver consistent geoip data.

Geoip API REST service

During my time at Shopping24 we developed and open-sourced a REST API for geoip information that is fed by a Maxmind geoip database. The REST service served not only Logstash but also other systems in need of geoip information (e.g. fraud detection).

The Maxmind databases are available for free. A more accurate and detailed version can be purchased. The database files are updated weekly, but updating them on a bunch of servers was a hassle. To ease this, I updated the project last week:

  1. A Docker container is available on Docker Hub
  2. The current Maxmind Geolite2 City database is bundled within the container
  3. The container is built weekly with an updated database. If you use the latest tag in your deployments, you’re always up to date.
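
For systems other than Logstash, a lookup boils down to a plain HTTP GET with the IP address as path parameter. The following is a minimal Python sketch, not part of the project itself. It assumes the container’s port 8080 has been published on localhost and that the service answers with a JSON document; the function names `geoip_url` and `lookup` are made up for illustration.

```python
import ipaddress
import json
from urllib.request import urlopen

# Assumption: the geoip-api container is reachable under this base URL,
# e.g. because its port 8080 was published on the local machine.
GEOIP_API = "http://localhost:8080"


def geoip_url(ip: str, base_url: str = GEOIP_API) -> str:
    """Build the lookup URL; the IP address is sent as path parameter."""
    # Validate early: ip_address() raises ValueError for non-IP input.
    address = ipaddress.ip_address(ip)
    return f"{base_url.rstrip('/')}/{address}"


def lookup(ip: str, base_url: str = GEOIP_API, timeout: float = 2.0) -> dict:
    """Fetch the geo information for a single address as a dict."""
    with urlopen(geoip_url(ip, base_url), timeout=timeout) as response:
        return json.load(response)
```

With a running container, `lookup("8.8.8.8")` returns the decoded JSON response. The sketch deliberately makes no assumptions about the field names inside that response.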

Using the Geoip-API in Logstash

We’ll set up Logstash in a Docker-Compose environment. Besides a recent version of Logstash, we launch the geoip-api. The Logstash container gets its configuration (default.conf) mounted from the host’s disk. It accepts log file data from Filebeat instances via the Beats protocol on port 5044.

# docker-compose.yaml
# -------------------------------------------------
version: "3.7"
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.3.1
    ports:
      - 5044:5044
    volumes:
      - /etc/logstash/pipeline/default.conf:/usr/share/logstash/pipeline/default.conf:ro
    depends_on:
      - geoip-api
  geoip-api:
    image: observabilitystack/geoip-api:latest

Enriching log data with geoip data

We’ll assume that our Logstash receives Nginx or Apache access logs via the Beats protocol. We’ll parse the log lines and forward the clientip to the geoip-api service, which returns geo information as JSON. We’ll merge the returned JSON into the log event.

# default.conf
# -------------------------------------------------
# Accept http access log lines via Beats protocol.
input {
    beats {
        port => 5044
    }
}

filter {

    # Parse the http access log using a predefined
    # Grok pattern.
    grok {
        match => { "message" => "%{COMMONAPACHELOG}" }
    }

    # look up geoip data and anonymize non-RFC 1918 (public) ip addresses
    if [clientip] and [clientip] !~ /^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)/ {

        # contact the geoip micro service on port 8080
        # and send the extracted clientip as path parameter
        http {
            url => "http://geoip-api:8080/%{clientip}"

            # place the returned json fields under the
            # geoip field.
            target_body => "geoip"
            target_headers => "geoip_headers"
        }

        # anonymize your clientip to comply with
        # privacy protection
        fingerprint {
            source => "clientip"
            target => "clientip"
            method => "IPV4_NETWORK"
            key => "24"
        }
    }
}

output {
    # push your metrics to elasticsearch or graylog here
}
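
To make the conditional and the fingerprint step of the filter above more tangible, here is a minimal Python sketch of the same logic outside of Logstash. It is an illustration under assumptions, not how Logstash implements it: the function names are made up, the `lookup` argument stands in for the HTTP call to the geoip-api, and only IPv4 addresses are handled. The private-address check covers all three RFC 1918 ranges, and the /24 truncation corresponds to the `IPV4_NETWORK` method with a key of 24.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def is_rfc1918(ip: str) -> bool:
    """Return True if the IPv4 address lies in a private RFC 1918 range."""
    address = ipaddress.IPv4Address(ip)
    return any(address in network for network in RFC1918_NETWORKS)


def anonymize_ipv4(ip: str, prefix_length: int = 24) -> str:
    """Zero the host bits, keeping only the first prefix_length bits."""
    network = ipaddress.IPv4Network(f"{ip}/{prefix_length}", strict=False)
    return str(network.network_address)


def enrich(event: dict, lookup) -> dict:
    """Mimic the filter: look up public addresses, then anonymize them.

    `lookup` is a hypothetical callable that takes an IP string and
    returns the geo information as a dict.
    """
    ip = event.get("clientip")
    if ip and not is_rfc1918(ip):
        event["geoip"] = lookup(ip)             # like target_body => "geoip"
        event["clientip"] = anonymize_ipv4(ip)  # like the fingerprint filter
    return event
```

For example, `anonymize_ipv4("203.0.113.77")` returns `"203.0.113.0"`. The lookup runs before the anonymization because the geoip-api needs the full address to resolve a location.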

Wrap up

Those are all the steps it takes to enrich your log files with geo location data in an easy-to-maintain environment. The weekly builds of the geoip-api service are an easy way to stay up to date with Maxmind’s Geolite databases.

Torsten Bøgh Köster
