
Logstash comes pre-packaged with a geoip filter plugin and a Maxmind Geolite2 database. But when running Logstash in a container environment, updating the bundled database can be a hassle. If you need geoip lookups in other systems as well, a REST service is a good fit to deliver consistent geoip data.
Geoip API REST service
During my time at Shopping24 we developed and open-sourced a REST API for geoip information that is fed by a Maxmind geoip database. The REST service served not only Logstash but also other systems in need of geoip information (e.g. fraud detection).
The Maxmind databases are available for free. A more accurate and detailed version can be purchased. The database files are updated weekly, but updating them on a bunch of servers was a hassle. To ease this, I updated the project last week:
- A Docker container is available on Docker Hub
- The current Maxmind Geolite2 City database is bundled within the container
- The container is built weekly with an updated database.
If you use the latest tag in your deployments, you’re always up to date.
Using the Geoip-API in Logstash
We’ll set up Logstash in a Docker-Compose environment. Besides a recent version of Logstash, we launch the geoip-api. The Logstash container gets its configuration (default.conf) mounted from the host’s disk.
The Logstash container accepts log file data from Filebeat instances via the Beats protocol on port 5044.
# docker-compose.yaml
# -------------------------------------------------
version: "3.7"
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.3.1
    ports:
      - 5044:5044
    volumes:
      - /etc/logstash/pipeline/default.conf:/usr/share/logstash/pipeline/default.conf:ro
    depends_on:
      - geoip-api
  geoip-api:
    image: observabilitystack/geoip-api:latest
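Once the stack is up, a quick lookup can confirm that the geoip-api container answers. The following Python sketch is only an illustration: it assumes you additionally publish the service's port 8080 to the host (the compose file above does not do that), and the helper names build_lookup_url and lookup are made up for this example. The IP address is passed as a path parameter, the same URL format the Logstash filter uses later on.

```python
import json
import urllib.parse
import urllib.request

# Assumption: port 8080 of the geoip-api container is published on localhost.
GEOIP_API = "http://localhost:8080"


def build_lookup_url(ip, base=GEOIP_API):
    """Return the lookup URL; the IP address is passed as a path parameter."""
    return f"{base.rstrip('/')}/{urllib.parse.quote(ip, safe='')}"


def lookup(ip, base=GEOIP_API, timeout=2.0):
    """Fetch the geo information for ip and decode the JSON response body."""
    with urllib.request.urlopen(build_lookup_url(ip, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print whatever fields the service returns for a public address.
    print(json.dumps(lookup("8.8.8.8"), indent=2))
```

The script makes no assumption about which fields the service returns; it simply pretty-prints the JSON document.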
Enriching log data with geoip data
We’ll assume that our Logstash receives Nginx or Apache access logs via the Beats protocol. We’ll parse the log lines and forward the clientip field to the geoip-api service, which returns geo information as JSON. We’ll then merge the returned JSON into the log event.
# default.conf
# -------------------------------------------------
# Accept http access log lines via Beats protocol.
input {
  beats {
    port => 5044
  }
}
filter {
  # Parse the http access log using a predefined
  # Grok pattern.
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  # Look up geoip data and anonymize the address, skipping
  # private 10.0.0.0/8 addresses. Extend the regex if you
  # also use the other RFC 1918 ranges.
  if [clientip] and [clientip] !~ /^10\./ {
    # contact the geoip micro service on port 8080
    # and send the extracted clientip as path parameter
    http {
      url => "http://geoip-api:8080/%{clientip}"
      # place the returned JSON fields under the
      # geoip field.
      target_body => "geoip"
      target_headers => "geoip_headers"
    }
    # anonymize your clientip to comply with
    # privacy protection: keep only the /24 network
    fingerprint {
      source => "clientip"
      target => "clientip"
      method => "IPV4_NETWORK"
      key => "24"
    }
  }
}
output {
  # push your metrics to elasticsearch or graylog here
}
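To make the filter logic easier to follow, here is a rough Python sketch of what the conditional block does to a single event: it skips addresses starting with 10., stores the service's JSON under geoip, and replaces clientip with its /24 network address, which is what IPV4_NETWORK with a key of 24 amounts to. The function names and the fetch callback are hypothetical; this is an illustration, not how Logstash implements it.

```python
import ipaddress


def anonymize_ipv4(ip, prefix_len=24):
    """Zero the host bits of ip, keeping only the first prefix_len bits."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network.network_address)


def enrich(event, fetch):
    """Mimic the filter block: look up geo data, then anonymize clientip.

    fetch is a hypothetical callable that takes an IP string and returns a
    dict; it stands in for the HTTP call to the geoip-api service.
    """
    ip = event.get("clientip")
    # Same condition as the config: field present and not starting with "10."
    if not ip or ip.startswith("10."):
        return event
    enriched = dict(event)
    enriched["geoip"] = fetch(ip)
    enriched["clientip"] = anonymize_ipv4(ip)
    return enriched


if __name__ == "__main__":
    # Made-up response; the real service decides which fields come back.
    fake_fetch = lambda ip: {"country": "placeholder"}
    print(enrich({"clientip": "203.0.113.77"}, fake_fetch))
```

Note the order: the lookup happens before the anonymization, so the service still sees the full address while the stored event only keeps the network part.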
Wrap up
Those are all the steps it takes to enrich your log files with geolocation data in an easy-to-maintain environment. The weekly builds of the geoip-api service are an easy way to stay up to date with Maxmind’s Geolite databases. Tools included are: