Ruby Script to Search Apache Logs for High Frequency Clients
I wrote a quick Ruby script to scour my Apache access logs for IPs that are hitting my site too frequently (bad bots, for example). The command line arguments are simple:
$ ruby find-frequent-clients.rb \
    --apache-access-log=/path/to/your/log \
    --seconds=3600 \
    --request-limit=7200 \
    --log-time-zone=PST
That command finds any client IPs that have hit my web server more than 7,200 times in the last hour (3,600 seconds), which averages out to more than two requests per second. The output is a newline-separated list of IP addresses (optionally with a hit count if --show-count=1 is added). Here’s how it works:
require 'date'
require 'time'

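# Process command line arguments. Keep only args starting with "--"
args = {}
ARGV.each do |arg|
  # Split on the first "=" only, so a value may itself contain "="
  name, value = arg.split("=", 2)
  next unless name && name.start_with?("--")
  # "--request-limit" becomes the key :request_limit
  args[name[2..-1].gsub("-", "_").intern] = value
end
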
# Check that we have the bare essentials to proceed
raise "You must specify the full path to an Apache access log file with --apache-access-log" unless args[:apache_access_log]
raise "You must specify the maximum amount of recent seconds to consider with --seconds" unless args[:seconds]
raise "You must specify the maximum requests allowed per #{args[:seconds]} seconds with --request-limit" unless args[:request_limit]
raise "You must specify the time zone of the Apache logs with --log-time-zone e.g., EST" unless args[:log_time_zone]
raise "The Apache access log file specified does not exist or is not readable: #{args[:apache_access_log]}" unless File.readable?(args[:apache_access_log])

# Read the file and walk the lines in reverse; stop once time stamps fall outside our time window
log_array = []
log_snapshot = File.readlines(args[:apache_access_log])
start_time = Time.now.to_i
log_snapshot.reverse_each do |line|
  line_array = line.split(" ")
  next if line_array[3].nil? # skip blank or malformed lines
  # line_array[3] looks like "[10/Oct/2000:13:55:36"; drop the leading "["
  date_time = line_array[3][1..-1]
  # Replace the ":" between the date and the time with a space so DateTime.parse accepts it
  date_time[11] = " "
  # Convert to epoch seconds, interpreting the time in the zone given on the command line
  date_time = DateTime.parse(date_time + " " + args[:log_time_zone]).to_time.to_i
  if date_time > (start_time - args[:seconds].to_i)
    log_array << [line_array[0], date_time]
  else
    break
  end
end

# Use a hash to collect the counts of the IPs
log_hash = Hash.new(0)
log_array.each do |log|
  log_hash[log[0]] += 1
end

# Collect the offenders (IPs whose count exceeds the limit) as an array of [ip, count] pairs
offenders = log_hash.select { |_ip, count| count > args[:request_limit].to_i }.to_a

# Output the offending IPs, one per line; optionally show the offending count
offenders.each { |ip, count| puts(args[:show_count] ? "#{ip} => #{count}" : ip) }

Note: This assumes that each log line starts with the client IP and carries the bracketed time stamp as its fourth space-separated field, i.e., the format: aa.bb.cc.dd - - [datetime]
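
Apache's default time stamp field usually carries its own UTC offset (for example -0700), so the zone could be read from the log line itself instead of being passed with --log-time-zone. Here is a minimal sketch of that idea, assuming time stamps of the form [10/Oct/2000:13:55:36 -0700]. The parse_log_line helper and the sample line are hypothetical and are not part of the script above.

```ruby
require 'time'

# Hypothetical helper (not in the original script): pull the client IP and the
# request time, as epoch seconds, out of one access log line.
# Assumes the time stamp looks like [10/Oct/2000:13:55:36 -0700].
def parse_log_line(line)
  fields = line.split(" ")
  return nil if fields.length < 5 # too short to hold an IP and a time stamp

  ip = fields[0]
  # Fields 3 and 4 hold "[10/Oct/2000:13:55:36" and "-0700]"; strip the brackets
  stamp = "#{fields[3]} #{fields[4]}".delete("[]")
  [ip, Time.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").to_i]
rescue ArgumentError
  nil # the time stamp did not match the expected layout
end

# Made-up sample line in Apache's common log layout
sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
ip, epoch = parse_log_line(sample)
puts "#{ip} #{epoch}"
```

Because the helper returns nil for lines it cannot parse, a loop like the one in the script could skip those lines rather than abort. Whether your logs include the offset depends on your LogFormat setting, so check a few lines first.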