Ruby Script to Search Apache Logs for High Frequency Clients

I wrote a quick Ruby script to scour my Apache access logs for IPs that are hitting my site too frequently, such as bad bots. The command-line arguments are simple:

$ ruby find-frequent-clients.rb \
--apache-access-log=/path/to/your/log \
--seconds=3600 \
--request-limit=7200 \
--log-time-zone=PST

That command finds any client IPs that hit my web server more than 7200 times in the last hour (3600 seconds), i.e., more than two requests per second on average. The output is a newline-separated list of IP addresses (optionally with a hit count if --show-count=1 is added). Here’s how it works:

File: find-frequent-clients.rb
require 'date'
require 'time'

# Process command line arguments. Keep only args starting with "--",
# e.g. --request-limit=7200 becomes args[:request_limit] = "7200".
args = {}
ARGV.each do |arg|
  next unless arg.start_with?("--")
  key, value = arg.split("=", 2)
  args[key[2..-1].gsub("-", "_").to_sym] = value
end

# Check that we have the bare essentials to proceed
raise "You must specify the full path to an Apache access log file with --apache-access-log" unless args[:apache_access_log]
raise "You must specify the maximum amount of recent seconds to consider with --seconds" unless args[:seconds]
raise "You must specify the maximum requests allowed per #{args[:seconds]} seconds with --request-limit" unless args[:request_limit]
raise "You must specify the time zone of the Apache logs with --log-time-zone e.g., EST" unless args[:log_time_zone]
raise "The Apache access log file specified does not exist or is not readable: #{args[:apache_access_log]}" unless File.readable?(args[:apache_access_log])

# Read the file and walk the lines in reverse; stop once time stamps are beyond our time threshold
log_array = []
log_snapshot = File.readlines(args[:apache_access_log])
start_time = Time.now.to_i
log_snapshot.reverse_each do |line|
  line_array = line.split(" ")
  next if line_array.length < 4 # skip blank or malformed lines
  # The fourth field looks like "[10/Oct/2000:13:55:36"; drop the leading "["
  date_time = line_array[3][1..-1]
  # Replace the ":" between the date and the time with a space so it parses
  date_time[11] = " "
  date_time = Time.iso8601(DateTime.parse(date_time + " " + args[:log_time_zone]).to_s).to_i
  if date_time > (start_time - args[:seconds].to_i)
    log_array << [line_array[0], date_time]
  else
    break
  end
end

# Use a hash to collect the counts of the IPs
log_hash = Hash.new(0)
log_array.each do |ip, _time|
  log_hash[ip] += 1
end

# Collect the offenders: IPs with more requests than the limit
request_limit = args[:request_limit].to_i
offenders = log_hash.select { |_ip, count| count > request_limit }

# Output the offending IPs, 1 per line; optionally show the offending count
offenders.each do |ip, count|
  puts args[:show_count] ? "#{ip} => #{count}" : ip
end
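
The hand-rolled argument loop at the top of the script could also be written with Ruby's standard OptionParser. This is only a sketch of that alternative, not what the script above does: parse_args is a hypothetical helper name, and unlike the original loop, OptionParser raises on unknown flags instead of ignoring them and converts the numeric values to integers up front.

```ruby
require 'optparse'

# Hypothetical helper: parse the same flags the script accepts into a hash.
def parse_args(argv)
  args = {}
  parser = OptionParser.new do |opts|
    opts.on("--apache-access-log=PATH", String) { |v| args[:apache_access_log] = v }
    opts.on("--seconds=N", Integer)             { |v| args[:seconds] = v }
    opts.on("--request-limit=N", Integer)       { |v| args[:request_limit] = v }
    opts.on("--log-time-zone=ZONE", String)     { |v| args[:log_time_zone] = v }
    opts.on("--show-count=FLAG", String)        { |v| args[:show_count] = v }
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched

  # Same "bare essentials" check as the script, collapsed into one message.
  required = [:apache_access_log, :seconds, :request_limit, :log_time_zone]
  missing = required - args.keys
  raise ArgumentError, "Missing required options: #{missing.join(', ')}" unless missing.empty?
  args
end

# Usage with a literal argument list (the path is a placeholder).
args = parse_args(%w[--apache-access-log=/path/to/your/log --seconds=3600
                     --request-limit=7200 --log-time-zone=PST])
puts args[:seconds] # prints 3600
```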

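Note: This assumes that each log line starts with aa.bb.cc.dd - - [datetime] and that entries are in chronological order, since the script stops reading at the first line older than the time window.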
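
To make that format assumption concrete, here is a minimal sketch of how the leading fields of a line break down. The sample lines are made up for illustration (the IPs are from documentation ranges), and parse_log_line and count_by_ip are hypothetical helper names. One deliberate difference from the script: this sketch reads the UTC offset from the log line itself with Time.strptime rather than taking the zone from --log-time-zone.

```ruby
require 'time'

# Hypothetical sample lines in Apache common log format (not real traffic).
SAMPLE_LINES = [
  '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
  '192.0.2.10 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 512',
  '198.51.100.7 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 200 128'
]

# Returns [ip, epoch_seconds] for one log line, or nil if the line does not
# have the expected leading fields.
def parse_log_line(line)
  fields = line.split(" ")
  return nil if fields.length < 5
  # fields[3] is "[10/Oct/2000:13:55:36" and fields[4] is "-0700]"
  stamp = "#{fields[3].delete('[')} #{fields[4].delete(']')}"
  [fields[0], Time.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").to_i]
end

# Counts requests per client IP, skipping lines that do not parse.
def count_by_ip(lines)
  counts = Hash.new(0)
  lines.each do |line|
    ip, _time = parse_log_line(line)
    counts[ip] += 1 if ip
  end
  counts
end

count_by_ip(SAMPLE_LINES).each { |ip, count| puts "#{ip} => #{count}" }
# prints:
# 192.0.2.10 => 2
# 198.51.100.7 => 1
```

If your LogFormat puts other fields before the timestamp, the field indexes above would need to change accordingly.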
