Posts Tagged ‘apache’

Ruby Script to Search Apache Logs for High Frequency Clients

Wednesday, October 29th, 2008

I wrote a quick Ruby script to scour my Apache access logs for IPs that are hitting my site too frequently, such as misbehaving bots. The command line arguments are simple:

$ ruby find-frequent-clients.rb \
--apache-access-log=/path/to/your/log \
--seconds=3600 \
--request-limit=7200 \
--log-time-zone=PST

That command finds any client IPs that have hit my web server more than 7,200 times in the last hour (3,600 seconds), i.e., more than twice per second on average. The output is a list of IP addresses, one per line (optionally with a hit count if --show-count=1 is added). Here’s how it works:

File: find-frequent-clients.rb
```ruby
require 'date'

# Process command line arguments. Keep only args of the form --name=value,
# so "--log-time-zone=PST" becomes args[:log_time_zone] = "PST".
args = {}
ARGV.each do |arg|
  name, value = arg.split("=", 2)
  next unless name && name.start_with?("--")
  args[name[2..-1].tr("-", "_").to_sym] = value
end

# Check that we have the bare essentials to proceed
raise "You must specify the full path to an Apache access log file with --apache-access-log" unless args[:apache_access_log]
raise "You must specify the maximum amount of recent seconds to consider with --seconds" unless args[:seconds]
raise "You must specify the maximum requests allowed per #{args[:seconds]} seconds with --request-limit" unless args[:request_limit]
raise "You must specify the time zone of the Apache logs with --log-time-zone, e.g., EST" unless args[:log_time_zone]
raise "The Apache access log file specified does not exist or is not readable: #{args[:apache_access_log]}" unless File.readable?(args[:apache_access_log])

limit = args[:request_limit].to_i
cutoff = Time.now.to_i - args[:seconds].to_i

# Read the whole log, then walk it newest-first; stop at the first line
# whose timestamp falls outside our time window.
recent_ips = []
File.readlines(args[:apache_access_log]).reverse_each do |line|
  fields = line.split(" ")
  next if fields.length < 4 # skip blank or malformed lines
  # fields[3] looks like "[29/Oct/2008:14:05:33"; drop the "[" and replace
  # the colon between the year and the hour with a space so it parses.
  date_time = fields[3][1..-1]
  date_time[11] = " "
  timestamp = DateTime.parse("#{date_time} #{args[:log_time_zone]}").to_time.to_i
  break if timestamp <= cutoff
  recent_ips << fields[0]
end

# Use a hash to count the requests per IP
counts = Hash.new(0)
recent_ips.each { |ip| counts[ip] += 1 }

# Collect the offenders: IPs with more requests than the limit
offenders = counts.select { |_ip, count| count > limit }

# Output the offending IPs, 1 per line; optionally show the offending count
offenders.each do |ip, count|
  puts args[:show_count] ? "#{ip} => #{count}" : ip
end
```
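The script splits its --name=value arguments by hand. Ruby’s standard library also ships OptionParser, which can handle the same flags and convert the numeric ones to integers. Here is a minimal sketch, assuming the same flag names as above; parse_options is a hypothetical helper, not part of the original script, and unlike the script it treats --show-count as on only when its value is 1:

```ruby
require 'optparse'

# Hypothetical helper: parse the script's --name=value flags with
# Ruby's standard OptionParser instead of splitting ARGV by hand.
def parse_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.banner = "Usage: ruby find-frequent-clients.rb [options]"
    opts.on("--apache-access-log=PATH", "Apache access log to scan") { |v| options[:apache_access_log] = v }
    # Integer makes OptionParser convert (and validate) the value.
    opts.on("--seconds=N", Integer, "How many recent seconds to consider") { |v| options[:seconds] = v }
    opts.on("--request-limit=N", Integer, "Maximum requests allowed in that window") { |v| options[:request_limit] = v }
    opts.on("--log-time-zone=ZONE", "Time zone of the log timestamps, e.g. PST") { |v| options[:log_time_zone] = v }
    # Assumption: only "--show-count=1" switches the counts on.
    opts.on("--show-count=FLAG", "Print the hit count next to each IP when 1") { |v| options[:show_count] = (v == "1") }
  end.parse!(argv)
  options
end
```

OptionParser does not make options mandatory on its own, so the raise checks from the script would still be needed after calling a helper like this.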

Note: the script assumes each log line starts with the client IP followed by a bracketed timestamp, as in Apache’s common and combined log formats: aa.bb.cc.dd - - [datetime]
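In those formats Apache also writes the UTC offset right after the time, e.g. [29/Oct/2008:14:05:33 -0700]. A minimal sketch, assuming that format, of reading the timestamp and its offset straight from the log line with Time.strptime, as an alternative to combining the date with a separate --log-time-zone value; log_line_epoch is a hypothetical helper:

```ruby
require 'time'

# Hypothetical helper: return the Unix epoch seconds for the bracketed
# timestamp in a common/combined log line, or nil if there is none.
# Uses the UTC offset Apache wrote (e.g. "-0700") instead of a fixed zone.
def log_line_epoch(line)
  # Grab the text between the first "[" and "]", e.g. "29/Oct/2008:14:05:33 -0700"
  stamp = line[/\[([^\]]+)\]/, 1]
  return nil unless stamp
  Time.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").to_i
end

line = '10.0.0.1 - - [29/Oct/2008:14:05:33 -0700] "GET / HTTP/1.1" 200 512'
puts log_line_epoch(line)
```

Because the offset is recorded per request, this should also stay correct across a daylight-saving change, where a fixed zone such as PST would be off by an hour for PDT entries.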

Installing Gitweb on Fedora Linux and Apache

Friday, September 19th, 2008

My next natural step after getting my projects up and running with Git was to install a web interface. Gitweb was my choice because:

  • it’s available via yum with Fedora
  • it provides up-to-date diff information
  • it’s part of the overall Git package, so it’s tightly integrated

Installation was ultimately quite simple, but I found the install docs to be less than helpful for people like me who want immediate functionality and will get to the tweaks and details later.

Step 1: Install Gitweb

sudo yum install gitweb

This will install a few files at /var/www/git.  You shouldn’t need to do anything to them.

Step 2: Create /etc/gitweb.conf

You need a configuration file to tell Gitweb where to look for your projects. Change /srv/git/ to wherever your repositories live. Since the file goes in /etc, write it as root:

$ echo "\$projectroot = '/srv/git/';" | sudo tee /etc/gitweb.conf

Step 3: Edit Apache Configuration File

This configuration assumes you are serving Gitweb from its own virtual host; replace git.yourproject.com with your own domain.

/etc/httpd/conf.d/git.conf
```
<VirtualHost *:80>
    DocumentRoot /var/www/git
    ServerName git.yourproject.com
    <Directory /var/www/git>
        Allow from all
        AllowOverride all
        Order allow,deny
        Options ExecCGI
        <Files gitweb.cgi>
            SetHandler cgi-script
        </Files>
    </Directory>
    DirectoryIndex gitweb.cgi
    SetEnv GITWEB_CONFIG /etc/gitweb.conf
</VirtualHost>
```

Step 4: Tweak Your Repository’s Config File

Gitweb shows two key details at the top of your project’s page: the description and the owner. To have these display something appropriate, put a one-line description in /srv/git/yourproject/.git/description:

My Awesome Project

… and add this to /srv/git/yourproject/.git/config:

[gitweb]
        owner = "Mark McBride"

Step 5: Restart Apache

That’s it. Restart Apache (on Fedora, sudo service httpd restart) and you should find Gitweb running at the domain you’ve specified.
