Archive for the ‘Server Administration’ Category

Ruby Script to Search Apache Logs for High Frequency Clients

Wednesday, October 29th, 2008

I wrote a quick Ruby script to scour through my Apache access logs and look for IPs that are hitting my site too frequently, e.g., bad bots, etc. The command line arguments are simple:

$ ruby find-frequent-clients.rb \
--apache-access-log=/path/to/your/log \
--seconds=3600 \
--request-limit=7200 \
--log-time-zone=PST

That command is going to find any client IPs that are hitting my web server in the last 10 minutes more twice or more per second. The output will be a line separated list of IP addressess (optionally with a hit count if --show-count=1 is added). Here’s how it works:

File: find-frequent-clients.rb
  1. require 'date'
  2. require 'time'
  3. # Process command line arguments.  Filter only args starting with –
  4. args = {}
  5. $*.each do |arg|
  6.   spl=arg.split("=")
  7.   if spl[0][0..1] == "–"
  8.     args[spl[0][2..spl[0].length-1].gsub("-","_").intern]=spl[1]
  9.   end
  10. end
  11.  
  12. # Check that we have the bare essentials to proceed
  13. raise "You must specify the full path to an Apache access log file with –apache-access-log" unless args[:apache_access_log]
  14. raise "You must specify the maximum amount of recent seconds to consider with –seconds" unless args[:seconds]
  15. raise "You must specify the maximum requests allowed per #{args[:seconds]} seconds with –request-limit" unless args[:request_limit]
  16. raise "You must specify the time zone of the Apache logs with –log-time-zone e.g., EST" unless args[:log_time_zone]
  17. raise "The Apache access log file specified does not exist or is not readable: #{args[:apache_access_log]}" unless FileTest.readable?(args[:apache_access_log])
  18.  
  19. # Open the file and read the lines in reverse; exit once time stamps are beyond our time threshold
  20. file = File.open(args[:apache_access_log], "r")
  21. log_array = []
  22. log_snapshot = file.readlines
  23. file.close
  24. start_time = Time.now.to_i
  25. log_snapshot.reverse_each do |line|
  26.   line_array = line.split(" ")
  27.   date_time = line_array[3][1..line_array[3].length-1]
  28.   date_time[11] = " "
  29.   date_time = Time.iso8601(DateTime.parse(date_time+" "+args[:log_time_zone]).to_s).to_i
  30.   if date_time > (start_time - args[:seconds].to_i)
  31.     log_array << [line_array[0], date_time]
  32.   else
  33.     break
  34.   end
  35. end
  36.  
  37. # Use a hash to collect the counts of the IPs
  38. log_hash = Hash.new(0)
  39. log_array.each do |log|
  40.   log_hash[log[0]]+=1
  41. end
  42.  
  43. # collect the offenders in an array
  44. offenders = log_hash.to_a.collect{|h| h if h[1] > args[:request_limit].to_i}.compact
  45.  
  46. # output the offending IPs, 1 per line; optionally show the offending count
  47. offenders.each{|o| puts o[0].to_s+"#{" => "+o[1].to_s if args[:show_count]}"}

Note: This makes the assumption that your logs are in the format: aa.bb.cc.dd - - [datetime]

3 Reasons to Switch to Git from Subversion

Saturday, October 18th, 2008

Dozens of articles outline the detailed technical reasons Git is better than Subversion, but if you’re like me, you don’t necessarily care about minor speed differences, the elegance of back-end algorithms, or all of the hardcore features that you may only ever use once.  You want to see clear, major differences in your day-to-day interaction with software before you switch to something new.  After several weeks of trials, Git seems to offer major improvements over Subversion.  These are my reasons for jumping on the Git bandwagon.

Let’s start with a few assumptions for the scenarios we’ll walk through:

  • you’re one of many developers for a project
  • all changes going into production must first be peer-reviewed
  • you all use simple GUI text editors like TextMate or an equivalent
  • you have 4 features that you’re working that are due soon

Let’s get to work.

Endless, Easy, Non-File-System-Based, Local Branches

You’d like to work on each of your 4 features A, B, C, and D independently and somewhat in parallel, though B looks like a quick win.  Let’s compare the branching features offered by both Git and Subversion side-by-side as we get going:

Task Git Subversion
1. Get a copy of the project on your local machine. git clone /srv/repos /local/copy svn checkout /srv/repos /local/copy
2. Create branches A-D to represent the features you’re working on. git checkout -b A
git checkout -b B
git checkout -b C
git checkout -b D
svn copy /srv/repos/trunk /srv/repos/branches/A; svn checkout /srv/repos/branches/A /local/copy/branches/A

svn copy /srv/repos/trunk /srv/path/repos/branches/B; svn checkout /srv/repos/branches/B /local/copy/branches/B

svn copy /srv/repos/trunk /srv/path/repos/branches/C; svn checkout /srv/repos/branches/C /local/copy/branches/C

svn copy /srv/repos/trunk /srv/path/repos/branches/D; svn checkout /srv/repos/branches/D /local/copy/branches/D

3. Feature B is very simple and you want to knock it out and get it into production ASAP. git checkout B
[work in editor]
git commit -a

[peer review]
git format-patch
git send-email [options will vary]
[peer gives you "thumbs up"]

git checkout master
git merge B
git push

[open text editor for branch B]
[work in editor]
svn commit

[peer review]
[send email to peer with branch name]
[peer checks out your branch locally to review]
[peer gives you "thumbs up"]

cd /local/copy/trunk
svn merge /local/copy/branches/B .
svn commit

4. Get rid of unnecessary branch B. git branch -d B svn delete /srv/repos/branches/B
svn update

Note the key advantages Git offered in each step:

  1. Git creates a full repository with this command.  With Subversion, you’re just checking out the files in the repository.
  2. With each branch, no new files are created in the project file hierarchy on your system.  Since you have a full local repository, Git creates the files you need on the fly by processing the recorded changes.  With Subversion, you have to create every branch remotely on the server.  This can get messy depending on the size of your team.  If you decide to control branching to keep things clean, you forfeit the power branching offers.
  3. With Git, we only push our work to the server AFTER collaboration (more below).  With Subversion, it all hits the server.
  4. Again, no file system work.  Since we’re using a local repository, we let Git handle the details of removing the branch.  With Subversion, you still have the old copy until you update.  You either have to clean up manually, or “update” to clean up local and remote copies.

In addition, try to do this scenario on your laptop while not connected to the Internet.  With Git, no issues, the repository is local; however, with Subversion, you’re out of luck.  Your new branches will have to wait. The advantages of Git for branching are clear in this simple branching scenario.  Let’s continue to look at our scenario with non-trivial features A, C, and D that we’re working on.

Stashing Temporary Work

You start working on A and you’re about 100 lines of code into it when you get stumped on a math function.  The math wiz on your team is out for the day and you’d rather not continue until you consult him.  You’ve got some ideas for C, so you decide shift gears and get started.

Task Git Subversion
1. Switch to branch A, write 100 lines of code git checkout A [open text editor for branch A]
2. Switch to branch C  while waiting on a co-worker’s advise for A git stash
git checkout C
[close text editor for branch A]
[open text editor for branch C]
3. Work on C for a while, get advise from co-worker and resume work on A git stash
git checkout A
git stash list
git stash apply [stash name]
[close text editor for branch C]
[open text editor for branch A]

At a glance, you might get the impression that Subversion is simpler, and you’re probably right.  However, this is one case where simple may not be what you’re looking for.  Let’s look at each step:

  1. The key thing to note in this and every step is how we switch between branches.  In Git, the repository handles this. With Subversion, you’re literally just working on a separate set of files.  Ultimately, it’s up to you to manage retrieving and editing these files.  If you’re using TextMate, you’ll probably save a TextMate project file every time you branch simply to give you quick access to the branch.  If you branch a lot, this quickly becomes annoying, time consuming, and non-productive.  With Git, when you checkout a separate branch, it “magically” changes all of the files on your file system for you.  That means 1 project file is all you ever need.  Git handles the rest.
  2. Git will “float” uncommitted changes.  This means that if simply did a “git checkout C,” you’d bring with you all of the uncommitted work you did for A.  However, you don’t want to commit A because it’s not in a good working state.  Instead, you “stash” your work.  Stash is like a work in process commit.  Using it will tuck away your WIP changes without a formal commit, which allows you to change to C without “floating” any of your A changes.  The Subversion method is simpler, but you could potentially end up with several half-baked branches and no record of when you abandoned them.  Git’s stash allows you to list all stashes, and even write a message when you stash.  It is far more powerful in this scenario.
  3. Same as 2, but shows the “git stash list” feature.

So now we can work smoothly between multiple branches without worry of the consequences of interruption.  Git thus far has shown immense strength in two key areas, but let’s revisit collaboration to seal the deal.

Collaboration Before Public Commits

It’s now weeks later and you’re working on D.  After some round table discussions with the team, you all agree that D may not be the best approach.  A co-worker starts working on his own branch E and a few days later wants to review it with you.

Task Git Subversion
1. Review co-workers suggested changes in his branch E [check your email for patch]
[review patch]
svn checkout /srv/repos/branches/E /local/copy/branches/E
[open text editor for branch E]
svn log [to find changes]
svn diff [to view changes]
[review branch]
2. Agree that E is better and destroy your branch D git branch -D D svn delete /srv/repos/branches/D
svn update

Git offers power by putting collaboration up front before commits are public for all to see.  Consider in each step:

  1. Git has a nice feature to create “patches.”  They are simply changes to code, very similar to a diff.  The idea is that you create patches from commits you’ve only made on your local copy of the repository.  When your co-worker sent you the patch for E, no one else on the team had to see his commit logs, branches, etc., in the public repository because they never existed there.  You are collaborating about E via emailed patches.  With Subversion, it’s all on the server, all the time.
  2. With Git, deleting an abandoned branch is simple and clean.  The work done in D will never be seen by the public, i.e., your team.  You’ve spared the team clutter both in the logs and on their file systems.  With Subversion, the clean up is on you.  Should you forget to delete D, it’s has the potential to get used and that could be bad.  Conversely, someone else may have quietly checked out D and been working on something.  When you delete it from the public repository, their commit will surely fail.

Conclusion

There are literally hundreds of features for both Git and Subversion.  While you may have detailed reasons to choose one over the other, I think these 3 high level reasons are strongly convincing in favor of Git.  If you have differing opinions, I’d love to hear them.

Multiple Remote Git Branches With Different Local Names

Sunday, September 21st, 2008

I pounded my head against the wall for a bit when trying to play out this scenario in Git:

  • Remote repository has two branches: master and some-long-complex-name
  • Locally, I have cloned master
  • I have set up my config to refer to the remote master as “origin”
  • I have checked out some-long-complex-name using the following:
$ git checkout --track -b simple-name origin/some-long-complex-name

The key thing to note is that my local branch has a different name than the remote branch, i.e., “simple-name” is my local branch that’s tracking the remote branch “some-long-complex-name”.  I’ve used pseudo-branch-names in this example, but in practice I like my local branch names to be 3 characters or so such that they’re easy to type, and I like the remote branches to have long names such that there is no ambiguity about what they are.

My initial .git/config looked like this after the aforementioned checkout command:

[remote "origin"]
        url = ssh://myserv/srv/git/proj.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master
[branch "simple-name"]
        remote = origin
        merge = refs/heads/some-long-complex-name

Now, this isn’t too bad.  All pull related commands work, but push only works for the master branch.  What I wanted was git to push to “some-long-complex-name” whenever I ran “git push” from my local “simple-name” branch.

I changed .git/config to look like this:

[remote "origin"]
    url = ssh://myserv/srv/git/proj.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
[remote "simple_origin"]
        url = ssh://myserv/srv/git/proj.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        push = simple-name:some-long-complex-name
[branch "simple-name"]
        remote = simple_origin
        merge = refs/heads/some-long-complex-name

Note the additional “remote” section and the “push” reference.  Now, when I’m in my simple-name branch, I can just type “git push” and this branch will push out to the remote branch names “some-long-complex-name”.

Thanks to those who offered help on the freenode IRC #git channel, namely RandalSchwartz.

Installing Gitweb on Fedora Linux and Apache

Friday, September 19th, 2008

My next natural step after getting my projects up and running with Git was to install a web interface. Gitweb was my choice because:

  • it’s available via yum with Fedora
  • it provides up-to-date diff information
  • it’s part of the overall Git package, so it’s tightly integrated

Installation was ultimately quite simple, but I found the install docs to be less than helpful for people like me who want immediate functionality and will get to the tweaks and details later.

Step 1: Install Gitweb

sudo yum install gitweb

This will install a few files at /var/www/git.  You shouldn’t need to do anything to them.

Step 2: Create /etc/gitweb.conf

You need a configuration file to tell Gitweb where to look for your project.  You can change this folder to wherever your project will be.

$ echo "\$projectroot = '/srv/git/';" > /etc/gitweb.conf

Step 3: Edit Apache Configuration File

This configuration file assumes you are running your site as a virtual host.

/etc/httpd/conf.d/git.conf
  1. <VirtualHost *:80>
  2.     DocumentRoot /var/www/git
  3.     ServerName git.yourproject.com
  4.      <Directory /var/www/git>
  5.           Allow from all
  6.           AllowOverride all
  7.           Order allow,deny
  8.           Options ExecCGI
  9.           <Files gitweb.cgi>
  10.                SetHandler cgi-script
  11.           </Files>
  12.      </Directory>
  13.      DirectoryIndex gitweb.cgi
  14.      SetEnv  GITWEB_CONFIG  /etc/gitweb.conf
  15. </VirtualHost>

Step 3: Tweak Your Repository’s Config File

Gitweb lists two key elements at the start of your project’s page: description and owner.  To have these display something appropriate, edit /srv/git/yourproject/.git/description:

My Awesome Project

… and add this to /srv/git/yourproject/.git/config:

[gitweb]
        owner = "Mark McBride"

Step 4: Restart Apache

That’s it.  Just restart Apache and you should find Gitweb running at the domain you’ve specified.

References:

Migrating a Subversion (svn) Project and Server to Git

Wednesday, September 17th, 2008

I’m sold on Git. The branching feature alone was reason enough for me to move from Subversion. However, the decision to move was the easy part. Migrating my projects, while not too painful, wasn’t trivial. I found that my knowledge of svn was actually a disadvantage as it made it easy to assume things about Git that simply weren’t true. This is ultimately how I went about my migration.

First, some assumptions:

  • My projects are small. By small I mean that everyone working on code has shell access to the servers involved. It’s not open sourced, public, or any of that cool stuff. If you need to do something for large scale access, consider GitHub or gitosis.
  • I have a semi-centralized need. None of the people on my projects are co-located. There is a definite need for a place to post code for review that everyone can access.
  • I have a functioning, in-production Subversion repository and the transition must be seamless.
  • My apps are Rails applications deployed using Capistrano.
  • All of my servers/clients are running Fedora 9
  • You understand the basics of Git.

That said, let’s dive in.

Step 1: Install Git

This is the easy step:

[local]$ sudo yum install git git-svn

Step 2: Convert your Central Subversion Repository to a Local Git Repository

This is key step to wrap your mind around conceptually. With svn you would first make a central repository and import something into it to get started. We’re about to do quite the opposite. Remember that in Git every instance is a repository. When we grab the contents of your existing project, we will be building the new repository locally, not on your server (that will come later).

Build a text file list of your existing authors

In order to maintain some continuity with your existing svn logs, we need to peg the svn user names of people who have committed to your svn repository to git user names. This is quite easy to do. Just create a text file with lines that look like this:

markmcb = Mark A. McBride <mark@markmcb.com>
example = Example Person <person@example.com>
... etc.

Save this file. I’ll refer to it later as svn-to-git-authors. (If you have a lot of authors, check out Josh’s script to automate the creation of this file.)

Clone your Subversion database to a Git repository

This next step is so nice. With one command and the help of the file we just created, we’ll create an almost ready to use git repository. The command is simple: git clone “what” “where”. Or something like:

[local]$ cd /path/of/your/liking
[local]$ git svn clone svn+ssh://yourserver.example.com/path/to/your/repository \
             ./myrepos.git --authors-file=svn-to-git-authors

Hit return and relax as the magic happens. Depending on the size of your repository, this could take some time.

Set some basic configuration options

There are hundreds of configuration options for Git, but I’m only going to touch on a few critical ones. Specifically, let’s tell our new Git repository who we are and set the stage for working with a remote, semi-centralized repository.

[user]
name = Mark A. McBride
email = mark@markmcb.com

The user settings are pretty straightforward. Just ensure they match what you had before in the authors file and it’ll be very easy for you to keep track of who has done what. In addition, you may see a section relating to svn. Once you no longer need to pull data from that repository (which, unless your repository is busy, is right now), you can delete this section.

From here you’re ready to get to work. You have a functional repository. However, if you plan to work with anyone other than yourself, you may need to interact with a public repository.

Step 3: Setup a Public Repository

The steps to establish a repository that you can access over ssh are pretty simple. Just ssh to the public server and (you may need to setup permissions to write depending on the folder do the following in):

[publicsrv]$ cd /srv/git
[publicsrv]$ mkdir publicproject.git
[publicsrv]$ cd publicproject.git
[publicsrv]$ git --bare init

That’s all you need to do on the server. The critical thing to note is that bare reference. This tells Git that there is no working copy, i.e., the files you are coding. All this repository will track are the changes and not actually store the files (though anyone can clone this repository and get the files).

Point Your Repository to the Public One

Back on your local machine, you just need to run one command to make your repository aware of this newly created public version:

[local]$ git remote add origin ssh://publicsrv.example.com/srv/git/publicproject.git

Now you have a remote repository named origin from which your local repository can fetch all of its data from. Look in you .git/config file for details. The last step is simply to push the files you have in your local copy to the server.

[local]$ git push origin master

After running this, anyone on your team with an ssh account can clone the repository with:

[local]$ git clone ssh://publicsrv.example.com/srv/git/publicproject.git

If you’re lazy like me, and just want to be able to type git push/pull instead of typing out the public server’s name each time, add the following to your .git/config file:

[branch "master"]
        remote = origin
        merge = refs/heads/master

And with that, you’re done with the repository migration.

Step 4: Final Rails Tweaks

Your Git work is done.  These last items are final notes to make your new repository play nice with your Rails app.

Tell Capistrano about Git

The very last thing you have to do is tell Capistrano to pull your Rails app out of a Git repository during deployment rather than from Subversion.  This is quite simple.  In your deploy.rb file, add this line:

set :scm, :git

Also, be sure to set your repository URL to the new location.

Ignore logs and temporary files

You may need to create some of the directories depending on how your svn repository was set up.  Insert empty .gitignore files in them to ensure Git doesn’t ignore them.

[local]$ mkdir tmp
[local]$ mkdir log
[local]$ mkdir vendor
[local]$ touch tmp/.gitignore log/.gitignore vendor/.gitignore

Add the following to .gitignore in your root folder to ignore standard Rails files that you don’t want in your repository:

.DS_Store
log/*.log
tmp/**/*
config/database.yml
db/*.sqlite3

That’s it.  Your Rails app is now ready to modify and deploy from Git.

References:

Linux Server Load With Top

Tuesday, March 25th, 2008

I always find it interesting when I believe something to be true for years and then one day someone says, “that’s incorrect.” Today was one of those days.

In Unix, Linux, OS X, or most any good operating system, you’ll find a standard command called top. It’s a simple programs that, according to the manual, “provides a dynamic real-time view of a running system.” At the top of the top (heh) output is a header that looks like:

top - 15:43:10 up 14 days, 15:25,  4 users,  load average: 0.27, 0.12, 0.10
Tasks: 106 total,   2 running, 104 sleeping,   0 stopped,   0 zombie

Note the top right load average numbers. I had always thought that those numbers represented the percent load on the processor averaged over the last minute, 5 minutes, and 15 minutes respectively. Well, I was right about the time intervals, but wrong about the information. The load actually reflects system load such that a value of 1 means the processor on average had 1 process waiting to run, i.e., was loaded 100%. So, in the values above, the CPU was only 27% busy over the last minute, 12% in the last five, and 10% in the last 15. Anything over 1 means that there is an overload and a queue of pending processes.

This would explain why I saw values of 40 when OmniNerd was last Slashdotted. I thought the server was doing well with a 40% load. In reality it had a 4000% load and was dropping requests left and right.

Oh well. Live and learn.