Replacing –, ’, “, etc., with UTF-8 Characters in Ruby on Rails

Recently I upgraded some older Rails applications to Rails 3.1 and Ruby 1.9.2 (from 2.3 and 1.8.7 respectively). One post upgrade issue was that text content had a lot of garbage showing up like –, ’, “, etc. For example, here’s an actual example from a comment in one of the applications:

One of my “things to do before I’m 50” is

This should read:

One of my “things to do before I’m 50” is

It turns out these are just special characters that were improperly encoded for utf-8. The fix is simple enough: loop through your content and replace where needed.

If your database is big, this could take a long time unless you disable callbacks. The script below highlights both how to replace the characters using Ruby and how to disable your Rails callbacks to make this script run in seconds instead of hours (depending on the complexity of your callbacks).

replacements = []
replacements << ['…', '…']           # elipsis
replacements << ['–', '–']           # long hyphen
replacements << ['’', '’']           # curly apostrophe
replacements << ['“', '“']           # curly open quote
replacements << [/â€[[:cntrl:]]/, '”'] # curly close quote
klasses = [Comment, Article]           # replace with relevant classes

klasses.each do |klass|
  klass.all.each do |obj|
    original = obj.body
    replacements.each{ |set| obj.body = obj.body.gsub(set[0], set[1]) }
    unless (original == obj.body)
      #### Remove or Customize ####
      # This should reflect your models' callbacks.  It should be safe
      # since we're just doing a simple find/replace.
      Comment.skip_callback(:save, :after,  :do_after_save_tasks ) if obj.is_a?(Comment)
      Article.skip_callback(:save, :before, :do_before_save_tasks) if obj.is_a?(Article)
      #### End Remove or Customize ####
      obj.save!
    end
  end
end

If you noticed, I used a regular expression for the curly close quote. This is because there is an invisible control character that is not easily copy/pasted into your code. Using [[:cntrl:]] is just an easier way to catch it.

Integer Compression in Ruby (Base-10 to Base-62)

A few days ago I was thinking about all those link shortening sites and wondered how easy it would be to compress a base-10 number like 1,234,567,890 to something much smaller like 1LY7VK. Here’s what I came up with:

class IntegerCompressor

  CompressionCharacterSet = %w(0 1 2 3 4 5 6 7 8 9
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def self.to_base
    CompressionCharacterSet.length
  end

  def self.compress(number_to_convert)
    digits_needed = Math.log(number_to_convert, IntegerCompressor.to_base).floor + 1
    compressed_number_string = ''
    previous_remainder = number_to_convert
    (digits_needed-1).downto(0) do |power|
      r=previous_remainder.divmod(IntegerCompressor.to_base**power)
      compressed_number_string << CompressionCharacterSet[r[0]]
      previous_remainder = r[1]
    end
    compressed_number_string
  end

  def self.decompress(compressed_number)
    power = 0
    base_10_integer = 0
    compressed_number.to_s.reverse.each_char do |digit|
      base_10_integer += ((IntegerCompressor.to_base**power)*CompressionCharacterSet.index(digit))
      power+=1
    end
    base_10_integer
  end

end

It seems to work:

$ irb
>> require '/path/to/file/integer_compressor.rb'
=> true
>> IntegerCompressor.compress 1234567890
=> "1LY7VK"
>> IntegerCompressor.decompress "1LY7VK"
=> 1234567890

If anyone has a more elegant solution, I'd be curious to see it.

(Thanks to Tamara for the log refresher.)

Symbolic Links to Image Folders in Rails 3.1

I recently tweaked the code for OmniNerd.com to upgrade it from a Rails 2.3 app to a Rails 3.1 app.  Online docs got me through everything but one item: symbolic links to folder of images (or assets in general).  On OmniNerd, users can upload images.  The site also auto-generates a lot of graphical content.  Our setup is like many, i.e., we deploy with Capistrano and use the shared folder to hold static user content that gets handed off between deployments.  We previously simply had a symlink setup such that:

/path/to/app/current/public/images/content --> /path/to/app/shared/user_uploaded

Continue reading

Install markItUp! in Ruby on Rails

If you’re torn between a WYSIWYG editor and something clean like Textile/Redcloth, then markItUp! will provide you with some nice middle ground. The tool is well documented, but there are a few “notes” for installing it in Rails that aren’t in the docs.

1. Initial setup

Since there’s overhead associated with markItUp!, I set a variable @content_submission to true so that my standard layout template can skip all of the extra javascript when it’s not necessary. To keep things simple, I’m going to follow the markItUp! hierarchy, which means it’s stylesheets will be in your javascripts path.

Continue reading

Ruby on Rails Diff Text to HTML <ins> and <del>

This code is perfect if you have 2 text objects in your Rails application and you want to compare their differences in one of your HTML views. It’s 99% pure Ruby too, so if you alter the first line, you can use it for other purposes.

Only one thing to note: you must have diff installed. I’m using: diff (GNU diffutils) 2.8.1.

#set up some variables to reference later
temporary_directory = File.join(Rails.root, "tmp")
max_lines = 9999999 #needs to be larger than the most lines you'll consider
diff_header_length = 3

# text_old and text_new should be the values of the string objects to compare
# these are just example strings to show it works
text_old      = "line1\ndeleted line2\nline3\n\nline4\nline5"
text_new      = "line1\ninserted line2\nline3\n\nline4\nline5"

# since we're using diff on the file system, we'll save the text we want to compare
# and then run diff against the two files
file_old_name = File.join(temporary_directory,"file_old"+rand(1000000).to_s)
file_new_name = File.join(temporary_directory,"file_new"+rand(1000000).to_s)
file_old      = File.new(file_old_name, "w+")
file_new      = File.new(file_new_name, "w+")
file_old.write(text_old+"\n")
file_new.write(text_new+"\n")
file_old.close
file_new.close

# diff will give provide a string showing insertions and deletions.  We will
# split this string out by newlines if there are difference, and mark it up
# accordingly with html
lines = %x(diff -­-­­­­­­unified=#{max_lines} #{file_old_name} #{file_new_name})
if lines.empty?
  lines = text_new.split(/\n/)
else
  lines = lines.split(/\n/)[diff_header_length..max_lines].
  collect do |i|
    if i.empty?
      ""
    else
      case i[0,1]
      when "+"; then "<ins>"+i[1..i.length-1]+"</ins>"
      when "-"; then "<del>"+i[1..i.length-1]+"</del>"
      else; i[1..i.length-1]
      end
    end
  end
end

#clean up the temporary diff files we created
File.delete(file_new_name)
File.delete(file_old_name)

#return marked up text
lines.join("\n")</pre>
If you fire up RAILS_ROOT/script/console and paste that code in, it will return a nicely marked up string like this:
<pre lang="html">line1
<del>deleted line2</del>
<ins>inserted line2</ins>
line3

line4
line5

Use CSS to make your ins and del tags render however you like.

3 Reasons to Switch to Git from Subversion

Dozens of articles outline the detailed technical reasons Git is better than Subversion, but if you’re like me, you don’t necessarily care about minor speed differences, the elegance of back-end algorithms, or all of the hardcore features that you may only ever use once.  You want to see clear, major differences in your day-to-day interaction with software before you switch to something new.  After several weeks of trials, Git seems to offer major improvements over Subversion.  These are my reasons for jumping on the Git bandwagon.

Let’s start with a few assumptions for the scenarios we’ll walk through:

  • you’re one of many developers for a project
  • all changes going into production must first be peer-reviewed
  • you all use simple GUI text editors like TextMate or an equivalent
  • you have 4 features that you’re working that are due soon

Let’s get to work.

Continue reading

Skip All Rails Filters

It took me a while to figure this out, but it’s quite simple.  If you want to skip all of the filters a Rails controller will run, simply put the following at the top of your controller:

skip_filter filter_chain #both documented in the Rails API

For example, if your application controller defines a filter to check if a user is logged in, it makes sense that this filter might run for all controllers, except in rare cases.  In my case, I have a dynamic image controller that doesn’t require all of the overhead that most controllers do.  For that controller, I use the above to skip all of the filters.

Multiple Remote Git Branches With Different Local Names

I pounded my head against the wall for a bit when trying to play out this scenario in Git:

  • Remote repository has two branches: master and some-long-complex-name
  • Locally, I have cloned master
  • I have set up my config to refer to the remote master as “origin”
  • I have checked out some-long-complex-name using the following:
$ git checkout --track -b simple-name origin/some-long-complex-name

The key thing to note is that my local branch has a different name than the remote branch, i.e., “simple-name” is my local branch that’s tracking the remote branch “some-long-complex-name”.  I’ve used pseudo-branch-names in this example, but in practice I like my local branch names to be 3 characters or so such that they’re easy to type, and I like the remote branches to have long names such that there is no ambiguity about what they are.

My initial .git/config looked like this after the aforementioned checkout command:

[remote "origin"]
        url = ssh://myserv/srv/git/proj.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master
[branch "simple-name"]
        remote = origin
        merge = refs/heads/some-long-complex-name

Now, this isn’t too bad.  All pull related commands work, but push only works for the master branch.  What I wanted was git to push to “some-long-complex-name” whenever I ran “git push” from my local “simple-name” branch.

I changed .git/config to look like this:

[remote "origin"]
    url = ssh://myserv/srv/git/proj.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
[remote "simple_origin"]
        url = ssh://myserv/srv/git/proj.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        push = simple-name:some-long-complex-name
[branch "simple-name"]
        remote = simple_origin
        merge = refs/heads/some-long-complex-name

Note the additional “remote” section and the “push” reference.  Now, when I’m in my simple-name branch, I can just type “git push” and this branch will push out to the remote branch names “some-long-complex-name”.

Thanks to those who offered help on the freenode IRC #git channel, namely RandalSchwartz.

Installing Gitweb on Fedora Linux and Apache

My next natural step after getting my projects up and running with Git was to install a web interface. Gitweb was my choice because:

  • it’s available via yum with Fedora
  • it provides up-to-date diff information
  • it’s part of the overall Git package, so it’s tightly integrated

Installation was ultimately quite simple, but I found the install docs to be less than helpful for people like me who want immediate functionality and will get to the tweaks and details later.

Step 1: Install Gitweb

sudo yum install gitweb

This will install a few files at /var/www/git.  You shouldn’t need to do anything to them.

Step 2: Create /etc/gitweb.conf

You need a configuration file to tell Gitweb where to look for your project.  You can change this folder to wherever your project will be.

$ echo "\$projectroot = '/srv/git/';" > /etc/gitweb.conf

Step 3: Edit Apache Configuration File

This configuration file assumes you are running your site as a virtual host.

/etc/httpd/conf.d/git.conf

    DocumentRoot /var/www/git
    ServerName git.yourproject.com

          Allow from all
          AllowOverride all
          Order allow,deny
          Options ExecCGI

               SetHandler cgi-script

     DirectoryIndex gitweb.cgi
     SetEnv  GITWEB_CONFIG  /etc/gitweb.conf

Step 3: Tweak Your Repository’s Config File

Gitweb lists two key elements at the start of your project’s page: description and owner.  To have these display something appropriate, edit /srv/git/yourproject.git/description:

My Awesome Project

… and add this to /srv/git/yourproject/.git/config:

[gitweb]
        owner = "Mark McBride"

Step 4: Restart Apache

That’s it.  Just restart Apache and you should find Gitweb running at the domain you’ve specified.

References:

Migrating a Subversion (svn) Project and Server to Git

I’m sold on Git. The branching feature alone was reason enough for me to move from Subversion. However, the decision to move was the easy part. Migrating my projects, while not too painful, wasn’t trivial. I found that my knowledge of svn was actually a disadvantage as it made it easy to assume things about Git that simply weren’t true. This is ultimately how I went about my migration.

First, some assumptions:

  • My projects are small. By small I mean that everyone working on code has shell access to the servers involved. It’s not open sourced, public, or any of that cool stuff. If you need to do something for large scale access, consider GitHub or gitosis.
  • I have a semi-centralized need. None of the people on my projects are co-located. There is a definite need for a place to post code for review that everyone can access.
  • I have a functioning, in-production Subversion repository and the transition must be seamless.
  • My apps are Rails applications deployed using Capistrano.
  • All of my servers/clients are running Fedora 9
  • You understand the basics of Git.

That said, let’s dive in.

Step 1: Install Git

This is the easy step:

[local]$ sudo yum install git git-svn

Step 2: Convert your Central Subversion Repository to a Local Git Repository

This is key step to wrap your mind around conceptually. With svn you would first make a central repository and import something into it to get started. We’re about to do quite the opposite. Remember that in Git every instance is a repository. When we grab the contents of your existing project, we will be building the new repository locally, not on your server (that will come later).

Build a text file list of your existing authors

In order to maintain some continuity with your existing svn logs, we need to peg the svn user names of people who have committed to your svn repository to git user names. This is quite easy to do. Just create a text file with lines that look like this:

markmcb = Mark A. McBride <mark@markmcb.com>
example = Example Person <person@example.com>
... etc.

Save this file. I’ll refer to it later as svn-to-git-authors. (If you have a lot of authors, check out Josh’s script to automate the creation of this file.)

Clone your Subversion database to a Git repository

This next step is so nice. With one command and the help of the file we just created, we’ll create an almost ready to use git repository. The command is simple: git clone “what” “where”. Or something like:

[local]$ cd /path/of/your/liking
[local]$ git svn clone svn+ssh://yourserver.example.com/path/to/your/repository \
             ./myrepos.git --authors-file=svn-to-git-authors

Hit return and relax as the magic happens. Depending on the size of your repository, this could take some time.

Set some basic configuration options

There are hundreds of configuration options for Git, but I’m only going to touch on a few critical ones. Specifically, let’s tell our new Git repository who we are and set the stage for working with a remote, semi-centralized repository.

[user]
name = Mark A. McBride
email = mark@markmcb.com

The user settings are pretty straightforward. Just ensure they match what you had before in the authors file and it’ll be very easy for you to keep track of who has done what. In addition, you may see a section relating to svn. Once you no longer need to pull data from that repository (which, unless your repository is busy, is right now), you can delete this section.

From here you’re ready to get to work. You have a functional repository. However, if you plan to work with anyone other than yourself, you may need to interact with a public repository.

Step 3: Setup a Public Repository

The steps to establish a repository that you can access over ssh are pretty simple. Just ssh to the public server and (you may need to setup permissions to write depending on the folder do the following in):

[publicsrv]$ cd /srv/git
[publicsrv]$ mkdir publicproject.git
[publicsrv]$ cd publicproject.git
[publicsrv]$ git --bare init

That’s all you need to do on the server. The critical thing to note is that bare reference. This tells Git that there is no working copy, i.e., the files you are coding. All this repository will track are the changes and not actually store the files (though anyone can clone this repository and get the files).

Point Your Repository to the Public One

Back on your local machine, you just need to run one command to make your repository aware of this newly created public version:

[local]$ git remote add origin ssh://publicsrv.example.com/srv/git/publicproject.git

Now you have a remote repository named origin from which your local repository can fetch all of its data from. Look in you .git/config file for details. The last step is simply to push the files you have in your local copy to the server.

[local]$ git push origin master

After running this, anyone on your team with an ssh account can clone the repository with:

[local]$ git clone ssh://publicsrv.example.com/srv/git/publicproject.git

If you’re lazy like me, and just want to be able to type git push/pull instead of typing out the public server’s name each time, add the following to your .git/config file:

[branch "master"]
        remote = origin
        merge = refs/heads/master

And with that, you’re done with the repository migration.

Step 4: Final Rails Tweaks

Your Git work is done.  These last items are final notes to make your new repository play nice with your Rails app.

Tell Capistrano about Git

The very last thing you have to do is tell Capistrano to pull your Rails app out of a Git repository during deployment rather than from Subversion.  This is quite simple.  In your deploy.rb file, add this line:

set :scm, :git

Also, be sure to set your repository URL to the new location.

Ignore logs and temporary files

You may need to create some of the directories depending on how your svn repository was set up.  Insert empty .gitignore files in them to ensure Git doesn’t ignore them.

[local]$ mkdir tmp
[local]$ mkdir log
[local]$ mkdir vendor
[local]$ touch tmp/.gitignore log/.gitignore vendor/.gitignore

Add the following to .gitignore in your root folder to ignore standard Rails files that you don’t want in your repository:

.DS_Store
log/*.log
tmp/**/*
config/database.yml
db/*.sqlite3

That’s it.  Your Rails app is now ready to modify and deploy from Git.

References: