
This is the digital hub of Mark McBride.
Contact me via email.
Follow my thoughts and photos at Google+.
Or, read my blog posts below.

Recent Posts

Replacing â€“, â€™, â€œ, etc., with UTF-8 Characters in Ruby on Rails

Recently I upgraded some older Rails applications to Rails 3.1 and Ruby 1.9.2 (from 2.3 and 1.8.7 respectively). One post-upgrade issue was that text content had a lot of garbage showing up, like â€“, â€™, â€œ, etc. Here’s an actual line from a comment in one of the applications:

One of my â€œthings to do before Iâ€™m 50â€ is

This should read:

One of my “things to do before I’m 50” is

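It turns out these are just special characters (curly quotes, dashes, ellipses) whose UTF-8 bytes were, most likely, read as a single-byte encoding somewhere along the way, so each one shows up as two or three junk characters. The fix is simple enough: loop through your content and replace where needed.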
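To see where this kind of garbage comes from, here is a small sketch in plain Ruby (1.9 or later). It assumes the usual cause: UTF-8 bytes that were misread as Windows-1252 and then stored as UTF-8 again. The `garble` helper is hypothetical and exists only for this illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: treat a string's UTF-8 bytes as if they were
# Windows-1252, then convert the result back to UTF-8.
def garble(text)
  text.dup.force_encoding('Windows-1252').encode('UTF-8')
end

# Print each special character next to its garbled form.
['…', '–', '’', '“'].each do |char|
  puts "#{char} => #{garble(char)}"
end
```

For example, `garble('’')` returns the three-character sequence â€™. The curly close quote is left out because its third byte has no printable Windows-1252 character.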

If your database is big, this could take a long time unless you disable callbacks. The script below shows both how to replace the characters using Ruby and how to disable your Rails callbacks, which can make it run in seconds instead of hours (depending on the complexity of your callbacks).

# encoding: utf-8
# (Ruby 1.9 needs the magic comment above for the non-ASCII literals below.)

# Each pair is [garbled sequence, intended character].
replacements = [
  ['â€¦', '…'],               # ellipsis
  ['â€“', '–'],               # en dash
  ['â€™', '’'],               # curly apostrophe
  ['â€œ', '“'],               # curly open quote
  [/â€[[:cntrl:]]/, '”']      # curly close quote
]
klasses = [Comment, Article]  # replace with relevant classes

#### Remove or Customize ####
# This should reflect your models' callbacks.  It should be safe
# since we're just doing a simple find/replace.  The callbacks stay
# skipped for the rest of the process, so run this as a one-off script.
Comment.skip_callback(:save, :after,  :do_after_save_tasks)
Article.skip_callback(:save, :before, :do_before_save_tasks)
#### End Remove or Customize ####

klasses.each do |klass|
  klass.find_each do |obj|    # loads records in batches
    next if obj.body.nil?
    fixed = replacements.inject(obj.body) { |text, (from, to)| text.gsub(from, to) }
    next if fixed == obj.body # nothing to fix, skip the save
    obj.body = fixed
    obj.save!
  end
end

You may have noticed that I used a regular expression for the curly close quote. This is because the garbled sequence ends in an invisible control character that is not easily copied and pasted into your code. Using [[:cntrl:]] is just an easier way to catch it.
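As a quick check of that point, the sketch below builds a garbled close quote with an explicit escape and confirms that the pattern catches it. It assumes the invisible character is U+009D, which is what you get when the byte 0x9D is carried through unchanged. The `fix_close_quotes` name is made up for this example.

```ruby
# encoding: utf-8

# Replace the garbled close-quote sequence (â, €, then any control
# character) with a real curly close quote.
def fix_close_quotes(text)
  text.gsub(/â€[[:cntrl:]]/, '”')
end

# Assumption: the invisible third character is U+009D.
garbled = "before I’m 50â€\u009D is"
puts fix_close_quotes(garbled)
```

The garbled open quote (â€œ) is not touched by this pattern, because œ is a letter rather than a control character.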

  1. Integer Compression in Ruby (Base-10 to Base-62)
  2. 3 Reasons Aperture and Picasa are a Great Photo Combo
  3. Symbolic Links to Image Folders in Rails 3.1
  4. CrashPlan for Large, Distributed, Cheap, Off-Site Backup
  5. 5TB LVM Volume with an LSI 9265-8i RAID Controller
  6. Converting VHS to Digital Video (DV) With Canopus (Grass Valley) ADVC-300
  7. Lunchpool Alpha Launched!
  8. Install markItUp! in Ruby on Rails
  9. Ruby on Rails Diff Text to HTML <ins> and <del>