Replacing –, ’, “, etc., with UTF-8 Characters in Ruby on Rails

Recently I upgraded some older Rails applications to Rails 3.1 and Ruby 1.9.2 (from 2.3 and 1.8.7 respectively). One post-upgrade issue was that text content had a lot of garbage showing up like –, ’, “, etc. Here’s an actual example from a comment in one of the applications:

One of my “things to do before I’m 50” is

This should read:

One of my “things to do before I’m 50” is

It turns out these are just special characters that were improperly encoded for UTF-8. The fix is simple enough: loop through your content and replace where needed.

If your database is big, this could take a long time unless you disable callbacks. The script below highlights both how to replace the characters using Ruby and how to disable your Rails callbacks to make this script run in seconds instead of hours (depending on the complexity of your callbacks).

replacements = []
replacements << ['…', '…']           # ellipsis
replacements << ['–', '–']           # en dash
replacements << ['’', '’']           # curly apostrophe
replacements << ['“', '“']           # curly open quote
replacements << [/â€[[:cntrl:]]/, '”'] # curly close quote
klasses = [Comment, Article]           # replace with relevant classes

klasses.each do |klass|
  # find_each loads records in batches instead of pulling the whole table into memory
  klass.find_each do |obj|
    original = obj.body
    replacements.each{ |set| obj.body = obj.body.gsub(set[0], set[1]) }
    unless (original == obj.body)
      #### Remove or Customize ####
      # This should reflect your models' callbacks.  It should be safe
      # since we're just doing a simple find/replace.
      Comment.skip_callback(:save, :after,  :do_after_save_tasks ) if obj.is_a?(Comment)
      Article.skip_callback(:save, :before, :do_before_save_tasks) if obj.is_a?(Article)
      #### End Remove or Customize ####
      obj.save!
    end
  end
end

You may have noticed that I used a regular expression for the curly close quote. This is because it contains an invisible control character that is not easily copy/pasted into your code. Using [[:cntrl:]] is just an easier way to catch it.
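If the garbage came from UTF-8 bytes being read as Windows-1252 (an assumption; the cause isn't confirmed above), a round trip through that encoding may repair many strings without a replacement table. A minimal sketch, where repair_mojibake is a hypothetical helper name and not part of the script above:

```ruby
# Hypothetical helper: assumes the text was garbled by reading UTF-8 bytes
# as Windows-1252. Returns the original string whenever the round trip
# fails or does not produce valid UTF-8.
def repair_mojibake(text)
  fixed = text.encode('Windows-1252').force_encoding('UTF-8')
  fixed.valid_encoding? ? fixed : text
rescue EncodingError
  # Raised when a character has no Windows-1252 equivalent
  text
end

# Build a garbled string the same way the bug would: take the UTF-8 bytes of
# a curly apostrophe and decode them as Windows-1252.
garbled = "I\u2019m 50".dup.force_encoding('Windows-1252').encode('UTF-8')
puts repair_mojibake(garbled)
```

Strings that are already correct come back unchanged. Strings that mix garbled and correct non-ASCII characters, or that contain the control character from the close-quote case, may not survive the round trip and are left alone, so the find/replace table is still useful as a fallback.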

Integer Compression in Ruby (Base-10 to Base-62)

A few days ago I was thinking about all those link shortening sites and wondered how easy it would be to compress a base-10 number like 1,234,567,890 to something much smaller like 1LY7VK. Here’s what I came up with:

class IntegerCompressor

  CompressionCharacterSet = %w(0 1 2 3 4 5 6 7 8 9
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m n o p q r s t u v w x y z)

  def self.to_base
    CompressionCharacterSet.length
  end

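  def self.compress(number_to_convert)
    return CompressionCharacterSet[0] if number_to_convert == 0 # log(0) is undefined
    digits_needed = Math.log(number_to_convert, IntegerCompressor.to_base).floor + 1
    # Math.log works in floating point, so correct for rounding near exact powers of the base
    digits_needed += 1 if IntegerCompressor.to_base**digits_needed <= number_to_convert
    digits_needed -= 1 if IntegerCompressor.to_base**(digits_needed - 1) > number_to_convert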
    compressed_number_string = ''
    previous_remainder = number_to_convert
    (digits_needed-1).downto(0) do |power|
      r=previous_remainder.divmod(IntegerCompressor.to_base**power)
      compressed_number_string << CompressionCharacterSet[r[0]]
      previous_remainder = r[1]
    end
    compressed_number_string
  end

  def self.decompress(compressed_number)
    power = 0
    base_10_integer = 0
    compressed_number.to_s.reverse.each_char do |digit|
      base_10_integer += ((IntegerCompressor.to_base**power)*CompressionCharacterSet.index(digit))
      power+=1
    end
    base_10_integer
  end

end

It seems to work:

$ irb
>> require '/path/to/file/integer_compressor.rb'
=> true
>> IntegerCompressor.compress 1234567890
=> "1LY7VK"
>> IntegerCompressor.decompress "1LY7VK"
=> 1234567890

If anyone has a more elegant solution, I'd be curious to see it.
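One possible alternative, sketched here as an assumption rather than a definitive answer, is to skip the logarithm and peel off digits with repeated divmod. The names BASE62_DIGITS, base62_encode, and base62_decode are made up for this example:

```ruby
# Same digit ordering as CompressionCharacterSet: 0-9, then A-Z, then a-z
BASE62_DIGITS = [*'0'..'9', *'A'..'Z', *'a'..'z'].freeze

# Converts a non-negative integer to a base-62 string using integer math only
def base62_encode(number)
  unless number.is_a?(Integer) && number >= 0
    raise ArgumentError, 'number must be a non-negative integer'
  end
  return BASE62_DIGITS[0] if number.zero?

  encoded = ''.dup
  while number > 0
    # divmod returns [quotient, remainder]; the remainder is the next digit,
    # least significant first
    number, remainder = number.divmod(BASE62_DIGITS.length)
    encoded << BASE62_DIGITS[remainder]
  end
  encoded.reverse
end

# Converts a base-62 string back to an integer, most significant digit first
def base62_decode(string)
  string.each_char.reduce(0) do |total, char|
    total * BASE62_DIGITS.length + BASE62_DIGITS.index(char)
  end
end

puts base62_encode(1234567890) # => 1LY7VK
puts base62_decode('1LY7VK')   # => 1234567890
```

Because it never touches floating point, this version also handles 0 and exact powers of 62 without special cases.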

(Thanks to Tamara for the log refresher.)

3 Reasons Aperture and Picasa are a Great Photo Combo

I spent the evening hours of many days in October looking for a photo management and sharing solution.  I tried a lot of software, workflows, web applications, mobile apps, etc.  It was tedious and a little frustrating.  I was surprised at how incomplete most of the options were.  I had hoped for a one-app-does-it-all solution, but after a great deal of exploring, I’m quite happy with the Aperture + Picasa combination to meet my photo storing and sharing needs.

My Needs

Before I get into why I picked Aperture and Picasa, let me explain my specific needs:

  • I Need to Manage Professional Grade Photos – By no measure am I a pro photographer, but I have a nice camera (Canon T2i w/ 18-55mm and 55-250mm lenses).  I bought it in January and I’ve really enjoyed taking high quality photos.  I’m currently shopping around for nicer lenses and possibly a camera upgrade.  The bottom line is I have good equipment and it’s getting better.  I take between 100-1000 photos a month with each photo over 10 MB.  I’m a bit of a hoarder when it comes to my photos.  Aside from the obvious terrible pictures, I tend to keep them all.  I need a management application capable of handling thousands of large photos.
  • I Want to Share Some Photos with Lots of People – I’d like to have a workflow where I put the photos on my computer, review them, pick the ones I want to share, and share them.  The faster this process the better.  Also, after I post them, I’d like to be able to make tweaks/edits, add meta data, etc., locally (i.e., without needing an Internet connection) and then at some point have them sync up with little/no effort.  I don’t want the sync to override what others have done online, e.g., tagging, comments, etc., i.e., it should be a true sync rather than a re-upload.
  • I Have an Existing Backup Solution – I use Crashplan to backup my computer.  It’s awesome and I highly recommend it.  As such, I don’t need either my management or sharing applications to worry too much about backup.  What I do care about is the ability to quickly make local copies, e.g., copy everything to a separate drive.  I don’t consider this backup as it wouldn’t do me any good if my house burned down, but it’s good if a library gets corrupted, deleted, etc., simply because it’s faster.  Ultimately, I don’t need to have 100% of my photos backed up at full quality using the same service with which I share photos (but if it does that, then great!).


Symbolic Links to Image Folders in Rails 3.1

I recently tweaked the code for OmniNerd.com to upgrade it from a Rails 2.3 app to a Rails 3.1 app.  Online docs got me through everything but one item: symbolic links to a folder of images (or assets in general).  On OmniNerd, users can upload images.  The site also auto-generates a lot of graphical content.  Our setup is like many, i.e., we deploy with Capistrano and use the shared folder to hold static user content that gets handed off between deployments.  Previously, we simply had a symlink set up such that:

/path/to/app/current/public/images/content --> /path/to/app/shared/user_uploaded


CrashPlan for Large, Distributed, Cheap, Off-Site Backup

In the early 90s, my friend’s father took me to EDS where he worked at the time.  I remember him saying, “this is one of the largest data centers in the world.  They have over 3 terabytes of data in there.”  In the homemade box tucked away quietly in my hall closet is a 6x1TB RAID with another 1TB disk for the OS.  Add in the media center and 3 laptops and I’ve got a lot of data just waiting to be lost with a disk failure, theft, or an accidental rm -rf.

What I Need in a Backup Solution

As I thought about my data, I came up with a few criteria before I started scouring the net for a solution.

  1. No constraints on backup size – The data I want to back up exceeds 2TB and is growing.  I’ve used cool apps like DropBox that have arbitrary upper limits like 100GB.  However, even the coolest app won’t do me any good if I can’t back up everything I need. (To be fair, backup is just one tiny element of what DropBox does.  I highly recommend that app for the other things it does, like sync.)
  2. Highly configurable – That 2TB I mentioned lives amongst tons of other stuff that I keep as sort of a cache, but wouldn’t miss it too much if it got deleted.  I need to be able to clearly specify what data I actually want backed up.  Moreover, I need a high degree of control about backup policies, security, etc.  I like solutions that make things simple, but in this case there also needs to be a way to get as complicated as I like.
  3. Distributed backups – Part of the reason I have that 6x1TB RAID array is for super-fast local backup.  Obviously that won’t do me any good if my house burns down, but if a laptop crashes it’s way easier to grab 500GB from a local machine than it is to pull it across the net.  I want to be able to back up to a service as well as to many other computers that I specify, both in my house and on the Internet.
  4. Smart, low profile application – Modern OS’s like Mac OS X keep a log of what files have changed.  I don’t want a dumb service that does things like that on its own and consumes my computers’ resources.  I need something that will run in the background and not make any noise.
  5. Accessibility – I need a service that runs on any platform, specifically Mac OS X and Linux.  Moreover, I need to be able to access my backups from the web.
  6. Cheap – I want to pay for storage, not bandwidth.  Less than $10/mo is my general rule of thumb.

There are other minor points, but those are the non-negotiable items.


5TB LVM Volume with an LSI 9265-8i RAID Controller

This article outlines how to get a 5TB LVM volume created with an LSI 9265-8i RAID controller.

Background

RAID Array

I’ve been running software RAID for a while. Specifically, I’ve got an ASUS P6T Deluxe V2 motherboard with 6 SATA ports. Up until now, I’ve had 1 SATA connected to a single 1 TB drive with the Fedora OS on it, one to a SATA DVD/Blu-ray drive, and the other 4 to a 4x1TB software RAID 5. This has worked great. When I started to max that out, I had a decision to make. It seems I could either:

  • Continue with the small array and just continue to increase the disk size.  This is easiest, but given that 4 disks in RAID 5 give you a 25% loss of storage space (i.e., 3 used, 1 for parity), you have to buy bigger disks and the biggest ones usually cost the most.
  • Make the 1-time investment to get an 8-port RAID card and grow the array with disks that are large, but not necessarily the largest out there.

I decided the latter made more sense for me and went with the LSI 9265-8i based on various reviews.  My plan was to build a 6x1TB SATA array (5TB storage) with 2 available ports on which I could add 2 additional drives when/if needed.
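The capacity numbers above follow from RAID 5 giving up one disk's worth of space to parity. A quick sketch of that arithmetic, with hypothetical helper names:

```ruby
# Usable capacity of a RAID 5 array: one disk's worth of space holds parity
def raid5_usable_tb(disk_count, disk_size_tb)
  raise ArgumentError, 'RAID 5 needs at least 3 disks' if disk_count < 3
  (disk_count - 1) * disk_size_tb
end

# Fraction of raw capacity spent on parity
def raid5_parity_overhead(disk_count)
  1.0 / disk_count
end

puts raid5_usable_tb(4, 1)       # => 3 (the old 4x1TB array)
puts raid5_parity_overhead(4)    # => 0.25
puts raid5_usable_tb(6, 1)       # => 5 (the planned 6x1TB array)
```

The parity overhead shrinks as disks are added, which is part of why growing the array can be cheaper per usable terabyte than swapping in bigger disks.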


Converting VHS to Digital Video (DV) With Canopus (Grass Valley) ADVC-300

My parents have mountains of VHS tapes that are slowly degrading in various boxes, cabinets and shelves throughout their house.  For Christmas, I got them a Canopus (now Grass Valley) ADVC-300.  The results were pretty awesome.  Here’s what we did:

  1. Purchased the ADVC-300 online from Electronica Direct via Amazon.
  2. Went to Wal-Mart and got the cheapest VHS player they had.  I think this is important.  If you’ve got a ton of videos to convert, make sure you have a player that’s ready for the load.  If you dig out the old player from 1987, you might get poor results.
  3. Went to Best Buy and got a Firewire 400 to 800 converter cable.  The ADVC-300 comes with a 400 to mini-400 cable, but my parents have a newer iMac that has a Firewire 800 port so the extra cable was necessary.
  4. Plugged everything in: VHS player to ADVC-300 to iMac
  5. Opened iMovie, clicked import, pressed play on the VHS player

The results were very nice.  Obviously quality is determined by the tapes, but the process was generally hassle free.  The only pain point was that iMovie stops importing every time it reaches empty tape.  So if you’ve got several things on one tape and a few seconds between each video set, then iMovie will stop importing at the end of each and you’ll have to manually restart the import for the next set.  I’m guessing that if you used a “pro” application like Final Cut, this could probably be avoided.

If you’ve got a stockpile, now’s the time.  I found tapes with mold in the cassette and one tape broke during playback due to brittle plastic.  I’m glad we converted to DV because I’m not sure those tapes would last much longer and it would be a shame to lose 20+ years of video records.

Bottom line: the ADVC-300 is a solid purchase for anyone looking to convert VHS tapes to digital video.

Lunchpool Alpha Launched!

At work, people constantly order food.  The process is something like this: invite 15 people to a meeting, order food for 20 “just in case,” post a call-in number on the meeting, have 10 people actually attend in person with at least 2-3 not eating for various reasons.  Result: tons (literally) of wasted food per week.

Last week I launched a new web app called Lunchpool.  Its purpose is to deal with the situation above.  If you know where extra food is, you post it.  Your post is limited to people at your domain, so if my company’s email address ends in @example.com, then only people with the same email domain can see my post.  Throw in some building, floor, and food info with your post, alert people when new stuff is posted, and the scavengers come out of the woodwork.

I’ll be curious to see how it goes over the next few weeks.  If you’ve got a company, school, organization, etc., with a unique domain and you’d like to be a part of the alpha test, shoot me a note using the contact info on Lunchpool.

Install markItUp! in Ruby on Rails

If you’re torn between a WYSIWYG editor and something clean like Textile/Redcloth, then markItUp! will provide you with some nice middle ground. The tool is well documented, but there are a few “notes” for installing it in Rails that aren’t in the docs.

1. Initial setup

Since there’s overhead associated with markItUp!, I set a variable @content_submission to true so that my standard layout template can skip all of the extra JavaScript when it’s not necessary. To keep things simple, I’m going to follow the markItUp! hierarchy, which means its stylesheets will be in your javascripts path.


Ruby on Rails Diff Text to HTML <ins> and <del>

This code is perfect if you have 2 text objects in your Rails application and you want to compare their differences in one of your HTML views. It’s 99% pure Ruby too, so if you alter the first line, you can use it for other purposes.

Only one thing to note: you must have diff installed. I’m using: diff (GNU diffutils) 2.8.1.

#set up some variables to reference later
temporary_directory = File.join(Rails.root, "tmp")
max_lines = 9999999 #needs to be larger than the most lines you'll consider
diff_header_length = 3

# text_old and text_new should be the values of the string objects to compare
# these are just example strings to show it works
text_old      = "line1\ndeleted line2\nline3\n\nline4\nline5"
text_new      = "line1\ninserted line2\nline3\n\nline4\nline5"

# since we're using diff on the file system, we'll save the text we want to compare
# and then run diff against the two files
file_old_name = File.join(temporary_directory,"file_old"+rand(1000000).to_s)
file_new_name = File.join(temporary_directory,"file_new"+rand(1000000).to_s)
file_old      = File.new(file_old_name, "w+")
file_new      = File.new(file_new_name, "w+")
file_old.write(text_old+"\n")
file_new.write(text_new+"\n")
file_old.close
file_new.close

# diff will provide a string showing insertions and deletions.  We will
# split this string out by newlines if there are differences, and mark it up
# accordingly with html
lines = %x(diff --unified=#{max_lines} #{file_old_name} #{file_new_name})
if lines.empty?
  lines = text_new.split(/\n/)
else
  lines = lines.split(/\n/)[diff_header_length..max_lines].
  collect do |i|
    if i.empty?
      ""
    else
      case i[0,1]
      when "+"; then "<ins>"+i[1..i.length-1]+"</ins>"
      when "-"; then "<del>"+i[1..i.length-1]+"</del>"
      else; i[1..i.length-1]
      end
    end
  end
end

#clean up the temporary diff files we created
File.delete(file_new_name)
File.delete(file_old_name)

#return marked up text
lines.join("\n")

If you fire up RAILS_ROOT/script/console and paste that code in, it will return a nicely marked up string like this:

line1
<del>deleted line2</del>
<ins>inserted line2</ins>
line3

line4
line5

Use CSS to make your ins and del tags render however you like.
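One thing to watch: the code above puts the compared text into the tags unescaped, so any < or & in the strings would be treated as HTML in a view. Here is a sketch of just the markup step with escaping added, assuming the unified diff output has already been captured as a string. The method name mark_up_unified_diff is hypothetical:

```ruby
require 'cgi'

# Turns unified diff output into text with <ins>/<del> tags, escaping the
# content of every line. header_length is the number of leading diff header
# lines to drop (3 in the code above).
def mark_up_unified_diff(diff_output, header_length = 3)
  body = diff_output.split("\n")[header_length..-1] || []
  body.map do |line|
    # Drop the leading marker character (+, -, or space) and escape the rest
    text = CGI.escapeHTML(line[1..-1].to_s)
    case line[0, 1]
    when '+' then "<ins>#{text}</ins>"
    when '-' then "<del>#{text}</del>"
    else text
    end
  end.join("\n")
end

sample = "--- old\n+++ new\n@@ -1,3 +1,3 @@\n line1\n-deleted <b>\n+inserted line2\n line3"
puts mark_up_unified_diff(sample)
```

Since the result already contains the intended tags, the view would need to output it as trusted HTML rather than escaping it a second time.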