2 Comments

Integer Compression in Ruby (Base-10 to Base-62)

A few days ago I was thinking about all those link shortening sites and wondered how easy it would be to compress a base-10 number like 1,234,567,890 to something much smaller like 1LY7VK. Here’s what I came up with:

class IntegerCompressor
  
  CompressionCharacterSet = %w(0 1 2 3 4 5 6 7 8 9
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m n o p q r s t u v w x y z)
  
  def self.to_base
    CompressionCharacterSet.length
  end

  def self.compress(number_to_convert)
    digits_needed = Math.log(number_to_convert, IntegerCompressor.to_base).floor + 1
    compressed_number_string = ''
    previous_remainder = number_to_convert
    (digits_needed-1).downto(0) do |power|
      r=previous_remainder.divmod(IntegerCompressor.to_base**power)
      compressed_number_string << CompressionCharacterSet[r[0]]
      previous_remainder = r[1]
    end
    compressed_number_string
  end
  
  def self.decompress(compressed_number)
    power = 0
    base_10_integer = 0
    compressed_number.to_s.reverse.each_char do |digit|
      base_10_integer += ((IntegerCompressor.to_base**power)*CompressionCharacterSet.index(digit))
      power+=1
    end
    base_10_integer
  end
  
end

It seems to work:

$ irb
>> require '/path/to/file/integer_compressor.rb'
=> true
>> IntegerCompressor.compress 1234567890
=> "1LY7VK"
>> IntegerCompressor.decompress "1LY7VK"
=> 1234567890

If anyone has a more elegant solution, I'd be curious to see it.

(Thanks to Tamara for the log refresher.)

2 Comments

  1. I was looking for a way to compress integers and came across your blog. I later found a solution I like better.

    This changes the base from 10 to 32 and back. I’m not sure if there are any benefits to base 64, like you are using.

    Here is the solution I went with.

    # compress
    1234567890.to_s(36)
    => “kf12oi”

    ” decompress
    “kf12oi”.to_i(36)
    => 1234567890

    1. Sorry for the delayed reply. You’re right, what you showed is much simpler. The only advantage to my solution is increased compression. For example 123456 compresses to 2n9c in base-36, but it’s W7E in the base-62 solution I presented. In other words, if you were using it for a link shortener, you’d have more short links to issue out.

Comments are closed.