Short, Long and Pretty Urls

Urls (Uniform Resource Locators) are fundamental to the web. They are the addresses used to find web pages or, more specifically, ‘resources’, since they don’t need to be web pages: they can be anything from images and files to raw data. On most modern websites, urls have come a long way from the days when they were illegible and gave away the technology being used, eg ‘/pages/show.php?page=15&tag=ruby’.

Modern urls are usually short, readable and descriptive, such as ‘/pages/tagged/with/ruby’. There has also been a big increase in the use of short urls such as those used by Bit.ly and the like. Another type of url that is often overlooked is the long url. These are used to stop the url being memorable or easy to reproduce. For example, all the pages of a site may be public facing, but you only want people to have access to the urls they have been specifically sent.

For the purposes of demonstrating these three different url schemes, I’m going to use a very simple Sinatra & DataMapper app that creates notes.

You can see a demo of this app here: Pretty Short and Long
The source code is available on GitHub.

The Note Model

Each note has a title and some content as can be seen in the code for the model:

class Note
  include DataMapper::Resource
  property :id, Serial
  property :title, String, :required => true
  property :content, Text
end

Pretty Urls

A ‘pretty’ url can be thought of as one that is human-readable and descriptive. There are some small SEO benefits in using them, but the main reason is that they give your urls a much more professional look and make them more memorable.

It’s easy to create a pretty url for each note by adding an extra property to the model called ‘pretty’ and creating it by default based on the title entered.

property :pretty, String, default: -> r,p { r.make_pretty }

This uses a proc object to call the following make_pretty method on the newly created note object and then save it to the database:

def make_pretty
  # \W matches any character that is not a letter, digit or underscore
  title.downcase.gsub(/\W/,'-').squeeze('-').chomp('-')
end

This method takes the string used for the title and then chains 4 string methods together in order to create the pretty url. Here is a breakdown of what each method does to the example title ‘Ouch! That really Hurt!’

downcase: Changes all the letters to lowercase – ‘ouch! that really hurt!’
gsub(/\W/,’-’): replaces all characters that are not letters, numbers or underscores with a hyphen – ‘ouch--that-really-hurt-’
squeeze(‘-’): replaces any repeated hyphens with a single hyphen – ‘ouch-that-really-hurt-’
chomp(‘-’): removes a hyphen from the end, which can look messy – ‘ouch-that-really-hurt’
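The four steps can also be traced one at a time in plain Ruby, outside of DataMapper. This is only an illustrative sketch: slug_steps is a made-up helper name and is not part of the app.

```ruby
# Hypothetical helper (not part of the app): returns the intermediate
# result of each step in the chain so that it can be inspected.
def slug_steps(title)
  lowered  = title.downcase            # lowercase every letter
  hyphened = lowered.gsub(/\W/, '-')   # non-word characters become hyphens
  squeezed = hyphened.squeeze('-')     # collapse runs of hyphens into one
  trimmed  = squeezed.chomp('-')       # drop a single trailing hyphen, if any
  [lowered, hyphened, squeezed, trimmed]
end

slug_steps('Ouch! That really Hurt!').each { |step| puts step }
# ouch! that really hurt!
# ouch--that-really-hurt-
# ouch-that-really-hurt-
# ouch-that-really-hurt
```

Note that chomp only trims the end of the string, so a title that starts with punctuation would keep a leading hyphen.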

Because this was saved as a property of the Note class, the database can be queried to find notes based on this property using the first method:

get '/pretty/:url' do
  @note = Note.first(:pretty => params[:url])
  slim :show
end

Long Urls

A long url for each note can be easily created by hashing some values that are unique to the note. An extra property is needed called ‘long’ which will use a proc to call the make_long method and then save the resulting string to the database:

  property :long, String, default: -> r,p { r.make_long }

This uses the Digest library to hash a string created from concatenating the time the note was created to the note’s title and id.

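require 'digest/sha1' # at the top of the file

def make_long
  # Hash the current time, title and id into a 40-character hex string
  Digest::SHA1.hexdigest(Time.now.to_s + self.title + self.id.to_s)
end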

The id is used to ensure that this string will be unique and the timestamp will make it difficult to make random guesses. I’ve chosen to use the SHA1 algorithm, which creates a 40-character hexadecimal string, but there are others such as MD5, SHA2 and bcrypt.
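To see what the hashing step produces without a database, here is a small standalone sketch. The long_token function and its arguments are illustrative names, not part of the app. It also shows one possible alternative, SecureRandom from Ruby's standard library, which generates a random 40-character hex string that does not depend on the note's data at all. That is a different technique from the one used in this article.

```ruby
require 'digest/sha1'
require 'securerandom'

# Hypothetical stand-in for make_long: hashes the creation time,
# title and id into a 40-character hexadecimal string.
def long_token(title, id, time = Time.now)
  Digest::SHA1.hexdigest(time.to_s + title + id.to_s)
end

token = long_token('Ouch! That really Hurt!', 3)
puts token.length                        # 40
puts token.match?(/\A[0-9a-f]{40}\z/)    # true

# Alternative (not the article's method): a purely random token.
random_token = SecureRandom.hex(20)      # 20 random bytes => 40 hex characters
puts random_token.length                 # 40
```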

Notes using long urls can be found in much the same way as the pretty urls:

get '/long/:url' do
  @note = Note.first(:long => params[:url])
  slim :show
end

Short Urls

The easiest way to create a short url for each note would be to simply use the note’s id property as the url (eg ‘/3’ would be the url for the note with an id of 3). Unfortunately there are at least two disadvantages to this approach: Firstly, as the number of notes grows, the length of the url will also grow – once you go above a million notes, the urls will become 7 or more digits long. Secondly, if you are simply using an auto-incrementing id as the url, then a user may be tempted to try changing the value in the hope of finding another note that is not meant for them (assuming there is no password protection in place). For example if somebody has sent me a link to a note with the url of ‘/17’ then I may be tempted to also look at the urls ‘/15’ and ‘/16’.

The first problem can be solved by a change of base. By changing the id into a base 36 number, you will significantly reduce the number of digits required. Base 36 numbers use all the digits 0-9 and all the letters a-z (lowercase only) to represent numbers. For example, the number 1000000 in base 36 is lfls. Ruby has a neat built-in method for changing the base of a number: you just have to add the base you want to convert to as an argument to Integer’s to_s method, eg 1000000.to_s(36) => "lfls". As you can see, a string is returned. To change back, use the string’s to_i method with the same argument, eg "lfls".to_i(36) => 1000000. This has reduced a 7-digit number to a 4-character string.
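The round trip is easy to try in irb. The snippet below is a quick sketch that also compares how many characters a few sample numbers need in each base; the sample values are arbitrary.

```ruby
# Integer#to_s and String#to_i both accept a base between 2 and 36.
encoded = 1_000_000.to_s(36)
puts encoded            # lfls
puts encoded.to_i(36)   # 1000000

# Compare the number of characters needed in base 10 and base 36.
[99, 1_000_000, 1_000_000_000].each do |n|
  puts "#{n}: #{n.to_s.length} digits -> #{n.to_s(36).length} characters"
end
# 99: 2 digits -> 2 characters
# 1000000: 7 digits -> 4 characters
# 1000000000: 10 digits -> 6 characters
```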

We still haven’t solved the second problem though: people trying to guess other urls. For example, if the url ‘/lfls’ points to the 1 millionth note, then I could easily find the next note by typing ‘lflt’, which is the base 36 representation of 1000001. To disguise these short urls we first need to create a random 1-digit number that will be stored in the database as a salt. It must not be zero, because it ends up as the leading digit once the string is reversed, and a leading zero would be lost when converting to an integer:

property :salt, String, default: -> r,p { (1+rand(8)).to_s }

You can then use the following method to create a very random-looking short url:

def short
  # Append the salt to the id, reverse the digits, then convert to base 36
  (id.to_s + salt.to_s).reverse.to_i.to_s(36)
end

This takes the id, changes it to a string and then concatenates the salt value to the end before reversing it and then changing into base 36. This has the effect of making notes with consecutive ids have very different looking short urls. Take the example from above of 1000000 and 1000001:

(1000000.to_s + (1+rand(8)).to_s).reverse.to_i.to_s(36) => "zq0ap"
(1000001.to_s + (1+rand(8)).to_s).reverse.to_i.to_s(36) => "1c8401"

You do sacrifice the length a bit here as the salt makes the resulting url longer, but I feel this is worth it to achieve short urls that appear to be random. You could make the urls even shorter by using base 62 numbers (they also use all the capital letters A-Z), but you’ll have to use an external library such as the Base 62 gem.
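If you would rather not add a dependency, base 62 conversion can be sketched by hand. The helper names below are hypothetical, and the alphabet order (digits, then lowercase, then uppercase) is an assumption that may not match the output of any particular gem.

```ruby
# Hypothetical base 62 helpers. The alphabet order is an assumption
# and may differ from the one used by an existing library.
BASE62 = (('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a).freeze

# Converts a non-negative integer into a base 62 string.
def to_base62(number)
  return BASE62.first if number.zero?
  chars = []
  while number > 0
    number, remainder = number.divmod(62)
    chars.unshift(BASE62[remainder])
  end
  chars.join
end

# Converts a base 62 string back into an integer.
def from_base62(string)
  string.each_char.reduce(0) { |total, char| total * 62 + BASE62.index(char) }
end

puts to_base62(81_000_001).length          # 5
puts from_base62(to_base62(81_000_001))    # 81000001
```

Here the number 81000001 (the reversed digits behind "1c8401" above) needs 5 characters rather than 6.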

Short urls don’t actually have to be saved to the database, because they have a simple inverse function that can be applied to the short url string to map it back to the original id of the note.

url.to_i(36).to_s.reverse.chop

This converts the string back into a base 10 integer, then back to a string, reverses it and chops off the last digit (which is the random salt value). DataMapper’s get method can then be used to find the note using its id:

get '/short/:url' do
  @note = Note.get params[:url].to_i(36).to_s.reverse.chop
  slim :show
end
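The encoding and its inverse can be checked together as a pair of plain functions. This is a sketch with hypothetical helper names (encode_short and decode_short); the app itself keeps this logic in the model and the route.

```ruby
# Hypothetical helpers mirroring the short url scheme described above.
def encode_short(id, salt)
  (id.to_s + salt.to_s).reverse.to_i.to_s(36)
end

# Returns [id, salt]; the salt is the digit that chop would discard.
def decode_short(url)
  digits = url.to_i(36).to_s.reverse
  [digits[0...-1].to_i, digits[-1].to_i]
end

puts encode_short(1_000_000, 6)   # zq0ap
puts encode_short(1_000_001, 8)   # 1c8401
p decode_short('zq0ap')           # [1000000, 6]
p decode_short('1c8401')          # [1000001, 8]
```

Returning the salt alongside the id would let a route compare it with the value stored for that note before showing anything, although with a single digit this is only a small hurdle.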

I hope you’ve found this useful. Leave a comment about how you might use some of these techniques, or other ways of writing urls.


  • http://bibwild.wordpress.com Jonathan Rochkind

    Yes, useful, thanks.

    On the short URLs, if all you do to look up in the db is throw out the last letter and then convert back to an integer… they’re still guessable if the ‘attacker’ knows that, because the last char is irrelevant. Attacker can take an existing id, throw out the last char, increment one, add whatever last char they want, and it’ll work.

    But URL security by nobody guessing it is probably not a great idea anyway in general. That URL is quite likely to end up on the web somewhere anyway, when someone bookmarks it on delicious or google+, or something. I can’t say exactly how it happens, but every single time I’ve put something on the web at a weird URL I was sure nobody was linking to… it eventually shows up on google anyway if I don’t robots.txt it. Obscure URLs have a way of eventually ending up linked to on the public web and indexed by google.

    • http://ididitmyway.heroku.com/ Darren Jones

      Hi Jonathan,

      Good point about the short-urls. The attack you describe could be stopped by checking if the char that is chopped off equals the salt value, but there are only 9 possibilities for this so a determined attacker would soon guess it. This was really just to stop somebody who is just being curious and trying a few different random urls. There is effectively a 1 in 9 chance that people could guess a url at random; not great but hopefully enough to put people off. You could always use 2 digits in the salt to make the chances 1 in 90, but then the urls would be longer. As always, it is a trade-off between having short urls and making them random.

      I don’t think any url scheme can be totally relied upon to keep things safe and secure – you’ll always need to have some robust form of password protection if you want that.

      cheers,

      DAZ

  • http://jozzua.com Jozzua

    More Sinatra sample apps please!
    Really love these. Yours are the most useful/well-structured tutorials I’ve seen so far.

    • http://ididitmyway.heroku.com/ Darren Jones

      Thanks Jozzua!

      It’s always really nice to get such positive feedback.

      I’m writing some more tutorials at the moment so watch this space!

      DAZ

  • Torsten

    Your example with pretty urls can surely be combined with the stringex gem which produces much nicer human readable urls. Anyway – good posting.