Short, Long and Pretty Urls
Urls (Uniform Resource Locators) are fundamental to the web. They are the addresses used to find web pages or, more specifically, ‘resources’: these don’t actually need to be web pages and can be anything from images and files to raw data. On most modern websites, urls have come a long way from the days when they were illegible and gave away the technology being used, e.g. ‘/pages/show.php?page=15&tag=ruby’.
Modern urls are usually short, readable and descriptive, such as ‘/pages/tagged/with/ruby’. There has also been a big increase in the use of short urls such as those created by Bit.ly and the like. Another type of url that is often overlooked is the long url. These are used to stop the url being memorable or easy to reproduce. For example, all the pages of a site might be public facing, but you only want people to have access to the urls they have been specifically sent.
For the purposes of demonstrating these three different url schemes, I’m going to use a very simple Sinatra & DataMapper app that creates notes.
You can see a demo of this app here: Pretty Short and Long
The source code is available on GitHub.
The Note Model
Each note has a title and some content as can be seen in the code for the model:
class Note
  include DataMapper::Resource
  property :id, Serial
  property :title, String, :required => true
  property :content, Text
end
Pretty Urls
A ‘pretty’ url can be thought of as one that is human-readable and descriptive. There are some small SEO benefits in using them, but the main reason is that they give your urls a much more professional look and make them more memorable.
It’s easy to create a pretty url for each note by adding an extra property to the model called ‘pretty’ and creating it by default based on the title entered.
property :pretty, String, default: -> r,p { r.make_pretty }
This uses a proc object to call the following make_pretty method on the newly created note object; the result is then saved to the database along with the note:
def make_pretty
  # \W matches any character that is not a letter, digit or underscore
  title.downcase.gsub(/\W/,'-').squeeze('-').chomp('-')
end
This method takes the string used for the title and then chains 4 string methods together in order to create the pretty url. Here is a breakdown of what each method does to the example title ‘Ouch! That really Hurt!’
downcase: Changes all the letters to lowercase – ‘ouch! that really hurt!’
gsub(/\W/,'-'): Replaces every non-word character (anything that is not a letter, digit or underscore) with a hyphen – ‘ouch--that-really-hurt-’
squeeze('-'): Replaces any repeated hyphens with a single hyphen – ‘ouch-that-really-hurt-’
chomp('-'): Removes a hyphen from the end, which can look messy – ‘ouch-that-really-hurt’
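The four steps above can be tried outside the app. This is only a minimal sketch: make_pretty_url is a hypothetical standalone helper that takes the title as an argument, rather than the model method itself.

```ruby
# Hypothetical standalone version of the slug steps (not the model method).
def make_pretty_url(title)
  lowered  = title.downcase          # "ouch! that really hurt!"
  hyphened = lowered.gsub(/\W/, '-') # "ouch--that-really-hurt-"
  squeezed = hyphened.squeeze('-')   # "ouch-that-really-hurt-"
  squeezed.chomp('-')                # "ouch-that-really-hurt"
end

puts make_pretty_url('Ouch! That really Hurt!')
```

Note that chomp only tidies the end of the string, so a title that starts with punctuation would still produce a url with a leading hyphen.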
Because this was saved as a property of the Note class, the database can be queried to find notes based on this property using the first method:
get '/pretty/:url' do
  @note = Note.first(:pretty => params[:url])
  slim :show
end
Long Urls
A long url for each note can be easily created by hashing some values that are unique to the note. An extra property is needed called ‘long’ which will use a proc to call the make_long method and then save the resulting string to the database:
property :long, String, default: -> r,p { r.make_long }
This uses the Digest library to hash a string created by concatenating the time the note was created with the note’s title and id.
def make_long
  # Digest::SHA1 needs require 'digest' at the top of the app
  Digest::SHA1.hexdigest(Time.now.to_s + self.title + self.id.to_s)
end
The id is used to ensure that this string will be unique and the timestamp will make it difficult to make random guesses. I’ve chosen to use the SHA1 algorithm, which creates a 40-character hexadecimal string, but there are others such as MD5, SHA2 and bcrypt.
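To see what such a hash looks like, here is a small sketch that can be run on its own. The helper name make_long_url and its arguments are assumptions for illustration; they stand in for the note’s title, id and creation time.

```ruby
require 'digest/sha1'

# Hypothetical standalone version of make_long: the title, id and time are
# passed in as arguments instead of being read from a note object.
def make_long_url(title, id, time = Time.now)
  Digest::SHA1.hexdigest(time.to_s + title + id.to_s)
end

token = make_long_url('Ouch! That really Hurt!', 17)
puts token         # 40 hexadecimal characters, different on every run
puts token.length  # 40
```

Passing the time in as an argument makes the helper easy to test, because the same inputs always give the same hash.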
Notes using long urls can be found in much the same way as the pretty urls:
get '/long/:url' do
  @note = Note.first(:long => params[:url])
  slim :show
end
Short Urls
The easiest way to create a short url for each note would be to simply use the note’s id property as the url (e.g. ‘/3’ would be the url for the note with an id of 3). Unfortunately, there are at least two disadvantages to this approach. Firstly, as the number of notes grows, the length of the url will also grow: once you reach a million notes, the urls will become 7 or more digits long. Secondly, if you are simply using an auto-incrementing id as the url, then a user may be tempted to try changing the value in the hope of finding another note that is not meant for them (assuming there is no password protection in place). For example, if somebody has sent me a link to a note with the url of ‘/17’, then I may be tempted to also look at the urls ‘/15’ and ‘/16’.
The first problem can be solved by a change of base. By changing the id into a base 36 number, you will significantly reduce the number of digits required. Base 36 numbers use all the digits 0-9 and all the letters a-z (lowercase only) to represent numbers. For example, the number 1000000 in base 36 is lfls. Ruby has a neat built-in method for changing the base of a number: you just pass the base you want to convert to as an argument to Integer’s to_s method, e.g. 1000000.to_s(36) => "lfls". As you can see, a string is returned. To change back, use String’s to_i method with the same argument, e.g. "lfls".to_i(36) => 1000000. This has reduced a 7-digit number to a 4-character string.
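The round trip between base 10 and base 36 can be checked in a few lines of plain Ruby. This sketch uses only Integer#to_s and String#to_i; the variable names are just for illustration.

```ruby
id = 1_000_000

encoded = id.to_s(36)       # "lfls"
decoded = encoded.to_i(36)  # 1000000

puts "#{id} -> #{encoded} -> #{decoded}"

# Consecutive ids give consecutive-looking strings
puts 1_000_001.to_s(36)     # "lflt"
```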
We still haven’t solved the second problem though: people trying to guess other urls. For example, if the url ‘/lfls’ points to the 1 millionth note, then I could easily find the next note by typing ‘/lflt’, since lflt is the base 36 representation of 1000001. To disguise these short urls we first need to create a random 1-digit number (between 1 and 8) that will be stored in the database as a salt:
property :salt, String, default: -> r,p { (1+rand(8)).to_s }
You can then use the following method to create a very random-looking short url:
def short
  # The parentheses matter: join id and salt first, then reverse and convert
  (id.to_s + salt.to_s).reverse.to_i.to_s(36)
end
This takes the id, changes it to a string and then concatenates the salt value to the end before reversing the result and changing it into base 36. This has the effect of making notes with consecutive ids have very different looking short urls. Take the example from above of 1000000 and 1000001. Because the salt is random, the result changes from run to run; with salts of 6 and 8 respectively you would get:
(1000000.to_s + '6').reverse.to_i.to_s(36) => "zq0ap"
(1000001.to_s + '8').reverse.to_i.to_s(36) => "1c8401"
You do sacrifice the length a bit here as the salt makes the resulting url longer, but I feel this is worth it to achieve short urls that appear to be random. You could make the urls even shorter by using base 62 numbers (they also use all the capital letters A-Z), but you’ll have to use an external library such as the Base 62 gem.
Short urls don’t actually have to be saved to the database, because they have a simple inverse function that can be applied to the short url string to map it back to the original id of the note.
url.to_i(36).to_s.reverse.chop
This converts the string back into a base 10 integer, then back to a string, reverses it and chops off the last digit (which is the random salt value). DataMapper’s get method can then be used to find the note using its id:
get '/short/:url' do
  @note = Note.get params[:url].to_i(36).to_s.reverse.chop
  slim :show
end
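To check that the short url really does map back to the id, here is a sketch of the encoding and its inverse as two plain functions. The names encode_short and decode_short are hypothetical, and the salt is passed in explicitly so that the results are reproducible. The id and salt are joined before the string is reversed, as described above.

```ruby
# Hypothetical helpers, not part of the app: the id and salt are passed in
# rather than read from a note object.
def encode_short(id, salt)
  (id.to_s + salt.to_s).reverse.to_i.to_s(36)
end

def decode_short(url)
  # Back to base 10, reverse, drop the salt digit, convert to an integer
  url.to_i(36).to_s.reverse.chop.to_i
end

puts encode_short(1_000_000, 6)  # "zq0ap"
puts encode_short(1_000_001, 8)  # "1c8401"
puts decode_short('zq0ap')       # 1000000
```

The inverse relies on the salt never being 0: a salt of 0 would become a leading zero after the reversal and be lost when the string is converted to an integer.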
I hope you’ve found this useful. Leave a comment about how you might use some of these techniques, or other ways of writing urls.