The search engines (GoTo.com, Lycos, HotBot, WebCrawler, and the like) build specialized databases on systems such as MS SQL Server, Sybase (which shares its roots with SQL Server but runs on Unix), Informix or Oracle. To fill these databases they accept site submissions and run specialized crawling programs called spiders. A spider sits on a server and works through lists of web addresses until it finds a live site. Once it reaches a site it pulls in the default page and then follows that page's links. Every page it retrieves is stored in the database for later querying.
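In outline, a spider is just a fetch-and-follow loop over a queue of addresses. The Python sketch below shows that loop in miniature; the pages.db file, the pages table and the fifty-page limit are illustrative choices, not details of any particular engine.

import sqlite3
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects the href value of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    # Store each fetched page in a local database for later querying.
    db = sqlite3.connect("pages.db")
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # dead or unreachable address: skip it, as a real spider would
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, body))
        db.commit()
        # Follow the links found on the page, resolving relative URLs.
        parser = LinkParser()
        parser.feed(body)
        queue.extend(urljoin(url, link) for link in parser.links)
    db.close()

if __name__ == "__main__":
    crawl("http://example.com/")

A production spider adds politeness delays, robots.txt handling and duplicate detection, but the queue-fetch-parse-store cycle is the core of the job.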
Each search engine has its own proprietary searching and ranking algorithms for ordering the results it displays, and naturally they are rather reluctant to disclose that code.
Building a Personal Engine
Things are not that difficult for webmasters who want to add similar capabilities to their own sites. To start, you can download the data from the Open Directory Project using something like Anaconda Open Directory or POD, both written in Perl, or you can write a script that pulls the information in fairly simply in ASP or PHP using prebuilt components and modules.
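For the Open Directory route, the data is published as a large RDF dump that a script can walk element by element rather than load whole. The Python sketch below shows one way to pull listings into a local database; the ExternalPage, Title and Description element names follow the ODP dump layout, while the content.rdf.u8 file name and the listings table are only illustrative.

import sqlite3
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any namespace prefix so we can match on bare element names."""
    return tag.rsplit("}", 1)[-1]

def load_odp(dump_path, db_path="directory.db"):
    # Stream the dump one element at a time so a multi-gigabyte file
    # never has to fit in memory, and store each listing for later querying.
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS listings (url TEXT, title TEXT, description TEXT)")
    url = title = description = None
    for event, elem in ET.iterparse(dump_path, events=("start", "end")):
        name = local(elem.tag)
        if event == "start" and name == "ExternalPage":
            # The listed page's URL is carried on the element's "about" attribute.
            url = next((v for k, v in elem.attrib.items() if local(k) == "about"), None)
        elif event == "end":
            if name == "Title":
                title = elem.text
            elif name == "Description":
                description = elem.text
            elif name == "ExternalPage" and url:
                db.execute("INSERT INTO listings VALUES (?, ?, ?)", (url, title, description))
                url = title = description = None
            elem.clear()  # free the parsed element as we go
    db.commit()
    db.close()

if __name__ == "__main__":
    load_odp("content.rdf.u8")

The same approach translates directly to a Perl, ASP or PHP script using whichever XML parser those environments provide; the point is simply to turn the directory dump into rows you can query and rank yourself.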