Run Your Own Spider

By Blane Warrene

I came across Carlos Perez’s blog, manageability.org, while Googling for some research today. Carlos had a great list of open source web crawlers that included JSpider, a tool I have used for error checking on web sites.

JSpider is written entirely in Java and can be configured extensively for spidering, error checking and downloading. It obeys robots.txt files (http://www.robotstxt.org/wc/norobots-rfc.txt) as well as any additional options set in its configuration.
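
For readers who have not looked at robots.txt handling before, here is a minimal sketch of the kind of check a well-behaved spider makes before fetching a page. This is not JSpider's code or its API: the class and method names (RobotsRules, disallowedPrefixes, isAllowed) are made up for illustration, and the sketch assumes the simplest case, honoring only Disallow prefixes in groups addressed to "User-agent: *".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of JSpider.
class RobotsRules {

    // Collects the Disallow prefixes from every group addressed to "User-agent: *".
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean appliesToUs = false;   // true while inside a "*" group
        boolean inAgentLines = false;  // true while reading consecutive User-agent lines

        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Strip comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!inAgentLines) {
                    appliesToUs = false; // a new group starts here
                }
                inAgentLines = true;
                if (value.equals("*")) {
                    appliesToUs = true;
                }
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && appliesToUs && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(robots, "/index.html"));     // true
        System.out.println(isAllowed(robots, "/private/a.html")); // false
    }
}
```

A real crawler would also match its own user-agent name and handle the other directives a site may use; the sketch leaves those out to keep the core idea visible.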

I thought the added downloading option was nice as I had been using a separate application to pull down entire web sites for offline use. Now this can be accomplished with the JSpider engine.

The tool has a plug-in architecture, which lets users extend JSpider to meet their own needs (and perhaps contribute their work back to the project). JSpider is released under the LGPL.

JSpider requires a J2SE 1.3+ runtime and an installed XML parser (Xerces, …); a parser comes with JDK 1.4. The app will run on any system that supports Java and meets these requirements.
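
If you are not sure what your machine has, a few lines of Java will tell you. The sketch below is not part of JSpider (EnvCheck is a made-up class name), and it assumes the parser is looked up through the standard JAXP factory, which may or may not be how JSpider finds it. It simply prints the running Java version and the XML parser implementation that the JDK provides.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of JSpider.
class EnvCheck {

    // Version string of the running Java runtime.
    static String javaVersion() {
        return System.getProperty("java.version");
    }

    // Class name of the XML parser factory found through JAXP.
    // newInstance() throws FactoryConfigurationError if no parser is available.
    static String xmlParserFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + javaVersion());
        System.out.println("XML parser factory: " + xmlParserFactory());
    }
}
```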

The JSpider project even provides a simple sample site for testing once you are up and running. Additionally, a fairly comprehensive 120-page user manual is available in PDF format.



