Well, it certainly sounds like a fun project, but it may be beyond my expertise, so take what I say with a grain of salt.
I would start by getting very familiar with regular expressions. I'm assuming you'll be using PHP for this project, given that you've posted in the PHP forum. If regular expressions are new or foreign to you, don't freak out. They're not the easiest thing in the world to grasp, at least not for me. Give it your best try and the folks around these forums will be happy to help you out, though they're not going to do all the heavy lifting for you. http://www.lmgtfy.com/?q=php+regular+expressions
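To give you an idea of what that looks like in practice, here's a rough sketch of pulling ISBN-13 candidates out of a page with preg_match_all. The function name find_isbn13 is just something I made up, and it assumes the ISBN is printed as 13 digits with optional hyphens or spaces. Real pages will be messier than this, so treat it as a starting point rather than a finished parser.

```php
<?php
// Hypothetical helper: find valid-looking ISBN-13 numbers in a chunk of HTML.
function find_isbn13(string $html): array
{
    // Drop the markup so we only search the visible text.
    $text = strip_tags($html);

    // 978 or 979, then 10 more digits, each optionally preceded by a hyphen or space.
    preg_match_all('/\b97[89](?:[- ]?\d){10}\b/', $text, $matches);

    $found = [];
    foreach ($matches[0] as $candidate) {
        // Keep only the digits, e.g. "978-0-306-40615-7" becomes "9780306406157".
        $digits = preg_replace('/\D/', '', $candidate);

        // ISBN-13 checksum: weights alternate 1, 3, 1, 3, ... and the total
        // must be divisible by 10. This weeds out random 13-digit numbers.
        $sum = 0;
        for ($i = 0; $i < 13; $i++) {
            $sum += (int) $digits[$i] * ($i % 2 === 0 ? 1 : 3);
        }
        if ($sum % 10 === 0) {
            $found[] = $digits;
        }
    }

    // Remove duplicates and reindex.
    return array_values(array_unique($found));
}
```

You'd call it as find_isbn13($html) with whatever HTML you've fetched. The checksum step matters because the pattern alone will happily match phone numbers or order IDs that happen to start with 978.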
You have two options. You could either spend 6 years writing algorithms to decipher the content of any HTML page and recognize authors' names, book titles, ISBNs, etc. - or you could figure out how to crowdsource your efforts. Coming up with a way to let people help you identify the content on a page would probably be ideal. There have been some popular attempts at tasks like this, some more successful than others. I remember when Google wanted help tagging images for searches. They created a game (Google Image Labeler, if I remember right) which let users quickly and easily add tags or descriptions to images they were shown. Clever, but it wasn't immensely popular. In fact, they recently shut down the service entirely. Another example, which is wildly successful, is reCAPTCHA. They decided that instead of just making up curly and distorted words for users to read in order to prove they were human, they would present users with real words from scanned books. Essentially they crowdsourced humans into the most powerful OCR software available. It worked so well that Google bought them. My point is that you may want to come up with a reason for your users to help you out here.
As far as the technology behind something like that... My thought is that maybe you give the user a toolbar or browser extension that lets them draw boxes around or highlight text on a particular web page. They can then label that text as a book title, an author, etc. Maybe by logging this information they're also adding that book to their wish list or something like that. Again, give them a purpose to participate. This isn't going to be easy.
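On the PHP side, the toolbar idea boils down to receiving little "this text on this page is a title/author/ISBN" records and checking them before you log them. Here's a minimal sketch of that check. Everything in it is an assumption on my part: the field names (url, label, text), the list of allowed labels, and the function name normalize_annotation are all invented for illustration.

```php
<?php
// Hypothetical labels a user could attach to highlighted text.
const ALLOWED_LABELS = ['title', 'author', 'isbn'];

// Validate and tidy one annotation sent by the (hypothetical) toolbar.
// Returns a clean record, or null if the submission should be rejected.
function normalize_annotation(array $input): ?array
{
    // All three fields must be present and must be strings.
    foreach (['url', 'label', 'text'] as $key) {
        if (!isset($input[$key]) || !is_string($input[$key])) {
            return null;
        }
    }

    $url   = filter_var($input['url'], FILTER_VALIDATE_URL);
    $label = strtolower(trim($input['label']));
    // Collapse runs of whitespace, since highlighted text often spans line breaks.
    $text  = trim(preg_replace('/\s+/', ' ', $input['text']));

    if ($url === false || !in_array($label, ALLOWED_LABELS, true) || $text === '') {
        return null;
    }

    return ['url' => $url, 'label' => $label, 'text' => $text];
}
```

Whatever comes back non-null is what you'd write to your database. Since anybody can submit anything, you'd probably also want several users to agree on a label before you trust it.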
Maybe it's time to work backwards from the goal. Is the end goal to build a giant database of books? If so, somebody has probably already beaten you to it: every public library, for example. Depending on how much that database is worth to you, you could probably purchase such a list.