Advice on Transcribing Digitized Books

I’m not sure if this is the right forum for my question, but here goes…

I would like to take a series of books that are in the public domain and turn them into ebooks. Most of the books have been digitized, and I can download the pages and use Adobe Acrobat to convert them to text. However, the conversion is far from perfect, with countless misspellings and blocks of text missing where the scanned page is faded or damaged.

So I have to go through each manuscript page by page, fixing all the mistakes and typing in the missing text, as well as adding styles (e.g. bold, italics), superscripts, etc.

I can type about 70 words a minute, but I’ve estimated that it takes me an average of about ten minutes to type a page out of a particular book from scratch. At that rate, it would take approximately 80-85 hours to type up a 500-page book. But with most of the text properly recognized by Acrobat, the time involved might be cut in half or less.

I just wondered if anyone has any advice for expediting the process. In particular, I’d like to find a collection of regular expressions that would help me plow through some of the more common errors. For example, a regular expression that recognizes thirteen variations of New York (e.g. Ne,v York, New Yark) and converts them all to “New York” would be helpful.
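
The kind of regular-expression cleanup described above might look roughly like the following Python sketch. It assumes a hand-built table of patterns; the patterns, the `CORRECTIONS` table, and the `fix_ocr` name are all illustrative and not taken from any existing collection.

```python
import re

# Hypothetical correction table: each entry pairs a compiled pattern
# with the text that should replace every match. The patterns are
# illustrative guesses at common OCR misreadings, not a tested collection.
CORRECTIONS = [
    # "Ne,v York", "Nevv York", "New Yark", "New Yorlc" -> "New York"
    (re.compile(r"\bNe(?:w|,v|vv)\s+Y[oa]r(?:k|lc)\b"), "New York"),
    # "tbe" as a misreading of "the"
    (re.compile(r"\btbe\b"), "the"),
]


def fix_ocr(text, corrections=CORRECTIONS):
    """Apply every (pattern, replacement) pair to the text in order."""
    for pattern, replacement in corrections:
        text = pattern.sub(replacement, text)
    return text


print(fix_ocr("He left Ne,v York for tbe coast, then New Yark again."))
```

A table like this would probably need to be built up per book, since each scan tends to have its own recurring errors. Over-broad patterns can also damage text that was already correct, so it seems safer to review the replacements before saving them.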

Another possibility is to hire someone to do the typing and transcription. Can anyone suggest a place to advertise for such a service? What would be a fair price?

Are you aware of any software that helps with such projects?

Thanks for any tips.

Have you looked at Project Gutenberg, which does exactly this through Distributed Proofreaders?

You may be able to find some help in their documentation or forums, or even persuade them to take on the projects.
