  1. #1
    SitePoint Addict
    Join Date
    May 2008
    Location
    Missouri, USA
    Posts
    273
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Extract Text from Images

    I've done some research and I'm not sure this is possible in PHP, but I thought I would ask the experts.

    In its most basic form, I want to be able to upload a document and extract all the text from it, which will then be parsed.

    Is this possible with PHP, either through a built-in library or a third-party library?
    What other languages are better suited to this task (I would like to keep it web-based if possible)?
    Follow Me On Twitter: BryceRay

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,097
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    The process is called OCR (Optical Character Recognition).

    You could take a look at phpOCR.

    I've never tried it myself, but it's worth a shot.

    By the way, I've fiddled around with OCR in the past (using desktop apps, not PHP) and found that they're not flawless. For example, they easily confuse a "c" for an "o" and vice versa. Overall the results are not bad, but the output looks a bit like it was written by someone who made a few typos here and there.
    Rémon - Hosting Advisor


  3. #3
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    Excellent advice, although I'd probably opt for a CLI-compatible OCR app; I don't think PHP has a place here.
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.

  4. #4
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,871
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Anything that you feed through any OCR needs to be proofread to correct all the errors before you can use it. There is no way to automate that part of the process.
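
    A rough sketch of how the proofreading step could at least be narrowed down, written in Python: it flags words that are not in a word list so that a person knows where to look first. The word list, the function name and the sample sentence are all hypothetical, and this is an aid to proofreading, not a replacement for it.

```python
import re

# Hypothetical word list; a real one would be loaded from a dictionary file.
KNOWN_WORDS = {"the", "cat", "sat", "on", "a", "cot"}


def flag_suspect_words(ocr_text, known_words=KNOWN_WORDS):
    """Return (position, word) pairs for words that are not in known_words."""
    suspects = []
    for match in re.finditer(r"[A-Za-z]+", ocr_text):
        word = match.group()
        # Compare case-insensitively so "The" matches "the".
        if word.lower() not in known_words:
            suspects.append((match.start(), word))
    return suspects


# "oat" is unknown, so it is flagged for a human to check.
suspects = flag_suspect_words("The oat sat on a cot")
```

    Note that a misread which produces another valid word (say "cot" where the page said "cat") passes this check unnoticed, which is why a human still has to read the result.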
    Stephen J Chapman


  5. #5
    SitePoint Zealot
    Join Date
    Jan 2006
    Location
    Gold Coast, Australia
    Posts
    123
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Since you're on a Linux box (I'm assuming), you should have a look at OCRopus. It's not PHP, but of course you can call it via exec.
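
    The shell-out idea might look roughly like the sketch below. It is written in Python rather than PHP, since the original question also asked about other languages; the same pattern applies to PHP's exec. The default command assumes Tesseract (a different OCR tool, not mentioned in this thread) is installed, because its `tesseract IMAGE stdout` form prints the recognised text directly. The function name `run_ocr` and the `{image}` placeholder are made up for illustration; swap in whatever command your OCR tool actually uses.

```python
import subprocess

# Assumption: Tesseract is installed. Passing "stdout" as the output base
# makes it print the recognised text instead of writing a file.
DEFAULT_COMMAND = ("tesseract", "{image}", "stdout")


def run_ocr(image_path, command=DEFAULT_COMMAND, timeout=60):
    """Run an external OCR program on image_path and return its text output."""
    # Build an argument list and skip the shell entirely, so odd characters
    # in an uploaded file name are never interpreted as shell syntax.
    args = [part.replace("{image}", str(image_path)) for part in command]
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        timeout=timeout,      # do not hang on a stuck OCR process
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

    Calling `run_ocr("scan.png")` would then return whatever text the OCR program printed, ready to be parsed (and proofread).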

