Hi all. I noticed recently that the text recognition on my resume, saved in PDF format, is inaccurate. I designed the resume in Apple Pages and exported to PDF. However—and possibly because of my font—the text recognition has replaced my t’s with P’s. For example, if you copy-paste some text from my resume, it will read:
“InvesPgate financial mismanagement and regulatory violaPons in the aviaPon, defense, environmental, and healthcare industries.”
When a PDF is made up of rasterised images rather than actual text, the software reading it has to use OCR to convert the image to text. OCR can be prone to errors, depending on image quality, the fonts used, or simply how good the OCR software is.
But these types of PDF usually come from scanned documents. I would not expect an export from Pages to produce bitmap images, though I have not used it.
No, I missed that. I believe that PDFs can hold both a scanned version and a text/OCR version of the same document. My guess would be that that is where the problem lies, although I don’t know why Quartz would need to create a scanned version.
Sorry, I saw the OCR and missed the Apple Pages, so I thought you had scanned a print copy. If that had been the case, then it might have been possible to improve the recognition or to type the text into a word processing program, which of course is what Apple Pages is.
I just now looked at my Apple Pages to check the export to PDF but didn’t find it.
Might it be you crafted the résumé using a non-standard keyboard setting that didn’t get “translated” to standard?
Though it seems there would be “funny looking characters” if that was the case.
Try again using a different app?
Writing a good résumé is tough enough, let alone making sure it doesn’t have any words with “ti” in them.
If I import the PDF into Illustrator, it complains about missing fonts and the text comes out as gobbledegook, as if it’s maybe a different character set or something.
Maybe that would explain why it is OCRing actual text. That’s the bit that did not make any sense to me: why would it OCR text?
As a test, I highlighted all text and turned it all into Calibri. Then I copy-pasted from PDF again. This is what happened:
“InvesXgate financial mismanagement and regulatory violaXons in the aviaXon, defense, environmental, and healthcare industries.”
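Looking at both pastes, the stray capital letter seems to stand in for the two letters “ti” every time (Investigate, violations, aviation), not just for a “t”. That would fit a “ti” ligature that isn’t being mapped back to text properly, though that is only a guess from the samples in this thread. If you just need clean text out of the existing PDF in the meantime, here is a minimal Python sketch that patches the pasted text. The function name `repair_ti` and its `placeholder` parameter are made up for illustration, and it assumes the stray letter only ever appears between two lowercase letters.

```python
import re


def repair_ti(text: str, placeholder: str) -> str:
    """Replace a stray placeholder letter with "ti".

    Assumption (from the pasted samples only): the placeholder is a single
    character sitting between two lowercase letters, e.g. "InvesPgate".
    """
    # Lookbehind/lookahead keep the neighbouring letters out of the match,
    # so only the placeholder itself is replaced.
    pattern = re.compile(rf"(?<=[a-z]){re.escape(placeholder)}(?=[a-z])")
    return pattern.sub("ti", text)


pasted = (
    "InvesPgate financial mismanagement and regulatory violaPons in the "
    "aviaPon, defense, environmental, and healthcare industries."
)
print(repair_ti(pasted, "P"))
```

Pass "X" instead of "P" for the Calibri version. Note that this would also rewrite a legitimate mid-word capital (something like “iPhone” with "P" as the placeholder), so proofread the result. It only cleans up copied text and does not fix the PDF itself.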