Hi there,
I have a PDF which is in Chinese.
I am trying to copy it into Notepad and then paste it into HTML; however, it appears as little squares when I paste.
Can anyone recommend a way of transferring the Chinese text from the document into Notepad, or another program from which I can then copy it out again?
Thanks
SamA74
November 4, 2018, 6:27pm
2
Is your character set UTF-8 and are you saving the file as UTF-8?
<meta charset="UTF-8">
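A minimal sketch of what "saving the file as UTF-8" means in practice, written in Python. The sample text and the helper names `write_utf8_page` and `is_valid_utf8` are made up for illustration; the point is only that the bytes on disk and the meta tag should agree.

```python
from pathlib import Path

# Hypothetical sample text, used only for illustration.
SAMPLE_TEXT = "你好，世界"


def write_utf8_page(path: Path, text: str) -> None:
    """Write a minimal HTML page whose bytes on disk are UTF-8."""
    html = (
        "<!DOCTYPE html>\n"
        '<html lang="zh">\n'
        "<head>\n"
        '<meta charset="UTF-8">\n'
        "<title>UTF-8 test</title>\n"
        "</head>\n"
        f"<body><p>{text}</p></body>\n"
        "</html>\n"
    )
    # Pass the encoding explicitly: the platform default may be a legacy code page.
    path.write_text(html, encoding="utf-8")


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Calling `write_utf8_page(Path("utf8-test.html"), SAMPLE_TEXT)` and opening the result in a browser should show the characters, provided the text itself was intact to begin with.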
I’m trying to add it into WordPress and also Dreamweaver, but it’s still just the strange little squares.
I wondered if there is a way of exporting or copying from a PDF keeping the Chinese symbols?
Erik_J
November 4, 2018, 9:57pm
5
Select and copy the Chinese text in the PDF, then paste it into a new, not yet saved, Notepad document.
Yes, it will display as bars or rectangles, depending on the language setting in Notepad. But the text actually is the Chinese characters; saving the file as UTF-8 (with any font) will keep the copy-pasted characters. Mind, though, that saving as ANSI or ISO can ruin the pasted UTF-8 characters.
But also try pasting directly into the HTML and saving the file as UTF-8; that should work too, as far as I can tell.
There are tools that can extract and save text, unformatted, from pdf documents.
The browser should show the Chinese characters, even if an editor such as Notepad can’t.
If not, check, as @SamA74 says, that UTF-8 is used in all steps and that the HTML meta charset is also UTF-8.
Please let us know whether you succeed or fail.
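To illustrate why the save encoding matters, here is a small Python sketch. It assumes that Notepad's "ANSI" option corresponds to the Windows-1252 code page (`cp1252`), which is only true on Western European Windows setups; the helper name `round_trips` and the sample text are hypothetical.

```python
# Hypothetical sample, not taken from the PDF in question.
SAMPLE_TEXT = "中文测试"


def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives being saved and reloaded in the given encoding."""
    # errors="replace" mimics an editor that silently substitutes "?" for
    # characters the target encoding cannot represent.
    saved = text.encode(encoding, errors="replace")
    return saved.decode(encoding, errors="replace") == text


# UTF-8 can represent every Unicode character, so nothing is lost.
print(round_trips(SAMPLE_TEXT, "utf-8"))

# cp1252 (assumed stand-in for "ANSI") and ISO-8859-1 have no Chinese
# characters, so the text comes back as question marks.
print(round_trips(SAMPLE_TEXT, "cp1252"))
print(round_trips(SAMPLE_TEXT, "iso-8859-1"))
```

Once the file has been saved in a legacy encoding, the original characters cannot be recovered from it, so the paste has to be repeated.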
toolman
November 4, 2018, 10:27pm
6
Hi,
Thanks for the reply.
I have tried saving as UTF-8, but no luck.
Erik_J
November 4, 2018, 10:42pm
7
Pity you didn’t include the character set selection box below, showing UTF-8.
If you open the text file that was originally saved as UTF-8 in the browser, what does it show there?
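One way to tell whether the saved text really contains Chinese characters, or something else that merely renders as squares, is to look at the code points. The following Python sketch is an assumption about how one might check; `classify`, `summarize` and `code_points` are hypothetical helper names, and only the main CJK Unified Ideographs block is tested.

```python
from collections import Counter
from typing import List


def classify(ch: str) -> str:
    """Put a single character into a rough diagnostic bucket."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:
        return "cjk"            # CJK Unified Ideographs block
    if cp == 0xFFFD:
        return "replacement"    # U+FFFD, inserted when decoding failed
    if 0xE000 <= cp <= 0xF8FF:
        return "private-use"    # no standard meaning; font-specific glyphs
    if ch == "?":
        return "question-mark"  # typical result of a lossy legacy save
    return "other"


def summarize(text: str) -> Counter:
    """Count how many characters fall into each bucket."""
    return Counter(classify(ch) for ch in text)


def code_points(text: str) -> List[str]:
    """List the code points in U+XXXX notation for manual inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

If the summary is dominated by `cjk`, the text is fine and only the editor's font is at fault. If it is mostly `private-use`, `replacement` or `question-mark`, the characters were probably lost before or during the paste.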
toolman
November 4, 2018, 10:50pm
8
Sorry, which character box? This is what I have when I save it:
This is what it outputs in Chrome (both HTML and TXT)
Erik_J
November 4, 2018, 11:19pm
9
When I paste and save as UTF-8:
Cromium-test.txt (65 Bytes)
Above is pasted from Google translate, but I often copy snippets from Chinese pdf documents.
Could you maybe link to a PDF that you fail to copy-paste Chinese from, or tell us which PDF reader you use, and also post the content of the PDF file properties?
toolman
November 5, 2018, 9:12am
10
Hi,
I’m using Adobe Acrobat Pro.
This is a screenshot of the PDF properties:
Erik_J
November 5, 2018, 10:36am
11
The Fonts tab is also interesting for getting a clue as to why copy-paste fails. I assume the fonts are not embedded, but does it say what the character table is, ISO/UTF?
So, you can actually edit the PDF document’s content, fonts, security, etc. I assumed the copy-paste was done in a PDF reader that can only display the content; maybe you could try that if everything else fails.
I can’t give any advice for your Acrobat Pro; the last version I had was Acrobat Pro 1.x, and I haven’t been on Windows for a very long time, sorry.
Now I’m curious: what happens if you insert the Chinese snippet I posted above into the PDF and save as UTF-8? Will copy-pasting that snippet also result in squares?
Hi there toolman,
why don’t you upload your PDF file to this thread?
Members will then have a golden opportunity to test
your problem and possibly resolve it for you.
coothead
toolman
November 5, 2018, 2:52pm
13
Hi,
Thanks for the reply.
This is one of the pages:
http://elop.co/page.pdf
Erik_J
November 5, 2018, 9:11pm
14
Hi,
The fonts are embedded, and I extracted them to see which characters were used in the document. I couldn’t find the Chinese characters displayed in the PDF in the 40kB embedded subset of the 13.6MB PingFangSC-Regular.ttf. What text encoding the text originally had isn’t clear either.
I could read that the leaflet was about the qualities of Botox. But I think Acrobat might have replaced the Chinese characters with images, which could explain the file size. The file is “optimized”, so it’s not easy to debug.
Now, this is not the PDF whose file properties you posted.
This PDF is v1.6 and it was created a few hours after @coothead suggested uploading the PDF.
If you or a colleague of yours created it, it would be reasonable to think you have other ways to get the Chinese text than copy-pasting it into Notepad.
Anyway, what I think could also be totally wrong.
toolman
November 5, 2018, 9:22pm
15
Hi,
Many thanks for the reply.
Yes, this is a different page: a simpler one which is not as big in size as the other pages.
I have tried to download the PingFang font and also SFNSDisplay, but still no luck.
Hi there toolman,
check out the attachment which contains the page basics:
an HTML file, a CSS file and the three woff fonts used.
pdf-to-html.zip (43.6 KB)
Unfortunately, the Chinese characters stubbornly refused
to be Copy & Pasted.
coothead
toolman
November 5, 2018, 10:25pm
17
Hi,
Many thanks, that worked great.
How did you export the PDF?
Erik_J
November 5, 2018, 10:28pm
18
You are allowed to download one font free of charge at the Chinese site https://en.fontke.com/. You can find the font there and download it using the browser. Note: The site is in Chinese, so the download names are in Chinese too, and you might need to rename the file to open it.
About the copy-paste failure, I stumbled upon an interesting GitHub thread that explains it and also has a tip on how to get around the failure by exporting the selected part as text.
I think you could find the answer, or links to it, here:
opened 11:11AM - 06 Sep 17 UTC
0708 test: using gs optimization on the originally problematic pdf failed
```
(py3.5) ➜ pdftabextract git:(master)… ✗ pdf2ps 111.pdf 111.ps
(py3.5) ➜ pdftabextract git:(master) ✗ ps2pdf -dPDFSETTINGS=/ebook 111.ps 111-optimized.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 111-optimized.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
(py3.5) ➜ pdftabextract git:(master) ✗ gs -o 111-optim.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true 111.pdf
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 111-optim.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
OHXVUR+SimSun TrueType WinAnsi yes yes yes 10 0
(py3.5) ➜ pdftabextract git:(master) ✗ pdftocairo 111.pdf
Error: one of the output format options (-png, -jpeg, -ps, -eps, -pdf, -print, -printdlg, -svg) must be used.
(py3.5) ➜ pdftabextract git:(master) ✗ pdftocairo -pdf 111.pdf
Error: an output filename or '-' must be supplied when the output format is PDF and input PDF file is a local file.
(py3.5) ➜ pdftabextract git:(master) ✗ pdftocairo -o pdf 111.pdf
Error: one of the output format options (-png, -jpeg, -ps, -eps, -pdf, -print, -printdlg, -svg) must be used.
(py3.5) ➜ pdftabextract git:(master) ✗ pdftocairo -pdf 111.pdf 111-pdftocario.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 111-pdftocario.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JKQTOQ+SimSun CID TrueType Identity-H yes yes yes 9 0
OKUCXI+SimSun TrueType WinAnsi yes yes yes 10 0
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext 111-pdftocario.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ $ gs -sDEVICE=pdfwrite -o 111.pdf -dBATCH -f mypg3out.pdf Adobe-GB1-UCS2
zsh: command not found: $
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o 111.pdf -dBATCH -f mypg3out.pdf Adobe-GB1-UCS2
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Error: /undefinedfilename in (mypg3out.pdf)
Operand stack:
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push
Dictionary stack:
--dict:1204/1684(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)--
Current allocation mode is local
Last OS error: No such file or directory
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o 111-out.pdf -dBATCH -f 111.pdf Adobe-GB1-UCS2
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Error: /rangecheck in defineresource
Operand stack:
Adobe-GB1-UCS2 --dict:10/12(L)-- CMap Adobe-GB1-UCS2 --dict:10/12(L)--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop 1852 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1929 3 5 %oparray_pop defineresource %errorexec_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1204/1684(ro)(G)-- --dict:1/20(G)-- --dict:78/200(L)-- --dict:38/38(ro)(G)-- --dict:10/12(L)-- --dict:16/25(ro)(G)--
Current allocation mode is local
Current file position is 231880
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o 111-out.pdf -dBATCH -f 111.pdf Adobe-CNS1-UCS2
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Error: /rangecheck in defineresource
Operand stack:
Adobe-CNS1-UCS2 --dict:10/12(L)-- CMap Adobe-CNS1-UCS2 --dict:10/12(L)--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop 1852 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1929 3 5 %oparray_pop defineresource %errorexec_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1204/1684(ro)(G)-- --dict:1/20(G)-- --dict:78/200(L)-- --dict:38/38(ro)(G)-- --dict:10/12(L)-- --dict:16/25(ro)(G)--
Current allocation mode is local
Current file position is 265113
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o 111-out.pdf -dBATCH -f 111.pdf Adobe-CNS1-UCS2
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Error: /rangecheck in defineresource
Operand stack:
Adobe-CNS1-UCS2 --dict:10/12(L)-- CMap Adobe-CNS1-UCS2 --dict:10/12(L)--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop 1852 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1929 3 5 %oparray_pop defineresource %errorexec_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1204/1684(ro)(G)-- --dict:1/20(G)-- --dict:78/200(L)-- --dict:38/38(ro)(G)-- --dict:10/12(L)-- --dict:16/25(ro)(G)--
Current allocation mode is local
Current file position is 265113
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o 111-out.pdf -dBATCH -f 111.pdf Adobe-GB1-UCS2
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Error: /rangecheck in defineresource
Operand stack:
Adobe-GB1-UCS2 --dict:10/12(L)-- CMap Adobe-GB1-UCS2 --dict:10/12(L)--
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1983 1 3 %oparray_pop 1982 1 3 %oparray_pop 1966 1 3 %oparray_pop 1852 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1929 3 5 %oparray_pop defineresource %errorexec_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1204/1684(ro)(G)-- --dict:1/20(G)-- --dict:78/200(L)-- --dict:38/38(ro)(G)-- --dict:10/12(L)-- --dict:16/25(ro)(G)--
Current allocation mode is local
Current file position is 231880
GPL Ghostscript 9.21: Unrecoverable error, exit code 1
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext 111-out.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 111-out.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
OHXVUR+SimSun TrueType WinAnsi yes yes yes 10 0
(py3.5) ➜ pdftabextract git:(master) ✗ gs -sDEVICE=pdfwrite -o mypg3o2-111.pdf -dBATCH \
-c '/CIDSystemInfo << /Registry (Adobe) /Ordering (Unicode) /Supplement 1 >>' \
-f 111.pdf
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext mypg3o2-111.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts mypg3o2-111.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
OHXVUR+SimSun TrueType WinAnsi yes yes yes 10 0
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 11-reprint-osx.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
SRPUEP+SimSun TrueType WinAnsi yes yes yes 13 0
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext 11-reprint-osx.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 11-reprint-osx.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ETOBOE+SimSun TrueType WinAnsi yes yes yes 13 0
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext 11-reprint-osx.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pstopdf 11-reprint-osx.ps 11-reprint-osx-2.pdf
/AppleBraille-Outline6Dot
/AppleBraille-Outline8Dot
/AppleBraille-Pinpoint6Dot
/AppleBraille-Pinpoint8Dot
/AppleBraille
/AppleColorEmoji
/.AppleColorEmojiUI
/AppleSymbols
/AquaKana
/AquaKana-Bold
/ArialHebrew
/ArialHebrew-Bold
/ArialHebrew-Light
/.ArialHebrewDeskInterface
/.ArialHebrewDeskInterface-Bold
/.ArialHebrewDeskInterface-Light
/ArialHebrewScholar
/ArialHebrewScholar-Bold
/ArialHebrewScholar-Light
/AvenirNextCondensed-Bold
/AvenirNextCondensed-BoldItalic
/AvenirNextCondensed-DemiBold
/AvenirNextCondensed-DemiBoldItalic
/AvenirNextCondensed-Italic
/AvenirNextCondensed-Medium
/AvenirNextCondensed-MediumItalic
/AvenirNextCondensed-Regular
/AvenirNextCondensed-Heavy
/AvenirNextCondensed-HeavyItalic
/AvenirNextCondensed-UltraLight
/AvenirNextCondensed-UltraLightItalic
/AvenirNext-Bold
/AvenirNext-BoldItalic
/AvenirNext-DemiBold
/AvenirNext-DemiBoldItalic
/AvenirNext-Italic
/AvenirNext-Medium
/AvenirNext-MediumItalic
/AvenirNext-Regular
/AvenirNext-Heavy
/AvenirNext-HeavyItalic
/AvenirNext-UltraLight
/AvenirNext-UltraLightItalic
/Avenir-Book
/Avenir-BookOblique
/Avenir-Black
/Avenir-BlackOblique
/Avenir-Heavy
/Avenir-HeavyOblique
/Avenir-Light
/Avenir-LightOblique
/Avenir-Medium
/Avenir-MediumOblique
/Avenir-Oblique
/Avenir-Roman
/Courier
/Courier-Bold
/Courier-Oblique
/Courier-BoldOblique
/GeezaPro
/GeezaPro-Bold
/.GeezaProInterface
/.GeezaProInterface-Bold
/.GeezaProInterface-Light
/.GeezaProPUA-Regular
/.GeezaProPUA-Bold
/Geneva
/Helvetica
/Helvetica-Bold
/Helvetica-Oblique
/Helvetica-BoldOblique
/Helvetica-Light
/Helvetica-LightOblique
/HelveticaNeue-Bold
/HelveticaNeue
/HelveticaNeue-UltraLight
/HelveticaNeue-Italic
/HelveticaNeue-Light
/HelveticaNeue-UltraLightItalic
/HelveticaNeue-CondensedBlack
/HelveticaNeue-CondensedBold
/HelveticaNeue-BoldItalic
/HelveticaNeue-LightItalic
/HelveticaNeue-Medium
/HelveticaNeue-Thin
/HelveticaNeue-ThinItalic
/HelveticaNeue-MediumItalic
/.HelveticaNeueDeskInterface-Regular
/.HelveticaNeueDeskInterface-Bold
/.HelveticaNeueDeskInterface-Italic
/.HelveticaNeueDeskInterface-BoldItalic
/.HelveticaNeueDeskInterface-MediumP4
/.HelveticaNeueDeskInterface-MediumItalicP4
/.HelveticaNeueDeskInterface-Light
/.HelveticaNeueDeskInterface-Thin
/.HelveticaNeueDeskInterface-UltraLightP2
/.HelveticaNeueDeskInterface-Heavy
/.Keyboard
/LastResort
/LucidaGrande
/LucidaGrande-Bold
/.LucidaGrandeUI
/.LucidaGrandeUI-Bold
/MarkerFelt-Thin
/MarkerFelt-Wide
/Menlo-Regular
/Menlo-Bold
/Menlo-Italic
/Menlo-BoldItalic
/Monaco
/Noteworthy-Light
/Noteworthy-Bold
/Optima-Regular
/Optima-Bold
/Optima-Italic
/Optima-BoldItalic
/Optima-ExtraBlack
/Palatino-Roman
/Palatino-Italic
/Palatino-Bold
/Palatino-BoldItalic
/.SFCompactDisplay-Black
/.SFCompactDisplay-Bold
/.SFCompactDisplay-Heavy
/.SFCompactDisplay-Light
/.SFCompactDisplay-Medium
/.SFCompactDisplay-Regular
/.SFCompactDisplay-Semibold
/.SFCompactDisplay-Thin
/.SFCompactDisplay-Ultralight
/.SFCompactRounded-Black
/.SFCompactRounded-Bold
/.SFCompactRounded-Heavy
/.SFCompactRounded-Light
/.SFCompactRounded-Medium
/.SFCompactRounded-Regular
/.SFCompactRounded-Semibold
/.SFCompactRounded-Thin
/.SFCompactRounded-Ultralight
/.SFCompactText-Bold
/.SFCompactText-BoldItalic
/.SFCompactText-Heavy
/.SFCompactText-HeavyItalic
/.SFCompactText-Light
/.SFCompactText-LightItalic
/.SFCompactText-Medium
/.SFCompactText-MediumItalic
/.SFCompactText-Regular
/.SFCompactText-Italic
/.SFCompactText-Semibold
/.SFCompactText-SemiboldItalic
/.SFNSDisplay
/.SFNSDisplayCondensed-Black
/.SFNSDisplayCondensed-Bold
/.SFNSDisplayCondensed-Heavy
/.SFNSDisplayCondensed-Light
/.SFNSDisplayCondensed-Medium
/.SFNSDisplayCondensed-Regular
/.SFNSDisplayCondensed-Semibold
/.SFNSDisplayCondensed-Thin
/.SFNSDisplayCondensed-Ultralight
/.SFNSText
/.SFNSTextCondensed-Bold
/.SFNSTextCondensed-Heavy
/.SFNSTextCondensed-Light
/.SFNSTextCondensed-Medium
/.SFNSTextCondensed-Regular
/.SFNSTextCondensed-Semibold
/.SFNSText-Italic
/STHeitiTC-Light
/STHeitiSC-Light
/STHeitiTC-Medium
/STHeitiSC-Medium
/Symbol
/Thonburi
/Thonburi-Bold
/Thonburi-Light
/Times-Roman
/Times-Bold
/Times-Italic
/Times-BoldItalic
/ZapfDingbatsITC
/ZapfDingbats
/ACaslonPro-Bold
/ACaslonPro-BoldItalic
/ACaslonPro-Italic
/ACaslonPro-Regular
/ACaslonPro-Semibold
/ACaslonPro-SemiboldItalic
/AdobeArabic-Bold
/AdobeArabic-BoldItalic
/AdobeArabic-Italic
/AdobeArabic-Regular
/AdobeDevanagari-Bold
/AdobeDevanagari-BoldItalic
/AdobeDevanagari-Italic
/AdobeDevanagari-Regular
/AdobeFangsongStd-Regular
/AdobeFanHeitiStd-Bold
/AdobeGothicStd-Bold
/AdobeHebrew-Bold
/AdobeHebrew-BoldItalic
/AdobeHebrew-Italic
/AdobeHebrew-Regular
/AdobeHeitiStd-Regular
/AdobeKaitiStd-Regular
/AdobeMingStd-Light
/AdobeMyungjoStd-Medium
/AdobeNaskh-Medium
/AdobeSongStd-Light
/AGaramondPro-Bold
/AGaramondPro-BoldItalic
/AGaramondPro-Italic
/AGaramondPro-Regular
/AlNile
/AlNile-Bold
/.AlNilePUA
/.AlNilePUA-Bold
/AlTarikh
/.AlTarikhPUA
/AlBayan
/.AlBayanPUA
/AlBayan-Bold
/.AlBayanPUA-Bold
/AmericanTypewriter
/AmericanTypewriter-Light
/AmericanTypewriter-Bold
/AmericanTypewriter-Semibold
/AmericanTypewriter-Condensed
/AmericanTypewriter-CondensedBold
/AmericanTypewriter-CondensedLight
/AndaleMono
/Apple-Chancery
/AppleGothic
/AppleMyungjo
/Arial-Black
/Arial-BoldItalicMT
/Arial-BoldMT
/Arial-ItalicMT
/ArialNarrow-BoldItalic
/ArialNarrow-Bold
/ArialNarrow-Italic
/ArialNarrow
/ArialRoundedMTBold
/ArialUnicodeMS
/ArialMT
/Athelas-Regular
/Athelas-Italic
/Athelas-BoldItalic
/Athelas-Bold
/Ayuthaya
/Baghdad
/.BaghdadPUA
/BanglaMN
/BanglaMN-Bold
/BanglaSangamMN
/BanglaSangamMN-Bold
/Baskerville
/Baskerville-Bold
/Baskerville-Italic
/Baskerville-BoldItalic
/Baskerville-SemiBold
/Baskerville-SemiBoldItalic
/Beirut
/.BeirutPUA
/BigCaslon-Medium
/BirchStd
/BlackoakStd
/BodoniSvtyTwoOSITCTT-Book
/BodoniSvtyTwoOSITCTT-BookIt
/BodoniSvtyTwoOSITCTT-Bold
/BodoniSvtyTwoSCITCTT-Book
/BodoniSvtyTwoITCTT-Book
/BodoniSvtyTwoITCTT-BookIta
/BodoniSvtyTwoITCTT-Bold
/BodoniOrnamentsITCTT
/BradleyHandITCTT-Bold
/BrushScriptMT
/BrushScriptStd
/Chalkboard
/Chalkboard-Bold
/ChalkboardSE-Light
/ChalkboardSE-Regular
/ChalkboardSE-Bold
/Chalkduster
/ChaparralPro-Bold
/ChaparralPro-BoldIt
/ChaparralPro-Italic
/ChaparralPro-LightIt
/ChaparralPro-Regular
/CharlemagneStd-Bold
/Charter-Roman
/Charter-Italic
/Charter-BoldItalic
/Charter-Bold
/Charter-BlackItalic
/Charter-Black
/Cochin
/Cochin-Bold
/Cochin-Italic
/Cochin-BoldItalic
/ComicSansMS-Bold
/ComicSansMS
/CooperBlackStd-Italic
/CooperBlackStd
/Copperplate
/Copperplate-Light
/Copperplate-Bold
/CorsivaHebrew
/CorsivaHebrew-Bold
/CourierNewPS-BoldItalicMT
/CourierNewPS-BoldMT
/CourierNewPS-ItalicMT
/CourierNewPSMT
/Damascus
/.DamascusPUA
/DamascusLight
/.DamascusPUALight
/DamascusMedium
/.DamascusPUAMedium
/DamascusBold
/.DamascusPUABold
/DamascusSemiBold
/.DamascusPUASemiBold
/DecoTypeNaskh
/.DecoTypeNaskhPUA
/DevanagariSangamMN
/DevanagariSangamMN-Bold
/DevanagariMT
/DevanagariMT-Bold
/Didot
/Didot-Italic
/Didot-Bold
/DINAlternate-Bold
/DINCondensed-Bold
/DiwanKufi
/.DiwanKufiPUA
/DiwanThuluth
/EuphemiaUCAS
/EuphemiaUCAS-Bold
/EuphemiaUCAS-Italic
/Farah
/.FarahPUA
/Farisi
/Futura-Medium
/Futura-MediumItalic
/Futura-Bold
/Futura-CondensedMedium
/Futura-CondensedExtraBold
/Georgia-BoldItalic
/Georgia-Bold
/Georgia-Italic
/Georgia
/GiddyupStd
/GillSans
/GillSans-Bold
/GillSans-Italic
/GillSans-BoldItalic
/GillSans-SemiBold
/GillSans-SemiBoldItalic
/GillSans-UltraBold
/GillSans-Light
/GillSans-LightItalic
/GujaratiSangamMN
/GujaratiSangamMN-Bold
/GujaratiMT
/GujaratiMT-Bold
/GurmukhiMN
/GurmukhiMN-Bold
/GurmukhiSangamMN
/GurmukhiSangamMN-Bold
/MonotypeGurmukhi
/Herculanum
/HoboStd
/HoeflerText-Ornaments
/HoeflerText-Regular
/HoeflerText-Black
/HoeflerText-Italic
/HoeflerText-BlackItalic
/Impact
/InaiMathi
/IowanOldStyle-Roman
/IowanOldStyle-Bold
/IowanOldStyle-Italic
/IowanOldStyle-BoldItalic
/IowanOldStyle-Black
/IowanOldStyle-BlackItalic
/IowanOldStyle-Titling
/Kailasa
/Kailasa-Bold
/KannadaMN
/KannadaMN-Bold
/KannadaSangamMN
/KannadaSangamMN-Bold
/Kefa-Regular
/Kefa-Bold
/KhmerMN
/KhmerMN-Bold
/KhmerSangamMN
/Kokonor
/KozGoPr6N-Bold
/KozGoPr6N-ExtraLight
/KozGoPr6N-Heavy
/KozGoPr6N-Light
/KozGoPr6N-Medium
/KozGoPr6N-Regular
/KozGoPro-Bold
/KozGoPro-ExtraLight
/KozGoPro-Heavy
/KozGoPro-Light
/KozGoPro-Medium
/KozGoPro-Regular
/KozMinPr6N-Bold
/KozMinPr6N-ExtraLight
/KozMinPr6N-Heavy
/KozMinPr6N-Light
/KozMinPr6N-Medium
/KozMinPr6N-Regular
/KozMinPro-Bold
/KozMinPro-ExtraLight
/KozMinPro-Heavy
/KozMinPro-Light
/KozMinPro-Medium
/KozMinPro-Regular
/Krungthep
/KufiStandardGK
/.KufiStandardGKPUA
/LaoMN
/LaoMN-Bold
/LaoSangamMN
/LetterGothicStd-Bold
/LetterGothicStd-BoldSlanted
/LetterGothicStd-Slanted
/LetterGothicStd
/LithosPro-Black
/LithosPro-Regular
/Luminari-Regular
/MalayalamMN
/MalayalamMN-Bold
/MalayalamSangamMN
/MalayalamSangamMN-Bold
/Marion-Regular
/Marion-Italic
/Marion-Bold
/MesquiteStd
/MicrosoftSansSerif
/MinionPro-Bold
/MinionPro-BoldCn
/MinionPro-BoldCnIt
/MinionPro-BoldIt
/MinionPro-It
/MinionPro-Medium
/MinionPro-MediumIt
/MinionPro-Regular
/MinionPro-Semibold
/MinionPro-SemiboldIt
/DiwanMishafiGold
/DiwanMishafi
/Mshtakan
/MshtakanOblique
/MshtakanBold
/MshtakanBoldOblique
/Muna
/.MunaPUA
/MunaBold
/.MunaPUABold
/MunaBlack
/.MunaPUABlack
/MyanmarMN
/MyanmarMN-Bold
/MyanmarSangamMN
/MyanmarSangamMN-Bold
/MyriadArabic-Bold
/MyriadArabic-BoldIt
/MyriadArabic-It
/MyriadArabic-Regular
/MyriadHebrew-Bold
/MyriadHebrew-BoldIt
/MyriadHebrew-It
/MyriadHebrew-Regular
/MyriadPro-Bold
/MyriadPro-BoldCond
/MyriadPro-BoldCondIt
/MyriadPro-BoldIt
/MyriadPro-Cond
/MyriadPro-CondIt
/MyriadPro-It
/MyriadPro-Regular
/MyriadPro-Semibold
/MyriadPro-SemiboldIt
/Nadeem
/.NadeemPUA
/NewPeninimMT
/NewPeninimMT-Inclined
/NewPeninimMT-BoldInclined
/NewPeninimMT-Bold
/NuevaStd-Bold
/NuevaStd-BoldCond
/NuevaStd-BoldCondItalic
/NuevaStd-Cond
/NuevaStd-CondItalic
/NuevaStd-Italic
/OCRAStd
/OratorStd-Slanted
/OratorStd
/OriyaMN
/OriyaMN-Bold
/OriyaSangamMN
/OriyaSangamMN-Bold
/Papyrus-Condensed
/Papyrus
/Phosphate-Inline
/Phosphate-Solid
/PlantagenetCherokee
/PoplarStd
/PrestigeEliteStd-Bd
/PTMono-Bold
/PTMono-Regular
/PTSans-Regular
/PTSans-Italic
/PTSans-NarrowBold
/PTSans-Narrow
/PTSans-CaptionBold
/PTSans-Caption
/PTSans-BoldItalic
/PTSans-Bold
/PTSerif-Regular
/PTSerif-Italic
/PTSerif-BoldItalic
/PTSerif-Bold
/PTSerif-Caption
/PTSerif-CaptionItalic
/Raanana
/RaananaBold
/RosewoodStd-Regular
/Sana
/.SanaPUA
/Sathu
/SavoyeLetPlain
/.SavoyeLetPlainCC
/Seravek
/Seravek-Italic
/Seravek-MediumItalic
/Seravek-Medium
/Seravek-LightItalic
/Seravek-Light
/Seravek-ExtraLightItalic
/Seravek-ExtraLight
/Seravek-BoldItalic
/Seravek-Bold
/ShreeDev0714
/ShreeDev0714-Bold
/ShreeDev0714-Italic
/ShreeDev0714-Bold-Italic
/Silom
/SinhalaMN
/SinhalaMN-Bold
/SinhalaSangamMN
/SinhalaSangamMN-Bold
/Skia-Regular
/SnellRoundhand
/SnellRoundhand-Bold
/SnellRoundhand-Black
/STSongti-SC-Black
/STSongti-SC-Bold
/STSongti-TC-Bold
/STSongti-SC-Light
/STSong
/STSongti-TC-Light
/STSongti-SC-Regular
/STSongti-TC-Regular
/StencilStd
/STIXGeneral-Regular
/STIXGeneral-Bold
/STIXGeneral-BoldItalic
/STIXGeneral-Italic
/STIXIntegralsD-Bold
/STIXIntegralsD-Regular
/STIXIntegralsSm-Bold
/STIXIntegralsSm-Regular
/STIXIntegralsUp-Bold
/STIXIntegralsUpD-Bold
/STIXIntegralsUpD-Regular
/STIXIntegralsUp-Regular
/STIXIntegralsUpSm-Bold
/STIXIntegralsUpSm-Regular
/STIXNonUnicode-Regular
/STIXNonUnicode-Bold
/STIXNonUnicode-BoldItalic
/STIXNonUnicode-Italic
/STIXSizeFiveSym-Regular
/STIXSizeFourSym-Bold
/STIXSizeFourSym-Regular
/STIXSizeOneSym-Bold
/STIXSizeOneSym-Regular
/STIXSizeThreeSym-Bold
/STIXSizeThreeSym-Regular
/STIXSizeTwoSym-Bold
/STIXSizeTwoSym-Regular
/STIXVariants-Regular
/STIXVariants-Bold
/SukhumvitSet-Thin
/SukhumvitSet-Light
/SukhumvitSet-Text
/SukhumvitSet-Medium
/SukhumvitSet-SemiBold
/SukhumvitSet-Bold
/Superclarendon-Regular
/Superclarendon-Italic
/Superclarendon-LightItalic
/Superclarendon-Light
/Superclarendon-BoldItalic
/Superclarendon-Bold
/Superclarendon-BlackItalic
/Superclarendon-Black
/Tahoma-Bold
/Tahoma
/TamilMN
/TamilMN-Bold
/TamilSangamMN
/TamilSangamMN-Bold
/TeamViewer10
/TektonPro-Bold
/TektonPro-BoldCond
/TektonPro-BoldExt
/TektonPro-BoldObl
/TeluguMN
/TeluguMN-Bold
/TeluguSangamMN
/TeluguSangamMN-Bold
/TimesNewRomanPS-BoldItalicMT
/TimesNewRomanPS-BoldMT
/TimesNewRomanPS-ItalicMT
/TimesNewRomanPSMT
/TrajanPro-Bold
/TrajanPro-Regular
/Trattatello
/Trebuchet-BoldItalic
/TrebuchetMS-Bold
/TrebuchetMS-Italic
/TrebuchetMS
/Verdana-BoldItalic
/Verdana-Bold
/Verdana-Italic
/Verdana
/Waseem
/WaseemLight
/Webdings
/Wingdings2
/Wingdings3
/Wingdings-Regular
/Zapfino
/AbadiMT-CondensedExtraBold
/AbadiMT-CondensedLight
/AndaleMono
/Arial-Black
/ArialNarrow
/ArialNarrow-Bold
/ArialNarrow-Italic
/ArialNarrow-BoldItalic
/ArialRoundedMTBold
/BaskOldFace
/Bauhaus93
/BellMT
/BellMTBold
/BellMTItalic
/BernardMT-Condensed
/BookAntiqua
/BookAntiqua-Bold
/BookAntiqua-Italic
/BookAntiqua-BoldItalic
/BookmanOldStyle
/BookmanOldStyle-Bold
/BookmanOldStyle-Italic
/BookmanOldStyle-BoldItalic
/Braggadocio
/BritannicBold
/Calibri-Light
/CalistoMT
/CalisMTBol
/CalistoMT-Italic
/CalistoMT-BoldItalic
/Century
/CenturyGothic
/CenturyGothic-Bold
/CenturyGothic-Italic
/CenturyGothic-BoldItalic
/CenturySchoolbook
/CenturySchoolbook-Bold
/CenturySchoolbook-Italic
/CenturySchoolbook-BoldItalic
/ColonnaMT
/ComicSansMS
/ComicSansMS-Bold
/CooperBlack
/CopperplateGothic-Bold
/CopperplateGothic-Light
/CurlzMT
/Desdemona
/EdwardianScriptITC
/EngraversMT
/EngraversMT-Bold
/EurostileRegular
/EurostileBold
/FootlightMTLight
/Garamond
/Garamond-Bold
/Garamond-Italic
/Georgia
/Georgia-Bold
/Georgia-Italic
/Georgia-BoldItalic
/GillSans-UltraBold
/GloucesterMT-ExtraCondensed
/GoudyOldStyleT-Regular
/GoudyOldStyleT-Bold
/GoudyOldStyleT-Italic
/Haettenschweiler
/Harrington
/Impact
/ImprintMT-Shadow
/KinoMT
/LucidaBlackletter
/LucidaBright
/LucidaBright-Demi
/LucidaBright-Italic
/LucidaBright-DemiItalic
/LucidaCalligraphy-Italic
/LucidaFax
/LucidaFax-Demi
/LucidaFax-Italic
/LucidaFax-DemiItalic
/LucidaHandwriting-Italic
/LucidaSans
/LucidaSans-Demi
/LucidaSans-Italic
/LucidaSans-DemiItalic
/LucidaSans-Typewriter
/LucidaSans-TypewriterBold
/LucidaSans-TypewriterOblique
/LucidaSans-TypewriterBoldOblique
/MaturaMTScriptCapitals
/Mistral
/Modern-Regular
/MonotypeCorsiva
/MonotypeSorts
/MT-Extra
/NewsGothicMT
/NewsGothicMT-Bold
/NewsGothicMT-Italic
/Onyx
/PerpetuaTitlingMT-Light
/PerpetuaTitlingMT-Bold
/Playbill
/Rockwell
/Rockwell-Bold
/Rockwell-Italic
/Rockwell-BoldItalic
/Rockwell-ExtraBold
/Stencil
/Tahoma
/Tahoma-Bold
/TrebuchetMS
/TrebuchetMS-Bold
/TrebuchetMS-Italic
/Trebuchet-BoldItalic
/LatinWide
/Courier
/NotDefFont
(py3.5) ➜ pdftabextract git:(master) ✗ pdffonts 11-reprint-osx.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
(py3.5) ➜ pdftabextract git:(master) ✗ pdftotext 11-reprint-osx.pdf
(py3.5) ➜ pdftabextract git:(master) ✗ pstopdf
Usage: pstopdf [inputfile] [-o outname] [-l] [-p] [-i]
Try: man pstopdf
(py3.5) ➜ pdftabextract git:(master) ✗ man pstopdf
(py3.5) ➜ pdftabextract git:(master) ✗ gs -dBATCH -dNOPAUSE -dSAFER \
-dEmbedAllFonts -dSubsetFonts=true -dMaxSubsetPct=99 \
-dAutoFilterMonoImages=false \
-dAutoFilterGrayImages=false \
-dAutoFilterColorImages=false \
-dDownsampleColorImages=false \
-dDownsampleGrayImages=false \
-dDownsampleMonoImages=false \
-sDEVICE=pdfwrite \
-dFirstPage=3 -dLastPage=3 \
-sOutputFile=mypg3out-111.pdf -f 111.pdf
GPL Ghostscript 9.21 (2017-03-16)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Requested FirstPage is greater than the number of pages in the file: 1
No pages will be processed (FirstPage > LastPage).
```
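The `pdffonts` tables in the log above can be read programmatically. The Python sketch below is only an illustration: `parse_pdffonts` and `fonts_without_tounicode` are hypothetical helpers, the parser assumes the column layout shown in that log and font names without spaces, and the sample rows are copied from it. The `uni` column reports whether the PDF carries a ToUnicode map for the font; without one, copied text often cannot be mapped back to real characters.

```python
from typing import Dict, List

# Two rows copied from the pdffonts output in the log above.
SAMPLE_OUTPUT = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JKQTOQ+SimSun                        CID TrueType      Identity-H       yes yes yes      9  0
OKUCXI+SimSun                        TrueType          WinAnsi          yes yes yes     10  0
"""


def parse_pdffonts(output: str) -> List[Dict[str, str]]:
    """Parse pdffonts rows into dictionaries, skipping header and rule lines."""
    fonts = []
    for line in output.splitlines():
        tokens = line.split()
        # A data row ends with two integers (the object ID) and has at least
        # name, type, encoding, emb, sub, uni before them.
        if len(tokens) < 8:
            continue
        if not (tokens[-1].isdigit() and tokens[-2].isdigit()):
            continue
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),  # the type may span several words
            "encoding": tokens[-6],
            "emb": tokens[-5],
            "sub": tokens[-4],
            "uni": tokens[-3],
        })
    return fonts


def fonts_without_tounicode(fonts: List[Dict[str, str]]) -> List[str]:
    """Return the names of fonts that have no ToUnicode map."""
    return [f["name"] for f in fonts if f["uni"] != "yes"]
```

For the sample rows, every font has a ToUnicode map, so the second helper returns an empty list. That matches the log, where the fonts look fine on paper and the extraction still fails, so this check can rule out one cause but not confirm a fix.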
I used this site…
https://www.zamzar.com/convert/pdf-to-html/
…which worked amazingly well considering the size
of the PDF file - 34.3MB
coothead
toolman
November 5, 2018, 10:50pm
20
Hi,
Thank you, everyone, for your help on this. It has been a real help and is very much appreciated.