  1. #1
    SitePoint Member
    Join Date
    May 2010
    Posts
    1

    Need JavaScript code to identify non-searchable PDF

    Hi,

    Can someone give me JavaScript code or a macro that, when run on a PDF file, can identify whether it is a searchable or non-searchable PDF?

  2. #2
    SitePoint Member KittyZheng
    Join Date
    May 2010
    Posts
    1
    PDF documents do not have an attribute that prevents them from being searched. However, the content is often encoded or encrypted in a way that makes plain-text searches impossible.
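
    To illustrate why a plain-text search can fail, here is a small Node.js sketch. It does not read a real PDF: the content stream is made up for the example, and it assumes the stream is stored with the common FlateDecode (zlib) filter. Other filters, or encryption, would behave differently.

```javascript
const zlib = require("node:zlib");

// A tiny, made-up page content stream that draws two lines of text.
const contentStream =
  "BT /F1 12 Tf 72 720 Td (Hello World) Tj 0 -14 Td (Hello again, World) Tj ET";

// Assumption: the PDF writer stored the stream with FlateDecode (zlib deflate).
const stored = zlib.deflateSync(Buffer.from(contentStream, "latin1"));

// Searching the stored bytes directly does not find the text...
const foundInStored = stored.includes("Hello World");

// ...but it is visible again once the stream has been inflated.
const decoded = zlib.inflateSync(stored).toString("latin1");
const foundInDecoded = decoded.includes("Hello World");

console.log({ foundInStored, foundInDecoded });
```

    This is only the decoding half of the problem. Finding the streams inside a real file, and handling other filters, is what a PDF library does for you.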

    You will find the easiest way to decode a document is to use a third-party PDF component. I use ABCpdf from webSupergoo, which supports C#, VB.NET, VBScript, ASP, and ASP.NET. It also provides a COM interface for interoperation with other languages.

    Given the complexity of PDF files it is very unlikely you will find JavaScript code that'll do this for you.

    The following VBScript example shows how to decompress content streams in a PDF using ABCpdf. Copy the code into a text file and change the file extension from '.txt' to '.vbs'.

    Code:
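    ' The PDF path is the first argument (set when a file is dropped on the script)
    theFile = WScript.Arguments.Item(0)
    
    Set theDoc = CreateObject("ABCpdf7.Doc")
    theDoc.Read theFile
    
    ' Walk every object in the document and decompress it
    theCount = theDoc.GetInfo(0, "Count")
    For i = 1 To theCount
      theDoc.GetInfo i, "Decompress"
    Next
    
    ' Save a decompressed copy next to the original
    theDoc.SaveOptions.Linearize = False
    theDoc.Save theFile & "_dec.pdf"
    
    MsgBox "Done"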
    To decompress a PDF, simply drag and drop it onto the VBScript file. The decompressed copy is saved under the original file name with '_dec.pdf' appended.

    Be aware that text in PDF files is typically broken into short arbitrary fragments and might require additional work to reconstruct. ABCpdf has a GetText function that simplifies this task.
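
    As a rough idea of what that reconstruction involves, here is a minimal JavaScript sketch that joins the string fragments passed to the Tj and TJ text-showing operators in an already-decompressed content stream. The function names extractText and unescapePdfString are made up for this example. It assumes simple literal strings in a Latin encoding, and it ignores hex strings, font encodings and text positioning, so it is not a general text extractor.

```javascript
// Undo the backslash escapes for parentheses and backslashes in a PDF literal string.
function unescapePdfString(s) {
  return s.replace(/\\([()\\])/g, "$1");
}

// Collect the string operands of Tj and TJ operators and join them in order.
function extractText(contentStream) {
  const pieces = [];
  // Matches "(literal) Tj" or "[ ...array... ] TJ".
  const operatorRe = /\(((?:\\.|[^\\()])*)\)\s*Tj|\[((?:\\.|[^\\\]])*)\]\s*TJ/g;
  let m;
  while ((m = operatorRe.exec(contentStream)) !== null) {
    if (m[1] !== undefined) {
      pieces.push(unescapePdfString(m[1]));
    } else {
      // Inside a TJ array, strings alternate with kerning adjustments (numbers).
      const stringRe = /\(((?:\\.|[^\\()])*)\)/g;
      let s;
      while ((s = stringRe.exec(m[2])) !== null) {
        pieces.push(unescapePdfString(s[1]));
      }
    }
  }
  return pieces.join("");
}

// One phrase split across a TJ array and two separate Tj calls.
const sample =
  "BT /F1 12 Tf 72 720 Td [(Sea) -20 (rch) 15 (able)] TJ ( te) Tj (xt) Tj ET";
const text = extractText(sample);
console.log(text);
```

    If a decompressed page yields no fragments at all, the page may be a scanned image with no text layer. That is only a hint, not proof.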

    The contents of a PDF document can also be protected by encryption. ABCpdf supports a number of encryption standards and can decrypt these files for you, if you know the password. See the documentation on the Encryption object for further details.
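
    Before trying to decode anything, it can help to check whether the file is encrypted at all. The sketch below is a crude JavaScript heuristic, not a parser: it only looks for the /Encrypt key, which an encrypted PDF carries in its trailer dictionary. The function name looksEncrypted is made up for this example, and a file that merely contains that byte sequence elsewhere would give a false positive.

```javascript
// Heuristic: an encrypted PDF references its encryption dictionary
// through an /Encrypt entry in the trailer, stored as plain bytes.
function looksEncrypted(pdfBytes) {
  return Buffer.from(pdfBytes).includes("/Encrypt");
}

// Made-up trailer fragments for illustration.
const encryptedTrailer = Buffer.from(
  "trailer\n<< /Size 12 /Root 1 0 R /Encrypt 9 0 R >>\nstartxref\n1234\n%%EOF"
);
const plainTrailer = Buffer.from(
  "trailer\n<< /Size 12 /Root 1 0 R >>\nstartxref\n1234\n%%EOF"
);

console.log(looksEncrypted(encryptedTrailer), looksEncrypted(plainTrailer));
```

    For a real file you would pass in the bytes read from disk, for example with fs.readFileSync.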

