By Imran Latif

Custom PDF Rendering in JavaScript with Mozilla’s PDF.js

This article was peer reviewed by Jani Hartikainen, Florian Rappl, Jezen Thomas and Jeff Smith. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

Almost every modern browser can display PDF documents natively, but that native viewer is outside the developer’s control. Imagine that, because of some business rule in your web app, you want to disable the Print button, or display only a few pages while the rest require a paid membership. You can use the browser’s native PDF rendering through the <embed> tag, but since you don’t have programmatic access to it, you can’t control the rendering phase to suit your needs.

Luckily, there is a tool that gives you this control: PDF.js, created by Mozilla Labs, which renders PDF documents in the browser using JavaScript. Most importantly, you as a developer have full control over how the document’s pages are rendered. Isn’t this cool? Yes, it is!

Let’s see what PDF.js actually is.

What Is PDF.js

PDF.js is a Portable Document Format (PDF) viewer built with HTML5 technologies, which means it can be used in modern browsers without installing any third-party plugins.

PDF.js is already in use in many places, including online file-sharing services like Dropbox, CloudUp, and Jumpshare, to let users view PDF documents online without relying on the browser’s native PDF rendering.

PDF.js is without any doubt an awesome and essential tool to have in your web app, but integrating it isn’t as straightforward as it might seem. There is little to no documentation on how to integrate certain features, such as rendering text layers or annotations (external/internal links), or supporting password-protected files.

In this article, we will be exploring PDF.js, and looking at how we can integrate different features. Some of the topics which we will cover are:

  • Basic Integration
  • Rendering Using SVG
  • Rendering Text-Layers
  • Zooming In/Out

Basic Integration

Downloading the Necessary Files

PDF.js, as its name suggests, is a JavaScript library that can be used in the browser to render PDF documents. The first step is to fetch the JavaScript files PDF.js needs to work properly. These are the two main files:

  • pdf.js
  • pdf.worker.js

To fetch these files, if you are a Node.js user, you can follow the steps described in the GitHub repo. After running the gulp generic command, you will have the necessary files.

If, like me, you don’t feel comfortable with Node.js, there is an easier way. You can use the following URLs to download the necessary files:

These URLs point to Mozilla’s live demo of PDF.js. Downloading the files this way gives you the latest version of the library.

Web Workers and PDF.js

The two files you downloaded contain methods to fetch, parse and render a PDF document. pdf.js is the main library, which essentially has methods to fetch a PDF document from a URL. But parsing and rendering a PDF is not a simple task. Depending on the nature of the PDF, the parsing and rendering phases can take a while, which could block other JavaScript code from running.

HTML5 introduced Web Workers, which run code in a thread separate from the browser’s main JavaScript thread. PDF.js relies heavily on Web Workers to improve performance by moving CPU-heavy work, such as parsing the document and preparing pages for rendering, off the main thread. Running this expensive processing in a Web Worker is the default in PDF.js, but it can be turned off if necessary.
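In the PDFJS-global builds of PDF.js that this article uses, the worker can be switched off with the PDFJS.disableWorker option. Below is a minimal sketch; set the option after pdf.js is loaded and before calling PDFJS.getDocument(). The typeof guard is only there so the snippet doesn’t throw when pdf.js hasn’t been loaded, and newer PDF.js releases configure this differently, so check the documentation for the version you use.

```javascript
// Parse PDFs on the main thread instead of in a Web Worker.
// Must run after pdf.js is loaded and before PDFJS.getDocument() is called.
if (typeof PDFJS !== "undefined") {
  PDFJS.disableWorker = true;
}
```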

Promises in PDF.js

The JavaScript API of PDF.js is quite elegant and easy to use, and it is heavily based on Promises. Its asynchronous methods, such as getDocument(), getPage() and render(), return a Promise (or a promise-like object with a then() method), which lets you handle asynchronous operations cleanly.
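Because these calls are promise-based, errors such as a failed download or an invalid file arrive as rejections. The sketch below is an illustration rather than part of the original code: getPageCount is a hypothetical helper that takes the library object as a parameter (pass PDFJS in the browser). It wraps the result of getDocument() in a native Promise with Promise.resolve(), so that .catch() is available even if your PDF.js version returns a promise-like object, and it resolves with null on failure.

```javascript
// Hypothetical helper: resolve with the document's page count, or null on failure.
// `pdfjsLib` is the PDF.js library object (the PDFJS global in this article).
function getPageCount(pdfjsLib, url) {
  // Promise.resolve() adopts any object with a then() method, so .catch() works
  return Promise.resolve(pdfjsLib.getDocument(url))
    .then(function (pdf) {
      return pdf.numPages;
    })
    .catch(function (error) {
      // e.g. a network failure or an invalid/corrupted PDF
      console.error("Could not load " + url + ": " + error);
      return null;
    });
}
```

In the browser you would call it as getPageCount(PDFJS, url).then(function (count) { /* ... */ });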

Hello World!

Let’s integrate a simple ‘Hello World!’ PDF document. The document which we are using in this example can be found at http://mozilla.github.io/pdf.js/examples/learning/helloworld.pdf.

Create a project on your local web server so that it can be accessed at http://localhost/pdfjs_learning/index.html. PDF.js fetches documents with Ajax requests (in chunks, where possible), and many browsers block such requests, as well as loading the worker script, for pages opened directly from the file system (file:// URLs). That’s why the PDF.js files need to be served from a local web server. After creating the pdfjs_learning folder on your local web server, place the pdf.js and pdf.worker.js files you downloaded above in it. Then place the following code in index.html:

<!DOCTYPE html>
<html>
  <head>
    <title>PDF.js Learning</title>
  </head>
  <body>
    <script type="text/javascript" src="pdf.js"></script>
  </body>
</html>

As you can see, we’ve included a link to the main library file, pdf.js. PDF.js automatically detects whether your browser supports Web Workers and, if it does, attempts to load pdf.worker.js from the same location as pdf.js. If the file is in another location, you can configure it using the PDFJS.workerSrc property right after including the main library:

<script type="text/javascript" src="pdf.js"></script>
<script type="text/javascript">
    PDFJS.workerSrc = "/path/to/pdf.worker.js";
</script>

If your browser doesn’t support Web Workers, there’s no need to worry: pdf.js contains all the code needed to parse and render PDF documents without them, although, depending on your PDF documents, this might block your main JavaScript thread.

Let’s write some code to render the ‘Hello World!’ PDF document. Place the following code in a script tag, below the pdf.js tag.

// URL of PDF document
var url = "http://mozilla.github.io/pdf.js/examples/learning/helloworld.pdf";

// Asynchronously download the PDF
PDFJS.getDocument(url)
  .then(function(pdf) {
    return pdf.getPage(1);
  })
  .then(function(page) {
    // Set scale (zoom) level
    var scale = 1.5;

    // Get viewport (dimensions)
    var viewport = page.getViewport(scale);

    // Get canvas#the-canvas
    var canvas = document.getElementById('the-canvas');

    // Fetch canvas' 2d context
    var context = canvas.getContext('2d');

    // Set dimensions to Canvas
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    // Prepare object needed by render method
    var renderContext = {
      canvasContext: context,
      viewport: viewport
    };

    // Render PDF page
    page.render(renderContext);
  });

Now create a <canvas> element with the id the-canvas inside the body tag.

<canvas id="the-canvas"></canvas>

After creating the <canvas> element, refresh your browser and, if you have placed everything in its proper place, you should see Hello, world! printed in your browser. But that’s not an ordinary Hello, world!: what you are seeing is an entire PDF document being rendered in your browser by JavaScript code. Embrace the awesomeness!

Let’s discuss the different parts of the code above that make PDF rendering possible.

PDFJS is a global object that becomes available when you include the pdf.js file in the browser. This base object contains the library’s various methods.

PDFJS.getDocument() is the main entry point: every other operation starts from the document it loads. It fetches the PDF document asynchronously and, when the server supports HTTP range requests, downloads it in chunks using multiple Ajax requests, which is both fast and efficient. This method accepts various parameters, but the most important one is the URL pointing to a PDF document.
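Instead of a plain URL string, getDocument() also accepts an object of parameters. For example, if your PDFs sit behind an authenticated endpoint (as they might in the paid-membership scenario from the introduction), the httpHeaders option lets you send extra request headers. The helper name and the bearer-token scheme below are hypothetical; check the parameter list for your PDF.js version.

```javascript
// Hypothetical helper: build the parameter object for PDFJS.getDocument().
// `url` and `httpHeaders` are getDocument() options; the token format is an assumption.
function buildDocumentSource(url, token) {
  return {
    url: url,
    httpHeaders: { Authorization: "Bearer " + token }
  };
}
```

Usage: PDFJS.getDocument(buildDocumentSource(url, token)).then(function (pdf) { /* ... */ }); Keep in mind that custom headers on cross-origin requests only work if the server allows them via CORS.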

PDFJS.getDocument() returns a Promise whose success callback runs once PDF.js has loaded the document. The callback receives an object containing information about the fetched PDF document; in our example, this argument is named pdf.

Since the PDF document is fetched in chunks, you might wonder whether, for very large documents, the success callback is only called after a delay of several seconds (or even minutes). In fact, PDF.js doesn’t wait for the whole file: the callback fires as soon as the data needed to start working with the document, such as that for the first page, has been fetched.

pdf.getPage() is used to get individual pages of a PDF document. When you pass a valid page number (page numbers start at 1), getPage() returns a promise that resolves with a page object representing the requested page. The pdf object also has a numPages property, which holds the total number of pages in the PDF document.
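numPages and getPage() are all you need to implement the business rule from the introduction of showing only a few pages to non-members. The sketch below is hypothetical: FREE_PAGE_LIMIT, isMember and pagesToRender are illustrative names, not part of PDF.js. Note that this only restricts what is rendered; the PDF itself is still served to the browser, so content that must stay private should not be sent to non-members at all.

```javascript
// Hypothetical rule: non-members may only see the first FREE_PAGE_LIMIT pages.
var FREE_PAGE_LIMIT = 3;

// Return the 1-based page numbers to render for a document with numPages pages.
function pagesToRender(numPages, isMember) {
  var last = isMember ? numPages : Math.min(numPages, FREE_PAGE_LIMIT);
  var pages = [];
  for (var i = 1; i <= last; i++) {
    pages.push(i);
  }
  return pages;
}
```

In the browser, you would loop over pagesToRender(pdf.numPages, isMember) and call pdf.getPage() for each number.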

scale is the zoom level at which we want the PDF document’s pages to be rendered.

page.getViewport() returns the page’s dimensions at the provided zoom level.
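As a rough worked example, a viewport’s size is the page size in PDF units (points, 1/72 inch) multiplied by scale, with width and height swapped for pages rotated by 90 or 270 degrees. The helper below is a simplified model for illustration, not PDF.js’s own implementation. For a US Letter page (612 × 792 points) at a scale of 1.5, it gives 918 × 1188, which is the size the canvas would be given for such a page.

```javascript
// Simplified model of the dimensions page.getViewport(scale) reports.
// pageWidth/pageHeight are in PDF points; rotation is in degrees (0, 90, 180, 270).
function viewportSize(pageWidth, pageHeight, scale, rotation) {
  var sideways = (rotation || 0) % 180 !== 0;
  return {
    width: (sideways ? pageHeight : pageWidth) * scale,
    height: (sideways ? pageWidth : pageHeight) * scale
  };
}
```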

page.render() takes an object with several properties and renders the PDF page onto the Canvas. In our example, we pass the Canvas element’s 2d context and the viewport object we got from page.getViewport().

Rendering Using SVG

PDF.js supports two rendering modes. Its default, and most popular, mode is Canvas-based, but it also allows you to render PDF documents using SVG. Let’s render the Hello World! PDF document from the previous example in SVG.

Replace the success callback of pdf.getPage() (the function that receives page) with the following code to see PDF.js’ SVG rendering in action.

.then(function(page) {

  // Set scale (zoom) level
  var scale = 1.5;

  // Get viewport (dimensions)
  var viewport = page.getViewport(scale);

  // Get div#the-svg
  var container = document.getElementById('the-svg');

  // Set dimensions
  container.style.width = viewport.width + 'px';
  container.style.height = viewport.height + 'px';

  // SVG rendering by PDF.js
  page.getOperatorList()
    .then(function (opList) {
      var svgGfx = new PDFJS.SVGGraphics(page.commonObjs, page.objs);
      return svgGfx.getSVG(opList, viewport);
    })
    .then(function (svg) {
      container.appendChild(svg);
    });

});

Replace the <canvas> element in your body tag with <div id="the-svg"></div> and refresh your browser.

If you have placed the code correctly, you will see Hello, world! rendered again, but this time using SVG instead of Canvas. Inspect the page’s HTML and you will see that the entire rendering has been done using standard SVG elements.

As you can see, PDF.js doesn’t restrict you to a single rendering mechanism. You can either use Canvas or SVG rendering depending upon your requirements. For the rest of the article, we will be using Canvas-based rendering.

Rendering Text-Layers

PDF.js gives you the ability to render text layers atop PDF pages rendered with Canvas. To do this, we need an additional JavaScript file from the PDF.js GitHub repo: the text_layer_builder.js plugin. We also need its corresponding CSS file, text_layer_builder.css. Download both files and place them in the pdfjs_learning folder on your local server.

Before we get into actual text-layer rendering, let’s get a PDF document with some more content than the ‘Hello World!’ example. The document which we are going to render is again taken from Mozilla’s live demo, here.

Since this document contains multiple pages, we need to adjust our code a bit. First, remove the <div> tag we created in the last example, and replace it with this:

<div id="container"></div>

This container will hold the PDF document’s pages. The structure is quite simple: within div#container, each page of the PDF gets its own <div>, whose id attribute has the format page-#{pdf_page_number}. For example, the first page of a PDF document would have a <div> with the id page-1, and the 12th page would have page-12. Inside each of these page-#{pdf_page_number} divs there will be a Canvas element.
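Because each page’s <div> follows a predictable naming scheme, it’s easy to go back and forth between page numbers and ids, for example to scroll to a page or to find out which page a click landed on. These helpers are hypothetical additions, not part of PDF.js. Page numbers here are 1-based, while page.pageIndex in PDF.js is 0-based.

```javascript
// Hypothetical helpers for the page-#{pdf_page_number} id scheme.
// pageNumber is 1-based (PDF.js's page.pageIndex is 0-based).
function pageDivId(pageNumber) {
  return "page-" + pageNumber;
}

// "page-12" -> 12; ids that don't follow the scheme -> null.
function pageNumberFromDivId(id) {
  var match = /^page-(\d+)$/.exec(id);
  return match ? parseInt(match[1], 10) : null;
}
```

For example, document.getElementById(pageDivId(5)).scrollIntoView() would scroll to the fifth page once its <div> has been added.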

Let’s replace the getDocument() code with the following. Don’t forget to update the url variable to http://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf (or some other online PDF document of your choice).

PDFJS.getDocument(url)
  .then(function(pdf) {

    // Get div#container and cache it for later use
    var container = document.getElementById("container");

    // Loop from 1 to total_number_of_pages in PDF document
    for (var i = 1; i <= pdf.numPages; i++) {
      renderPage(pdf, i, container);
    }
  });

// Render one page into its own div#page-#{pdf_page_number}
function renderPage(pdf, pageNumber, container) {
  // Create and append the page's div right away, so pages stay in document
  // order even though the getPage() promises may resolve in any order
  var div = document.createElement("div");

  // Set id attribute with page-#{pdf_page_number} format
  div.setAttribute("id", "page-" + pageNumber);

  // This will keep positions of child elements as per our needs
  div.setAttribute("style", "position: relative");

  // Append div within div#container
  container.appendChild(div);

  // Get desired page
  pdf.getPage(pageNumber).then(function(page) {
    var scale = 1.5;
    var viewport = page.getViewport(scale);

    // Create a new Canvas element
    var canvas = document.createElement("canvas");

    // Append Canvas within div#page-#{pdf_page_number}
    div.appendChild(canvas);

    var context = canvas.getContext('2d');
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    var renderContext = {
      canvasContext: context,
      viewport: viewport
    };

    // Render PDF page
    page.render(renderContext);
  });
}

Refresh your browser and wait a few seconds while the new PDF document is fetched in the background. As soon as the document has finished loading, you should see beautifully rendered PDF pages in your browser. Now that we’ve seen how to render multiple pages, let’s discuss how to render the text layers.
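The loop above requests every page at once. If you would rather render pages strictly one after another, you can chain the promises instead. This is a sketch, not code from the original example: renderSequentially is a hypothetical helper, and renderOne stands for whatever per-page rendering function you use. If renderOne returns a promise, such as the one from page.render(), the next page waits for it.

```javascript
// Hypothetical helper: fetch and render pages 1..numPages one at a time, in order.
// renderOne(page) may return a promise; the next page starts after it resolves.
function renderSequentially(pdf, renderOne) {
  var chain = Promise.resolve();
  for (var i = 1; i <= pdf.numPages; i++) {
    chain = chain.then(renderStep(pdf, i, renderOne));
  }
  return chain;
}

// A separate function gives each step its own pageNumber; with `var`, a closure
// created directly in the loop would see the final value of i.
function renderStep(pdf, pageNumber, renderOne) {
  return function () {
    return Promise.resolve(pdf.getPage(pageNumber)).then(renderOne);
  };
}
```

The trade-off is that later pages appear later than they would with parallel requests, in exchange for a guaranteed order.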

Add the following two lines to index.html to include the necessary files required for text-layer rendering:

<link type="text/css" href="text_layer_builder.css" rel="stylesheet">
<script type="text/javascript" src="text_layer_builder.js"></script>

PDF.js renders the text layer above the canvases using multiple <div> elements, so it’s best to wrap all those <div> elements in a container element. Replace the page.render(renderContext) line with the following code to see text layers in action:

page.render(renderContext)
  .then(function() {
    // Get text-fragments
    return page.getTextContent();
  })
  .then(function(textContent) {
    // Create div which will hold text-fragments
    var textLayerDiv = document.createElement("div");

    // Set its class to textLayer, which has the required CSS styles
    textLayerDiv.setAttribute("class", "textLayer");

    // Append newly created div in `div#page-#{pdf_page_number}`
    div.appendChild(textLayerDiv);

    // Create new instance of TextLayerBuilder class
    var textLayer = new TextLayerBuilder({
      textLayerDiv: textLayerDiv, 
      pageIndex: page.pageIndex,
      viewport: viewport
    });

    // Set text-fragments
    textLayer.setTextContent(textContent);

    // Render text-fragments
    textLayer.render();
  });

Refresh your browser and this time you will not only see PDF pages being rendered but you can also select and copy text from them. PDF.js is so cool!

Let’s discuss some important parts of the code snippet above.

page.render(), like the other asynchronous methods in PDF.js, returns a promise, which is resolved once the PDF page has been rendered onto the canvas. We can use its success callback to render the text layer.

page.getTextContent() returns the text fragments for that particular page. It also returns a promise, and the success callback of that promise receives an object representing those text fragments.

TextLayerBuilder is a class whose required parameters we already have for each page from pdf.getPage(). The textLayerDiv parameter is the <div> that serves as a container for multiple child <div>s, each representing a particular text fragment.

The newly created instance of TextLayerBuilder has two important methods: setTextContent(), which sets the text fragments returned by page.getTextContent(), and render(), which renders the text layer.

As you can see, we are assigning the CSS class textLayer to textLayerDiv. This class has styles that make the text fragments fit nicely atop the Canvas elements, so that users can select and copy text in a natural way.
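The object that page.getTextContent() passes to its success callback has an items array, and each item’s str property holds the text of one fragment. As an illustration, the hypothetical helper below joins those fragments into a single string, which can be handy for indexing or searching a page’s text. Joining with a space is a simplification: fragments don’t necessarily line up with word boundaries, so the result may contain extra or missing spaces.

```javascript
// Hypothetical helper: flatten the result of page.getTextContent() into a string.
function textContentToString(textContent, separator) {
  return textContent.items
    .map(function (item) {
      return item.str;
    })
    .join(separator === undefined ? " " : separator);
}
```

Usage: page.getTextContent().then(function (textContent) { console.log(textContentToString(textContent)); });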

Zooming In/Out

With PDF.js you can also control the zoom level of a PDF document. In fact, zooming is quite straightforward: we just need to re-render the pages with an updated scale value. Increase or decrease scale by your desired factor to change the zoom level. This is left as an exercise for the reader, but do try it out and let us know how you get on in the comments.
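If you want a starting point, here is one possible sketch. nextScale, renderAtScale and the MIN_SCALE/MAX_SCALE/ZOOM_STEP values are hypothetical choices; renderAtScale uses the same page.getViewport(scale) and page.render() calls as the earlier examples. Resizing a canvas clears it, and any text layer rendered for the old scale would need to be rebuilt with the new viewport. In a real viewer you would also want to wait for an in-progress render to finish before starting a new one on the same canvas.

```javascript
// Hypothetical zoom limits and step; tune them for your UI.
var MIN_SCALE = 0.25;
var MAX_SCALE = 4;
var ZOOM_STEP = 1.25;

// Multiply (zoom in) or divide (zoom out) the current scale, clamped to the limits.
function nextScale(currentScale, zoomIn) {
  var scale = zoomIn ? currentScale * ZOOM_STEP : currentScale / ZOOM_STEP;
  return Math.min(MAX_SCALE, Math.max(MIN_SCALE, scale));
}

// Browser-only: re-render a page (from pdf.getPage()) onto its canvas at a new scale.
// Uses the same getViewport(scale)/render() calls as the examples above.
function renderAtScale(page, canvas, scale) {
  var viewport = page.getViewport(scale);

  // Resizing the canvas clears its previous contents
  canvas.height = viewport.height;
  canvas.width = viewport.width;

  return page.render({
    canvasContext: canvas.getContext('2d'),
    viewport: viewport
  });
}
```

For example, a zoom-in button handler could compute scale = nextScale(scale, true) and then call renderAtScale(page, canvas, scale) for each rendered page.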

Conclusion

PDF.js is an awesome tool that gives us a flexible, JavaScript-based alternative to the browsers’ native PDF components. The API is simple, precise and elegant, and can be used as you see fit. Let me know in the comments how you intend to use PDF.js in your next project!

