Trouble fetching checkbox and radio fields with PyPDF2

prahladyeri · August 23, 2022, 6:31am

My project involves reading text from a batch of PDF form files, for which I’m using the open-source PyPDF2 library. Extracting the text itself works fine:

from PyPDF2 import PdfReader

reader = PdfReader("data/test.pdf")
cnt = len(reader.pages)
print("reading pdf (%d pages)" % cnt)
page = reader.pages[cnt-1]  # last page of the document
lines = page.extract_text().splitlines()
print("%d lines extracted..." % len(lines))

However, this text doesn’t include the checked state of the radio buttons and checkboxes. I just get the plain label text (for example, “Yes No”) instead of the selected values.

I also tried the reader.get_fields() and reader.get_form_text_fields() methods described in the PyPDF2 documentation, but both return empty results. I also tried reading the annotations, but there is no "/Annots" entry on the page. When I open the PDF in Notepad++ to look at its metadata, this is what I get:

%PDF-1.4
%²³´µ
%Generated by ExpertPdf v9.2.2
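
For reference, the form-field and annotation checks mentioned above can be sketched roughly as follows. This is only an illustration: summarize_form_data is a hypothetical helper name, and it assumes it is handed a PyPDF2 PdfReader (or any object with the same get_fields() method and pages sequence).

```python
def summarize_form_data(reader):
    """Report which form-related structures a PdfReader-like object exposes.

    Hypothetical helper: `reader` is assumed to provide get_fields() and
    .pages, as PyPDF2's PdfReader does.
    """
    # get_fields() may return None when the document has no form fields,
    # so fall back to an empty dict.
    fields = reader.get_fields() or {}

    # Each page behaves like a dictionary; "/Annots" is the key under which
    # widget annotations (checkboxes, radio buttons) would normally be listed.
    pages_with_annots = [
        index for index, page in enumerate(reader.pages) if "/Annots" in page
    ]

    return {
        "field_names": sorted(fields),
        "pages_with_annots": pages_with_annots,
    }
```

With PyPDF2 installed, calling summarize_form_data(PdfReader("data/test.pdf")) on the file in question would be expected to give empty lists for both entries, which matches the behaviour described above.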

It appears to me that these checkboxes aren’t the usual PDF form fields but something closer to HTML elements. Is there any way to extract these fields using Python?
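
One way to test that suspicion without relying on PyPDF2 is to scan the raw file bytes for the names that real PDF form fields normally use. The sketch below uses only the standard library; count_form_markers, count_form_markers_in_file and the MARKERS table are hypothetical names chosen for illustration, and the scan is a rough heuristic (it only sees uncompressed object data), not a PDF parser.

```python
import re

# Byte patterns that usually accompany interactive form fields.
# This list is an assumption for illustration, not an exhaustive check.
MARKERS = {
    "AcroForm": rb"/AcroForm\b",
    "Annots": rb"/Annots\b",
    "Widget": rb"/Subtype\s*/Widget\b",
    "Button field": rb"/FT\s*/Btn\b",
}


def count_form_markers(data: bytes) -> dict:
    """Count how often each form-related marker occurs in raw PDF bytes."""
    return {
        name: len(re.findall(pattern, data))
        for name, pattern in MARKERS.items()
    }


def count_form_markers_in_file(path: str) -> dict:
    """Read a PDF from disk in binary mode and count the markers in it."""
    with open(path, "rb") as handle:
        return count_form_markers(handle.read())
```

If count_form_markers_in_file("data/test.pdf") reports zero for every marker, that would suggest the checkboxes are drawn as ordinary page content (glyphs, images or vector shapes) rather than stored as form fields, in which case form-oriented methods such as get_fields() have nothing to return.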

system · November 22, 2022, 1:31pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
