I'm trying to extract the text from this PDF file using Python.
I'm using the PyPDF2 package (version 1.27.2), and have the following script:
    import PyPDF2

    with open("sample.pdf", "rb") as pdf_file:
        read_pdf = PyPDF2.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        page = read_pdf.pages[0]
        page_content = page.extractText()
    print(page_content)
When I run the code, I get the following output, which differs from the text in the PDF document:
! " # $ % # $ % &% $ &' ( ) * % + , - % . / 0 1 ' * 2 3% 45' % 1 $ # 2 6 % 3/ % 7 / ) ) / 8 % &) / 2 6 % 8 # 3" % 3" * % 31 3/ 9 # &)%
How can I extract the text exactly as it appears in the PDF document?