Read PDF with Python? | SoloLearn

+4

Read PDF with Python?

Is it possible to read and write PDF files with Python?

7/31/2020 7:04:09 PM

Fu Foy

11 Answers

+10

Oh... find pip in your Python installation. That directory should be added to the PATH environment variable (on Windows). Then run this in the console:

    pip install <your module>

It is essential for Pythonistas to know how.
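If pip itself is not on the PATH, running it through the interpreter as "python -m pip" usually still works. Here is a minimal sketch of how you could check from Python whether a module is already installed and build that command. The helper names is_installed and pip_command are made up for illustration; importlib.util.find_spec and sys.executable are standard library.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


def pip_command(package_name):
    """Build the pip command for the interpreter that runs this script."""
    # "-m pip" runs pip as a module, so pip does not need to be on PATH.
    return [sys.executable, "-m", "pip", "install", package_name]


if __name__ == "__main__":
    for name in ("json", "PyPDF2"):
        print(name, "installed:", is_installed(name))
    print("To install, run in a console:", " ".join(pip_command("PyPDF2")))
```

The printed command is meant to be typed into the Windows console (or any shell), not at the Python prompt.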

+8

Fu Foy I didn't want to serve the complete answer and hoped you would be curious. Worked as designed 🏆 Have you already used pip?

+7

More or less. PDF is hardcore. Search PyPI for a PDF library.

+7

Fu Foy You can alternatively use PyPDF2, as described in https://automatetheboringstuff.com/2e/chapter15/ . But you also have to install PyPDF2 via pip, as Oma Falk explained.

+6

https://pypi.org/search/?q=Pdf Thank you. I started to learn Python a few days ago. I didn't even know that PyPI exists. :)

+4

Thanks to both of you. Your advice is very helpful to me. Best community ever 👍🏼

+4

I found out with your help that pip and PyPDF2 exist, what they are, and how to install them. Now I can handle .pdf files! Thanks again! For the other beginners: type "pip install ..." in the Windows console. (I had entered it at the Python prompt, along with "python --version", and was wondering: Python doesn't know python??)
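For other beginners who hit the same confusion: "python --version" is a console command, so the Python prompt rejects it as invalid syntax. A small sketch of how to get the same information from inside Python, using only the standard library (python_version_text is a made-up helper name):

```python
import platform
import sys


def python_version_text():
    """Return the same kind of text that `python --version` prints."""
    return "Python " + platform.python_version()


print(python_version_text())
# Shows which python executable is actually running this code.
print("Interpreter location:", sys.executable)
```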

+2

Hello, yes, you can work with PDF documents in Python. Python can read PDF files and print out the content after extracting the text. For that, we first have to install the required module, which is PyPDF2. Below is the command to install it. You should have pip already installed in your Python environment.

    pip install pypdf2

After the module is installed successfully, we can read PDF files using the methods available in it.

Reading Single Page

    import PyPDF2

    # Raw string, so the backslash in the Windows path is not read as an escape.
    pdfName = r'path\xyz.pdf'
    # These method names are the ones from PyPDF2 releases before 3.0.
    read_pdf = PyPDF2.PdfFileReader(pdfName)
    page = read_pdf.getPage(0)           # first page (index 0)
    page_content = page.extractText()
    print(page_content)

Running the above program prints the text extracted from the first page.

Reading Multiple Pages

To read a PDF with multiple pages and print each page with its page number, we use a loop with the getPageNumber() function. In the example below we use a PDF file which has two pages. The contents are printed under two separate page headings.

    import PyPDF2

    pdfName = r'Path\xyz2.pdf'
    read_pdf = PyPDF2.PdfFileReader(pdfName)
    for i in range(read_pdf.getNumPages()):
        page = read_pdf.getPage(i)
        # getPageNumber() is zero-based, so add 1 for the heading.
        print('Page No - ' + str(1 + read_pdf.getPageNumber(page)))
        page_content = page.extractText()
        print(page_content)

For more information, follow the link below: https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
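One note for anyone reading this later: PdfFileReader, getPage() and extractText() are the older PyPDF2 names. Newer releases (PyPDF2 3.x and its successor, pypdf) use PdfReader, reader.pages and extract_text() instead. Below is a rough sketch of the same page-by-page loop with those names, assuming pypdf is installed ("pip install pypdf"). The function names page_texts and read_pdf_pages and the file name in the usage comment are made up for illustration.

```python
def page_texts(reader):
    """Yield (page_number, text) pairs, numbering pages from 1.

    `reader` is any object with a `.pages` sequence whose items have an
    `extract_text()` method, which is the shape of pypdf's PdfReader.
    """
    for number, page in enumerate(reader.pages, start=1):
        # The extracted text may be empty, e.g. for image-only pages.
        yield number, page.extract_text() or ""


def read_pdf_pages(path):
    """Open a PDF with pypdf (assumed installed) and return its page texts."""
    # Imported here so page_texts() stays usable without pypdf.
    from pypdf import PdfReader
    return list(page_texts(PdfReader(path)))


# Usage ("example.pdf" is a placeholder path):
# for number, text in read_pdf_pages("example.pdf"):
#     print("Page No - " + str(number))
#     print(text)
```

Keeping the loop separate from the file opening makes it easy to try the logic out without a real PDF at hand.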

+1

No. I'm just about to find out how to install/add that :)

+1

Yes, you definitely can. You can use PyPDF2 to read files, for example:

    import PyPDF2

    pdfFileObj = open('example.pdf', 'rb')        # open in binary mode
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    print(pdfReader.numPages)                     # number of pages
    pageObj = pdfReader.getPage(0)                # first page
    print(pageObj.extractText())
    pdfFileObj.close()

(code source: geeksforgeeks)

As for writing PDFs, you can use fpdf, for example:

    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=15)
    pdf.cell(200, 10, txt="GeeksforGeeks", ln=1, align='C')
    # add another cell
    pdf.cell(200, 10, txt="A Computer Science portal for geeks.", ln=2, align='C')
    # save the document; the file name here is just an example
    pdf.output("example_output.pdf")

(code source: again geeksforgeeks)

Happy coding!

+1

There is a module which you can learn about here: https://medium.com/@umerfarooq_26378/python-for-pdf-ef0fac2808b0