Python Guide to Extract Text from PDFs
Learn to extract text from PDFs with Python in this tutorial: PyPDF2 for natively digital files and OCR with pytesseract for scanned ones.

Adrian Dolinay
3.2K views • Apr 17, 2023

About this video
Tutorial on how to extract text from PDF files. Learn the difference between natively digital and scanned PDFs, extract text from a digital PDF using PyPDF2 and extract text from a scanned PDF using optical character recognition with pytesseract.
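A minimal sketch of the natively digital case, assuming PyPDF2 3.x, which exposes `PdfReader`. The function names `extract_digital_text` and `looks_scanned` and the `min_chars` threshold are illustrative and not taken from the video's notebook. The idea is that a natively digital PDF carries a text layer PyPDF2 can read directly, while a scanned PDF is mostly page images and usually yields little or no text.

```python
def extract_digital_text(pdf_path):
    """Return one string per page of a natively digital PDF."""
    # Third-party dependency, imported here so the helper below
    # stays usable without it: pip install PyPDF2
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Guard against pages that yield no text at all.
    return [page.extract_text() or "" for page in reader.pages]


def looks_scanned(page_texts, min_chars=20):
    """Rough heuristic (an assumption, not a PyPDF2 feature):
    almost no extractable text suggests the PDF is a scan that needs OCR."""
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars
```

Calling `extract_digital_text` on a PDF path of your own and passing the result to `looks_scanned` gives a quick hint about whether to fall back to OCR. The threshold is arbitrary and may need tuning for short documents.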
Tesseract executable download for Windows: https://github.com/UB-Mannheim/tesseract/wiki
Tesseract Installation for Linux: https://linuxhint.com/install-tesseract-ocr-linux/
Tesseract Installation for Mac: https://www.oreilly.com/library/view/building-computer-vision/9781838644673/95de5b35-436b-4668-8ca2-44970a6e2924.xhtml
The notebook can be found in the "Data Science with Python" folder of the repo below. GitHub Repo: https://github.com/ad17171717/YouTube-Tutorials/tree/main/Python/Extract%20Text%20from%20PDF
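For the scanned case, a hedged sketch of OCR with pytesseract, assuming Tesseract is already installed from one of the links above and that the scanned page is available as an image file. How the page image is obtained from the PDF is left out here because the description does not say which approach the notebook uses. The names `ocr_image` and `clean_ocr_text` are illustrative. On Windows, pytesseract may need the path to the Tesseract executable, which can be set through `pytesseract.pytesseract.tesseract_cmd`.

```python
def clean_ocr_text(raw):
    """Drop form-feed characters and blank lines from raw OCR output."""
    lines = (line.strip() for line in raw.replace("\x0c", "").splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(image_path, tesseract_cmd=None):
    """Run Tesseract OCR on one page image and return the cleaned text."""
    # Third-party dependencies: pip install pytesseract pillow
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Only needed when the tesseract binary is not on PATH,
        # for example after a default Windows install.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    with Image.open(image_path) as img:
        return clean_ocr_text(pytesseract.image_to_string(img))
```

OCR quality depends heavily on scan resolution and contrast, so treat the returned text as approximate and spot-check it against the source page.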
CONNECT:
LinkedIn: https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/
GitHub: https://github.com/ad17171717
Twitter: https://twitter.com/DolinayG
Odysee: https://odysee.com/@adriandolinay:0
Medium: https://medium.com/@adriandolinay
|-Video Chapters-|
0:00 - Intro
0:10 - Installing packages
1:41 - Text extraction definition
2:21 - Extracting text from a natively digital PDF
4:44 - Extracting text from a scanned PDF using OCR
8:35 - References and additional learning
Video Information
Views: 3.2K
Likes: 33
Duration: 9:10
Published: Apr 17, 2023
User Reviews: 4.3 (3)