Python Guide to Extract Text from PDFs 📄

Learn to extract text from both natively digital and scanned PDFs in Python, using PyPDF2 for digital files and OCR with pytesseract for scanned ones.

Adrian Dolinay
3.2K views • Apr 17, 2023

About this video

Tutorial on how to extract text from PDF files. Learn the difference between natively digital and scanned PDFs, extract text from a digital PDF using PyPDF2, and extract text from a scanned PDF using optical character recognition (OCR) with pytesseract.
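
As a rough illustration of the digital-PDF step, here is a minimal sketch that assumes PyPDF2 3.x and its `PdfReader` class. The helper names, the file name `example.pdf`, and the `looks_scanned` heuristic with its `min_chars` threshold are hypothetical and are not taken from the video's notebook.

```python
def extract_pages(pdf_path):
    """Return the extracted text of each page of a natively digital PDF."""
    # Imported lazily so looks_scanned() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 3.0

    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for image-only (scanned) pages.
    return [page.extract_text() or "" for page in reader.pages]


def looks_scanned(page_texts, min_chars=20):
    """Hypothetical heuristic: treat the PDF as scanned when no page
    yields at least min_chars non-whitespace characters."""
    return all(len("".join(text.split())) < min_chars for text in page_texts)


# Example usage (file name is a placeholder):
# pages = extract_pages("example.pdf")
# if looks_scanned(pages):
#     print("Little or no selectable text found; OCR is probably needed.")
# else:
#     print("\n".join(pages))
```

If almost no text comes back, the file is likely a scan, and OCR is the more promising route.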

Tesseract executable download for Windows: https://github.com/UB-Mannheim/tesseract/wiki
Tesseract Installation for Linux: https://linuxhint.com/install-tesseract-ocr-linux/
Tesseract Installation for Mac: https://www.oreilly.com/library/view/building-computer-vision/9781838644673/95de5b35-436b-4668-8ca2-44970a6e2924.xhtml
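
Once Tesseract is installed, pytesseract needs to be able to find the executable. The sketch below assumes the scanned page is already available as an image file and that Pillow is installed alongside pytesseract. The function names, the image file name, and the Windows path (the usual default of the UB-Mannheim installer) are assumptions, not details confirmed by the video.

```python
def ocr_image(image_path, lang="eng", tesseract_cmd=None):
    """Run Tesseract OCR on one image file and return the raw text."""
    # Imported lazily so tidy_ocr_text() works without these packages.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Only needed when the tesseract executable is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def tidy_ocr_text(raw):
    """Collapse runs of whitespace and drop the blank lines OCR often emits."""
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


# Example usage (paths are placeholders):
# raw = ocr_image(
#     "scanned_page.png",
#     tesseract_cmd=r"C:\Program Files\Tesseract-OCR\tesseract.exe",
# )
# print(tidy_ocr_text(raw))
```

On Linux and Mac the executable is normally on PATH after installation, so `tesseract_cmd` can usually be left as `None`.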

The notebook can be found in the "Data Science with Python" folder of the repo linked below.
GitHub Repo: https://github.com/ad17171717/YouTube-Tutorials/tree/main/Python/Extract%20Text%20from%20PDF

CONNECT:
LinkedIn: https://www.linkedin.com/in/adrian-dolinay-frm-96a289106/
GitHub: https://github.com/ad17171717
Twitter: https://twitter.com/DolinayG
Odysee: https://odysee.com/@adriandolinay:0
Medium: https://medium.com/@adriandolinay

|-Video Chapters-|
0:00 - Intro
0:10 - Installing packages
1:41 - Text extraction definition
2:21 - Extracting text from a natively digital PDF
4:44 - Extracting text from a scanned PDF using OCR
8:35 - References and additional learning

Video Information

Views: 3.2K
Likes: 33
Duration: 9:10
Published: Apr 17, 2023

User Reviews: 4.3 (3)