Python

# extract images from pdf file

import fitz  # PyMuPDF

doc = fitz.open("file.pdf")
for i in range(len(doc)):
    # get_page_images replaces the deprecated getPageImageList
    for img in doc.get_page_images(i):
        xref = img[0]  # cross-reference number of the image object
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha > 3:  # CMYK: convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save("p%s-%s.png" % (i, xref))  # save replaces the deprecated writePNG
        pix = None  # release the pixmap

extract images from pdf

# STEP 1
# import libraries
import fitz
import io
from PIL import Image
  
# STEP 2
# file path you want to extract images from
file = "/content/pdf_file.pdf"
  
# open the file
pdf_file = fitz.open(file)
  
# STEP 3
# iterate over PDF pages
for page_index in range(len(pdf_file)):
    
    # get the page itself
    page = pdf_file[page_index]
    image_list = page.get_images()  # replaces the deprecated getImageList
      
    # printing number of images found in this page
    if image_list:
        print(f"[+] Found a total of {len(image_list)} images in page {page_index}")
    else:
        print("[!] No images found on page", page_index)
    for image_index, img in enumerate(image_list, start=1):
        
        # get the XREF of the image
        xref = img[0]
          
        # extract the image bytes
        base_image = pdf_file.extract_image(xref)  # replaces the deprecated extractImage
        image_bytes = base_image["image"]
          
        # get the image extension
        image_ext = base_image["ext"]
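
The snippet above stops once the image bytes and extension have been read, so nothing is written to disk yet. Below is a minimal sketch of one way the last step could look, assuming the bytes are saved unchanged in their original format. The helper name `save_image_bytes`, its parameters, and the file name pattern are hypothetical and not part of PyMuPDF.

```python
from pathlib import Path


def save_image_bytes(image_bytes, image_ext, page_index, image_index, out_dir="."):
    """Write raw image bytes to <out_dir>/image<page>_<index>.<ext> and return the path.

    Hypothetical helper for illustration; the naming scheme is an assumption.
    """
    # page_index is zero-based in the loop above, so add 1 for a readable name
    out_path = Path(out_dir) / f"image{page_index + 1}_{image_index}.{image_ext}"
    # create the output directory if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # the extracted bytes are already a complete image file, so write them as-is
    out_path.write_bytes(image_bytes)
    return out_path
```

Inside the inner loop this could be called as `save_image_bytes(image_bytes, image_ext, page_index, image_index)`. Decoding the bytes with `Image.open(io.BytesIO(image_bytes))` from Pillow is an alternative when the image needs to be converted or resized before saving.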
