How to Extract Images from PDF - A Complete Tool-by-Tool Guide

2026-03-31 · About 9 min read

Understanding Image Structure in PDFs

To correctly extract images from PDFs, you first need to understand how images are stored within PDF files. A PDF isn't simply a collection of images - it's a composite document format integrating text, vector graphics, raster images, and fonts.

Image storage methods in PDF:

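Embedded raster images: Stored as separate image objects, compressed with filters such as DCT (JPEG), JPX (JPEG 2000), JBIG2, CCITT (fax), or Flate (lossless deflate, the same compression PNG uses). The original image data exists directly within the PDF
Inline images: Small images embedded directly in a page's content stream rather than stored as separate objects. They have no object number, so tools that enumerate image objects (such as PyMuPDF's page.get_images()) do not list them
Mask images: Images with transparency information. The visible image and its soft mask (SMask) are stored as separate objects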
Form XObjects: Containers for images and graphics reused across multiple pages

Extraction considerations:

Display size and actual resolution may differ (high-res images displayed small)
A single visible image may comprise multiple objects (body + mask + color space definition)
Scanned PDFs typically store each page as a single full-page image, sometimes with an invisible OCR text layer on top
PDF security settings (password protection, copy restriction) may prevent extraction

Method selection depends on whether you want to "extract original image data as-is" or "convert page appearance to images." The former is embedded image extraction; the latter is page rendering (rasterization).
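When a document mixes scanned and born-digital pages, you can make this choice page by page. A rough heuristic: if one image covers nearly the whole page, treat the page as a scan and render it; otherwise extract the embedded images. The sketch below assumes PyMuPDF for reading image placements. The helper names and the 0.9 threshold are illustrative assumptions, not part of any tool.

```python
def largest_image_coverage(page_width, page_height, image_rects):
    """Fraction of the page area covered by the largest placed image.

    image_rects: iterable of (x0, y0, x1, y1) boxes in the same units as the page size.
    """
    page_area = page_width * page_height
    if page_area <= 0:
        return 0.0
    best = 0.0
    for x0, y0, x1, y1 in image_rects:
        # Clip each box to the page so images that bleed off the edge cannot exceed 1.0
        w = max(0.0, min(x1, page_width) - max(x0, 0.0))
        h = max(0.0, min(y1, page_height) - max(y0, 0.0))
        best = max(best, w * h / page_area)
    return best


def looks_scanned(page_width, page_height, image_rects, threshold=0.9):
    """Hypothetical heuristic: a page is 'scanned' if one image covers >= threshold of it."""
    return largest_image_coverage(page_width, page_height, image_rects) >= threshold


def classify_pages(pdf_path, threshold=0.9):
    """Return [(page_number, "render" | "extract"), ...] using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the pure helpers above work without it

    decisions = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            rects = []
            for img in page.get_images(full=True):
                # get_image_rects() returns every placement of this image on the page
                rects.extend(tuple(r) for r in page.get_image_rects(img[0]))
            scanned = looks_scanned(page.rect.width, page.rect.height, rects, threshold)
            decisions.append((page.number + 1, "render" if scanned else "extract"))
    return decisions
```

Only the largest single image is measured, so a page tiled from several strips will not be classified as scanned. Summing the areas would catch that case, but then overlapping images would be counted twice.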

Command-Line Tool Extraction

Command-line tools are ideal for batch extraction from large numbers of PDFs and integration into scripts.

pdfimages (Poppler utility):

The most reliable tool for extracting embedded images as-is (without recompression).

Install: macOS brew install poppler, Ubuntu apt install poppler-utils
Basic command: pdfimages -all document.pdf output_prefix
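-all option: Writes JPEG, JPEG 2000, JBIG2, and CCITT images in their native format, CMYK images as TIFF, and all other images as PNG. By default (no format option), images are written as PPM (color/grayscale) or PBM (monochrome)
-j option: Writes DCT-encoded (JPEG) images as JPEG files, copying the data instead of re-encoding it. -all already includes this behavior
-f / -l options: Specify page range (e.g., -f 3 -l 7 for pages 3-7)
Image listing: pdfimages -list document.pdf prints each embedded image's page, pixel size, color space, compression, and effective resolution on the page (x-ppi/y-ppi) without writing any files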
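If you call pdfimages from a script, build the argument list in one place so the flags stay consistent. Below is a minimal Python sketch that assumes pdfimages is on PATH. pdfimages_command and run_pdfimages are hypothetical helper names, and the sketch uses only the -all, -list, -f, and -l flags described above.

```python
import subprocess


def pdfimages_command(pdf_path, prefix, first_page=None, last_page=None, list_only=False):
    """Build a pdfimages argument list from the documented Poppler flags."""
    cmd = ["pdfimages", "-list" if list_only else "-all"]  # -all keeps native formats
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd.append(str(pdf_path))
    if not list_only:
        cmd.append(str(prefix))  # -list prints to stdout, so it needs no output prefix
    return cmd


def run_pdfimages(pdf_path, prefix, **kwargs):
    """Run pdfimages and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        pdfimages_command(pdf_path, prefix, **kwargs),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, run_pdfimages("document.pdf", None, list_only=True) returns the same table that pdfimages -list document.pdf prints.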

pdftoppm (full page rasterization):

Convert entire pages to images: pdftoppm -png -r 300 document.pdf output_prefix
-r 300: Output at 300 DPI (print quality)
Use for scanned PDFs or when you want to capture layout as images
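The pixel size of a rendered page is the page size in points (1 pt = 1/72 inch) multiplied by DPI/72. The same numbers give a quick estimate of memory per page. Here is a sketch of that arithmetic; individual renderers may round the last pixel differently.

```python
def render_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at `dpi` (1 pt = 1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def raw_rgb_bytes(width_px, height_px, channels=3):
    """Uncompressed in-memory size of an 8-bit-per-channel image buffer."""
    return width_px * height_px * channels


# A4 is about 595 x 842 pt
w, h = render_size(595, 842, 300)
print(w, h)                                  # 2479 3508
print(round(raw_rgb_bytes(w, h) / 1e6, 1))   # 26.1 (MB per decoded RGB page)
```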

Ghostscript:

High-quality page rendering: gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r300 -sOutputFile=page_%03d.png document.pdf
As a complete PDF and PostScript interpreter, it is a useful reference renderer when another tool's output looks wrong
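Below is the same Ghostscript call built from Python, with a page range and anti-aliasing added. It is a sketch that assumes the executable is named gs; on Windows the console binary is usually gswin64c. ghostscript_command is a hypothetical helper. -dFirstPage/-dLastPage restrict the page range, and -dTextAlphaBits/-dGraphicsAlphaBits set the anti-aliasing level for text and vector art.

```python
import subprocess


def ghostscript_command(pdf_path, out_pattern="page_%03d.png", dpi=300,
                        first_page=None, last_page=None, antialias=True):
    """Build a Ghostscript argument list that renders pages to 24-bit PNG."""
    cmd = ["gs", "-dSAFER", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m", f"-r{dpi}"]
    if antialias:
        # 4 is the highest anti-aliasing level for text and graphics
        cmd += ["-dTextAlphaBits=4", "-dGraphicsAlphaBits=4"]
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd += [f"-sOutputFile={out_pattern}", str(pdf_path)]
    return cmd


def render_with_ghostscript(pdf_path, **kwargs):
    """Run the command built above; raises CalledProcessError if gs fails."""
    subprocess.run(ghostscript_command(pdf_path, **kwargs), check=True)
```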

Python-Based Image Extraction

Python enables customized extraction logic combined with post-processing (renaming, filtering, conversion) for flexible workflows.

PyMuPDF (fitz) - Recommended:

A fast, feature-rich PDF manipulation library that makes image extraction straightforward.

import fitz  # PyMuPDF (newer releases also accept "import pymupdf")

doc = fitz.open("document.pdf")
for page_num in range(len(doc)):
    page = doc[page_num]
    # Each entry describes one image object on the page; entry[0] is its xref
    images = page.get_images(full=True)
    for img_index, img in enumerate(images):
        xref = img[0]
        base_image = doc.extract_image(xref)
        image_bytes = base_image["image"]
        image_ext = base_image["ext"]  # file extension matching the returned data
        with open(f"page{page_num+1}_img{img_index+1}.{image_ext}", "wb") as f:
            f.write(image_bytes)
doc.close()

PyMuPDF advantages:

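Returns JPEG and JPEG 2000 data exactly as stored (no recompression). Images stored with other filters are typically returned as lossless PNG
Can retrieve image resolution, color space, and size information
Supports soft masks through fitz.Pixmap(base, mask), although extract_image() returns the mask as a separate image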
Page rendering (rasterization) available in the same library
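To see what a document contains before extracting anything, collect the tuples returned by page.get_images(full=True) into an inventory. The sketch assumes PyMuPDF's documented tuple layout: (xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter, referencer). ImageInfo and inventory are hypothetical helper names. Deduplicating by xref keeps a logo from being counted once per page; the extraction loop above, by contrast, writes such an image once for every page it appears on.

```python
from typing import NamedTuple


class ImageInfo(NamedTuple):
    xref: int
    smask: int        # xref of the soft mask, 0 if there is none
    width: int
    height: int
    bpc: int          # bits per component
    colorspace: str   # e.g. "DeviceRGB"
    filter: str       # e.g. "DCTDecode" (JPEG) or "FlateDecode"
    first_page: int   # 1-based page where the image first appears


def inventory(pages_images):
    """Deduplicate images by xref.

    pages_images: iterable of (page_number, page.get_images(full=True)) pairs.
    """
    seen = {}
    for page_number, images in pages_images:
        for item in images:
            xref, smask, width, height, bpc, cs, _alt_cs, _name, flt = item[:9]
            if xref not in seen:  # the same xref can be referenced from many pages
                seen[xref] = ImageInfo(xref, smask, width, height, bpc, cs, flt, page_number)
    return list(seen.values())


def inventory_pdf(pdf_path):
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        # inventory() consumes the generator fully before the document closes
        return inventory((page.number + 1, page.get_images(full=True)) for page in doc)
```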

pdf2image (Poppler wrapper):

Convenient for converting entire pages to images. Internally calls pdftoppm.

from pdf2image import convert_from_path

# Returns a list with one PIL image per page, all held in memory at once
images = convert_from_path("document.pdf", dpi=300)
for i, image in enumerate(images):
    image.save(f"page_{i+1}.png", "PNG")

For PDFs with many pages, use first_page and last_page parameters to control memory usage.
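Here is a sketch of that batching. It assumes pdf2image's pdfinfo_from_path() to read the page count. page_batches and convert_in_batches are hypothetical names, and a batch size of 10 is an arbitrary starting point.

```python
def page_batches(total_pages, batch_size):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_in_batches(pdf_path, dpi=300, batch_size=10):
    """Render a PDF batch by batch so at most `batch_size` PIL images are in memory."""
    from pdf2image import convert_from_path, pdfinfo_from_path

    total = int(pdfinfo_from_path(pdf_path)["Pages"])
    for first, last in page_batches(total, batch_size):
        images = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for offset, image in enumerate(images):
            image.save(f"page_{first + offset}.png", "PNG")
```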

GUI Tools and Online Services

For users unfamiliar with command-line or programming, here are GUI tools and online services. However, avoid online services for confidential documents - use local tools instead.

Desktop GUI tools:

Adobe Acrobat Pro: "Tools → Export PDF → Image" converts pages to images. Set the output resolution in the export settings (Edit → Preferences → Convert From PDF), not in Page Display, which only affects on-screen rendering. For individual images: right-click in Edit mode → "Save Image As"
PDF-XChange Editor (Windows): Free version supports image extraction. "Document → Export Images" for batch extraction
Preview (macOS): Open the PDF and use File → Export with JPEG, PNG, or TIFF as the format and a chosen resolution to save the current page as an image. Cannot extract individual embedded images
GIMP: When importing PDF, select pages and resolution. Loaded as layers

Online services (not recommended for confidential documents):

iLovePDF: Browser-based PDF image extraction. Free plan available
SmallPDF: PDF to image conversion. Simple drag-and-drop operation
PDF24 Tools: German free tool. Supports both image extraction and page rasterization

Online service precautions:

Uploaded PDFs are temporarily stored on servers. Never upload documents containing confidential information
Review terms of service regarding data handling
Verify when files are deleted from servers after processing
Use local tools whenever possible; limit online services to non-confidential documents

Maximizing Extracted Image Quality

Here are techniques for preserving maximum quality when extracting images from PDFs, plus solutions for common problems.

Quality preservation principles:

Avoid recompression: Extract JPEG-stored images as JPEG. Converting to PNG only increases file size without improving quality
Extract at original resolution: Use the embedded image's actual resolution, not its display size in the PDF. Verify with pdfimages -list
Maintain color space: Converting CMYK images to RGB changes colors. For print use, keep them as CMYK. pdfimages -all writes CMYK images as TIFF, which preserves the CMYK data
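To check resolution in bulk, you can parse the output of pdfimages -list and flag low-resolution images. This sketch assumes the 16-column layout that current Poppler versions print: page, num, type, width, height, color, comp, bpc, enc, interp, object, ID, x-ppi, y-ppi, size, ratio. The x-ppi/y-ppi columns give the effective resolution at the size the image is drawn on the page. The 150 ppi cutoff and the helper names are illustrative.

```python
from typing import NamedTuple


class ListedImage(NamedTuple):
    page: int
    num: int
    type: str     # "image", "mask", "smask", or "stencil"
    width: int
    height: int
    color: str
    enc: str      # e.g. "jpeg", "jpx", "image" (uncompressed or Flate)
    x_ppi: int
    y_ppi: int


def parse_pdfimages_list(text):
    """Parse `pdfimages -list` output, assuming the 16-column Poppler layout."""
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the dashed separator
        parts = line.split()
        if len(parts) < 16:
            continue
        try:
            rows.append(ListedImage(
                page=int(parts[0]), num=int(parts[1]), type=parts[2],
                width=int(parts[3]), height=int(parts[4]), color=parts[5],
                enc=parts[8], x_ppi=int(parts[12]), y_ppi=int(parts[13]),
            ))
        except ValueError:
            continue  # a numeric column held something unexpected; skip the row
    return rows


def low_resolution(rows, min_ppi=150):
    """Visible images (not masks) drawn below min_ppi in either direction."""
    return [r for r in rows if r.type == "image" and min(r.x_ppi, r.y_ppi) < min_ppi]
```

Pass it the stdout of pdfimages -list document.pdf, for example captured with subprocess.run(..., capture_output=True, text=True). low_resolution() filters on type because rows typed smask or stencil describe masks, not visible images.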

Common problems and solutions:

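Split images: A single visible image may be stored as multiple tiles. A reliable workaround is to render the page with PyMuPDF's page.get_pixmap() and crop the needed area, or pass that area as its clip argument
Unapplied masks: Transparent images are stored as a base image plus a separate soft mask, and neither tool composites them automatically. pdfimages writes the mask as its own file (type smask in -list output). PyMuPDF's extract_image() returns only the base image and puts the mask's xref in its "smask" key. Combine the two with fitz.Pixmap(base, mask) or an image editor
Color differences: Embedded ICC profiles must be applied for correct colors. Check the color space name with fitz's base_image["cs-name"]; base_image["colorspace"] holds only the number of color components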
Rotation/transformation: Images rotated or transformed in the PDF are extracted in their pre-transformation state. Apply transformations in post-processing as needed
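Here is a PyMuPDF sketch for the mask and rotation problems. save_with_mask() follows PyMuPDF's documented pattern of combining a base image with its soft mask via fitz.Pixmap(base, mask). It takes xref and smask from the first two fields of page.get_images(full=True) and assumes the mask has the same pixel dimensions as the image. matrix_rotation() reads the angle and any mirroring from a 6-value placement matrix, such as the "transform" entry that page.get_image_info() reports. Both helper names are hypothetical. PDF user space has y pointing up while PyMuPDF's page coordinates have y pointing down, so check the sign of the angle against a known upright image before rotating anything.

```python
import math


def matrix_rotation(transform):
    """Return (angle_degrees, mirrored) for a transform (a, b, c, d, e, f).

    The angle is atan2(b, a) in the matrix's own coordinate system; whether it
    appears clockwise on screen depends on the y-axis direction (verify on a sample).
    """
    a, b, c, d = transform[:4]
    angle = math.degrees(math.atan2(b, a)) % 360
    mirrored = (a * d - b * c) < 0  # a negative determinant means a reflection
    return angle, mirrored


def save_with_mask(pdf_path, xref, smask, out_path):
    """Combine an image with its soft mask (if any) and save it as PNG with alpha."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        base = fitz.Pixmap(doc.extract_image(xref)["image"])
        if base.colorspace is not None and base.colorspace.n == 4:
            base = fitz.Pixmap(fitz.csRGB, base)  # PNG output cannot hold CMYK
        if smask:
            mask = fitz.Pixmap(doc.extract_image(smask)["image"])
            base = fitz.Pixmap(base, mask)  # mask values become the alpha channel
        base.save(out_path)
```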

For scanned PDFs:

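Because scanned PDFs usually store each page as one image, extraction and rendering yield essentially the same picture. The difference is in resampling. Extraction (pdfimages -all) returns the scan at its native resolution, untouched. Rendering with pdftoppm -r 300 or PyMuPDF's page.get_pixmap(dpi=300) resamples to the chosen DPI and also includes anything drawn over the image, such as annotations.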
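To work around split images, or to take just one figure from a scanned page, use the clip parameter of page.get_pixmap(), which renders only a rectangle of the page. The sketch assumes you located the region on a preview render at a known DPI; it converts that box back to PDF points before re-rendering. render_region and pixel_box_to_points are hypothetical helpers, and the sketch assumes an unrotated page (page.rotation == 0).

```python
def pixel_box_to_points(box, dpi):
    """Convert an (x0, y0, x1, y1) box measured on a render at `dpi` to PDF points."""
    return tuple(v * 72 / dpi for v in box)


def render_region(pdf_path, page_number, box_px, preview_dpi, out_path, dpi=300):
    """Re-render part of a page (1-based page_number) at `dpi` using PyMuPDF's clip."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        page = doc[page_number - 1]
        # Assumes page.rotation == 0, so preview pixels map directly to page coordinates
        clip = fitz.Rect(pixel_box_to_points(box_px, preview_dpi))
        page.get_pixmap(dpi=dpi, clip=clip).save(out_path)
```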


Batch Processing and Automation Scripts

Here are batch processing scripts for extracting images from large numbers of PDF files, plus practical automation patterns.

Shell script batch processing:

#!/bin/bash
shopt -s nullglob  # skip the loop entirely if there are no PDFs
for pdf in *.pdf; do
    dir="${pdf%.pdf}"
    mkdir -p "$dir"
    pdfimages -all "$pdf" "$dir/img"
done

This script creates a folder named after each PDF in the current directory and extracts images into it.

Advanced Python batch processing:

Possible customizations include:

Filtering images below minimum size (icons, decorations)
Standardizing filenames to "PDFname_pagenumber_sequence" format
Extracting only images above a specific resolution
Auto-converting to WebP after extraction to reduce file size
Generating CSV reports (list of extracted images, sizes, formats)
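The sketch below combines three of these customizations with PyMuPDF: size filtering, the PDFname_pagenumber_sequence naming, and a CSV report. The helper names, the zero-padding widths, and the 100-pixel minimum are assumptions to adjust. Images are written in whatever format extract_image() returns, and WebP conversion is left out.

```python
import csv
import io
from pathlib import Path


def output_name(pdf_stem, page_number, seq, ext):
    """'PDFname_pagenumber_sequence' naming, zero-padded so files sort correctly."""
    return f"{pdf_stem}_{page_number:03d}_{seq:02d}.{ext}"


def keep_image(width, height, min_side=100):
    """Drop icons and decorations: both sides must be at least min_side pixels."""
    return width >= min_side and height >= min_side


def report_csv(rows):
    """Return a CSV report (one line per extracted image) as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file", "page", "width", "height", "format"])
    writer.writerows(rows)
    return buf.getvalue()


def extract_folder(src_dir, out_dir, min_side=100):
    """Extract qualifying images from every PDF in src_dir using PyMuPDF."""
    import fitz  # PyMuPDF

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = []
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        with fitz.open(str(pdf_path)) as doc:
            seen = set()  # xrefs already written from an earlier page of this PDF
            for page in doc:
                seq = 0
                for img in page.get_images(full=True):
                    xref, width, height = img[0], img[2], img[3]
                    if xref in seen or not keep_image(width, height, min_side):
                        continue
                    seen.add(xref)
                    seq += 1
                    info = doc.extract_image(xref)
                    name = output_name(pdf_path.stem, page.number + 1, seq, info["ext"])
                    (out / name).write_bytes(info["image"])
                    rows.append([name, page.number + 1, width, height, info["ext"]])
    (out / "report.csv").write_text(report_csv(rows), encoding="utf-8")
    return rows
```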

Practical use patterns:

Chart extraction from reports: Extract charts from internal report PDFs for reuse in presentations
Product image extraction from catalogs: Batch extract product images from catalog PDFs for e-commerce registration
Figure extraction from papers: Extract figures from academic paper PDFs for citation/reference organization
OCR preprocessing of scanned documents: Extract page images from scanned PDFs to feed to OCR engines

Important notes:

Copyright awareness: When extracting and reusing images from others' PDFs, verify copyright compliance
Password-protected PDFs: Remove the protection before processing with qpdf --decrypt input.pdf output.pdf, adding --password=PASSWORD if the file requires a password to open (only with legitimate authorization)
Large PDFs: Design page-by-page processing to prevent memory exhaustion
