Remove All Images from a PDF in Python

I had a PDF file with certain images that I wanted to remove and keep only the text portion. There was no program that could do this for me quickly, so I wrote a Python script to do just that, using the awesome pikepdf.

First, install pikepdf using pip:

pip install pikepdf

Then run the following code. Using the examples on the pikepdf documentation page, I was able to write it in less than 5 minutes.

from pikepdf import Pdf

# define a function that takes the open pdf and one of its page objects,
# and replaces every image on that page with a 1x1 transparent placeholder
def remove_images(pdf, page):
    # copy the names first, because the loop modifies the page's XObject dictionary
    for image_name in list(page.images.keys()):
        new_image = pdf.make_stream(b'\xff')
        new_image.Width, new_image.Height = 1, 1
        new_image.BitsPerComponent = 1
        new_image.ImageMask = True
        new_image.Decode = [0, 1]
        page.Resources.XObject[image_name] = new_image

# open your pdf, src.pdf here
example = Pdf.open('src.pdf')

# iterate through each page (pages without images are simply skipped)
for page in example.pages:
    remove_images(example, page)

# finally, save your pdf
example.save('destination.pdf')

Open your destination.pdf and verify that all the images have indeed been removed.
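
If you would rather not eyeball every page, here is a rough sketch of how the check could be scripted. The helper names leftover_images and check_file are my own, not part of pikepdf. It also assumes that page.images only reports images stored directly in a page's resources, so inline images or images nested inside form XObjects would likely not show up here.

```python
def leftover_images(pages):
    """Return (page_number, name, width, height) for every image larger than 1x1."""
    found = []
    for number, page in enumerate(pages, start=1):
        for name, image in page.images.items():
            width, height = int(image.Width), int(image.Height)
            # the removal script leaves 1x1 placeholders behind; anything bigger is a real image
            if (width, height) != (1, 1):
                found.append((number, str(name), width, height))
    return found


def check_file(path):
    """Open a PDF with pikepdf and list the images that are still in it."""
    import pikepdf  # imported here so leftover_images stays usable without pikepdf

    with pikepdf.open(path) as pdf:
        return leftover_images(pdf.pages)
```

Calling check_file('destination.pdf') should give back an empty list if every image was replaced. Any entries that remain tell you the page number and the name of the image to look at.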
