I had a PDF file with images that I wanted to strip out, keeping only the text. I couldn't find a program that would do this for me quickly, so I wrote a Python script to do just that, using the awesome pikepdf.
First, install pikepdf using pip:
pip install pikepdf
Then run the following code. Using the examples on the pikepdf documentation page, I was able to write it in less than 5 minutes.
from pikepdf import Pdf

# Takes the open Pdf and a page object (not a page number) and replaces
# every image on that page with a 1x1 placeholder image mask.
def remove_images(pdf, page):
    # Snapshot the items first, since the loop modifies the page's XObjects.
    # Pages without images are simply skipped.
    for image_name, _ in list(page.images.items()):
        new_image = pdf.make_stream(b'\xff')
        new_image.Width, new_image.Height = 1, 1
        new_image.BitsPerComponent = 1
        new_image.ImageMask = True
        new_image.Decode = [0, 1]
        page.Resources.XObject[image_name] = new_image

# open your pdf, src.pdf here
example = Pdf.open('src.pdf')

# iterate through each page
for page in example.pages:
    remove_images(example, page)

# finally save your pdf
example.save('destination.pdf')
Open destination.pdf and verify that all the images have indeed been removed.
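If you would rather not check by eye, the same verification can be scripted. This is only a rough sketch: it assumes that a "removed" image is one that has been swapped for a 1x1 placeholder, so anything larger still counts as a real image. The helper names count_remaining_images and check_file are made up for this example and are not part of pikepdf.

```python
def count_remaining_images(pdf):
    """Count images larger than 1x1 across all pages of an open PDF.

    Assumption: replaced images are 1x1 placeholders, so any image
    with a larger width or height is treated as not yet removed.
    """
    remaining = 0
    for page in pdf.pages:
        for _, image in page.images.items():
            if int(image.Width) > 1 or int(image.Height) > 1:
                remaining += 1
    return remaining


def check_file(path):
    """Open the PDF at `path` with pikepdf and count its remaining images."""
    # Imported here so the counting helper above has no hard dependency.
    from pikepdf import Pdf

    with Pdf.open(path) as pdf:
        return count_remaining_images(pdf)
```

Calling check_file('destination.pdf') should return 0 if every image was replaced. A non-zero result means some page still references a full-size image.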