Getting Things Done – Extract Pages from Large PDF

By Martin Jansen, Owner of Jansen-PCINFO

In business as well as at home, it is sometimes necessary to extract pages from a large PDF.  During Choir as well as Cantor Masses at church, I organize my music by extracting songs from a large, fully indexed, 1,280-page PDF that takes up 45.5 MB of space.  For reference, the Choral Praise book has just about as many pages as “War and Peace” by Leo Tolstoy.

In Business

My last job in business involved the task of putting together sales packets from disparate sources. Previously this had all been done at the printer, copying the individual pages, laying them out and making copies of the copies.  As we know, making copies of copies, no matter how good the copier, often results in poor quality prints.  I found it’s much easier to create the packet electronically and then send it all to the printer at once.  This, too, involved extracting pages from PDFs and collating the files into a single PDF.

Choral and Cantor Prep

While I won’t see Carnegie Hall anytime soon, I still need to practice my music.  Rather than shuffling through the many pages, I extract the songs I need into a set of small PDFs.

Our Music Ministry ladies send out a helpful seasonal Music List.  It is a .DOCX file created in Microsoft Word and easily opened in LibreOffice Writer.  I see the song title and the page number from both the Breaking Bread and the CP3.  From there, I copy the July 3, 2022… text and create a new folder with the same name.  I then copy the ‘Lift Up Your Hearts’ text.

The new folder is top of the list when I sort by Date Modified in the Nemo File Manager.  Just about any file manager can create folders and sort.

I’m using the built-in PDF reader in Linux Mint called Xreader, but any good PDF reader can do the same functions described below.

I open the large PDF called cp3u and find an index of song numbers:

I click on the number from the Music List, 395, which brings me to page 616 in the PDF:

I click on the Print icon (or press Ctrl+P) and use the Print to File option with an output format of PDF.

I then click in the File: box to paste in the title of the song in the July 3… folder:

Since the Entrance song is two pages, I print pages 616-617 and a new small PDF is created.

I run through this process three more times for the Presentation, Communion and Sending Forth songs.

Although this seems like a laborious process, it really takes only a few minutes to extract the files and create my music list for a Mass.
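For readers who would rather script the extraction than use the Print dialog, the same step can be sketched in Python.  This is only an illustration, not the method I used above: it assumes the third-party pypdf package is installed, and the function names and the file names in the usage note are hypothetical.

```python
def page_indices(first_page, last_page):
    """Convert a 1-based, inclusive page range into 0-based page indices."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return list(range(first_page - 1, last_page))


def extract_pages(source_pdf, first_page, last_page, output_pdf):
    """Copy pages first_page..last_page of source_pdf into a new, small PDF."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so that page_indices() can still be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_pdf)
    writer = PdfWriter()
    for index in page_indices(first_page, last_page):
        # pypdf counts pages from 0, so PDF page 616 is reader.pages[615].
        writer.add_page(reader.pages[index])
    with open(output_pdf, "wb") as handle:
        writer.write(handle)
```

With those assumptions, a call such as extract_pages("cp3u.pdf", 616, 617, "Lift Up Your Hearts.pdf") would write the two-page Entrance song to its own file, provided the large PDF is saved as cp3u.pdf in the current folder.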

I add the cantor list, the Psalm, Gospel Acclamation and parts of the Mass and my song list is complete:

I did change to a dark theme while writing this article.  I number the songs to place them in order and add the Breaking Bread number (the number announced before the song is sung) as the second number, with the parts separated by underscores.
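The naming scheme can be written down as a small Python helper.  This is a sketch under assumptions: the exact layout of the name (order, Breaking Bread number, title) is my reading of the description above, the function name is made up for illustration, and the Breaking Bread number 123 is a placeholder rather than the real number of any song.

```python
def song_filename(order, breaking_bread_number, title):
    """Build a file name: order, Breaking Bread number and title, joined by underscores."""
    return f"{order}_{breaking_bread_number}_{title}.pdf"


# Placeholder values for illustration only.
print(song_filename(1, 123, "Lift Up Your Hearts"))
# prints "1_123_Lift Up Your Hearts.pdf"
```

Because the order number comes first, a file manager sorted by name lists the songs in the order they are sung.  A list with more than nine entries would need zero-padded numbers (01, 02, ...) to keep sorting correctly.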

A Word About Size

PDF files can vary greatly in size, that is, the space they take up on the drive.  Most of the files that I create and scan come to between 40 and 100 kilobytes per page.  I have, however, seen PDFs that are 1 megabyte or larger per page.  Often the reason is that the PDF contains huge images.  Control the size of the images and you control the size of the PDF.  A program like IrfanView can resize large images into something more manageable.
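The per-page figure is simple division: file size divided by page count.  As a quick check, here is that arithmetic in Python for the 45.5 MB, 1,280-page music book mentioned at the start of the article.  The function name is hypothetical, and 1 MB is taken as 1,000 KB.

```python
def kilobytes_per_page(size_megabytes, page_count):
    """Average size of one page in kilobytes, taking 1 MB as 1,000 KB."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return size_megabytes * 1000 / page_count


average = kilobytes_per_page(45.5, 1280)
print(f"{average:.1f} KB per page")
# prints "35.5 KB per page"
```

At roughly 36 KB per page, the music book sits just below the 40 to 100 KB range that I usually see.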

Preference For Installed Apps

At home as well as in business, I prefer to use installed apps as opposed to online websites like I Love PDF.  Online websites work well, but it gives me pause to upload my proprietary information to a remote server somewhere, so I hesitate to use them due to privacy concerns.  One other app that I use frequently is PDF Arranger in Linux, which allows me to extract, merge and split PDFs in a graphical environment.  In Windows, an inexpensive PDF tool is PDFtk Pro, which I used to extract specific pages from a large PDF.  At $3.99 it is a bargain, but unfortunately the pay link to download the software is currently broken.

Working with PDFs is easy with the right tools.