HowTo: Split PDF File – Linux Command Line

Sometimes you need to extract a few pages from a PDF file and save them as a separate PDF document.

On Linux, you can split PDF documents by page with the pdftk command-line utility.

This article shows how to extract individual pages or a range of pages from a PDF file and save them as a new PDF document.

Cool Tip: Planning to send this PDF somewhere, or just to keep it? Consider protecting it with a password. This is easy for anyone who already splits PDF files from the command line!

First, install the pdftk utility (the command below is for Debian- and Ubuntu-based systems):

$ sudo apt-get install pdftk
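
Before running the commands below, it can help to confirm that the installation worked. This is a minimal sketch; `have_cmd` is a hypothetical helper name, not part of pdftk:

```shell
# Return success if the named command is available on the PATH.
# (have_cmd is an illustrative helper, not something pdftk provides.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftk; then
    echo "pdftk found at: $(command -v pdftk)"
else
    echo "pdftk is not installed" >&2
fi
```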

Split PDF File

Extract the 5th page from ORIG_FILE.pdf and save it as NEW_FILE.pdf:

$ pdftk ORIG_FILE.pdf cat 5 output NEW_FILE.pdf

Extract several individual pages:

$ pdftk ORIG_FILE.pdf cat 1 4 6 output NEW_FILE.pdf

Cool Tip: Merge PDF files in Linux using the ghostscript command!

Extract a range of pages:

$ pdftk ORIG_FILE.pdf cat 1-5 output NEW_FILE.pdf
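
A range can also run to the last page without knowing the page count, because pdftk accepts the keyword `end` in place of a page number. The sketch below uses that to cut a document in two after a given page; `split_at` and the output file names are assumptions made for illustration:

```shell
# Split a PDF into two files after a given page.
# (split_at is an illustrative helper; it assumes the page number is
# smaller than the total number of pages, otherwise the second range is invalid.)
#   $1 = input PDF, $2 = last page of the first part,
#   $3 = output file for pages 1..$2, $4 = output file for the rest
split_at() {
    file=$1 page=$2 head=$3 tail=$4
    pdftk "$file" cat "1-$page" output "$head"
    pdftk "$file" cat "$((page + 1))-end" output "$tail"
}
```

For example, `split_at ORIG_FILE.pdf 5 FIRST_PART.pdf SECOND_PART.pdf` would write pages 1 to 5 to one file and page 6 onward to the other.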

Extract a combination of individual pages and a range of pages:

$ pdftk ORIG_FILE.pdf cat 1 5 7 10-12 output NEW_FILE.pdf
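
The same range syntax can be scripted to cut a long document into fixed-size parts. The sketch below reads the page count from the `NumberOfPages:` line that `pdftk ... dump_data` prints, then runs one `cat` per range. The function names and the `part_<range>.pdf` naming scheme are assumptions, not pdftk features:

```shell
# Print the number of pages pdftk reports for a PDF.
pdf_page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}

# Print one "first-last" range per line, covering pages 1..$1 in chunks of $2 pages.
page_ranges() {
    total=$1 size=$2 first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        # The final chunk may be shorter than the others.
        [ "$last" -gt "$total" ] && last=$total
        echo "$first-$last"
        first=$((last + 1))
    done
}

# Write each chunk of $2 pages from PDF $1 to part_<range>.pdf (hypothetical naming).
split_in_chunks() {
    file=$1 size=$2
    for range in $(page_ranges "$(pdf_page_count "$file")" "$size"); do
        pdftk "$file" cat "$range" output "part_$range.pdf"
    done
}
```

Running `split_in_chunks ORIG_FILE.pdf 5` on a 12-page file would produce part_1-5.pdf, part_6-10.pdf and part_11-12.pdf.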

4 Replies to “HowTo: Split PDF File – Linux Command Line”

  1. Владимир says:

    Thank you!

  2. Michael Shinas says:

    Thanks for your excellent tip. I managed to change one of the commands a bit and thought it would be useful to share it here.
    My requirement was to extract all pages of a PDF file to separate PDF files, one per page. This can be achieved as follows (`seq -w 10` gives a sequence from 01 to 10, zero-padded to equal width; replace 10 with the number of pages in scan.pdf):
    for i in $(seq -w 10); do pdftk scan.pdf cat "$i" output "scan_$i.pdf"; done

    1. This is a great little script, and thanks for the heads up on the seq command.

      But use the burst option instead of cat and pdftk will do this for you.

      “burst: Splits a single input PDF document into individual pages. Also creates a report named doc_data.txt which is the same as the output from dump_data. If the output section is omitted, then PDF pages are named: pg_%04d.pdf, e.g.: pg_0001.pdf, pg_0002.pdf, etc.”

  3. You’re not really extracting, you’re copying pages from one PDF to another. There is a difference: to extract means to remove. There is also a strange behavior: extract half of A.pdf to B.pdf and the other half to C.pdf, and you wind up with B + C > A.

    There’s got to be a better solution.
