Sometimes you need to extract some pages from a PDF file and save them as another PDF document.
In Linux you can easily split PDF documents by page using the command-line utility pdftk.
In this article you will learn how to extract individual pages or a range of pages from a PDF file and save them as another PDF document.
Cool Tip: Planning to send this PDF somewhere, or just to keep it? How about protecting it with a password? This is really easy for anyone who splits PDF files from the command line! Read more →
First of all, install the pdftk utility:
$ sudo apt-get install pdftk
Split PDF File
Extract the 5th page from ORIG_FILE.pdf and save it to NEW_FILE.pdf:
$ pdftk ORIG_FILE.pdf cat 5 output NEW_FILE.pdf
Extract several individual pages:
$ pdftk ORIG_FILE.pdf cat 1 4 6 output NEW_FILE.pdf
Cool Tip: Merge PDF files in Linux using the ghostscript command! Read more →
Extract a range of pages:
$ pdftk ORIG_FILE.pdf cat 1-5 output NEW_FILE.pdf
Extract the combination of individual pages and a range of pages:
$ pdftk ORIG_FILE.pdf cat 1 5 7 10-12 output NEW_FILE.pdf
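All of the commands above follow the same pattern, so they can be wrapped in a small shell function. The sketch below is only an illustration: the function name extract_pages is made up for this example and is not part of pdftk, and the page arguments are passed to pdftk unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdftk):
#   extract_pages INPUT.pdf OUTPUT.pdf PAGES...
# PAGES are handed to "pdftk cat" as-is, e.g. 5, 1 4 6, 1-5 or 1 5 7 10-12.
extract_pages() {
    if [ "$#" -lt 3 ]; then
        echo "usage: extract_pages INPUT.pdf OUTPUT.pdf PAGES..." >&2
        return 2
    fi
    local input=$1 output=$2
    shift 2
    # Everything left in "$@" is the list of pages and page ranges.
    pdftk "$input" cat "$@" output "$output"
}
```

For example, extract_pages ORIG_FILE.pdf NEW_FILE.pdf 1 5 7 10-12 runs the last command shown above. pdftk page ranges also accept the keyword end for the last page, so extract_pages ORIG_FILE.pdf NEW_FILE.pdf 10-end should copy everything from page 10 onwards.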
Thank you!
Thanks for your excellent tip. I managed to change one of the commands a bit and thought it would be useful to share it here.
My requirement was to extract all pages of a PDF file to separate PDF files, one per page. This can be achieved as follows (`seq -w 10` gives a sequence from 1 to 10 with leading zeros; replace 10 with the number of pages in the scan.pdf file):
for i in $(seq -w 10); do pdftk scan.pdf cat "$i" output "scan_$i.pdf"; done
This is a great little script, and thanks for the heads up on the seq command.
But use the burst option instead of cat and pdftk will do this for you.
“burst: Splits a single input PDF document into individual pages. Also creates a report named doc_data.txt which is the same as the output from dump_data. If the output section is omitted, then PDF pages are named: pg_%04d.pdf, e.g.: pg_0001.pdf, pg_0002.pdf, etc.”
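Both suggestions from the comments above can be sketched as shell functions. This is only an illustration under a few assumptions: the function names page_count, split_with_loop and split_with_burst are made up for this example, and the page count is read from the NumberOfPages: line that pdftk FILE dump_data prints, so it does not have to be typed by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these function names are part of pdftk.

# Print the page count found in "pdftk FILE dump_data" output read from stdin.
page_count() {
    awk '/^NumberOfPages:/ { print $2; exit }'
}

# Loop variant: one "pdftk cat" call per page, output named BASE_001.pdf, ...
split_with_loop() {
    local file=$1 base pages i out
    base=${file%.pdf}
    pages=$(pdftk "$file" dump_data | page_count)
    # Stop if no page count could be read (e.g. pdftk failed).
    [ -n "$pages" ] || return 1
    i=1
    while [ "$i" -le "$pages" ]; do
        out=$(printf '%s_%03d.pdf' "$base" "$i")
        pdftk "$file" cat "$i" output "$out"
        i=$((i + 1))
    done
}

# Burst variant: pdftk splits every page itself. The output argument is a
# printf-style pattern, like the default pg_%04d.pdf in the quote above.
split_with_burst() {
    local file=$1
    pdftk "$file" burst output "${file%.pdf}_%03d.pdf"
}
```

With a real scan.pdf, split_with_burst scan.pdf should write scan_001.pdf, scan_002.pdf and so on, plus the doc_data.txt report mentioned in the quote. The loop variant gives the same file names but calls pdftk once per page.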
You’re not really extracting, you’re copying pages from one PDF to another. There is a difference: to extract means to remove. There is also some strange behavior: if you extract one half of A.pdf to B.pdf and the other half to C.pdf, you wind up with B+C>A.
There’s got to be a better solution.