In this article, we cover how to convert a PDF to text in Ubuntu. PDF stands for Portable Document Format. A PDF file can contain text, images and multimedia objects. The method covered here extracts only the text stored as text in the PDF. If your file contains images or multimedia objects with text embedded in them, that embedded text won't be converted.
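One way to check whether a PDF has extractable text at all is to run pdftotext (installed in the steps below) and count the non-whitespace characters it produces. The function name has_text_layer is hypothetical, and the check is a heuristic sketch: an empty result usually, but not always, means the text lives only inside images, as with a scanned document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if pdftotext extracts any non-whitespace text
# from the given PDF; fails otherwise (image-only pages, unreadable file, ...).
has_text_layer() {
    local chars
    # "-" as the output name sends the extracted text to standard output
    chars=$(pdftotext "$1" - 2>/dev/null | tr -d '[:space:]' | wc -c)
    [ "$chars" -gt 0 ]
}
```

For example, `has_text_layer test.pdf || echo "no extractable text"` prints a warning when nothing could be extracted.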
pdftotext is a command-line utility that extracts text from PDFs. It is provided by the poppler-utils package, so we first cover how to install that package.
Note: The following installation steps require superuser privileges. If you don't have them, contact your system administrator for assistance.
Install poppler-utils in Ubuntu
The package is available in the standard Ubuntu repositories. First, update the package index so that apt installs the latest available version of the package. Open a terminal and run:
sudo apt update
Next, install the poppler-utils package:
sudo apt install poppler-utils
The pdftotext command-line utility is now ready to use.
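To confirm that the installation worked, you can check that pdftotext is on your PATH and print its version with its -v option. The helper name have_pdftotext is hypothetical; this is only a small sketch of such a check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the pdftotext binary can be found on PATH.
have_pdftotext() {
    command -v pdftotext >/dev/null 2>&1
}

if have_pdftotext; then
    pdftotext -v    # prints the installed pdftotext version information
else
    echo "pdftotext not found; install it with: sudo apt install poppler-utils" >&2
fi
```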
Convert PDF to text in Ubuntu
The syntax of the pdftotext utility is:
pdftotext [options] PDF_file [text_file]
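According to the pdftotext manual page, text_file is optional: if it is omitted, the output goes to a file named after the PDF with a .txt extension (test.pdf becomes test.txt), and if it is -, the text is written to standard output. Writing to standard output makes it easy to search a PDF without keeping a text file around. The function name pdf_grep below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: case-insensitively search the text of a PDF.
# Usage: pdf_grep FILE.pdf PATTERN
pdf_grep() {
    # "-" as text_file makes pdftotext write to stdout, which is piped to grep;
    # the function's exit status is grep's: 0 only if some line matched
    pdftotext "$1" - | grep -i -- "$2"
}
```

For example, `pdf_grep test.pdf "invoice"` prints every line of test.pdf that mentions "invoice".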
Let's say we have a PDF file, test.pdf, and want the output written to out.txt.
To convert all pages of the PDF to a text file, run:
pdftotext test.pdf out.txt
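The same command can be applied to many files with a shell loop. The sketch below assumes each .txt file should be written next to its PDF; convert_dir is a hypothetical helper name, and files ending in uppercase .PDF are not matched by its pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.pdf in a directory (default: the current
# one) to a .txt file beside it. The body runs in a subshell ( ... ), so the
# nullglob setting does not leak into the caller's shell.
convert_dir() (
    shopt -s nullglob          # no matches -> zero iterations, not a literal "*.pdf"
    status=0
    for pdf in "${1:-.}"/*.pdf; do
        # ${pdf%.pdf}.txt swaps the extension: report.pdf -> report.txt
        if ! pdftotext "$pdf" "${pdf%.pdf}.txt"; then
            echo "failed: $pdf" >&2
            status=1
        fi
    done
    exit "$status"             # leaves only the subshell; becomes the return status
)
```

For example, `convert_dir ~/Documents` converts every PDF in that folder and reports any file it could not convert.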
We can also specify the first page to convert with the -f option:
pdftotext -f 4 test.pdf out.txt
This converts the PDF from the fourth page through the last page.
Similarly, the -l option specifies the last page to convert:
pdftotext -l 3 test.pdf out.txt
This converts the first three pages.
The two options can also be combined:
pdftotext -f 2 -l 5 test.pdf out.txt
This converts pages 2 through 5, inclusive.
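Setting -f and -l to the same number extracts a single page, which can be combined with a loop to write each page to its own file. This is a sketch under the assumption that pdfinfo, which ships in the same poppler-utils package, reports the page count on a line starting with "Pages:"; split_pages is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write each page of a PDF to its own text file,
# e.g. test.pdf -> test-page-1.txt, test-page-2.txt, ...
# Assumes pdfinfo prints a "Pages:" line with the page count.
split_pages() {
    local pdf="$1" pages page
    pages=$(pdfinfo "$pdf" 2>/dev/null | awk '/^Pages:/ {print $2}')
    if [ -z "$pages" ]; then
        echo "could not read the page count of: $pdf" >&2
        return 1
    fi
    for ((page = 1; page <= pages; page++)); do
        # The same number for -f and -l extracts exactly one page
        pdftotext -f "$page" -l "$page" "$pdf" "${pdf%.pdf}-page-$page.txt" || return 1
    done
}
```

For example, `split_pages test.pdf` would produce test-page-1.txt, test-page-2.txt and so on in the same directory as test.pdf.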
We can also set the encoding of the output text file (the default is UTF-8) with the -enc option:
pdftotext -enc <encoding_name> test.pdf out.txt
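pdftotext prints the encoding names it accepts when run with -listenc. Typical builds list names such as UTF-8 and Latin1, but check the output on your own system. The hypothetical helpers below run the conversion only when the requested name appears in that list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if "pdftotext -listenc" prints the given
# encoding name as a whole line (-x whole line, -F fixed string, -q quiet).
encoding_supported() {
    pdftotext -listenc 2>&1 | grep -qxF -- "$1"
}

# Hypothetical helper: convert with a specific encoding, refusing unknown names.
# Usage: convert_with_encoding ENCODING FILE.pdf OUT.txt
convert_with_encoding() {
    if ! encoding_supported "$1"; then
        echo "unsupported encoding: $1" >&2
        return 1
    fi
    pdftotext -enc "$1" "$2" "$3"
}
```

For example, `convert_with_encoding Latin1 test.pdf out.txt` writes Latin-1 text if your pdftotext build lists that encoding.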
In conclusion, we have covered how to convert a PDF to text in Ubuntu using pdftotext.