Searching & Converting PDF, DJVU documents using Thunar

manyroads
Posts: 2624
Joined: Sat Jun 30, 2018 6:33 pm

#1 Post by manyroads »

  • Have you found a PDF file that is not digitally searchable, and you really wish it was?
  • Do you wish you could easily convert monstrously huge pdf files to more compact djvu files, without a significant loss in functionality?
I have built and published a tutorial on how to accomplish these tasks using MX Linux and Thunar Custom Actions. (If the devs want a copy of my tutorial, let me know and I'll happily donate it to the "cause".)

Here's the tutorial (on my eirenicon llc site):

http://eirenicon.org/2019/01/31/searchi ... documents/

Once the above tutorial functions are implemented, you will be able to:
  • Convert PDF documents to DJVU via a Thunar Custom Action. (Note: The resulting DJVU documents will be searchable even if the original PDF was not. This only works for English, so far as I know.)
  • Make a non-searchable PDF "searchable" via a Thunar Custom Action. (Note: This only works for English, so far as I know.)
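As a rough sketch of what the two custom actions might end up running (the tutorial's actual commands are not reproduced in this thread), the Python below only builds plausible command lines for pdf2djvu and pdfsandwich. The helper names and the choice of output file name are assumptions for illustration; in a Thunar custom action, %f would stand in for the selected file's path.

```python
from pathlib import Path


def pdf2djvu_command(pdf_path):
    """Build a pdf2djvu command that writes <name>.djvu next to the input.

    The output naming is an assumption; pdf2djvu's -o option sets the output file.
    """
    src = Path(pdf_path)
    return ["pdf2djvu", "-o", str(src.with_suffix(".djvu")), str(src)]


def pdfsandwich_command(pdf_path, lang="eng"):
    """Build a pdfsandwich command for the given tesseract language code.

    Without -o, pdfsandwich writes its result to <name>_ocr.pdf.
    """
    return ["pdfsandwich", "-lang", lang, str(Path(pdf_path))]


# To actually run one of these (requires the tools to be installed):
#   import subprocess
#   subprocess.run(pdf2djvu_command("manual.pdf"), check=True)
```

Passing a different code to `lang` (for example "deu") should only help if the matching tesseract language data is installed.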
Pax vobiscum,
Mark Rabideau - ManyRoads Genealogy -or- eirenicon llc. (geeky stuff)
i3wm, bspwm, hlwm, dwm, spectrwm ~ Linux #449130
"For every complex problem there is an answer that is clear, simple, and wrong." -- H. L. Mencken

Antediluvian
Posts: 304
Joined: Sun May 20, 2018 7:42 pm

#2 Post by Antediluvian »

Thank you very much, manyroads, for the useful, detailed and clear tutorial!
I have a few questions/observations:

1. I used MXPI to download pdfsandwich and pdf2djvu together. The terminal responded with

Suggested packages:
  graphicsmagick-dbg
Recommended packages:
  edisplay
Should I install those? (I have not.)

2. The two apps do not appear in the Whisker Menu, which I suppose is for the better.

3. pdfsandwich converted a color pdf to greyscale. It also very slightly degraded (rasterized) the text & images. (not a complaint, just an observation)

4. Interestingly, after OCR conversion of a service manual I could search & find words but not numbers (typically 6 digits).

5. When pdf2djvu was applied to a 28 page color pdf file before OCR conversion it reduced the size from 5.4 to 3.5 MB. However, when the same 5.4 MB file was first converted to OCR (size now 3.6 MB and in greyscale) and then converted to .djvu the file size was 11.9 MB. Strange?
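For reference, the relative size changes in point 5 can be worked out with a few lines of Python. This uses only the figures quoted above; the function name is just for illustration.

```python
def percent_change(before_mb, after_mb):
    """Relative size change in percent (negative means the file got smaller)."""
    return (after_mb - before_mb) / before_mb * 100.0


direct = percent_change(5.4, 3.5)          # original PDF -> DJVU
ocr_step = percent_change(5.4, 3.6)        # original PDF -> OCR'd PDF
ocr_then_djvu = percent_change(3.6, 11.9)  # OCR'd PDF -> DJVU

print(f"PDF -> DJVU:        {direct:+.1f}%")
print(f"PDF -> OCR'd PDF:   {ocr_step:+.1f}%")
print(f"OCR'd PDF -> DJVU:  {ocr_then_djvu:+.1f}%")
```

So the direct conversion shrinks the file by roughly a third, while converting the OCR'd file more than triples it. One possible factor, not confirmed in this thread, is that the OCR step rasterized the pages (see point 3), which would leave pdf2djvu encoding full-page images instead of text and vector content.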

manyroads
Posts: 2624
Joined: Sat Jun 30, 2018 6:33 pm

#3 Post by manyroads »

Antediluvian wrote: Thu Jan 31, 2019 9:51 pm Thank you very much, manyroads, for the useful, detailed and clear tutorial!
[...]
I'm glad this helped.
  • I do not have either edisplay or graphicsmagick-dbg installed. (I guess that means we don't "need" them.)
Here is what synaptic says about edisplay: ExactImage is a fast C++ image processing library. Unlike many other library frameworks it allows operation in several color spaces and bit depths natively, resulting in low memory and computational requirements.
Here is what synaptic says about GraphicsMagick: it provides libraries in several programming languages to read, write and manipulate image files across a large number of formats, from the widely used jpeg, tiff, bmp or xpm to special-purpose formats such as fits or image formats found on some photo CDs. There are functions for finegrained image processing tasks, as well as conversion routines between the various image formats.

The GraphicsMagick library is a fork of ImageMagick and therefore offers an interface that is similar in features, but intended to be more stable across releases. While compatibility does not go so far that the GraphicsMagick library serves as a drop-in replacement for ImageMagick, conversion can usually be done with little effort.
  • Because the apps are terminal-based, they do not show up in the Whisker Menu the way an X-based app would.
  • The pdfsandwich color-to-grayscale conversion can probably be fixed with a script modification. Or perhaps edisplay might help.
  • OCR conversions are always problematic. Be certain you have all the tesseract languages and features you want installed; you can do that in synaptic.
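A hedged sketch of the two suggestions above, in Python. It assumes the color loss comes from pdfsandwich's default output mode and that its -rgb option (keep RGB color for images) is the relevant switch; that has not been verified against the tutorial's script. The function names are made up for illustration, and the language check simply parses the output of `tesseract --list-langs`.

```python
import shutil
import subprocess


def pdfsandwich_color_command(pdf_path, lang="eng", color=True):
    """Build a pdfsandwich command, adding -rgb when color output is wanted."""
    cmd = ["pdfsandwich", "-lang", lang]
    if color:
        cmd.append("-rgb")
    cmd.append(pdf_path)
    return cmd


def installed_tesseract_languages():
    """Return the language codes tesseract reports, or [] if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some versions print the list to stderr, so both streams are checked.
    lines = (result.stdout + "\n" + result.stderr).splitlines()
    # Keep only single-token lines; this skips the "List of available languages" header.
    return [ln.strip() for ln in lines if ln.strip() and " " not in ln.strip()]
```

If "eng" (or whichever language you need) is missing from the returned list, installing the corresponding tesseract language package in synaptic would be the first thing to try.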

Antediluvian
Posts: 304
Joined: Sun May 20, 2018 7:42 pm

#4 Post by Antediluvian »

Thanks for the tips, manyroads!

Thunar's "Configure custom actions..." is new to me. In your tutorial is there a reason why you didn't set Appearance Conditions to the following to limit the appearance of the action to pdf files?
Screenshot.png

manyroads
Posts: 2624
Joined: Sat Jun 30, 2018 6:33 pm

#5 Post by manyroads »

Antediluvian wrote: Sat Feb 02, 2019 7:09 pm Thanks for the tips, manyroads!

[...]
There is an option for text files, but a pdf is actually an image (package). You can play with the appearance conditions so they fit your need(s). I am about to offer some jpg-to-png and png-to-jpg options that only appear with images (but that means the options only appear when you right-click on the image itself).
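To experiment with appearance conditions outside Thunar, here is a small Python sketch. It assumes the File Pattern field takes a semicolon-separated list of glob patterns and that the checkboxes correspond to broad MIME categories; Thunar's real matching logic may differ, and both function names are hypothetical.

```python
import fnmatch
import mimetypes


def matches_patterns(filename, patterns="*.pdf;*.PDF"):
    """True if the file name matches any pattern in a semicolon-separated list."""
    return any(
        fnmatch.fnmatchcase(filename, pattern)
        for pattern in patterns.split(";")
        if pattern
    )


def mime_category(filename):
    """Map a file name to audio/image/text/video, or "other" for everything else."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "other"
    top_level = mime.split("/")[0]
    return top_level if top_level in ("audio", "image", "text", "video") else "other"
```

Python's mimetypes module reports application/pdf for a .pdf file, which is neither text/* nor image/*. If Thunar categorizes files the same way, a pattern of *.pdf would presumably need to be combined with "Other Files" instead of "Image Files" or "Text Files".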

manyroads
Posts: 2624
Joined: Sat Jun 30, 2018 6:33 pm

#6 Post by manyroads »

Antediluvian wrote: Thu Jan 31, 2019 9:51 pm Thank you very much, manyroads, for the useful, detailed and clear tutorial!
[...]
3. pdfsandwich converted a color pdf to greyscale. It also very slightly degraded (rasterized) the text & images. (not a complaint, just an observation)
[...]
You need the version of pdfsandwich described in the following thread for things to work correctly; see: viewtopic.php?f=134&t=48372
