How best to digitize (scan) documents: issues with different programs

Hello,

I need to digitize a lot of documents at the moment, so I am scanning them and saving them as PDF files. I have tried several programs but have not found one that is fully satisfying. I'll share my experiences and ask for feedback on how best to approach this, and on the things I did not fully understand.

My setup:
Manjaro Linux with Cinnamon desktop
Scanner: Canon Pixma MX925 printer/scanner with an ADF that supports double-sided scanning.
Driver package canon-pixma-mx920-complete installed from the AUR.

Simple-scan: When using the ADF, the scan is not centered; I always have to set the scan frame for the first page manually. Overall I get good results scanning in "Text" mode: the background is white and the documents are readable and of good quality. Files are very small (150 dpi, ~28 KB per page).
Is there a way to save a preset scan frame, e.g. in a config file?
Saving the file always produces the error message "Child process could not run (no such file or directory)". The file gets saved anyway, so why the error?

Apart from simple-scan, all the other programs produce a non-white page background when scanning in greyscale, and the files are significantly bigger. Scanning in sketch mode (b/w), on the other hand, makes text nearly unreadable or of very poor quality. I have not found out how simple-scan post-processes the files to get such good results, and I played around with the settings in the other programs to get comparable results, without success.

Skanlite:
The application reacts to user interaction with some delay, but generally it works.
Scans in greyscale mode do not have a white background. In sketch mode, different thresholds gave no satisfying result, and the files are still quite big.
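To see why a single threshold struggles with text, here is a minimal Python sketch of global-threshold binarization, which is roughly what a b/w (lineart/sketch) mode amounts to. It assumes the page is a flat list of 8-bit grey values and uses made-up sample values for one scan line across a thin letter stroke; it is an illustration, not Skanlite's actual code.

```python
# Minimal sketch of global-threshold binarization (b/w "sketch"/lineart mode).
# Assumption: 8-bit grey values, 0 = black, 255 = white. Real scanner
# frontends may threshold in hardware or use other algorithms.

def binarize(pixels, threshold=128):
    """Map every grey value to pure black (0) or pure white (255)."""
    return [0 if p < threshold else 255 for p in pixels]


# Hypothetical scan line crossing a thin letter stroke on off-white paper:
# paper ~215, stroke core ~70, anti-aliased stroke edges ~150-180.
scan_line = [215, 214, 180, 150, 70, 150, 180, 214, 215]

for t in (100, 160, 200):
    print(t, binarize(scan_line, t))
```

With a low threshold the stroke shrinks to a single black pixel, with a high one it bloats to five; either way the in-between grey values that keep text smooth and legible in greyscale are thrown away.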

Skanpage:
There is also a huge delay, and saving does not work at all.

XSane:
Works well, but it has the same issue: greyscale gives a non-white background, and sketch mode gives poor quality.

gscan2pdf:
Here I can set a white point, and then the background is nearly white, but behind text areas it remains grey.

So how would you suggest digitizing documents efficiently? Which settings and workflows would you recommend?

With XSane, I presume you’ve tried adjusting the “dynamic range” using the Histogram? It’s what I do when scanning documents, to avoid that grey background.
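For reference, adjusting the histogram "dynamic range" (or setting a white point) is essentially a levels mapping. The Python sketch below is a simplified model under that assumption, with hypothetical grey values; XSane and gscan2pdf may implement it differently (e.g. per colour channel or via lookup tables).

```python
# Minimal sketch of a "levels" adjustment (black point / white point).
# Assumption: 8-bit grey values; not XSane's or gscan2pdf's actual code.

def levels(pixels, black_point, white_point):
    """Clip to black/white outside the points, stretch linearly in between."""
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    out = []
    for p in pixels:
        if p <= black_point:
            out.append(0)    # at or below the black point -> pure black
        elif p >= white_point:
            out.append(255)  # at or above the white point -> pure white
        else:
            out.append(round((p - black_point) * 255 / span))
    return out


# Hypothetical values: paper ~215, grey area behind text ~190, text ~60.
sample = [215, 190, 60]
print(levels(sample, black_point=70, white_point=200))  # [255, 235, 0]
```

With the white point at 200, the grey area at 190 only moves to 235, so it stays visibly grey; lowering the white point to 185 turns it white too (`levels(sample, 70, 185)` gives `[255, 255, 0]`), at the risk of washing out faint text.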

My scanner BTW: Mustek BearPaw 2400CU+

I've largely stuck with XSane, so I can't really comment on the other programs. (I do have Skanlite installed; I'll try that one a bit later this evening.)

I use a very small program called simple-scan. It works very well with my HP.

pamac install --no-confirm simple-scan

Good luck

@stormschip Yes, simple-scan was on my list. It's the one whose post-processing I'd like to understand.

XSane is OK; you can still get pretty nice results by setting contrast and gamma. For me, Gamma = 1 and Contrast 40-50 work OK when scanning in color at 300 dpi. You can then do some post-processing as well.
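A rough Python sketch of what gamma and contrast do to grey values. The contrast formula here (stretching values around mid-grey by a factor of `(100 + contrast) / 100`) is one common convention and an assumption; XSane's and NAPS2's contrast scales are not necessarily the same, so the numbers do not map one-to-one to their sliders.

```python
# Rough sketch of gamma + contrast on 8-bit grey values.
# Assumption: the contrast formula is a common convention, not XSane's or
# NAPS2's actual implementation.

def gamma_contrast(pixels, gamma=1.0, contrast=0):
    factor = (100 + contrast) / 100
    out = []
    for p in pixels:
        v = 255 * (p / 255) ** (1 / gamma)  # gamma = 1 leaves v equal to p
        v = (v - 128) * factor + 128        # stretch around mid-grey
        out.append(max(0, min(255, round(v))))  # clamp to 0..255
    return out


# Hypothetical paper (200) and text (60) values, Gamma = 1, Contrast 45:
print(gamma_contrast([200, 60], gamma=1.0, contrast=45))  # [232, 29]
```

With Gamma = 1 only the contrast stretch acts: light paper is pushed toward white and dark text toward black, and large enough contrast values clip both ends completely.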

But I'd suggest you try NAPS2. With 300 dpi, Gray mode and contrast 60 you get quite good results, but you need to do a correction afterwards to get clean white paper and black text. In NAPS2, after scanning, try the sliders: Contrast +300 or +350. This is probably the best result I could get with my HP scanner and standard b/w documents on Linux.

NAPS2:
https://aur.archlinux.org/packages/naps2-bin


If taking one additional step isn't too much, I suggest taking a look at Stirling PDF. It's an open-source web app that does all kinds of things with PDFs; among its tools is one that cleans up scans, runs OCR on them and adds the text to the PDF for you.

Alternatively, I can recommend the mobile app Turboscan (Android and iOS). It's very quick and efficient, can produce very nice-looking color-reduced PDFs (or full-color or b/w ones), and has an option for auto-uploading your scans. I actually scan everything with Turboscan these days.
