Tool to scan and process documents to paperless
Scan and prepare your document for paperless-ng
Features
The main goal of this project is to prepare the documents using heavy tools without having to wait (they run in the background) and without using too many resources on my desktop. That is the reason why it's a little bit complicated to put in place.
Features:
- Scan the images optionally by using the Automatic Document Feeder
- Easily scan double-sided documents using the Automatic Document Feeder
- Change the images levels
- Deskew the images
- Crop the images
- Sharpen the images (disabled by default)
- Dither the images (disabled by default)
- Autorotate the images by using tesseract (to have the text the right way up)
- Assisted split, used to split a prospectus page into more pages (requires modifying the YAML...)
- Append credit card, used to have the two faces of a credit card on the same page.
Install
On the desktop
$ python3 -m pip install scan-to-paperless
$ sudo activate-global-python-argcomplete
$ echo PATH=$PATH:~/venv/bin >> ~/.bashrc
Create the configuration file at <home_config>/scan-to-paperless.yaml (on Linux it's ~/.config/scan-to-paperless.yaml), with:
scan_folder: /home/sbrunner/Paperless/scan/
scanimage_arguments: # Additional arguments passed to the scanimage command
  - --device=... # Use `scanimage --list` to get the possible values
  - --format=png
  - --mode=color
  - --resolution=300
default_args:
  ## Level
  # true: => do level on 15% - 85% (under 15% will be black, above 85% will be white)
  # false: => 0% - 100%
  # <number>: => (0 + <number>)% - (100 - <number>)%
  level:
  # If no level specified, do auto level
  auto_level: False
  # min level if no level and no auto level
  min_level: 15
  # max level if no level and no auto level
  max_level: 95
  ## Crop
  no_crop: False # Don't do any crop
  marging_horizontal: 9 # mm, the horizontal margin used on autodetect content
  marging_vertical: 6 # mm, the vertical margin used on autodetect content
  dpi: 300 # The DPI used to convert the mm to pixels
  ## Sharpen
  sharpen: False # Do the sharpen
  ## Dither
  dither: False # Do the dither
  ## OCR
  tesseract: True # Use tesseract to do OCR on the document
  tesseract_lang: fra+eng # The languages used
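The level and margin settings come down to simple arithmetic. The sketch below is not taken from the scan-to-paperless source: level_range and mm_to_pixels are hypothetical helper names that only follow the rules stated in the configuration comments, plus the usual conversion of 25.4 mm per inch.

```python
from typing import Tuple, Union

MM_PER_INCH = 25.4


def level_range(level: Union[bool, float]) -> Tuple[float, float]:
    """Return the (black point, white point) in percent for a `level` value."""
    # Booleans are tested first: in Python, True and False are also integers.
    if level is True:
        return (15.0, 85.0)
    if level is False:
        return (0.0, 100.0)
    # <number>: (0 + <number>)% - (100 - <number>)%
    return (0.0 + level, 100.0 - level)


def mm_to_pixels(length_mm: float, dpi: int = 300) -> int:
    """Convert a length in millimetres to a whole number of pixels."""
    return round(length_mm / MM_PER_INCH * dpi)


if __name__ == "__main__":
    print(level_range(True))  # (15.0, 85.0)
    print(level_range(10))    # (10.0, 90.0)
    print(mm_to_pixels(9))    # 106, the horizontal margin at 300 DPI
    print(mm_to_pixels(6))    # 71, the vertical margin at 300 DPI
```

With the values from the example configuration, the 9 mm and 6 mm margins correspond to roughly 106 and 71 pixels at 300 DPI.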
On the NAS
Docker support is required. Personally, I use a Synology DiskStation DS918+, and you can get the *.syno.json files to configure your Docker services.
Otherwise use:
docker run --restart=unless-stopped \
    --volume=<scan_folder>:/source \
    --volume=<consume_folder>:/destination \
    sbrunner/scan-to-paperless
You can set the environment variable PROGRESS to TRUE to get all the intermediate images.
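As an illustration of where the PROGRESS variable goes, here is a minimal sketch that assembles the Docker command line as an argument list. The function name docker_command and the example folder paths are assumptions made for this example; only the image name, the two mount points and the PROGRESS variable come from the text above.

```python
from typing import List


def docker_command(scan_folder: str, consume_folder: str, progress: bool = False) -> List[str]:
    """Build the `docker run` argument list for the processing container."""
    command = [
        "docker",
        "run",
        "--restart=unless-stopped",
        f"--volume={scan_folder}:/source",
        f"--volume={consume_folder}:/destination",
    ]
    if progress:
        # Ask the container to keep all the intermediate images.
        command.append("--env=PROGRESS=TRUE")
    # The image name must come after all the options.
    command.append("sbrunner/scan-to-paperless")
    return command


if __name__ == "__main__":
    # Hypothetical folders, replace them with your own.
    print(" ".join(docker_command("/volume1/scan", "/volume1/consume", progress=True)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when a folder name contains spaces.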
Repository link
You should find a way to synchronise or share the scan folder between your desktop and your NAS.
You should also link the consume folder to paperless-ng, probably just by using the same folder.
Usage
- Use the scan command to scan your documents.
- The document is transferred to your NAS (I use Syncthing).
- The documents will be processed on the NAS.
- Use scan-process-status to know the status of your documents.
- Validate your documents.
- If you're happy with the result, remove the REMOVE_TO_CONTINUE file. (To restart the process, remove one of the generated images; to cancel the job, just remove the folder.)
- The process will continue its job and import the document into paperless-ng.
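The validation step relies on a marker file, so it is easy to script. The sketch below is not the scan-process-status implementation. It assumes that every job is a sub-folder of the scan folder and that a job waiting for validation contains a REMOVE_TO_CONTINUE file; the function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

MARKER = "REMOVE_TO_CONTINUE"


def jobs_waiting_for_validation(scan_folder: Path) -> List[str]:
    """Return the names of the job folders that still contain the marker file."""
    return sorted(
        entry.name
        for entry in scan_folder.iterdir()
        if entry.is_dir() and (entry / MARKER).is_file()
    )


def validate(job_folder: Path) -> None:
    """Accept the generated images: removing the marker lets the process continue."""
    (job_folder / MARKER).unlink()


if __name__ == "__main__":
    # Demonstration on a temporary folder instead of a real scan folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "job-a").mkdir()
        (root / "job-b").mkdir()
        (root / "job-a" / MARKER).touch()
        print(jobs_waiting_for_validation(root))  # ['job-a']
        validate(root / "job-a")
        print(jobs_waiting_for_validation(root))  # []
```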
Nice features
Double-sided scanning
- Put your sheets on the Automatic Document Feeder.
- Run scan with the option --double-sided.
- Press enter to start scanning the first side of all sheets.
- Put all your sheets on the Automatic Document Feeder again, without turning them.
The scan utility will rotate and reorder all the sheets to get a good document.
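The reordering can be sketched as follows. This is an illustration, not the code used by the scan command: it assumes that the second pass delivers the back sides in reverse sheet order (the last sheet comes first), and it leaves out the rotation. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_double_sided(fronts: Sequence[T], backs_as_scanned: Sequence[T]) -> List[T]:
    """Merge the front sides and the back sides into reading order."""
    if len(fronts) != len(backs_as_scanned):
        raise ValueError("both passes must contain the same number of sheets")
    # Assumption: the back sides were scanned from the last sheet to the first.
    backs = list(reversed(backs_as_scanned))
    pages: List[T] = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


if __name__ == "__main__":
    fronts = ["page-1", "page-3", "page-5"]
    backs = ["page-6", "page-4", "page-2"]
    print(interleave_double_sided(fronts, backs))
    # ['page-1', 'page-2', 'page-3', 'page-4', 'page-5', 'page-6']
```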
Credit card scanning
The option --append-credit-card will append all the sheets vertically to have both faces of the credit card on the same page.
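Appending images vertically means that the page is as wide as the widest image and as high as the sum of the heights. The sketch below only computes that layout, without any image library; it is an illustration under that assumption, and the function name and the example sizes are hypothetical.

```python
from typing import List, Sequence, Tuple


def stack_vertically(sizes: Sequence[Tuple[int, int]]) -> Tuple[Tuple[int, int], List[int]]:
    """Compute the page size and the vertical offset of each image.

    `sizes` holds one (width, height) pair in pixels per scanned image.
    """
    page_width = max((width for width, _ in sizes), default=0)
    offsets: List[int] = []
    y = 0
    for _, height in sizes:
        offsets.append(y)  # this image starts where the previous one ends
        y += height
    return (page_width, y), offsets


if __name__ == "__main__":
    # Two example scans of 1011 x 638 pixels, one per face of the card.
    print(stack_vertically([(1011, 638), (1011, 638)]))
    # ((1011, 1276), [0, 638])
```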
Assisted split
- Do your scan as usual with the extra option --assisted-split.
- After the process does its first pass, you will have images with lines and numbers. The lines represent the detected potential splits of the image; their length indicates the strength of the detection. In your config you will have something like:
assisted_split:
  - destinations:
      - 4 # Page number of the left part of the image
      - 1 # Same for the right part of the image
    image: image-1.png # Name of the image
    limits:
      - margin: 0 # Margin around the split
        name: 0 # Number visible on the generated image
        value: 375 # The position of the split (can be manually edited)
        vertical: true # Will split the image vertically
      - ...
    source: /source/975468/7-assisted-split/image-1.png
  - ...
Edit your config file; you should have one more destination than limits. If you write a destination like 2.1, it means that it will be the first part of page 2, and 2.2 will be the second part.
- Delete the file REMOVE_TO_CONTINUE.
- After the process does its next pass, you will have the final generated images.
- If it's OK, delete the file REMOVE_TO_CONTINUE.
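The rule "one more destination than limits" can be illustrated with a small sketch that maps each destination to the horizontal range it receives. It is not the project's implementation: the function name is hypothetical, the margin is ignored for simplicity, and the image width of 750 pixels is an assumed value.

```python
from typing import List, Sequence, Tuple, Union

Destination = Union[int, float, str]


def split_ranges(
    width: int, limits: Sequence[int], destinations: Sequence[Destination]
) -> List[Tuple[Destination, Tuple[int, int]]]:
    """Pair every destination with the (left, right) pixel range it receives."""
    if len(destinations) != len(limits) + 1:
        raise ValueError("expected one more destination than limits")
    # The image borders and the split positions delimit the parts.
    edges = [0, *sorted(limits), width]
    return [
        (destination, (edges[index], edges[index + 1]))
        for index, destination in enumerate(destinations)
    ]


if __name__ == "__main__":
    # One vertical split at 375 pixels: the left part becomes page 4, the right part page 1.
    print(split_ranges(750, [375], [4, 1]))
    # [(4, (0, 375)), (1, (375, 750))]
```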