Receipt and bill parser using OCR
receiptparser
Summary
A receipt and bill parser written in Python. It can be used as a Python module or as a CLI tool.
It was originally based on receipt-parser, but has since been almost completely rewritten.
So far, only German receipts are supported, but other countries can be added with a simple YAML configuration file.
Installation
pip3 install receiptparser
CLI Usage
A simple example to read all images (.jpg) from a directory and print the recognized data to stdout:
receiptparser tests/data/germany/img/
You can customize the output as follows:
receiptparser -v0 --format "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg" tests/data/germany/img/
In this case, -v0 suppresses all output except what you specify with the --format FORMAT parameter. FORMAT is a Python format string, using the syntax of str.format as described in the Python documentation.
The following values can be used in the format string:
- market: The recognized name of the business
- postal: The recognized postal code of the business
- date: The recognized date of the bill or receipt
- sum: The recognized total amount of the bill or receipt, in whatever currency it uses (Euro, dollar, etc.)
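To illustrate how such a format string is filled in, here is a minimal sketch using plain `str.format` with made-up values. The field values are hypothetical, and the types are assumptions: `date` is taken to be a `datetime.date` (which the `%Y-%m-%d` spec in the example above suggests) and `sum` a string. The tool's internal types may differ.

```python
from datetime import date

# Hypothetical values standing in for what the parser might recognize.
fields = {
    "market": "ExampleMart",
    "postal": "12345",
    "date": date(2020, 3, 14),
    "sum": "12.99",
}

fmt = "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg"

# str.format hands the part after ':' to the value's __format__ method,
# so a date object accepts strftime-style codes here.
name = fmt.format(**fields)
print(name)  # 2020-03-14 - ExampleMart - 12345 - 12.99.jpg
```

A format string ending in `.jpg`, as above, yields one descriptive file name per receipt, which can be handy for renaming scans.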
Syntax
usage: receiptparser [-h] [-c CONFIG] [--config-file CONFIG_FILE] [-s] [-t TESSERACT] [-f FORMAT] [-v {0,1,2}] input

positional arguments:
  input                 file or directory from which images will be read

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        built-in config to use
  --config-file CONFIG_FILE
                        like -c, but point to a file instead
  -t TESSERACT, --tesseract TESSERACT
                        output directory for OCR recognized text (default is to discard)
  -f FORMAT, --format FORMAT
                        format of the recognized output. default is pretty-printing
  -v {0,1,2}, --verbosity {0,1,2}
                        increase output verbosity
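If you want to drive the CLI from a script, the options above can be assembled programmatically. The following is a sketch only: `build_command` and `run_receiptparser` are hypothetical helper names, not part of receiptparser, and they use only the flags listed in the usage text. Actually running the command assumes that `receiptparser` is installed and on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(input_path: str,
                  fmt: Optional[str] = None,
                  verbosity: Optional[int] = None,
                  config: Optional[str] = None) -> List[str]:
    """Assemble a receiptparser command line from the documented options."""
    cmd = ["receiptparser"]
    if config is not None:
        cmd += ["--config", config]
    if fmt is not None:
        cmd += ["--format", fmt]
    if verbosity is not None:
        # The usage text only allows the levels 0, 1 and 2.
        if verbosity not in (0, 1, 2):
            raise ValueError("verbosity must be 0, 1 or 2")
        cmd += ["--verbosity", str(verbosity)]
    cmd.append(input_path)
    return cmd


def run_receiptparser(cmd: List[str]) -> str:
    """Run the command and return its stdout (requires receiptparser on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_command("tests/data/germany/img/", fmt="{market} - {sum}", verbosity=0)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the braces and spaces in the format string.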
Python usage
from receiptparser.config import read_config
from receiptparser.parser import process_receipt

# Load a parser configuration from a YAML file
config = read_config('my_config.yml')

# Run OCR on a single image and parse the result
receipt = process_receipt(config, "my_receipt.jpg", out_dir=None, verbosity=0)

# Print the recognized fields
print("Filename: ", receipt.filename)
print("Market: ", receipt.market)
print("Postal code:", receipt.postal)
print("Date: ", receipt.date)
print("Amount: ", receipt.sum)
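Once several receipts have been parsed, you may want to collect the results in a table. Below is a minimal sketch that writes receipt-like objects to CSV. `FakeReceipt` is a hypothetical stand-in that only mirrors the attribute names shown above, so the example runs without OCR. It also assumes that unrecognized fields are `None` and that `date` is a `datetime.date`, neither of which is confirmed here.

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeReceipt:
    """Hypothetical stand-in with the attributes used in the example above."""
    filename: str
    market: Optional[str]
    postal: Optional[str]
    date: Optional[dt.date]
    sum: Optional[str]


def receipts_to_csv(receipts: Iterable) -> str:
    """Render receipt-like objects as CSV text, leaving unknown fields empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "market", "postal", "date", "sum"])
    for r in receipts:
        writer.writerow([
            r.filename,
            r.market or "",
            r.postal or "",
            r.date.isoformat() if r.date else "",
            r.sum or "",
        ])
    return buffer.getvalue()


sample = [
    FakeReceipt("a.jpg", "ExampleMart", "12345", dt.date(2020, 3, 14), "12.99"),
    FakeReceipt("b.jpg", None, None, None, None),
]
print(receipts_to_csv(sample))
```

Because the function only reads attributes, it should also work with the objects returned by `process_receipt`, provided they expose the same attribute names with compatible types.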
Built Distribution
File details
Details for the file receiptparser-1.0.4-py2.py3-none-any.whl.
File metadata
- Download URL: receiptparser-1.0.4-py2.py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | a0500e4692f95c83b84ea286ef35e288548fcb94b43e3afc36dfa31df70e4b8a
MD5 | 4aaef0f3e322e29ae43ec6892774bfc5
BLAKE2b-256 | 7aedaf893f13a5dc3e9bbe012741fca6a05c40d586c8e7fc14154ed170ecb8ac