
Receipt and bill parser using OCR

receiptparser

Summary

A receipt and bill parser written in Python. It can be used as a Python module or as a CLI tool.

It was originally based on receipt-parser, but has since been almost entirely rewritten.

So far, only German receipts are supported, but support for other countries can be added with a simple YAML configuration file.

Installation

pip3 install receiptparser

CLI Usage

The following example reads all images (.jpg) from a directory and prints the recognized data to stdout:

receiptparser tests/data/germany/img/

You can customize the output as follows:

receiptparser -v0 --format "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg" tests/data/germany/img/

Here, -v0 suppresses all output except what you specify with --format FORMAT. FORMAT is a Python format string (the syntax used by str.format). The following fields can be used in it:

  • market: The recognized name of the business
  • postal: The recognized postal code of the business
  • date: The recognized date of the bill or receipt
  • sum: The total amount of the bill or receipt (in dollars, euros, or any other currency)
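To see how a FORMAT string is expanded, the sketch below applies the same pattern with Python's str.format to hand-written sample values. The values are hypothetical; it assumes date is a datetime.date (which the %Y-%m-%d spec requires), and the actual types receiptparser uses for postal and sum may differ.

```python
import datetime

# Hypothetical values standing in for what receiptparser recognizes on a receipt.
# The real types of postal and sum may differ from the strings used here.
fields = {
    "date": datetime.date(2019, 5, 17),  # must be a date/datetime for %Y-%m-%d to work
    "market": "Rewe",
    "postal": "12345",
    "sum": "12.34",
}

fmt = "{date:%Y-%m-%d} - {market} - {postal} - {sum}.jpg"

# str.format hands "%Y-%m-%d" to date.__format__, which delegates to strftime
result = fmt.format(**fields)
print(result)  # → 2019-05-17 - Rewe - 12345 - 12.34.jpg
```

Trying a FORMAT string this way before passing it to the CLI catches misspelled field names early, since str.format raises a KeyError for any field it cannot find.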

Syntax

usage: receiptparser [-h] [-c CONFIG] [--config-file CONFIG_FILE] [-s] [-t TESSERACT] [-f FORMAT] [-v {0,1,2}] input

positional arguments:
  input                 file or directory from which images will be read

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        built-in config to use
  --config-file CONFIG_FILE
                        like -c, but point to a file instead
  -s, --sharpen         whether to sharpen the image before OCR
  -t TESSERACT, --tesseract TESSERACT
                        output directory for OCR recognized text (default is to discard)
  -f FORMAT, --format FORMAT
                        format of the recognized output. default is pretty-printing
  -v {0,1,2}, --verbosity {0,1,2}
                        increase output verbosity
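When calling the CLI from another script, it can help to assemble the argument list programmatically. The helper below is hypothetical (build_receiptparser_args is not part of receiptparser); it only emits the flags listed in the usage text above, and the resulting list can be passed to subprocess.run.

```python
import shlex
from typing import List, Optional


def build_receiptparser_args(
    input_path: str,
    config: Optional[str] = None,
    config_file: Optional[str] = None,
    sharpen: bool = False,
    tesseract_dir: Optional[str] = None,
    fmt: Optional[str] = None,
    verbosity: Optional[int] = None,
) -> List[str]:
    """Build an argv list for the receiptparser CLI (hypothetical helper)."""
    args = ["receiptparser"]
    if config is not None:
        args += ["-c", config]  # built-in config name
    if config_file is not None:
        args += ["--config-file", config_file]  # path to a YAML config file
    if sharpen:
        args.append("-s")  # sharpen the image before OCR
    if tesseract_dir is not None:
        args += ["-t", tesseract_dir]  # keep the OCR text in this directory
    if fmt is not None:
        args += ["-f", fmt]  # Python format string for the output
    if verbosity is not None:
        if verbosity not in (0, 1, 2):
            raise ValueError("verbosity must be 0, 1 or 2")
        args += ["-v", str(int(verbosity))]
    args.append(input_path)  # positional argument: file or directory
    return args


args = build_receiptparser_args(
    "tests/data/germany/img/", sharpen=True, fmt="{date:%Y-%m-%d} - {sum}", verbosity=0
)
print(shlex.join(args))  # shell-quoted command line, for logging

# To actually run it (requires receiptparser to be installed):
# import subprocess
# completed = subprocess.run(args, capture_output=True, text=True, check=True)
```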

Python usage

from receiptparser.config import read_config
from receiptparser.parser import process_receipt

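# Load a parser configuration from a YAML file
config = read_config('my_config.yml')

# Run OCR on one image and extract its data. The keyword arguments appear to
# mirror the CLI options -s (sharpen), -t (out_dir for the OCR text) and -v (verbosity).
receipt = process_receipt(config, "my_receipt.jpg", sharpen=False, out_dir=None, verbosity=0)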

print("Filename:   ", receipt.filename)
print("Market:     ", receipt.market)
print("Postal code:", receipt.postal)
print("Date:       ", receipt.date)
print("Amount:     ", receipt.sum)
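The CLI can process a whole directory, and the same can be done from Python. The sketch below is a hypothetical helper, not part of receiptparser: it loops over the .jpg files in a directory, calls process_receipt with the same arguments as in the example above, and writes the recognized fields to CSV. The non-recursive, case-insensitive .jpg filter is an assumption and may not match exactly what the CLI's directory mode does.

```python
import csv
import os
from typing import Iterable, List, TextIO

# Receipt attributes shown in the Python usage example
FIELDS = ["filename", "market", "postal", "date", "sum"]


def find_receipt_images(directory: str) -> List[str]:
    """Return sorted paths of .jpg files directly inside directory (assumed filter)."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".jpg")
    )


def receipts_to_csv(receipts: Iterable, out: TextIO) -> None:
    """Write a header row, then one row per receipt; None values become empty cells."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for receipt in receipts:
        writer.writerow([getattr(receipt, field) for field in FIELDS])


def process_directory(config_path: str, directory: str, out: TextIO) -> None:
    # Imported here so the helpers above work without receiptparser installed
    from receiptparser.config import read_config
    from receiptparser.parser import process_receipt

    config = read_config(config_path)
    receipts = (
        process_receipt(config, path, sharpen=False, out_dir=None, verbosity=0)
        for path in find_receipt_images(directory)
    )
    receipts_to_csv(receipts, out)


# Example (requires receiptparser and a config file):
# import sys
# process_directory("my_config.yml", "tests/data/germany/img/", sys.stdout)
```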

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

receiptparser-1.0.3-py2.py3-none-any.whl (10.5 kB)

File details

Details for the file receiptparser-1.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: receiptparser-1.0.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 10.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2

File hashes

Hashes for receiptparser-1.0.3-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       c59bff717ff0d5cc222070d8c33f7a2d12c879e682275bd3c24a1eb443e81c69
MD5          d02e9af78d9297fe5199b3fe6df155e9
BLAKE2b-256  e011a7c4dba86690dbac96f7c8660377e3bc0c776f3692cc9a160be149eba884
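To check that a downloaded wheel matches the SHA256 digest listed above, the file can be hashed locally. This is a minimal sketch using only the standard library; the function names are illustrative and not part of receiptparser.

```python
import hashlib

# SHA256 digest published for receiptparser-1.0.3-py2.py3-none-any.whl
EXPECTED_SHA256 = "c59bff717ff0d5cc222070d8c33f7a2d12c879e682275bd3c24a1eb443e81c69"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()


# Example, after fetching the wheel (e.g. with: pip download receiptparser==1.0.3 --no-deps):
# print(verify_wheel("receiptparser-1.0.3-py2.py3-none-any.whl"))
```

Alternatively, pip hash prints the SHA256 digest of a local file directly.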
