A tool for fixing BibTeX reference list with DBLP API.

reffix: Fixing BibTeX reference list with DBLP API 🔧

➡️ Reffix is a simple tool for improving the BibTeX list of references in your paper. It can fix several common errors, such as incorrect capitalization, missing URLs, or citing arXiv preprints instead of the published versions.

➡️ Reffix queries the DBLP API, so it does not require any local database of papers.

➡️ Reffix uses a conservative approach to keep your bibliography valid.

➡️ The tool is developed with NLP papers in mind, but it can be used on any BibTeX list of references containing computer science papers present on DBLP.

Quickstart

👉️ You can now install reffix from PyPI:

pip install -U reffix
reffix [BIB_FILE]

See the Installation and Usage sections below for more details.

Example

Before the update (Google Scholar):

  • ❎ arXiv version
  • ❎ no URL
  • ❎ capitalization lost
{
    'ENTRYTYPE': 'article',
    'ID': 'duvsek2020evaluating',
    'author': 'Du{\\v{s}}ek, Ond{\\v{r}}ej and Kasner, Zden{\\v{e}}k',
    'journal': 'arXiv preprint arXiv:2011.10819',
    'title': 'Evaluating semantic accuracy of data-to-text generation with '
             'natural language inference',
    'year': '2020'
}

After the update (DBLP + preserving capitalization):

  • ✔️ ACL version
  • ✔️ URL included
  • ✔️ capitalization preserved
{
    'ENTRYTYPE': 'inproceedings',
    'ID': 'duvsek2020evaluating',
    'author': 'Ondrej Dusek and\nZdenek Kasner',
    'bibsource': 'dblp computer science bibliography, https://dblp.org',
    'biburl': 'https://dblp.org/rec/conf/inlg/DusekK20.bib',
    'booktitle': 'Proceedings of the 13th International Conference on Natural '
                 'Language\n'
                 'Generation, {INLG} 2020, Dublin, Ireland, December 15-18, '
                 '2020',
    'editor': 'Brian Davis and\n'
              'Yvette Graham and\n'
              'John D. Kelleher and\n'
              'Yaji Sripada',
    'pages': '131--137',
    'publisher': 'Association for Computational Linguistics',
    'timestamp': 'Mon, 03 Jan 2022 00:00:00 +0100',
    'title': '{Evaluating} {Semantic} {Accuracy} of {Data-to-Text} '
             '{Generation} with {Natural} {Language} {Inference}',
    'url': 'https://aclanthology.org/2020.inlg-1.19/',
    'year': '2020'
}

Main features

  • Completing references – reffix queries the DBLP API with the paper title and the first author's name to find a complete reference for each entry in the BibTeX file.
  • Replacing arXiv preprints – reffix can try to replace arXiv preprints with the version published at a conference or in a journal whenever possible.
  • Preserving titlecase – in order to preserve correct casing, reffix wraps individual uppercased words in the paper title in curly brackets.
  • Conservative approach:
    • the original .bib file is preserved
    • no references are deleted
    • papers are updated only if the title and at least one of the authors match
    • the version of the paper corresponding to the original entry should be selected first
  • Interactive mode – you can confirm every change manually.
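
The capitalization and matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not reffix's actual implementation: the function names `protect_capitalization`, `normalize`, and `is_safe_match` are hypothetical, and the sketch assumes author names have already been reduced to plain ASCII surnames.

```python
import re


def protect_capitalization(title: str) -> str:
    """Wrap each word containing an uppercase letter in curly brackets."""
    words = title.split()
    return " ".join(
        "{" + word + "}"
        if any(ch.isupper() for ch in word) and not word.startswith("{")
        else word
        for word in words
    )


def normalize(text: str) -> str:
    """Lowercase the text and collapse every non-alphanumeric run to a space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_safe_match(orig_title, orig_authors, cand_title, cand_authors) -> bool:
    """Accept a candidate only if the title and at least one author match."""
    same_title = normalize(orig_title) == normalize(cand_title)
    shared = {normalize(a) for a in orig_authors} & {normalize(a) for a in cand_authors}
    return same_title and bool(shared)


protected = protect_capitalization(
    "Evaluating Semantic Accuracy of Data-to-Text Generation "
    "with Natural Language Inference"
)
print(protected)
```

Wrapping whole words rather than single letters keeps the title readable in the .bib file while still preventing BibTeX styles from lowercasing it.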

The package uses bibtexparser for parsing the BibTeX files, the DBLP API for updating the references, and the titlecase package for optional extra titlecasing.
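
For readers curious what a DBLP lookup looks like, the sketch below builds a request for the public DBLP publication search endpoint using only the standard library. How reffix itself composes the query string is not documented here, so joining the title and the first author's name with a space is an assumption, as are the helper names `build_query_url` and `search_dblp` and the default of five hits.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DBLP_SEARCH_URL = "https://dblp.org/search/publ/api"


def build_query_url(title: str, first_author: str, max_hits: int = 5) -> str:
    """Build a DBLP publication search URL that asks for JSON results."""
    params = {"q": f"{title} {first_author}", "format": "json", "h": max_hits}
    return f"{DBLP_SEARCH_URL}?{urlencode(params)}"


def search_dblp(title: str, first_author: str) -> list:
    """Fetch candidate records; returns the 'info' dict of each hit (may be empty)."""
    with urlopen(build_query_url(title, first_author), timeout=10) as response:
        payload = json.load(response)
    hits = payload["result"]["hits"].get("hit", [])
    return [hit["info"] for hit in hits]


url = build_query_url("Evaluating semantic accuracy", "Kasner")
print(url)
```

Only `build_query_url` runs when the snippet is executed; `search_dblp` performs a network request and should be called sparingly.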

Installation

You can install reffix from PyPI:

pip install reffix

For development, you can install the package in editable mode:

pip install -e .[dev]

Usage

Run the script with the .bib file as the first argument:

reffix [IN_BIB_FILE]

By default, reffix runs in batch mode, saves the output to a file with an extra ".fixed" suffix, and keeps arXiv versions.
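
As a rough illustration of the default naming scheme, the helper below derives the output path from the input path. The function name `default_output_path` is hypothetical, and the sketch assumes that the ".fixed" suffix is inserted before the final ".bib" extension, as the `<original_name>.fixed.bib` pattern in the flags table suggests.

```python
from pathlib import Path


def default_output_path(bib_file: str) -> Path:
    """Return <original_name>.fixed.bib next to the input file."""
    path = Path(bib_file)
    # path.stem drops only the last extension, e.g. "refs.bib" -> "refs"
    return path.with_name(path.stem + ".fixed.bib")


print(default_output_path("paper/refs.bib"))
```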

The following command will run reffix in interactive mode, save the outputs to a custom file, and replace arXiv versions:

reffix [IN_BIB_FILE] -o [OUT_BIB_FILE] -i -a

Flags

| Short | Long | Description |
| --- | --- | --- |
| -o | --out | Output filename. If not specified, the default filename <original_name>.fixed.bib is used. |
| -i | --interact | Interactive mode. Every replacement of an entry with a DBLP result has to be confirmed manually. |
| -a | --replace-arxiv | Replace arXiv versions. If a non-arXiv version (e.g. published at a conference or in a journal) is found on DBLP, it is preferred to the arXiv version. |
| -t | --force-titlecase | Force titlecase for all entries. The titlecase package is used to fix the casing of titles that are not titlecased. (Note that the capitalization rules used by the package may differ slightly.) |
| -s | --sort-by | Multiple sort conditions compatible with bibtexparser.BibTexWriter, applied in the provided order. Example: -s ENTRYTYPE year sorts the list by the entry type as its primary key and year as its secondary key. ID can be used to refer to the BibTeX key. The default None value keeps the original order of the entries. |
| | --no-publisher | Suppress publishers in conference papers and journals (still kept for books). |
| | --process-conf-loc | Parse conference dates and locations, remove them from proceedings names, and store locations under address. |
| | --no-formatting | Disable automatic BibTeX formatting. |
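
To make the primary and secondary key semantics of -s concrete, here is a plain-Python sketch that sorts entry dictionaries by several fields in order. It only mimics the described behaviour and does not call bibtexparser; the function name `sort_entries` and the sample entries are made up for illustration.

```python
def sort_entries(entries, sort_by=None):
    """Sort entries by the given fields in order; None keeps the original order."""
    if not sort_by:
        return list(entries)
    # Missing fields sort as empty strings so incomplete entries do not fail.
    return sorted(entries, key=lambda entry: tuple(entry.get(f, "") for f in sort_by))


entries = [
    {"ENTRYTYPE": "inproceedings", "ID": "b2021", "year": "2021"},
    {"ENTRYTYPE": "article", "ID": "c2022", "year": "2022"},
    {"ENTRYTYPE": "inproceedings", "ID": "a2020", "year": "2020"},
]

# Equivalent in spirit to: -s ENTRYTYPE year
ordered_ids = [e["ID"] for e in sort_entries(entries, ["ENTRYTYPE", "year"])]
print(ordered_ids)
```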

Notes

Although reffix uses a conservative approach, it provides no guarantees that the output references are actually correct.

If you want to make sure that reffix does not introduce any unwanted changes, please use the interactive mode (flag -i).

The tool depends on the DBLP API, which may change at any time. I will try to update the script if necessary, but it may still occasionally break. I welcome any pull requests with improvements.

Please be considerate regarding the DBLP API and do not generate high traffic for their servers :-)
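
If you script your own DBLP lookups around reffix, one simple way to stay polite is to enforce a minimum delay between requests. The `Throttle` class below is a hypothetical sketch and not part of reffix; the one-second interval is an arbitrary assumption, not a limit published by DBLP.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)
throttle.wait()  # the first call returns immediately
```

Calling `throttle.wait()` before each request spaces the calls out without any extra dependencies.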

Contact

For any questions or suggestions, send an e-mail to kasner@ufal.mff.cuni.cz.
