Skip to main content

Small library for extracting references used in scholarly communication.

Project description

https://travis-ci.org/inspirehep/refextract.svg?branch=master https://coveralls.io/repos/github/inspirehep/refextract/badge.svg?branch=master

About

A small library for extracting references used in scholarly communication.

Install

$ pip install refextract

Usage

To get structured information from a publication reference:

>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}

To extract references from a PDF:

>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

To extract directly from a URL:

>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

Notes

refextract depends on pdftotext.

Acknowledgments

refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:

  • Alessio Deiana

  • Federico Poli

  • Gerrit Rindermann

  • Graham R. Armstrong

  • Grzegorz Szpura

  • Jan Aage Lavik

  • Javier Martin Montull

  • Micha Moskovic

  • Samuele Kaplun

  • Thorsten Schwander

  • Tibor Simko

License

GPLv2

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refextract-1.0.0.tar.gz (5.9 MB view details)

Uploaded Source

File details

Details for the file refextract-1.0.0.tar.gz.

File metadata

  • Download URL: refextract-1.0.0.tar.gz
  • Upload date:
  • Size: 5.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.7

File hashes

Hashes for refextract-1.0.0.tar.gz
Algorithm Hash digest
SHA256 11e34aeab3571d6aca7178ff44e8e96f14e97b37903af9287defad936b7883a1
MD5 d51e654040024f34a012dbcde28d77db
BLAKE2b-256 fd076f3c3f0e4f2733d63ae3e236f3b5eb8dd78f2a554cad705dbdbaddb0b45b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page