Cleans the LaTeX code of your paper to submit to arXiv.

arxiv_latex_cleaner

This tool cleans the LaTeX code of your paper for submission to arXiv. From a folder containing all your code, e.g. /path/to/latex/, it creates a new folder /path/to/latex_arXiv/ that is ready to ZIP and upload to arXiv.

Example call:

arxiv_latex_cleaner /path/to/latex --im_size 500 --images_whitelist='{"images/im.png":2000}'

Or simply from a config file:

arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
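
When the first call above is scripted, quoting the whitelist dictionary by hand is error-prone. The following is a minimal sketch, not part of the tool, that builds the same argument list in Python and lets json.dumps produce the dictionary string. The folder and image path are the placeholders from the example call.

```python
import json
import shlex

# Placeholder values taken from the example call above.
input_folder = "/path/to/latex"
whitelist = {"images/im.png": 2000}

# Build the argument list; json.dumps yields the dictionary format
# that --images_whitelist expects.
cmd = [
    "arxiv_latex_cleaner",
    input_folder,
    "--im_size", "500",
    "--images_whitelist=" + json.dumps(whitelist),
]

# Show a shell-safe version of the command (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

The list form can be passed unchanged to subprocess.run(cmd, check=True), which avoids shell quoting altogether.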

Installation:

pip install arxiv-latex-cleaner
Note: arxiv_latex_cleaner is only compatible with Python 3 or later.

Alternatively, you can download the source code:

git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help

And install as a command-line program directly from the source code:

python setup.py install

Main features:

Privacy-oriented

  • Removes all auxiliary files (.aux, .log, .out, etc.).
  • Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be). These also include \begin{comment}\end{comment} and \iffalse\fi environments.
  • Optionally removes user-defined commands entered with commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
  • Optionally allows you to define custom regex replacement rules through a cleaner_config.yaml file.
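
To illustrate what comment removal involves, here is a minimal sketch using Python's standard re module. It is not the tool's actual implementation: the function name strip_comment is hypothetical, and the sketch ignores verbatim-like environments as well as the \begin{comment} and \iffalse blocks mentioned above. It assumes that keeping the bare % is desirable, because in LaTeX a trailing % also suppresses the space produced by the line break.

```python
import re

# A '%' starts a comment unless it is escaped as '\%'. An even number of
# backslashes before '%' (e.g. '\\%') means the '%' is NOT escaped.
_COMMENT_RE = re.compile(r'(?<!\\)((?:\\\\)*)%.*')


def strip_comment(line):
    """Return the line with its comment text removed (hypothetical helper)."""
    # Lines that contain only a comment are dropped entirely.
    if line.lstrip().startswith('%'):
        return ''
    # Keep any preceding backslash pairs and the '%' itself; drop the text.
    return _COMMENT_RE.sub(lambda m: m.group(1) + '%', line)


print(strip_comment('x = 1 % secret note'))
print(strip_comment(r'50\% of cases'))
```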

Size-oriented

There is a 50MB limit on arXiv submissions. To help your submission fit, the tool:

  • Removes all unused .tex files (those that are not in the root and not included in any other .tex file).
  • Removes all unused images that take up space (those that are not actually included in any used .tex file).
  • Optionally resizes all images to im_size pixels, to reduce the size of the submission. You can whitelist some images to skip the global size using images_whitelist.
  • Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can whitelist some PDFs to skip the global size using images_whitelist.
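
The im_size value refers to the longest side of the image, in pixels. The arithmetic can be sketched as follows. The function name target_size is hypothetical, and the sketch assumes that images already smaller than im_size are left untouched and that the aspect ratio is preserved. It only computes dimensions and does not touch any image file.

```python
def target_size(width, height, im_size):
    """Scale (width, height) so that the longest side equals im_size.

    Assumption: images whose longest side is already within im_size
    are returned unchanged.
    """
    longest = max(width, height)
    if longest <= im_size:
        return width, height
    scale = im_size / longest
    # Round to whole pixels and never return a zero-sized side.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(target_size(4000, 3000, 500))  # (500, 375)
print(target_size(300, 200, 500))    # (300, 200)
```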

TikZ picture source code concealment

To prevent the upload of tikzpicture source code or raw simulation data, this feature:

  • Replaces the tikzpicture environment \begin{tikzpicture} ... \end{tikzpicture} with the respective \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
  • Requires externally compiled TikZ pictures as .pdf files in folder EXTERNAL_TIKZ_FOLDER. See section 53 in the PGF/TikZ manual on TikZ picture externalization.
  • Only replaces environments with preceding \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}) where the externalized picture_name.pdf filename matches picture_name.
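
The replacement rule described above can be sketched with Python's standard re module. This is an illustration under stated assumptions, not the tool's own code: the function conceal_tikz, the folder name ext_tikz, and the available_pdfs set are hypothetical, and nested tikzpicture environments are not handled.

```python
import re

# Matches \tikzsetnextfilename{name} directly followed by a tikzpicture.
TIKZ_RE = re.compile(
    r'\\tikzsetnextfilename\{(?P<name>[^}]+)\}\s*'
    r'\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}',
    re.DOTALL,
)


def conceal_tikz(text, external_folder, available_pdfs):
    """Swap named tikzpicture environments for \\includegraphics calls."""
    def swap(match):
        name = match.group('name')
        # Leave the environment alone if no matching PDF was externalized.
        if name + '.pdf' not in available_pdfs:
            return match.group(0)
        return '\\includegraphics{' + external_folder + '/' + name + '.pdf}'
    return TIKZ_RE.sub(swap, text)


doc = r'''\tikzsetnextfilename{fig1}
\begin{tikzpicture}
\draw (0,0) -- (1,1);
\end{tikzpicture}
\begin{tikzpicture}
\draw (0,0) circle (1);
\end{tikzpicture}'''

cleaned = conceal_tikz(doc, 'ext_tikz', {'fig1.pdf'})
print(cleaned)
```

In this example only the first picture is replaced, because the second one has no preceding \tikzsetnextfilename command.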

More sophisticated pattern replacement based on regex group captures

Sometimes it is useful to work with a set of custom LaTeX commands when writing a paper. To get rid of them upon arXiv submission, one can simply revert them to plain LaTeX with a regular expression insertion.

{
    "pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
    "insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
    "description" : "Replace figcomp"
}

The pattern above finds every \figcomp{path}{w1}{w2} command and replaces it with \parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}. The insertion template is filled with the named group captures from the pattern. The replacement runs before the \includegraphics commands are processed and the corresponding files are copied, so every figure file referenced after the replacement is copied to the cleaned version. See also cleaner_config.yaml for details on how to specify the patterns.
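
How the named groups fill the insertion template can be sketched with Python's standard re module and str.format. This is a minimal illustration, not the tool's internal code: the helper apply_rule and the sample input are hypothetical. The doubled braces in the insertion are str.format escapes for literal braces.

```python
import re

# Pattern and insertion copied from the rule above, written as raw strings.
PATTERN = (r'(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}'
           r'\s*{\s*(?P<third>.*?)\s*}')
INSERTION = (r'\parbox[c]{{ {second} \linewidth}} {{ \includegraphics'
             r'[width= {third} \linewidth]{{figures/{first} }} }}')


def apply_rule(text, pattern, insertion):
    """Replace each match with the insertion filled from its named groups."""
    def fill(match):
        return insertion.format(**match.groupdict())
    # A function is used as the replacement so that backslashes in the
    # result are inserted literally instead of being read as escapes.
    return re.sub(pattern, fill, text)


source = r'\figcomp{plot.pdf}{0.5}{0.9}'
result = apply_rule(source, PATTERN, INSERTION)
print(result)
```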

Usage:

usage: arxiv_latex_cleaner@v0.1.10 [-h] [--resize_images] [--im_size IM_SIZE]
                                   [--compress_pdf]
                                   [--pdf_im_resolution PDF_IM_RESOLUTION]
                                   [--images_whitelist IMAGES_WHITELIST]
                                   [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                                   [--verbose]
                                   [--config CONFIG_PATH]
                                   input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_whitelist IMAGES_WHITELIST
                        Images (and PDFs) that won't be resized to the default
                        resolution, but the one provided here. Value is pixel
                        for images, and dpi for PDFs, as in --im_size and
                        --pdf_im_resolution, respectively. Format is a
                        dictionary as: '{"path/to/im.jpg": 1000}'
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands.
  --use_external_tikz EXTERNAL_TIKZ_FOLDER
                        Folder (relative to input folder) containing
                        externalized TikZ figures in PDF format.
  --verbose             Enable detailed output.
  --config CONFIG_PATH
                        Read settings from config file, such as "cleaner_config.yaml".

Note

This is not an officially supported Google product.
