
Convert Word documents (docx) to clean HTML and Markdown


# Word-to-HTML/JSON (for EnduringWord commentary)

We aim to build a converter that turns our German translations (see [ICF project](https://bibel-kommentar.de)) of the [Enduring Word](https://enduringword.com/) commentary from Word into HTML/JSON, so that they can be published on Enduring Word and Bibleserver.

#### Important resources:

- (Input) Word files on OneDrive.
- (Output) HTML/JSON at [/examples](https://github.com/VolkerBergen/bible_commentary/tree/main/examples).

#### Platforms:

## ConvertX

Our Python implementation, `convertx`, converts DOCX to HTML (and will be extended to Markdown).
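The internals of `convertx` are not described here. As a rough illustration of what a DOCX-to-HTML conversion involves, the sketch below uses only the standard library: a `.docx` file is a ZIP archive whose body lives in `word/document.xml`, and each Word paragraph (`<w:p>`) holds its text in runs (`<w:t>`). The function name `docx_to_html` is hypothetical, and the sketch ignores headings, formatting, footnotes, tables and images; it is not the `convertx` implementation.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(path: str) -> str:
    """Hypothetical minimal converter: one <p> per non-empty Word paragraph, text only."""
    # A .docx file is a ZIP archive; the main document body is word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    parts = []
    for para in root.iter(W_NS + "p"):
        # Concatenate the text of all runs (<w:t>) in this paragraph.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text.strip():
            # Escape &, <, > so the commentary text is safe inside HTML.
            parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)
```

Note that a single word can be split across several runs (for example after an edit in Word), which is why the runs of a paragraph are joined before escaping.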

Installation: `pip install git+https://github.com/VolkerBergen/convertx`

For multi-language (en/de) support, also install `pycld2`: `pip install pycld2`

CLI (single file):

- `convertx document.docx output.html`
- `convertx document.docx output.md`
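In the single-file commands, the output format appears to follow from the output file name (`.html` or `.md`). One way such a dispatch could work is sketched below, assuming the Word content has already been read into a hypothetical list of `(style, text)` blocks; the names `Block` and `render` and the style labels are illustrative only, not part of `convertx`.

```python
import html
from pathlib import Path
from typing import List, Tuple

# Hypothetical intermediate representation produced by a DOCX reader:
# (style, text) pairs where style is "heading1".."heading6" or "paragraph".
Block = Tuple[str, str]


def render(blocks: List[Block], output_path: str) -> str:
    """Render blocks as HTML or Markdown, chosen by the output file's extension."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in (".html", ".md"):
        raise ValueError(f"unsupported output format: {suffix!r}")

    lines = []
    for style, text in blocks:
        # "heading2" -> level 2; plain paragraphs get level 0.
        level = int(style[len("heading"):]) if style.startswith("heading") else 0
        if suffix == ".html":
            tag = f"h{level}" if level else "p"
            lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
        else:
            # Markdown special characters are not escaped in this sketch.
            lines.append(f"{'#' * level} {text}" if level else text)
    # Markdown needs a blank line between blocks; for HTML a newline is enough.
    return ("\n\n" if suffix == ".md" else "\n").join(lines)
```

Keeping an intermediate block list means the Word file is parsed once, and each output format is just a different renderer over the same data.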

CLI (full directory):

- `cd` into the directory and run `convertx html`.
- `cd` into the directory and run `convertx markdown`.

Additional arguments:

- `convertx html --output-dir=output`
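The directory mode together with `--output-dir` amounts to a batch loop over the `.docx` files in the current directory. The sketch below mirrors only the documented command shape (`html` / `markdown` plus `--output-dir`) using `argparse`; it plans and prints input-to-output paths instead of converting. The helper `plan_outputs` and the skipping of Word's `~$` lock files are our own assumptions, not documented `convertx` behavior.

```python
import argparse
from pathlib import Path
from typing import Dict, Optional

# Output extension per sub-command, mirroring `convertx html` / `convertx markdown`.
EXTENSIONS = {"html": ".html", "markdown": ".md"}


def plan_outputs(directory: Path, fmt: str, output_dir: Optional[str] = None) -> Dict[Path, Path]:
    """Map every .docx file in `directory` to the output path it would be written to."""
    target = directory / output_dir if output_dir else directory
    plan = {}
    for src in sorted(directory.glob("*.docx")):
        # Skip Word's temporary lock files such as "~$document.docx".
        if src.name.startswith("~$"):
            continue
        plan[src] = target / (src.stem + EXTENSIONS[fmt])
    return plan


def main(argv=None):
    parser = argparse.ArgumentParser(description="Batch-convert .docx files in the current directory (sketch).")
    parser.add_argument("format", choices=sorted(EXTENSIONS))
    parser.add_argument("--output-dir", default=None)
    args = parser.parse_args(argv)
    for src, dst in plan_outputs(Path.cwd(), args.format, args.output_dir).items():
        dst.parent.mkdir(parents=True, exist_ok=True)
        # A real tool would convert `src` and write the result to `dst` here.
        print(f"{src.name} -> {dst}")
```

For example, `main(["html", "--output-dir=output"])` run inside a folder of Word files would list each `name.docx` next to its planned `output/name.html`.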

## Project Outline

### Enduring Word

- Point of contact: Andrea Kölsch
- HTML converter `convertx` nearly done.
- tbd: WordPress plugin (e.g. [mammoth](https://de.wordpress.org/plugins/mammoth-docx-converter/))
- tbd: Auto-upload of files to WordPress

### Bibleserver

- Point of contact: Timotheus Israel
- tbd: JSON (using UTF-8, see [/examples](https://github.com/VolkerBergen/bible_commentary/tree/main/examples))
- tbd: Clarify auto-upload possibilities
- tbd: How are chapters divided into single or multiple JSON files?
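The only fixed requirement for the Bibleserver output so far is UTF-8 JSON; the actual structure is defined by the files in `/examples`, and how chapters map to files is still open. The sketch below therefore assumes one file per chapter and uses invented field names (`book`, `chapter`, `sections`) purely for illustration. It shows the one detail that matters for German text: `ensure_ascii=False` keeps umlauts as real characters, so the file must be written explicitly as UTF-8.

```python
import json
from pathlib import Path


def write_chapter_json(path, book, chapter, sections):
    """Write one chapter as UTF-8 JSON (hypothetical schema, one file per chapter)."""
    data = {"book": book, "chapter": chapter, "sections": sections}
    # ensure_ascii=False writes "ö" instead of the escape sequence "\u00f6";
    # encoding="utf-8" makes the file encoding independent of the OS default.
    Path(path).write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
```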



Download files


Source Distribution

convertx-0.0.1.tar.gz (7.7 kB)

File details

Details for the file convertx-0.0.1.tar.gz.

File metadata

- Download URL: convertx-0.0.1.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for convertx-0.0.1.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e087f555680bc4555a23c6e88146980631bc8222afb0d47e7bc767292a763809` |
| MD5 | `3c89437f8c1e08a23ad7175144c046ea` |
| BLAKE2b-256 | `32ca3231bff802ed7b671bd091f077d71c842dede1c233fac48c4845e4439b2a` |
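To check a downloaded archive against the SHA256 digest listed for `convertx-0.0.1.tar.gz`, a few lines of standard-library Python are enough. The helper name `sha256_of` is our own; the file is read in chunks so large archives do not have to fit in memory.

```python
import hashlib

# SHA256 digest published for convertx-0.0.1.tar.gz
EXPECTED_SHA256 = "e087f555680bc4555a23c6e88146980631bc8222afb0d47e7bc767292a763809"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage: `sha256_of("convertx-0.0.1.tar.gz") == EXPECTED_SHA256` should be `True` for an intact download. `pip hash convertx-0.0.1.tar.gz` prints the same kind of digest from the command line.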

