etherpump

Pumping text from etherpads into publications

A command-line utility that extends the multi-writing and publishing functionalities of Etherpad by exporting pads in multiple formats.

Many pads, many networks

Etherpump is a friendly fork of etherdump, a command line tool written by Michael Murtaugh that converts etherpad pages to files. This fork is made out of curiosities in the tool, a wish to study it and shared sparks of enthusiasm to use it in different situations within Varia.

Etherpump is a stretched version of etherdump. It is a playground in which we would like to add features to the initial tool that diffuse actions of dumping into pumping. So most of all, etherpump is a work-in-progress, exploring potential uses of etherpads to edit, structure and publish various types of content.

Added features are:

  • opt-in publishing with the __PUBLISH__ magic word
  • the publication command, that listens to custom magic words such as __RELEARN__

See the Change log / notes section for further changes.

Etherpump is a tool that is used from the command line. It pumps all pads of one etherpad installation to a folder, saving them as different text files, such as plain text and HTML. It also creates an index file, that allows one to easily navigate through the list of pads. Etherpump follows a document-driven idea of publishing, which means that it converts pads as database entries into pads as files. This seems to be a redundant act of copying, but is actually an important in-between step that allows for many different publishing projects and experiments.

We started to get to know etherpump through various editions of Relearn and/or the worksessions organized by Constant. Collaborative writing on an etherpad has been an important ingredient for these situations. The habit of using pads branched into the day-to-day practice of Varia, where we use etherpads for all sorts of things, ranging from organising remote-meetings with 10+ people, to writing and designing PDF documents collaboratively.

After installing etherpump on the Varia server, we collectively decided that we did not want to publish pads by default. Discussions in the group around the use of etherpads, privacy and ideas of what publishing means led to a need to have etherpump only start the indexing work after it recognizes a __PUBLISH__ marker on a pad. We decided to work on a __PUBLISH__ vs. __NOPUBLISH__ branch of etherdump, which we now fork into etherpump.
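
The opt-in rule described above can be sketched in a few lines of Python. This is an illustration only, not etherpump's actual implementation: the function names magic_words and should_publish, the regular expression, and the choice to let __NOPUBLISH__ veto a pad are all assumptions made for this example.

```python
import re

# Assumed pattern for this sketch: capital letters wrapped in double underscores.
MAGIC_WORD = re.compile(r"__([A-Z]+)__")


def magic_words(text):
    """Return the set of magic words found in a pad's text."""
    return set(MAGIC_WORD.findall(text))


def should_publish(text):
    """Opt-in publishing: only pads that carry __PUBLISH__ are picked up."""
    words = magic_words(text)
    # Assumption: an explicit __NOPUBLISH__ always keeps the pad private.
    return "PUBLISH" in words and "NOPUBLISH" not in words


pads = {
    "meeting-notes": "Agenda for Tuesday\n",
    "zine-draft": "__PUBLISH__\nA text we want to share\n",
    "relearn": "__PUBLISH__ __RELEARN__\nRuminations\n",
}

for padid, text in pads.items():
    print(padid, should_publish(text), sorted(magic_words(text)))
```

A pad without any marker stays private, which matches the decision not to publish by default. Custom magic words such as __RELEARN__ are found by the same pattern.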

Change log / notes

October 2020

Use the more friendly packaging tool Poetry for publishing.

Further performance tweaks, informative logging and miscellaneous bug fixing.

Decolonize our Git praxis and use the main branch.


January 2020

Added experimental trio and asks support for the pull command which enables pads to be processed concurrently. The default --connection option is set to 20 which may overpower the target server. If in doubt, set this to a lower number (like 5). This functionality is experimental, be cautious and please report bugs!

Removed fancy progress bars for pulling because concurrent processing makes that hard to track. For now, we simply output whichever padid we're finished with.


October 2019

Improve etherpump --help handling to make it easier for new users.

Added the python-dateutil and pypandoc dependencies

Added a fancy progress bar with tqdm for long running etherpump pull --all calls

Started with the experimental library API.


September 2019

Forking etherdump into etherpump.

https://git.vvvvvvaria.org/varia/etherpump

Migrating the source code to Python 3.

Integrate PyPI publishing with setuptools.


May - September 2019

etherpump is used to produce the Ruminating Relearn section of the Network Of One's Own 2 (NOOO2) publication.

A new command is added to make a web publication, based on the custom magic word __RELEARN__.


June 2019

Multiple conversations around etherpump emerged during Relearn Curved in Varia, Rotterdam.

Including the idea of executable pads (etherhooks), custom magic words, a federated snippet protocol (etherstekje) and more.

https://varia.zone/relearn-2019.html


April 2019

Installation of etherpump on the Varia server.

https://etherpump.vvvvvvaria.org/


March 2019

The __PUBLISH__ vs. __NOPUBLISH__ distinction was added to the etherpump repository by decentral1se.

https://gitlab.constantvzw.org/aa/etherpump/issues/3


Originally designed for use at: Constant.

More notes can be found in the git repository of etherdump.

Install etherpump

$ pip install etherpump

Etherpump only supports Python >= 3.6.

Command-line example

$ mkdir mydump
$ cd mydump
$ etherpump init

The program then interactively asks some questions:

Please type the URL of the etherpad (e.g. https://pad.vvvvvvaria.org):

https://pad.vvvvvvaria.org/

The APIKEY is the contents of the file APIKEY.txt in the etherpad folder.

Please paste the APIKEY:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The settings are placed in a file called .etherpump/settings.json and are used (by default) by future commands.
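
Because the settings live in a plain file, other scripts can read them as well. The following is a minimal sketch, assuming the file holds a single JSON object (as the .json extension suggests). The helper name load_settings is made up for this example, and no particular key names are assumed.

```python
import json
from pathlib import Path


def load_settings(folder="."):
    """Read .etherpump/settings.json from the given folder."""
    path = Path(folder) / ".etherpump" / "settings.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    try:
        settings = load_settings()
    except FileNotFoundError:
        print("No settings found; run 'etherpump init' in this folder first.")
    else:
        # Print only the key names so that the API key is not echoed.
        print(sorted(settings))
```

Since the file contains the API key, it is probably wise to keep it out of version control and public folders.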

Common Workflows

Text+Meta performance wrangling

If you have a lot of pads, you might want to try the following to speed things up. This example is something we do at Varia. First, download only the text and metadata of every pad. This is likely what you want when you're working directly with the text. You can do that like so:

$ etherpump pull --text --meta --publish-opt-in

The key here is to pass --meta, so that etherpump can read the saved metadata and quickly skip a pad on the following run if it has no new revisions. In practice, you get a slower first run and faster following runs, because more pads are skipped instead of being written to the file system again when we already have their contents.
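
The skip logic can be pictured roughly as follows. This is a sketch under stated assumptions rather than etherpump's actual code: it supposes the saved metadata is a JSON file containing a revision count. The file name padid.meta.json, the revisions key and the function needs_pull are all hypothetical.

```python
import json
from pathlib import Path


def needs_pull(folder, padid, remote_revisions):
    """Return True if the pad has to be fetched and written again."""
    meta_path = Path(folder) / f"{padid}.meta.json"  # hypothetical file name
    if not meta_path.exists():
        return True  # first run: nothing has been saved yet
    saved = json.loads(meta_path.read_text(encoding="utf-8"))
    # Hypothetical "revisions" key: skip when the count has not changed.
    return saved.get("revisions") != remote_revisions
```

On the first run every pad is reported as needing a pull, which is why that run is slower. On later runs only pads whose revision count has changed are written again.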

Library API Example

Etherpump can be used as a library.

All commands can be imported and run programmatically.

>>> from etherpump.api import pull
>>> pull(['--text', '--meta', '--publish-opt-in'])

Subcommands

To see all available subcommands, run:

$ etherpump --help

For help on each individual subcommand, run:

$ etherpump revisionscount --help

Publishing

Please use "semver" conventions for versions.

Here are the steps to follow (e.g. for a 0.1.3 release):

  • Change the version number in the etherpump/__init__.py __VERSION__ to 0.1.3
  • Change the version number in the pyproject.toml version field to 0.1.3
  • git add . && git commit -m "Publish new 0.1.3 version" && git tag 0.1.3 && git push --tags
  • Run poetry publish --build

You should have a PyPI account and be added as an owner/maintainer on the etherpump package.
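
Since the version number lives in two places, it may help to check that they agree before tagging. A minimal sketch, assuming __VERSION__ is assigned as a quoted string in etherpump/__init__.py and version as a quoted string in pyproject.toml. The helper names are made up for this example and are not part of etherpump.

```python
import re
from pathlib import Path

# Both patterns expect a simple quoted assignment at the start of a line.
INIT_RE = re.compile(r"""^__VERSION__\s*=\s*["']([^"']+)["']""", re.MULTILINE)
TOML_RE = re.compile(r"""^version\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def find_version(pattern, text):
    """Return the first version string matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def versions_agree(root="."):
    """Check that both files declare the same version number."""
    init_text = (Path(root) / "etherpump" / "__init__.py").read_text(encoding="utf-8")
    toml_text = (Path(root) / "pyproject.toml").read_text(encoding="utf-8")
    init_version = find_version(INIT_RE, init_text)
    toml_version = find_version(TOML_RE, toml_text)
    return init_version is not None and init_version == toml_version
```

Running versions_agree() from the repository root before the git tag step would catch a release where only one of the two files was updated.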

Testing

It can be quite handy to run a very temporary local Etherpad instance to test against. This is possible with Docker.

$ docker run -d --name etherpad -p 9001:9001 etherpad/etherpad
$ docker exec -ti etherpad cat APIKEY.txt;echo

Then you can run etherpump init against that local Etherpad for experimentation and testing. Use http://localhost:9001 as the pad URL.
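
To confirm that the local instance and the API key work before pointing etherpump at them, you could call the Etherpad HTTP API directly. The sketch below assumes the API-key authentication described above and uses the listAllPads call, which Etherpad provides from API version 1.2.1. The helper names api_url and list_all_pads are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def api_url(base, method, apikey, version="1.2.1", **params):
    """Build a URL for the Etherpad HTTP API."""
    query = urlencode({"apikey": apikey, **params})
    return f"{base.rstrip('/')}/api/{version}/{method}?{query}"


def list_all_pads(base, apikey):
    """Return the pad IDs known to the Etherpad instance."""
    with urlopen(api_url(base, "listAllPads", apikey)) as response:
        reply = json.load(response)
    return reply["data"]["padIDs"]
```

With the container running, list_all_pads("http://localhost:9001", key), where key is the string printed by the docker exec command above, should return a list of pad IDs. On a fresh instance that list may be empty.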

Later on, you can remove the Etherpad with:

$ docker rm -f --volumes etherpad

Maintenance utilities

Tools to help things stay tidy over time.

$ make

Please see the following links for further reading:

Keeping track of Etherpad-lite

License

GNU AFFERO GENERAL PUBLIC LICENSE, Version 3.

See LICENSE.
