Examine snapshots in web archives such as the Internet Archive's Wayback Machine

Project description

memento-cli

A command line tool for interacting with web archives that support Memento (RFC 7089), such as the Internet Archive's Wayback Machine.

For more background on why this tool was created see: https://inkdroid.org/2023/09/14/memento-bisect/

Usage

List Snapshots

To list all the available snapshots (or Mementos) for a given snapshot, use the list command:

$ memento list https://web.archive.org/web/20230407140923/https:/help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2017-12-29 05:40:51 https://web.archive.org/web/20171229054051/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-03 20:03:00 https://web.archive.org/web/20180103200300/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-04 06:39:58 https://web.archive.org/web/20180104063958/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-06 16:08:07 https://web.archive.org/web/20180106160807/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 06:10:07 https://web.archive.org/web/20180112061007/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 17:40:16 https://web.archive.org/web/20180112174016/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 18:40:34 https://web.archive.org/web/20180112184034/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 19:11:48 https://web.archive.org/web/20180112191148/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:05:57 https://web.archive.org/web/20180120190557/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:19:20 https://web.archive.org/web/20180120191920/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
...
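
Each line of this output pairs a capture time with a snapshot URL, and the same time is embedded in the URL itself as a 14-digit timestamp. The following is a minimal sketch of pulling that timestamp and the original URL back out of a snapshot URL. The function name `split_snapshot_url` and the regular expression are illustrative assumptions, not part of memento-cli, and the pattern assumes Wayback-style URLs.

```python
import re
from datetime import datetime

# Assumption: the snapshot URL embeds a 14-digit timestamp (YYYYMMDDhhmmss),
# optionally followed by a two-letter modifier such as "mp_", and then the
# original URL. This matches the Wayback-style URLs shown in this README.
SNAPSHOT_RE = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/(.+)$")


def split_snapshot_url(snapshot_url):
    """Return (capture time, original URL) for a Wayback-style snapshot URL."""
    match = SNAPSHOT_RE.search(snapshot_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {snapshot_url!r}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)


captured, original = split_snapshot_url(
    "https://web.archive.org/web/20171229054051/"
    "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
)
print(captured, original)
# prints: 2017-12-29 05:40:51 https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
```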

Since memento works with any archive that supports RFC 7089, you can use it to list versions in other web archives as well:

$ memento list https://www.webarchive.org.uk/wayback/archive/20130501020401/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-05-01 02:03:57 https://www.webarchive.org.uk/wayback/archive/20130501020357mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-05-01 02:04:01 https://www.webarchive.org.uk/wayback/archive/20130501020401mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-07-29 12:58:03 https://www.webarchive.org.uk/wayback/archive/20130729125803mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-07-29 12:58:06 https://www.webarchive.org.uk/wayback/archive/20130729125806mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2021-01-22 06:38:21 https://www.webarchive.org.uk/wayback/archive/20210122063821mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2022-03-14 16:36:16 https://www.webarchive.org.uk/wayback/archive/20220314163616mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
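
What makes this portable across archives is that RFC 7089 defines link relations such as original, timemap, and memento, which a Memento-aware server can advertise in the HTTP Link header of a snapshot response. The following is a rough sketch of reading those relations from a header value. The sample header is handwritten for illustration (it is not a captured response), the helper names are hypothetical, and whether memento itself discovers the list of snapshots this way is not stated in this README.

```python
import re

# Assumption: every link parameter value is double-quoted, as is typical in
# Memento responses; unquoted parameter values are not handled by this sketch.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Return a list of (url, params) tuples from a Link header value."""
    return [
        (url, dict(PARAM_RE.findall(raw_params)))
        for url, raw_params in LINK_RE.findall(value)
    ]


def find_rel(links, rel):
    """Return the first URL whose rel attribute includes the given relation."""
    for url, params in links:
        if rel in params.get("rel", "").split():
            return url
    return None


# Handwritten sample header, for illustration only.
PAGE = "https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy"
SAMPLE = (
    f'<{PAGE}>; rel="original", '
    f'<https://web.archive.org/web/timemap/link/{PAGE}>; '
    'rel="timemap"; type="application/link-format", '
    f'<https://web.archive.org/web/20171229054051/{PAGE}>; '
    'rel="first memento"; datetime="Fri, 29 Dec 2017 05:40:51 GMT"'
)

links = parse_link_header(SAMPLE)
print(find_rel(links, "original"))
print(find_rel(links, "timemap"))
```

Quoted values are matched as a unit, so the comma inside the datetime parameter does not split the entry.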

Searching for Changes (Bisect)

Let's suppose you know that the Twitter Hateful Conduct Policy used to have language about:

women, people of color, lesbian, gay, bisexual, transgender, queer, intersex, asexual individuals

You can see this text in the Internet Archive Wayback Machine snapshot from 2019, but not on the page in 2023. To identify when the change was introduced, you can bisect the version history to find the version where the text went missing, using the two snapshots and the --text option. This performs a binary search for the text across the snapshots between the two versions.

$ memento bisect --missing --text "women, people of color, lesbian, gay" \
  https://web.archive.org/web/20190711134608/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy \
  https://web.archive.org/web/20230621094005/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy

The --text value can also be a regular expression. If you provide only one snapshot URL, memento uses it as the starting point and the last snapshot in the archive as the end.
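
The idea behind the search can be sketched as an ordinary binary search over an ordered list of snapshots. This is a minimal sketch under stated assumptions, not memento-cli's actual implementation: `bisect_missing` and `fetch_text` are hypothetical names, `fetch_text` stands in for rendering a snapshot and returning its text, and the search assumes the text is present in the first snapshot, absent in the last, and never reappears once removed.

```python
import re


def bisect_missing(snapshots, pattern, fetch_text):
    """Return the index of the first snapshot whose text lacks `pattern`.

    Assumes snapshots[0] contains the pattern, snapshots[-1] does not, and
    the text disappears exactly once between them.
    """
    regex = re.compile(pattern)
    lo, hi = 0, len(snapshots) - 1  # invariant: lo has the text, hi lacks it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if regex.search(fetch_text(snapshots[mid])):
            lo = mid  # text still present, so the change happened later
        else:
            hi = mid  # text already gone, so the change is here or earlier
    return hi


# Stand-in data: eight fake snapshots, with the text removed from s5 onward.
snapshots = [f"s{i}" for i in range(8)]
pages = {
    name: "women, people of color, lesbian, gay" if i < 5 else "policy text"
    for i, name in enumerate(snapshots)
}

first_missing = bisect_missing(snapshots, r"women, people of color", pages.get)
print(snapshots[first_missing])  # prints: s5
```

Each step halves the range, so even a history of thousands of snapshots needs only a dozen or so page loads.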

The bisect command uses a browser behind the scenes (using Selenium) in order to fully render the page. If you want to find out when some text appears (rather than goes missing), remove the --missing parameter from the command.

And if you would prefer to examine the pages in between manually, leave off the --text parameter and memento will prompt you to continue, and show you the browser it is controlling.

If you would like to see the browser when using --text, use the --show-browser option.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

memento_cli-0.0.4.tar.gz (4.7 kB)

Uploaded Source

Built Distribution

memento_cli-0.0.4-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file memento_cli-0.0.4.tar.gz.

File metadata

  • Download URL: memento_cli-0.0.4.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.2 Darwin/23.1.0

File hashes

Hashes for memento_cli-0.0.4.tar.gz
Algorithm Hash digest
SHA256 81b31f8df3f44ce449d83bb600435e34eb0376346cc62ed225c66c5d38d26bf0
MD5 62f608405603b0c57f8de3e0facc4fe3
BLAKE2b-256 22248680807a14cf66774b1301066ec261cb4190a3e3580139cec3e68449ef08


File details

Details for the file memento_cli-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: memento_cli-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.2 Darwin/23.1.0

File hashes

Hashes for memento_cli-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 adf7f2536c019832e4345a30d0ab469c39e401b4555bde5e1c84ccb6296a0eb0
MD5 0d0dc58c03b3d06ad78d4ec868c11c38
BLAKE2b-256 25a6d6bddc420cd7a808a7a03893d1ea38772bf79e3c02e639c27104cff91aaf
