Examine snapshots in web archives such as the Internet Archive's Wayback Machine
memento-cli
A command line tool for interacting with web archives that support Memento (RFC 7089), such as the Internet Archive's Wayback Machine.
For more background on why this tool was created see: https://inkdroid.org/2023/09/14/memento-bisect/
Usage
List Snapshots
To list all the available snapshots (or Mementos) of the page captured in a given snapshot, use the list command:
$ memento list https://web.archive.org/web/20230407140923/https:/help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2017-12-29 05:40:51 https://web.archive.org/web/20171229054051/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-03 20:03:00 https://web.archive.org/web/20180103200300/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-04 06:39:58 https://web.archive.org/web/20180104063958/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-06 16:08:07 https://web.archive.org/web/20180106160807/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 06:10:07 https://web.archive.org/web/20180112061007/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 17:40:16 https://web.archive.org/web/20180112174016/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 18:40:34 https://web.archive.org/web/20180112184034/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-12 19:11:48 https://web.archive.org/web/20180112191148/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:05:57 https://web.archive.org/web/20180120190557/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
2018-01-20 19:19:20 https://web.archive.org/web/20180120191920/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
...
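Each line of the listing pairs a capture time with a snapshot URL, and in these Wayback-style URLs the same time also appears as a 14-digit timestamp in the path. As a minimal sketch of that relationship (not necessarily how memento derives the date, and with `snapshot_datetime` and `listing_line` being hypothetical helper names), the timestamp can be pulled back out of a URL like this:

```python
import re
from datetime import datetime

# Assumption: the snapshot URL embeds a 14-digit timestamp (YYYYMMDDhhmmss),
# optionally followed by a two-letter modifier such as "mp_", as in the
# example URLs in this document.
TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def snapshot_datetime(url: str) -> datetime:
    """Extract the capture time embedded in a Wayback-style snapshot URL."""
    match = TIMESTAMP.search(url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {url!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def listing_line(url: str) -> str:
    """Format a URL in the same 'date time url' shape as the listing."""
    return f"{snapshot_datetime(url):%Y-%m-%d %H:%M:%S} {url}"
```

Calling `listing_line` on the first URL in the output above reproduces that line of the listing.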
Since memento works with any archive that supports RFC 7089, you can use it to list versions in other web archives as well:
$ memento list https://www.webarchive.org.uk/wayback/archive/20130501020401/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-05-01 02:03:57 https://www.webarchive.org.uk/wayback/archive/20130501020357mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-05-01 02:04:01 https://www.webarchive.org.uk/wayback/archive/20130501020401mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2013-07-29 12:58:03 https://www.webarchive.org.uk/wayback/archive/20130729125803mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition
2013-07-29 12:58:06 https://www.webarchive.org.uk/wayback/archive/20130729125806mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2021-01-22 06:38:21 https://www.webarchive.org.uk/wayback/archive/20210122063821mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
2022-03-14 16:36:16 https://www.webarchive.org.uk/wayback/archive/20220314163616mp_/http://www.vam.ac.uk/content/exhibitions/david-bowie-is/david-bowie-is-inside-the-exhibition/
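What makes this portable across archives is the RFC 7089 TimeMap: a document in link format that lists every Memento of a page along with its datetime. The following is a rough sketch of reading such a TimeMap once it has been downloaded. It is not memento's own code; `parse_timemap` is a hypothetical helper, the sample text uses placeholder example hosts, and the regular expressions assume double-quoted attribute values.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <url> followed by ;name="value" attributes.
LINK = re.compile(r'<([^>]+)>((?:\s*;\s*\w+="[^"]*")*)')
PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(text):
    """Return (datetime, url) pairs for every memento link, oldest first."""
    mementos = []
    for url, raw_params in LINK.findall(text):
        params = dict(PARAM.findall(raw_params))
        # rel may hold several tokens, e.g. "first memento".
        if "memento" in params.get("rel", "").split() and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), url))
    return sorted(mementos)


# Illustrative TimeMap body with placeholder hosts (not real archive output).
SAMPLE = """<http://a.example.org>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org>;rel="self";type="application/link-format",
<http://arxiv.example.net/web/20000620180259/http://a.example.org>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org>;rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"
"""

for when, url in parse_timemap(SAMPLE):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), url)
```

Entries whose rel does not include the memento token, such as the original page and the TimeMap itself, are skipped.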
Searching for Changes (Bisect)
Let's suppose you know that the Twitter Hateful Conduct Policy used to have language about:
women, people of color, lesbian, gay, bisexual, transgender, queer, intersex, asexual individuals
You can see it in the Internet Archive Wayback Machine in 2019. But you can't see it on the page in 2023. To identify when the change was introduced, you can bisect the version history to search for the version where the text went missing, using the two snapshots and the --text option. This will perform a binary search between the two versions looking for the text.
$ memento bisect --missing --text "women, people of color, lesbian, gay" \
https://web.archive.org/web/20190711134608/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy \
https://web.archive.org/web/20230621094005/https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy
The --text value can also be a regular expression. If you provide only one snapshot URL, memento uses it as the start of the search and the last snapshot in the archive as the end.
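The binary search itself can be sketched in a few lines. This is an illustration of the idea under stated assumptions, not memento's implementation: `find_change` and `fetch_text` are hypothetical names, the snapshots are assumed to be in chronological order, and the text is assumed to change state only once between the two ends. The `missing` argument mirrors the --missing option, and the pattern is treated as a regular expression.

```python
import re
from typing import Callable, Optional, Sequence


def find_change(
    snapshots: Sequence[str],
    fetch_text: Callable[[str], str],
    pattern: str,
    missing: bool = True,
) -> Optional[str]:
    """Return the first snapshot where the text has gone missing
    (or, with missing=False, where it first appears)."""
    if len(snapshots) < 2:
        return None
    regex = re.compile(pattern)

    def changed(i: int) -> bool:
        present = regex.search(fetch_text(snapshots[i])) is not None
        return (not present) if missing else present

    lo, hi = 0, len(snapshots) - 1
    # The search only makes sense if the two ends disagree.
    if changed(lo) or not changed(hi):
        return None
    # Invariant: unchanged at lo, changed at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if changed(mid):
            hi = mid
        else:
            lo = mid
    return snapshots[hi]


# In-memory stand-in for an archive: the phrase is present in the first
# five snapshots and absent afterwards (made-up data for illustration).
snaps = [f"snap-{i}" for i in range(8)]
pages = {
    s: "women, people of color, lesbian, gay" if i < 5 else "revised policy"
    for i, s in enumerate(snaps)
}
first_without = find_change(snaps, pages.__getitem__, "women, people of color")
```

Each step halves the range, so even a history with thousands of snapshots needs only a handful of page loads, which matters when every load means rendering a page in a browser.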
The bisect command uses a browser behind the scenes (driven by Selenium) in order to fully render each page. If you want to find out when some text appears (rather than goes missing), remove the --missing parameter from the command.
And if you would prefer to examine the pages in between manually, leave off the --text parameter and memento will prompt you to continue, and show you the browser it is controlling.
If you would like to see the browser when using --text, use the --show-browser option.
Hashes for memento_cli-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | adf7f2536c019832e4345a30d0ab469c39e401b4555bde5e1c84ccb6296a0eb0
MD5 | 0d0dc58c03b3d06ad78d4ec868c11c38
BLAKE2b-256 | 25a6d6bddc420cd7a808a7a03893d1ea38772bf79e3c02e639c27104cff91aaf