Podcast Archiver

Archive all episodes from your favorite podcasts.

The archiver takes the feed URLs of your favorite podcasts and downloads all available episodes for you. Even files "hidden" in a paged feed will be fetched, so you'll have a complete backup of the series. The archiver also supports updating an existing archive, so it lends itself to being set up as a cron job.

Outline

In my experience, very few full-fledged podcast clients are able to access a paged feed (following IETF RFC 5005), so only the last few episodes of a podcast will be available to download. When you discover a podcast that has been around for quite a while, you'll have a hard time following the "gentle listener's duty" of listening to the whole archive. The script in this repository is meant to help you acquire every last episode of your new listening pleasure.

Before downloading any episode, the archiver first fetches all available pages of the feed and prepares a list of episodes. That way, you will never miss an episode.
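The page-by-page collection described above can be sketched roughly as follows. This is a minimal illustration and not the archiver's actual code: it assumes an RSS feed whose pages are chained through Atom links with rel="next" (the paging scheme from RFC 5005), and the names fetch_url, find_next_page and collect_episode_urls are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# Atom namespace used for the paging links inside RSS feeds.
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def fetch_url(url: str) -> bytes:
    """Download one feed page (hypothetical helper, no retries or timeouts)."""
    with urlopen(url) as response:
        return response.read()


def find_next_page(root: ET.Element, base_url: str) -> Optional[str]:
    """Return the URL of the following page, or None on the last page."""
    for link in root.iter(ATOM_LINK):
        if link.get("rel") == "next" and link.get("href"):
            return urljoin(base_url, link.get("href"))
    return None


def collect_episode_urls(
    feed_url: str, fetch: Callable[[str], bytes] = fetch_url
) -> List[str]:
    """Walk every page of the feed first, then return all enclosure URLs."""
    episodes: List[str] = []
    seen_pages = set()
    url: Optional[str] = feed_url
    while url and url not in seen_pages:  # guard against looping page links
        seen_pages.add(url)
        root = ET.fromstring(fetch(url))
        # Only RSS <enclosure> elements are handled in this sketch.
        for enclosure in root.iter("enclosure"):
            if enclosure.get("url"):
                episodes.append(enclosure.get("url"))
        url = find_next_page(root, url)
    return episodes
```

Passing the fetch function as a parameter keeps the sketch testable without network access; any callable that maps a URL to the raw feed bytes will do.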

Setup

Python package

podcast-archiver is Python 3.9+ compatible.

# Latest tagged/published version on PyPI:
pip install podcast-archiver

# Latest master from GitHub:
pip install git+https://github.com/janw/podcast-archiver.git

Docker image

Alternatively, podcast-archiver is available as a Docker image:

# Latest tagged/published version, same as on PyPI:
docker run --rm ghcr.io/janw/podcast-archiver:latest

# Latest master from GitHub:
docker run --rm ghcr.io/janw/podcast-archiver:edge

Usage

Run podcast-archiver --help for details on how to use it.

Full-fledged example

podcast-archiver -d ~/Music/Podcasts \
    --subdirs \
    --date-prefix \
    --progress \
    --verbose \
    -f http://logbuch-netzpolitik.de/feed/m4a \
    -f http://raumzeit-podcast.de/feed/m4a/ \
    -f https://feeds.lagedernation.org/feeds/ldn-mp3.xml

Process the feed list from a file

If you have a larger list of podcasts and/or want to update the archive via a cron job, the feed URLs can be moved out of the command line and into a text file that is passed to the -f argument:

podcast-archiver -d ~/Music/Podcasts -s -u -f feedlist.txt

where feedlist.txt contains one feed URL per line, just as they would be entered on the command line:

    http://logbuch-netzpolitik.de/feed/m4a
    http://raumzeit-podcast.de/feed/m4a/
    https://feeds.lagedernation.org/feeds/ldn-mp3.xml

This way, you can easily add feeds to the list or remove them from it, and let the archiver fetch the newest episodes, for example by adding the command to your crontab.
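To illustrate how a -f value could be treated as either a feed URL or a feed list file, here is a small sketch. It is an assumption about the behavior, not the archiver's real implementation: the function name expand_feed_args is hypothetical, and the rule "an existing file is read line by line, anything else is taken as a URL" is only one plausible way to do it.

```python
from pathlib import Path
from typing import Iterable, List


def expand_feed_args(values: Iterable[str]) -> List[str]:
    """Expand -f values into feed URLs (hypothetical helper)."""
    feeds: List[str] = []
    for value in values:
        path = Path(value).expanduser()
        if path.is_file():
            # A feed list file: one URL per line, blank lines are skipped
            # and surrounding whitespace is ignored.
            for line in path.read_text(encoding="utf-8").splitlines():
                url = line.strip()
                if url:
                    feeds.append(url)
        else:
            # Anything that is not an existing file is kept as a URL.
            feeds.append(value)
    return feeds
```

With this approach, URLs and list files can be mixed freely on the command line, and the order of the feeds is preserved.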

Excursion: Unicode Normalization in Slugify

The --slugify option removes all ambiguous characters from folders and filenames used in the archiving process. The removal includes Unicode normalization according to Compatibility Decomposition (NFKD). What? Yeah, me too. I figured this is best seen in an example, so here's a fictitious episode name and how it would be translated to a target filename by the archiver:

SPR001_Umlaute sind ausschließlich in schönen Sprachen/Dialekten zu finden.mp3

will be turned into

SPR001_Umlaute-sind-ausschlielich-in-schonen-SprachenDialekten-zu-finden.mp3

Note that "decorated" characters like ö are replaced with their basic counterparts (o), while somewhat ligature-ish ones like ß (along with most nonessential punctuation) are removed entirely. Spaces are replaced with hyphens.
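The transformation can be reproduced with Python's standard library. The following is a sketch of the general technique and not necessarily how the archiver implements --slugify: the function name slugify_filename and the exact set of characters that are kept are assumptions, chosen so that the example above comes out the same.

```python
import re
import unicodedata


def slugify_filename(name: str) -> str:
    """Reduce a filename to plain ASCII (hypothetical helper)."""
    # NFKD splits "ö" into "o" plus a combining diaeresis; "ß" has no
    # decomposition and stays as it is.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping everything outside ASCII removes the combining marks
    # (leaving "o") and removes "ß" entirely.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep letters, digits, underscores, whitespace, dots and hyphens;
    # other punctuation such as "/" is removed.
    cleaned = re.sub(r"[^\w\s.-]", "", ascii_only)
    # Collapse runs of whitespace into single hyphens.
    return re.sub(r"\s+", "-", cleaned.strip())


original = "SPR001_Umlaute sind ausschließlich in schönen Sprachen/Dialekten zu finden.mp3"
print(slugify_filename(original))
# SPR001_Umlaute-sind-ausschlielich-in-schonen-SprachenDialekten-zu-finden.mp3
```

The ß disappears because it has no compatibility decomposition into ASCII characters, so nothing is left of it once the non-ASCII characters are dropped.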

Todo

  • Add ability to define a preferred format on feeds that contain links for multiple audio codecs.
  • Add ability to define a range of episodes or a point in time, to download only episodes from that point on, or from there back to the beginning, or or or …
  • Add ability to prefix episodes with the episode number (rarely necessary, since most podcasts feature some kind of episode numbering in the filename)
  • Add unit tests
