Partridge
=========

.. image:: https://img.shields.io/pypi/v/partridge.svg
   :target: https://pypi-hypernode.com/pypi/partridge

.. image:: https://img.shields.io/travis/remix/partridge.svg
   :target: https://travis-ci.org/remix/partridge

Partridge is a Python library for working with `GTFS <https://developers.google.com/transit/gtfs/>`__ feeds using `pandas <https://pandas.pydata.org/>`__ DataFrames.

The implementation of Partridge is heavily influenced by our experience at `Remix <https://www.remix.com/>`__ ingesting, analyzing, and debugging thousands of GTFS feeds from hundreds of agencies.

At the core of Partridge is a dependency graph rooted at ``trips.txt``. When a feed is read, data that is not connected to the root through this graph is pruned away. The root node can optionally be filtered to create a view of the feed specific to your needs. Most commonly, a feed is filtered down to specific dates (``service_id``), routes (``route_id``), or both.

.. figure:: dependency-graph.png
   :alt: dependency graph
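
The pruning idea can be illustrated with a toy model. The sketch below is plain Python rather than Partridge or pandas: tables are lists of dicts, and the three relationships it follows (``trips.txt`` with ``routes.txt``, ``trips.txt`` with ``stop_times.txt``, ``stop_times.txt`` with ``stops.txt``) are an assumed subset of the real graph shown in the figure. The ``keep`` and ``prune`` helpers are hypothetical names, not part of Partridge's API.

.. code:: python

    # Toy model of graph-based pruning (hypothetical; not Partridge's code).
    # Tables are lists of dicts, and only three relationships are modeled.
    trips = [
        {"trip_id": "t1", "route_id": "r1", "service_id": "wkdy"},
        {"trip_id": "t2", "route_id": "r2", "service_id": "wknd"},
    ]
    routes = [{"route_id": "r1"}, {"route_id": "r2"}, {"route_id": "r3"}]
    stop_times = [
        {"trip_id": "t1", "stop_id": "s1"},
        {"trip_id": "t1", "stop_id": "s2"},
        {"trip_id": "t2", "stop_id": "s3"},
    ]
    stops = [{"stop_id": s} for s in ("s1", "s2", "s3", "s4")]


    def keep(rows, column, allowed):
        """Return only the rows whose `column` value is in `allowed`."""
        return [row for row in rows if row[column] in allowed]


    def prune(trip_filter):
        """Filter the root table, then keep only rows still reachable from it."""
        kept_trips = [t for t in trips if trip_filter(t)]
        kept_routes = keep(routes, "route_id", {t["route_id"] for t in kept_trips})
        kept_stop_times = keep(stop_times, "trip_id", {t["trip_id"] for t in kept_trips})
        kept_stops = keep(stops, "stop_id", {st["stop_id"] for st in kept_stop_times})
        return {
            "trips.txt": kept_trips,
            "routes.txt": kept_routes,
            "stop_times.txt": kept_stop_times,
            "stops.txt": kept_stops,
        }


    view = prune(lambda trip: trip["service_id"] == "wkdy")
    print([r["route_id"] for r in view["routes.txt"]])  # ['r1']
    print([s["stop_id"] for s in view["stops.txt"]])    # ['s1', 's2']

Even with a filter that keeps every trip, route ``r3`` and stop ``s4`` are dropped in this model, because no trip reaches them.
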

Philosophy
----------

The design of Partridge is guided by the following principles:

- As much as possible:

  - favor speed
  - allow for extension
  - succeed lazily on expensive paths
  - fail eagerly on inexpensive paths

- As little as possible:

  - do anything other than efficiently read GTFS files into DataFrames
  - take an opinion on the GTFS spec
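
One way to read "succeed lazily on expensive paths, fail eagerly on inexpensive paths" is sketched below: a hypothetical ``LazyFeed`` class checks cheaply, at construction time, that its source folder exists, but defers parsing each file until the matching attribute is first accessed, then caches the result. The class, its attributes, and the use of ``functools.cached_property`` (Python 3.8+) are illustrative assumptions, not Partridge's implementation.

.. code:: python

    import csv
    import os
    import tempfile
    from functools import cached_property


    class LazyFeed:
        """Hypothetical feed reader: cheap checks now, expensive parsing later."""

        def __init__(self, folder):
            # Fail eagerly: checking that the folder exists is inexpensive.
            if not os.path.isdir(folder):
                raise ValueError(f"not a directory: {folder}")
            self.folder = folder
            self.reads = 0  # counts how many files were actually parsed

        def _read(self, filename):
            # The expensive path: parse a CSV file into a list of dicts.
            self.reads += 1
            with open(os.path.join(self.folder, filename), newline="") as f:
                return list(csv.DictReader(f))

        @cached_property
        def routes(self):
            return self._read("routes.txt")

        @cached_property
        def stops(self):
            return self._read("stops.txt")


    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "routes.txt"), "w", newline="") as f:
            f.write("route_id,route_short_name\n12300,18\n")
        feed = LazyFeed(folder)
        print(feed.reads)                  # 0: nothing parsed yet
        print(feed.routes[0]["route_id"])  # 12300
        print(feed.routes is feed.routes)  # True: parsed once, then cached
        print(feed.reads)                  # 1: stops.txt was never touched
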

Usage
-----

.. code:: python

    import datetime
    import partridge as ptg

    path = 'path/to/sfmta-2017-08-22.zip'

    # Map each service date to the service_ids active on that date.
    service_ids_by_date = ptg.read_service_ids_by_date(path)
    service_ids = service_ids_by_date[datetime.date(2017, 9, 25)]

    # Filter the root (trips.txt) to one date and one route;
    # dependent files are pruned to match.
    feed = ptg.feed(path, view={
        'trips.txt': {
            'service_id': service_ids,
            'route_id': '12300',  # 18-46TH AVENUE
        },
    })

    assert set(feed.trips.service_id) == service_ids
    assert list(feed.routes.route_id) == ['12300']

    # Buses running the 18 - 46th Ave line use 88 stops (on September 25, 2017, at least).
    assert len(feed.stops) == 88
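
In the view above, ``service_id`` is given as a collection of IDs while ``route_id`` is a single string. The sketch below shows one plausible way such a filter dictionary can be applied to rows: a string is treated as a single value, any other iterable as a set of allowed values, and a row must satisfy every filtered column. The ``as_value_set`` and ``row_matches`` helpers and the sample rows are hypothetical; they illustrate the semantics the example relies on, not Partridge's internal code.

.. code:: python

    from collections.abc import Iterable


    def as_value_set(value):
        """Wrap a single value in a set; turn other iterables into a set.

        Strings are iterable in Python, so they are special-cased as scalars.
        """
        if isinstance(value, str) or not isinstance(value, Iterable):
            return {value}
        return set(value)


    def row_matches(row, filters):
        """True if, for every filtered column, the row's value is allowed."""
        return all(row[col] in as_value_set(allowed) for col, allowed in filters.items())


    # Hypothetical sample rows shaped like trips.txt records.
    trips = [
        {"trip_id": "a", "route_id": "12300", "service_id": "1"},
        {"trip_id": "b", "route_id": "12300", "service_id": "2"},
        {"trip_id": "c", "route_id": "99999", "service_id": "1"},
    ]
    view = {"service_id": frozenset({"1"}), "route_id": "12300"}
    print([t["trip_id"] for t in trips if row_matches(t, view)])  # ['a']
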

Features
--------

- Surprisingly fast :)
- Load only what you need into memory
- Built-in support for resolving service dates
- Easily extended to support fields and files outside the official spec
  (TODO: document this)
- Handle nested folders and bad data in zips
- Predictable type conversions
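
Resolving service dates means combining ``calendar.txt`` (weekly patterns between a start and an end date) with ``calendar_dates.txt`` (per-date exceptions, where ``exception_type`` 1 adds service and 2 removes it), as defined in the GTFS reference. A minimal standard-library sketch of that logic follows. It builds a mapping from each date to the set of active ``service_id`` values, which is how ``ptg.read_service_ids_by_date`` is used in the usage example. The ``service_ids_by_date`` function and its list-of-dicts input format are assumptions; Partridge's own implementation works on DataFrames and may differ in details.

.. code:: python

    import datetime
    from collections import defaultdict

    WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
                "friday", "saturday", "sunday"]


    def parse_date(text):
        """GTFS dates are written as YYYYMMDD strings."""
        return datetime.datetime.strptime(text, "%Y%m%d").date()


    def service_ids_by_date(calendar, calendar_dates):
        """Map each date to the frozenset of service_ids active on it (sketch)."""
        active = defaultdict(set)
        # Weekly patterns: only weekdays flagged "1" are active, not every listed day.
        for row in calendar:
            day = parse_date(row["start_date"])
            end = parse_date(row["end_date"])
            while day <= end:
                if row[WEEKDAYS[day.weekday()]] == "1":
                    active[day].add(row["service_id"])
                day += datetime.timedelta(days=1)
        # Exceptions: exception_type 1 adds service, 2 removes it.
        for row in calendar_dates:
            day = parse_date(row["date"])
            if row["exception_type"] == "1":
                active[day].add(row["service_id"])
            elif row["exception_type"] == "2":
                active[day].discard(row["service_id"])
        # Drop dates that ended up with no active service.
        return {day: frozenset(ids) for day, ids in active.items() if ids}


    # Hypothetical feed: weekday service for two weeks, replaced on one Monday.
    weekday_flags = dict(zip(WEEKDAYS, ["1", "1", "1", "1", "1", "0", "0"]))
    calendar = [{"service_id": "wkdy", "start_date": "20170918",
                 "end_date": "20170930", **weekday_flags}]
    calendar_dates = [
        {"service_id": "wkdy", "date": "20170925", "exception_type": "2"},
        {"service_id": "special", "date": "20170925", "exception_type": "1"},
    ]

    by_date = service_ids_by_date(calendar, calendar_dates)
    print(by_date[datetime.date(2017, 9, 25)])          # frozenset({'special'})
    print(sorted(by_date[datetime.date(2017, 9, 26)]))  # ['wkdy']
    print(datetime.date(2017, 9, 23) in by_date)        # False (a Saturday)
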

Installation
------------

.. code:: console

    $ pip install partridge

Thank You
---------

I hope you find this library useful. If you have suggestions for
improving Partridge, please open an `issue on
GitHub <https://github.com/remix/partridge/issues>`__.

History
=======

0.8.0 (2018-03-14)
------------------

* Gracefully handle completely empty files. Reading a completely empty
  (zero-byte) file in the zip now behaves the same as reading a CSV file
  that has a header but no data rows.

0.7.0 (2018-03-09)
------------------

* Fix handling of nested folders, including zips that contain nested folders.
* Add ``ptg.get_filtered_feed`` for multi-file filtering.

0.6.1 (2018-02-24)
------------------

* Fix bug in ``ptg.read_service_ids_by_date``. Reported by @cjer in #27.

0.6.0 (2018-02-21)
------------------

* Reduce the size of the published package by excluding unnecessary fixtures.
* Naively write a feed object to a zip file with ``ptg.write_feed_dangerously``.
* Read the earliest, busiest date and its ``service_id`` values from a feed with ``ptg.read_busiest_date``.
* Bug fix: Handle ``calendar.txt``/``calendar_dates.txt`` entries without applicable trips.

0.6.0.dev1 (2018-01-23)
-----------------------

* Add support for reading files from a folder. Thanks again @danielsclint!

0.5.0 (2017-12-22)
------------------

* Easily build a representative view of a zip with ``ptg.get_representative_feed``. Inspired by `peartree <https://github.com/kuanb/peartree/blob/3bfc3f49ae6986d6020913b63c8ee32582b3dcc3/peartree/paths.py#L26>`_.
* Extract GTFS zips by ``agency_id``/``route_id`` with ``ptg.extract_{agencies,routes}``.
* Read arbitrary files from a zip with ``feed.get('myfile.txt')``.
* Remove ``service_ids_by_date``, ``dates_by_service_ids``, and ``trip_counts_by_date`` from the feed class. Instead, use ``ptg.{read_service_ids_by_date,read_dates_by_service_ids,read_trip_counts_by_date}``.

0.4.0 (2017-12-10)
------------------

* Add support for Python 2.7. Thanks @danielsclint!

0.3.0 (2017-10-12)
------------------

* Fix service date resolution for ``raw_feed``. Previously, ``raw_feed`` treated every day of the week in ``calendar.txt`` as active, regardless of its 0/1 value.

0.2.0 (2017-09-30)
------------------

* Add missing edge from ``fare_rules.txt`` to ``routes.txt`` in the default dependency graph.

0.1.0 (2017-09-23)
------------------

* First release on PyPI.