Quickly move data from postgres to numpy or pandas.
Project description
Quickly move data from postgres to numpy or pandas.
API
to_arrays(query, *, bind=None)
Run the query returning a the results as np.ndarrays.
Parameters
----------
query : sa.sql.Selectable
The query to run. This can be a select or a table.
bind : sa.Engine, optional
The engine used to create the connection. If not provided
``query.bind`` will be used.
Returns
-------
arrays : dict[str, (np.ndarray, np.ndarray)]
A map from column name to the result arrays. The first array holds the
values and the second array is a boolean mask for NULLs. The values
where the mask is False are 0 interpreted by the type.
to_dataframe(query, *, bind=None, null_values=None)
Run the query returning a the results as a pd.DataFrame.
Parameters
----------
query : sa.sql.Selectable
The query to run. This can be a select or a table.
bind : sa.Engine, optional
The engine used to create the connection. If not provided
``query.bind`` will be used.
null_values : dict[str, any]
The null values to use for each column. This falls back to
``warp_prism.null_values`` for columns that are not specified.
Returns
-------
df : pd.DataFrame
A pandas DataFrame holding the results of the query. The columns
of the DataFrame will be named the same and be in the same order as the
query.
register_odo_dataframe_edge()
Register an odo edge for sqlalchemy selectable objects to dataframe.
This edge will have a lower cost that the default edge so it will be
selected as the fasted path.
If the selectable is not in a postgres database, it will fallback to the
default odo edge.
Comparisons
A quick comparison between warp_prism, odo, and pd.read_sql_table.
In this example we will read real data for VIX from quandl stored in a local postgres database using warp_prism, odo, and pd.read_sql_table. After that, we will use odo to create a table with two float columns and 1000000 rows and query it with the tree tools again.
In [1]: import warp_prism
In [2]: from odo import odo, resource
In [3]: import pandas as pd
In [4]: table = resource(
...: 'postgresql://localhost/bz::yahoo_index_vix',
...: schema='quandl',
...: )
In [5]: warp_prism.to_dataframe(table).head()
Out[5]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [6]: odo(table, pd.DataFrame).head()
Out[6]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [7]: pd.read_sql_table(table.name, table.bind, table.schema).head()
Out[7]:
asof_date open_ high low close volume \
0 2016-01-08 22.959999 27.080000 22.480000 27.010000 0.0
1 2015-12-04 17.430000 17.650000 14.690000 14.810000 0.0
2 2015-10-29 14.800000 15.460000 14.330000 14.610000 0.0
3 2015-12-21 19.639999 20.209999 18.700001 18.700001 0.0
4 2015-10-26 14.760000 15.430000 14.680000 15.290000 0.0
adjusted_close timestamp
0 27.010000 2016-01-11 23:14:54.682220
1 14.810000 2016-01-11 23:14:54.682220
2 14.610000 2016-01-11 23:14:54.682220
3 18.700001 2016-01-11 23:14:54.682220
4 15.290000 2016-01-11 23:14:54.682220
In [8]: len(warp_prism.to_dataframe(table))
Out[8]: 6565
In [9]: %timeit warp_prism.to_dataframe(table)
100 loops, best of 3: 7.55 ms per loop
In [10]: %timeit odo(table, pd.DataFrame)
10 loops, best of 3: 49.9 ms per loop
In [11]: %timeit pd.read_sql_table(table.name, table.bind, table.schema)
10 loops, best of 3: 61.8 ms per loop
In [12]: big_table = odo(
...: pd.DataFrame({
...: 'a': np.random.rand(1000000),
...: 'b': np.random.rand(1000000)},
...: ),
...: 'postgresql://localhost/test::largefloattest',
...: )
In [13]: %timeit warp_prism.to_dataframe(big_table)
1 loop, best of 3: 248 ms per loop
In [14]: %timeit odo(big_table, pd.DataFrame)
1 loop, best of 3: 1.51 s per loop
In [15]: %timeit pd.read_sql_table(big_table.name, big_table.bind)
1 loop, best of 3: 1.9 s per loop
Installation
Warp Prism can be pip installed but requires numpy to build its C extensions:
$ pip install numpy
$ pip install warp_prism
License
Warp Prism is licensed under the Apache 2.0.
Warp Prism is sponsored by Quantopian where it is used to fetch data for use in Zipline through the Pipeline API or interactively with Blaze.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file warp_prism-0.1.1.tar.gz
.
File metadata
- Download URL: warp_prism-0.1.1.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d57df5dbadf1d2764de27e2e3edb339cb296cbf1be71c5b9c9d317d5ca5d598c |
|
MD5 | fdf6efdebf4019c21e135588539d460f |
|
BLAKE2b-256 | 61efd40839723444d99ac54dc207ce130b2ca96d66c7dfda91a650fe48a0da3e |