Code for working with the PSYCOP cohort
Installation
For development
pip install -e .
The -e flag makes the install editable: changes to the source files take effect without reinstalling the package.
It is also recommended to add black as a pre-commit hook:
pre-commit install
For use
pip install git+https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils.git
sql_load
Currently contains a single function, sql_load, which loads a view from SQL.
from loaders import sql_load
view = ...
df = sql_load(...)
TimeSeriesFlattener
To train baseline models (logistic regression, elastic net, SVM, XGBoost/random forest etc.), we need to represent the longitudinal data in a tabular, flattened way.
In essence, we need to generate a training example for each prediction time, where that example contains "latest_blood_pressure" (float), "X_diagnosis_within_n_hours" (boolean) etc.
To generate this, I propose the time-series flattener class (TimeSeriesFlattener). It builds a dataset as described above.
TimeSeriesFlattener
class FlattenedTimeSeries:
    Attributes:
        prediction_df (dataframe): Cols: dw_ek_borger, prediction_time, (value if relevant).
    Methods:
        add_outcome
            outcome_df (dataframe): Cols: dw_ek_borger, datotid, (value if relevant).
            lookahead_window (float): How far ahead to look for an outcome. If none found, use fallback.
            resolve_multiple (str): How to handle more than one record within the lookahead window. Suggestions: earliest, latest, mean, max, min.
            fallback (list): How to handle lack of a record within the lookahead window. Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
            name (str): What to name the column
        add_predictor
            predictor (dataframe): Cols: dw_ek_borger, datotid, (value if relevant).
            lookback_window (float): How far back to look for a predictor. If none found, use fallback.
            resolve_multiple (str): How to handle more than one record within the lookback window. Suggestions: earliest, latest, mean, max, min.
            fallback (list): How to handle lack of a record within the lookback window. Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
            name (str): What to name the column
Inspiration-code can be found in previous commits.
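The lookback behaviour described for add_predictor can be sketched in plain Python to make the logic visible. This is a minimal illustration under assumptions, not the package's implementation: the function name flatten_predictor, the tuple-based record layout, reading the window as a number of days, and including records that fall exactly on the prediction time are all hypothetical choices. Only a single hardcoded fallback value is handled here.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical resolvers mirroring the suggested resolve_multiple options.
# Each receives a list of (timestamp, value) tuples sorted by timestamp.
RESOLVERS = {
    "earliest": lambda records: records[0][1],
    "latest": lambda records: records[-1][1],
    "mean": lambda records: mean(value for _, value in records),
    "max": lambda records: max(value for _, value in records),
    "min": lambda records: min(value for _, value in records),
}


def flatten_predictor(prediction_times, records, lookback_days, resolve_multiple, fallback):
    """Return one (patient_id, prediction_time, value) tuple per prediction time.

    prediction_times: list of (patient_id, prediction_time)
    records: list of (patient_id, timestamp, value)
    fallback: value used when no record falls inside the lookback window
    """
    resolve = RESOLVERS[resolve_multiple]
    window = timedelta(days=lookback_days)
    flattened = []
    for patient_id, prediction_time in prediction_times:
        # Keep this patient's records inside [prediction_time - window, prediction_time].
        in_window = sorted(
            (timestamp, value)
            for record_patient, timestamp, value in records
            if record_patient == patient_id
            and prediction_time - window <= timestamp <= prediction_time
        )
        value = resolve(in_window) if in_window else fallback
        flattened.append((patient_id, prediction_time, value))
    return flattened


prediction_times = [(1, datetime(2021, 1, 1)), (2, datetime(2021, 1, 1))]
hba1c = [
    (1, datetime(2020, 6, 1), 44),
    (1, datetime(2020, 9, 1), 48),
    (1, datetime(2019, 1, 1), 60),  # older than 365 days, so ignored
]

rows = flatten_predictor(prediction_times, hba1c, 365, "max", fallback=40)
```

Patient 1 has two records inside the window, so "max" picks the larger of them; patient 2 has no records and receives the fallback. A dataframe-based version would likely replace the inner loop with a join and a grouped aggregation.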
Example
import FlattenedTimeSeries

dataset = FlattenedTimeSeries(prediction_df=prediction_times)

dataset.add_outcome(
    outcome=type_2_diabetes,
    lookahead_window=730,
    resolve_multiple="max",
    fallback=[0],
    name="t2d",
)

dataset.add_predictor(
    predictor=hba1c,
    lookback_window=365,
    resolve_multiple="max",
    fallback=["latest", 40],
    name="hba1c",
)
Dataset now looks like this:
dw_ek_borger | datetime_prediction | outc_t2d_within_next_730_days | pred_max_hba1c_within_prev_365_days |
---|---|---|---|
1 | yyyy-mm-dd hh:mm:ss | 0 | 48 |
2 | yyyy-mm-dd hh:mm:ss | 0 | 40 |
3 | yyyy-mm-dd hh:mm:ss | 1 | 44 |
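One possible reading of a list-valued fallback such as ["latest", 40] is an ordered chain: try each strategy in turn and use the first one that yields a value. The sketch below illustrates that reading only. The function name resolve_fallback and the patient_history argument are hypothetical, and the proposal above does not confirm that the chain is meant to work this way.

```python
from datetime import datetime


def resolve_fallback(fallback, patient_history):
    """Try each fallback entry in order and return the first usable value.

    fallback: list of strategy names or literal values, e.g. ["latest", 40]
    patient_history: list of (timestamp, value) tuples for one patient,
        sorted by timestamp and restricted to records before the prediction time
    """
    for strategy in fallback:
        if strategy == "latest":
            # Use the most recent earlier record, if the patient has one.
            if patient_history:
                return patient_history[-1][1]
        elif isinstance(strategy, str):
            # Other named strategies (mean_of_patient, ...) are not sketched here.
            raise NotImplementedError(f"fallback strategy {strategy!r} is not sketched")
        else:
            # Anything that is not a string is treated as a hardcoded value.
            return strategy
    return None


with_history = resolve_fallback(["latest", 40], [(datetime(2018, 5, 1), 52)])
without_history = resolve_fallback(["latest", 40], [])
binary_default = resolve_fallback([0], [])
```

Under this reading, a patient with an older measurement gets that measurement, while a patient with no measurements at all gets the hardcoded 40.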
For binary outcomes, add_predictor with fallback = [0] would take a df containing only the times where the event occurred, and then generate 0s for the remaining prediction times.
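The binary case can be sketched as follows: the input lists only the times at which the event occurred, and every prediction time without an event in the window falls back to 0. The function name flag_event_within, the tuple layout, the day-based window, and the choice to exclude events at the prediction time itself are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def flag_event_within(prediction_times, event_times, lookahead_days):
    """Return 1 per prediction time if an event falls in the lookahead window, else 0.

    prediction_times: list of (patient_id, prediction_time)
    event_times: list of (patient_id, event_time), events only
    """
    window = timedelta(days=lookahead_days)
    flags = []
    for patient_id, prediction_time in prediction_times:
        # Look for any event in (prediction_time, prediction_time + window].
        has_event = any(
            event_patient == patient_id
            and prediction_time < event_time <= prediction_time + window
            for event_patient, event_time in event_times
        )
        flags.append(int(has_event))  # no event found means the fallback, 0
    return flags


prediction_times = [
    (1, datetime(2020, 1, 1)),
    (2, datetime(2020, 1, 1)),
    (3, datetime(2020, 1, 1)),
]
t2d_events = [
    (2, datetime(2023, 6, 1)),   # later than 730 days, so not counted
    (3, datetime(2021, 3, 15)),  # inside the 730-day window
]

flags = flag_event_within(prediction_times, t2d_events, 730)
```

Patient 1 has no events and patient 2's event lies outside the window, so both receive 0; only patient 3 is flagged.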
I propose we create the above functionality on a just-in-time basis, building the features as we need them.