Code for working with the PSYCOP cohort

Installation

For development

pip install -e .

The -e flag marks the install as editable, so changes to the source files take effect in the installed package without reinstalling.

We also recommend adding black as a pre-commit hook. To install the hooks, run:

pre-commit install

For use

pip install git+https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils.git

sql_load

Currently contains only one function, sql_load, which loads a view from SQL:

from loaders import sql_load

view = ...
df = sql_load(...)
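
The arguments to sql_load are elided above. As a rough illustration of what "loading a view from SQL" involves, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The function load_view_sketch, the table, the view name, and the values are all hypothetical; this is not the implementation or the signature of sql_load.

```python
import sqlite3


def load_view_sketch(conn, view):
    """Hypothetical stand-in for sql_load: return all rows of a view as dicts."""
    # Identifiers cannot be bound as query parameters, so check the view name
    # against the database's own catalogue before interpolating it.
    known_views = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
    }
    if view not in known_views:
        raise ValueError(f"Unknown view: {view!r}")
    cursor = conn.execute(f'SELECT * FROM "{view}"')
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Toy in-memory database; the table, view, and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (dw_ek_borger INTEGER, datotid TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [(1, "2020-01-01 08:00:00", 48.0), (2, "2020-01-02 09:30:00", 40.0)],
)
conn.execute("CREATE VIEW hba1c_view AS SELECT * FROM lab_results")

rows = load_view_sketch(conn, "hba1c_view")
```

Each element of rows is a dict keyed by column name, which is roughly the shape a dataframe constructor would accept.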

TimeSeriesFlattener

To train baseline models (logistic regression, elastic net, SVM, XGBoost/random forest etc.), we need to represent the longitudinal data in a tabular, flattened way.

In essence, we need to generate a training example for each prediction time, where that example contains "latest_blood_pressure" (float), "X_diagnosis_within_n_hours" (boolean) etc.
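
To make the idea concrete, the sketch below builds one flat row per prediction time from toy longitudinal records, using only the standard library. The function flatten_sketch, the tuple layouts, and all values are assumptions made for illustration; they are not part of the package.

```python
from datetime import datetime, timedelta


def flatten_sketch(prediction_times, blood_pressures, diagnoses, n_hours):
    """Build one flat row per prediction time (illustrative only)."""
    rows = []
    for patient_id, pred_time in prediction_times:
        # Measurements for this patient taken at or before the prediction time.
        earlier = [
            (ts, value)
            for pid, ts, value in blood_pressures
            if pid == patient_id and ts <= pred_time
        ]
        latest = max(earlier, key=lambda rec: rec[0])[1] if earlier else None

        # True if the patient is diagnosed within n_hours after the prediction time.
        window_end = pred_time + timedelta(hours=n_hours)
        diagnosed = any(
            pid == patient_id and pred_time < ts <= window_end
            for pid, ts in diagnoses
        )
        rows.append(
            {
                "dw_ek_borger": patient_id,
                "prediction_time": pred_time,
                "latest_blood_pressure": latest,
                "X_diagnosis_within_n_hours": diagnosed,
            }
        )
    return rows


# Toy data: (patient id, timestamp[, value]).
prediction_times = [(1, datetime(2021, 1, 10, 12)), (2, datetime(2021, 1, 10, 12))]
blood_pressures = [
    (1, datetime(2021, 1, 8, 9), 130.0),
    (1, datetime(2021, 1, 9, 9), 125.0),
]
diagnoses = [(2, datetime(2021, 1, 11, 8))]

flat = flatten_sketch(prediction_times, blood_pressures, diagnoses, n_hours=24)
```

Patient 2 has no blood pressure record before the prediction time, so the sketch leaves that value as None; handling such gaps is what the fallback parameter described below is for.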

To generate this, I propose a time-series flattener class (TimeSeriesFlattener) that builds a dataset like the one described above.

class FlattenedTimeSeries:
  Attributes:
    prediction_df (dataframe): Cols: dw_ek_borger, prediction_time, (value if relevant).

  Methods:
    add_outcome
        outcome (dataframe): Cols: dw_ek_borger, datotid, (value if relevant).
        lookahead_window (float): How far ahead to look for an outcome. If none found, use fallback.
        resolve_multiple (str): How to handle more than one record within the lookahead window. Suggestions: earliest, latest, mean, max, min.
        fallback (list): How to handle lack of a record within the lookahead window. Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
        name (str): What to name the column
    
    add_predictor
        predictor (dataframe): Cols: dw_ek_borger, datotid, (value if relevant).
        lookback_window (float): How far back to look for a predictor. If none found, use fallback.
        resolve_multiple (str): How to handle more than one record within the lookback window. Suggestions: earliest, latest, mean, max, min.
        fallback (list): How to handle lack of a record within the lookback window. Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
        name (str): What to name the column
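
One possible reading of resolve_multiple and fallback is sketched below. It assumes that the fallback list is tried in order, that "latest" and "mean_of_patient" refer to the patient's records outside the window, and that any non-string entry is a hardcoded value. The helper resolve_window and the integer timestamps are hypothetical, and mean_of_population is left out because it needs data from other patients.

```python
from statistics import mean

# Each resolver maps a time-ordered list of (timestamp, value) records to one value.
RESOLVERS = {
    "earliest": lambda records: records[0][1],
    "latest": lambda records: records[-1][1],
    "mean": lambda records: mean(value for _, value in records),
    "max": lambda records: max(value for _, value in records),
    "min": lambda records: min(value for _, value in records),
}


def resolve_window(in_window, patient_history, resolve_multiple, fallback):
    """Collapse the records in a window to one value, or walk the fallback list."""
    if in_window:
        ordered = sorted(in_window, key=lambda rec: rec[0])
        return RESOLVERS[resolve_multiple](ordered)
    for strategy in fallback:
        if strategy == "latest":
            # Assumed meaning: the patient's most recent record outside the window.
            if patient_history:
                return max(patient_history, key=lambda rec: rec[0])[1]
        elif strategy == "mean_of_patient":
            if patient_history:
                return mean(value for _, value in patient_history)
        elif isinstance(strategy, str):
            raise ValueError(f"Unsupported fallback strategy: {strategy!r}")
        else:
            # Anything that is not a strategy name is treated as a hardcoded value.
            return strategy
    return None


# Records are (timestamp, value) pairs; timestamps are plain integers here.
several = resolve_window([(10, 44), (20, 48)], [], "max", ["latest", 40])
hardcoded = resolve_window([], [], "max", ["latest", 40])
from_history = resolve_window([], [(1, 52)], "max", ["latest", 40])
```

With these inputs, several is the maximum of the two in-window values, hardcoded falls through "latest" to the fixed value, and from_history picks up the patient's older record.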

Code for inspiration can be found in previous commits.

Example

import FlattenedTimeSeries

dataset = FlattenedTimeSeries(prediction_df=prediction_times)

dataset.add_outcome(
    outcome=type_2_diabetes,
    lookahead_window=730,
    resolve_multiple="max",
    fallback=[0],
    name="t2d",
)

dataset.add_predictor(
    predictor=hba1c,
    lookback_window=365,
    resolve_multiple="max",
    fallback=["latest", 40],
    name="hba1c",
)

Dataset now looks like this:

dw_ek_borger  datetime_prediction  outc_t2d_within_next_730_days  pred_max_hba1c_within_prev_365_days
1             yyyy-mm-dd hh:mm:ss  0                              48
2             yyyy-mm-dd hh:mm:ss  0                              40
3             yyyy-mm-dd hh:mm:ss  1                              44
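
The column names in the table appear to follow a pattern built from the arguments of the add_outcome and add_predictor calls. The sketch below reproduces that pattern. The two helper functions are hypothetical and the pattern is inferred only from the header above, so treat it as a guess at the naming convention rather than a specification.

```python
def outcome_column_name(name, lookahead_days):
    """Name an outcome column after its name and lookahead window (in days)."""
    return f"outc_{name}_within_next_{lookahead_days}_days"


def predictor_column_name(name, lookback_days, resolve_multiple):
    """Name a predictor column after its resolver, name, and lookback window."""
    return f"pred_{resolve_multiple}_{name}_within_prev_{lookback_days}_days"


# Arguments taken from the add_outcome and add_predictor calls in the example.
outcome_col = outcome_column_name("t2d", 730)
predictor_col = predictor_column_name("hba1c", 365, "max")
```

Note that, in the header shown, only the predictor column encodes the resolve_multiple strategy.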

For binary variables, add_outcome (or add_predictor) with fallback=[0] would take a dataframe containing only the times where the event occurred, and then generate 0s for all remaining prediction times.
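
As a minimal sketch of this behaviour, the code below takes only the event times and fills in the fallback for every prediction time without an event in the lookahead window. The function binary_outcome_sketch, the tuple layout, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def binary_outcome_sketch(prediction_times, event_times, lookahead_days, fallback=0):
    """Return 1 where an event falls in the lookahead window, else the fallback."""
    flags = []
    for patient_id, pred_time in prediction_times:
        window_end = pred_time + timedelta(days=lookahead_days)
        hit = any(
            pid == patient_id and pred_time < ts <= window_end
            for pid, ts in event_times
        )
        flags.append(1 if hit else fallback)
    return flags


# Toy data: one prediction time per patient, and only the times of actual events.
prediction_times = [
    (1, datetime(2020, 1, 1)),
    (2, datetime(2020, 1, 1)),
    (3, datetime(2020, 1, 1)),
]
event_times = [
    (1, datetime(2023, 1, 1)),  # outside the 730-day window
    (3, datetime(2021, 6, 1)),  # inside the 730-day window
]

flags = binary_outcome_sketch(prediction_times, event_times, lookahead_days=730)
```

Patient 2 never appears in event_times, and patient 1's event falls after the window, so both receive the fallback value.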

I propose we create the above functionality on a just-in-time basis, building the features as we need them.
