A package for training and evaluating multimodal knowledge graph embeddings
PyKEEN
PyKEEN (Python KnowlEdge EmbeddiNgs) is a Python package designed to train and evaluate knowledge graph embedding models (incorporating multi-modal information). It is part of the KEEN Universe.
Installation
The latest stable version of PyKEEN can be downloaded and installed from PyPI on Python 3.7+ with:
$ pip install pykeen
The development version of PyKEEN can be downloaded and installed from GitHub on Python 3.7+ with:
$ git clone https://github.com/pykeen/pykeen.git pykeen
$ cd pykeen
$ pip install -e .
$ # Install pre-commit
$ pip install pre-commit
$ pre-commit install
PyKEEN has several extras for installation that are defined in the [options.extras_require] section of the setup.cfg. They can be included with installation using the bracket notation, as in pip install pykeen[docs] or pip install -e .[docs]. Several can be listed, comma-delimited, as in pip install pykeen[docs,plotting].
Name | Description |
---|---|
plotting | Plotting with seaborn and generation of word clouds |
mlflow | Tracking of results with mlflow |
docs | Building of the documentation |
templating | Building of templated documentation, like the README |
Contributing
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
Quickstart
This example shows how to train a model on the training split of a data set and evaluate it on the held-out test split.
The fastest way to get up and running is to use the pipeline function. It provides a high-level entry into the extensible functionality of this package. The following example shows how to train and evaluate the TransE model on the Nations dataset. By default, the training loop uses the stochastic local closed world assumption (sLCWA) training approach and evaluates with rank-based evaluation.
from pykeen.pipeline import pipeline

result = pipeline(
    model='TransE',
    dataset='nations',
)
The results are returned in a dataclass that has attributes for the trained model, the training loop, and the evaluation.
PyKEEN is extensible such that:
- Each model has the same API, so anything from pykeen.models can be dropped in
- Each training loop has the same API, so pykeen.training.LCWATrainingLoop can be dropped in
- Triples factories can be generated by the user with pykeen.triples.TriplesFactory
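As a rough sketch of this drop-in design, the snippet below collects component names in a plain dictionary and passes them to pipeline. The helpers build_config and run are hypothetical names introduced for illustration. The sketch assumes that the model, dataset, and training_loop keyword arguments accept the short names listed in the tables of this README, so check the pipeline documentation for your installed version.

```python
def build_config(model="TransE", dataset="nations", training_loop="slcwa"):
    """Collect the component names to hand to ``pipeline`` (hypothetical helper)."""
    return {
        "model": model,
        "dataset": dataset,
        "training_loop": training_loop,
    }


def run(config):
    """Train and evaluate with the given components (hypothetical helper).

    PyKEEN is imported lazily so that building a config does not require it.
    Assumes ``pipeline`` accepts these keyword arguments as strings.
    """
    from pykeen.pipeline import pipeline

    return pipeline(**config)


# Swap in a different model and training loop without touching other code.
config = build_config(model="DistMult", training_loop="lcwa")
```

Calling run(config) would then train DistMult on the Nations data set with the LCWA training loop instead of the defaults.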
Implementation
Below are the models, data sets, training modes, evaluators, and metrics implemented in pykeen.
Datasets (13)
Name | Reference | Description |
---|---|---|
fb15k | pykeen.datasets.FB15k | The FB15k data set. |
fb15k237 | pykeen.datasets.FB15k237 | The FB15k-237 data set. |
hetionet | pykeen.datasets.Hetionet | The Hetionet dataset is a large biological network. |
kinships | pykeen.datasets.Kinships | The Kinships data set. |
nations | pykeen.datasets.Nations | The Nations data set. |
openbiolink | pykeen.datasets.OpenBioLink | The OpenBioLink dataset. |
openbiolinkf1 | pykeen.datasets.OpenBioLinkF1 | The PyKEEN First Filtered OpenBioLink 2020 Dataset. |
openbiolinkf2 | pykeen.datasets.OpenBioLinkF2 | The PyKEEN Second Filtered OpenBioLink 2020 Dataset. |
openbiolinklq | pykeen.datasets.OpenBioLinkLQ | The low-quality variant of the OpenBioLink dataset. |
umls | pykeen.datasets.UMLS | The UMLS data set. |
wn18 | pykeen.datasets.WN18 | The WN18 data set. |
wn18rr | pykeen.datasets.WN18RR | The WN18-RR data set. |
yago310 | pykeen.datasets.YAGO310 | The YAGO3-10 data set is a subset of YAGO3 that only contains entities with at least 10 relations. |
Models (23)
Name | Reference | Citation |
---|---|---|
ComplEx | pykeen.models.ComplEx | Trouillon et al., 2016 |
ComplExLiteral | pykeen.models.ComplExLiteral | Agustinus et al., 2018 |
ConvE | pykeen.models.ConvE | Dettmers et al., 2018 |
ConvKB | pykeen.models.ConvKB | Nguyen et al., 2018 |
DistMult | pykeen.models.DistMult | Yang et al., 2014 |
DistMultLiteral | pykeen.models.DistMultLiteral | Agustinus et al., 2018 |
ERMLP | pykeen.models.ERMLP | Dong et al., 2014 |
ERMLPE | pykeen.models.ERMLPE | Sharifzadeh et al., 2019 |
HolE | pykeen.models.HolE | Nickel et al., 2016 |
KG2E | pykeen.models.KG2E | He et al., 2015 |
NTN | pykeen.models.NTN | Socher et al., 2013 |
ProjE | pykeen.models.ProjE | Shi et al., 2017 |
RESCAL | pykeen.models.RESCAL | Nickel et al., 2011 |
RGCN | pykeen.models.RGCN | Schlichtkrull et al., 2018 |
RotatE | pykeen.models.RotatE | Sun et al., 2019 |
SimplE | pykeen.models.SimplE | Kazemi et al., 2018 |
StructuredEmbedding | pykeen.models.StructuredEmbedding | Bordes et al., 2011 |
TransD | pykeen.models.TransD | Ji et al., 2015 |
TransE | pykeen.models.TransE | Bordes et al., 2013 |
TransH | pykeen.models.TransH | Wang et al., 2014 |
TransR | pykeen.models.TransR | Lin et al., 2015 |
TuckER | pykeen.models.TuckER | Balazevic et al., 2019 |
UnstructuredModel | pykeen.models.UnstructuredModel | Bordes et al., 2014 |
Losses (7)
Name | Reference | Description |
---|---|---|
bce | pykeen.losses.BCELoss | A wrapper around the PyTorch binary cross entropy loss. |
bceaftersigmoid | pykeen.losses.BCEAfterSigmoidLoss | A loss function which uses the numerically unstable version of explicit Sigmoid + BCE. |
crossentropy | pykeen.losses.CrossEntropyLoss | Evaluate cross entropy after softmax output. |
marginranking | pykeen.losses.MarginRankingLoss | A wrapper around the PyTorch margin ranking loss. |
mse | pykeen.losses.MSELoss | A wrapper around the PyTorch mean square error loss. |
nssa | pykeen.losses.NSSALoss | An implementation of the self-adversarial negative sampling loss function proposed by Sun et al., 2019. |
softplus | pykeen.losses.SoftplusLoss | A loss function for the softplus. |
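To make the "numerically unstable" remark in the table above concrete, here is a minimal pure-Python sketch. It is not PyKEEN's or PyTorch's implementation, and the function names are hypothetical. It contrasts binary cross entropy computed after an explicit sigmoid with the algebraically equivalent form computed directly on the raw score, and it adds a one-pair margin ranking loss under the assumption that a higher score means a more plausible triple.

```python
import math


def bce_after_sigmoid(logit, label):
    """Naive variant: apply the sigmoid explicitly, then binary cross entropy."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def bce_with_logits(logit, label):
    """Stable variant: same value, but only exponentiates non-positive numbers."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def margin_ranking(pos_score, neg_score, margin=1.0):
    """Penalize a negative triple that scores within `margin` of its positive.

    Assumes higher scores mean more plausible triples.
    """
    return max(0.0, margin + neg_score - pos_score)
```

For moderate scores the two cross entropy variants agree. For an extreme score such as -800, the naive variant overflows in this pure-Python sketch while the stable one still returns a finite loss.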
Regularizers (5)
Name | Reference | Description |
---|---|---|
combined | pykeen.regularizers.CombinedRegularizer | A convex combination of regularizers. |
lp | pykeen.regularizers.LpRegularizer | A simple L_p norm based regularizer. |
no | pykeen.regularizers.NoRegularizer | A regularizer which does not perform any regularization. |
powersum | pykeen.regularizers.PowerSumRegularizer | A simple x^p based regularizer. |
transh | pykeen.regularizers.TransHRegularizer | A regularizer for the soft constraints in TransH. |
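The difference between the L_p norm and the x^p power sum in the table above can be illustrated with a short pure-Python sketch. The function names are hypothetical, and the sketch leaves out any weighting or normalization that the PyKEEN classes may apply.

```python
def lp_norm(values, p=2.0):
    """L_p norm: sum the p-th powers of the absolute values, then take the p-th root."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def power_sum(values, p=2.0):
    """Power sum: the same sum of p-th powers, without the final root."""
    return sum(abs(v) ** p for v in values)
```

For the vector [3.0, 4.0] with p=2, lp_norm gives 5.0 and power_sum gives 25.0, so the power sum grows faster for large embeddings.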
Optimizers (6)
Name | Reference | Description |
---|---|---|
adadelta | torch.optim.Adadelta | Implements Adadelta algorithm. |
adagrad | torch.optim.Adagrad | Implements Adagrad algorithm. |
adam | torch.optim.Adam | Implements Adam algorithm. |
adamax | torch.optim.Adamax | Implements Adamax algorithm (a variant of Adam based on infinity norm). |
adamw | torch.optim.AdamW | Implements AdamW algorithm. |
sgd | torch.optim.SGD | Implements stochastic gradient descent (optionally with momentum). |
Training Loops (2)
Name | Reference | Description |
---|---|---|
lcwa | pykeen.training.LCWATrainingLoop | A training loop that uses the local closed world assumption training approach. |
slcwa | pykeen.training.SLCWATrainingLoop | A training loop that uses the stochastic local closed world assumption training approach. |
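The following sketch shows one common reading of the local closed world assumption, not PyKEEN's implementation. For every (head, relation) pair seen in training, each entity that was not observed as a tail is treated as a negative. The function name lcwa_targets is hypothetical. The stochastic variant would instead draw only a few such negatives per positive triple.

```python
from collections import defaultdict


def lcwa_targets(triples, entities):
    """Build a 0/1 label per candidate tail for each observed (head, relation) pair."""
    known_tails = defaultdict(set)
    for head, relation, tail in triples:
        known_tails[head, relation].add(tail)
    # Entities never observed as a tail for the pair are labeled 0 (negative).
    return {
        pair: [1.0 if entity in tails else 0.0 for entity in entities]
        for pair, tails in known_tails.items()
    }
```

The dense label vectors make each training step score all entities at once, which is why this approach pairs naturally with losses such as binary cross entropy.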
Negative Samplers (2)
Name | Reference | Description |
---|---|---|
basic | pykeen.sampling.BasicNegativeSampler | A basic negative sampler. |
bernoulli | pykeen.sampling.BernoulliNegativeSampler | An implementation of the Bernoulli negative sampling approach proposed by Wang et al., 2014. |
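As a hedged illustration of the two strategies in the table above, the pure-Python sketch below corrupts either the head or the tail of a triple and computes the head-corruption probability used in Bernoulli sampling, tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for a relation. The function names are hypothetical, and no filtering of accidentally true triples is done here.

```python
import random
from collections import defaultdict


def corrupt(triple, entities, rng, head_probability=0.5):
    """Replace the head (with the given probability) or the tail by a different entity."""
    head, relation, tail = triple
    if rng.random() < head_probability:
        head = rng.choice([e for e in entities if e != head])
    else:
        tail = rng.choice([e for e in entities if e != tail])
    return head, relation, tail


def bernoulli_head_probability(triples, relation):
    """Probability of corrupting the head for `relation`: tph / (tph + hpt).

    Assumes `relation` occurs in at least one triple.
    """
    tails_per_head = defaultdict(set)
    heads_per_tail = defaultdict(set)
    for head, rel, tail in triples:
        if rel == relation:
            tails_per_head[head].add(tail)
            heads_per_tail[tail].add(head)
    tph = sum(len(t) for t in tails_per_head.values()) / len(tails_per_head)
    hpt = sum(len(h) for h in heads_per_tail.values()) / len(heads_per_tail)
    return tph / (tph + hpt)
```

With the default head_probability of 0.5, corrupt behaves like a basic uniform sampler. Passing the value from bernoulli_head_probability biases corruption toward the side that is less likely to produce a true triple by accident.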
Stoppers (2)
Name | Reference | Description |
---|---|---|
early | pykeen.stoppers.EarlyStopper | A harness for early stopping. |
nop | pykeen.stoppers.NopStopper | A stopper that does nothing. |
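Early stopping is usually driven by a patience rule: training halts once a validation metric has failed to improve for a given number of consecutive checks. The class below is a generic sketch of that rule with hypothetical names. It is not the pykeen.stoppers.EarlyStopper API, and it assumes that a higher metric value is better.

```python
class PatienceStopper:
    """Stop after `patience` consecutive checks without sufficient improvement."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_checks = 0

    def should_stop(self, metric):
        """Record a new validation result and report whether to stop training."""
        if self.best is None or metric > self.best + self.min_delta:
            # Sufficient improvement: remember it and reset the counter.
            self.best = metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

A training loop would call should_stop with the latest validation result after each evaluation and break out as soon as it returns True.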
Evaluators (2)
Name | Reference | Description |
---|---|---|
rankbased | pykeen.evaluation.RankBasedEvaluator | A rank-based evaluator for KGE models. |
sklearn | pykeen.evaluation.SklearnEvaluator | An evaluator that uses a Scikit-learn metric. |
Metrics (6)
Metric | Description | Evaluator | Reference |
---|---|---|---|
Adjusted Mean Rank | The mean over all chance-adjusted ranks: mean_i (2r_i / (num_entities+1)). Lower is better. | rankbased | pykeen.evaluation.RankBasedMetricResults |
Average Precision Score | The area under the precision-recall curve, between [0.0, 1.0]. Higher is better. | sklearn | pykeen.evaluation.SklearnMetricResults |
Hits At K | The hits at k for different values of k, i.e. the relative frequency of ranks not larger than k. Higher is better. | rankbased | pykeen.evaluation.RankBasedMetricResults |
Mean Rank | The mean over all ranks: mean_i r_i. Lower is better. | rankbased | pykeen.evaluation.RankBasedMetricResults |
Mean Reciprocal Rank | The mean over all reciprocal ranks: mean_i (1/r_i). Higher is better. | rankbased | pykeen.evaluation.RankBasedMetricResults |
Roc Auc Score | The area under the ROC curve between [0.0, 1.0]. Higher is better. | sklearn | pykeen.evaluation.SklearnMetricResults |
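The rank-based formulas in the table above can be written out directly. The sketch below uses a hypothetical function name and plain Python rather than PyKEEN's evaluator. It takes the rank of each true entity among all candidates and applies the definitions as stated in the table.

```python
def rank_metrics(ranks, num_entities, ks=(1, 3, 10)):
    """Compute rank-based metrics from 1-based ranks of the true entities."""
    n = len(ranks)
    metrics = {
        # mean_i r_i, lower is better
        "mean_rank": sum(ranks) / n,
        # mean_i (1 / r_i), higher is better
        "mean_reciprocal_rank": sum(1.0 / r for r in ranks) / n,
        # mean_i (2 r_i / (num_entities + 1)), lower is better
        "adjusted_mean_rank": sum(2.0 * r / (num_entities + 1) for r in ranks) / n,
    }
    for k in ks:
        # Relative frequency of ranks not larger than k, higher is better
        metrics[f"hits_at_{k}"] = sum(r <= k for r in ranks) / n
    return metrics
```

For example, rank_metrics([1, 2, 4], num_entities=9) yields a mean rank of 7/3, and two of the three ranks count as hits at k=3.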
Hyper-parameter Optimization
Samplers (2)
Name | Reference | Description |
---|---|---|
random | optuna.samplers.RandomSampler | Sampler using random sampling. |
tpe | optuna.samplers.TPESampler | Sampler using TPE (Tree-structured Parzen Estimator) algorithm. |
Experimentation
Reproduction
PyKEEN includes a set of curated experimental settings for reproducing past landmark experiments. They can be accessed and run like:
pykeen experiments reproduce tucker balazevic2019 fb15k
The three arguments are the model name, the reference, and the data set. The output directory can optionally be set with -d.
Ablation
PyKEEN includes the ability to specify ablation studies using the hyper-parameter optimization module. They can be run like:
pykeen experiments ablation ~/path/to/config.json
Acknowledgements
Supporters
This project has been supported by several organizations (in alphabetical order):
- Bayer
- Enveda Therapeutics
- Fraunhofer Institute for Algorithms and Scientific Computing
- Fraunhofer Institute for Intelligent Analysis and Information Systems
- Fraunhofer Center for Machine Learning
- Ludwig-Maximilians-Universität München
- Munich Center for Machine Learning (MCML)
- Siemens
- Smart Data Analytics Research Group (University of Bonn & Fraunhofer IAIS)
- Technical University of Denmark - DTU Compute - Section for Cognitive Systems
- Technical University of Denmark - DTU Compute - Section for Statistics and Data Analysis
- University of Bonn
Logo
The PyKEEN logo was designed by Carina Steinborn.