Storage and database adapters available in project Thoth
Project description
This package provides a library called thoth-storages used in project Thoth. The library exposes core queries and methods for the PostgreSQL database, as well as adapters for manipulating Ceph via its S3-compatible API.
Installation and Usage
The library can be installed via pip or Pipenv from PyPI:
pipenv install thoth-storages
The library does not provide any CLI; it is rather a low-level library supporting other parts of Thoth.
You can run the prepared test suite via the following commands:
pipenv install --dev
pipenv run python3 setup.py test
# To generate docs:
pipenv run python3 setup.py build_sphinx
Running PostgreSQL locally
You can use the docker-compose.yaml present in this repository to run a local PostgreSQL instance (make sure you have installed podman-compose):
$ podman-compose up
After running the command above, you should be able to access a local PostgreSQL instance at localhost:5432. This is also the default configuration for the PostgreSQL adapter - you don’t need to provide GRAPH_SERVICE_HOST explicitly. The default configuration uses a database named postgres, which can be accessed with the postgres user and the postgres password (SSL is disabled).
The provided docker-compose.yaml also has PGweb enabled to provide a UI for browsing the database content. To access it, visit http://localhost:8081/.
The provided docker-compose.yaml does not use any volume. After your containers restart, the content will not be available anymore.
If you would like to experiment with PostgreSQL programmatically, you can use the following code snippet as a starting point:
from thoth.storages import GraphDatabase
graph = GraphDatabase()
graph.connect()
# To clear database:
# graph.drop_all()
# To initialize schema in the graph database:
# graph.initialize_schema()
Generating migrations and schema adjustment in deployment
If you make any changes to the data model of the main PostgreSQL database, you need to generate migrations. These migrations state how to adjust an already existing database with data in deployments. Alembic migrations are used for this purpose. Alembic can (partially) automatically detect what has changed and how to adjust an already existing database in a deployment.
Alembic uses incremental version control, where each migration is versioned and states how to migrate from the previous state of the database to the desired next state - these versions are present in the alembic/versions directory and are automatically generated with the procedure described below.
If you make any changes, follow these steps, which will generate the version for you:
Make sure your local PostgreSQL instance is running (follow the Running PostgreSQL locally instructions above):
$ podman-compose up
Run Alembic CLI to generate versions for you:
# Make sure you have your environment set up:
# pipenv install --dev
# Make sure you are running the most recent version of the schema:
$ PYTHONPATH=. pipenv run alembic upgrade head
# Actually generate a new version:
$ PYTHONPATH=. pipenv run alembic revision --autogenerate -m "Added row to calculate sum of sums which will be divided by 42"
Review the migrations generated by Alembic. Note that NOT all changes are automatically detected by Alembic.
Make sure the generated migrations are part of your pull request so changes are propagated to deployments:
$ git add thoth/storages/data/alembic/versions/
In a deployment, use Management API and its /graph/initialize endpoint to propagate database schema changes (Management API has to have the recent schema changes present; these are populated with new thoth-storages releases).
If you are running locally and would like to propagate changes, run the following Alembic command to update the migrations to the latest version:
$ PYTHONPATH=. pipenv run alembic upgrade head
If you would like to update the schema programmatically, run the following Python code:

from thoth.storages import GraphDatabase

graph = GraphDatabase()
graph.connect()
graph.initialize_schema()
Generate schema images
You can use the shipped CLI thoth-storages to automatically generate schema images out of the current models:
# First, make sure you have dev packages installed:
pipenv install --dev
PYTHONPATH=. pipenv run python3 ./thoth-storages generate-schema
The command above will produce two images named schema.png and schema_cache.png. The first PNG file shows the schema of the main PostgreSQL instance and the latter, as the name suggests, shows what the cache schema looks like.
If the command above fails with the following exception:
FileNotFoundError: [Errno 2] "dot" not found in path.
make sure you have the graphviz package installed:
dnf install -y graphviz
Creating own performance indicators
You can create your own performance indicators. To create one, write a script which tests the desired functionality of a library. An example is the matrix multiplication script present in the performance repository. This script can be supplied to Dependency Monkey to validate a certain combination of libraries in the desired runtime and buildtime environment, or run directly on Amun API, which executes the given script using the desired software and hardware configuration. Please follow the instructions on how to create a performance script shown in the README of the performance repo.
To create the relevant models, adjust the thoth/storages/graph/models_performance.py file and add your model. Describe the parameters (reported in the @parameters section of the performance indicator result) and the result (reported in @result). The name of the class should match the name reported by the performance indicator run.
# Column types come from SQLAlchemy; Base, BaseExtension and
# PerformanceIndicatorBase are provided by this package's model base modules.
from sqlalchemy import Column, Float, Integer, String


class PiMatmul(Base, BaseExtension, PerformanceIndicatorBase):
    """A class representing a matrix multiplication micro-performance test."""

    # Device used during the performance indicator run - CPU/GPU/TPU/...
    device = Column(String(128), nullable=False)
    matrix_size = Column(Integer, nullable=False)
    dtype = Column(String(128), nullable=False)
    reps = Column(Integer, nullable=False)
    elapsed = Column(Float, nullable=False)
    rate = Column(Float, nullable=False)
Online debugging of queries
You can print all the queries performed against a PostgreSQL instance to the logger. To do so, set the following environment variable:
export THOTH_STORAGES_DEBUG_QUERIES=1
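Under the hood, query logging of this kind is typically routed through SQLAlchemy's engine logger. If you want equivalent output without the environment variable, the standard logging setup looks like the sketch below (an assumption about the mechanism, not the library's own implementation):

```python
import logging

# Emit log records to stderr at INFO level and raise the SQLAlchemy
# engine logger to INFO so each executed SQL statement is printed.
logging.basicConfig(level=logging.INFO)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```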
Logging adapter statistics
You can print information about the PostgreSQL adapter, together with statistics on the graph cache and memory cache usage, to the logger (it has to have at least level INFO set). To do so, set the following environment variable:
export THOTH_STORAGES_LOG_STATS=1
These statistics will be printed once the database adapter is destructed.
Graph database cache
The implementation of this library also provides a cache to speed up queries to the graph database. This cache is especially suitable for production systems, so that popular packages are not queried multiple times.
The cache can be created with shipped CLI tool:
# When using version from this Git repository:
PYTHONPATH=. THOTH_STORAGES_GRAPH_CACHE="cache.sqlite3" pipenv run ./thoth-storages graph-cache -c ../adviser/cache_conf.yaml
# When using a version installed from PyPI:
THOTH_STORAGES_GRAPH_CACHE="cache.sqlite3" thoth-storages graph-cache -c ../adviser/cache_conf.yaml
The command above creates a SQLite3 database which carries some of the data loaded from the PostgreSQL database and helps the resolver resolve software stacks faster. The path to the cache can be supplied using the environment variable THOTH_STORAGES_GRAPH_CACHE. By default, the module creates an in-memory SQLite3 database and does not persist it onto disk. If the configuration points to a non-existent file, an SQLite3 database will be created and persisted onto disk with the data added to it based on runtime usage. This naturally re-uses the graph cache across runs (filled with the data needed) as expected.
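The in-memory versus on-disk behaviour can be illustrated with the standard sqlite3 module, since SQLite treats the special path ":memory:" as a non-persistent database. This is a sketch of the mechanism, not the library's code; only the THOTH_STORAGES_GRAPH_CACHE variable name comes from the text above:

```python
import os
import sqlite3


def open_graph_cache() -> sqlite3.Connection:
    """Open a cache database following the behaviour described above.

    If THOTH_STORAGES_GRAPH_CACHE is unset, fall back to an in-memory
    SQLite3 database; otherwise open (or create) the named file so the
    cache is persisted and re-used across runs.
    """
    path = os.getenv("THOTH_STORAGES_GRAPH_CACHE", ":memory:")
    return sqlite3.connect(path)


conn = open_graph_cache()
# A hypothetical table sketching the kind of data a dependency cache holds.
conn.execute("CREATE TABLE IF NOT EXISTS depends_on (package TEXT, dependency TEXT)")
conn.close()
```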
Take a look at the adviser repo, specifically the cache_conf.yaml file, to see how this file should be structured. An example could be:
python-packages:
- thoth-storages
- tensorflow
With the configuration above, the cache will be created. This cache will hold a serialized dependency graph of TensorFlow and thoth-storages packages, together with node information to effectively construct TensorFlow’s dependency graph for transitive queries.
Note that only information which should not change over time is captured in the cache; for example, packages which were not yet resolved during cache creation are not added to the cache, so the system explicitly asks for resolution results next time (they might be resolved in the meantime).
To enable inserts into the graph cache, set THOTH_STORAGES_GRAPH_CACHE_INSERTS_DISABLED to 0 (the default value of 1 disables it). Disabling inserts might be beneficial in deployments where you want to avoid building the cache (the overhead of inserting data into the graph cache, uniqueness checks of entries and cache index creation are, in sum, expensive operations).
To disable graph cache completely, set THOTH_STORAGES_GRAPH_CACHE_DISABLED environment variable to 1 (the default value of 0 enables it).
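The two toggles above can be summarized in one place. The helper below is hypothetical and only restates the documented defaults (inserts disabled by default, cache enabled by default):

```python
import os


def _flag(name: str, default: str) -> bool:
    """Interpret a 0/1 environment variable as a boolean."""
    return os.getenv(name, default) == "1"


def cache_config() -> dict:
    # Defaults match the description above: inserts are disabled by
    # default, while the cache itself is enabled by default.
    return {
        "inserts_disabled": _flag("THOTH_STORAGES_GRAPH_CACHE_INSERTS_DISABLED", "1"),
        "cache_disabled": _flag("THOTH_STORAGES_GRAPH_CACHE_DISABLED", "0"),
    }


print(cache_config())
```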
Creating backups from Thoth deployment
You can use the pg_dump and psql utilities to create dumps and restore the database content from them. These tools are pre-installed in the container image running PostgreSQL, so the only thing you need to do is execute pg_dump in Thoth’s deployment in a PostgreSQL container to create a dump, use oc cp to retrieve the dump (or directly use oc exec and create the dump from the cluster), and subsequently use psql to restore the database content. The prerequisite for this is access to the running container (edit rights).
# Execute the following commands from the root of this Git repo:
# List PostgreSQL pods running:
$ oc get pod -l name=postgresql
NAME READY STATUS RESTARTS AGE
postgresql-1-glwnr 1/1 Running 0 3d
# Open remote shell to the running container in the PostgreSQL pod:
$ oc rsh -t postgresql-1-glwnr bash
# Perform dump of the database:
(cluster-postgres) $ pg_dump > pg_dump-$(date +"%s").sql
(cluster-postgres) $ ls pg_dump-*.sql # Remember the current dump name
(cluster-postgres) pg_dump-1569491024.sql
(cluster-postgres) $ exit
# Copy the dump to the current dir:
$ oc cp thoth-test-core/postgresql-1-glwnr:/opt/app-root/src/pg_dump-1569491024.sql .
# Start local PostgreSQL instance:
$ podman-compose up --detach
<logs will show up>
$ psql -h localhost -p 5432 --username=postgres < pg_dump-1569491024.sql
password: <type password "postgres" here>
<logs will show up>
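The timestamped naming convention used in the session above, pg_dump-$(date +"%s").sql, can also be reproduced from Python if you script the backup steps. The helper name is hypothetical:

```python
import time
from typing import Optional


def dump_filename(now: Optional[float] = None) -> str:
    """Produce a dump name matching the pg_dump-$(date +"%s").sql convention."""
    ts = int(now if now is not None else time.time())
    return f"pg_dump-{ts}.sql"


print(dump_filename(1569491024))  # the dump name seen in the session above
```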
File details
Details for the file thoth-storages-0.19.8.tar.gz
File metadata
- Download URL: thoth-storages-0.19.8.tar.gz
- Upload date:
- Size: 52.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/36.5.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be1bef4241149ee42e80d6999fc969c9288a89906ea7aca36e002dce6237fdb1 |
| MD5 | 727f9a725b7c4bccfb06ed38b3224407 |
| BLAKE2b-256 | 6469f00b9c11b38764bd1aadb89a22e2f27751b1b889ffcc348f3e78a45172cc |
File details
Details for the file thoth_storages-0.19.8-py3-none-any.whl
File metadata
- Download URL: thoth_storages-0.19.8-py3-none-any.whl
- Upload date:
- Size: 80.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/36.5.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0619682a5ec6d96ca9146ad350bd385ee9f24591b7211be7949be7ea5eee030a |
| MD5 | 71d5aa30856f0b0e0f770bc78e0d7836 |
| BLAKE2b-256 | f3b768a7a5b3c951da88cba6a61a8fd4436acff8dfde50e1cc3baca3f8aa46b6 |