ckanext-collection
Base classes for viewing data series from CKAN.
Content
- Requirements
- Installation
- Usage
- Documentation
- Config settings
- Integrations
- License
Requirements
Compatibility with core CKAN versions:
CKAN version | Compatible? |
---|---|
2.9 | no |
2.10 | yes |
master | yes |
Installation
To install ckanext-collection:
- Install the extension:
  pip install ckanext-collection
- Add collection to the ckan.plugins setting in your CKAN config file.
Usage
Collections can be registered via ckanext.collection.interfaces.ICollection
or via CKAN signals. A registered collection can be initialized anywhere in code
using a helper, and can be used in a number of generic endpoints that render the
collection as HTML or export it into different formats.
Registration via interface:
import ckan.plugins as p
from ckanext.collection.interfaces import CollectionFactory, ICollection

class MyPlugin(p.SingletonPlugin):
    p.implements(ICollection, inherit=True)

    def get_collection_factories(self) -> dict[str, CollectionFactory]:
        return {
            "my-collection": MyCollection,
        }
get_collection_factories returns a dictionary with collection names (letters,
digits, underscores and hyphens are allowed) as keys, and collection factories
as values. In the most generic case, a collection factory is just a collection
class. But you can use any function with the signature
(str, dict[str, Any], **Any) -> Collection as a factory. For example, the
following function is a valid collection factory and it can be returned from
get_collection_factories:
def my_factory(name: str, params: dict[str, Any], **kwargs: Any):
    """Collection that shows 100 numbers per page"""
    params.setdefault("rows_per_page", 100)
    return MyCollection(name, params, **kwargs)
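The naming rule and the factory signature described above can be tried out without CKAN. The following is only a sketch: FakeCollection, is_valid_name and with_default_rows are hypothetical names invented for this example and are not part of ckanext-collection, and the regular expression is an assumption about how the stated naming rule could be checked.

```python
import re
from typing import Any, Callable

# Assumption: names consist of letters, digits, underscores and hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


class FakeCollection:
    """Hypothetical stand-in for a real collection class."""

    def __init__(self, name: str, params: dict[str, Any], **kwargs: Any):
        self.name = name
        self.params = params
        self.kwargs = kwargs


def is_valid_name(name: str) -> bool:
    """Check a name against the naming rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


def with_default_rows(rows: int) -> Callable[..., FakeCollection]:
    """Build a factory with the (str, dict, **Any) -> Collection shape."""

    def factory(name: str, params: dict[str, Any], **kwargs: Any) -> FakeCollection:
        # Copy params so the caller's dictionary is not mutated.
        params = dict(params)
        params.setdefault("rows_per_page", rows)
        return FakeCollection(name, params, **kwargs)

    return factory


factories = {"my-collection": with_default_rows(100)}
col = factories["my-collection"]("my-collection", {})
```

Wrapping the factory in a closure keeps the default value configurable while preserving the expected call signature.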
If you want to register a collection only when the collection plugin is enabled, you can use CKAN signals instead of wrapping the import from ckanext-collection in a try/except block:
import ckan.plugins as p
import ckan.plugins.toolkit as tk
from ckan import types

class MyPlugin(p.SingletonPlugin):
    p.implements(p.ISignal)

    def get_signal_subscriptions(self) -> types.SignalMapping:
        return {
            tk.signals.ckanext.signal("collection:register_collections"): [
                self.collect_collection_factories,
            ],
        }

    def collect_collection_factories(self, sender: None):
        return {
            "my-collection": MyCollection,
        }
Data returned from the signal subscription is exactly the same as from
ICollection.get_collection_factories. The only difference is that the signal
subscription accepts a sender argument, which is always None due to the
internal implementation of signals.
Documentation
Overview
The goal of this plugin is to supply you with generic classes for processing collections of data. As a result, it doesn't do much out of the box and you have to write some code to see a result.
The majority of useful classes are available inside the ckanext.collection.utils
module and all examples below require the following line at the beginning of
the script: from ckanext.collection.utils import *.
Let's start with the basics. ckanext-collection defines a few collections for
different purposes. The most basic collection is Collection, but it has no
value without customization, so we'll start from StaticCollection:
col = StaticCollection("name", {})
The constructor of any collection has two mandatory arguments: name and parameters. The name is mostly used internally and consists of any combination of letters, digits, hyphens and underscores. Parameters are passed as a dictionary and they change the content of the collection.
In the most basic scenario, a collection represents a number of similar items: datasets, users, organizations, dictionaries, numbers, etc. As a result, it can be transformed into a list or iterated over:
list(col)
for item in col:
    print(item)
Our test collection is empty at the moment, so you will not see anything just
yet. Usually, StaticCollection contains static data, specified when the
collection is created. But because we haven't specified any data, the collection
contains nothing.
To fix this problem, we have to configure the part of the collection responsible
for data production using its settings. A collection divides its internal
logic between a number of configurable services, and the service that we need is
called the data service. To modify it, we can pass a named argument called
data_settings to the collection's constructor:
col = StaticCollection(
    "name", {},
    data_settings={"data": [1, 2, 3]},
)
Now iterate over the collection again and you'll see the result:
for item in col:
    print(item)
It's not very impressive, but you didn't expect much from a static collection, right? There are other, smarter collections, but we have to learn more concepts of this extension to use them, so for now we'll only take a brief look at them.
Note: collections have certain restrictions when it comes to the amount of
data. By default, you'll see only around 10 records, even if you have more. The
same is true for StaticCollection: you can see it if you set the data
attribute of its data service to range(1, 100). We'll learn how to control
these restrictions later.
StaticCollection works with static data. It can be used for tests or as a
placeholder for a collection that is not yet implemented. In rare cases it can
be used with an arbitrary iterable to create a standard interface for data
interaction.
ModelCollection works with SQLAlchemy models. We are going to use two
attributes of its data service: model and is_scalar. The former sets the actual
model that the collection processes, while the latter controls how we work with
every individual record. By default, ModelCollection returns every record as
a number of columns, but we'll set is_scalar=True and receive a model instance
for every record instead:
from ckan import model

col = ModelCollection(
    "", {},
    data_settings={"is_scalar": True, "model": model.User},
)
for user in col:
    assert isinstance(user, model.User)
    print(f"{user.name}, {user.email}")
ApiSearchCollection works with API actions similar to package_search. They
have to use rows and start parameters for pagination and their result must
contain count and results keys. Its data service accepts an action attribute
with the name of the API action that produces the data:
col = ApiSearchCollection(
    "", {},
    data_settings={"action": "package_search"},
)
for pkg in col:
    print(f"{pkg['id']}: {pkg['title']}")
ApiListCollection works with API actions similar to package_list. They have
to use limit and offset parameters for pagination and their result must be
represented by a list.
col = ApiListCollection(
    "", {},
    data_settings={"action": "package_list"},
)
for name in col:
    print(name)
ApiCollection works with API actions similar to user_list. They have to
return all records at once, as a list.
col = ApiCollection(
    "", {},
    data_settings={"action": "user_list"},
)
for user in col:
    print(user["name"])
Collection initialization
The collection constructor has two mandatory arguments: name and parameters.
The name is used as the collection identifier and it's better to keep this value
unique across collections. For example, the name is used for computing the HTML
table id attribute when serializing a collection as an HTML table. If you render
two collections with the same name, you'll get two identical IDs on the page.
Params are usually used by the data and pager services for searching, sorting,
etc. The collection does not keep all the params. Instead, it stores only items
whose key is prefixed with <name>:. For example, if the collection is named
hello and you pass {"hello:a": 1, "b": 2, "world:c": 3}, the collection will
remove b (because it lacks the collection name plus colon prefix) and world:c
(because it uses world instead of hello in the prefix). As for hello:a, the
collection strips the <name>: prefix from it. So, in the end, the collection
stores {"a": 1}. You can check the params of the collection using the params
attribute:
col = Collection("hello", {"hello:a": 1, "b": 2, "world:c": 3})
assert col.params == {"a": 1}
col = Collection("world", {"hello:a": 1, "b": 2, "world:c": 3})
assert col.params == {"c": 3}
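The filtering rule described above can be written as a standalone function. This is a minimal sketch of the documented behaviour, not the extension's implementation; strip_collection_prefix is a hypothetical name used only for illustration.

```python
from typing import Any


def strip_collection_prefix(name: str, params: dict[str, Any]) -> dict[str, Any]:
    """Keep only `<name>:`-prefixed items and drop the prefix from their keys."""
    prefix = f"{name}:"
    return {
        key[len(prefix):]: value
        for key, value in params.items()
        if key.startswith(prefix)
    }


raw = {"hello:a": 1, "b": 2, "world:c": 3}
hello_params = strip_collection_prefix("hello", raw)
world_params = strip_collection_prefix("world", raw)
```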
This allows you to render and process multiple collections simultaneously on
the same page. Imagine that you have a collection users and a collection
packages. You want to see the second page of users and the fifth page of
packages. Submit the query string ?users:page=2&packages:page=5 and
initialize the collections using the following code:
from ckan import model
from ckan.logic import parse_params
from ckan.plugins import toolkit as tk

params = parse_params(tk.request.args)
users = ModelCollection(
    "users", params,
    data_settings={"model": model.User},
)
packages = ModelCollection(
    "packages", params,
    data_settings={"model": model.Package},
)
assert users.pager.page == 2
assert packages.pager.page == 5
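To see how one query string can serve several collections, the sketch below groups its keys by collection name using only the standard library. split_by_collection is a hypothetical helper, and the extra q parameter is made up for the example; in CKAN itself the parsing is done by parse_params as shown above. Values stay strings in this sketch.

```python
from collections import defaultdict
from urllib.parse import parse_qsl


def split_by_collection(query_string: str) -> dict[str, dict[str, str]]:
    """Group `<collection>:<key>=<value>` pairs by collection name."""
    grouped: dict[str, dict[str, str]] = defaultdict(dict)
    for key, value in parse_qsl(query_string.lstrip("?")):
        name, sep, param = key.partition(":")
        if sep:  # ignore keys without a collection prefix
            grouped[name][param] = value
    return dict(grouped)


grouped = split_by_collection("?users:page=2&packages:page=5&q=test")
```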
Services
A collection itself contains just a bare minimum of logic, and all the heavy lifting is delegated to services. A collection knows how to initialize services, and usually the only difference between all your collections is the way their services are configured.
A collection contains the following services:
- data: controls the exact data that can be received from the collection. Contains logic for searching, filters, sorting, etc.
- pager: defines restrictions for data iteration. This is the service that shows only 10 records when you iterate over a static collection.
- serializer: specifies how the collection can be transformed into the desired form. Using the correct serializer you'll be able to dump the whole collection as CSV, JSON, YAML or render it as an HTML table.
- columns: contains configuration of specific data columns used by other services. It may define model attributes that are dumped into CSV, names of the transformation functions that are applied to a certain attribute, and names of the columns that are available for sorting in the HTML representation of data.
- filters: contains configuration of additional widgets produced during data serialization. For example, when data is serialized into an HTML table, filters can define configuration of dropdowns and input fields from the data search form.
Note: You can define more services in custom collections. The list above
enumerates all the services that are available in the base collection and in
all collections shipped with the current extension. For example, one of the
built-in collections, DbCollection, has an additional service called
db_connection that can communicate with the DB.
When a collection is created, it creates an instance of each service using service factories and service settings. Base collection and all collections that extend it already have all details for initializing every service:
col = Collection("name", {})
print(f"""Services:
{col.data=},
{col.pager=},
{col.serializer=},
{col.columns=},
{col.filters=}""")
assert list(col) == []
This collection has no data. We can initialize an instance of StaticData and
replace the existing data service of the collection with the new StaticData
instance.
Every service has one required argument: the collection that owns the service.
All other arguments are used as service settings and must be passed by
name. Remember, all the classes used in this manual are available inside
ckanext.collection.utils:
static_data = StaticData(col, data=[1,2,3])
col.replace_service(static_data)
assert list(col) == [1, 2, 3]
Look at Collection.replace_service. It accepts only a service instance. There is
no need to pass the name of the service that must be replaced: the collection
can figure it out without help. And pay attention to the first argument of the
service constructor. It must be the collection that is going to use the service.
Some services may work even if you pass a random value as the first argument,
but that's an exceptional situation and one shouldn't rely on it.
If an existing collection is no longer used and you are going to create a new
one, you sometimes want to reuse a service from the existing collection, just to
avoid creating the service and calling Collection.replace_service, which
saves you two lines of code. In this case, use the <service>_instance parameter
of the collection constructor:
another_col = Collection("another-name", {}, data_instance=col.data)
assert list(another_col) == [1, 2, 3]
If you do this, make sure you are not using the old collection anymore. You have just transferred one of its services to another collection, so there is no guarantee that the old collection with the detached service will function properly.
It's usually better to customize the service factory instead of passing an
existing customized instance of the service around. You can tell which class to
use for making an instance of a service using the <service>_factory parameter
of the collection constructor:
col = Collection("name", {}, data_factory=StaticData)
assert list(col) == []
But this way we cannot specify the data attribute of the data factory!
No worries, there are multiple ways to overcome this problem. First of all, all
the settings of the service are available as its attributes. It means that the
data setting is the same as the data attribute of the service. If you can do
StaticData(..., data=...), you can as well do
service = StaticData(...); service.data = ...:
col = Collection("name", {}, data_factory=StaticData)
col.data.data = [1, 2, 3]
assert list(col) == [1, 2, 3]
Note: the data service caches its data. If you have already accessed the data
property of StaticData, assigning a new value doesn't have any effect because
of the cache. You have to call col.data.refresh_data() after the assignment to
rebuild the cache.
But there is a better way. You can pass a <service>_settings dictionary to the
collection constructor and it will be passed down into the corresponding service
factory:
col = Collection(
    "name", {},
    data_factory=StaticData,
    data_settings={"data": [1, 2, 3]},
)
assert list(col) == [1, 2, 3]
It works well for individual scenarios, but when you are creating a lot of
collections with static data, you want to omit some standard parameters. In
this case you should define a new class that extends Collection and declares a
<Service>Factory attribute:
class MyCollection(Collection):
    DataFactory = StaticData

col = MyCollection(
    "name", {},
    data_settings={"data": [1, 2, 3]},
)
assert list(col) == [1, 2, 3]
You can still pass data_factory into the MyCollection constructor to override
the data service factory. But now, by default, StaticData is used when it's not
specified explicitly.
Finally, if you want to create a subclass of a service that has specific values for certain attributes, i.e. something like this:
class OneTwoThreeData(StaticData):
    data = [1, 2, 3]
you can use the Service.with_attributes(attr_name=attr_value) factory method. It
produces a new service class (factory) with the specified attributes bound to
static values. For example, that's how we can define a collection that always
contains [1, 2, 3]:
class MyCollection(Collection):
    DataFactory = StaticData.with_attributes(data=[1, 2, 3])

col = MyCollection("name", {})
assert list(col) == [1, 2, 3]
Now you don't have to specify data_factory or data_settings when creating a
collection. It will always use StaticData with data set to [1, 2, 3]. Make sure
you mean it, because now you cannot override the data using data_settings.
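One way to picture a with_attributes-style factory method is a class method that builds a subclass on the fly with type(). The sketch below is an illustration only: StaticDataSketch is a hypothetical stand-in, the real method may be implemented differently, and the sketch does not model the restriction on overriding bound values through settings.

```python
from typing import Any


class StaticDataSketch:
    """Hypothetical stand-in for a data service with a `data` attribute."""

    data: Any = ()

    @classmethod
    def with_attributes(cls, **attributes: Any) -> type:
        """Return a subclass with the given attributes bound at class level."""
        return type(cls.__name__, (cls,), attributes)

    def __iter__(self):
        return iter(self.data)


OneTwoThree = StaticDataSketch.with_attributes(data=[1, 2, 3])
```

The result is equivalent to writing the subclass by hand, which is why it can be used anywhere a service class (factory) is expected.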
Common logic
All services share a few common features. First of all, every service contains a
reference to the collection that uses/owns the service. Only one collection can
own the service. If you move a service from one collection to another, you must
never use the old collection, which no longer owns the service. Depending on the
internal implementation of the service, it may work without changes, but we
recommend removing such collections. At any point you can get the collection
that owns the service via the attached attribute of the service:
col = Collection("name", {})
assert col.data.attached is col
assert col.pager.attached is col
assert col.columns.attached is col
another_col = Collection(
    "another-name", {},
    data_instance=col.data,
)
assert col.data.attached is not col
assert col.data.attached is another_col
assert col.data is another_col.data
The second common point of services is settings. Let's use StaticData for
tests. It has one configurable attribute (setting): data. We can specify it
directly when creating a data service instance: StaticData(..., data=DATA). Or
we can specify it via data_settings when creating a collection:
StaticCollection("name", {}, data_settings={"data": DATA}). In both cases
DATA will be available as the data attribute of the data service. But it
doesn't mean that we can pass just any attribute in this way:
data = StaticData(col, data=[], not_real=True)
assert hasattr(data, "data")
assert not hasattr(data, "not_real")
To allow overriding the value of an attribute via settings, we have to define
this attribute as a configurable attribute. For this we need the
configurable_attribute function from ckanext.collection.shared:
from ckanext.collection.shared import configurable_attribute

class MyData(StaticData):
    i_am_real = configurable_attribute(False)

data = MyData(col, data=[], i_am_real=True)
assert hasattr(data, "data")
assert hasattr(data, "i_am_real")
assert data.i_am_real is True
configurable_attribute accepts either a positional default value of the
attribute, or a named default_factory function that generates the default value
every time a new instance of the service is created. default_factory must
accept a single argument: the new service that is being instantiated:
class MyData(StaticData):
    ref = 42
    i_am_real = configurable_attribute(default_factory=lambda self: self.ref * 10)

data = MyData(col, data=[])
assert data.i_am_real == 420
Never use other configurable attributes in default_factory: the order in
which configurable attributes are initialized is not strictly defined. At the
time of writing this manual, configurable attributes were initialized in
alphabetical order, but this implementation detail may change in the future
without notice.
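The behaviour described in this section (only declared attributes accept settings, and default_factory receives the service being created) can be modelled in plain Python. The sketch below is a simplified, hypothetical model: ConfigurableSketch and ServiceSketch are not the extension's classes, and the alphabetical initialization order is an arbitrary choice made for this example.

```python
from typing import Any, Callable, Optional


class ConfigurableSketch:
    """Marker for an attribute that may be overridden via settings."""

    def __init__(
        self,
        default: Any = None,
        default_factory: Optional[Callable[[Any], Any]] = None,
    ):
        self.default = default
        self.default_factory = default_factory


class ServiceSketch:
    def __init__(self, **settings: Any):
        # Walk the MRO so that inherited markers are found too.
        markers = {}
        for klass in reversed(type(self).__mro__):
            for name, value in vars(klass).items():
                if isinstance(value, ConfigurableSketch):
                    markers[name] = value
        # Alphabetical order is an arbitrary choice of this sketch.
        for name in sorted(markers):
            marker = markers[name]
            if name in settings:
                value = settings[name]
            elif marker.default_factory is not None:
                value = marker.default_factory(self)
            else:
                value = marker.default
            setattr(self, name, value)
        # Settings that match no marker are silently ignored.


class MyDataSketch(ServiceSketch):
    ref = 42
    data = ConfigurableSketch(default_factory=lambda self: [])
    i_am_real = ConfigurableSketch(default_factory=lambda self: self.ref * 10)


service = MyDataSketch(data=[1], not_real=True)
```

Because every default_factory runs while other attributes may still be unset, a factory that reads another configurable attribute depends on the iteration order, which is the reason for the warning above.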
TODO: with_attributes
Data service
This service produces the data for the collection. Every data service must:
- be Iterable and iterate over all available records by default
- define a total property that reflects the number of available records, so that len(list(data)) == data.total
- define a range(start: Any, end: Any) method that returns a slice of the data
The base class for data services, Data, already contains a simple version of
this logic. You need to define only one method to make your custom
implementation: compute_data(). When data is accessed for the first time,
compute_data is called. Its result is cached and used for iteration in
for-loops, slicing via the range method and size measurement via the total
property.
from typing import Any

class CustomData(Data):
    def compute_data(self) -> Any:
        return "abcdefghijklmnopqrstuvwxyz"

col = Collection("name", {}, data_factory=CustomData)
assert list(col) == ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
assert col.data.total == 26
assert col.data.range(-3, None) == "xyz"
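The compute-once-then-reuse contract can be sketched with functools.cached_property. This is a hypothetical model of the behaviour described above, not the code of the real Data class; DataSketch and Alphabet are names made up for the example, and the calls counter exists only to make the caching visible.

```python
from functools import cached_property
from typing import Any


class DataSketch:
    """Hypothetical base: compute once, then iterate, slice and count."""

    def compute_data(self) -> Any:
        return []

    @cached_property
    def _data(self) -> Any:
        # Runs compute_data on first access; later reads reuse the result.
        return self.compute_data()

    @property
    def total(self) -> int:
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def range(self, start: Any, end: Any) -> Any:
        return self._data[start:end]

    def refresh_data(self) -> None:
        # Drop the cached value so the next access recomputes it.
        self.__dict__.pop("_data", None)


class Alphabet(DataSketch):
    calls = 0

    def compute_data(self) -> Any:
        self.calls += 1
        return "abcdefghijklmnopqrstuvwxyz"


data = Alphabet()
```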
If you need a more complex data source, make sure you define __iter__, total,
and range:
class CustomData(Data):
    # default_factory must be a callable that receives the new service
    names = configurable_attribute(
        default_factory=lambda self: ["Anna", "Henry", "Mary"]
    )

    @property
    def total(self):
        return len(self.names)

    def __iter__(self):
        yield from sorted(self.names)

    def range(self, start: Any, end: Any):
        # This data source supports only alphabetical (string) bounds
        if not isinstance(start, str) or not isinstance(end, str):
            return
        for name in self:
            if name < start:
                continue
            if name > end:
                break
            yield name
Pager service
The pager service sets the upper and lower bounds on data used by the
collection. The default pager used by a collection relies on numeric start/end
values. But it's possible to define a custom pager that uses alphabetical or
temporal bounds, as long as the range method of your custom data service
supports these bounds.
The standard pager (ClassicPager) has two configurable attributes: page
(default: 1) and rows_per_page (default: 10).
col = StaticCollection("name", {})
assert col.pager.page == 1
assert col.pager.rows_per_page == 10
Because of these values you see only the first 10 records from data when iterating over the collection. Let's change the pager settings:
col = StaticCollection(
    "name", {},
    data_settings={"data": range(1, 100)},
    pager_settings={"page": 3, "rows_per_page": 6},
)
assert list(col) == [13, 14, 15, 16, 17, 18]
Pagination details are often passed with search parameters and have a huge
impact on the required data frame. Because of this, if pager_settings are
missing, ClassicPager will look for settings inside the collection parameters
(the second argument of the collection constructor). But in this case, the
pager will use only items that have the <collection name>: prefix:
col = StaticCollection(
    "xxx",
    {"xxx:page": 3, "xxx:rows_per_page": 6},
    data_settings={"data": range(1, 100)},
)
assert list(col) == [13, 14, 15, 16, 17, 18]
col = StaticCollection(
    "xxx",
    {"page": 3, "rows_per_page": 6},
    data_settings={"data": range(1, 100)},
)
assert list(col) == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
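The numbers in the examples above follow from simple page arithmetic. The sketch below shows one plausible way to derive slice bounds from page and rows_per_page; page_bounds is a hypothetical helper and the formula is an assumption that happens to match the documented results, not a copy of ClassicPager.

```python
def page_bounds(page: int, rows_per_page: int) -> tuple[int, int]:
    """Translate a 1-based page number into zero-based slice bounds."""
    start = (page - 1) * rows_per_page
    return start, start + rows_per_page


records = list(range(1, 100))
start, end = page_bounds(page=3, rows_per_page=6)
third_page = records[start:end]
```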
Serializer service
A serializer converts data into a textual, binary or any other alternative
representation. For example, if you want to convert records produced by the
data service of the collection into a pandas DataFrame, you should probably
use a serializer.
Serializers are the main users of the columns service, because it contains
details about specific data columns. And serializers often iterate over the data
service directly (ignoring the range method) to serialize all available records.
The only required method for a serializer is serialize. This method must return
the data from the data service transformed into the format provided by the
serializer. For example, JsonSerializer returns a string with JSON-encoded data.
You are not restricted to textual or binary formats. A serializer that transforms data into a pandas DataFrame is a completely valid serializer.
class NewLineSerializer(Serializer):
    def serialize(self):
        result = ""
        for item in self.attached.data:
            result += str(item) + "\n"
        return result

col = StaticCollection(
    "name", {},
    serializer_factory=NewLineSerializer,
    data_settings={"data": [1, 2, 3]},
)
assert "".join(col.serializer.serialize()) == "1\n2\n3\n"
Columns service
This service contains additional information about separate columns of data records. It defines the following settings:
- names: all available column names. Used by other settings of the columns service
- hidden: columns that should not be shown by the serializer. Used by serializer services
- visible: columns that must be shown by the serializer. Used by serializer services
- sortable: columns that support sorting. Used by data services
- filterable: columns that support filtering/faceting. Used by data services
- searchable: columns that support search by partial match. Used by data services
- labels: human-readable labels for columns. Used by serializer services
This service contains information used by other services, so defining additional
attributes here is completely normal. For example, a custom serializer that
serializes data into ORC can expect an orc_format attribute to be available in
the columns service. So you can add as many additional column-related
details as required to this service.
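To illustrate how a serializer might consume column settings such as names, hidden and labels, here is a self-contained sketch that writes CSV with the standard library. ColumnsSketch and to_csv are hypothetical stand-ins, the sample record is made up, and the real columns service and CsvSerializer may behave differently.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ColumnsSketch:
    """Hypothetical stand-in holding a few of the settings listed above."""

    names: list[str]
    hidden: set[str] = field(default_factory=set)
    labels: dict[str, str] = field(default_factory=dict)


def to_csv(records: list[dict], columns: ColumnsSketch) -> str:
    """Write records as CSV, skipping hidden columns and using labels as headers."""
    shown = [name for name in columns.names if name not in columns.hidden]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([columns.labels.get(name, name) for name in shown])
    for record in records:
        writer.writerow([record.get(name, "") for name in shown])
    return buffer.getvalue()


columns = ColumnsSketch(
    names=["name", "email", "password"],
    hidden={"password"},
    labels={"name": "User name"},
)
output = to_csv(
    [{"name": "anna", "email": "anna@example.com", "password": "x"}],
    columns,
)
```

Keeping column details in one place lets several serializers share the same decisions about what to show and how to label it.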
Filters service
This service is used only by HTML table serializers at the moment. It has two
configurable attributes: static_filters and static_actions. static_filters
are used for building the search form for the data table. static_actions are not
used, but you can put details about batch or record-level actions into it and
use these details to extend one of the standard serializers. For example,
ckanext-admin-panel defines allowed actions (remove, restore, hide) for content
and creates custom templates that refer to these actions.
Core classes and usage examples
TBA
Data
TBA
StaticData
TBA
BaseSaData
TBA
StatementSaData
TBA
UnionSaData
TBA
ModelData
TBA
ApiData
TBA
ApiSearchData
TBA
ApiListData
TBA
Pager
TBA
ClassicPager
TBA
Columns
TBA
Filters
TBA
Serializer
TBA
CsvSerializer
TBA
JsonlSerializer
TBA
JsonSerializer
TBA
HtmlSerializer
TBA
TableSerializer
TBA
HtmxTableSerializer
TBA
Config settings
# Names of registered collections that are viewable by any visitor, including
# anonymous.
# (optional, default: )
ckanext.collection.auth.anonymous_collections =
# Names of registered collections that are viewable by any authenticated
# user.
# (optional, default: )
ckanext.collection.auth.authenticated_collections =
# Add HTMX asset to pages. Enable this option if you are using CKAN v2.10
# (optional, default: false)
ckanext.collection.include_htmx_asset = false
# Initialize CKAN JS modules every time HTMX fetches HTML from the server.
# (optional, default: false)
ckanext.collection.htmx_init_modules = false
# Import path for serializer used by CSV export endpoint.
# (optional, default: ckanext.collection.utils.serialize:CsvSerializer)
ckanext.collection.export.csv.serializer = ckanext.collection.utils.serialize:CsvSerializer
# Import path for serializer used by JSON export endpoint.
# (optional, default: ckanext.collection.utils.serialize:JsonSerializer)
ckanext.collection.export.json.serializer = ckanext.collection.utils.serialize:JsonSerializer
# Import path for serializer used by JSONl export endpoint.
# (optional, default: ckanext.collection.utils.serialize:JsonlSerializer)
ckanext.collection.export.jsonl.serializer = ckanext.collection.utils.serialize:JsonlSerializer
# Import path for serializer used by `format`-export endpoint.
# (optional, default: )
ckanext.collection.export.<format>.serializer =
Integrations
ckanext-admin-panel
To enable the configuration form of ckanext-collection in the admin panel, enable the following arbitrary schema:
scheming.arbitrary_schemas =
    ckanext.collection:ap_config.yaml
License