A configurable pipeline, aimed at transforming content for import and export
Project description
Transmogrifier provides support for building pipelines that turn one thing into another. Specifically, transmogrifier pipelines are used to convert and import legacy content into a Plone site. It provides the tools to construct pipelines from multiple sections, where each section processes the data flowing through the pipe.
A “transmogrifier pipeline” refers to a description of a set of pipe sections, slotted together in a set order. The stated goal is for these sections to transform data and ultimately add content to a Plone site based on this data. Sections deal with tasks such as sourcing the data (from text files, databases, etc.) and character set conversion, through to determining portal type, location and workflow state.
Note that a transmogrifier pipeline can be used to process any number of things, and is not specific to Plone content import. However, its original intent is to provide a pluggable way to import legacy content.
Installation
See docs/INSTALL.txt for installation instructions.
Credits
- Development sponsored by
Elkjøp Nordic AS
- Design and development
- Project name
A transmogrifier is a fictional device used for transforming one object into another. The term was coined by Bill Watterson of Calvin and Hobbes fame.
Detailed Documentation
Pipelines
To transmogrify, or import and convert non-Plone content, you simply define a pipeline. Pipe sections, the equivalent of parts in a buildout, are slotted together into a processing pipe. To slot sections together, you write a configuration file with named sections and a main pipeline definition that names the sections in order (one section per line):
>>> exampleconfig = """\ ... [transmogrifier] ... pipeline = ... section 1 ... section 2 ... section 3 ... ... [section 1] ... blueprint = collective.transmogrifier.tests.examplesource ... size = 5 ... ... [section 2] ... blueprint = collective.transmogrifier.tests.exampletransform ... ... [section 3] ... blueprint = collective.transmogrifier.tests.exampleconstructor ... """
As you can see, this is very similar to how you construct WSGI pipelines using paster. The format of the configuration files is defined by the Python ConfigParser module, with extensions that we’ll describe later. At a minimum, the transmogrifier section with a (possibly empty) pipeline option is required:
>>> mimimalconfig = """\ ... [transmogrifier] ... pipeline = ... """
Transmogrifier can load these configuration files either by looking them up in a registry or by loading them from a Python package.
You register transmogrifier configurations using the registerConfig directive in the http://namespaces.plone.org/transmogrifier namespace, together with a name, and optionally a title and description:
<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:transmogrifier="http://namespaces.plone.org/transmogrifier"
    i18n_domain="collective.transmogrifier">

<transmogrifier:registerConfig
    name="exampleconfig"
    title="Example pipeline configuration"
    description="This is an example pipeline configuration"
    configuration="example.cfg"
    />

</configure>
You can then tell transmogrifier to load the ‘exampleconfig’ configuration. To load configuration files directly from a Python package, name the package and the configuration file separated by a colon, such as ‘collective.transmogrifier.tests:exampleconfig.cfg’.
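For illustration, assuming a Transmogrifier instance constructed as shown later in this document, both loading styles are invoked the same way (a sketch, not part of the original doctests):

>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'exampleconfig')  # looked up in the registry
>>> transmogrifier(u'collective.transmogrifier.tests:exampleconfig.cfg')  # loaded from a package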
Registering files with the transmogrifier registry allows other uses, such as listing available configurations in a user interface, together with the registered description. Loading files directly, though, lets you build reusable libraries of configuration files more quickly.
In this document we’ll use the shorthand registerConfig to register example configurations:
>>> registerConfig(u'collective.transmogrifier.tests.exampleconfig',
...                exampleconfig)
Pipeline sections
Each section in the pipeline is created by a blueprint. Blueprints are looked up as named utilities implementing the ISectionBlueprint interface. In the transmogrifier configuration file, you refer to blueprints by the name under which they are registered. Blueprints are factories; when called they produce an ISection pipe section. ISections, in turn, are iterators implementing the Python iterator protocol.
Here is a simple blueprint, in the form of a class definition:
>>> from zope.interface import classProvides, implements
>>> from zope.component import provideUtility
>>> from collective.transmogrifier.interfaces import ISectionBlueprint
>>> from collective.transmogrifier.interfaces import ISection
>>> class ExampleTransform(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.name = name
...
...     def __iter__(self):
...         for item in self.previous:
...             item['exampletransformname'] = self.name
...             yield item
...
>>> provideUtility(ExampleTransform,
...                name=u'collective.transmogrifier.tests.exampletransform')
Note that we register this class as a named utility, and that instances of this class can be used as an iterator. When slotted together, items ‘flow’ through the pipeline by iterating over the last section, which in turn iterates over its preceding section (self.previous in the example), and so on.

By iterating over the source and then yielding the items again, each section passes items on to the next section. During the iteration loop, sections can manipulate the items. Note that items are Python dictionaries; sections simply operate on the keys they care about. In our example we add a new key, exampletransformname, which we set to the name of the section.
Sources
The items that flow through the pipe have to originate from somewhere though. This is where special sections, sources, come in. A source is simply a pipe section that inserts extra items into the pipeline. This is best illustrated with another example:
>>> class ExampleSource(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.size = int(options['size'])
...
...     def __iter__(self):
...         for item in self.previous:
...             yield item
...
...         for i in range(self.size):
...             yield dict(id='item%02d' % i)
...
>>> provideUtility(ExampleSource,
...                name=u'collective.transmogrifier.tests.examplesource')
In this example we use the options dictionary to read options from the section configuration, which in the example configuration we gave earlier has the option size defined as 5. Note that the configuration values are always strings, so we need to convert the size option to an integer here.
The source first iterates over the previous section and yields all items unchanged. Only when that loop is done, does the source produce new items and puts those into the pipeline. This order is important: when you slot multiple source sections together, you want items produced by earlier sections to be processed first too.
There is always a previous section, even for the first section defined in the pipeline. Transmogrifier passes in an empty iterator when it instantiates this first section, expecting such a first section to be a source that’ll produce items for the pipeline to process.
Constructors
As stated before, transmogrifier is intended for importing content into a Plone site. However, transmogrifier itself only drives the pipeline, inserting an empty iterator and discarding whatever it pulls out of the last section.
In order to create content, then, a constructor section is required. As with source sections, it should be possible to use multiple constructors, so a constructor should always start by yielding the items passed in from the previous section on to a possible next section.

So a constructor section is an ISection that consumes items from the previous section, affects the Plone site based on those items (usually by creating content objects from them), and then yields each item for a possible next section. For example purposes, we simply pretty print the items instead:
>>> import pprint
>>> class ExampleConstructor(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.previous = previous
...         self.pprint = pprint.PrettyPrinter().pprint
...
...     def __iter__(self):
...         for item in self.previous:
...             self.pprint(sorted(item.items()))
...             yield item
...
>>> provideUtility(ExampleConstructor,
...                name=u'collective.transmogrifier.tests.exampleconstructor')
With this last section blueprint example completed, we can load the example configuration we created earlier, and run our transmogrification:
>>> from collective.transmogrifier.transmogrifier import Transmogrifier
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.exampleconfig')
[('exampletransformname', 'section 2'), ('id', 'item00')]
[('exampletransformname', 'section 2'), ('id', 'item01')]
[('exampletransformname', 'section 2'), ('id', 'item02')]
[('exampletransformname', 'section 2'), ('id', 'item03')]
[('exampletransformname', 'section 2'), ('id', 'item04')]
Developing blueprints
As we could see from the ISectionBlueprint examples above, a blueprint gets called with several arguments: transmogrifier, name, options and previous.
We discussed previous before: it is a reference to the previous pipe section and must be looped over when the section itself is iterated. The name argument is simply the name of the section as given in the configuration file.

The transmogrifier argument is a reference to the transmogrifier itself, which can be used to reach the context we are importing to through its context attribute. The transmogrifier also acts as a dictionary, mapping from section names to a mapping of the options in each section.

Finally, as seen before, the options argument is a mapping of the current section options. It is the same mapping as is available through transmogrifier[name].
A short example shows each of these arguments in action:
>>> class TitleExampleSection(object):
...     classProvides(ISectionBlueprint)
...     implements(ISection)
...
...     def __init__(self, transmogrifier, name, options, previous):
...         self.transmogrifier = transmogrifier
...         self.name = name
...         self.options = options
...         self.previous = previous
...
...         pipeline = transmogrifier['transmogrifier']['pipeline']
...         pipeline_size = len([s.strip() for s in pipeline.split('\n')
...                              if s.strip()])
...         self.size = options['pipeline-size'] = str(pipeline_size)
...         self.site_title = transmogrifier.context.Title()
...
...     def __iter__(self):
...         for item in self.previous:
...             item['pipeline-size'] = self.size
...             item['title'] = '%s - %s' % (self.site_title, item['id'])
...             yield item
>>> provideUtility(TitleExampleSection,
...                name=u'collective.transmogrifier.tests.titleexample')
>>> titlepipeline = """\
... [transmogrifier]
... pipeline =
...     section1
...     titlesection
...     section3
...
... [section1]
... blueprint = collective.transmogrifier.tests.examplesource
... size = 5
...
... [titlesection]
... blueprint = collective.transmogrifier.tests.titleexample
...
... [section3]
... blueprint = collective.transmogrifier.tests.exampleconstructor
... """
>>> registerConfig(u'collective.transmogrifier.tests.titlepipeline',
...                titlepipeline)
>>> plone.Title()
u'Plone Test Site'
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.titlepipeline')
[('id', 'item00'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item00')]
[('id', 'item01'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item01')]
[('id', 'item02'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item02')]
[('id', 'item03'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item03')]
[('id', 'item04'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item04')]
Configuration file syntax
As mentioned earlier, the configuration files use the format defined by the Python ConfigParser module with extensions. The extensions are based on the zc.buildout extensions and are:
option names are case sensitive
option values can use a substitution syntax, described below, to refer to option values in specific sections.
you can include other configuration files, see Including other configurations.
The ConfigParser syntax is very flexible. Section names can contain any characters other than newlines and right square braces (“]”). Option names can contain any characters (within the ASCII character set) other than newlines, colons, and equal signs, cannot start with a space, and do not include trailing spaces.
It is a good idea to keep section and option names simple, sticking to alphanumeric characters, hyphens, and periods.
Variable substitution
Transmogrifier supports a string.Template-like syntax for variable substitution, using both the section and the option name joined by a colon:
>>> substitutionexample = """\ ... [transmogrifier] ... pipeline = ... section1 ... section2 ... section3 ... ... [definitions] ... item_count = 3 ... ... [section1] ... blueprint = collective.transmogrifier.tests.examplesource ... size = ${definitions:item_count} ... ... [section2] ... blueprint = collective.transmogrifier.tests.exampletransform ... ... [section3] ... blueprint = collective.transmogrifier.tests.exampleconstructor ... """ >>> registerConfig(u'collective.transmogrifier.tests.substitutionexample', ... substitutionexample)Here we created an extra section called definitions, and refer to the item_count option defined in that section to set the size of the section1 pipeline section, so we only get 3 items when we execute this pipeline:
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.substitutionexample')
[('exampletransformname', 'section2'), ('id', 'item00')]
[('exampletransformname', 'section2'), ('id', 'item01')]
[('exampletransformname', 'section2'), ('id', 'item02')]
Including other configurations
You can include other transmogrifier configurations with the include option in the transmogrifier section. This option takes a list of configuration ids, separated by whitespace. All sections and options from those configuration files will be included provided the options weren’t already present. This works recursively; inclusions in the included configuration files are honoured too:
>>> inclusionexample = """\ ... [transmogrifier] ... include = ... collective.transmogrifier.tests.sources ... collective.transmogrifier.tests.base ... ... [section1] ... size = 3 ... """ >>> registerConfig(u'collective.transmogrifier.tests.inclusionexample', ... inclusionexample) >>> sources = """\ ... [section1] ... blueprint = collective.transmogrifier.tests.examplesource ... size = 10 ... """ >>> registerConfig(u'collective.transmogrifier.tests.sources', ... sources) >>> base = """\ ... [transmogrifier] ... pipeline = ... section1 ... section2 ... section3 ... include = collective.transmogrifier.tests.constructor ... ... [section2] ... blueprint = collective.transmogrifier.tests.exampletransform ... """ >>> registerConfig(u'collective.transmogrifier.tests.base', ... base) >>> constructor = """\ ... [section3] ... blueprint = collective.transmogrifier.tests.exampleconstructor ... """ >>> registerConfig(u'collective.transmogrifier.tests.constructor', ... constructor) >>> transmogrifier = Transmogrifier(plone) >>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample') [('exampletransformname', 'section2'), ('id', 'item00')] [('exampletransformname', 'section2'), ('id', 'item01')] [('exampletransformname', 'section2'), ('id', 'item02')]
Like zc.buildout configurations, we can also add or remove lines from included configuration options, by using the += and -= syntax:
>>> advancedinclusionexample = """\
... [transmogrifier]
... include =
...     collective.transmogrifier.tests.inclusionexample
... pipeline -=
...     section2
...     section3
... pipeline +=
...     section4
...     section3
...
... [section4]
... blueprint = collective.transmogrifier.tests.titleexample
... """
>>> registerConfig(u'collective.transmogrifier.tests.advancedinclusionexample',
...                advancedinclusionexample)
>>> transmogrifier = Transmogrifier(plone)
>>> transmogrifier(u'collective.transmogrifier.tests.advancedinclusionexample')
[('id', 'item00'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item00')]
[('id', 'item01'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item01')]
[('id', 'item02'), ('pipeline-size', '3'), ('title', u'Plone Test Site - item02')]
When calling transmogrifier, you can provide your own sections too: any extra keyword is interpreted as a section dictionary. Do make sure you use string values though:
>>> transmogrifier(u'collective.transmogrifier.tests.inclusionexample',
...                section1=dict(size='1'))
[('exampletransformname', 'section2'), ('id', 'item00')]
Conventions
At their most basic level, transmogrifier pipelines are just iterators passing ‘things’ around. Transmogrifier doesn’t expect anything more than being able to iterate over the pipeline and doesn’t dictate what happens within that pipeline, what defines a ‘thing’ or what ultimately gets accomplished.
But as has been stated repeatedly, transmogrifier has been developed to facilitate importing legacy content, processing data in incremental steps until a final section constructs new content.
To reach this end, several conventions have been established that help the various pipeline sections work together.
Items are mappings
The first one is that the ‘things’ passed from section to section are mappings; i.e. they are or behave just like Python dictionaries. Again, transmogrifier doesn’t produce these by itself; source sections (see Sources) produce them by injecting them into the stream.
Keys are fields
Secondly, all keys in such mappings that do not start with an underscore will be used by constructor sections (see Constructors) to construct Plone content. So keys that do not start with an underscore are expected to map to Archetypes fields or Zope3 schema fields or whatever the constructor expects.
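For example, an item like the following hypothetical mapping carries two content fields (title and description) alongside two keys reserved for section instructions (_type and _path):

item = {
    '_path': '/news/my-item',   # instruction for path-aware sections
    '_type': 'News Item',       # instruction for the constructor
    'title': 'My news item',    # maps to the title field
    'description': 'Set on the constructed object',
}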
Paths are to the target object
Many sections either create objects (constructors) or operate on already-constructed or pre-existing objects. Such sections should interpret paths as the complete path for the object. For constructors this means they’ll need to split the path into a container path and an id in order to find the correct context for constructing the object.
Keys with a leading underscore are controllers
This leaves the keys that do start with a leading underscore to have special meaning to specific sections, allowing earlier pipeline sections to inject ‘control statements’ for later sections in the item mapping. To avoid name clashes, sections that do expect such controller keys should use prefixes based on the name under which their blueprint was registered, plus optionally the name of the pipe section. This allows for precise targeting of pipe sections when inserting such keys.
We’ll illustrate this with an example. Let’s say a source section loads news items from a database, but the database tables for such items hold filenames to point to binary image data. Rather than have this section load those filenames directly and add them to the item for image creation, a generic ‘file loader’ section is used to do this. Let’s suppose that this file loader is registered as acme.transmogrifier.fileloader. This section then could be instructed to load files and store them in a named key by using 2 ‘controller’ keys named _acme.transmogrifier.fileloader_filename and _acme.transmogrifier.fileloader_targetkey. If the source section were to create pipeline items with those keys, this later fileloader section would then automatically load the filenames and inject them into the items in the right location.
If you need 2 such loaders, you can target them each individually by including their section names; so to target just the imageloader1 section you’d use the keys _acme.transmogrifier.fileloader_imageloader1_filename and _acme.transmogrifier.fileloader_imageloader1_targetkey. Sections that support such targeting should prefer such section specific keys over those only using the blueprint name.
The collective.transmogrifier.utils module has a handy utility method called defaultKeys that’ll generate these keys for you for easy matching:
>>> from collective.transmogrifier import utils
>>> keys = utils.defaultKeys('acme.transmogrifier.fileloader',
...                          'imageloader1', 'filename')
>>> pprint.pprint(keys)
('_acme.transmogrifier.fileloader_imageloader1_filename',
 '_acme.transmogrifier.fileloader_filename',
 '_imageloader1_filename',
 '_filename')
>>> utils.Matcher(*keys)('_filename', '_imageloader1_filename')
('_imageloader1_filename', True)
Keep memory use to a minimum
The above example is a little contrived of course; you’d generally configure a file loader section with a key name to grab the filename from, and perhaps put the loader after the constructor section and load the image data straight into the already constructed content item instead. This lowers memory requirements as image data can go directly into the ZODB this way, and the content object can be deactivated after the binary data has been stored.
By operating on one item at a time, a transmogrifier pipeline can handle huge numbers of content without breaking memory limits; individual sections should also avoid using memory unnecessarily.
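As a minimal sketch of this convention (a hypothetical blueprint; it assumes items carry a _path key, that it runs after a constructor section, and it uses the standard ZODB _p_deactivate API to ghost persistent objects):

from zope.interface import classProvides, implements
from collective.transmogrifier.interfaces import ISectionBlueprint, ISection

class LeanSection(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.context = transmogrifier.context

    def __iter__(self):
        for item in self.previous:
            # look up the object constructed for this item, if any
            obj = self.context.unrestrictedTraverse(
                item['_path'].lstrip('/'), None)
            if obj is not None:
                # ... store data directly on the object here ...
                obj._p_deactivate()  # ghost the object, freeing its memory
            yield item  # pass the item along immediately; keep no references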
Previous sections go first
As mentioned in the Sources section, when inserting new items into the stream, previous pipe sections generally come first. This way someone constructing a pipeline knows which source sections will be processed first (those slotted earlier in the pipeline) and can adjust expectations accordingly. This makes content construction more predictable when dealing with multiple sources.

An exception would be a Folder Source, which inserts additional Folder items into the pipeline to ensure that the required container for any given content item exists at construction time. Such a source injects its extra items as needed, interleaved with the items from the previous section rather than before or after them.
Iterators have 3 stages
Some tasks have to happen before the pipeline runs, or after all content has been created. In such cases it is handy to realise that iteration within a section consists of three stages: before iteration, iteration itself, and after iteration.
For example, a section creating references may have to wait for all content to be created before it can insert the references. In this case it could build a queue during iteration, and only when the previous pipe section has been exhausted and the last item has been yielded would the section reach into the portal and create all the references.
Sources following the Previous sections go first convention basically inject the new items in the after iteration stage.
Here’s a piece of pseudo-code to illustrate these 3 stages:
def __iter__(self):
    # Before iteration
    # You can do initialisation here

    for item in self.previous:
        # Iteration itself
        # You could process the items, take notes, inject additional
        # items based on the current item in the pipe or manipulate portal
        # content created by previous items
        yield item

    # After iteration
    # The section still has control here and could inject additional
    # items, manipulate all portal content created by the pipeline,
    # or clean up after itself.
You can get quite creative with this. For example, the reference creator could defer creating each reference only until it knows the referenced object has been created, and periodically flush the pending references. This keeps memory requirements smaller, as not all references to create have to be remembered until the end of the pipeline.
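A sketch of such a deferred reference creator (hypothetical; it assumes items carry _path plus a made-up _references key listing the paths of referenced objects):

from zope.interface import classProvides, implements
from collective.transmogrifier.interfaces import ISectionBlueprint, ISection

class ReferenceCreator(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        self.context = transmogrifier.context

    def __iter__(self):
        queue = []  # before iteration: set up bookkeeping
        for item in self.previous:
            if '_references' in item:  # iteration: only take notes
                queue.append((item['_path'], item['_references']))
            yield item
        # after iteration: all content now exists, so paths resolve
        for path, refs in queue:
            obj = self.context.unrestrictedTraverse(path.lstrip('/'), None)
            if obj is None:
                continue
            # ... resolve each path in refs and create the reference,
            # e.g. with the Archetypes reference API ...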
Store pipeline-wide information in annotations
If, for some reason or other, you need to remember state across section instances that is pipeline-wide (such as database connections, or data counters), such information should be stored as annotations on the transmogrifier object:
from zope.annotation.interfaces import IAnnotations

MYKEY = 'foo.bar.baz'

def __init__(self, transmogrifier, name, options, previous):
    self.storage = IAnnotations(transmogrifier).setdefault(MYKEY, {})
    self.storage.setdefault('spam', 0)
    ...

def __iter__(self):
    ...
    self.storage['spam'] += 1
    ...
GenericSetup import integration
To ease running a transmogrifier pipeline during site configuration, a generic import step for GenericSetup is included.
The import step looks for a file named transmogrifier.txt and reads pipeline configuration names from this file, one name per line. Empty lines and lines starting with a # (hash mark) are skipped. These pipelines are then executed in the same order as they are found in the file.
This means that if you want to run one or more pipelines as part of a GenericSetup profile, all you have to do is name these pipelines in a file named transmogrifier.txt in your profile directory.
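Such a transmogrifier.txt might look like this (the pipeline names are hypothetical):

# pipelines run in the order they are named, one per line
acme.policy.import_folders
acme.policy.import_documents

# lines starting with a hash mark are skipped
#acme.policy.import_images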
The GenericSetup import context is stored on the transmogrifier as an annotation:
from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
from zope.annotation.interfaces import IAnnotations

def __init__(self, transmogrifier, name, options, previous):
    self.import_context = IAnnotations(transmogrifier)[IMPORT_CONTEXT]
This will, of course, prevent your code from running outside of a GenericSetup import context.
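If a section should degrade gracefully when run outside GenericSetup, a small sketch using the annotation mapping’s standard .get method could look like this:

from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
from zope.annotation.interfaces import IAnnotations

def __init__(self, transmogrifier, name, options, previous):
    # None when the pipeline runs outside a GenericSetup import
    self.import_context = IAnnotations(transmogrifier).get(IMPORT_CONTEXT)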
Default section blueprints
Constructor section
A constructor pipeline section is the heart of a transmogrifier content import pipeline. It constructs Plone content based on the items it processes. The constructor section blueprint name is collective.transmogrifier.sections.constructor. Constructor sections do only one thing: they construct new content; no schema changes are made. Also, constructors create content without restrictions; no security checks or containment constraints are applied.
Construction needs 2 pieces of information: the path to the item (including the id for the new item itself) and its portal type. To determine both of these, the constructor section inspects each item and looks for 2 keys, as described below. Any item missing either of these 2 pieces will be skipped. Similarly, items with a path for a container or a type that doesn’t exist will be skipped as well; make sure that these containers are constructed beforehand. Because a constructor section will only construct new objects, if an object with the same path already exists, the item will also be skipped.
For the object path, it’ll look (in order) for _collective.transmogrifier.sections.constructor_[sectionname]_path, _collective.transmogrifier.sections.constructor_path, _[sectionname]_path, and _path, where [sectionname] is replaced with the name given to the current section. This allows you to target the right section precisely if needed. Alternatively, you can specify what key to use for the path by specifying the path-key option, which should be a list of keys to try (one key per line, use a re: or regexp: prefix to specify regular expressions).
For the portal type, use the type-key option to specify a set of keys just like path-key. If omitted, the constructor will look for _collective.transmogrifier.sections.constructor_[sectionname]_type, _collective.transmogrifier.sections.constructor_type, _[sectionname]_type, _type, portal_type and Type (in that order, with [sectionname] replaced).
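For instance, a constructor section could read the path and type from custom keys like this (a hypothetical configuration; the _location and _content_type key names are made up for illustration):

[constructor]
blueprint = collective.transmogrifier.sections.constructor
path-key =
    _location
    _path
type-key =
    _content_type
    portal_type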
Unicode paths will be encoded to ASCII. Using the path and type, a new object will be constructed using invokeFactory; nothing else is done. Paths are always interpreted as relative to the context object, with the last path segment being the id of the object to create.
By default the constructor section will log a warning if the container for the item is missing and the item can’t be constructed. However if you add a required = True key to the constructor section it will instead raise a KeyError.
>>> import pprint
>>> constructor = """
... [transmogrifier]
... pipeline =
...     contentsource
...     constructor
...     printer
...
... [contentsource]
... blueprint = collective.transmogrifier.sections.tests.contentsource
...
... [constructor]
... blueprint = collective.transmogrifier.sections.constructor
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.constructor',
...                constructor)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.constructor')
[('_path', '/spam/eggs/foo'), ('_type', 'FooType')]
[('_path', '/foo'), ('_type', 'FooType')]
[('_path', u'/unicode/encoded/to/ascii'), ('_type', 'FooType')]
[('_path', 'not/existing/bar'), ('_type', 'BarType'), ('title', 'Should not be constructed, not an existing path')]
[('_path', '/spam/eggs/existing'), ('_type', 'FooType'), ('title', 'Should not be constructed, an existing object')]
[('_path', '/spam/eggs/incomplete'), ('title', 'Should not be constructed, no type')]
[('_path', '/spam/eggs/nosuchtype'), ('_type', 'NonExisting'), ('title', 'Should not be constructed, not an existing type')]
[('_path', 'spam/eggs/changedByFactory'), ('_type', 'FooType'), ('title', 'Factories are allowed to change the id')]
>>> pprint.pprint(plone.constructed)
(('spam/eggs', 'foo', 'FooType'),
 ('', 'foo', 'FooType'),
 ('unicode/encoded/to', 'ascii', 'FooType'),
 ('spam/eggs', 'changedByFactory', 'FooType'))

>>> constructor = """
... [transmogrifier]
... pipeline =
...     contentsource
...     constructor
...     printer
...
... [contentsource]
... blueprint = collective.transmogrifier.sections.tests.contentsource
...
... [constructor]
... blueprint = collective.transmogrifier.sections.constructor
... required = True
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.constructor2',
...                constructor)
>>> try:
...     transmogrifier(u'collective.transmogrifier.sections.tests.constructor2')
...     raise AssertionError("Required constructor did not raise an error for missing folder")
... except KeyError:
...     pass
[('_path', '/spam/eggs/foo'), ('_type', 'FooType')]
[('_path', '/foo'), ('_type', 'FooType')]
[('_path', u'/unicode/encoded/to/ascii'), ('_type', 'FooType')]
Folders section
The collective.transmogrifier.sections.constructor blueprint can construct new content, based on a type (_type key) and a path (_path key). However, it will bail if it is asked to create an item for which the parent folder does not exist.
One way to work around this is to ensure that the folders already exist, for example by sending the instruction to construct them through the pipeline before any contents of that folder. This requires sorted input, of course.
Alternatively, you can use the collective.transmogrifier.sections.folders blueprint. This will look at the path of each incoming item and construct parent folders if needed. This implies that all folders (that do not yet exist) are of the same type. That type defaults to Folder, although you can supply an alternative type. The folder will be created with an id only, but a subsequent schema-updating section handling a later item may get the opportunity to update it (though not change its type).
This blueprint can take the following options, all of them optional:
- path-key
The name of the key holding the path. The default follows the same semantics as the path lookup of the constructor section. Just use _path and you’ll be OK.
- new-type-key
The type key to use when inserting a new item in the pipeline to create folders. The default is _type. Change it if you need to target a specific constructor section.
- new-path-key
The path key to use when inserting a new item in the pipeline to create folders. The default is to use the same as the incoming path key. Change it if you need to target a specific constructor section.
- folder-type
The name of the portal type to use for new folders. Defaults to Folder, which is the default folder type in CMF and Plone.
- cache
By default, the section will keep a cache in memory of each folder it has checked (and possibly created) so it knows whether it already exists. This saves a lot of traversal, especially if you have many items under a particular folder, at the cost of a small amount of memory. If you have millions of objects, you can trade speed for lower memory use by setting this option to false.
Here is how it might look by default:
>>> import pprint
>>> constructor = """
... [transmogrifier]
... pipeline =
...     contentsource
...     folders
...     printer
...
... [contentsource]
... blueprint = collective.transmogrifier.sections.tests.folderssource
...
... [folders]
... blueprint = collective.transmogrifier.sections.folders
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.folders',
...                constructor)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.folders')
[('_path', '/foo'), ('_type', 'Document')]
[('_path', '/existing/foo'), ('_type', 'Document')]
[('_path', '/nonexisting'), ('_type', 'Folder')]
[('_path', '/nonexisting/alpha'), ('_type', 'Folder')]
[('_path', '/nonexisting/alpha/foo'), ('_type', 'Document')]
[('_path', '/nonexisting/beta'), ('_type', 'Folder')]
[('_path', '/nonexisting/beta/foo'), ('_type', 'Document')]
[('_type', 'Document')]
[('_folders_path', '/delta'), ('_type', 'Folder')]
[('_folders_path', '/delta/foo'), ('_type', 'Document')]
To specify alternate types and keys, we can do something like this:
>>> import pprint
>>> constructor = """
... [transmogrifier]
... pipeline =
...     contentsource
...     folders
...     printer
...
... [contentsource]
... blueprint = collective.transmogrifier.sections.tests.folderssource
...
... [folders]
... blueprint = collective.transmogrifier.sections.folders
... folder-type = My Folder
... new-type-key = _folderconstructor_type
... new-path-key = _folderconstructor_path
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.folders2',
...                constructor)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.folders2')
[('_path', '/foo'), ('_type', 'Document')]
[('_path', '/existing/foo'), ('_type', 'Document')]
[('_folderconstructor_path', '/nonexisting'), ('_folderconstructor_type', 'My Folder')]
[('_folderconstructor_path', '/nonexisting/alpha'), ('_folderconstructor_type', 'My Folder')]
[('_path', '/nonexisting/alpha/foo'), ('_type', 'Document')]
[('_folderconstructor_path', '/nonexisting/beta'), ('_folderconstructor_type', 'My Folder')]
[('_path', '/nonexisting/beta/foo'), ('_type', 'Document')]
[('_type', 'Document')]
[('_folderconstructor_path', '/delta'), ('_folderconstructor_type', 'My Folder')]
[('_folders_path', '/delta/foo'), ('_type', 'Document')]
Codec section
A codec pipeline section lets you alter the character encoding of item values, allowing you to recode text from and to unicode and any of the codecs supported by Python. The codec section blueprint name is collective.transmogrifier.sections.codec.
What values to recode is determined by the keys option, which takes a set of newline-separated key names. If a key name starts with re: or regexp: it is treated as a regular expression instead.
The optional from and to options determine what codecs values are recoded from and to. Both these values default to unicode, meaning no translation. If either option is set to default, the current default encoding of the Plone site is used.
To deal with possible encoding errors, you can set the error handler of both the from and to codecs separately with the from-error-handler and to-error-handler options, respectively. These default to strict, but can be set to any error handler supported by Python, including replace and ignore.
Also optional is the condition option, which lets you specify a TALES expression that when evaluating to False will prevent any en- or decoding from happening. The condition is evaluated for every matched key.
>>> codecs = """ ... [transmogrifier] ... pipeline = ... source ... decode-all ... encode-id ... encode-title ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.samplesource ... encoding = utf8 ... ... [decode-all] ... blueprint = collective.transmogrifier.sections.codec ... keys = re:.* ... from = utf8 ... ... [encode-id] ... blueprint = collective.transmogrifier.sections.codec ... keys = id ... to = ascii ... ... [encode-title] ... blueprint = collective.transmogrifier.sections.codec ... keys = title ... to = ascii ... to-error-handler = backslashreplace ... condition = python:'Brand' not in item['title'] ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.codecs', ... codecs) >>> transmogrifier(u'collective.transmogrifier.sections.tests.codecs') [('id', 'foo'), ('status', u'\u2117'), ('title', 'The Foo Fighters \\u2117')] [('id', 'bar'), ('status', u'\u2122'), ('title', u'Brand Chocolate Bar \u2122')] [('id', 'monty-python'), ('status', u'\xa9'), ('title', "Monty Python's Flying Circus \\xa9")]
The condition expression has access to the following:
item
    the current pipeline item
key
    the name of the matched key
match
    if the key was matched by a regular expression, the match object; otherwise boolean True
transmogrifier
    the transmogrifier
name
    the name of the codec section
options
    the codec section options
modules
    sys.modules
Inserter section
An inserter pipeline section lets you define a key and value to insert into pipeline items. The inserter section blueprint name is collective.transmogrifier.sections.inserter.
An inserter section takes a key and a value TALES expression. These expressions are evaluated to generate the actual key-value pair that gets inserted. You can also specify an optional condition option; if given, the key only gets inserted when the condition, also a TALES expression, evaluates to true.
Because the inserter value expression has access to the original item, it could even be used to change existing item values. Just target an existing key, pull out the original value in the value expression and return a modified version.
>>> inserter = """ ... [transmogrifier] ... pipeline = ... source ... simple-insertion ... expression-insertion ... transform-id ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.rangesource ... size = 3 ... ... [simple-insertion] ... blueprint = collective.transmogrifier.sections.inserter ... key = string:foo ... value = string:bar (inserted into "${item/id}" by the "$name" section) ... ... [expression-insertion] ... blueprint = collective.transmogrifier.sections.inserter ... key = python:'foo-%s' % item['id'][-2:] ... value = python:int(item['id'][-2:]) * 15 ... condition = python:int(item['id'][-2:]) ... ... [transform-id] ... blueprint = collective.transmogrifier.sections.inserter ... key = string:id ... value = string:foo-${item/id} ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.inserter', ... inserter) >>> transmogrifier(u'collective.transmogrifier.sections.tests.inserter') [('foo', 'bar (inserted into "item-00" by the "simple-insertion" section)'), ('id', 'foo-item-00')] [('foo', 'bar (inserted into "item-01" by the "simple-insertion" section)'), ('foo-01', 15), ('id', 'foo-item-01')] [('foo', 'bar (inserted into "item-02" by the "simple-insertion" section)'), ('foo-02', 30), ('id', 'foo-item-02')]
The key, value and condition expressions have access to the following:
item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the inserter section
options
    the inserter section options
modules
    sys.modules
key
    (only for the value and condition expressions) the key being inserted
Condition section
A condition pipeline section lets you selectively discard items from the pipeline. The condition section blueprint name is collective.transmogrifier.sections.condition.
A condition section takes a condition TALES expression. When this expression, evaluated against the current item, is true, the item is yielded to the next pipe section; otherwise it is not:
>>> condition = """ ... [transmogrifier] ... pipeline = ... source ... condition ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.rangesource ... size = 5 ... ... [condition] ... blueprint = collective.transmogrifier.sections.condition ... condition = python:int(item['id'][-2:]) > 2 ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.condition', ... condition) >>> transmogrifier(u'collective.transmogrifier.sections.tests.condition') [('id', 'item-03')] [('id', 'item-04')]
The condition expression has access to the following:
item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the condition section
options
    the condition section options
modules
    sys.modules
As condition sections skip items in the pipeline, they should not be used inside a splitter section!
Manipulator section
A manipulator pipeline section lets you copy, move or discard keys from the pipeline. The manipulator section blueprint name is collective.transmogrifier.sections.manipulator.
A manipulator section will copy keys when you specify a set of keys to copy, and an expression to determine what to copy these to. These are the keys and destination options.
The keys option is a set of key names, one on each line; key names starting with re: or regexp: are treated as regular expressions. The destination expression is a TALES expression that can access not only the item, but also the matched key and, if a regular expression was used, the match object.
If a delete option is specified, it is also interpreted as a set of keys, like the keys option. These keys will be deleted from the item; if used together with the keys and destination options, keys will be renamed instead of copied.
Also optional is the condition option, which lets you specify a TALES expression that when evaluating to False will prevent any manipulation from happening. The condition is evaluated for every matched key.
>>> manipulator = """ ... [transmogrifier] ... pipeline = ... source ... copy ... rename ... delete ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.samplesource ... ... [copy] ... blueprint = collective.transmogrifier.sections.manipulator ... keys = ... title ... id ... destination = string:$key-copy ... ... [rename] ... blueprint = collective.transmogrifier.sections.manipulator ... keys = re:([^-]+)-copy$ ... destination = python:'%s-duplicate' % match.group(1) ... delete = ${rename:keys} ... ... [delete] ... blueprint = collective.transmogrifier.sections.manipulator ... delete = status ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.manipulator', ... manipulator) >>> transmogrifier(u'collective.transmogrifier.sections.tests.manipulator') [('id', 'foo'), ('id-duplicate', 'foo'), ('title', u'The Foo Fighters \u2117'), ('title-duplicate', u'The Foo Fighters \u2117')] [('id', 'bar'), ('id-duplicate', 'bar'), ('title', u'Brand Chocolate Bar \u2122'), ('title-duplicate', u'Brand Chocolate Bar \u2122')] [('id', 'monty-python'), ('id-duplicate', 'monty-python'), ('title', u"Monty Python's Flying Circus \xa9"), ('title-duplicate', u"Monty Python's Flying Circus \xa9")]
The destination expression has access to the following:
item
    the current pipeline item
key
    the name of the matched key
match
    if the key was matched by a regular expression, the match object; otherwise boolean True
transmogrifier
    the transmogrifier
name
    the name of the manipulator section
options
    the manipulator section options
modules
    sys.modules
Splitter section
A splitter pipeline section lets you branch a pipeline into 2 or more sub-pipelines. The splitter section blueprint name is collective.transmogrifier.sections.splitter.
A splitter section takes 2 or more pipeline definitions, and sends the items from the previous section through each of these sub-pipelines, each with its own copy of the items:
>>> emptysplitter = """ ... [transmogrifier] ... pipeline = ... source ... splitter ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.rangesource ... size = 3 ... ... [splitter] ... blueprint = collective.transmogrifier.sections.splitter ... pipeline-1 = ... pipeline-2 = ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.emptysplitter', ... emptysplitter) >>> transmogrifier(u'collective.transmogrifier.sections.tests.emptysplitter') [('id', 'item-00')] [('id', 'item-00')] [('id', 'item-01')] [('id', 'item-01')] [('id', 'item-02')] [('id', 'item-02')]
Although the pipeline definitions in the splitter are empty, we end up with 2 copies of every item in the pipeline as both splitter pipelines get to process a copy. Splitter pipelines are defined by options starting with pipeline-.
Normally you’ll use conditions to identify items for each sub-pipe, making the splitter the pipeline equivalent of an if/elif statement. Conditions are optional and use the pipeline option name plus -condition:
>>> evenoddsplitter = """ ... [transmogrifier] ... pipeline = ... source ... splitter ... printer ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.rangesource ... size = 3 ... ... [splitter] ... blueprint = collective.transmogrifier.sections.splitter ... pipeline-even-condition = python:int(item['id'][-2:]) % 2 ... pipeline-even = even-section ... pipeline-odd-condition = not:${splitter:pipeline-even-condition} ... pipeline-odd = odd-section ... ... [odd-section] ... blueprint = collective.transmogrifier.sections.inserter ... key = string:even ... value = string:The even pipe ... ... [even-section] ... blueprint = collective.transmogrifier.sections.inserter ... key = string:odd ... value = string:The odd pipe ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.evenodd', ... evenoddsplitter) >>> transmogrifier(u'collective.transmogrifier.sections.tests.evenodd') [('even', 'The even pipe'), ('id', 'item-00')] [('id', 'item-01'), ('odd', 'The odd pipe')] [('even', 'The even pipe'), ('id', 'item-02')]
Conditions are expressed as TALES statements, and have access to:
item
    the current pipeline item
transmogrifier
    the transmogrifier
name
    the name of the splitter section
pipeline
    the name of the splitter pipeline this condition belongs to (including the pipeline- prefix)
options
    the splitter options
modules
    sys.modules
Savepoint section
A savepoint pipeline section commits a savepoint every so often, which has a side-effect of freeing up memory. The savepoint section blueprint name is collective.transmogrifier.sections.savepoint.
A savepoint section takes an optional every option, which defaults to 1000; a savepoint is committed each time that many items have passed through the pipe. A savepoint section doesn’t alter the items in any way:
>>> savepoint = """ ... [transmogrifier] ... pipeline = ... source ... savepoint ... ... [source] ... blueprint = collective.transmogrifier.sections.tests.rangesource ... size = 10 ... ... [savepoint] ... blueprint = collective.transmogrifier.sections.savepoint ... every = 3 ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.savepoint', ... savepoint)We’ll show savepoints being committed by overriding transaction.savepoint:
>>> import transaction
>>> original_savepoint = transaction.savepoint
>>> counter = [0]
>>> def test_savepoint(counter=counter, *args, **kw):
...     counter[0] += 1
>>> transaction.savepoint = test_savepoint
>>> transmogrifier(u'collective.transmogrifier.sections.tests.savepoint')
>>> transaction.savepoint = original_savepoint
>>> counter[0]
3
CSV source section
A CSV source pipeline section lets you create pipeline items from CSV files. The CSV source section blueprint name is collective.transmogrifier.sections.csvsource.
A CSV source section will load the CSV file named in the filename option, and will yield an item for each line in the CSV file. It’ll use the first line of the CSV file to determine what keys to use, or you can specify a fieldnames option to specify the key names.
The filename option may be an absolute path, or a package reference, e.g. my.package:foo/bar.csv.
By default the CSV file is assumed to use the Excel CSV dialect, but you can specify any dialect supported by the Python csv module with the dialect option.
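For instance, to read a tab-delimited file you could use the excel-tab dialect that ships with the Python csv module (the filename here is hypothetical):

[csvsource]
blueprint = collective.transmogrifier.sections.csvsource
filename = /path/to/data.tsv
dialect = excel-tab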
>>> import tempfile
>>> tmp = tempfile.NamedTemporaryFile('w+', suffix='.csv')
>>> tmp.write('\r\n'.join("""\
... foo,bar,baz
... first-foo,first-bar,first-baz
... second-foo,second-bar,second-baz
... """.splitlines()))
>>> tmp.flush()
>>> csvsource = """
... [transmogrifier]
... pipeline =
...     csvsource
...     printer
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = %s
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """ % tmp.name
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.file',
...                csvsource)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file')
[('bar', 'first-bar'), ('baz', 'first-baz'), ('foo', 'first-foo')]
[('bar', 'second-bar'), ('baz', 'second-baz'), ('foo', 'second-foo')]

>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file',
...                csvsource=dict(fieldnames='monty spam eggs'))
[('eggs', 'baz'), ('monty', 'foo'), ('spam', 'bar')]
[('eggs', 'first-baz'), ('monty', 'first-foo'), ('spam', 'first-bar')]
[('eggs', 'second-baz'), ('monty', 'second-foo'), ('spam', 'second-bar')]
Here is the same example, loading a file from a package instead:
>>> csvsource = """ ... [transmogrifier] ... pipeline = ... csvsource ... printer ... ... [csvsource] ... blueprint = collective.transmogrifier.sections.csvsource ... filename = collective.transmogrifier.tests:sample.csv ... ... [printer] ... blueprint = collective.transmogrifier.sections.tests.pprinter ... """ >>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.package', ... csvsource) >>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.package') [('bar', 'first-bar'), ('baz', 'first-baz'), ('foo', 'first-foo')] [('bar', 'second-bar'), ('baz', 'second-baz'), ('foo', 'second-foo')]>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.package', ... csvsource=dict(fieldnames='monty spam eggs')) [('eggs', 'baz'), ('monty', 'foo'), ('spam', 'bar')] [('eggs', 'first-baz'), ('monty', 'first-foo'), ('spam', 'first-bar')] [('eggs', 'second-baz'), ('monty', 'second-foo'), ('spam', 'second-bar')]
Change History
(name of developer listed in brackets)
1.3 (2011-03-17)
Added the GenericSetup import context as an annotation to the transmogrifier. [elro]
Added a logger that logs the value of a particular key for all items. Handy when debugging: you can see which path is failing, and it is useful if you want to show progress in a long import. [regebro]
Added a breakpoint section to break on a particular expression, which is handy for debugging. [regebro]
1.2 (2010-03-30)
Bug fix: the constructor promises to encode paths to ASCII, but failed to do so. Thanks to gyst for finding the discrepancy. [mj]
1.1 (2010-03-17)
Allow the CSV source to load its file from a package as well as from an absolute or relative file path. To load from a package, pass package.name:filename.csv to the filename option. [optilude]
Add CMF 2.2/Plone 4 compatibility for the content constructor [optilude]
Use an explicit provides attribute to register the transmogrifier adapter. Fixes the “Missing ‘provides’ attribute” errors when loading with zope.annotation installed. [mj]
Add a required flag to the content constructor, which causes it to raise a KeyError if the container in which to construct the new item doesn’t exist. [regebro]
Add an optional condition to the manipulator section. [regebro]
1.0 (2009-08-07)
Initial transmogrifier architecture. [mj]