datafactory generates test data.
Overview
datafactory generates flexible test data according to rules you define.
Its features are divided into fields, models, containers, and formatters. In database terms, fields are columns, models are records, and containers are tables.
Its main strength is flexible type specification: a field can be a fixed value, a callable, or a field object, and containers can be nested.
Formatters handle data formatting and file output.
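The field / model / container analogy can be pictured with a small plain-Python sketch. It does not use datafactory; the helper names (`increment_field`, `cycle_field`, `render_model`, `render_container`) and the `BLANK` sentinel defined here are hypothetical stand-ins chosen for illustration only.

```python
import itertools

# Hypothetical stand-ins, not datafactory classes.
BLANK = object()  # sentinel meaning "leave this key out of the record"


def increment_field(start=1, step=1):
    """Field: an endless source of values for one column."""
    return itertools.count(start, step)


def cycle_field(values):
    """Field: repeats the given values in order."""
    return itertools.cycle(values)


def render_model(fields):
    """Model: draws one value per field to build a single record."""
    record = {}
    for name, field in fields.items():
        value = next(field)
        if value is not BLANK:
            record[name] = value
    return record


def render_container(fields, count):
    """Container: a list of records, like rows in a table."""
    return [render_model(fields) for _ in range(count)]


fields = {
    "id": increment_field(),
    "x": cycle_field(["a", "b", "c"]),
    "option": cycle_field([True, BLANK]),
}
rows = render_container(fields, 4)
print(rows)
```

Each field only knows how to produce the next value for its column; the model and container layers decide how many values are drawn and how they are grouped.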
Requirements
Python 3.5 or later.
Install
$ pip install datafactory
Usage
Basic Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
   ...:     'id': datafactory.IncrementField(),
   ...:     'x': datafactory.CycleField(['a', 'b', 'c']),
   ...:     # When BLANK is chosen, the key is omitted from the record.
   ...:     'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),
   ...: })
In [3]: container = datafactory.Container(model, 5, render=True)
In [4]: container
Out[4]:
[{'id': 1, 'x': 'a'},
{'id': 2, 'x': 'b', 'option': False},
{'id': 3, 'x': 'c', 'option': True},
{'id': 4, 'x': 'a'},
{'id': 5, 'x': 'b'}]
# Specify rewrite=True to overwrite the file if it already exists.
In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)
In [6]: !cat /tmp/test.json
[
{
"x": "a",
"id": 1
},
{
"x": "b",
"id": 2,
"option": false
},
{
"x": "c",
"id": 3,
"option": true
},
{
"x": "a",
"id": 4
},
{
"x": "b",
"id": 5
}
]
TSV Example
In [1]: import datafactory
In [2]: model = datafactory.ListModel([
   ...:     datafactory.IncrementField(start=10, step=5),
   ...:     datafactory.HashOfField(2, 'md5'),  # MD5 hash of the value in the third column (index 2).
   ...:     datafactory.ChoiceField(['foo', 'bar', 'baz']),
   ...:     datafactory.CycleField(range(0, 30, 10)),
   ...: ]).ordering(2)  # render index 2 (the third column) first
# IterContainer saves memory because it generates one element at a time.
In [3]: container = datafactory.IterContainer(model, 10)  # repeat 10 times.
In [4]: datafactory.CsvFormatter(
...: container,
...: delimiter='\t',
...: header=['id', 'hash-of-name', 'name', 'value']
...: ).write('/tmp/test.csv', rewrite=True)
In [5]: !cat /tmp/test.csv
id hash-of-name name value
10 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
15 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
20 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
25 acbd18db4cc2f85cedef654fccc4a4d8 foo 0
30 acbd18db4cc2f85cedef654fccc4a4d8 foo 10
35 73feffa4b7f6bb68e44cf984c85f6e88 baz 20
40 73feffa4b7f6bb68e44cf984c85f6e88 baz 0
45 73feffa4b7f6bb68e44cf984c85f6e88 baz 10
50 37b51d194a7513e45b56f6524f2d51f2 bar 20
55 37b51d194a7513e45b56f6524f2d51f2 bar 0
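For comparison, the same kind of tab-separated file can be produced with only the standard library. This is a rough sketch rather than what CsvFormatter does internally: `tsv_rows` is a hypothetical helper, the names are cycled deterministically instead of being chosen at random, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import hashlib
import io
import itertools


def tsv_rows(names, count):
    """Yield [id, md5-of-name, name, value] rows one at a time."""
    ids = itertools.count(10, 5)
    values = itertools.cycle(range(0, 30, 10))
    for name in itertools.islice(itertools.cycle(names), count):
        # The hash column depends on the name, so the name is known first.
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        yield [next(ids), digest, name, next(values)]


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["id", "hash-of-name", "name", "value"])
writer.writerows(tsv_rows(["foo", "bar", "baz"], 3))
tsv_text = buffer.getvalue()
print(tsv_text)
```

Because `tsv_rows` is a generator, rows are produced lazily, which is the same memory-saving idea the IterContainer comment describes.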
Custom Example
If an object is callable, the result of calling it is stored.
Model
In [1]: import datafactory
In [2]: def square(k, i):
   ...:     # k is the key and i is its index; 'c' * 2 gives 'cc'.
   ...:     return k * i
   ...:
In [3]: container = datafactory.DictContainer(square)
In [4]: container(['a', 'b', 'c', 'd', 'e'])
Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}
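Judging from the output above, the callable receives each key together with its index. A plain-Python equivalent might look like the following; `dict_container` is a hypothetical name used only for this sketch and is not part of datafactory.

```python
def dict_container(func, keys):
    """Map each key to func(key, index), mirroring the output shown above."""
    return {key: func(key, index) for index, key in enumerate(keys)}


result = dict_container(lambda k, i: k * i, ["a", "b", "c", "d", "e"])
print(result)
```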
Field
In [1]: import datafactory
In [2]: model = datafactory.Model({
   ...:     'col1': (lambda r, i: i),
   ...:     'col2': (lambda r: r['col1'] + 1),
   ...:     'col3': (lambda r: r['col2'] * 2),
   ...:     'col4': 100,  # fixed value
   ...: }).ordering('col1', 'col2', 'col3')
In [3]: container = datafactory.ListContainer(model)
In [4]: container(4)
Out[4]:
[{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},
{'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},
{'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},
{'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]
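The ordering matters because col2 reads col1 and col3 reads col2. The sketch below shows one way such order-dependent evaluation could work in plain Python. It is an assumption about the idea, not datafactory's implementation, and `render_record` is a hypothetical helper.

```python
import inspect


def render_record(spec, order, index):
    """Evaluate keys in `order` first, then the rest; plain values are copied."""
    record = {}
    remaining = [key for key in spec if key not in order]
    for key in list(order) + remaining:
        value = spec[key]
        if callable(value):
            # Pass (record, index) or just (record), depending on the arity.
            arity = len(inspect.signature(value).parameters)
            value = value(record, index) if arity == 2 else value(record)
        record[key] = value
    return record


# The spec is deliberately declared out of order to show that `order` decides.
spec = {
    "col3": lambda r: r["col2"] * 2,
    "col2": lambda r: r["col1"] + 1,
    "col1": lambda r, i: i,
    "col4": 100,
}
rows = [render_record(spec, ("col1", "col2", "col3"), i) for i in range(4)]
print(rows)
```

Without an explicit order, col3 would be evaluated before col2 exists in the record and raise a KeyError.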
Limited Number of Elements Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
   ...:     # x: 'a' can be picked once, 'b' twice, and 'c' three times.
   ...:     'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),
   ...:     # y: 'a' can be picked twice; 'b' and 'c' once each.
   ...:     'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),
   ...:     # z: only 'c' can be picked, up to five times.
   ...:     'z': datafactory.PickoutField(['c']*5, missing=None),
   ...: })
In [3]: container = datafactory.ListContainer(model)
In [4]: container(6)
Out[4]:
[{'x': 'a', 'y': 'a', 'z': 'c'},
{'x': 'c', 'y': 'b', 'z': 'c'},
{'x': 'c', 'y': 'a', 'z': 'c'},
{'x': 'b', 'y': 'c', 'z': 'c'},
{'x': 'c', 'y': '*', 'z': 'c'},
{'x': 'b', 'y': '*', 'z': None}]
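The quota behavior can be approximated by drawing from a pool without replacement and returning the missing value once the pool is empty. The class below is a hypothetical illustration written for this README, not PickoutField itself; the `seed` parameter is an addition to make runs repeatable.

```python
import random
from collections import Counter


class LimitedPicker:
    """Hypothetical helper: each value can be picked a fixed number of times."""

    def __init__(self, quota, missing=None, seed=None):
        # Accept either {'a': 1, 'b': 2} or ['a', 'a', 'b'] as the quota.
        self.pool = list(Counter(quota).elements())
        self.missing = missing
        self.rng = random.Random(seed)

    def pick(self):
        """Remove and return a random remaining value, or `missing` if none."""
        if not self.pool:
            return self.missing
        return self.pool.pop(self.rng.randrange(len(self.pool)))


picker = LimitedPicker(["a", "a", "b", "c"], missing="*", seed=0)
picks = [picker.pick() for _ in range(6)]
print(picks)
```

The first four picks are some ordering of the four pool entries; every later pick returns the missing value.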
Combination Example
Test data that combines multiple elements can be generated with the repeat argument of CycleField and SequenceField.
In [1]: import datafactory
In [2]: l0 = ['a', 'b']
In [3]: l1 = ['a', 'b', 'c']
In [4]: l2 = ['a', 'b', 'c', 'd']
In [5]: model = datafactory.ListModel([
   ...:     datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),
   ...:     datafactory.CycleField(l1, repeat=len(l2)),
   ...:     datafactory.CycleField(l2),
   ...: ])
In [6]: container = datafactory.Container(model)
# Passing ESCAPE as the missing argument makes the container detect
# the end of the elements and stop before reaching 10000.
In [7]: container(10000)
Out[7]:
[['a', 'a', 'a'],
['a', 'a', 'b'],
['a', 'a', 'c'],
['a', 'a', 'd'],
['a', 'b', 'a'],
['a', 'b', 'b'],
['a', 'b', 'c'],
['a', 'b', 'd'],
['a', 'c', 'a'],
['a', 'c', 'b'],
['a', 'c', 'c'],
['a', 'c', 'd'],
['b', 'a', 'a'],
['b', 'a', 'b'],
['b', 'a', 'c'],
['b', 'a', 'd'],
['b', 'b', 'a'],
['b', 'b', 'b'],
['b', 'b', 'c'],
['b', 'b', 'd'],
['b', 'c', 'a'],
['b', 'c', 'b'],
['b', 'c', 'c'],
['b', 'c', 'd']]
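The repeat values follow from simple index arithmetic: each column holds a value for as many rows as the product of the lengths of the columns to its right. The sketch below, which uses only the standard library and a hypothetical `combination_row` helper, reproduces the same 24 rows and is equivalent to `itertools.product`.

```python
import itertools

l0 = ["a", "b"]
l1 = ["a", "b", "c"]
l2 = ["a", "b", "c", "d"]


def combination_row(index):
    """Row `index` of the table; each column keeps its value for `repeat` rows."""
    first = l0[index // (len(l1) * len(l2))]   # repeat = 12, no wrap-around
    second = l1[(index // len(l2)) % len(l1)]  # repeat = 4, then cycles
    third = l2[index % len(l2)]                # repeat = 1, cycles
    return [first, second, third]


total = len(l0) * len(l1) * len(l2)
rows = [combination_row(i) for i in range(total)]
same_as_product = rows == [list(p) for p in itertools.product(l0, l1, l2)]
print(len(rows), same_as_product)
```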
Nested Example
In [1]: import datafactory
In [2]: model = datafactory.Model({
   ...:     'a': datafactory.ListModel([
   ...:         datafactory.CycleField(['b', 'c']),
   ...:         datafactory.CycleField(['d', 'e']),
   ...:     ]),
   ...:     datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)
   ...: })
In [3]: datafactory.Container(model, 10, render=True)
Out[3]:
[{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
{'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]
datetime Utility
choice
Randomly chooses a datetime between start and end.
In [1]: from datafactory.utils.datetime import choice
In [2]: choice(1988, '2015-11-11T11:11:11.111111')
Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)
# tuple: datetime(*tuple), dict: datetime(**dict)
In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})
Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)
In [4]: from datetime import datetime, date
In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))
Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)
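One common way to pick a random moment in a range is to scale the span between the two endpoints by a uniform random number. The following is a minimal sketch of that idea, assuming both endpoints are already datetime objects; `random_datetime` is a hypothetical name, and this is not necessarily how `choice` is implemented.

```python
import random
from datetime import datetime, timedelta


def random_datetime(start, end, rng=random):
    """Return a datetime drawn uniformly from the range [start, end]."""
    span = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span))


start = datetime(1988, 5, 22)
end = datetime(2015, 11, 11, 11, 11, 11)
picked = random_datetime(start, end)
print(picked)
```

Passing a `random.Random(seed)` instance as `rng` makes the result reproducible.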
generator
A generator that yields datetime objects at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import generator
# If the end argument is omitted, the generator never stops.
In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))
In [4]: next(g)
Out[4]: datetime.datetime(2015, 1, 1, 0, 0)
In [5]: next(g)
Out[5]: datetime.datetime(2015, 1, 2, 12, 0)
In [6]: next(g)
Out[6]: datetime.datetime(2015, 1, 4, 0, 0)
In [7]: next(g)
Out[7]: datetime.datetime(2015, 1, 5, 12, 0)
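The behavior above can be sketched with an ordinary Python generator. This version is an illustration only: `datetime_generator` is a hypothetical name, it assumes start and end are already datetime objects, and it assumes a nonzero interval.

```python
import itertools
from datetime import datetime, timedelta


def datetime_generator(start, interval, end=None):
    """Yield start, start + interval, ...; endless when end is None."""
    current = start
    forward = interval > timedelta(0)
    while end is None or (current <= end if forward else current >= end):
        yield current
        current += interval


gen = datetime_generator(datetime(2015, 1, 1), timedelta(days=1, hours=12))
first_four = list(itertools.islice(gen, 4))
print(first_four)
```

Because the generator is endless when `end` is omitted, `itertools.islice` (or repeated `next` calls) is needed to take a finite number of values.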
range
Generates a list of datetime objects at regular intervals.
In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import range
In [3]: range(2015, '2015/2/1')
Out[3]:
[datetime.datetime(2015, 1, 1, 0, 0),
datetime.datetime(2015, 1, 2, 0, 0),
datetime.datetime(2015, 1, 3, 0, 0),
datetime.datetime(2015, 1, 4, 0, 0),
datetime.datetime(2015, 1, 5, 0, 0),
datetime.datetime(2015, 1, 6, 0, 0),
datetime.datetime(2015, 1, 7, 0, 0),
datetime.datetime(2015, 1, 8, 0, 0),
datetime.datetime(2015, 1, 9, 0, 0),
datetime.datetime(2015, 1, 10, 0, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 12, 0, 0),
datetime.datetime(2015, 1, 13, 0, 0),
datetime.datetime(2015, 1, 14, 0, 0),
datetime.datetime(2015, 1, 15, 0, 0),
datetime.datetime(2015, 1, 16, 0, 0),
datetime.datetime(2015, 1, 17, 0, 0),
datetime.datetime(2015, 1, 18, 0, 0),
datetime.datetime(2015, 1, 19, 0, 0),
datetime.datetime(2015, 1, 20, 0, 0),
datetime.datetime(2015, 1, 21, 0, 0),
datetime.datetime(2015, 1, 22, 0, 0),
datetime.datetime(2015, 1, 23, 0, 0),
datetime.datetime(2015, 1, 24, 0, 0),
datetime.datetime(2015, 1, 25, 0, 0),
datetime.datetime(2015, 1, 26, 0, 0),
datetime.datetime(2015, 1, 27, 0, 0),
datetime.datetime(2015, 1, 28, 0, 0),
datetime.datetime(2015, 1, 29, 0, 0),
datetime.datetime(2015, 1, 30, 0, 0),
datetime.datetime(2015, 1, 31, 0, 0),
datetime.datetime(2015, 2, 1, 0, 0)]
# Noise of up to ±3 hours, plus noise of 0 to +5 minutes.
In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))
Out[4]:
[datetime.datetime(2015, 1, 1, 3, 1),
datetime.datetime(2015, 1, 2, 0, 3),
datetime.datetime(2015, 1, 3, 2, 0),
datetime.datetime(2015, 1, 3, 22, 2),
datetime.datetime(2015, 1, 4, 22, 3),
datetime.datetime(2015, 1, 6, 0, 2),
datetime.datetime(2015, 1, 7, 0, 4),
datetime.datetime(2015, 1, 8, 0, 4),
datetime.datetime(2015, 1, 8, 21, 3),
datetime.datetime(2015, 1, 9, 22, 0),
datetime.datetime(2015, 1, 11, 0, 0),
datetime.datetime(2015, 1, 11, 22, 1),
datetime.datetime(2015, 1, 12, 22, 5),
datetime.datetime(2015, 1, 14, 3, 0),
datetime.datetime(2015, 1, 15, 2, 5)]
# A negative interval can be specified to go backward in time.
In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))
Out[5]:
[datetime.datetime(2015, 5, 22, 0, 0),
datetime.datetime(2015, 5, 21, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 16, 0, 0),
datetime.datetime(2015, 5, 15, 0, 0),
datetime.datetime(2015, 5, 14, 0, 0),
datetime.datetime(2015, 5, 13, 0, 0),
datetime.datetime(2015, 5, 12, 0, 0),
datetime.datetime(2015, 5, 11, 0, 0),
datetime.datetime(2015, 5, 10, 0, 0),
datetime.datetime(2015, 5, 9, 0, 0),
datetime.datetime(2015, 5, 8, 0, 0),
datetime.datetime(2015, 5, 7, 0, 0),
datetime.datetime(2015, 5, 6, 0, 0),
datetime.datetime(2015, 5, 5, 0, 0),
datetime.datetime(2015, 5, 4, 0, 0),
datetime.datetime(2015, 5, 3, 0, 0),
datetime.datetime(2015, 5, 2, 0, 0),
datetime.datetime(2015, 5, 1, 0, 0),
datetime.datetime(2015, 4, 30, 0, 0),
datetime.datetime(2015, 4, 29, 0, 0),
datetime.datetime(2015, 4, 28, 0, 0),
datetime.datetime(2015, 4, 27, 0, 0),
datetime.datetime(2015, 4, 26, 0, 0),
datetime.datetime(2015, 4, 25, 0, 0),
datetime.datetime(2015, 4, 24, 0, 0),
datetime.datetime(2015, 4, 23, 0, 0),
datetime.datetime(2015, 4, 22, 0, 0)]
common
noise
Noise parameters add a random offset to each generated time. The generator and range functions of datafactory.utils.datetime accept them.
Noise is passed as optional keyword arguments (**noise).
The available keys are the same as the timedelta arguments:
days
hours
minutes
seconds
microseconds
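Based on the range example (hours=3 described as ±3 hours, minutes=(0, 5) as 0 to +5 minutes), noise could be modeled as below. The interpretation of a single number as a symmetric bound and of a tuple as a (low, high) pair is an assumption drawn from that example, as is the use of whole-number offsets; `add_noise` is a hypothetical helper.

```python
import random
from datetime import datetime, timedelta


def add_noise(moment, rng=random, **noise):
    """Shift `moment` by a random whole-number offset per timedelta key.

    A number n means an offset in [-n, +n]; a (low, high) tuple means
    an offset in [low, high].
    """
    offsets = {}
    for key, bound in noise.items():
        low, high = bound if isinstance(bound, tuple) else (-bound, bound)
        offsets[key] = rng.randint(low, high)
    return moment + timedelta(**offsets)


base = datetime(2015, 1, 1)
noisy = add_noise(base, hours=3, minutes=(0, 5))
print(noisy)
```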
argtype
Besides datetime objects, the following argument types are accepted.
- int:
It is interpreted as a year.
- str:
The numeric parts of the string are parsed as a datetime.
- tuple:
It is passed to datetime as positional arguments.
- dict:
It is passed to datetime as keyword arguments.
- date:
It is converted to a datetime.
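The conversion rules above can be summarized in a single coercion function. The sketch below is illustrative only and is not the library's code: `to_datetime` is a hypothetical name, and the string branch simply treats every run of digits as the next datetime argument, so it needs at least year, month, and day.

```python
import re
from datetime import date, datetime


def to_datetime(value):
    """Coerce the argument types listed above into a datetime."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, int):
        # A bare int is a year: January 1st, midnight.
        return datetime(value, 1, 1)
    if isinstance(value, str):
        # Each run of digits becomes one positional datetime argument.
        return datetime(*(int(part) for part in re.findall(r"\d+", value)))
    if isinstance(value, tuple):
        return datetime(*value)
    if isinstance(value, dict):
        return datetime(**value)
    raise TypeError("unsupported type: %r" % type(value))


print(to_datetime(2015))
print(to_datetime("2015-11-11T11:11:11.111111"))
print(to_datetime((1988, 5, 22)))
```

The datetime check comes before the date check because datetime is a subclass of date.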
history
1.0.x
Initial release.