datafactory generates test data.

https://github.com/walkframe/datafactory/workflows/master/badge.svg

Requirements

  • Python 3.5 or later.

Install

$ pip install datafactory

Usage

Basic Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     'id': datafactory.IncrementField(),
   ...:     'x': datafactory.CycleField(['a', 'b', 'c']),
   ...:     # When BLANK is chosen, the key is omitted from the record.
   ...:     'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),
   ...: })

In [3]: container = datafactory.Container(model, 5, render=True)

In [4]: container
Out[4]:
[{'id': 1, 'x': 'a'},
 {'id': 2, 'x': 'b', 'option': False},
 {'id': 3, 'x': 'c', 'option': True},
 {'id': 4, 'x': 'a'},
 {'id': 5, 'x': 'b'}]
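To make the per-record behaviour of this model concrete, here is a rough stdlib-only sketch. It is not datafactory's implementation: the `BLANK` sentinel and the `make_records` helper are hypothetical stand-ins, assuming `IncrementField` counts up from 1, `CycleField` repeats its items in order, and `ChoiceField` picks one item at random.

```python
import itertools
import random

# Hypothetical sentinel standing in for datafactory.BLANK: when it is
# chosen, the key is left out of the record entirely.
BLANK = object()


def make_records(n, seed=None):
    """Build n dict records shaped like the Basic Example output."""
    rng = random.Random(seed)
    ids = itertools.count(1)                # like IncrementField()
    xs = itertools.cycle(['a', 'b', 'c'])   # like CycleField(['a', 'b', 'c'])
    records = []
    for _ in range(n):
        record = {'id': next(ids), 'x': next(xs)}
        option = rng.choice([True, False, BLANK])  # like ChoiceField([...])
        if option is not BLANK:
            record['option'] = option
        records.append(record)
    return records
```

Calling `make_records(5)` gives five records whose `id` and `x` match the output above, while `option` appears only on some of them.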

# Pass rewrite=True to overwrite the file if it already exists.
In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)

In [6]: !cat /tmp/test.json
[
 {
  "x": "a",
  "id": 1
 },
 {
  "x": "b",
  "id": 2,
  "option": false
 },
 {
  "x": "c",
  "id": 3,
  "option": true
 },
 {
  "x": "a",
  "id": 4
 },
 {
  "x": "b",
  "id": 5
 }
]
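The `rewrite` flag guards against clobbering an existing file. The sketch below shows that idea with the standard `json` module; `write_json` is a hypothetical helper, not part of datafactory. Using `indent=1` reproduces the one-space indentation seen in the file above.

```python
import json
import os


def write_json(records, path, rewrite=False):
    """Write records to path as JSON.

    Hypothetical helper: raises FileExistsError when the file exists
    and rewrite is False, mirroring the rewrite flag described above.
    """
    if os.path.exists(path) and not rewrite:
        raise FileExistsError(path)
    with open(path, 'w') as f:
        json.dump(records, f, indent=1)
```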

TSV Example

In [1]: import datafactory

In [2]: model = datafactory.ListModel([
   ...:     datafactory.IncrementField(start=10, step=5),
   ...:     datafactory.HashOfField(2, 'md5'),  # MD5 hash of the value in the third column (index 2).
   ...:     datafactory.ChoiceField(['foo', 'bar', 'baz']),
   ...:     datafactory.CycleField(range(0, 30, 10)),
   ...: ]).ordering(2)  # render index 2 (the third column) first

# IterContainer saves memory because it generates each element on demand.
In [3]: container = datafactory.IterContainer(model, 10)  # repeat 10 times.

In [4]: datafactory.CsvFormatter(
   ...:     container,
   ...:     delimiter='\t',
   ...:     header=['id', 'hash-of-name', 'name', 'value']
   ...: ).write('/tmp/test.csv', rewrite=True)

In [5]: !cat /tmp/test.csv
id    hash-of-name    name    value
10    acbd18db4cc2f85cedef654fccc4a4d8        foo     0
15    acbd18db4cc2f85cedef654fccc4a4d8        foo     10
20    73feffa4b7f6bb68e44cf984c85f6e88        baz     20
25    acbd18db4cc2f85cedef654fccc4a4d8        foo     0
30    acbd18db4cc2f85cedef654fccc4a4d8        foo     10
35    73feffa4b7f6bb68e44cf984c85f6e88        baz     20
40    73feffa4b7f6bb68e44cf984c85f6e88        baz     0
45    73feffa4b7f6bb68e44cf984c85f6e88        baz     10
50    37b51d194a7513e45b56f6524f2d51f2        bar     20
55    37b51d194a7513e45b56f6524f2d51f2        bar     0
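The two ideas in this example, rendering one column before another so a hash can be derived from it and generating rows lazily instead of holding them all in memory, can be sketched with the standard library alone. `iter_rows` and `write_tsv` are hypothetical names; the sketch assumes `HashOfField(2, 'md5')` stores the hex MD5 digest of the referenced value, which matches the digests shown above (for example, MD5 of `foo` is `acbd18db4cc2f85cedef654fccc4a4d8`).

```python
import csv
import hashlib
import itertools
import random


def iter_rows(n, seed=None):
    """Yield rows one at a time, in the spirit of IterContainer."""
    rng = random.Random(seed)
    ids = itertools.count(10, 5)                # like IncrementField(start=10, step=5)
    values = itertools.cycle(range(0, 30, 10))  # like CycleField(range(0, 30, 10))
    for _ in range(n):
        # The name column is produced first so the hash column can use it,
        # which is what ordering(2) arranges in the example above.
        name = rng.choice(['foo', 'bar', 'baz'])
        digest = hashlib.md5(name.encode('utf-8')).hexdigest()
        yield [next(ids), digest, name, next(values)]


def write_tsv(rows, stream, header):
    """Write a header line followed by tab-separated rows."""
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
```

Because `iter_rows` is a generator, `write_tsv` consumes one row at a time and never builds the full table in memory.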

Custom Example

If a value in the model is callable, it is called and its return value is stored.

Model

In [1]: import datafactory

In [2]: def repeat_key(k, i):
   ...:     return k * i  # k: the key, i: the key's index
   ...:

In [3]: container = datafactory.DictContainer(repeat_key)

In [4]: container(['a', 'b', 'c', 'd', 'e'])
Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}
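Judging from the output, each key is passed to the callable together with its position in the key list. A minimal plain-Python equivalent of that behaviour, using a hypothetical `dict_from_callable` helper rather than datafactory itself:

```python
def dict_from_callable(func, keys):
    """Build {key: func(key, index)} for every key in order.

    Hypothetical stand-in for DictContainer(func)(keys) as used above.
    """
    return {key: func(key, index) for index, key in enumerate(keys)}


def repeat_key(k, i):
    # Repeat the key string i times ('a' at index 0 becomes '').
    return k * i
```

`dict_from_callable(repeat_key, ['a', 'b', 'c', 'd', 'e'])` returns the same dict as `Out[4]` above.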

Field

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:    'col1': (lambda r, i: i),
   ...:    'col2': (lambda r: r['col1'] + 1),
   ...:    'col3': (lambda r: r['col2'] * 2),
   ...:    'col4': 100,  # fixed value
   ...: }).ordering('col1', 'col2', 'col3')

In [3]: container = datafactory.ListContainer(model)

In [4]: container(4)
Out[4]:
[{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},
 {'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},
 {'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},
 {'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]
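In this model some callables take `(row, index)` and others take only `(row)`, and the columns named in `ordering` are rendered first so later columns can read earlier ones. The sketch below reproduces that behaviour by inspecting each callable's parameter count. Dispatching on arity is an assumption inferred from the example, and `render_row` is a hypothetical helper, not datafactory's code.

```python
import inspect


def render_row(model, order, index):
    """Render one dict row from a model of callables and fixed values.

    Keys listed in `order` are rendered first, in that order. Callables
    with two parameters receive (row, index); callables with one receive
    (row); any other value is copied as a fixed value.
    """
    row = {}
    keys = list(order) + [k for k in model if k not in order]
    for key in keys:
        spec = model[key]
        if callable(spec):
            nparams = len(inspect.signature(spec).parameters)
            row[key] = spec(row, index) if nparams == 2 else spec(row)
        else:
            row[key] = spec
    return row
```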

Limited Number of Elements Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     # x: 'a' at most once, 'b' at most twice, 'c' at most three times.
   ...:     'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),
   ...:     # y: 'a' at most twice, 'b' and 'c' at most once each.
   ...:     'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),
   ...:     # z: only 'c' can be selected, at most five times.
   ...:     'z': datafactory.PickoutField(['c']*5, missing=None),
   ...: })

In [3]: container = datafactory.ListContainer(model)

In [4]: container(6)
Out[4]:
[{'x': 'a', 'y': 'a', 'z': 'c'},
 {'x': 'c', 'y': 'b', 'z': 'c'},
 {'x': 'c', 'y': 'a', 'z': 'c'},
 {'x': 'b', 'y': 'c', 'z': 'c'},
 {'x': 'c', 'y': '*', 'z': 'c'},
 {'x': 'b', 'y': '*', 'z': None}]
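The output is consistent with drawing values at random without replacement and falling back to `missing` once every allowed occurrence has been used. A minimal sketch of that behaviour follows; the `Pickout` class is hypothetical, and it draws uniformly among the remaining occurrences, which may differ from datafactory's exact weighting.

```python
import random


class Pickout:
    """Draw values at random without replacement, then return `missing`.

    A dict maps each value to its maximum count; a list spells out each
    allowed occurrence explicitly, as in the two forms used above.
    """

    def __init__(self, limits, missing=None, seed=None):
        if isinstance(limits, dict):
            pool = [value for value, count in limits.items() for _ in range(count)]
        else:
            pool = list(limits)
        self._pool = pool
        self._missing = missing
        self._rng = random.Random(seed)

    def __call__(self):
        if not self._pool:
            return self._missing
        # Remove one random remaining occurrence so it cannot be drawn again.
        return self._pool.pop(self._rng.randrange(len(self._pool)))
```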

Combination Example

Test data covering every combination of several lists can be generated with the repeat argument of CycleField and SequenceField.

In [1]: import datafactory

In [2]: l0 = ['a', 'b']

In [3]: l1 = ['a', 'b', 'c']

In [4]: l2 = ['a', 'b', 'c', 'd']

In [5]: model = datafactory.ListModel([
   ...:     datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),
   ...:     datafactory.CycleField(l1, repeat=len(l2)),
   ...:     datafactory.CycleField(l2),
   ...: ])

In [6]: container = datafactory.Container(model)

# Passing ESCAPE as the missing argument stops generation as soon as
# SequenceField runs out of elements, well before 10000 rows are produced.
In [7]: container(10000)
Out[7]:
[['a', 'a', 'a'],
 ['a', 'a', 'b'],
 ['a', 'a', 'c'],
 ['a', 'a', 'd'],
 ['a', 'b', 'a'],
 ['a', 'b', 'b'],
 ['a', 'b', 'c'],
 ['a', 'b', 'd'],
 ['a', 'c', 'a'],
 ['a', 'c', 'b'],
 ['a', 'c', 'c'],
 ['a', 'c', 'd'],
 ['b', 'a', 'a'],
 ['b', 'a', 'b'],
 ['b', 'a', 'c'],
 ['b', 'a', 'd'],
 ['b', 'b', 'a'],
 ['b', 'b', 'b'],
 ['b', 'b', 'c'],
 ['b', 'b', 'd'],
 ['b', 'c', 'a'],
 ['b', 'c', 'b'],
 ['b', 'c', 'c'],
 ['b', 'c', 'd']]
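For comparison, the standard library's `itertools.product` yields the same 24 rows in the same order. This is a plain-Python equivalent of the result, not how datafactory computes it.

```python
import itertools

l0 = ['a', 'b']
l1 = ['a', 'b', 'c']
l2 = ['a', 'b', 'c', 'd']

# product() varies the last list fastest. Likewise above, the SequenceField
# column changes every len(l1) * len(l2) rows and the middle CycleField
# column changes every len(l2) rows.
combinations = [list(row) for row in itertools.product(l0, l1, l2)]
```

The repeat arguments in the model play the role that the nesting order plays in `product`: each column holds its value for as many rows as there are combinations of the columns to its right.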

Nested Example

In [1]: import datafactory

In [2]: model = datafactory.Model({
   ...:     'a': datafactory.ListModel([
   ...:         datafactory.CycleField(['b', 'c']),
   ...:         datafactory.CycleField(['d', 'e']),
   ...:     ]),
   ...:     datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)
   ...: })

In [3]: datafactory.Container(model, 10, render=True)
Out[3]:
[{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},
 {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]
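Two things make this example work: a value can itself be a model or container, and a dict key can itself be a field. A recursive renderer captures that idea. In this hypothetical sketch (`render` is not a datafactory function), every callable receives the row index, a simplifying assumption; datafactory's fields keep their own state instead.

```python
def render(spec, index):
    """Recursively render a nested spec for row `index`.

    Callables are called with the row index; dicts render both keys and
    values (so a key may itself be generated); lists render each item;
    anything else is returned unchanged.
    """
    if callable(spec):
        return spec(index)
    if isinstance(spec, dict):
        return {render(k, index): render(v, index) for k, v in spec.items()}
    if isinstance(spec, list):
        return [render(item, index) for item in spec]
    return spec
```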

datetime Utilities

choice

Returns a random datetime between start and end.

In [1]: from datafactory.utils.datetime import choice


In [2]: choice(1988, '2015-11-11T11:11:11.111111')
Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)

# tuple: datetime(*tuple), dict: datetime(**dict)
In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})
Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)

In [4]: from datetime import datetime, date

In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))
Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)
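The core of a random choice between two datetimes is a uniformly random offset within the span between them. The sketch below assumes both endpoints are already datetime objects (the argument coercion is covered under argtype); `random_datetime` is a hypothetical name, not datafactory's implementation.

```python
import random
from datetime import datetime, timedelta


def random_datetime(start, end, rng=random):
    """Return a uniformly random datetime between start and end (inclusive)."""
    span = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span))
```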

generator

A generator that yields datetime objects at regular intervals.

In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import generator

# If the end argument is omitted, the generator yields values indefinitely.
In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))

In [4]: next(g)
Out[4]: datetime.datetime(2015, 1, 1, 0, 0)

In [5]: next(g)
Out[5]: datetime.datetime(2015, 1, 2, 12, 0)

In [6]: next(g)
Out[6]: datetime.datetime(2015, 1, 4, 0, 0)

In [7]: next(g)
Out[7]: datetime.datetime(2015, 1, 5, 12, 0)
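Without noise, the generator amounts to repeatedly adding the interval to the start value. A minimal sketch, assuming datetime arguments and an inclusive end bound; `datetime_generator` is a hypothetical name.

```python
from datetime import datetime, timedelta


def datetime_generator(start, interval=timedelta(days=1), end=None):
    """Yield start, start + interval, ...

    Runs forever when end is None; otherwise stops once the next value
    would pass end (in either direction, depending on the interval's sign).
    """
    if interval == timedelta(0):
        raise ValueError('interval must be non-zero')
    forward = interval > timedelta(0)
    current = start
    while end is None or (current <= end if forward else current >= end):
        yield current
        current += interval
```

With `start=datetime(2015, 1, 1)` and `interval=timedelta(days=1, hours=12)`, the first four values match `Out[4]` through `Out[7]` above.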

range

Returns a list of datetime objects generated at regular intervals from start to end (inclusive).

In [1]: from datetime import timedelta
In [2]: from datafactory.utils.datetime import range  # note: shadows the built-in range

In [3]: range(2015, '2015/2/1')
Out[3]:
[datetime.datetime(2015, 1, 1, 0, 0),
 datetime.datetime(2015, 1, 2, 0, 0),
 datetime.datetime(2015, 1, 3, 0, 0),
 datetime.datetime(2015, 1, 4, 0, 0),
 datetime.datetime(2015, 1, 5, 0, 0),
 datetime.datetime(2015, 1, 6, 0, 0),
 datetime.datetime(2015, 1, 7, 0, 0),
 datetime.datetime(2015, 1, 8, 0, 0),
 datetime.datetime(2015, 1, 9, 0, 0),
 datetime.datetime(2015, 1, 10, 0, 0),
 datetime.datetime(2015, 1, 11, 0, 0),
 datetime.datetime(2015, 1, 12, 0, 0),
 datetime.datetime(2015, 1, 13, 0, 0),
 datetime.datetime(2015, 1, 14, 0, 0),
 datetime.datetime(2015, 1, 15, 0, 0),
 datetime.datetime(2015, 1, 16, 0, 0),
 datetime.datetime(2015, 1, 17, 0, 0),
 datetime.datetime(2015, 1, 18, 0, 0),
 datetime.datetime(2015, 1, 19, 0, 0),
 datetime.datetime(2015, 1, 20, 0, 0),
 datetime.datetime(2015, 1, 21, 0, 0),
 datetime.datetime(2015, 1, 22, 0, 0),
 datetime.datetime(2015, 1, 23, 0, 0),
 datetime.datetime(2015, 1, 24, 0, 0),
 datetime.datetime(2015, 1, 25, 0, 0),
 datetime.datetime(2015, 1, 26, 0, 0),
 datetime.datetime(2015, 1, 27, 0, 0),
 datetime.datetime(2015, 1, 28, 0, 0),
 datetime.datetime(2015, 1, 29, 0, 0),
 datetime.datetime(2015, 1, 30, 0, 0),
 datetime.datetime(2015, 1, 31, 0, 0),
 datetime.datetime(2015, 2, 1, 0, 0)]

# noise: ±3 hours and 0 to +5 minutes
In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))
Out[4]:
[datetime.datetime(2015, 1, 1, 3, 1),
 datetime.datetime(2015, 1, 2, 0, 3),
 datetime.datetime(2015, 1, 3, 2, 0),
 datetime.datetime(2015, 1, 3, 22, 2),
 datetime.datetime(2015, 1, 4, 22, 3),
 datetime.datetime(2015, 1, 6, 0, 2),
 datetime.datetime(2015, 1, 7, 0, 4),
 datetime.datetime(2015, 1, 8, 0, 4),
 datetime.datetime(2015, 1, 8, 21, 3),
 datetime.datetime(2015, 1, 9, 22, 0),
 datetime.datetime(2015, 1, 11, 0, 0),
 datetime.datetime(2015, 1, 11, 22, 1),
 datetime.datetime(2015, 1, 12, 22, 5),
 datetime.datetime(2015, 1, 14, 3, 0),
 datetime.datetime(2015, 1, 15, 2, 5)]

# A negative interval generates datetimes in descending order.
In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))
Out[5]:
[datetime.datetime(2015, 5, 22, 0, 0),
 datetime.datetime(2015, 5, 21, 0, 0),
 datetime.datetime(2015, 5, 20, 0, 0),
 datetime.datetime(2015, 5, 19, 0, 0),
 datetime.datetime(2015, 5, 18, 0, 0),
 datetime.datetime(2015, 5, 17, 0, 0),
 datetime.datetime(2015, 5, 16, 0, 0),
 datetime.datetime(2015, 5, 15, 0, 0),
 datetime.datetime(2015, 5, 14, 0, 0),
 datetime.datetime(2015, 5, 13, 0, 0),
 datetime.datetime(2015, 5, 12, 0, 0),
 datetime.datetime(2015, 5, 11, 0, 0),
 datetime.datetime(2015, 5, 10, 0, 0),
 datetime.datetime(2015, 5, 9, 0, 0),
 datetime.datetime(2015, 5, 8, 0, 0),
 datetime.datetime(2015, 5, 7, 0, 0),
 datetime.datetime(2015, 5, 6, 0, 0),
 datetime.datetime(2015, 5, 5, 0, 0),
 datetime.datetime(2015, 5, 4, 0, 0),
 datetime.datetime(2015, 5, 3, 0, 0),
 datetime.datetime(2015, 5, 2, 0, 0),
 datetime.datetime(2015, 5, 1, 0, 0),
 datetime.datetime(2015, 4, 30, 0, 0),
 datetime.datetime(2015, 4, 29, 0, 0),
 datetime.datetime(2015, 4, 28, 0, 0),
 datetime.datetime(2015, 4, 27, 0, 0),
 datetime.datetime(2015, 4, 26, 0, 0),
 datetime.datetime(2015, 4, 25, 0, 0),
 datetime.datetime(2015, 4, 24, 0, 0),
 datetime.datetime(2015, 4, 23, 0, 0),
 datetime.datetime(2015, 4, 22, 0, 0)]
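The outputs above include both endpoints and walk backwards when the interval is negative. A list-returning sketch of that behaviour, without noise and assuming datetime arguments; `datetime_range` is a hypothetical name chosen to avoid shadowing the built-in `range`.

```python
from datetime import datetime, timedelta


def datetime_range(start, end, interval=timedelta(days=1)):
    """Return datetimes from start to end inclusive, stepping by interval.

    A negative interval walks backwards from start down to end.
    """
    if interval == timedelta(0):
        raise ValueError('interval must be non-zero')
    ascending = interval > timedelta(0)
    result = []
    current = start
    while (current <= end) if ascending else (current >= end):
        result.append(current)
        current += interval
    return result
```

January 1 to February 1 gives 32 values, and May 22 back to April 22 gives 31, matching the lists above.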

Common

noise

Noise parameters shift each generated datetime away from its exact position by a random amount. They are accepted by the generator and range functions in datafactory.utils.datetime, are optional, and must be passed as keyword arguments. The available keys match the corresponding timedelta arguments:

  • days

  • hours

  • minutes

  • seconds

  • microseconds
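The range example above pairs `hours=3` with the comment "±3 hours" and `minutes=(0, 5)` with "0 to +5 minutes", and its output shows whole-unit offsets. The sketch below implements that reading: an int n gives a random whole offset in [-n, n], a tuple (lo, hi) gives one in [lo, hi]. These semantics are inferred from the example, and `add_noise` is a hypothetical helper, not the library's code.

```python
import random
from datetime import datetime, timedelta

# Keys listed above; each maps directly to a timedelta keyword argument.
NOISE_KEYS = ('days', 'hours', 'minutes', 'seconds', 'microseconds')


def add_noise(value, rng=random, **noise):
    """Shift `value` by a random whole-number offset per timedelta unit.

    Assumed semantics: n -> offset in [-n, n]; (lo, hi) -> offset in [lo, hi].
    """
    offset = timedelta()
    for key, spec in noise.items():
        if key not in NOISE_KEYS:
            raise TypeError('unsupported noise key: %r' % key)
        lo, hi = spec if isinstance(spec, tuple) else (-spec, spec)
        offset += timedelta(**{key: rng.randint(lo, hi)})
    return value + offset
```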

argtype

Besides datetime objects, the following types are accepted wherever a datetime argument is expected.

int:

Interpreted as a year; the result is January 1 of that year at 00:00:00.

str or unicode:

The numeric parts of the string are used to build the datetime, e.g. '2015/2/1' or '2015-11-11T11:11:11.111111'.

tuple:

Its items are passed to datetime positionally, i.e. (year, month, day, ...).

dict:

Its items are passed to datetime as keyword arguments.

date:

The time of day is filled in as 00:00:00.
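The coercion rules above can be sketched as a single function. `to_datetime` is a hypothetical name, and its string handling is deliberately naive: it simply feeds the digit runs to datetime in order, so it needs at least year, month, and day, and reads a trailing fraction such as `.111111` as a microsecond count.

```python
import re
from datetime import date, datetime, time


def to_datetime(value):
    """Coerce the argument types listed above into a datetime."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    if isinstance(value, date):
        return datetime.combine(value, time())  # time filled in as 00:00:00
    if isinstance(value, int):
        return datetime(value, 1, 1)  # a bare year means January 1
    if isinstance(value, str):
        # Digit runs in order: year, month, day, hour, minute, second, microsecond.
        return datetime(*(int(part) for part in re.findall(r'\d+', value)))
    if isinstance(value, tuple):
        return datetime(*value)
    if isinstance(value, dict):
        return datetime(**value)
    raise TypeError('cannot convert %r to datetime' % (value,))
```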

History

1.0.0

Initial release.

Download files

Source Distribution

datafactory-1.0.0.tar.gz (19.6 kB)

Uploaded Source

Built Distribution

datafactory-1.0.0-py3-none-any.whl (36.1 kB)

Uploaded Python 3

File details

Details for the file datafactory-1.0.0.tar.gz.

File metadata

  • Download URL: datafactory-1.0.0.tar.gz
  • Upload date:
  • Size: 19.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for datafactory-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e3b5bf6c4b0d72c6e90322d3058bde52d2b1d9ee6664b94b7a8dc4a1b8c53e60
MD5 135be233fde224d4722e22176b399a18
BLAKE2b-256 715a69b5e9d08e3318635abe251469945f306fe70f65f0aecf7b8ece2c399c01

File details

Details for the file datafactory-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: datafactory-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 36.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for datafactory-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94348d5f221cc81c4b5e546f9d67d07ce1abaa86798436cd0b2e1617de061b08
MD5 ca0071f2077035769e4034e2e722ab13
BLAKE2b-256 0e90249725bfa9017aeca38cb75dfa547161e78653ffab88989754397924f707
