TensorFlow IO

TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
d_train = tfio.IODataset.from_mnist(
    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz')

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Prepare batches of the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile the model.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the above MNIST example, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP file system, which eliminates the need to download and save the datasets to a local directory.

NOTE: Since tensorflow-io can detect and uncompress the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
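
To make the automatic decompression more concrete, here is a minimal pure-Python sketch of the idea. It is not tensorflow-io's implementation: it assumes gzip is recognized by its two-byte magic number (0x1f 0x8b), and that the payload uses the IDX layout of the MNIST files (two zero bytes, a type code of 0x08 for unsigned bytes, the number of dimensions, then one big-endian 32-bit size per dimension). The names maybe_decompress and parse_idx are hypothetical.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_decompress(data: bytes) -> bytes:
    """Return the data gunzipped if it starts with the gzip magic bytes."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def parse_idx(data: bytes):
    """Parse an IDX buffer of unsigned bytes into (shape, flat list of values)."""
    data = maybe_decompress(data)
    # Header: 2 zero bytes, 1 type-code byte, 1 byte for the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("expected an IDX file of unsigned bytes")
    # One big-endian uint32 per dimension follows the header.
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim
    count = 1
    for dim in shape:
        count *= dim
    values = list(data[offset:offset + count])
    if len(values) != count:
        raise ValueError("truncated IDX payload")
    return shape, values


# Example: a gzip-compressed, 1-D label file holding three labels.
raw = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([7, 2, 1])
print(parse_idx(gzip.compress(raw)))  # ((3,), [7, 2, 1])
```

The same buffer parses identically whether or not it was compressed, which is the property that lets the compressed URLs be passed as is.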

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, the Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version | TensorFlow Compatibility | Release Date
0.17.0 | 2.4.x | Dec 14, 2020
0.16.0 | 2.3.x | Oct 23, 2020
0.15.0 | 2.3.x | Aug 03, 2020
0.14.0 | 2.2.x | Jul 08, 2020
0.13.0 | 2.2.x | May 10, 2020
0.12.0 | 2.1.x | Feb 28, 2020
0.11.0 | 2.1.x | Jan 10, 2020
0.10.0 | 2.0.x | Dec 05, 2019
0.9.1 | 2.0.x | Nov 15, 2019
0.9.0 | 2.0.x | Oct 18, 2019
0.8.1 | 1.15.x | Nov 15, 2019
0.8.0 | 1.15.x | Oct 17, 2019
0.7.2 | 1.14.x | Nov 15, 2019
0.7.1 | 1.14.x | Oct 18, 2019
0.7.0 | 1.14.x | Jul 14, 2019
0.6.0 | 1.13.x | May 29, 2019
0.5.0 | 1.13.x | Apr 12, 2019
0.4.0 | 1.13.x | Mar 01, 2019
0.3.0 | 1.12.0 | Feb 15, 2019
0.2.0 | 1.12.0 | Jan 29, 2019
0.1.0 | 1.12.0 | Dec 16, 2018
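
The table can also be encoded as data so that an environment check can be scripted. A minimal sketch, assuming the installed versions are read with importlib.metadata (Python 3.8+) under the distribution names tensorflow-io and tensorflow; the names COMPAT, is_compatible and check_installed are hypothetical. Rows ending in ".x" are treated as "same major.minor", while the 1.12.0 rows are treated as exact pins.

```python
from importlib import metadata

# TensorFlow I/O version -> compatible TensorFlow version, copied from the table.
COMPAT = {
    "0.17.0": "2.4", "0.16.0": "2.3", "0.15.0": "2.3",
    "0.14.0": "2.2", "0.13.0": "2.2", "0.12.0": "2.1",
    "0.11.0": "2.1", "0.10.0": "2.0", "0.9.1": "2.0",
    "0.9.0": "2.0", "0.8.1": "1.15", "0.8.0": "1.15",
    "0.7.2": "1.14", "0.7.1": "1.14", "0.7.0": "1.14",
    "0.6.0": "1.13", "0.5.0": "1.13", "0.4.0": "1.13",
    "0.3.0": "1.12.0", "0.2.0": "1.12.0", "0.1.0": "1.12.0",
}


def is_compatible(tfio_version: str, tf_version: str) -> bool:
    """Check a TensorFlow version against the table row for a TensorFlow I/O version."""
    expected = COMPAT.get(tfio_version)
    if expected is None:
        raise KeyError(f"TensorFlow I/O version not in table: {tfio_version}")
    if expected.count(".") == 2:
        return tf_version == expected  # exact pin such as 1.12.0
    # "2.4.x"-style rows: only major and minor have to match.
    return tf_version.split(".")[:2] == expected.split(".")


def check_installed() -> bool:
    """Compare the installed tensorflow-io and tensorflow distributions."""
    return is_compatible(metadata.version("tensorflow-io"),
                         metadata.version("tensorflow"))
```

Nightly builds carry versions such as 0.17.0.dev20210203024109, which are not rows of the table, so is_compatible raises KeyError for them rather than guessing.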

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following script will automatically produce manylinux2010-compatible whl packages:

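#!/usr/bin/env bash

# List the wheels that have already been built into dist/.
ls dist/*

# Repair each wheel inside the manylinux2010 image so that it is tagged
# manylinux2010_x86_64; auditwheel writes the repaired wheels to wheelhouse/.
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done

# Files written from inside the container are owned by root; give them back to the current user.
sudo chown -R "$(id -nu)":"$(id -ng)" .
ls wheelhouse/*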

Building takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.
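
The file listings on this page describe the manylinux2010 wheels as requiring glibc 2.12 or newer. A minimal sketch of checking a Linux host against that floor, assuming platform.libc_ver() can identify the C library (it returns empty strings on systems it cannot identify, such as macOS or musl-based distributions); the helper names are hypothetical.

```python
import platform

MANYLINUX2010_GLIBC = (2, 12)  # manylinux2010 wheels need glibc 2.12+


def glibc_at_least(version: str, minimum=MANYLINUX2010_GLIBC) -> bool:
    """Compare a glibc version string such as '2.31' against a (major, minor) floor."""
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum


def host_supports_manylinux2010() -> bool:
    """True only when the running interpreter reports a new enough glibc."""
    lib, version = platform.libc_ver()
    # libc_ver() returns ('', '') when it cannot identify the C library.
    return lib == "glibc" and bool(version) and glibc_at_least(version)


print(host_supports_manylinux2010())
```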

On macOS, the same command can be used. However, the script expects python in the shell and will only generate a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version of Python to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
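
Because the script targets whichever interpreter is found as python, it helps to know which wheel tags that interpreter corresponds to. The file names listed further down this page show the pattern: CPython 3.6 and 3.7 wheels carry an ABI tag with an m suffix (cp37m), while CPython 3.8 wheels do not (cp38), since Python 3.8 dropped that flag. A minimal sketch, assuming a default, non-debug CPython 3 build; cpython_wheel_tags is a hypothetical name.

```python
import sys


def cpython_wheel_tags(major: int, minor: int) -> tuple:
    """Return the (python tag, ABI tag) a default CPython 3 build would use."""
    if major != 3:
        raise ValueError("this sketch only covers CPython 3")
    python_tag = f"cp{major}{minor}"
    # Before 3.8, default builds used pymalloc and added an "m" ABI flag.
    abi_tag = python_tag + "m" if minor < 8 else python_tag
    return python_tag, abi_tag


# Tags for the interpreter running this script.
print(cpython_wheel_tags(sys.version_info.major, sys.version_info.minor))
```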

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test, and Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019
2.7 | ✓ | ✓ | ✓ | N/A
3.7 | ✓ | ✓ | ✓ | ✓
3.8 | ✓ | ✓ | ✓ | ✓

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, Alibaba Cloud OSS, etc.

We try our best to test against these systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are done with live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the tests run. Others, such as those for Kinesis, PubSub, and Azure Storage, are done through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as those tested against live systems or emulators.

System | Live System | Emulator | CI Integration | Offline
Apache Kafka | ✓ | - | ✓ | -
Apache Ignite | ✓ | - | ✓ | -
Prometheus | ✓ | - | ✓ | -
Google PubSub | - | ✓ | ✓ | -
Azure Storage | - | ✓ | ✓ | -
AWS Kinesis | - | ✓ | ✓ | -
Alibaba Cloud OSS | - | - | - | ✓
Google BigTable/BigQuery | to be added
Elasticsearch (experimental) | ✓ | - | ✓ | -
MongoDB (experimental) | ✓ | - | ✓ | -
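
For the live-system tests described above, a test run needs to know whether the service is actually up on the CI machine. A minimal sketch of such a guard using a plain TCP connection attempt; the host, the port 9092 (Kafka's conventional default), and the helper name service_reachable are assumptions, not taken from the project's CI configuration.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# A live-system test could be skipped when, e.g., a local Kafka broker is absent.
KAFKA_AVAILABLE = service_reachable("localhost", 9092)
print("run Kafka live tests:", KAFKA_AVAILABLE)
```

With pytest, such a flag would typically feed pytest.mark.skipif so that live-system tests are reported as skipped instead of failing when the service is not installed.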

References for emulators:

Community

Additional Information

License

Apache License 2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-win_amd64.whl (21.1 MB)

Uploaded CPython 3.8 Windows x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-manylinux2010_x86_64.whl (25.5 MB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-macosx_10_13_x86_64.whl (21.5 MB)

Uploaded CPython 3.8 macOS 10.13+ x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-win_amd64.whl (21.1 MB)

Uploaded CPython 3.7m Windows x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-manylinux2010_x86_64.whl (25.5 MB)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-macosx_10_13_x86_64.whl (21.5 MB)

Uploaded CPython 3.7m macOS 10.13+ x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-win_amd64.whl (21.1 MB)

Uploaded CPython 3.6m Windows x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-manylinux2010_x86_64.whl (25.5 MB)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-macosx_10_13_x86_64.whl (21.5 MB)

Uploaded CPython 3.6m macOS 10.13+ x86-64
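
Each file name above follows the wheel naming convention of PEP 427: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. A minimal sketch that splits such a name into its parts, which makes it easy to pick the file matching a given interpreter and platform; WheelName and parse_wheel_name are illustrative names.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: str
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a PEP 427 wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[:-4].split("-")
    if len(parts) == 5:  # no optional build tag
        dist, version, py, abi, plat = parts
        build = ""
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


name = "tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-manylinux2010_x86_64.whl"
print(parse_wheel_name(name))
```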

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-win_amd64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 ad6021a6f223c0054a16f0e489e5870e2b740c889c8cae5a586d6830df4e2721
MD5 814d5d69b52423e5bb2c4582e45c3c70
BLAKE2b-256 6f553c94ab4a758707fdf7e6aa94e013e24f83d53e6e77ed5096faf18b7a59ad

See more details on using hashes here.
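
The digests listed for each file let you confirm that a download is intact. A minimal sketch that streams a file through hashlib and compares its SHA256 digest with a published one; the local file path is an assumption, and sha256_of and matches_published are hypothetical names. When installing with pip directly, its --require-hashes mode serves the same purpose.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Hypothetical usage for the Windows CPython 3.8 wheel listed above:
# matches_published(
#     "tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-win_amd64.whl",
#     "ad6021a6f223c0054a16f0e489e5870e2b740c889c8cae5a586d6830df4e2721",
# )
```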

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-manylinux2010_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 5a4c1a678ea3aabd637f2e0d9fa1bf0ba77d2b98aa40051b84255a522d90c524
MD5 a7087c774fdb1c901c9ab468c4a81adf
BLAKE2b-256 99bb0f0b64bb72d82fcc31b547064b2024f818dd1cef84617b2c3ac8502ee503

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-macosx_10_13_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 531f2164e0cd36595312e8bc7f4b806869465f00396ca01ea2b2586144de8d58
MD5 94299202cf4e00cb74621c6ed5f5a026
BLAKE2b-256 138adedd47aa684452ae6d58229e663079466317fb81c99df969adf697545baf

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-win_amd64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 9d4b27c9763a0332ee1f4429dc509625cd64a0fbe87ba083d4541ed5071f2209
MD5 af39b75347c9054774be429c8f4b8bb7
BLAKE2b-256 06e5cef42137c6796a990ce0a719d41d286f8246a49685f0d8a5f5e6ccedc698

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-manylinux2010_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 cfc276828ff12bad0604b71d2a1684f48546185ec67aab736f02e4f6c38bc508
MD5 9b0df757f947389aa9d315d8f28f31e3
BLAKE2b-256 4c3fa01749be1fba0433e203b05e99ab909464146e3f403067a6e1b793141837

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-macosx_10_13_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 d998bde6a5e95259e9277132cf4287cd985202035a933a0c7f49d38d888f487d
MD5 4099fad90c2bcc34bda822ae1e0e59c8
BLAKE2b-256 c61e141f6529198e93510f43eba46117357c874042171942d29c8b0f83641af8

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-win_amd64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 f9a9e7725db202d5a3811c3d4221e6ffd2179f9481d63d591b4c78d352b7dd67
MD5 2c639dc277084f5a68ec11aaa73a121b
BLAKE2b-256 842ea6424d1e9dc72ba1593f177a112217c2322dadc339ea8fb422af9e114b0e

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-manylinux2010_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 652737993f1ce54d0b5fdf354a3a0906b46937378ab9d86b41c329adc7c6a8e1
MD5 16a3064a29a291fb95345b35eb874777
BLAKE2b-256 c4296c33ea9ec590c1b437285f6b76b709ce50713672a5d99052ee1d92f21b30

File details

Details for the file tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-macosx_10_13_x86_64.whl.

File hashes

Hashes for tensorflow_io_nightly-0.17.0.dev20210203024109-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 62a05d492c02bedb66e659fcb29c8090c73d0f5b099db76ed77b65a175cab8f8
MD5 b2be85ccf552b160fbd9dcddd84be9cf
BLAKE2b-256 98e28f5899406a17ebdb97f624cb3d621926fe374099076b1db1593293821bde
