TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
d_train = tfio.IODataset.from_mnist(
    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz')

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile the model.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP file system, which eliminates the need to download the datasets and save them in a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
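
The automatic detection mentioned in the note can be pictured with a small standard-library sketch. This is not taken from the tensorflow-io implementation (the text above does not describe it); it only illustrates one common approach, which is to look for the two-byte gzip magic number before deciding whether to decompress. The helper name `maybe_gunzip` is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Return the decompressed bytes if `data` looks like gzip, else `data` unchanged.

    Hypothetical helper for illustration only.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


raw = b"idx payload"
assert maybe_gunzip(gzip.compress(raw)) == raw  # compressed input is unpacked
assert maybe_gunzip(raw) == raw                 # plain input passes through
```

Checking the header rather than the file extension means the same call handles both `.gz` URLs and already-uncompressed files.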

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly
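
After installing, it can be useful to confirm which of the packages is actually present in the current environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an assumption made for this example and is not part of tensorflow-io.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the stable and nightly packages side by side with TensorFlow itself.
for name in ("tensorflow", "tensorflow-io", "tensorflow-io-nightly"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```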

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

| TensorFlow I/O Version | TensorFlow Compatibility | Release Date |
| --- | --- | --- |
| 0.17.0 | 2.4.x | Dec 14, 2020 |
| 0.16.0 | 2.3.x | Oct 23, 2020 |
| 0.15.0 | 2.3.x | Aug 03, 2020 |
| 0.14.0 | 2.2.x | Jul 08, 2020 |
| 0.13.0 | 2.2.x | May 10, 2020 |
| 0.12.0 | 2.1.x | Feb 28, 2020 |
| 0.11.0 | 2.1.x | Jan 10, 2020 |
| 0.10.0 | 2.0.x | Dec 05, 2019 |
| 0.9.1 | 2.0.x | Nov 15, 2019 |
| 0.9.0 | 2.0.x | Oct 18, 2019 |
| 0.8.1 | 1.15.x | Nov 15, 2019 |
| 0.8.0 | 1.15.x | Oct 17, 2019 |
| 0.7.2 | 1.14.x | Nov 15, 2019 |
| 0.7.1 | 1.14.x | Oct 18, 2019 |
| 0.7.0 | 1.14.x | Jul 14, 2019 |
| 0.6.0 | 1.13.x | May 29, 2019 |
| 0.5.0 | 1.13.x | Apr 12, 2019 |
| 0.4.0 | 1.13.x | Mar 01, 2019 |
| 0.3.0 | 1.12.0 | Feb 15, 2019 |
| 0.2.0 | 1.12.0 | Jan 29, 2019 |
| 0.1.0 | 1.12.0 | Dec 16, 2018 |
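
As a rough illustration of how the table can be used, the sketch below maps a TensorFlow version string to the newest TensorFlow I/O release listed for that series. The dictionary is transcribed from the table above; the names `LATEST_TFIO_FOR_TF` and `matching_tfio` are hypothetical and not part of any package.

```python
from typing import Optional

# Newest TensorFlow I/O release listed for each TensorFlow series in the table.
LATEST_TFIO_FOR_TF = {
    "2.4": "0.17.0",
    "2.3": "0.16.0",
    "2.2": "0.14.0",
    "2.1": "0.12.0",
    "2.0": "0.10.0",
    "1.15": "0.8.1",
    "1.14": "0.7.2",
    "1.13": "0.6.0",
    "1.12": "0.3.0",
}


def matching_tfio(tf_version: str) -> Optional[str]:
    """Return the listed TensorFlow I/O version for `tf_version`, or None if unlisted."""
    # Keep only "major.minor", e.g. "2.3.1" -> "2.3".
    series = ".".join(tf_version.split(".")[:2])
    return LATEST_TFIO_FOR_TF.get(series)


print(f"pip install tensorflow-io=={matching_tfio('2.3.1')}")
# prints: pip install tensorflow-io==0.16.0
```

Versions that do not appear in the table return `None`, so a caller can fall back to checking the list of releases.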

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance from one commit to the next.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu:16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  # Quote the paths so the script also works when they contain spaces.
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.
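
To check which Python version a generated whl package targets, the tags can be read straight from its filename, since wheel filenames follow the pattern name-version-python-abi-platform. The sketch below is only illustrative: the helper names are hypothetical, the example filename is made up to follow that pattern, wheels with an optional build tag are not handled, and `current_python_tag` assumes a CPython interpreter.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename (without a build tag) into its five fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    return WheelTags(*parts)


def current_python_tag() -> str:
    """Python tag of the running interpreter, e.g. 'cp37' (assumes CPython)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


# Hypothetical filename, used only to show the layout.
tags = parse_wheel_filename("tensorflow_io-0.17.0-cp37-cp37m-manylinux2010_x86_64.whl")
print(tags.python, tags.platform)
print("matches this interpreter:", tags.python == current_python_tag())
```

Comparing the parsed tag with the interpreter's tag is a quick way to see whether the python found in the shell is the one the wheel was built for.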

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test, and Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

| Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019 |
| --- | --- | --- | --- | --- |
| 2.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A |
| 3.7 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| 3.8 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We do our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Other tests, such as those for Kinesis, PubSub, and Azure Storage, run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered only by offline tests may not have the same level of coverage as those tested with live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0


