TensorFlow I/O

TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of supported file systems and file formats by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example, with the data processing part replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
d_train = tfio.IODataset.from_mnist(
    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz')

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile the model.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP file system, which eliminates the need to download the datasets and save them to a local directory.

NOTE: Since tensorflow-io detects and decompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
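To make the note above concrete, here is a minimal sketch of what automatic gzip detection can look like. It is not tensorflow-io's actual implementation, and the helper names maybe_gunzip and read_idx_header are hypothetical. The sketch relies only on two format facts: a gzip stream starts with the bytes 0x1f 0x8b, and an IDX file (the MNIST container format) starts with two zero bytes, a one-byte data type code, and a one-byte dimension count, followed by one 4-byte big-endian size per dimension.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Return decompressed bytes if `raw` is gzip data, else `raw` unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def read_idx_header(data: bytes):
    """Parse an IDX header and return (dtype_code, shape)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file")
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return dtype_code, shape


# Build a tiny fake "image" file in memory: 2 images of 28x28 uint8 pixels.
# (Illustrative data only; this is not the real MNIST file.)
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">III", 2, 28, 28)
payload = header + bytes(2 * 28 * 28)

# The same parsing code handles both the plain and the gzip-compressed form.
plain_result = read_idx_header(maybe_gunzip(payload))
gzip_result = read_idx_header(maybe_gunzip(gzip.compress(payload)))
print(plain_result, gzip_result)
```

Checking the leading bytes rather than the file extension is what lets a reader accept either form without the caller having to say which one it is.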

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version TensorFlow Compatibility Release Date
0.17.0 2.4.x Dec 14, 2020
0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020
0.14.0 2.2.x Jul 08, 2020
0.13.0 2.2.x May 10, 2020
0.12.0 2.1.x Feb 28, 2020
0.11.0 2.1.x Jan 10, 2020
0.10.0 2.0.x Dec 05, 2019
0.9.1 2.0.x Nov 15, 2019
0.9.0 2.0.x Oct 18, 2019
0.8.1 1.15.x Nov 15, 2019
0.8.0 1.15.x Oct 17, 2019
0.7.2 1.14.x Nov 15, 2019
0.7.1 1.14.x Oct 18, 2019
0.7.0 1.14.x Jul 14, 2019
0.6.0 1.13.x May 29, 2019
0.5.0 1.13.x Apr 12, 2019
0.4.0 1.13.x Mar 01, 2019
0.3.0 1.12.0 Feb 15, 2019
0.2.0 1.12.0 Jan 29, 2019
0.1.0 1.12.0 Dec 16, 2018
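
As an illustration of how the table can be used, the sketch below looks up the TensorFlow I/O releases that match a given TensorFlow version. The helper name matching_tfio_versions is hypothetical and not part of the package, and the dictionary copies only the 2.x rows of the table above.

```python
# Subset of the compatibility table: TensorFlow I/O release -> TensorFlow series.
COMPATIBILITY = {
    "0.17.0": "2.4",
    "0.16.0": "2.3",
    "0.15.0": "2.3",
    "0.14.0": "2.2",
    "0.13.0": "2.2",
    "0.12.0": "2.1",
    "0.11.0": "2.1",
    "0.10.0": "2.0",
    "0.9.1": "2.0",
    "0.9.0": "2.0",
}


def _version_key(version: str):
    """Turn '0.9.1' into (0, 9, 1) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def matching_tfio_versions(tf_version: str):
    """Return TensorFlow I/O releases for a TensorFlow version, newest first."""
    series = ".".join(tf_version.split(".")[:2])  # '2.3.1' -> '2.3'
    matches = [io for io, tf in COMPATIBILITY.items() if tf == series]
    return sorted(matches, key=_version_key, reverse=True)


print(matching_tfio_versions("2.3.1"))  # ['0.16.0', '0.15.0']
print(matching_tfio_versions("2.0.0"))  # ['0.10.0', '0.9.1', '0.9.0']
```

The first entry of the result is the newest matching release, which can then be pinned at install time, for example with pip install tensorflow-io==0.16.0 for TensorFlow 2.3.x.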

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance from commit to commit.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Setting up Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If Docker is installed on the system, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  # Quote the variables so paths containing spaces are passed through intact.
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

The build takes some time, but once it completes, whl packages compatible with Python 3.5, 3.6, and 3.7 will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and will only generate a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.

Note that the above command is also the one we use when releasing packages for Linux and macOS.
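
Because a generated whl package only matches one Python version, it can help to check a wheel against the interpreter before installing it. The sketch below is an illustration, not part of the build tooling: the helper names are hypothetical, the "cp" prefix assumes a CPython interpreter, and the parsing assumes the standard wheel naming scheme in which the Python tag is the third field from the end of the filename.

```python
import sys


def interpreter_tag() -> str:
    """CPython-style tag of the running interpreter, e.g. 'cp38' for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag from a wheel filename (third field from the end)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-3]


def wheel_matches_interpreter(filename: str) -> bool:
    """True if the wheel was built for the Python version running this code."""
    return wheel_python_tag(filename) == interpreter_tag()


# Example filename in the naming scheme used for this package's wheels.
example = "tensorflow_io_nightly-0.18.0.dev20210212062814-cp38-cp38-manylinux2010_x86_64.whl"
print(interpreter_tag(), wheel_python_tag(example), wheel_matches_interpreter(example))
```

Running the script with the same python that the shell resolves shows which wheel tag a build in that shell would produce.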

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for the macOS build and test. Kokoro is used for the Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python Ubuntu 18.04 Ubuntu 20.04 macOS + osx9 Windows-2019
2.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: N/A
3.7 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
3.8 :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, are run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, are run through official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0


