TensorFlow I/O


TensorFlow I/O is a collection of file systems and file formats that are not available in TensorFlow's built-in support. A full list of the file systems and file formats supported by TensorFlow I/O can be found here.

Using tensorflow-io with Keras is straightforward. Below is the Get Started with TensorFlow example with the data processing step replaced by tensorflow-io:

import tensorflow as tf
import tensorflow_io as tfio

# Read the MNIST data into the IODataset.
d_train = tfio.IODataset.from_mnist(
    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz')

# Shuffle the elements of the dataset.
d_train = d_train.shuffle(buffer_size=1024)

# By default image data is uint8, so convert to float32 using map().
d_train = d_train.map(lambda x, y: (tf.image.convert_image_dtype(x, tf.float32), y))

# Batch the data just like any other tf.data.Dataset.
d_train = d_train.batch(32)

# Build the model.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile the model.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the model.
model.fit(d_train, epochs=5, steps_per_epoch=200)

In the MNIST example above, the URLs of the dataset files are passed directly to the tfio.IODataset.from_mnist API call. This works because tensorflow-io has built-in support for the HTTP file system, which eliminates the need to download the datasets and save them to a local directory.

NOTE: Since tensorflow-io detects and uncompresses the MNIST dataset automatically when needed, the URLs of the compressed (gzip) files can be passed to the API call as is.
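To make the idea of automatic decompression concrete, here is a minimal standard-library sketch of how a reader can recognize a gzip payload by its magic bytes and then parse the IDX header that MNIST files use. This is only an illustration of the general technique, not tensorflow-io's actual implementation; the helper names `maybe_decompress` and `read_idx_header` are hypothetical.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def maybe_decompress(data: bytes) -> bytes:
    """Return gunzipped bytes if `data` looks like gzip, else `data` unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def read_idx_header(data: bytes):
    """Parse an IDX header and return (dtype_code, dims).

    IDX layout: two zero bytes, one data type byte, one byte holding the
    number of dimensions, then one big-endian uint32 per dimension.
    """
    raw = maybe_decompress(data)
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0:
        raise ValueError("not an IDX file")
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return dtype_code, dims


# A tiny synthetic IDX payload: 2 images of 28x28 unsigned bytes (type 0x08).
payload = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)
print(read_idx_header(payload))                 # (8, (2, 28, 28))
print(read_idx_header(gzip.compress(payload)))  # (8, (2, 28, 28))
```

The caller passes the same bytes whether or not they are compressed, which mirrors how the example above hands the .gz URLs to the API unchanged.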

Please check the official documentation for more detailed and interesting usages of the package.

Installation

Python Package

The tensorflow-io Python package can be installed with pip directly using:

$ pip install tensorflow-io

People who are a little more adventurous can also try our nightly binaries:

$ pip install tensorflow-io-nightly

Docker Images

In addition to the pip packages, Docker images can be used to get started quickly.

For stable builds:

$ docker pull tfsigio/tfio:latest
$ docker run -it --rm --name tfio-latest tfsigio/tfio:latest

For nightly builds:

$ docker pull tfsigio/tfio:nightly
$ docker run -it --rm --name tfio-nightly tfsigio/tfio:nightly

R Package

Once the tensorflow-io Python package has been successfully installed, you can install the development version of the R package from GitHub via the following:

if (!require("remotes")) install.packages("remotes")
remotes::install_github("tensorflow/io", subdir = "R-package")

TensorFlow Version Compatibility

To ensure compatibility with TensorFlow, it is recommended to install a matching version of TensorFlow I/O according to the table below. You can find the list of releases here.

TensorFlow I/O Version | TensorFlow Compatibility | Release Date
0.17.0 | 2.4.x | Dec 14, 2020
0.16.0 | 2.3.x | Oct 23, 2020
0.15.0 | 2.3.x | Aug 03, 2020
0.14.0 | 2.2.x | Jul 08, 2020
0.13.0 | 2.2.x | May 10, 2020
0.12.0 | 2.1.x | Feb 28, 2020
0.11.0 | 2.1.x | Jan 10, 2020
0.10.0 | 2.0.x | Dec 05, 2019
0.9.1 | 2.0.x | Nov 15, 2019
0.9.0 | 2.0.x | Oct 18, 2019
0.8.1 | 1.15.x | Nov 15, 2019
0.8.0 | 1.15.x | Oct 17, 2019
0.7.2 | 1.14.x | Nov 15, 2019
0.7.1 | 1.14.x | Oct 18, 2019
0.7.0 | 1.14.x | Jul 14, 2019
0.6.0 | 1.13.x | May 29, 2019
0.5.0 | 1.13.x | Apr 12, 2019
0.4.0 | 1.13.x | Mar 01, 2019
0.3.0 | 1.12.0 | Feb 15, 2019
0.2.0 | 1.12.0 | Jan 29, 2019
0.1.0 | 1.12.0 | Dec 16, 2018
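As a small illustration of how the table can be applied, the following sketch maps a TensorFlow version string to a pip requirement for the highest-numbered TensorFlow I/O release listed for that TensorFlow series. The dictionary is transcribed from the table above; the function name `tfio_requirement` is hypothetical and not part of tensorflow-io.

```python
# Highest TensorFlow I/O version listed in the table for each TensorFlow series.
# Note: the table lists exactly 1.12.0 (not 1.12.x) for the 0.1.0-0.3.0
# releases; treating it as a series here is a simplifying assumption.
LATEST_TFIO_FOR_TF = {
    "2.4": "0.17.0",
    "2.3": "0.16.0",
    "2.2": "0.14.0",
    "2.1": "0.12.0",
    "2.0": "0.10.0",
    "1.15": "0.8.1",
    "1.14": "0.7.2",
    "1.13": "0.6.0",
    "1.12": "0.3.0",
}


def tfio_requirement(tf_version: str) -> str:
    """Return a pip requirement string for the given TensorFlow version."""
    series = ".".join(tf_version.split(".")[:2])  # "2.4.1" -> "2.4"
    if series not in LATEST_TFIO_FOR_TF:
        raise ValueError(f"TensorFlow {tf_version} is not covered by the table")
    return f"tensorflow-io=={LATEST_TFIO_FOR_TF[series]}"


print(tfio_requirement("2.4.1"))   # tensorflow-io==0.17.0
print(tfio_requirement("1.15.0"))  # tensorflow-io==0.8.1
```

The resulting string can be passed to pip install. Versions of TensorFlow newer than those in the table raise an error rather than guessing a match.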

Performance Benchmarking

We use GitHub Pages to document the results of API performance benchmarks. The benchmark job is triggered on every commit to the master branch, which makes it possible to track performance across commits.

Contributing

TensorFlow I/O is a community-led open source project. As such, the project depends on public contributions, bug fixes, and documentation. Please see:

Build Status and CI

Build Status
Linux CPU Python 2 Status
Linux CPU Python 3 Status
Linux GPU Python 2 Status
Linux GPU Python 3 Status

Because of the manylinux2010 requirement, TensorFlow I/O is built with Ubuntu 16.04 + Developer Toolset 7 (GCC 7.3) on Linux. Configuring Ubuntu 16.04 with Developer Toolset 7 is not exactly straightforward. If the system has Docker installed, the following command will automatically build a manylinux2010-compatible whl package:

#!/usr/bin/env bash

ls dist/*
for f in dist/*.whl; do
  docker run -i --rm -v "$PWD":/v -w /v --net=host quay.io/pypa/manylinux2010_x86_64 bash -x -e /v/tools/build/auditwheel repair --plat manylinux2010_x86_64 "$f"
done
sudo chown -R "$(id -nu):$(id -ng)" .
ls wheelhouse/*

The build takes some time, but once it completes, Python 3.5, 3.6, and 3.7 compatible whl packages will be available in the wheelhouse directory.

On macOS, the same command can be used. However, the script expects python to be available in the shell and only generates a whl package that matches that version of Python. To build a whl package for a specific Python version, alias that version to python in the shell. See the Auditwheel step in .github/workflows/build.yml for instructions on how to do that.
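One way to check which Python version a generated whl package targets is to read the Python tag from its filename. The sketch below assumes the standard wheel naming scheme (name-version-python-abi-platform.whl, with an optional build tag) and a simple CPython tag such as cp38; the helper names are hypothetical, and the filename is only an illustrative nightly wheel name.

```python
import sys


def interpreter_tag(version_info=None) -> str:
    """Return the CPython tag (e.g. 'cp38') for a (major, minor, ...) tuple."""
    if version_info is None:
        version_info = sys.version_info
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    # The last three dash-separated fields are python tag, ABI tag, platform tag.
    return filename[: -len(".whl")].split("-")[-3]


def matches_interpreter(filename: str, version_info=None) -> bool:
    """True if the wheel's Python tag equals the given interpreter's tag."""
    return wheel_python_tag(filename) == interpreter_tag(version_info)


name = "tensorflow_io_nightly-0.17.0.dev20210204211754-cp38-cp38-macosx_10_13_x86_64.whl"
print(wheel_python_tag(name))             # cp38
print(matches_interpreter(name, (3, 8)))  # True
print(matches_interpreter(name, (3, 7)))  # False
```

Calling `matches_interpreter(name)` without a version tuple compares against the interpreter running the script, which is the same python the build script would pick up from the shell.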

Note that the above command is also the one we use when releasing packages for Linux and macOS.

TensorFlow I/O uses both GitHub Workflows and Google CI (Kokoro) for continuous integration. GitHub Workflows is used for macOS build and test. Kokoro is used for Linux build and test. Again, because of the manylinux2010 requirement, whl packages on Linux are always built with Ubuntu 16.04 + Developer Toolset 7. Tests are run on a variety of systems with different Python versions to ensure good coverage:

Python | Ubuntu 18.04 | Ubuntu 20.04 | macOS + osx9 | Windows-2019
2.7 | ✓ | ✓ | ✓ | N/A
3.7 | ✓ | ✓ | ✓ | ✓
3.8 | ✓ | ✓ | ✓ | ✓

TensorFlow I/O has integrations with many systems and cloud vendors, such as Prometheus, Apache Kafka, Apache Ignite, Google Cloud PubSub, AWS Kinesis, Microsoft Azure Storage, and Alibaba Cloud OSS.

We try our best to test against those systems in our continuous integration whenever possible. Some tests, such as those for Prometheus, Kafka, and Ignite, run against live systems, meaning we install Prometheus/Kafka/Ignite on the CI machine before the test is run. Some tests, such as those for Kinesis, PubSub, and Azure Storage, run against official or unofficial emulators. Offline tests are also performed whenever possible, though systems covered through offline tests may not have the same level of coverage as live systems or emulators.

Live System Emulator CI Integration Offline
Apache Kafka :heavy_check_mark: :heavy_check_mark:
Apache Ignite :heavy_check_mark: :heavy_check_mark:
Prometheus :heavy_check_mark: :heavy_check_mark:
Google PubSub :heavy_check_mark: :heavy_check_mark:
Azure Storage :heavy_check_mark: :heavy_check_mark:
AWS Kinesis :heavy_check_mark: :heavy_check_mark:
Alibaba Cloud OSS :heavy_check_mark:
Google BigTable/BigQuery to be added
Elasticsearch (experimental) :heavy_check_mark: :heavy_check_mark:
MongoDB (experimental) :heavy_check_mark: :heavy_check_mark:

References for emulators:

Community

Additional Information

License

Apache License 2.0




