image and video datasets and models for torch deep learning
torch-vision
This repository consists of:
vision.datasets : Data loaders for popular vision datasets
vision.models : Definitions for popular model architectures such as AlexNet, VGG, and ResNet, along with pre-trained models.
vision.transforms : Common image transformations such as random crops, rotations, etc.
vision.utils : Utilities such as saving a tensor (3 x H x W) as an image to disk, or creating a grid of images from a mini-batch.
Installation
Binaries:
conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
From Source:
pip install -r requirements.txt
pip install .
Datasets
The following dataset loaders are available: COCO, LSUN, CIFAR, ImageFolder, and Imagenet-12. Each is described below.
Datasets have the API:
- __getitem__
- __len__
They all subclass torch.utils.data.Dataset. Hence, they can all be loaded by multiple workers (Python multiprocessing) using the standard torch.utils.data.DataLoader.
For example:
torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
transform - a function that takes in an image and returns a transformed version. Common transforms such as ToTensor and RandomCrop can be used here, and they can be composed together with transforms.Compose (see the Transforms section below).
target_transform - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
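As a rough illustration of the dataset protocol described above, the following pure-Python sketch implements __getitem__ and __len__ and applies the optional transform and target_transform. The class ToyCaptionDataset, its sample data, and the vocabulary are hypothetical and not part of this package; a real dataset would subclass torch.utils.data.Dataset and would typically return tensors rather than lists.

```python
class ToyCaptionDataset:
    """Hypothetical stand-in for a dataset (not part of torchvision)."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples  # list of (image, target) pairs
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        img, target = self.samples[index]
        # Both hooks are optional and applied independently.
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return img, target

    def __len__(self):
        return len(self.samples)


# Hypothetical vocabulary mapping caption words to indices.
vocab = {"a": 0, "dog": 1, "cat": 2}


def caption_to_indices(caption):
    """Turn a caption string into a list of word indices."""
    return [vocab[word] for word in caption.split()]


toy = ToyCaptionDataset(
    samples=[("img0", "a dog"), ("img1", "a cat")],
    transform=str.upper,  # placeholder standing in for an image transform
    target_transform=caption_to_indices,
)
```

Because the object only needs indexing and a length, anything with this shape can in principle be handed to a loader that batches by index.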
COCO
This requires the COCO API to be installed
Detection:
dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])
LSUN
dset.LSUN(db_path, classes='train', [transform, target_transform])
db_path = root directory for the database files
classes =
'train' - all categories, training set
'val' - all categories, validation set
'test' - all categories, test set
['bedroom_train', 'church_train', ...] : a list of categories to load
CIFAR
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
root : root directory of dataset where there is folder cifar-10-batches-py
train : True = Training set, False = Test set
download : True = downloads the dataset from the internet and puts it in the root directory. If the dataset has already been downloaded, nothing is done.
ImageFolder
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
It has the members:
self.classes - The class names as a list
self.class_to_idx - Corresponding class indices
self.imgs - The list of (image path, class-index) tuples
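To make the three members above concrete, here is a minimal stdlib-only sketch of how such a directory layout could be scanned. The function scan_image_folder is hypothetical, and the alphabetical ordering of classes is an assumption of this sketch; it also does not filter by image file extension, which a real loader would likely do.

```python
import os
import tempfile


def scan_image_folder(root):
    """Build (classes, class_to_idx, imgs) from a root/<class>/<file> layout.

    Hypothetical helper; assumes class indices follow sorted folder names.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    imgs = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            imgs.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return classes, class_to_idx, imgs


# Build a throwaway directory tree that mirrors the layout shown above.
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("dog", "xxx.png"), ("cat", "123.png")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "w").close()
    classes, class_to_idx, imgs = scan_image_folder(root)
```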
Imagenet-12
This is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
Models
The models subpackage contains definitions for the following model architectures:
AlexNet: AlexNet variant from the “One weird trick” paper.
VGG: VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
ResNet: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
You can construct a model with random weights by calling its constructor:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
We provide pre-trained models for the ResNet variants and AlexNet, using the PyTorch model zoo. These can be constructed by passing pretrained=True:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
Transforms
Transforms are common image transforms. They can be chained together using transforms.Compose
transforms.Compose
Several transforms can be composed together. For example:
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
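Conceptually, composing transforms means calling each one in order on the output of the previous one. The sketch below shows that idea with plain Python callables; SimpleCompose is a hypothetical name used for illustration and is not the package's own class.

```python
class SimpleCompose:
    """Hypothetical minimal composition of callables, applied left to right."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for transform in self.transforms:
            img = transform(img)
        return img


# Numbers stand in for images here: first add 1, then double.
pipeline = SimpleCompose([lambda x: x + 1, lambda x: x * 2])
result = pipeline(3)  # (3 + 1) * 2
```

Order matters: swapping the two steps would give 3 * 2 + 1 instead.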
Transforms on PIL.Image
Scale(size, interpolation=Image.BILINEAR)
Rescales the input PIL.Image to the given ‘size’. ‘size’ will be the size of the smaller edge.
For example, if height > width, then the image will be rescaled to (size * height / width, size).
- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
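The size rule above can be worked through with a small helper. This is only a sketch of the arithmetic: scaled_size is a hypothetical function, it returns (height, width) in the same order as the formula above, and truncating to an integer is an assumption about rounding.

```python
def scaled_size(height, width, size):
    """Return (new_height, new_width) with the smaller edge equal to `size`.

    Hypothetical helper; the longer edge is scaled by the same factor
    and truncated to an integer (rounding mode is an assumption).
    """
    if height > width:
        return int(size * height / width), size
    return size, int(size * width / height)


portrait = scaled_size(640, 480, 224)
landscape = scaled_size(480, 640, 224)
```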
CenterCrop(size) - center-crops the image to the given size
Crops the given PIL.Image at the center to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size)
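As a sketch of the geometry, a center crop amounts to choosing the top-left corner so that the margins on opposite sides are equal. The helper center_crop_box below is hypothetical and works on sizes only, not on an actual PIL.Image.

```python
def center_crop_box(height, width, size):
    """Return (top, left, target_height, target_width) for a center crop.

    Hypothetical helper; `size` is an int or a (target_height, target_width)
    tuple, as described above.
    """
    if isinstance(size, int):
        size = (size, size)
    target_height, target_width = size
    top = int(round((height - target_height) / 2.0))
    left = int(round((width - target_width) / 2.0))
    return top, left, target_height, target_width


square_box = center_crop_box(256, 320, 224)
rect_box = center_crop_box(256, 320, (100, 200))
```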
RandomCrop(size, padding=0)
Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size). If padding is non-zero, then the image is first zero-padded on each side with padding pixels.
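The interaction between padding and the random location can be sketched as follows: padding enlarges the image by 2 * padding in each dimension, and the crop origin is then drawn uniformly from the positions that keep the crop inside the padded image. The helper random_crop_box and its rng parameter are hypothetical.

```python
import random


def random_crop_box(height, width, size, padding=0, rng=random):
    """Return (top, left, target_height, target_width) for a random crop.

    Hypothetical helper; coordinates refer to the zero-padded image.
    Assumes the target fits inside the padded image.
    """
    if isinstance(size, int):
        size = (size, size)
    target_height, target_width = size
    # Zero-padding adds `padding` pixels on every side before cropping.
    padded_height = height + 2 * padding
    padded_width = width + 2 * padding
    top = rng.randint(0, padded_height - target_height)
    left = rng.randint(0, padded_width - target_width)
    return top, left, target_height, target_width


# A 32x32 image padded by 4 gives offsets in the range 0..8.
box = random_crop_box(32, 32, 32, padding=4, rng=random.Random(0))
```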
RandomHorizontalFlip()
Randomly horizontally flips the given PIL.Image with a probability of 0.5
RandomSizedCrop(size, interpolation=Image.BILINEAR)
Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio.
This is popularly used to train the Inception networks.
- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
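One way to read the description above is sketched below: draw an area fraction from 0.08 to 1.0 and an aspect-ratio factor from 3/4 to 4/3, derive a crop size, and retry if the crop does not fit. The helper sample_sized_crop, the retry count, and the interpretation of the factor as relative to the original aspect ratio are all assumptions of this sketch; the package's implementation may differ in detail.

```python
import math
import random


def sample_sized_crop(height, width, rng=random, attempts=10):
    """Return (crop_height, crop_width), or None if no attempt fits.

    Hypothetical helper following the textual description literally.
    """
    area = height * width
    original_ratio = width / float(height)
    for _ in range(attempts):
        target_area = rng.uniform(0.08, 1.0) * area
        ratio = original_ratio * rng.uniform(3.0 / 4.0, 4.0 / 3.0)
        crop_width = int(round(math.sqrt(target_area * ratio)))
        crop_height = int(round(math.sqrt(target_area / ratio)))
        # Accept only crops that lie inside the original image.
        if 0 < crop_width <= width and 0 < crop_height <= height:
            return crop_height, crop_width
    return None  # the caller would need a fallback, e.g. a center crop


crop = sample_sized_crop(300, 400, rng=random.Random(0))
```

The selected region would then be resized according to size.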
Pad(padding, fill=0)
Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7
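The 5x5 to 7x7 example above can be reproduced on a plain nested list. The helper pad_image is hypothetical and works on a single-channel 2-D list rather than a PIL.Image.

```python
def pad_image(rows, padding, fill=0):
    """Pad a 2-D list of pixel values with `fill` on every side.

    Hypothetical helper standing in for padding a real image.
    """
    padded_width = len(rows[0]) + 2 * padding
    top = [[fill] * padded_width for _ in range(padding)]
    middle = [[fill] * padding + list(row) + [fill] * padding for row in rows]
    bottom = [[fill] * padded_width for _ in range(padding)]
    return top + middle + bottom


image = [[1] * 5 for _ in range(5)]  # 5x5 image of ones
padded = pad_image(image, padding=1)  # 7x7 with a border of zeros
```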
Transforms on torch.*Tensor
Normalize(mean, std)
Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
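The per-channel formula above can be checked by hand with plain lists. In this sketch, normalize is a hypothetical helper that operates on one flat list of values per channel instead of a torch tensor; the mean and std values are the ones used in the Compose example earlier in this document.

```python
def normalize(channels, mean, std):
    """Apply (value - mean) / std to each channel.

    Hypothetical helper; `channels` is a list with one flat list per channel.
    """
    return [
        [(value - m) / s for value in channel]
        for channel, m, s in zip(channels, mean, std)
    ]


mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Two pixels per channel: one at the mean, one a single std above it.
pixels = [[0.485, 0.714], [0.456, 0.680], [0.406, 0.631]]
normalized = normalize(pixels, mean, std)
```

A value equal to the channel mean maps to 0, and a value one standard deviation above it maps to approximately 1.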
Conversion Transforms
ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
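The layout and range change performed by ToTensor can be illustrated on nested lists: the channel axis moves to the front and each value is divided by 255. The helper to_chw_unit_range is hypothetical and returns lists rather than a torch.FloatTensor.

```python
def to_chw_unit_range(image_hwc):
    """Convert an H x W x C nested list in [0, 255] to C x H x W in [0.0, 1.0].

    Hypothetical helper illustrating the axis reordering and rescaling only.
    """
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 image (H x W x C): one red pixel and one green pixel.
image = [[[255, 0, 0], [0, 255, 0]]]
chw = to_chw_unit_range(image)
```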
Generic Transforms
Lambda(lambda)
Given a Python lambda, applies it to the input img and returns the result. For example:
transforms.Lambda(lambda x: x.add(10))
Utils
make_grid(tensor, nrow=8, padding=2)
Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of images
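As a sketch of how a mini-batch maps onto a grid, the helper below computes the number of rows and columns and the cell of each image. It assumes that nrow is the number of images placed in each row and that images fill the grid in row-major order; grid_layout is a hypothetical function, and the exact pixel handling of padding is deliberately left out.

```python
import math


def grid_layout(batch_size, nrow=8):
    """Return (rows, cols, positions) for a batch laid out as a grid.

    Hypothetical helper; assumes `nrow` images per row, row-major order.
    positions[i] is the (row, col) cell of image i.
    """
    cols = min(nrow, batch_size)
    rows = int(math.ceil(batch_size / float(cols)))
    positions = [(i // cols, i % cols) for i in range(batch_size)]
    return rows, cols, positions


rows, cols, positions = grid_layout(10, nrow=8)
```

With 10 images and nrow=8, the first eight fill the first row and the remaining two start the second row.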
save_image(tensor, filename, nrow=8, padding=2)
Saves a given Tensor into an image file.
If given a mini-batch tensor, will save the tensor as a grid of images.