Python library for Digital Pathology Image Processing
Motivation
The histo-pathological analysis of tissue sections is the gold standard to assess the presence of many complex diseases, such as tumors, and understand their nature. In daily practice, pathologists usually perform microscopy examination of tissue slides considering a limited number of regions, and the clinical evaluation relies on several factors such as nuclei morphology, cell distribution, and color (staining): this process is time-consuming, could lead to information loss, and suffers from inter-observer variability.
The advent of digital pathology is changing the way pathologists work and collaborate, and has opened the way to a new era in computational pathology. In particular, histopathology is expected to be at the center of the AI revolution in medicine [1], a prediction supported by the increasing success of deep learning applications to digital pathology.
Whole Slide Images (WSIs), namely the translation of tissue slides from glass to digital format, are a great source of information from both a medical and a computational point of view. WSIs can be coloured with different staining techniques (e.g. H&E or IHC), and are usually very large in size (up to several GB per slide). Because of the typical pyramidal structure of WSIs, images can be retrieved at different magnification factors, providing a further layer of information beyond color.
However, processing WSIs is far from trivial. First of all, WSIs can be stored in different proprietary formats, depending on the scanner used to digitize the slides, and a standard protocol is still missing. WSIs can also present artifacts, such as shadows, mold, or annotations (pen marks), that are not useful. Moreover, given their dimensions, it is not possible to process a WSI all at once, for example to feed it to a neural network: it is necessary to crop smaller regions of tissue (tiles), which in turn requires a tissue detection step.
The aim of this project is to provide a tool for WSI processing in a reproducible environment to support clinical and scientific research. Histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles, and it can thus be integrated in a deep learning pipeline.
Getting Started
Prerequisites
Histolab has only one system-wide dependency: OpenSlide.
You can download and install it from OpenSlide according to your operating system.
Installation
pip install histolab
Documentation
Read the full documentation at https://histolab.readthedocs.io/en/latest/
Quickstart
from histolab.data import breast_tissue, heart_tissue
NB: To use the data module, you need to install pooch.
Each data function outputs the corresponding slide as an OpenSlide object, and the path where the slide has been saved:
breast_svs, breast_path = breast_tissue()
heart_svs, heart_path = heart_tissue()
Slide
from histolab.slide import Slide
Convert the slide into a Slide object. Slide takes as input the path where the slide is stored and the processed_path where the thumbnail and the tiles will be saved.
breast_slide = Slide(breast_path, processed_path='processed')
heart_slide = Slide(heart_path, processed_path='processed')
With a Slide object, you can now easily retrieve information about the slide, such as the slide name, the dimensions at native magnification, and the dimensions at a specified level. You can also save and show the slide thumbnail, or get a scaled version of the slide.
print(f"Slide name: {breast_slide.name}")
print(f"Dimensions at level 0: {breast_slide.dimensions}")
print(f"Dimensions at level 1: {breast_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {breast_slide.level_dimensions(level=2)}")
Slide name: 9c960533-2e58-4e54-97b2-8454dfb4b8c8
Dimensions at level 0: (96972, 30681)
Dimensions at level 1: (24243, 7670)
Dimensions at level 2: (6060, 1917)
print(f"Slide name: {heart_slide.name}")
print(f"Dimensions at level 0: {heart_slide.dimensions}")
print(f"Dimensions at level 1: {heart_slide.level_dimensions(level=1)}")
print(f"Dimensions at level 2: {heart_slide.level_dimensions(level=2)}")
Slide name: JP2K-33003-2
Dimensions at level 0: (32671, 47076)
Dimensions at level 1: (8167, 11769)
Dimensions at level 2: (2041, 2942)
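The dimensions printed above are consistent with each level being downsampled by a factor of 4 relative to the previous one, with fractional sizes rounded down. The following sketch reproduces those numbers without histolab. The helper name `scaled_dimensions` and the factor of 4 are assumptions inferred from this output; other slides may use different downsample factors.

```python
def scaled_dimensions(base_dims, level, downsample_per_level=4):
    """Return (width, height) at `level`, assuming a fixed per-level downsample.

    Hypothetical helper: the factor of 4 is inferred from the output above
    and is a property of these two files, not a guarantee for every WSI.
    """
    factor = downsample_per_level ** level
    # Integer division drops the fractional part, matching the printed sizes.
    return tuple(side // factor for side in base_dims)


breast_level0 = (96972, 30681)
heart_level0 = (32671, 47076)

for level in range(3):
    print(
        level,
        scaled_dimensions(breast_level0, level),
        scaled_dimensions(heart_level0, level),
    )
```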
breast_slide.save_thumbnail()
print(f"Thumbnails saved at: {breast_slide.thumbnail_path}")
heart_slide.save_thumbnail()
print(f"Thumbnails saved at: {heart_slide.thumbnail_path}")
Thumbnails saved at: processed/thumbnails/9c960533-2e58-4e54-97b2-8454dfb4b8c8.png
Thumbnails saved at: processed/thumbnails/JP2K-33003-2.png
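Judging from the paths printed above, thumbnails appear to land in a `thumbnails` subfolder of `processed_path` and to be named after the slide. A minimal sketch of that layout follows; `expected_thumbnail_path` is a hypothetical helper, not part of histolab.

```python
from pathlib import PurePosixPath


def expected_thumbnail_path(processed_path, slide_name):
    # Layout inferred from the printed output:
    # <processed_path>/thumbnails/<slide name>.png
    return PurePosixPath(processed_path) / "thumbnails" / f"{slide_name}.png"


print(expected_thumbnail_path("processed", "JP2K-33003-2"))
```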
breast_slide.show()
heart_slide.show()
Tiles extraction
Now that your Slide object is defined, you can automatically extract the tiles. A RandomTiler object crops random tiles from the slide. You need to specify the tile size, the number of tiles to crop, and the level of magnification. If check_tissue is True, the extracted tiles are taken by default from the biggest tissue region detected in the slide, and the tiles are saved only if they have at least 80% of tissue inside.
from histolab.tiler import RandomTiler
random_tiles_extractor = RandomTiler(
    tile_size=(512, 512),
    n_tiles=6,
    level=2,
    seed=42,
    check_tissue=True,
    prefix="processed/breast_slide/",
)
random_tiles_extractor.extract(breast_slide)
Tile 0 saved: processed/breast_slide/tile_0_level2_70536-7186-78729-15380.png
Tile 1 saved: processed/breast_slide/tile_1_level2_74393-3441-82586-11635.png
Tile 2 saved: processed/breast_slide/tile_2_level2_82218-6225-90411-14420.png
Tile 3 saved: processed/breast_slide/tile_3_level2_84026-8146-92219-16340.png
Tile 4 saved: processed/breast_slide/tile_4_level2_78969-3953-87162-12147.png
Tile 5 saved: processed/breast_slide/tile_5_level2_78649-3569-86842-11763.png
Tile 6 saved: processed/breast_slide/tile_6_level2_81994-6753-90187-14948.png
6 Random Tiles have been saved.
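The tile file names above appear to encode the tile index, the extraction level, and four coordinates. Because values such as 70536 exceed the level-2 width of 6060, the coordinates are presumably expressed in the level-0 reference frame, in the order upper-left x, upper-left y, lower-right x, lower-right y. The sketch below parses a name under that assumption; the pattern and the `TileName` type are illustrative and not part of histolab.

```python
import re
from typing import NamedTuple


class TileName(NamedTuple):
    index: int
    level: int
    x_ul: int
    y_ul: int
    x_br: int
    y_br: int


# Pattern inferred from the printed output (an assumption, not a documented format):
# tile_<index>_level<level>_<x_ul>-<y_ul>-<x_br>-<y_br>.png
TILE_PATTERN = re.compile(r"tile_(\d+)_level(\d+)_(\d+)-(\d+)-(\d+)-(\d+)\.png$")


def parse_tile_name(path):
    match = TILE_PATTERN.search(path)
    if match is None:
        raise ValueError(f"Unrecognized tile name: {path}")
    return TileName(*(int(group) for group in match.groups()))


tile = parse_tile_name(
    "processed/breast_slide/tile_0_level2_70536-7186-78729-15380.png"
)
width0 = tile.x_br - tile.x_ul   # horizontal extent in level-0 pixels
height0 = tile.y_br - tile.y_ul  # vertical extent in level-0 pixels
print(tile.level, width0, height0)
```

For this tile the extent is 8193 x 8194 level-0 pixels, close to 512 x 16 = 8192, which is what a 512-pixel tile would cover at a level downsampled roughly 16 times.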
random_tiles_extractor = RandomTiler(
    tile_size=(512, 512),
    n_tiles=6,
    level=0,
    seed=42,
    check_tissue=True,
    prefix="processed/heart_slide/",
)
random_tiles_extractor.extract(heart_slide)
Tile 0 saved: processed/heart_slide/tile_0_level0_4299-35755-4811-36267.png
Tile 1 saved: processed/heart_slide/tile_1_level0_7051-39146-7563-39658.png
Tile 2 saved: processed/heart_slide/tile_2_level0_10920-26934-11432-27446.png
Tile 3 saved: processed/heart_slide/tile_3_level0_7151-30986-7663-31498.png
Tile 4 saved: processed/heart_slide/tile_4_level0_11472-26400-11984-26912.png
Tile 5 saved: processed/heart_slide/tile_5_level0_13489-42680-14001-43192.png
Tile 6 saved: processed/heart_slide/tile_6_level0_13281-33895-13793-34407.png
6 Random Tiles have been saved.
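The tissue check applied when check_tissue is True can be pictured as a threshold on the fraction of tissue pixels in a tile. The sketch below assumes a binary mask is already available (1 for tissue, 0 for background); how histolab actually builds that mask is not covered here, and both function names are hypothetical.

```python
def tissue_fraction(mask):
    """Fraction of pixels flagged as tissue in a 2-D binary mask (rows of 0/1)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total


def has_enough_tissue(mask, threshold=0.8):
    # 0.8 mirrors the "at least 80% of tissue" rule stated in the text.
    return tissue_fraction(mask) >= threshold


mostly_tissue = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]      # 9 of 10 pixels are tissue
mostly_background = [[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]  # 2 of 10 pixels are tissue
print(has_enough_tissue(mostly_tissue), has_enough_tissue(mostly_background))
```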
Versioning
We use PEP 440 for versioning.
Authors
License
This project is licensed under the Apache License, Version 2.0. See the LICENSE.txt file for details.
Roadmap
Acknowledgements
References
[1] Colling, Richard, et al. "Artificial intelligence in digital pathology: a roadmap to routine use in clinical practice." The Journal of Pathology 249.2 (2019).
Contribution guidelines
If you want to contribute to Histolab, be sure to review the contribution guidelines.
File details
Details for the file histolab-0.0.4b0.tar.gz.
File metadata
- Download URL: histolab-0.0.4b0.tar.gz
- Upload date:
- Size: 28.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 34d5dd4a2c3e63013963de2982a5d35e7d5d8f67f104afbb95ef50ff5dc26ceb
MD5 | af4b7e81f46bc8286f6b6b8714f0e2b3
BLAKE2b-256 | 8a8339aa75d5b02c47cc2497c1ff0d87b837eb30891f51f2a298e0e5ebaa04c8
File details
Details for the file histolab-0.0.4b0-py3-none-any.whl.
File metadata
- Download URL: histolab-0.0.4b0-py3-none-any.whl
- Upload date:
- Size: 1.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | da1d56d22dbe411f9b2f63ad10f94520628d54b6ef792d55f932465ea1dd48c9
MD5 | e6797f285361f6ebb5259bdc5d773e96
BLAKE2b-256 | 03f2f643ce159281a59c0886300a7112c529ffdd4d6a0b4f8eca4bdae06702b8