Low-impact, task-level memory profiling for Dask.
dask-memusage
If you're using Dask with tasks that use a lot of memory, RAM is your bottleneck for parallelism. That means you want to know how much memory each task uses:
- So you can set the highest parallelism level (processes or threads) for each machine, given the available RAM.
- So you know where to focus memory optimization efforts.
dask-memusage is an MIT-licensed statistical memory profiler for Dask's Distributed scheduler that can help you with both of these problems.
dask-memusage polls your processes for memory usage and records the minimum and maximum usage of each task in a CSV:
task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
"('sum-part-e15703211a549e75b11c63e0054b53e5', 1)",0,0
sum-aggregate-apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625
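One way to use this output is to rank tasks by peak memory and then estimate how many single-threaded worker processes fit in RAM. The following is a minimal sketch using only the Python standard library. The helper names `load_usage` and `workers_that_fit` and the 8192 MB figure are hypothetical, and the estimate is deliberately rough: it assumes every worker may hit the peak at the same time and ignores memory used by anything else on the machine.

```python
import csv
import io

# A few rows copied from the CSV above. In practice you would open the
# file path you gave to dask-memusage instead of using an in-memory string.
SAMPLE_CSV = """task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625
"""


def load_usage(csv_file):
    """Return a list of (task_key, min_mb, max_mb) tuples from a file object."""
    reader = csv.DictReader(csv_file)
    return [
        (row["task_key"], float(row["min_memory_mb"]), float(row["max_memory_mb"]))
        for row in reader
    ]


def workers_that_fit(available_mb, peak_task_mb):
    """Rough estimate: how many workers fit if each one reaches the peak at once."""
    if peak_task_mb <= 0:
        raise ValueError("peak_task_mb must be positive")
    return int(available_mb // peak_task_mb)


rows = load_usage(io.StringIO(SAMPLE_CSV))

# Heaviest tasks first: these are the candidates for memory optimization.
by_peak = sorted(rows, key=lambda row: row[2], reverse=True)
for task_key, min_mb, max_mb in by_peak:
    print(f"{max_mb:10.2f} MB  {task_key}")

peak_mb = by_peak[0][2]
# 8192 MB is a hypothetical amount of RAM set aside for workers.
print("workers that fit:", workers_that_fit(8192, peak_mb))
```

With these sample rows the heaviest task peaks at about 97 MB, so the estimate for 8192 MB is 84 worker processes.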
Usage
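Important: Make sure your workers only have a single thread! Memory is measured per worker process, so with more than one thread the usage of concurrently running tasks cannot be told apart and the results will be wrong.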
Installation
On the machine where you are running the Distributed scheduler, run:
$ pip install dask_memusage
Or if you're using Conda:
$ conda install -c conda-forge dask-memusage
API usage
# Add to your Scheduler object, which is e.g. your LocalCluster's scheduler
# attribute:
from dask_memusage import install
install(scheduler, "/tmp/memusage.csv")
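For a fuller picture, the snippet above could be combined with a `LocalCluster` whose workers each run a single thread. This is only a sketch under assumptions: the function names `allocate` and `run_with_memusage` are hypothetical, the workload is a toy, and whether it runs unchanged depends on the versions of Dask, Distributed, and dask-memusage you have installed.

```python
def allocate(n):
    """Toy task: build a list of n integers and return its length."""
    return len(list(range(n)))


def run_with_memusage(csv_path="/tmp/memusage.csv", n_workers=2):
    """Run a toy computation on a LocalCluster with dask-memusage installed."""
    # Imported here so that defining this function does not require Dask.
    from dask.distributed import Client, LocalCluster
    from dask_memusage import install

    # One thread per worker process, as the Usage note requires.
    cluster = LocalCluster(
        n_workers=n_workers, threads_per_worker=1, processes=True
    )
    # Attach the profiler to the cluster's scheduler before submitting work.
    install(cluster.scheduler, csv_path)
    client = Client(cluster)
    try:
        futures = client.map(allocate, [1_000_000, 2_000_000])
        return client.gather(futures)
    finally:
        client.close()
        cluster.close()
```

Call `run_with_memusage()` from under an `if __name__ == "__main__":` guard, since the worker processes may re-import your script on some platforms, and then inspect the CSV at the path you passed in.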
CLI usage
$ dask-scheduler --preload dask_memusage --memusage-csv /tmp/memusage.csv
Limitations
- Again, make sure you only have one thread per worker process.
- This is statistical profiling that samples memory every 10 ms. Tasks that finish in less time than that won't have accurate measurements.
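To see why short tasks are a problem, here is a small simulation of fixed-interval polling. It is not dask-memusage's actual implementation: `sample_min_max` and `ramp` are hypothetical names, time is simulated in whole milliseconds, and reporting `(0, 0)` when no sample was taken is an assumption chosen to mirror the `0,0` rows in the CSV above.

```python
def sample_min_max(memory_at, start_ms, end_ms, interval_ms=10):
    """Poll memory_at(t) at multiples of interval_ms within [start_ms, end_ms).

    Returns (min, max) over the samples, or (0, 0) if no poll fell inside
    the window, which is what happens to a task shorter than the interval.
    """
    # Round start_ms up to the next polling tick (ceiling division).
    first_tick = -(-start_ms // interval_ms) * interval_ms
    samples = [memory_at(t) for t in range(first_tick, end_ms, interval_ms)]
    if not samples:
        return (0, 0)
    return (min(samples), max(samples))


def ramp(t_ms):
    """Simulated process memory in MB that grows steadily over time."""
    return 45.0 + 0.5 * t_ms


# A 100 ms task is polled ten times, so its range is captured reasonably well.
long_task = sample_min_max(ramp, start_ms=0, end_ms=100)

# A 5 ms task that starts and ends between two polls is never observed.
short_task = sample_min_max(ramp, start_ms=103, end_ms=108)

print("long task: ", long_task)
print("short task:", short_task)
```

The long task reports `(45.0, 90.0)`, while the short task reports `(0, 0)` even though the simulated process was using close to 100 MB at the time.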
Help
Need help? File a ticket at https://github.com/itamarst/dask-memusage/issues/new