Skip to main content

Low-impact, task-level memory profiling for Dask.

Project description

dask-memusage

If you're using Dask with tasks that use a lot of memory, RAM is your bottleneck for parallelism. That means you want to know how much memory each task uses:

  1. So you can set the highest parallelism level (process or threads) for each machine, given available to RAM.
  2. In order to know where to focus memory optimization efforts.

dask-memusage is an MIT-licensed statistical memory profiler for Dask's Distributed scheduler that can help you with both these problems.

dask-memusage polls your processes for memory usage and records the minimum and maximum usage in a CSV:

task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
"('sum-part-e15703211a549e75b11c63e0054b53e5', 1)",0,0
sum-aggregate-apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625
task_key,min_memory_mb,max_memory_mb
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 0)",44.84765625,96.98046875
"('from_sequence-map-sum-part-e15703211a549e75b11c63e0054b53e5', 1)",47.015625,97.015625
"('sum-part-e15703211a549e75b11c63e0054b53e5', 0)",0,0
"('sum-part-e15703211a549e75b11c63e0054b53e5', 1)",0,0
sum-aggregate-apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,0,0
apply-no_allocate-4c30eb545d4c778f0320d973d9fc8ea6,47.265625,47.265625

Usage

Important: Make sure your workers only have a single thread! Otherwise the results will be wrong.

Installation

On the machine where you are running the Distributed scheduler, run:

$ pip install dask_memusage

Or if you're using Conda:

$ conda install -c conda-forge dask-memusage

API usage

# Add to your Scheduler object, which is e.g. your LocalCluster's scheduler
# attribute:
from dask_memoryusage import install
install(scheduler, "/tmp/memusage.csv")

CLI usage

$ dask-scheduler --preload dask_memusage --memusage.csv /tmp/memusage.csv

Limitations

  • Again, make sure you only have one thread per worker process.
  • This is statistical profiling, running every 10ms. Tasks that take less than that won't have accurate information.

Help

Need help? File a ticket at https://github.com/itamarst/dask-memusage/issues/new

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dask_memusage-1.1.tar.gz (7.0 kB view details)

Uploaded Source

Built Distribution

dask_memusage-1.1-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file dask_memusage-1.1.tar.gz.

File metadata

  • Download URL: dask_memusage-1.1.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.22.0

File hashes

Hashes for dask_memusage-1.1.tar.gz
Algorithm Hash digest
SHA256 29d9f25074fecd7ca249e972cb3ec0b909a1dcefaf037c8d5fca24fadbf66757
MD5 94f3882eed9009eee13702c1c6ed2565
BLAKE2b-256 b6c473b1021d1a9ea5ed29c079faf23cb62d8c29e8ef5794384f237c8927b918

See more details on using hashes here.

Provenance

File details

Details for the file dask_memusage-1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for dask_memusage-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3024bcd9189ac611d2576ab8b3941dd41ea466f1933dd131cf4650f81a4677c4
MD5 12630a210959fa028c7c04e651b1ee67
BLAKE2b-256 e051499c565202a5b892bd9ac5ba98c458d0cf6d1ec9b0b784db20a4e0f5b5cd

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page