S.M.A.R.T. Prometheus Metrics Exporter

smart-prom-next is a Prometheus metrics exporter for the S.M.A.R.T. values of hard disks. It uses Python and the Linux tool smartctl to read the disk values and exposes them through the Prometheus Python Client, by default on network port 9902.
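
As a rough illustration of the read-out step, the sketch below calls smartctl with its JSON output option and derives a 0/1 health value from the result. It is not taken from the smart-prom-next source: the function names read_smart_json and health_failed are hypothetical, and whether the project uses smartctl's JSON output at all is an assumption.

```python
import json
import subprocess
from typing import Optional


def read_smart_json(device: str) -> dict:
    """Run smartctl on one device and return its parsed JSON report.

    Hypothetical helper, not part of smart-prom-next. smartctl's exit
    status is a bit mask and may be non-zero even when the report is
    usable, so no exception is raised on a non-zero status here.
    """
    result = subprocess.run(
        ["smartctl", "--json", "--all", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(result.stdout)


def health_failed(info: dict) -> Optional[int]:
    """Map the smartctl health result to 0 (healthy) or 1 (failed).

    Returns None when the report carries no health result.
    """
    passed = info.get("smart_status", {}).get("passed")
    if passed is None:
        return None
    return 0 if passed else 1
```

Calling health_failed(read_smart_json("/dev/sda")) requires smartctl to be installed and sufficient privileges to access the device.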

According to Wikipedia, the primary function of S.M.A.R.T. is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures.

Currently, smart-prom-next is available as a Docker image and as a PyPI package.

The Docker image is built from the slim version of the official Python Docker image, which uses Debian Bullseye. It is built for multiple platforms:
linux/386, linux/amd64, linux/arm/v5, linux/arm/v7, linux/arm64/v8

The second Docker option is also built from the official Python Docker image, but uses Alpine. It is built for multiple platforms:
linux/386, linux/amd64, linux/arm/v6, linux/arm/v7, linux/arm64/v8

Configuration Options / Environment Variables

smart-prom-next can be configured by the following environment variables:

  • PROMETHEUS_METRIC_PORT - port number over which the Prometheus metrics are exposed (default: 9902)
  • SMART_INFO_READ_INTERVAL_SECONDS - time interval in seconds at which the SMART values of the hard disk are read (default: 60)
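
The following minimal sketch shows one way the two variables and their documented defaults could be read in Python. The Settings type and the load_settings function are hypothetical names chosen for illustration and do not describe the project's actual implementation.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    """Hypothetical container for the two documented options."""

    metric_port: int
    read_interval_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the options from the environment, falling back to the defaults."""
    return Settings(
        metric_port=int(env.get("PROMETHEUS_METRIC_PORT", "9902")),
        read_interval_seconds=int(
            env.get("SMART_INFO_READ_INTERVAL_SECONDS", "60")
        ),
    )
```

Passing a plain dictionary instead of os.environ makes the function easy to try out, for example load_settings({"PROMETHEUS_METRIC_PORT": "9009"}).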

Docker / docker-compose

The images based on Debian Bullseye slim are available as ghcr.io/philipmay/smart-prom-next:<version>-slim-bullseye or ghcr.io/philipmay/smart-prom-next:latest

The images based on Alpine are available as ghcr.io/philipmay/smart-prom-next:<version>-alpine

The latest versions are visible in smart-prom-next GitHub packages.

Below is a complete minimal docker-compose.yml that shows how smart-prom-next can be used with docker-compose:

version: "3.0"
services:
  smart-prom-next:
    # see https://github.com/PhilipMay/smart-prom-next/pkgs/container/smart-prom-next
    image: ghcr.io/philipmay/smart-prom-next:latest
    container_name: "smart-prom-next"
    restart: unless-stopped
    privileged: true
    ports:
      - 9902:9902

The privileged: true setting is required so that smartctl can access the hard disks from within the container.

Security note: In a production environment, you should in most configurations leave out the ports: part of the docker-compose.yml so that the metrics port is not reachable from outside. Instead, assign the container to a network that also contains the Prometheus container. This looks like this:

    networks:
      - monitor

To adjust the environment variables, the following settings can be added, for example:

    environment:
      - PROMETHEUS_METRIC_PORT=9009
      - SMART_INFO_READ_INTERVAL_SECONDS=120

Available Metrics

smart_prom_smart_status_failed

The SMART health status of the device. A value of 0 indicates a healthy state. A value of 1 means that the device has not passed the health check and there is a problem.

List of labels used (description see below): "device", "type", "model", "serial"

smart_prom_smartctl_exit_status

The exit status (aka exit code or return code) of the smartctl tool. Any value other than zero indicates an issue. A more detailed description can be found in the EXIT STATUS chapter of the smartctl man pages.

List of labels used (description see below): "device", "type", "model", "serial"
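
The smartctl exit status is a bit mask, so a single value can report several conditions at once. The sketch below decodes such a value; the bit descriptions are paraphrased from the EXIT STATUS chapter of the smartctl man page, and the names EXIT_STATUS_BITS and decode_exit_status are hypothetical and not part of smart-prom-next.

```python
from typing import List

# Bit meanings paraphrased from the EXIT STATUS chapter of the smartctl man page.
EXIT_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return identify information",
    2: "a SMART or other command to the disk failed, or checksum error in SMART data",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "SMART status OK, but some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status: int) -> List[str]:
    """Return the descriptions of all bits that are set in the exit status."""
    return [text for bit, text in EXIT_STATUS_BITS.items() if status & (1 << bit)]
```

A value of 0 yields an empty list, while a value such as 68 (bits 2 and 6) yields two entries.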

smart_prom_smart_info

The SMART Attributes. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.

List of labels used (description see below): "device", "type", "model", "serial", "attr_name", "attr_type", "attr_id"

smart_prom_nvme_smart_info

NVMe specific SMART attributes obtained from the SMART/Health Information log. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.

List of labels used (description see below): "device", "type", "model", "serial", "attr_name"

smart_prom_scsi_smart_info

SCSI specific SMART attributes obtained from the SMART/Health Information log. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.

List of labels used (description see below): "device", "type", "model", "serial", "attr_name", "attr_type"

smart_prom_temperature

The temperature values of the device. In addition to the current temperature, these can include values such as power_cycle_max, lifetime_max and op_limit_max (see the temperature_type label below).

List of labels used (description see below): "device", "type", "model", "serial", "temperature_type"

smart_prom_scrape_iterations_total

Counts how often the SMART values have been scraped.

Metric Labels

In this project, we use different labels on the metrics. These are described here:

  • device - device file, e.g.: /dev/nvme0, /dev/sda
  • type - type of the device, e.g.: ata, nvme, usbjmicron
  • model - model name, e.g.: KXG6AZNV512G TOSHIBA, WDC WD3200BEVT-60ZCT0
  • serial - serial number, e.g.: WD-WXE708D44703, Y9SF71LHFWZL
  • temperature_type - type of the temperature value, e.g.: current, power_cycle_max, lifetime_max, op_limit_max
  • attr_name - SMART attribute name, e.g.: raw_read_error_rate, reallocated_sector_ct, critical_warning
  • attr_id - SMART attribute id, e.g.: 1, 3, 4
  • attr_type - type of the respective SMART attribute; the value is one of: value, worst, thresh, raw, failed_now, failed_past. A detailed description can be found in the -A, --attributes chapter of the smartctl man pages.
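
To show how these labels appear on an exposed sample, here is a small sketch that splits one line of the Prometheus text format into metric name, labels and value. The parse_sample function is a hypothetical, simplified helper (it does not handle escaped quotes inside label values), and the sample line is illustrative: it is assembled from the example label values above with a made-up value of 35.0.

```python
import re
from typing import Dict, Tuple

# Simplified patterns: label values must not contain escaped double quotes.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split one text-format sample line into (name, labels, value)."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample line: {line!r}")
    labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Illustrative line, not real exporter output.
EXAMPLE_LINE = (
    'smart_prom_temperature{device="/dev/sda",type="ata",'
    'model="WDC WD3200BEVT-60ZCT0",serial="WD-WXE708D44703",'
    'temperature_type="current"} 35.0'
)

name, labels, value = parse_sample(EXAMPLE_LINE)
```

After this runs, labels["temperature_type"] is "current" and labels["device"] is "/dev/sda".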

Prometheus Alerts

Based on the metrics, Prometheus alerts can be defined. Below are a few suggestions for prometheus_rules.yml:

groups:
  - name: alert_rules
    rules:
  
      - alert: DiskFailing
        expr: smart_prom_smart_info{attr_type="failed_now"} == 1
        labels:
          severity: critical
        annotations:
          summary: "disk failing"

      - alert: DiskTemperatureHigh
        expr: smart_prom_temperature{temperature_type="current"} > 50
        labels:
          severity: warning
        annotations:
          summary: "disk temperature > 50"

      - alert: SMARTStatusFailing
        expr: smart_prom_smart_status_failed == 1
        labels:
          severity: critical
        annotations:
          summary: "SMART status failing"

Release News

Here you can find the latest versions of the software:

Important news and features in the releases:

  • add additional Alpine-based image #40 - version 0.0.4 on 2022-07-28
  • add -slim-bullseye suffix to image #44 - version 0.0.4 on 2022-07-28
  • improve logs with "error" and "warning" prefix #43 - version 0.0.4 on 2022-07-28
  • add SCSI disk handling - thanks to Jopaul-John - version 0.0.3 on 2022-07-20
  • breaking change on smart_prom_nvme_smart_info - version 0.0.2 on 2022-06-23
  • additional smart_prom_scrape_iterations_total metric - version 0.0.2 on 2022-06-23
  • first pre-release - pre-release version 0.0.1rc9 on 2022-06-20

Special Thanks

A special thanks goes to the following contributors:

Licensing

Copyright (c) 2022 Philip May

Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
