[Standby-Support][FEAT] Add support for smartctl --nocheck= to the collector #221

Closed
danie1k opened this issue Mar 31, 2022 · 15 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

danie1k commented Mar 31, 2022

The smartctl tool has a bunch of useful arguments, and Scrutiny's collector could make use of them, similarly to Telegraf.
I'm suggesting only the nocheck argument because, for me personally, it's the most important one: with its "standby" option, the drive is not woken up constantly.

Describe the solution you'd like
Extend the collector's config file with more options, covering smartctl's arguments.

Excerpt from smartctl manual

-n POWERMODE, --nocheck=POWERMODE

[ATA only] Specifies if smartctl should exit before performing any checks when the device is in a low-power mode. It may be used to prevent a disk from being spun-up by smartctl. The power mode is ignored by default. A nonzero exit status is returned if the device is in one of the specified low-power modes (see RETURN VALUES below).

Note: If this option is used it may also be necessary to specify the device type with the '-d' option. Otherwise the device may spin up due to commands issued during device type autodetection.

The valid arguments to this option are:

never - check the device always, but print the power mode if '-i' is specified.

sleep - check the device unless it is in SLEEP mode.

standby - check the device unless it is in SLEEP or STANDBY mode. In these modes most disks are not spinning, so if you want to prevent a disk from spinning up, this is probably what you want.

idle - check the device unless it is in SLEEP, STANDBY or IDLE mode. In the IDLE state, most disks are still spinning, so this is probably not what you want.

-- https://linux.die.net/man/8/smartctl
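To make the excerpt concrete, here is a minimal shell sketch of how a script could use -n standby to skip a sleeping drive. It relies only on what the excerpt states, namely that smartctl returns a nonzero exit status when the device is in one of the specified low-power modes, and it treats any nonzero status as "do not collect" rather than depending on a specific value. The function name drive_is_awake and the SMARTCTL override variable are hypothetical, introduced purely for illustration.

```shell
#!/bin/sh
# Sketch only: SMARTCTL and drive_is_awake are illustrative names, not part of Scrutiny.
SMARTCTL="${SMARTCTL:-smartctl}"

# Returns 0 if smartctl could query the drive while it was NOT in SLEEP or STANDBY.
# Any nonzero exit status (low-power mode, device open failure, ...) is treated as "skip".
drive_is_awake() {
  "$SMARTCTL" -i -n standby "$1" >/dev/null 2>&1
}

# Report the state of every device passed on the command line.
for dev in "$@"; do
  if drive_is_awake "$dev"; then
    echo "$dev: awake, safe to collect"
  else
    echo "$dev: asleep or unreadable, skipping"
  fi
done
```

As the note in the excerpt says, it may also be necessary to pass the device type with -d so that autodetection does not spin the drive up; that is left out here because the right type depends on the hardware.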

Author

danie1k commented Mar 31, 2022

After getting more familiar with the repo... will that be supported by the command option in the following section of the config file?

#collect:
#  metric:
#    enable: true
#    command: '-a -o on -S on'

Owner

AnalogJ commented Apr 28, 2022

Yeah, I eventually want to make the collector command flags completely customizable.
I'll add this as an enhancement, but it'll have to wait until this database upgrade is complete.

AnalogJ added the enhancement, help wanted, and good first issue labels Apr 28, 2022
AnalogJ changed the title from "[FEAT] Add support for smartctl --nocheck= to the collector" to "[Standby-Support][FEAT] Add support for smartctl --nocheck= to the collector" Jun 1, 2022
Owner

AnalogJ commented Jun 25, 2022

Apologies, I should have updated and closed this issue earlier. This functionality has been available since v0.4.9.

You can update the collector config files and specify the command flags you'd like to use when Scrutiny triggers smartctl.

# example to show how to override the smartctl command args globally
commands:
  metrics_scan_args: '--scan --json' # used to detect devices
  metrics_info_args: '--info --json' # used to determine device unique ID & register device with Scrutiny
  metrics_smart_args: '--xall --json' # used to retrieve smart data for each device.

Closing this issue for now, feel free to comment/reopen if you have any concerns.

AnalogJ closed this as completed Jun 25, 2022
Contributor

boomam commented Nov 19, 2023

> Closing this issue for now, feel free to comment/reopen if you have any concerns.

Are there any plans to add these options as env variables in a future release of the collector?

Owner

AnalogJ commented Nov 19, 2023

You should already be able to do that, @boomam: COLLECTOR_COMMANDS_METRICS_SCAN_ARGS, etc.
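For readers wondering what "etc" covers: the one variable named here looks like the config key commands.metrics_scan_args with a COLLECTOR_ prefix, upper-cased, and with dots turned into underscores. The sketch below applies that rule to the three keys from the config example earlier in the thread. The rule itself is an assumption inferred from that single name, config_key_to_env is a hypothetical helper, and whether a given collector release actually reads these variables is a separate question.

```shell
#!/bin/sh
# Sketch only: config_key_to_env is a hypothetical helper. The mapping rule
# (COLLECTOR_ prefix, upper-case, dots to underscores) is an assumption
# inferred from the single variable name given in the thread.
config_key_to_env() {
  printf 'COLLECTOR_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_')"
}

# The three keys from the commands: section of the collector config.
config_key_to_env commands.metrics_scan_args
config_key_to_env commands.metrics_info_args
config_key_to_env commands.metrics_smart_args
```

The first call prints COLLECTOR_COMMANDS_METRICS_SCAN_ARGS, matching the name above.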

Contributor

boomam commented Nov 20, 2023

> You should already be able to do that, @boomam: COLLECTOR_COMMANDS_METRICS_SCAN_ARGS, etc.

Cool, I didn't see it documented anywhere (unless I egregiously missed it), so I thought I'd ask.

For my initial test, I added COLLECTOR_COMMANDS_METRICS_SCAN_ARGS as an env variable on the collector container and set it to -n standby or --nocheck=standby; neither worked.
I'll try to work out the syntax for stopping drives from spinning up (or similar) and post a PR for the collector's doc page. :-)


kocane commented Dec 29, 2023

@boomam did you ever figure this out? I'm trying to do the same thing.

Contributor

boomam commented Dec 31, 2023

No, I gave up and decided to use a different method to get the same data.


kocane commented Dec 31, 2023

> No, I gave up and decided to use a different method to get the same data.

Would you share this method with me :D?

@WhoTheHeck

I also couldn't get it to work that way, so I now use a workaround. I have multiple hosts sending data to Scrutiny, so each of them runs its own collector and the Scrutiny container itself has no drives passed through:

version: '3.5'

services:
  scrutiny:
    container_name: scrutiny
    image: ghcr.io/analogj/scrutiny:master-omnibus
    cap_add:
      - SYS_RAWIO
    ports:
      - "8080:8080"
      - "8086:8086"
    volumes:
      - /run/udev:/run/udev:ro
      - /pathto/config:/opt/scrutiny/config
      - /pathto/influxdb:/opt/scrutiny/influxdb
    devices:
      - /dev/null:/dev/null
    restart: unless-stopped

On the host with sleeping drives I use two config files: one that checks all drives, and one that ignores the drives that go to sleep:

# drives to ignore:
devices:
  - device: /dev/sdc
    ignore: true
  - device: /dev/sdd
    ignore: true
  - device: /dev/sde
    ignore: true
# scrutiny endpoint:
api:
  endpoint: 'http://internalip:port'

I use a script that checks the drives' power state first. This doesn't wake them up. Based on the result, I run the collector with one config or the other:

#!/bin/bash

# Succeeds (returns 0) only if every listed drive is spun up.
# With -n standby, smartctl exits early instead of waking a drive that is in
# SLEEP or STANDBY, so "ACTIVE or IDLE" only appears for drives that are awake.
all_drives_awake() {
  local drive
  for drive in sdc sdd sde; do
    if ! smartctl -i -n standby "/dev/$drive" | grep -q "ACTIVE or IDLE"; then
      return 1  # this drive is asleep (or could not be queried)
    fi
  done
  return 0
}

# Collect from all drives only when all of them are awake;
# otherwise use the config that ignores the drives that go to sleep.
if all_drives_awake; then
  /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --config /opt/scrutiny/bin/alldrives.yaml
else
  /opt/scrutiny/bin/scrutiny-collector-metrics-linux-amd64 run --config /opt/scrutiny/bin/nowakeup.yaml
fi
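A possible variation on this workaround, sketched under assumptions: instead of maintaining two static config files, the ignore list could be generated at run time from whichever drives are asleep, so that awake drives are still collected when only some are sleeping. The YAML layout below is copied from the config shown earlier in this comment; write_ignore_config is a hypothetical helper and the endpoint URL is a placeholder.

```shell
#!/bin/sh
# Sketch only: write_ignore_config is a hypothetical helper, not part of Scrutiny.
# Usage: write_ignore_config ENDPOINT [DEVICE...]
# Prints a collector config that ignores every DEVICE given after the endpoint.
write_ignore_config() {
  endpoint="$1"
  shift
  # Emit the devices: section only when there is something to ignore.
  if [ "$#" -gt 0 ]; then
    echo "devices:"
    for dev in "$@"; do
      printf '  - device: %s\n    ignore: true\n' "$dev"
    done
  fi
  printf "api:\n  endpoint: '%s'\n" "$endpoint"
}

# Example: sdc and sde are currently asleep, sdd is awake (placeholder endpoint).
write_ignore_config 'http://localhost:8080' /dev/sdc /dev/sde
```

The output could be redirected to a temporary file and passed to the collector with --config, in the same way as the two static files above.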

...but it would be much easier if scrutiny could be configured to just not force-wake disks.

@datenzar
Contributor

Hi there,

I'm facing the same challenge and want to prevent my disks from staying active. I already set the following environment variables, but without success:

COLLECTOR_COMMANDS_METRICS_SCAN_ARGS: "--scan --json -n idle"
COLLECTOR_COMMANDS_METRICS_INFO_ARGS: "--info --json -n idle"
COLLECTOR_COMMANDS_METRICS_SMART_ARGS: "--xall --json -n idle"

My guess is that the code is missing the corresponding binding, so I created a PR. It's in draft mode as I haven't had the chance to test it.

@datenzar
Contributor

An update from my testing: the results are bad. Unfortunately, neither my solution above nor the configuration file keeps the disks in standby mode. I guess another command is executed somewhere in the code that isn't covered by these settings and causes the disks to spin up.

I used the workaround from #221 (comment), which works (thanks @WhoTheHeck!), and will stick with it. Not very elegant, but it works.


c0ldtech commented Apr 3, 2024

@AnalogJ is there any documentation on how to change the parameters so that we could, for example, include smartctl's --nocheck=standby option?

I would really like to avoid spinning up my drives unnecessarily, and it would be nice to have an "official" way of achieving this without adding custom scripts or changing existing code in the container.

Owner

AnalogJ commented Sep 8, 2024

I think c.SetEnvPrefix("COLLECTOR") is missing.

https://github.com/AnalogJ/scrutiny/pull/619/files

Can someone make a PR (and test that change)?

Contributor

datenzar commented Oct 14, 2024

> I think c.SetEnvPrefix("COLLECTOR") is missing.
>
> https://github.com/AnalogJ/scrutiny/pull/619/files
>
> Can someone make a PR (and test that change)?

Hi @AnalogJ,

Sorry for the late reply. I was able to complete the fix. Please see my PR. Thx
