Linux Hard Disk Spin-Down

with a brief introduction to I/O

Mukesh Agrawal

agrawal at ais dot org


The purpose of this document is to explain how to use hard disk spin-down features under Linux. Presumably, you have decided that the hard disk on your system either consumes too much power, or is too noisy, so you would like to keep it spun down when not in use.

Disclaimer: The information within is based on a few simple experiments, and reading just a little bit of the kernel source code (mostly fs/buffer.c). I focus on the common case (ignoring things such as raw I/O), so some of the things I state may not apply to your configuration. Moreover, the information has not been checked by anyone other than myself, and may contain errors. Please email corrections to

Introduction to I/O

In order to make effective use of the spin-down ability of your hard disk, it helps to understand a little bit about how I/O works in Linux. The goal of this section is to provide just that "little bit" of information.

I/O consists of reads and writes. Reads are fairly simple. When the system needs a block of data, it first checks the buffer cache. The buffer cache holds data that has previously been read from disk and that the OS believes may be needed again in the immediate future. If the data is not found in the buffer cache, the kernel reads it from disk.

Writes are slightly more complicated. When a process writes to disk, the kernel does not necessarily write the data to disk immediately. Instead, it updates its in-memory copy of the block and defers the disk update until later (files that are open for synchronous writing are an exception). Two kernel threads handle these deferred updates: bdflush and kupdate. [1]
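The read and write behaviour described above can be illustrated with a toy model. This is only a sketch in Python, not how the kernel is written: the class ToyBufferCache, its methods, and the dictionary standing in for the disk are all invented for illustration.

```python
class ToyBufferCache:
    """Toy model of a write-back block cache (illustrative only)."""

    def __init__(self, disk):
        self.disk = disk        # dict: block number -> bytes, stands in for the disk
        self.cache = {}         # in-memory copies of blocks
        self.dirty = set()      # blocks modified in memory but not yet on disk
        self.disk_reads = 0     # how often the "disk" was read
        self.disk_writes = 0    # how often the "disk" was written

    def read(self, block):
        # Only touch the disk when the block is not already cached.
        if block not in self.cache:
            self.disk_reads += 1
            self.cache[block] = self.disk[block]
        return self.cache[block]

    def write(self, block, data, synchronous=False):
        # Update the in-memory copy and defer the disk update,
        # unless the caller asked for a synchronous write.
        self.cache[block] = data
        self.dirty.add(block)
        if synchronous:
            self._write_back(block)

    def flush(self):
        # Stand-in for the deferred update done later by a flushing thread.
        for block in sorted(self.dirty):
            self._write_back(block)

    def _write_back(self, block):
        self.disk[block] = self.cache[block]
        self.disk_writes += 1
        self.dirty.discard(block)


disk = {0: b"old"}
cache = ToyBufferCache(disk)
cache.read(0)
cache.read(0)               # second read is served from the cache
cache.write(0, b"new")      # the disk still holds b"old" at this point
cache.flush()               # now the disk holds b"new"
```

In this model the disk is touched once for the two reads and once for the write, and only when flush() runs. Those are the moments at which a spun-down disk would have to spin up.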

The bdflush thread writes updates to disk when either of two conditions occurs. The first is when the virtual memory system is having trouble finding enough free memory to satisfy allocation requests. In this case, the VM system asks bdflush to free pages by flushing the written data to disk (thereby freeing the buffer space used by that data). The second is when the fraction of dirty buffers (those containing modified data that has not yet been written to disk) in the buffer cache exceeds a threshold. This threshold is described in Documentation/filesystems/proc.txt.

The other kernel thread that deals with flushing modified data to disk is kupdate. In normal usage, kupdate periodically writes modified buffers to disk. The period is set using the fifth field in /proc/sys/vm/bdflush. The value is specified in jiffies (timer ticks), of which there are HZ per second. So on most Linux systems, which use an HZ value of 100, a value of 500 in the fifth field of /proc/sys/vm/bdflush means that kupdate will flush dirty buffers to disk every 5 seconds.
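The arithmetic can be checked with a few lines of Python. This is a sketch under assumptions: the function name kupdate_interval_seconds is made up, the sample line is illustrative rather than taken from a real system, and 100 ticks per second is assumed, as in the text above.

```python
HZ = 100  # assumed number of timer ticks per second


def kupdate_interval_seconds(bdflush_line, hz=HZ):
    """Convert the fifth field of a /proc/sys/vm/bdflush line to seconds."""
    fields = bdflush_line.split()
    return int(fields[4]) / hz   # fields are counted from 1 in the text, from 0 here


# Illustrative contents of /proc/sys/vm/bdflush; the values on a real system may differ.
sample = "30 500 0 0 500 3000 60 20 0"
print(kupdate_interval_seconds(sample))   # prints 5.0
```

On a kernel that provides the file, the sample string could be replaced with the result of open("/proc/sys/vm/bdflush").read().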

The kupdate thread supports another mode of operation as well: it listens for SIGSTOP and SIGCONT signals. On receiving SIGSTOP, kupdate stops its periodic operation and waits for SIGCONT. When it receives SIGCONT, it flushes dirty buffers. A similar effect can be achieved by setting the period for kupdate to 0 instead of sending the thread SIGSTOP. This mechanism lets a user-space process implement its own policy for when buffers are flushed.
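A user-space policy process built on this mechanism might look roughly like the following Python sketch. It assumes that you have already found the process ID of the kupdate thread yourself (for example from the output of ps), that you are running as root, and that your kernel behaves as described above. The function names and the send parameter are invented for illustration.

```python
import os
import signal


def pause_periodic_flush(kupdate_pid, send=os.kill):
    """Ask kupdate to stop its periodic flushing and wait for SIGCONT."""
    send(kupdate_pid, signal.SIGSTOP)


def request_flush(kupdate_pid, send=os.kill):
    """Ask kupdate to flush dirty buffers now."""
    send(kupdate_pid, signal.SIGCONT)
```

A policy loop would call pause_periodic_flush() once, then call request_flush() whenever it decides the disk may be written, for example when the disk is already spinning for some other reason. The send parameter exists only so the functions can be tried out without signalling a real process.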

In addition to these methods for flushing buffers, processes can specifically request that buffers be flushed to disk by using the sync(), fsync(), and fdatasync() system calls.
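Python exposes these system calls as os.sync(), os.fsync(), and os.fdatasync(), so their use can be sketched as follows. The helper names write_and_flush and flush_everything are invented for illustration, and os.fdatasync() is assumed to be available, as it is on Linux.

```python
import os


def write_and_flush(path, data, data_only=False):
    """Write data to path and ask the kernel to put it on disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                      # move Python's own buffer into the kernel
        if data_only:
            os.fdatasync(f.fileno())   # file data, and only the metadata needed to read it back
        else:
            os.fsync(f.fileno())       # file data and metadata


def flush_everything():
    """Ask the kernel to write out all modified buffers, for every file."""
    os.sync()
```

A program that calls these functions frequently forces disk writes on its own schedule, regardless of how bdflush and kupdate are configured.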

Notes

[1]

kupdate is called kupdated in more recent versions of the Linux kernel. I will just use the name kupdate for simplicity.