# Uninterruptible Sleep

Nov 16, 2015

One of the curious features of Unix systems (including Linux) is the “uninterruptible sleep” state. This is a state that a process can enter while making certain system calls. In this state, the process is blocked performing a system call, and it cannot be interrupted (or killed) until the system call completes. Most of these uninterruptible system calls are effectively instantaneous, meaning that you never observe the uninterruptible nature of the system call. In rare cases (often because of buggy kernel drivers, but possibly for other reasons) the process can get stuck in this uninterruptible state. This is very similar to the zombie process state in the sense that you cannot kill a process in this state, although it’s worth noting that the two cases happen for different reasons. Typically when a process is wedged in the uninterruptible sleep state your only recourse is to reboot the system, because there is literally no way to kill the process.
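If you want to check whether anything on a Linux box is currently in this state, one rough way is to scan `/proc` for processes whose state letter is `D`, which is how `ps` and `/proc/<pid>/stat` report uninterruptible sleep. The sketch below is only an illustration: the helper names `parse_state` and `uninterruptible_pids` are my own, and it assumes a Linux-style `/proc` (it returns an empty list if there isn't one).

```python
import os


def parse_state(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' rather than on spaces.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def uninterruptible_pids(proc_root="/proc"):
    """List PIDs currently reported in state 'D' (uninterruptible sleep)."""
    if not os.path.isdir(proc_root):
        return []  # no procfs here (e.g. not Linux)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            pids.append(int(entry))
    return pids


if __name__ == "__main__":
    print(uninterruptible_pids())
```

On a healthy system this will usually print an empty list, or a PID or two that disappear on the next run, since most uninterruptible waits are very short. A PID that stays in the list across many runs is the kind of wedged process described above.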

One infamous example of this has been Linux with NFS. For historical reasons certain local I/O operations are not interruptible. For instance, the mkdir(2) system call is not interruptible, which you can verify from its man page by observing that this system call cannot return EINTR. On a normal system the worst case for mkdir would be a few disk seeks, which isn’t exactly fast but isn’t the end of the world either. On a networked filesystem like NFS this operation can involve network RPCs that can block, potentially forever. This means that if you get the right kind of horkage under NFS, a program that calls mkdir(2) can get stuck in the dreaded uninterruptible sleep state forever. When this happens there’s no way to kill the process, and the operator has to either live with this zombie-like process or reboot the system. The Linux kernel programmers could “fix” this by making the mkdir(2) system call interruptible so that mkdir(2) could return EINTR. However, historical Unix systems since the dawn of time haven’t returned EINTR for this system call, so Linux adopts the same convention.
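For contrast, here is a small sketch of what an interruptible sleep looks like from user space. It blocks in read(2) on a pipe that nobody ever writes to, and uses a timer signal to break out of the call. The names `Interrupted` and `read_is_interruptible` are made up for this illustration, and it assumes a Unix-like system with the code running in the main thread (Python only allows installing signal handlers there). Python normally retries a system call that fails with EINTR, so the handler raises an exception to stop the retry.

```python
import os
import signal


class Interrupted(Exception):
    """Raised by the signal handler to abandon the blocked read."""


def _on_alarm(signum, frame):
    raise Interrupted()


def read_is_interruptible(timeout=0.2):
    """Block in read(2) on an empty pipe and report whether a signal woke us."""
    r, w = os.pipe()
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)  # deliver SIGALRM later
    try:
        os.read(r, 1)  # nothing is ever written, so this sleeps interruptibly
        return False
    except Interrupted:
        return True  # the signal interrupted the system call
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending timer
        signal.signal(signal.SIGALRM, old_handler)
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(read_is_interruptible())
```

A process stuck in mkdir(2) on a hung NFS mount would not get this far: the signal would stay pending and the handler would not run until the system call returned.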

This was actually a big problem for us at my first job out of college at Yelp. At the time we had just taken the radical step of moving images out of MySQL tables, which stored the raw image data in a BLOB column, and into NFS served from cheap, unreliable NFS appliances. Under certain situations the NFS servers would lock up and processes accessing NFS would start entering uninterruptible sleep as they did various I/O operations. When this happened, very quickly (e.g. in a minute or two) every single Apache worker would end up servicing a request that did one of these I/O operations, and thus 100% of the Apache workers would become stuck in the uninterruptible sleep state. This would quite literally bring down the entire site until we rebooted everything. We eventually “solved” this problem by dropping the NFS dependency and moving things to S3.

Another fun fact about the uninterruptible sleep state is that occasionally it may not be possible to strace a process in this state. The man page for the ptrace system call notes that under rare circumstances attaching to a process using the ptrace system call can cause the traced process to be interrupted. If the process is in uninterruptible sleep then the process can’t be interrupted, which will cause the strace process itself to hang forever. Remarkably, it appears that the ptrace(2) system call is itself uninterruptible, which means that if this happens you may not be able to kill the strace process!

Tonight I learned about a “new” feature in Linux: the TASK_KILLABLE state. This is sort of a compromise between processes in interruptible sleep and processes in uninterruptible sleep. A process in the TASK_KILLABLE state still cannot be interrupted in the usual sense (i.e. you can’t force the system call to return EINTR); however, processes in this state can be killed. This means that, for instance, processes doing I/O over NFS can be killed if they get into a wedged state. Not all system calls implement this state, so it’s still possible to get stuck with unkillable processes for some system calls, but it’s certainly an improvement over the previous situation. As usual, LWN has a great article on the subject, including information about the historical semantics of uninterruptible sleep on Linux.

https://eklitzke.org/uninterruptible-sleep
