We often use the top command to check CPU utilization, for example:
root@ubuntu:~# top
top - 09:16:29 up 6 min,  4 users,  load average: 0.01, 0.22, 0.17
Tasks: 149 total,   1 running, 147 sleeping,   0 stopped,   1 zombie
Cpu(s):  2.8%us,  6.7%sy,  0.2%ni, 89.9%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:    508000k total,   404092k used,   103908k free,    47764k buffers
Swap:   522236k total,        0k used,   522236k free,   184992k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0  3040 1812 1252 S  0.0  0.4   0:01.81 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/0
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.56 kworker/u:0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
On Linux, CPU utilization is computed from the data in /proc/stat. The calculation works as follows:
root@ubuntu:~# cat /proc/stat
cpu  711 56 2092 7010 104 0 20 0 0 0
cpu0 711 56 2092 7010 104 0 20 0 0 0
intr 31161 94 64 0 1 75 0 3 0 0 0 0 0 1423 0 0 382 2825 4798 0 226 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 101085
btime 1307117390
processes 2078
procs_running 1
procs_blocked 0
softirq 32534 0 7796 151 143 4225 0 81 0 12 20126
root@ubuntu:~#
The first line, cpu, aggregates the values of all CPUs; the lines cpu0 ... cpun that follow give each CPU's individual counters.
cpu 711 56 2092 7010 104 0 20 0 0 0
The line contains ten values (unit: jiffies). The first eight are:

- User time: 711
- Nice time: 56
- System time: 2092
- Idle time: 7010
- Waiting (iowait) time: 104
- Hard IRQ time: 0
- SoftIRQ time: 20
- Steal time: 0

(The ninth value is guest time; the tenth, guest_nice, only exists on newer kernels.)
Total CPU time = user + nice + system + idle + iowait + irq + softirq + steal

%us = User time / total CPU time * 100%
%ni = Nice time / total CPU time * 100%
%sy = System time / total CPU time * 100%
%id = Idle time / total CPU time * 100%
%wa = Waiting time / total CPU time * 100%
%hi = Hard IRQ time / total CPU time * 100%
%si = SoftIRQ time / total CPU time * 100%
%st = Steal time / total CPU time * 100%

Note that each counter is its own bucket: nice time is not part of %us, and hardirq/softirq time is not part of %sy, otherwise the percentages would not sum to 100%. Also, top computes these percentages over the delta between two consecutive samples of /proc/stat, not over the cumulative values since boot; a sketch of that calculation follows.
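To make the formulas concrete, here is a minimal user-space sketch (my own illustration, not taken from the kernel or procps sources) that reads the aggregate cpu line of /proc/stat twice and prints top-style percentages from the delta:

/* Minimal sketch: compute top-style CPU percentages from /proc/stat.
 * Field order follows the kernel's show_stat() output shown below. */
#include <stdio.h>
#include <unistd.h>

struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

static int read_cpu_line(struct cpu_times *t)
{
    FILE *f = fopen("/proc/stat", "r");
    int n;

    if (!f)
        return -1;
    /* the aggregate "cpu" line is always first */
    n = fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
               &t->user, &t->nice, &t->system, &t->idle,
               &t->iowait, &t->irq, &t->softirq, &t->steal);
    fclose(f);
    return n == 8 ? 0 : -1;
}

int main(void)
{
    struct cpu_times a, b;
    unsigned long long total;

    if (read_cpu_line(&a))
        return 1;
    sleep(1);               /* sampling interval, like top's refresh delay */
    if (read_cpu_line(&b))
        return 1;

    total = (b.user - a.user) + (b.nice - a.nice) + (b.system - a.system)
          + (b.idle - a.idle) + (b.iowait - a.iowait) + (b.irq - a.irq)
          + (b.softirq - a.softirq) + (b.steal - a.steal);
    if (total == 0)
        return 1;

    printf("%%us %.1f  %%ni %.1f  %%sy %.1f  %%id %.1f  %%wa %.1f\n",
           100.0 * (b.user   - a.user)   / total,
           100.0 * (b.nice   - a.nice)   / total,
           100.0 * (b.system - a.system) / total,
           100.0 * (b.idle   - a.idle)   / total,
           100.0 * (b.iowait - a.iowait) / total);
    return 0;
}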
Let us now look at how the kernel produces the statistics behind /proc/stat.

Kernel implementation

The discussion below uses kernel source version 2.6.32-71.29.1.el6 x86_64 as an example.

The /proc/stat file is created by proc_stat_init(), defined in fs/proc/stat.c and called during kernel initialization; all functions related to /proc/stat live in that file.

The read methods for /proc/stat are given by proc_stat_operations:
00160: static const struct file_operations proc_stat_operations = {
00161: .open = stat_open,
00162: .read = seq_read,
00163: .llseek = seq_lseek,
00164: .release = single_release,
00165: };
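For reference, the registration itself is only a few lines. In the 2.6.32 sources, proc_stat_init() in fs/proc/stat.c looks essentially like this (a close paraphrase of that tree, shown without line numbers):

static int __init proc_stat_init(void)
{
        /* create the "stat" entry under /proc with the operations above */
        proc_create("stat", 0, NULL, &proc_stat_operations);
        return 0;
}
module_init(proc_stat_init);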
The open method, stat_open(), first allocates a buffer of size bytes (one page, plus an extra page per 32 possible CPUs) to hold the temporary data, which is exactly the text we see when reading /proc/stat.
00136: static int stat_open(struct inode *inode, struct file *file)
00137: {
00138: unsigned size = 4096 * (1 + num_possible_cpus() / 32);
00139: char *buf;
00140: struct seq_file *m;
00141: int res;
00142:
00143: /* don't ask for more than the kmalloc() max size, currently 128 KB */
00144: if (size > 128 * 1024)
00145: size = 128 * 1024;
00146: buf = kmalloc(size, GFP_KERNEL);
00147: if (!buf)
00148: return -ENOMEM;
00149:
00150: res = single_open(file, show_stat, NULL);
00151: if (!res) {
00152: m = file->private_data;
00153: m->buf = buf;
00154: m->size = size;
00155: } else
00156: kfree(buf);
00157: return res;
00158: }
00159:
The contents of /proc/stat are filled in by show_stat(). Note the for_each_possible_cpu(i) loop at line 00043: it sums the counters over all CPUs, producing the aggregate values of the first cpu line we saw in the /proc/stat sample above.
00025: static int show_stat(struct seq_file *p, void *v)
00026: {
00027: int i, j;
00028: unsigned long jif;
00029: cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
00030: cputime64_t guest;
00031: u64 sum = 0;
00032: u64 sum_softirq = 0;
00033: unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
00034: struct timespec boottime;
00035: unsigned int per_irq_sum;
00036:
00037: user = nice = system = idle = iowait =
00038: irq = softirq = steal = cputime64_zero;
00039: guest = cputime64_zero;
00040: getboottime(&boottime);
00041: jif = boottime.tv_sec;
00042:
00043: for_each_possible_cpu(i) {
00044: user = cputime64_add(user, kstat_cpu(i).cpustat.user);
00045: nice = cputime64_add(nice, kstat_cpu(i).cpustat.nice);
00046: system = cputime64_add(system, kstat_cpu(i).cpustat.system);
00047: idle = cputime64_add(idle, kstat_cpu(i).cpustat.idle);
00048: idle = cputime64_add(idle, arch_idle_time(i));
00049: iowait = cputime64_add(iowait, kstat_cpu(i).cpustat.iowait);
00050: irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
00051: softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
00052: steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
00053: guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
00054: for_each_irq_nr(j) {
00055: sum += kstat_irqs_cpu(j, i);
00056: }
Once the system-wide totals for user, nice, system, idle, iowait, irq, softirq and steal have been accumulated, show_stat() goes on to print the per-CPU usage (lines 00078-00103).
00057: sum += arch_irq_stat_cpu(i);
00058:
00059: for (j = 0; j < NR_SOFTIRQS; j++) {
00060: unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
00061:
00062: per_softirq_sums[j] += softirq_stat;
00063: sum_softirq += softirq_stat;
00064: }
00065: }
00066: sum += arch_irq_stat();
00067:
00068: seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
00069: (unsigned long long)cputime64_to_clock_t(user),
00070: (unsigned long long)cputime64_to_clock_t(nice),
00071: (unsigned long long)cputime64_to_clock_t(system),
00072: (unsigned long long)cputime64_to_clock_t(idle),
00073: (unsigned long long)cputime64_to_clock_t(iowait),
00074: (unsigned long long)cputime64_to_clock_t(irq),
00075: (unsigned long long)cputime64_to_clock_t(softirq),
00076: (unsigned long long)cputime64_to_clock_t(steal),
00077: (unsigned long long)cputime64_to_clock_t(guest));
00078: for_each_online_cpu(i) {
00079:
00080: /* Copy values here to work around gcc-2.95.3, gcc-2.96 */
00081: user = kstat_cpu(i).cpustat.user;
00082: nice = kstat_cpu(i).cpustat.nice;
00083: system = kstat_cpu(i).cpustat.system;
00084: idle = kstat_cpu(i).cpustat.idle;
00085: idle = cputime64_add(idle, arch_idle_time(i));
00086: iowait = kstat_cpu(i).cpustat.iowait;
00087: irq = kstat_cpu(i).cpustat.irq;
00088: softirq = kstat_cpu(i).cpustat.softirq;
00089: steal = kstat_cpu(i).cpustat.steal;
00090: guest = kstat_cpu(i).cpustat.guest;
00091: seq_printf(p,
00092: "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
00093: i,
00094: (unsigned long long)cputime64_to_clock_t(user),
00095: (unsigned long long)cputime64_to_clock_t(nice),
00096: (unsigned long long)cputime64_to_clock_t(system),
00097: (unsigned long long)cputime64_to_clock_t(idle),
00098: (unsigned long long)cputime64_to_clock_t(iowait),
00099: (unsigned long long)cputime64_to_clock_t(irq),
00100: (unsigned long long)cputime64_to_clock_t(softirq),
00101: (unsigned long long)cputime64_to_clock_t(steal),
00102: (unsigned long long)cputime64_to_clock_t(guest));
00103: }
00104: seq_printf(p, "intr %llu", (unsigned long long)sum);
00105:
00106: /* sum again ? it could be updated? */
00107: for_each_irq_nr(j) {
00108: per_irq_sum = 0;
00109: for_each_possible_cpu(i)
00110: per_irq_sum += kstat_irqs_cpu(j, i);
00111:
00112: seq_printf(p, " %u", per_irq_sum);
00113: }
00114:
00115: seq_printf(p,
00116: "\nctxt %llu\n"
00117: "btime %lu\n"
00118: "processes %lu\n"
00119: "procs_running %lu\n"
00120: "procs_blocked %lu\n",
00121: nr_context_switches(),
00122: (unsigned long)jif,
00123: total_forks,
00124: nr_running(),
00125: nr_iowait());
00126:
00127: seq_printf(p, "softirq %llu", (unsigned long long)sum_softirq);
00128:
00129: for (i = 0; i < NR_SOFTIRQS; i++)
00130: seq_printf(p, " %u", per_softirq_sums[i]);
00131: seq_printf(p, "\n");
00132:
00133: return 0;
00134: }
00135:
Line 00104 prints the total interrupt count across all CPUs, and lines 00107-00113 print the count for each interrupt vector. Note that /proc/stat records counters for all NR_IRQS possible interrupt vectors, but a typical machine only uses a small number of them; that is why so many of the values after intr in /proc/stat are zero.
Finally, show_stat() prints the number of context switches (ctxt), the kernel boot time (btime, in seconds since the Unix epoch), the total number of processes created since boot (processes), the number of currently runnable processes (procs_running), and the number of processes blocked waiting for I/O (procs_blocked, taken from nr_iowait()).
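These scalar lines are easy to pick out of /proc/stat by hand; the short sketch below (my own illustration, not kernel code) simply echoes them:

/* Minimal sketch: print the scalar bookkeeping lines from /proc/stat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        /* ctxt, btime, processes, procs_running, procs_blocked */
        if (!strncmp(line, "ctxt", 4) || !strncmp(line, "btime", 5) ||
            !strncmp(line, "processes", 9) || !strncmp(line, "procs_", 6))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}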
Finally, a few words on what the user, nice, system, idle, iowait, irq, softirq and steal values mean:

- User time
Time the CPU spends executing processes in user space (time spent in niced processes is accounted separately, under nice). In general, the more of the CPU that goes to user space, the better.
- System time
Time the CPU spends running kernel code (hardirq and softirq time are accounted separately). A high system share points to a bottleneck somewhere in the system; lower is usually better.
- Waiting time
Time the CPU spends waiting for I/O operations to complete. A system should not spend much of its time waiting on I/O; if it does, I/O is the bottleneck.
- Idle time
Time the CPU is idle, waiting for a runnable process.
- Nice time
Time the CPU spends running user processes whose priority has been lowered with nice (positive nice values).
- Hard IRQ time
Time spent servicing hardware interrupts.
- SoftIRQ time
Time spent servicing software interrupts (softirqs).
- Steal time
Time a virtual CPU spends in involuntary wait while the hypervisor is servicing another virtual processor.
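As a worked example using the sample /proc/stat data above: total CPU time = 711 + 56 + 2092 + 7010 + 104 + 0 + 20 + 0 = 9993 jiffies, giving cumulative since-boot shares of %us = 711/9993 ≈ 7.1%, %ni = 56/9993 ≈ 0.6%, %sy = 2092/9993 ≈ 20.9%, %id = 7010/9993 ≈ 70.1% and %wa = 104/9993 ≈ 1.0%. These differ from the percentages in the top screenshot because top measures over its refresh interval, not since boot.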
For a more detailed analysis, see http://ilinuxkernel.com/?p=1265
After reading your article on the inaccuracy of CPU utilization, I have two questions:

1. A tick is the smallest time slice used in CPU utilization accounting. When the kernel accounts CPU time, is it correct to say that whatever state is observed at the sampling instant is charged for the whole tick? For example, if processes A, B and C all run within one tick but A happens to be running when the tick fires, is the entire tick charged to A's time, and likewise booked as user-mode time (i.e. the values in /proc/stat)?

2. On a 4-core virtual machine with a kernel above 3.0 and a multithreaded Java process, top shows the overall CPU utilization (1 - idle) at around 40%, while the Java process itself shows 300%. If 300% means three full cores are in use, shouldn't the overall utilization be at least 75%? How should I understand this?

The iowait accounting here puzzles me a little: how does the kernel decide that a CPU is in an I/O-wait state? Isn't an I/O request something that belongs to a process rather than to a CPU?