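So we set up a server (11.0-RELEASE-p2) that hosts roughly 150-200 jails. The server has 24 cores and 192 GB of RAM. Looking at top, it shows no sign of stress, apart from the high load. All jails reside on NFS mounts, and each jail mounts its own directory when it is created. The server does not feel slow at all; it is quite snappy. The one thing that bothers us is the high load we are seeing.
Output from top: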
last pid: 71841;  load averages: 320.13, 131.33, 79.28  up 27+17:45:03  10:37:48
5325 processes:1 running, 5324 sleeping
CPU:  4.4% user,  0.0% nice,  1.6% system,  0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, 138G of memory is free, and the CPU is 94% idle.
Output from systat -vmstat:
3 users Load 92.59 105 73.97 Feb 1 10:39 Mem usage: 26%Phy 6%Kmem Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER Tot Share Tot Share Free in out in out Act 21491k 223884 120800k 555864 144668k count All 22230k 836948 142997k 4351592 pages Proc: Interrupts rpdsw Csw Trp Sys Int Sof Flt ioflt 3595 total 104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1 730 zfod 1 ata1 15 1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci | | | | | | | | | | %ozfod ehci0 ohci =>> daefr 107 cpu0:timer dtbuf 622 prcfr 722 bce0 259 Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260 Calls hits % hits % 3237760 numvn react pcib7 263 41265 41201 100 2713450 frevn pdwak 21 mps0 264 1290 pdpgs ciss0 265 Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer %busy 0 0 0 0 0 0 cache 132 cpu5:timer 144669k free 52 cpu1:timer 921954 68 cpu19:time 99 cpu21:time 54 cpu20:time 59 cpu18:time 59 cpu22:time 82 cpu23:time 67 cpu12:time 68 cpu6:timer 79 cpu14:time 88 cpu15:time 111 cpu16:time 93 cpu17:time 49 cpu8:timer 251 cpu7:timer 102 cpu9:timer 176 cpu10:time 49 cpu11:time
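As far as I can tell, nothing looks really odd. There are some interrupts, of course, but searching Google suggests that the number we get is small compared with what other people with interrupt problems have reported.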
iostat -w 1
       tty             da0              da1              cd0             cpu
 tin  tout  KB/t  tps  MB/s   KB/t  tps  MB/s   KB/t  tps  MB/s  us ni sy in id
   1   571 14.51   11  0.15  14.56   11  0.15   0.00    0  0.00   1  0  1  0 99
   0   231 10.29   90  0.90  11.26  102  1.12   0.00    0  0.00   3  0  1  0 95
   0    78  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   3  0  1  0 96
   0    78  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   7  0  1  0 92
   0    79  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   3  0  2  0 95
   0    78  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   6  0  2  0 93
   0    77 13.63  128  1.71  11.97  123  1.44   0.00    0  0.00   2  0  2  0 96
   0    79 36.00    1  0.04  14.86    7  0.10   0.00    0  0.00   2  0  1  0 97
   0    78  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   4  0  2  0 94
   0    76  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   4  0  2  0 94
   0    80  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   2  0  1  0 97
   0    75  9.98  117  1.15  18.43  129  2.32   0.00    0  0.00   3  0  1  0 96
   0    81  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   4  0  2  0 94
   0    78  0.00    0  0.00   0.00    0  0.00   0.00    0  0.00   2  0  1  0 96
vmstat -w 1
procs   memory        page                       disks     faults          cpu
r b w   avm   fre    flt  re  pi  po    fr    sr da0 da1    in    sy    cs us sy id
3 0 0  115G  138G    297   0   2   0   653   373   0   0   224    59  1405  1  1 99
2 0 0  115G  138G     75   0   0   0  2017  1368 118 109  2299 23370 18920  6  2 92
2 0 0  115G  138G   1397   0   2   0  2839  1434   0   0  2665 30985 23294  5  4 91
2 0 0  115G  138G   1113   0   0   0   666  1373   0   0  2222 23078 17157  5  2 93
1 0 0  115G  138G      7   0   0   0   597  1368   0   0   590 18529 10477  2  1 96
1 0 0  115G  138G      0   0   2   0   194  2773  83  81  1269 26734 19190  3  3 94
1 0 0  115G  138G      9   0   0   0    90  1404   0   0   833 18907 11455  2  2 96
2 0 0  115G  138G     13   0   0   0  1309  1374   0   0  3185 25773 20054  3  3 94
1 0 0  115G  138G   1419   0   0   0  2750  1369   0   0  3899 25403 23252  7  4 90
0 0 0  115G  138G    776   0   1   0   164  1368  75  58   837 26261 16368  3  3 94
1 0 0  115G  138G   2336   0   5   0  2562  1367   0   0  1337 23287 13288  3  3 94
0 0 0  115G  138G    560   0   0   0  1193  2785   0   0   608 27176 14512  5  5 90
1 0 0  115G  138G      0   0   2   0   249  1369   0   0   702 18533 10700  1  2 97
1 0 0  115G  138G   3290   0   0   0  2313  1369  91  96  1461 22049 14726  6  3 91
As for NFS, I don't really know how to look for problems there. But here is the output of
nfsstat -c
Client Info:
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
 44956931   1020943  93567574       167  23609403    879028    514647    665228
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
    36867      1387         1     24655     21955   6118822         0  26166205
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0   5489407         1      2270    830867
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         0 203906224
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
-719986429  44956925 -1243965171  93531884  66678251  22460288    981123    879028
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
      144       167  14572148   5721030   5124486      1455 -1123294109  26165764
and from
nfsstat -w 1 -c
 GtAttr  Lookup  Rdlink    Read   Write  Rename  Access   Rddir
      5       0       0       5       0       0       0       2
      9     342       0       9       0       0      42       9
     12      91       0      21       0       0      21       4
      0       2       0       0       0       0       2       0
      0       1       0       0       0       0       0       0
      0       5       0       0       0       0       2       0
      5     124       0       5       0       0       0       2
      6      12       0       5       0       0      12       2
      4       0       0       5       0       0       0       2
      9       0       0      10       0       0       0       4
      4       0       0       5       0       0       0       2
     50       1       0      14       0       0       0       7
And finally the output of
systat -ifstat
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
      Interface           Traffic               Peak                Total
            lo0  in     34.285 KB/s        291.936 KB/s           69.263 GB
                 out    34.285 KB/s        291.936 KB/s           69.263 GB
           bce1  in    792.808 KB/s          5.382 MB/s          707.266 GB
                 out    56.828 KB/s        238.912 KB/s           91.154 GB
           bce0  in     21.711 KB/s         21.711 KB/s           17.338 GB
                 out    13.799 KB/s        287.402 KB/s           64.000 GB
dmesg, as requested:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
Any ideas are welcome!
Can you post the dmesg output and any log messages from /var/log/messages?
What I see is that you have a 196GB RAM machine trying to do everything in 3GB of memory... it is probably swapping like crazy.
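Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Free RAM is bad. You need the memory in the machine to be used. Please post the output of sysctl vfs.zfs.arc_max here so we can check the ZFS ARC tuning.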
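If it helps with the ARC check requested above, here is a minimal sketch that prints the configured ARC limit next to the current ARC size. It assumes the FreeBSD sysctl names vfs.zfs.arc_max and kstat.zfs.misc.arcstats.size are present on this release; the uname guard is only there so the script does nothing harmful elsewhere.

```shell
#!/bin/sh
# Read-only checks; the sysctl names below are assumed to exist on this
# FreeBSD release with ZFS loaded.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # Configured upper bound for the ARC, in bytes.
    sysctl vfs.zfs.arc_max
    # Current ARC size in bytes, to compare against the limit.
    sysctl kstat.zfs.misc.arcstats.size
else
    echo "not FreeBSD, skipping ZFS ARC checks"
fi
```

Both values are in bytes, so they can be compared directly with the "ARC: 10G Total" line from top.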
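The jails themselves are basically doing nothing. Processes in the jails, if they were running, would show up in top, and it does not look like much is going on.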
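One rough way to check this from the host is to tally processes by state, which should line up with the "1 running, 5324 sleeping" summary from top. This is only a sketch: count_states is a hypothetical helper, and it assumes `ps -ax -o state=` prints one state string per process.

```shell
#!/bin/sh
# count_states (hypothetical helper): tally input lines by the first
# character of a ps state string, so "Ss" and "S+" both count as S.
count_states() {
    awk 'NF { c[substr($1, 1, 1)]++ } END { for (s in c) print s, c[s] }' | sort
}

# One state string per process, as seen from where the script runs.
ps -ax -o state= | count_states
```

A large count in a state other than S or R would be worth a closer look; the meaning of each letter is listed in ps(1).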
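FreeBSD's top is different; the LA should be read relative to the number of cores (24). Your LA is high, but that is only because something cannot get the memory it needs.
Try: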
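To put "relative to the number of cores" into numbers, here is a small sketch that divides the three load averages from the top output in the question by 24. The helper name load_per_core is made up for illustration.

```shell
#!/bin/sh
# load_per_core (hypothetical helper): load average divided by core count,
# printed with two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Load averages taken from the top output in the question, on 24 cores.
for la in 320.13 131.33 79.28; do
    load_per_core "$la" 24
done
```

This prints 13.34, 5.47 and 3.30, so even per core the 1-minute figure is far above 1. On the host itself, the inputs could be read with `sysctl -n vm.loadavg` and `sysctl -n hw.ncpu` instead of being typed in.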
sysctl kern.eventtimer.timer=HPET
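Before switching, it may be worth checking which event timers the kernel actually offers and which one is active. The sketch below only reads values; it assumes the FreeBSD sysctl names kern.eventtimer.choice and kern.eventtimer.timer, and whether HPET appears in the list depends on the hardware.

```shell
#!/bin/sh
# Read-only checks; nothing is changed here.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # Event timers the kernel detected, with their priorities.
    sysctl kern.eventtimer.choice
    # The timer currently in use.
    sysctl kern.eventtimer.timer
else
    echo "not FreeBSD, skipping event timer checks"
fi

# To keep the change across reboots (assuming it helps), the same
# assignment can go into /etc/sysctl.conf:
#   kern.eventtimer.timer=HPET
```

Note the previous value first, so the setting can be reverted if the load does not change.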