FreeBSD shows high load, cannot find bottleneck
We have set up a server (FreeBSD 11.0-RELEASE-p2) that hosts around 150-200 jails. The server has 24 cores and 192 GB of RAM. top shows no sign of stress except for the high load average. All jails reside on NFS mounts, and each jail mounts its own directory upon creation.
The server does not feel slow in any way; it's rather snappy. The one thing that bothers us is the high load average we get.
Output from top:
last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes:1 running, 5324 sleeping
CPU: 4.4% user, 0.0% nice, 1.6% system, 0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, yet 138G of memory is free and the CPU is 94% idle.
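To put the mismatch in numbers, here is a minimal sketch that parses the two header lines of the top output above and relates the 1-minute load average to the 24 cores mentioned in the question. The function name `parse_top_header` and the regular expressions are illustrative only and are not part of any FreeBSD tool.

```python
import re

# Header lines copied from the top output above.
TOP_HEADER = """last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes:1 running, 5324 sleeping"""

NCPU = 24  # core count stated in the question


def parse_top_header(text):
    """Extract the three load averages and the process counts (hypothetical helper)."""
    loads = re.search(r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    procs = re.search(r"(\d+) processes:\s*(\d+) running,\s*(\d+) sleeping", text)
    return {
        "load": tuple(float(x) for x in loads.groups()),
        "total": int(procs.group(1)),
        "running": int(procs.group(2)),
        "sleeping": int(procs.group(3)),
    }


summary = parse_top_header(TOP_HEADER)
load_per_core = summary["load"][0] / NCPU

print(f"1-minute load per core: {load_per_core:.2f}")
print(f"processes running in the same snapshot: {summary['running']}")
```

A load per core well above 1 would normally suggest a saturated machine, which is exactly what the single running process and the idle CPU figure contradict.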
Output from systat -vmstat
3 users Load 92.59 105 73.97 Feb 1 10:39
Mem usage: 26%Phy 6%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 21491k 223884 120800k 555864 144668k count
All 22230k 836948 142997k 4351592 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 3595 total
104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1
730 zfod 1 ata1 15
1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci
| | | | | | | | | | %ozfod ehci0 ohci
=>> daefr 107 cpu0:timer
dtbuf 622 prcfr 722 bce0 259
Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260
Calls hits % hits % 3237760 numvn react pcib7 263
41265 41201 100 2713450 frevn pdwak 21 mps0 264
1290 pdpgs ciss0 265
Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time
KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer
tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer
MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer
%busy 0 0 0 0 0 0 cache 132 cpu5:timer
144669k free 52 cpu1:timer
921954 68 cpu19:time
99 cpu21:time
54 cpu20:time
59 cpu18:time
59 cpu22:time
82 cpu23:time
67 cpu12:time
68 cpu6:timer
79 cpu14:time
88 cpu15:time
111 cpu16:time
93 cpu17:time
49 cpu8:timer
251 cpu7:timer
102 cpu9:timer
176 cpu10:time
49 cpu11:time
As far as I can tell, nothing looks really strange there either. There are some interrupts (3595 total in the output above), but from what I have found, that is nothing compared to what people with actual interrupt problems report, which is more along the lines of 350,000 interrupts.
iostat -w 1
tty da0 da1 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
1 571 14.51 11 0.15 14.56 11 0.15 0.00 0 0.00 1 0 1 0 99
0 231 10.29 90 0.90 11.26 102 1.12 0.00 0 0.00 3 0 1 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 1 0 96
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 7 0 1 0 92
0 79 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 6 0 2 0 93
0 77 13.63 128 1.71 11.97 123 1.44 0.00 0 0.00 2 0 2 0 96
0 79 36.00 1 0.04 14.86 7 0.10 0.00 0 0.00 2 0 1 0 97
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 76 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 80 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 97
0 75 9.98 117 1.15 18.43 129 2.32 0.00 0 0.00 3 0 1 0 96
0 81 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 96
vmstat -w 1
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
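The vmstat samples can be summarized with a short sketch that pulls out the `r` column (runnable processes) and the `id` column (idle CPU percentage). The rows are copied from the output above; the variable names are illustrative only.

```python
# Sample rows copied from the vmstat -w 1 output above (header lines omitted).
VMSTAT_ROWS = """\
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
"""

rows = [line.split() for line in VMSTAT_ROWS.strip().splitlines()]
run_queue = [int(fields[0]) for fields in rows]   # "r" column
idle = [int(fields[-1]) for fields in rows]       # "id" column

mean_r = sum(run_queue) / len(run_queue)
print(f"samples: {len(rows)}")
print(f"runnable (r): mean {mean_r:.2f}, max {max(run_queue)}")
print(f"idle CPU (id): min {min(idle)}%")
```

In these one-second samples the run queue never exceeds a handful of processes, so whatever drives the load average into the hundreds does not show up here.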
As for NFS, I really don't know how to look for problems there, but here is the output from
nfsstat -c
Client Info:
Rpc Counts:
     Getattr     Setattr      Lookup    Readlink        Read       Write      Create      Remove
    44956931     1020943    93567574         167    23609403      879028      514647      665228
      Rename        Link     Symlink       Mkdir       Rmdir     Readdir    RdirPlus      Access
       36867        1387           1       24655       21955     6118822           0    26166205
       Mknod      Fsstat      Fsinfo    PathConf      Commit
           0     5489407           1        2270      830867
Rpc Info:
    TimedOut     Invalid   X Replies     Retries    Requests
           0           0           0           0   203906224
Cache Info:
   Attr Hits      Misses   Lkup Hits      Misses   BioR Hits      Misses   BioW Hits      Misses
  -719986429    44956925 -1243965171    93531884    66678251    22460288      981123      879028
   BioRLHits      Misses   BioD Hits      Misses   DirE Hits      Misses   Accs Hits      Misses
         144         167    14572148     5721030     5124486        1455 -1123294109    26165764
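Three of the cache hit counters above are negative. A plausible reading, and it is only an assumption, is that they are signed 32-bit counters that have wrapped around. The sketch below undoes a single wrap and computes the resulting hit ratios; `unwrap_int32` is a hypothetical helper, and if a counter wrapped more than once the true hit counts would be higher still.

```python
def unwrap_int32(value):
    """Treat a negative value as a signed 32-bit counter that wrapped once (assumption)."""
    return value + 2**32 if value < 0 else value


# (hits, misses) pairs copied from the "Cache Info" section above.
CACHE = {
    "Attr": (-719986429, 44956925),
    "Lkup": (-1243965171, 93531884),
    "Accs": (-1123294109, 26165764),
}

hit_ratio = {}
for name, (hits, misses) in CACHE.items():
    real_hits = unwrap_int32(hits)
    hit_ratio[name] = real_hits / (real_hits + misses)
    print(f"{name}: hits {real_hits}, misses {misses}, hit ratio {hit_ratio[name]:.1%}")
```

Under that assumption the attribute, lookup and access caches are all serving the large majority of requests locally, which fits the zero timeouts and retries in the Rpc Info section.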
and from
nfsstat -w 1 -c
GtAttr Lookup Rdlink Read Write Rename Access Rddir
5 0 0 5 0 0 0 2
9 342 0 9 0 0 42 9
12 91 0 21 0 0 21 4
0 2 0 0 0 0 2 0
0 1 0 0 0 0 0 0
0 5 0 0 0 0 2 0
5 124 0 5 0 0 0 2
6 12 0 5 0 0 12 2
4 0 0 5 0 0 0 2
9 0 0 10 0 0 0 4
4 0 0 5 0 0 0 2
50 1 0 14 0 0 0 7
and finally output from
systat -ifstat
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
Interface Traffic Peak Total
lo0 in 34.285 KB/s 291.936 KB/s 69.263 GB
out 34.285 KB/s 291.936 KB/s 69.263 GB
bce1 in 792.808 KB/s 5.382 MB/s 707.266 GB
out 56.828 KB/s 238.912 KB/s 91.154 GB
bce0 in 21.711 KB/s 21.711 KB/s 17.338 GB
out 13.799 KB/s 287.402 KB/s 64.000 GB
As requested, here is the output of dmesg:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
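The rate-limit messages are easier to read when grouped by type. The following sketch runs over a shortened sample of the dmesg lines above and reports, per message type, how often the limiter fired and the highest packet rate seen. The regular expression and variable names are illustrative and only cover the message format shown here.

```python
import re
from collections import defaultdict

# A shortened sample of the dmesg lines above.
DMESG = """\
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
"""

PATTERN = re.compile(r"Limiting (.+?) response from (\d+) to (\d+) packets/sec")

rates = defaultdict(list)
for line in DMESG.splitlines():
    match = PATTERN.match(line)
    if match:
        kind, seen, _limit = match.groups()
        rates[kind].append(int(seen))

for kind, seen in rates.items():
    print(f"{kind}: limited {len(seen)} times, peak {max(seen)} packets/sec")

pf_limit_hit = "PF states limit reached" in DMESG
print(f"pf state table limit reached: {pf_limit_hit}")
```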
Any ideas are welcome!
freebsd high-load
This info might be totally wrong, but I have seen unusually high load caused by many subshells/processes being created every second (we had a case where one server started ~4-5k valid processes per second). After some fine-tuning we dropped that number to ~1k. There was no visible direct impact anywhere, similar to what you are describing. This was on a Debian Linux server, so I am not sure whether it can be an issue on BSD as well.
– Dennis Nolte
Feb 1 '17 at 10:09
Be careful, FreeBSD appears to calculate load average differently from how Linux does it. That said, I'm wondering if there is any actual problem at all here, aside from the "load average" numbers feeling high with no other indication of excessive load.
– a CVn
Feb 1 '17 at 10:18
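To illustrate why a load average can look alarming on an otherwise idle machine, here is a rough sketch of an exponentially damped average of the number of runnable threads, which is the general scheme BSD-style load averages are usually described as following. The 5-second sampling interval, the floating-point arithmetic and the workload are assumptions made for illustration; the kernel's actual constants and fixed-point math are not reproduced here.

```python
import math

SAMPLE_INTERVAL = 5.0             # seconds between samples (assumption)
WINDOWS = (60.0, 300.0, 900.0)    # 1-, 5- and 15-minute averages


def update_load(load, runnable, window, interval=SAMPLE_INTERVAL):
    """One step of an exponentially damped moving average."""
    decay = math.exp(-interval / window)
    return load * decay + runnable * (1.0 - decay)


def simulate(samples, window):
    """Feed a sequence of runnable-thread counts through the average."""
    load = 0.0
    for runnable in samples:
        load = update_load(load, runnable, window)
    return load


# Hypothetical workload: ten quiet minutes, then one minute with 400 runnable threads.
quiet = [1] * 120
burst = [400] * 12
samples = quiet + burst

for window in WINDOWS:
    print(f"{int(window // 60):>2}-minute average: {simulate(samples, window):.1f}")
```

With this model, a burst of runnable threads lasting a single minute is enough to push the 1-minute figure into the hundreds while the longer averages lag well behind. That pattern resembles the 320 / 131 / 79 triple in the top output, although it does not prove that this is what happens on the server.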
@MichaelKjörling This is basically my question as well. As the system in general feels snappy, I don't know if there really is a problem. Still, a load that occasionally reaches 300-400 seems rather excessive.
– larsemil
Feb 1 '17 at 10:19
It's still less than 10% of the processes running on the system at the time of your snapshot (320/5325=6.01%). FWIW, I just posted "How is load average calculated on FreeBSD?" on our sister site Unix & Linux because I was unable to actually locate any concrete information on how FreeBSD calculates the load average numbers, and your question piqued my curiosity.
– a CVn
Feb 1 '17 at 10:23
I don't have much experience with FreeBSD, but I know that until recently Ubuntu had a hard time calculating the real load on a host as soon as virtualisation came into play. I'm pretty sure your real load is much lower than the system shows you. BSD has a habit of implementing updates very late in non-experimental versions for stability reasons. Could it also be that BSD restricts the load info given by each jail for security reasons and so assumes a higher load?
– Broco
Feb 1 '17 at 10:46
|
show 1 more comment
So we have set up a server(11.0-RELEASE-p2) that hosts around 150-200 jails. The server has 24 cores and 192gb of ram. When using top it shows no sign of stress - except the high load. All jails reside on NFS mounts and each jail mounts its own directory upon creation.
The server does not feel slow in any way, its rather snappy. The one thing that bothers us is the high load we get.
Output from top:
last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes:1 running, 5324 sleeping
CPU: 4.4% user, 0.0% nice, 1.6% system, 0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, memory has 138G free and cpu is 94% idle.
Output from systat -vmstat
3 users Load 92.59 105 73.97 Feb 1 10:39
Mem usage: 26%Phy 6%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 21491k 223884 120800k 555864 144668k count
All 22230k 836948 142997k 4351592 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 3595 total
104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1
730 zfod 1 ata1 15
1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci
| | | | | | | | | | %ozfod ehci0 ohci
=>> daefr 107 cpu0:timer
dtbuf 622 prcfr 722 bce0 259
Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260
Calls hits % hits % 3237760 numvn react pcib7 263
41265 41201 100 2713450 frevn pdwak 21 mps0 264
1290 pdpgs ciss0 265
Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time
KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer
tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer
MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer
%busy 0 0 0 0 0 0 cache 132 cpu5:timer
144669k free 52 cpu1:timer
921954 68 cpu19:time
99 cpu21:time
54 cpu20:time
59 cpu18:time
59 cpu22:time
82 cpu23:time
67 cpu12:time
68 cpu6:timer
79 cpu14:time
88 cpu15:time
111 cpu16:time
93 cpu17:time
49 cpu8:timer
251 cpu7:timer
102 cpu9:timer
176 cpu10:time
49 cpu11:time
As far as i can tell nothing looks really strange there either. Sure, there are some interrupts but googling shows that interrupts in the amount we get there is nothing compared to what other people get when they have interrupt problems which are more in the line of 350 000 interrupts.
iostat -w 1
tty da0 da1 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
1 571 14.51 11 0.15 14.56 11 0.15 0.00 0 0.00 1 0 1 0 99
0 231 10.29 90 0.90 11.26 102 1.12 0.00 0 0.00 3 0 1 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 1 0 96
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 7 0 1 0 92
0 79 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 6 0 2 0 93
0 77 13.63 128 1.71 11.97 123 1.44 0.00 0 0.00 2 0 2 0 96
0 79 36.00 1 0.04 14.86 7 0.10 0.00 0 0.00 2 0 1 0 97
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 76 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 80 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 97
0 75 9.98 117 1.15 18.43 129 2.32 0.00 0 0.00 3 0 1 0 96
0 81 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 96
vmstat -w 1
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
About NFS i really dont know how to look for problems there. But here is a output from
nfsstat -c
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
44956931 1020943 93567574 167 23609403 879028 514647 665228
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
36867 1387 1 24655 21955 6118822 0 26166205
Mknod Fsstat Fsinfo PathConf Commit
0 5489407 1 2270 830867
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0 0 0 203906224
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
-719986429 44956925 -1243965171 93531884 66678251 22460288 981123 879028
BioRLHits Misses BioD Hits Misses DirE Hits Misses Accs Hits Misses
144 167 14572148 5721030 5124486 1455 -1123294109 26165764
and from
nfsstat -w 1 -c
GtAttr Lookup Rdlink Read Write Rename Access Rddir
5 0 0 5 0 0 0 2
9 342 0 9 0 0 42 9
12 91 0 21 0 0 21 4
0 2 0 0 0 0 2 0
0 1 0 0 0 0 0 0
0 5 0 0 0 0 2 0
5 124 0 5 0 0 0 2
6 12 0 5 0 0 12 2
4 0 0 5 0 0 0 2
9 0 0 10 0 0 0 4
4 0 0 5 0 0 0 2
50 1 0 14 0 0 0 7
and finally output from
systat -ifstat
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
Interface Traffic Peak Total
lo0 in 34.285 KB/s 291.936 KB/s 69.263 GB
out 34.285 KB/s 291.936 KB/s 69.263 GB
bce1 in 792.808 KB/s 5.382 MB/s 707.266 GB
out 56.828 KB/s 238.912 KB/s 91.154 GB
bce0 in 21.711 KB/s 21.711 KB/s 17.338 GB
out 13.799 KB/s 287.402 KB/s 64.000 GB
As requested dmesg:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
Any ideas are welcome!
freebsd high-load
So we have set up a server(11.0-RELEASE-p2) that hosts around 150-200 jails. The server has 24 cores and 192gb of ram. When using top it shows no sign of stress - except the high load. All jails reside on NFS mounts and each jail mounts its own directory upon creation.
The server does not feel slow in any way, its rather snappy. The one thing that bothers us is the high load we get.
Output from top:
last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes:1 running, 5324 sleeping
CPU: 4.4% user, 0.0% nice, 1.6% system, 0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, memory has 138G free and cpu is 94% idle.
Output from systat -vmstat
3 users Load 92.59 105 73.97 Feb 1 10:39
Mem usage: 26%Phy 6%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 21491k 223884 120800k 555864 144668k count
All 22230k 836948 142997k 4351592 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 3595 total
104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1
730 zfod 1 ata1 15
1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci
| | | | | | | | | | %ozfod ehci0 ohci
=>> daefr 107 cpu0:timer
dtbuf 622 prcfr 722 bce0 259
Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260
Calls hits % hits % 3237760 numvn react pcib7 263
41265 41201 100 2713450 frevn pdwak 21 mps0 264
1290 pdpgs ciss0 265
Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time
KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer
tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer
MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer
%busy 0 0 0 0 0 0 cache 132 cpu5:timer
144669k free 52 cpu1:timer
921954 68 cpu19:time
99 cpu21:time
54 cpu20:time
59 cpu18:time
59 cpu22:time
82 cpu23:time
67 cpu12:time
68 cpu6:timer
79 cpu14:time
88 cpu15:time
111 cpu16:time
93 cpu17:time
49 cpu8:timer
251 cpu7:timer
102 cpu9:timer
176 cpu10:time
49 cpu11:time
As far as i can tell nothing looks really strange there either. Sure, there are some interrupts but googling shows that interrupts in the amount we get there is nothing compared to what other people get when they have interrupt problems which are more in the line of 350 000 interrupts.
iostat -w 1
tty da0 da1 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
1 571 14.51 11 0.15 14.56 11 0.15 0.00 0 0.00 1 0 1 0 99
0 231 10.29 90 0.90 11.26 102 1.12 0.00 0 0.00 3 0 1 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 1 0 96
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 7 0 1 0 92
0 79 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 6 0 2 0 93
0 77 13.63 128 1.71 11.97 123 1.44 0.00 0 0.00 2 0 2 0 96
0 79 36.00 1 0.04 14.86 7 0.10 0.00 0 0.00 2 0 1 0 97
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 76 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 80 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 97
0 75 9.98 117 1.15 18.43 129 2.32 0.00 0 0.00 3 0 1 0 96
0 81 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 96
vmstat -w 1
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
About NFS i really dont know how to look for problems there. But here is a output from
nfsstat -c
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
44956931 1020943 93567574 167 23609403 879028 514647 665228
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
36867 1387 1 24655 21955 6118822 0 26166205
Mknod Fsstat Fsinfo PathConf Commit
0 5489407 1 2270 830867
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0 0 0 203906224
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
-719986429 44956925 -1243965171 93531884 66678251 22460288 981123 879028
BioRLHits Misses BioD Hits Misses DirE Hits Misses Accs Hits Misses
144 167 14572148 5721030 5124486 1455 -1123294109 26165764
and from
nfsstat -w 1 -c
GtAttr Lookup Rdlink Read Write Rename Access Rddir
5 0 0 5 0 0 0 2
9 342 0 9 0 0 42 9
12 91 0 21 0 0 21 4
0 2 0 0 0 0 2 0
0 1 0 0 0 0 0 0
0 5 0 0 0 0 2 0
5 124 0 5 0 0 0 2
6 12 0 5 0 0 12 2
4 0 0 5 0 0 0 2
9 0 0 10 0 0 0 4
4 0 0 5 0 0 0 2
50 1 0 14 0 0 0 7
and finally output from
systat -ifstat
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
Interface Traffic Peak Total
lo0 in 34.285 KB/s 291.936 KB/s 69.263 GB
out 34.285 KB/s 291.936 KB/s 69.263 GB
bce1 in 792.808 KB/s 5.382 MB/s 707.266 GB
out 56.828 KB/s 238.912 KB/s 91.154 GB
bce0 in 21.711 KB/s 21.711 KB/s 17.338 GB
out 13.799 KB/s 287.402 KB/s 64.000 GB
As requested, here is the dmesg output:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
Any ideas are welcome!
freebsd high-load
edited Feb 15 '17 at 19:47
asked Feb 1 '17 at 10:01
larsemil
This info might be totally wrong, but I have seen an unusually high load caused by many subshells/processes being created every second (we had a case where one server started ~4-5k valid processes per second). After some fine-tuning we dropped that number to ~1k. There was no visible direct impact anywhere, similar to what you are describing. This was on a Debian Linux server, so I am uncertain whether it can be an issue on BSD as well.
– Dennis Nolte
Feb 1 '17 at 10:09
Be careful, FreeBSD appears to calculate load average differently from how Linux does it. That said, I'm wondering if there is any actual problem at all here, aside from the "load average" numbers feeling high with no other indication of excessive load?
– a CVn
Feb 1 '17 at 10:18
@MichaelKjörling this is basically my question as well. As the system in general feels snappy, I don't know if there really is a problem. Still, a load that occasionally approaches 300-400 seems rather excessive.
– larsemil
Feb 1 '17 at 10:19
It's still less than 10% of the processes running on the system at the time of your snapshot (320/5325=6.01%). FWIW, I just posted How is load average calculated on FreeBSD? on our sister site Unix & Linux because I was unable to actually locate any concrete information on how FreeBSD calculates the load average numbers, and your question piqued my curiosity.
– a CVn
Feb 1 '17 at 10:23
I don't have much experience with FreeBSD, but I know that until recently Ubuntu had a hard time calculating the real load on a host as soon as virtualisation came into play. I'm pretty sure your real load is much lower than the system shows you. BSD has a habit of implementing updates very late in non-experimental versions for stability reasons. Also, could it be that BSD restricts the load info given by each jail for security reasons, and so it assumes a higher load?
– Broco
Feb 1 '17 at 10:46
2 Answers
Can you post dmesg output and any log messages from /var/log/messages?
What I see is that you have a 196GB RAM machine that is trying to do everything in 3GB of RAM... it is probably swapping furiously.
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Free RAM is bad. You need to use the RAM in the machine.
Please post the output of sysctl vfs.zfs.arc_max
Check here for ZFS tuning for the ARC
Jails themselves do basically nothing. Processes in the jails will show up in top if they are running; it looks like not much is going on.
FreeBSD top is different, yes; the LA should be read relative to the number of cores (24). Your LA is high, but this is only because something cannot get the memory it needs.
answered Feb 1 '17 at 15:52
Stefan Caunter
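To make the "relative to the number of cores" reading concrete, here is a small Python sketch. The figures are taken from this thread (24 cores, a load average of about 320, 5325 processes in the snapshot, and 29.6 from the systat output); the function name is hypothetical.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


CORES = 24  # core count mentioned in the answer

peak_per_core = load_per_core(320, CORES)      # load quoted in the comments
systat_per_core = load_per_core(29.6, CORES)   # load shown by systat -ifstat
share_of_processes = 320 / 5325 * 100          # percentage from the comments

print(f"peak load per core:   {peak_per_core:.1f}")
print(f"systat load per core: {systat_per_core:.2f}")
print(f"load / process count: {share_of_processes:.2f}%")
```

Read this way, a load of 29.6 is roughly 1.2 per core, while 320 is roughly 13 per core.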
I think it is the way FreeBSD reports load. A blog post gives more info (undeadly.org/cgi?action=article&sid=20090715034920) and it says 'On BSD, load is the number of processes which have (wanted to) run at least once in the most recent 5-second window, with a degradation over time. So, if you have a process that wakes up every 5 seconds and prints the time on your console, you have a load average of 1. Load is not the number of cpu cycles used.' which would explain the load, as there are so many jails running. The machine is not swapping and is barely using its disks at all.
– larsemil
Feb 1 '17 at 17:36
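The quoted description can be illustrated with a simplified model. The Python sketch below assumes that the number of runnable processes is sampled every 5 seconds and that each sample is folded into an exponentially damped average with a 60-second time constant for the 1-minute figure. It is an approximation for illustration only, not the kernel's actual fixed-point code, and the function name is made up.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)


def damped_load(samples, period=60.0, start=0.0):
    """Fold runnable-process counts into an exponentially damped average.

    samples: runnable-process counts, one per sampling interval.
    period:  averaging window in seconds (60 for the 1-minute figure).
    """
    decay = math.exp(-SAMPLE_INTERVAL / period)
    load = start
    for runnable in samples:
        # The old value decays; the new sample fills the remainder.
        load = load * decay + runnable * (1.0 - decay)
    return load


# One process that is runnable at every sample for 10 minutes.
steady = damped_load([1] * 120)

# 300 processes runnable in a single sample, then nothing for one minute.
burst = damped_load([300] + [0] * 12)

print(f"steady: {steady:.3f}")
print(f"burst after one minute: {burst:.2f}")
```

In this model a process that is runnable at every sample contributes about 1 to the load no matter how little CPU time it uses, which would be consistent with a host running many jails showing a high load while the CPUs stay mostly idle.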
sysctl vfs.zfs.arc_max
vfs.zfs.arc_max: 199730868224
– larsemil
Feb 1 '17 at 17:38
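The value is reported in bytes. A quick conversion (a sketch; the helper name is made up) puts the limit next to the 10G ARC size quoted in the answer:

```python
GIB = 1024 ** 3  # bytes per gibibyte


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


arc_max_gib = bytes_to_gib(199730868224)  # value reported by sysctl above
print(f"vfs.zfs.arc_max = {arc_max_gib:.1f} GiB")
```

That works out to about 186 GiB, so the ARC limit is far above the 10G the ARC was using in the quoted top output.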
How about dmesg? Anything in the logs? 320 LA still means something is pinning the CPU...
– Stefan Caunter
Feb 1 '17 at 18:47
Updated with dmesg...
– larsemil
Feb 15 '17 at 19:47
try:
sysctl kern.eventtimer.timer=HPET
edited Feb 13 '17 at 20:00
chicks
answered Feb 13 '17 at 18:18
Allan Jude
Can you explain what this is doing or why?
– chicks
Feb 13 '17 at 20:00
Will this simply give better data or does it change anything else?
– larsemil
Feb 14 '17 at 8:32