can high load cause server hang and error “blocked for more than 120 seconds”?
I'm currently running a few VMs and bare-metal servers.
Java is running at high CPU, over 400% at times.
Randomly the server hangs with the error "java - blocked for more than 120 seconds" on the console; the same appears for kjournald, etc.
I cannot get dmesg output because for some reason this error is only written to the console, which I don't have access to since this is remotely hosted. Therefore I cannot copy a full trace.
I changed the environment this runs in, even the physical server, and it's still happening.
I changed hung_task_timeout_secs to 0 in case this is a false positive, as per http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Technical_Notes/deployment.html .
Also, irqbalance is not installed; perhaps it would help?
This is Ubuntu 10.04 64-bit; same issue with the latest 2.6.38-15-server and with 2.6.36.
Could CPU or memory issues, or no swap left, cause this issue?
Here is the console message:
[58Z?Z1.5?Z840] INFO: task java:21547 blocked for more than 120 seconds.
[58Z?Z1.5?Z986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z06Z] INFO: task kjournald:190 blocked for more than 120 seconds.
[58Z841.5?Z336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z600] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z841.5?Z90?] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?3413] INFO: task java:21547 blocked for more than 120 seconds.
[58Z841.5?368Z] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?ZZ36] INFO: task kjournald:60 blocked for more than 120 seconds.
[58Z961.5?Z6Z5] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?31ZZ] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z961.5?3393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
linux kernel
edited Jul 5 '12 at 21:49
asked Jul 5 '12 at 21:41
Tee
3 Answers
Yes, it could.
What this means is fairly explicit: the task stayed blocked, because the kernel couldn't schedule it, for more than 120 seconds. This indicates resource starvation, often around disk access.
irqbalance might help, but that isn't obvious. Can you provide us with the surroundings of this message in dmesg, in particular the stack trace that follows it?
Moreover, this is not a false positive. The message does not say that the task is hung forever, and the statement is perfectly correct. That doesn't mean it's a problem for you, and you can decide to ignore it if you don't notice any user impact.
This cannot be caused by:
- a CPU issue (or rather, that would be an insanely improbable hardware failure),
- a memory issue (very improbably a hardware failure, but that wouldn't happen multiple times; not a lack of RAM, as a process would be oom-killed),
- a lack of swap (oom-killer again).
To an extent, you might be able to blame this on a lack of memory, in the sense that depriving your system of data caching in RAM will cause more I/O. But it's not as straightforward as "running out of memory".
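To see which tasks are stuck while the problem is happening, one option is to list the processes in uninterruptible sleep (state D), which is the state the hung-task message reports on. This is only a minimal sketch: the function name `d_state_tasks` is made up for illustration, and it filters `ps` output passed on stdin so it can also be tried on saved output.

```shell
# Print "PID COMMAND" for every task whose state starts with D
# (uninterruptible sleep). Expects `ps -eo state,pid,comm` style
# lines on stdin; the header line is skipped because it starts with S.
d_state_tasks() {
    awk '$1 ~ /^D/ { $1 = ""; sub(/^ +/, ""); print }'
}

# Typical use on a live system:
#   ps -eo state,pid,comm | d_state_tasks
```

If the same few tasks (for example java, kjournald and a flush thread) keep showing up here, that points at the disk or filesystem they are all waiting on.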
There is nothing being recorded to /var/log/dmesg, so I just pasted what the console showed. When this appears, the system is 100% hung.
– Tee
Jul 5 '12 at 21:51
This message comes from the kernel; it will appear in dmesg (if it was logged recently enough), as this command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.
– Pierre Carrier
Jul 5 '12 at 22:19
The message will NOT appear in /var/log/dmesg, but may turn up when you run the dmesg command. The file is created during the boot process and generally only captures boot-time kernel messages (which would otherwise eventually scroll out of the kernel ring buffer). You could also install/enable sysstat and look at resource utilization as reported there. I'm suspecting disk I/O / iowait, likely related to swapping (sysstat will help in identifying this).
– Dr. Edward Morbius
Jul 6 '12 at 19:23
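As a rough illustration of checking the ring buffer, the sketch below pulls the task names out of any hung-task reports in kernel log text. The function name `hung_tasks` is hypothetical, and it assumes the message wording shown in the question.

```shell
# Print each distinct "name:pid" that appears in a hung-task report.
# Reads kernel log text on stdin.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9][0-9]* seconds.*/\1/p' | sort -u
}

# Typical use on a live system:
#   dmesg | hung_tasks
```

If this prints nothing after an incident, the messages have probably already scrolled out of the ring buffer or the machine was rebooted, and the syslog files under /var/log are the next place to look.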
@Dr.EdwardMorbius So how do we fix this? I'm having a major issue related to this with our Zimbra server which was running great in a production environment until recently.
– Lopsided
Mar 21 '14 at 14:45
@Lopsided: Sorry for the delay, I'm not here often. Briefly: you'll have to profile your Java process and find out why it's hanging. Garbage collection is one area I've had issues (and successes) in tuning. Look up JVM garbage collection ergonomics and see oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. I found increasing heap helped markedly.
– Dr. Edward Morbius
Apr 26 '14 at 10:40
sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5
These take effect immediately. To keep them across reboots, add the same two settings to /etc/sysctl.conf and load the file with:
sudo sysctl -p
Solved it for me.
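For context, vm.dirty_background_ratio is the percentage of available memory at which the kernel starts writing dirty pages out in the background, and vm.dirty_ratio is the percentage at which a process doing writes is made to write dirty data out itself. Lowering both means less dirty data can pile up before it is flushed. To judge whether this applies to a given machine, it can help to watch how much dirty data is pending. The helper below is a small sketch; the name `dirty_summary` is made up, and it reads /proc/meminfo-style text on stdin.

```shell
# Print the Dirty and Writeback counters (in kB) from
# /proc/meminfo-style text on stdin.
dirty_summary() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s kB\n", $1, $2 }'
}

# Typical use on a live system:
#   dirty_summary < /proc/meminfo
#   sysctl vm.dirty_ratio vm.dirty_background_ratio   # current thresholds
```

A Dirty value that climbs into the gigabytes shortly before the hang would be consistent with slow storage struggling to flush a large backlog.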
6
You should explain what each of those settings does.
– kasperd
Feb 21 '16 at 16:36
5
This fixed a similar issue I was having in a docker environment. I found an explanation here: blackmoreops.com/2014/09/22/…. "By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing..."
– Peter M
Feb 29 '16 at 16:35
I recently went through this error in one of our production clusters:
Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
..
On checking the sar logs, I found that I/O wait had increased during the same period.
Upon checking the hardware (physical disks), I saw that medium errors and other SCSI errors had been logged on one of the physical disks, which in turn was blocking I/O due to a lack of resources to allocate.
11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
So in our cluster this was due to a hardware error.
It would be good to check for a core file and, if an IPMI utility is available, to run the ipmiutil/ipmitool sel elist command to check for the issue.
Regards,
VT
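If sar history is not available, a coarse view of I/O wait can be read from /proc/stat, where the sixth field of the aggregate "cpu" line is the time spent in iowait. The sketch below (the function name `iowait_pct` is made up) prints that as a share of all CPU time since boot, so it is an average and will not show a short spike the way sar does.

```shell
# Print the percentage of CPU time spent in iowait since boot.
# Reads /proc/stat-style text on stdin and uses only the aggregate
# "cpu" line, where field 6 is the iowait counter.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user through steal
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }'
}

# Typical use on a live system:
#   iowait_pct < /proc/stat
```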
The "Yes, it could." answer: answered Jul 5 '12 at 21:43 by Pierre Carrier, edited Jul 5 '12 at 21:48.
edited Apr 21 at 19:32 by Glutanimate
answered Feb 21 '16 at 11:48 by Nick
I recently ran into this error on one of our production clusters:
Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
..
Checking the sar logs showed that I/O wait had increased at the same time.
Checking the hardware (physical disks) then showed medium errors and other SCSI errors logged against one of the physical disks. That disk was blocking I/O because there were no resources left to allocate.
11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
So in our cluster this was caused by a hardware error.
It would be worth checking for a core file and, if an IPMI utility is installed, checking the output of ipmiutil/ipmitool sel elist for hardware events.
Regards,
VT
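As a first triage step before checking sar and the hardware logs, it can help to see which tasks are being reported and how often. This is a minimal sketch. `hung_tasks` is a hypothetical helper, and it assumes each report sits on a single log line and that the task name contains no spaces.

```shell
#!/bin/sh
# hung_tasks is a hypothetical helper: it reads kernel log lines on stdin and
# prints how many hung-task reports each task (name:pid) produced.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than.*/\1/p' | sort | uniq -c
}

# Example using the log line from this answer.
printf '%s\n' \
  'Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.' \
  | hung_tasks
```

On a real system the input would come from `dmesg` or the syslog file (its path varies by distribution). If the same filesystem or storage-related tasks keep appearing, the follow-up checks are the ones named in the answer: sar for I/O wait and `ipmitool sel elist` for hardware events.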
answered Nov 12 '15 at 15:27 by Varun Thomas