can high load cause server hang and error “blocked for more than 120 seconds”?


I'm currently running a few VMs and bare-metal servers.
Java runs at high CPU usage, over 400% at times.
The server randomly hangs with an error on the console such as "task java blocked for more than 120 seconds"; the same message appears for kjournald and other tasks.

I cannot get dmesg output because, for some reason, this error is only written to the console, which I don't have direct access to since the server is remotely hosted. Therefore I cannot copy a full trace.

I changed the environment this runs on, even the physical server, and it's still happening.

I changed hung_task_timeout_secs to 0 in case this is a false positive, as per http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Technical_Notes/deployment.html .

Also, irqbalance is not installed; perhaps it would help?

This is Ubuntu 10.04 64-bit; the same issue occurs with the latest 2.6.38-15-server kernel and with 2.6.36.

Could CPU or memory issues, or running out of swap, cause this?

Here is the console message:



[58Z?Z1.5?Z840] INFO: task java:21547 blocked for more than 120 seconds.
[58Z?Z1.5?Z986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z06Z] INFO: task kjournald:190 blocked for more than 120 seconds.
[58Z841.5?Z336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z600] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z841.5?Z90?] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?3413] INFO: task java:21547 blocked for more than 120 seconds.
[58Z841.5?368Z] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?ZZ36] INFO: task kjournald:60 blocked for more than 120 seconds.
[58Z961.5?Z6Z5] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?31ZZ] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z961.5?3393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.









linux kernel

asked Jul 5 '12 at 21:41, edited Jul 5 '12 at 21:49
Tee




















3 Answers














Yes, it could.

What this means is fairly explicit: the kernel couldn't schedule the task for 120 seconds. This indicates resource starvation, often around disk access.

irqbalance might help, but it's not an obvious fix. Can you provide the lines surrounding this message in dmesg, in particular the stack trace that follows it?

Moreover, this is not a false positive. The message does not say that the task is hung forever, and what it does say is perfectly correct. That doesn't necessarily mean it's a problem for you, and you can decide to ignore it if you don't notice any user impact.

This cannot be caused by:

• a CPU issue (or rather, that would be an insanely improbable hardware failure),
• a memory issue (a hardware failure is very improbable and wouldn't happen multiple times; a lack of RAM is ruled out because a process would be OOM-killed),
• a lack of swap (OOM killer again).

To an extent, you might be able to blame this on a lack of memory, in the sense that depriving your system of data caching in RAM will cause more I/O. But it's not as straightforward as "running out of memory".
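To make the point about blocked tasks a bit more concrete: the hung-task message is printed for tasks that stay in uninterruptible sleep (state D in ps), which is usually a wait on I/O. Below is a minimal sketch for spotting such tasks while the machine still responds. The function name filter_d_state is made up for this example, and the usage line assumes the procps version of ps.

```shell
#!/bin/sh
# Hypothetical helper: keep only tasks in uninterruptible sleep (state "D").
# Expects "STATE PID COMMAND" lines on stdin and prints "PID COMMAND".
filter_d_state() {
    awk '$1 ~ /^D/ { $1 = ""; sub(/^ +/, ""); print }'
}

# Usage on a live system (assumes procps ps; "=" suppresses the headers):
#   ps -eo state=,pid=,comm= | filter_d_state
```

If java, kjournald and the flush threads all show up here at the same time, that would be consistent with everything waiting on the same disk.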































• There is nothing being recorded to /var/log/dmesg, so I just pasted what the console showed. When this appears, the system is 100% hung.
  – Tee Jul 5 '12 at 21:51

• This message comes from the kernel, so it will appear in dmesg (if it was logged recently enough), as this command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.
  – Pierre Carrier Jul 5 '12 at 22:19

• The message will NOT appear in /var/log/dmesg, but may turn up when you run the dmesg command. The file is created during the boot process and generally only captures boot-time kernel messages (which would otherwise eventually scroll out of the kernel ring buffer). You could also install/enable sysstat and look at resource utilization as reported there. I'm suspecting disk I/O / iowait, likely related to swapping (sysstat will help in identifying this).
  – Dr. Edward Morbius Jul 6 '12 at 19:23

• @Dr.EdwardMorbius So how do we fix this? I'm having a major issue related to this with our Zimbra server, which was running great in a production environment until recently.
  – Lopsided Mar 21 '14 at 14:45

• @Lopsided: Sorry for the delay, I'm not here often. Briefly: you'll have to profile your Java process and find out why it's hanging. Garbage collection is one area where I've had issues (and successes) in tuning. Look up JVM garbage collection ergonomics and see oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. I found increasing the heap helped markedly.
  – Dr. Edward Morbius Apr 26 '14 at 10:40
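As a rough sketch of what the comments above suggest, this pulls the hung-task reports out of kernel log text so you can see which tasks were affected. It assumes the standard "INFO: task ... blocked for more than N seconds" wording; the helper name hung_tasks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct tasks named in hung-task reports.
# Reads kernel log text on stdin and prints one "name:pid" per line.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds.*/\1/p' | sort -u
}

# Usage on a live system, reading the kernel ring buffer:
#   dmesg | hung_tasks
```

For the I/O side, sar -u from sysstat reports a %iowait column once data collection is enabled.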
































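This solved it for me:

sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5

Settings changed with sysctl -w take effect immediately but are lost on reboot. To make them permanent, add the same two settings to /etc/sysctl.conf and load that file with:

sudo sysctl -p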


























• You should explain what each of those settings does.
  – kasperd Feb 21 '16 at 16:36

• This fixed a similar issue I was having in a Docker environment. I found an explanation here: blackmoreops.com/2014/09/22/…. "By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing..."
  – Peter M Feb 29 '16 at 16:35
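For a sense of scale, both settings are percentages of memory. vm.dirty_background_ratio is the point at which the kernel starts writing dirty pages out in the background, and vm.dirty_ratio is the point at which a process doing writes is itself made to write data out. The sketch below turns a percentage into an approximate size. The function name dirty_threshold_kb is made up for this example, and using MemTotal is a simplifying assumption: the kernel bases the calculation on available (free plus reclaimable) memory, so the real thresholds are somewhat lower.

```shell
#!/bin/sh
# Hypothetical helper: approximate the dirty-page threshold in kB.
# $1 = memory size in kB, $2 = ratio in percent (integer arithmetic).
dirty_threshold_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Usage on a live system (MemTotal is only an upper-bound assumption):
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   dirty_threshold_kb "$mem_kb" "$(cat /proc/sys/vm/dirty_ratio)"
```

On a machine with 8 GiB of RAM, a ratio of 10 works out to roughly 800 MB of dirty data before writers are throttled, and a ratio of 5 to roughly 400 MB before background writeback starts. Lower values mean smaller, more frequent flushes, which a slow disk has a better chance of completing in time.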
































I recently went through this error in one of our production clusters:

Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.




          ..



On checking the sar logs, I found that I/O wait had increased at the same time.
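If sar history is not available, a similar I/O wait figure can be estimated by hand from two samples of the aggregate cpu line in /proc/stat. This is only a sketch under that assumption; the function name iowait_percent is made up, and it takes iowait as the fifth counter on the line and sums the first eight counters (user through steal) for the total.

```shell
#!/bin/sh
# Hypothetical helper: percentage of CPU time spent in iowait between two
# samples of the aggregate "cpu" line from /proc/stat.
# $1 = earlier sample, $2 = later sample (each a full "cpu ..." line).
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6          # iowait counter
        }
        END {
            d = t[2] - t[1]
            if (d > 0) printf "%d\n", (w[2] - w[1]) * 100 / d
            else print 0
        }'
}

# Usage on a live system:
#   s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$s1" "$s2"
```

A value that stays high while the hung-task messages appear would point in the same direction as the sar data.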



Upon checking the hardware (the physical disks), I saw that medium errors and other SCSI errors had been logged on one of the physical disks, which in turn was blocking I/O due to a lack of resources to allocate.




11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x




So in our cluster this was due to a hardware error.

It would therefore be good to check for a core file and, if an IPMI utility is installed, to check the output of the ipmiutil/ipmitool sel elist command for hardware events.



          Regards,
          VT






Answer "Yes, it could.": answered Jul 5 '12 at 21:43, edited Jul 5 '12 at 21:48
Pierre Carrier












            • There is nothing being recorded to /var/log/dmesg so I just pasted what the Console showed.. when this appears the system is 100% hung.

              – Tee
              Jul 5 '12 at 21:51











            • This message comes from the kernel, it will appear in dmesg (if it was logged recently enough) as this command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.

              – Pierre Carrier
              Jul 5 '12 at 22:19












            • The message will NOT appear in /var/log/dmesg, but may turn up when you run the dmesg command. The file is created during the boot process and generally only captures boot-time kernel messages (which would otherwise eventually scroll out of the kernel ring buffer. You could also install/enable sysstat and look at resource utilization as reported there. I'm suspecting disk I/O / iowait, likely related to swapping (sysstat will help in identifying this).

              – Dr. Edward Morbius
              Jul 6 '12 at 19:23












            • @Dr.EdwardMorbius So how do we fix this? I'm having a major issue related to this with our Zimbra server which was running great in a production environment until recently.

              – Lopsided
              Mar 21 '14 at 14:45











            • @Lopsided: Sorry for the delay, I'm not here often. Briefly: you'll have to profile your Java process and find out why it's hanging. Garbage collection is one area I've had issues (and successes) in tuning. Look up JVM garbage collection ergodymics and see oracle.com/technetwork/java/javase/gc-tuning-6-140523.html I found increasing heap helped markedly.

              – Dr. Edward Morbius
              Apr 26 '14 at 10:40

















            • There is nothing being recorded to /var/log/dmesg so I just pasted what the Console showed.. when this appears the system is 100% hung.

              – Tee
              Jul 5 '12 at 21:51











            • This message comes from the kernel, it will appear in dmesg (if it was logged recently enough) as this command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.

              – Pierre Carrier
              Jul 5 '12 at 22:19












            6














            sudo sysctl -w vm.dirty_ratio=10
            sudo sysctl -w vm.dirty_background_ratio=5


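            These take effect immediately. To keep them after a reboot, also add both settings to /etc/sysctl.conf, then reload that file with:



            sudo sysctl -p


            This solved it for me.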
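
For readers wondering what the two settings do: per the kernel's sysctl documentation, `vm.dirty_background_ratio` is the percentage of available memory filled with dirty pages at which the background flusher threads start writing them out, and `vm.dirty_ratio` is the percentage at which a process generating writes is itself made to write out dirty data. The sketch below shows one way to inspect and persist the values from this answer. The function names and the default file name are hypothetical, and it assumes the distribution reads drop-in files from /etc/sysctl.d.

```shell
# Show the current writeback thresholds (percent of available memory).
show_dirty_ratios() {
    sysctl vm.dirty_ratio vm.dirty_background_ratio
}

# Write both settings to a sysctl config file so they survive a reboot.
# The default path is an assumed file name, not something the answer specifies.
write_dirty_conf() {
    conf="${1:-/etc/sysctl.d/90-dirty-ratios.conf}"
    printf '%s\n' \
        'vm.dirty_ratio = 10' \
        'vm.dirty_background_ratio = 5' > "$conf"
}
```

Run as root, `write_dirty_conf` followed by `sysctl -p /etc/sysctl.d/90-dirty-ratios.conf` would load the file right away. Lower ratios mean writeback starts earlier and in smaller batches, which is probably why this helps on slow storage.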


























            • 6





              You should explain what each of those settings does.

              – kasperd
              Feb 21 '16 at 16:36






            • 5





              This fixed a similar issue I was having in a Docker environment. I found an explanation here: blackmoreops.com/2014/09/22/…. "By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing..."

              – Peter M
              Feb 29 '16 at 16:35















            edited Apr 21 at 19:32









            Glutanimate











            answered Feb 21 '16 at 11:48









            Nick







            2














            I recently ran into this error in one of our production clusters:


            Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
            Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
            Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            ..


            Checking the sar logs, I found that I/O wait had increased during the same period.

            Checking the hardware (physical disks), I saw that medium errors and other SCSI errors had been logged on one of the physical disks, which in turn was blocking I/O due to a lack of resources to allocate.


            11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
            11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
            11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x


            So in our cluster, this was due to a hardware error.

            It would be good to check for a core file and, if an IPMI utility is installed, to run the ipmiutil/ipmitool sel elist command to check for the issue.



            Regards,
            VT
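
A rough shell sketch of the hardware checks described above. The helper name `disk_error_lines` is hypothetical, the grep patterns are assumptions based on common kernel messages rather than an exhaustive list, and the saved sysstat file path is an assumption that differs between distributions.

```shell
# Pick likely disk-error lines out of kernel log text read from stdin.
# The patterns are illustrative assumptions, not a complete list.
disk_error_lines() {
    grep -iE 'medium error|i/o error|unrecovered read error'
}

# Typical use (left as comments so the block has no side effects):
#   dmesg | disk_error_lines
#   ipmitool sel elist             # hardware event log, if ipmitool is installed
#   sar -u -f /var/log/sa/sa11     # %iowait from a saved sysstat file; path is an assumption
```

If the error lines and the rise in %iowait line up in time with the "blocked for more than 120 seconds" messages, a failing disk is a plausible cause, as it was in this answer.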






                answered Nov 12 '15 at 15:27









                Varun Thomas


























