can high load cause server hang and error “blocked for more than 120 seconds”?












17















I'm currently running a few VMs and bare-metal servers.
Java runs at high CPU usage, over 400% at times.
The server randomly hangs with the console error "java - blocked for more than 120 seconds"; kjournald and other tasks are reported as well.

I cannot get dmesg output because, for some reason, this error is only written to the console, which I don't have access to since the server is remotely hosted. I therefore cannot copy a full trace.

I changed the environment this runs on, even the physical server, and it's still happening.

I set hung_task_timeout_secs to 0 in case this is a false positive, as per http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Technical_Notes/deployment.html .

Also, irqbalance is not installed; perhaps it would help?

This is Ubuntu 10.04 64-bit. The same issue occurs with the latest 2.6.38-15-server and 2.6.36 kernels.

Could CPU or memory issues, or having no swap left, cause this issue?

Here is the console message:



[58Z?Z1.5?Z840] INFO: task java:21547 blocked for more than 120 seconds.
[58Z?Z1.5?Z986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z06Z] INFO: task kjournald:190 blocked for more than 120 seconds.
[58Z841.5?Z336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?Z600] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z841.5?Z90?] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z841.5?3413] INFO: task java:21547 blocked for more than 120 seconds.
[58Z841.5?368Z] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?ZZ36] INFO: task kjournald:60 blocked for more than 120 seconds.
[58Z961.5?Z6Z5] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58Z961.5?31ZZ] INFO: task flush-202:0:709 blocked for more than 120 seconds.
[58Z961.5?3393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.















Tags: linux, kernel














asked Jul 5 '12 at 21:41 by Tee, edited Jul 5 '12 at 21:49




















          3 Answers


















          16














Yes, it could.

What this means is fairly explicit: the task was blocked in uninterruptible sleep and did not get to run for more than 120 seconds. This indicates resource starvation, often around disk access.

irqbalance might help, but that isn't obvious. Can you provide the lines surrounding this message in dmesg, in particular the stack trace that follows it?

Moreover, this is not a false positive. The message does not say that the task is hung forever, and what it does say is perfectly correct. That doesn't mean it's a problem for you, and you can decide to ignore it if you don't notice any user impact.

This cannot be caused by:

• a CPU issue (or rather, that would be an insanely improbable hardware failure),
• a memory issue (a hardware failure is very improbable and wouldn't happen multiple times; a lack of RAM is ruled out because a process would be OOM-killed),
• a lack of swap (the OOM killer again).

To an extent, you might be able to blame this on a lack of memory, in the sense that depriving your system of RAM for data caching will cause more I/O. But it's not as straightforward as "running out of memory".
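To see which tasks are stuck while the problem is building up, one option is to list processes in uninterruptible sleep (state D), which is the state these hung-task reports refer to. This is a minimal sketch, assuming a procps-style ps; the function name d_state_tasks is made up for this example.

```shell
# Keep only tasks in uninterruptible sleep (state "D").
# Reads "STATE PID COMMAND" lines on stdin, so the filter can be tried
# on saved output as well as on live ps output.
d_state_tasks() {
    awk '$1 ~ /^D/'
}

# Live usage (the empty "=" headers suppress the header line):
#   ps -e -o state= -o pid= -o comm= | d_state_tasks
```

A task that shows up here repeatedly over a couple of minutes is a likely candidate for the next "blocked for more than 120 seconds" message.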































• Nothing is being recorded in /var/log/dmesg, so I just pasted what the console showed. When this appears, the system is 100% hung.

  – Tee
  Jul 5 '12 at 21:51

• This message comes from the kernel, so it will appear in dmesg (if it was logged recently enough), as that command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.

  – Pierre Carrier
  Jul 5 '12 at 22:19

• The message will NOT appear in /var/log/dmesg, but may turn up when you run the dmesg command. The file is created during the boot process and generally only captures boot-time kernel messages (which would otherwise eventually scroll out of the kernel ring buffer). You could also install/enable sysstat and look at resource utilization as reported there. I suspect disk I/O / iowait, likely related to swapping (sysstat will help in identifying this).

  – Dr. Edward Morbius
  Jul 6 '12 at 19:23

• @Dr.EdwardMorbius So how do we fix this? I'm having a major issue related to this with our Zimbra server, which was running great in a production environment until recently.

  – Lopsided
  Mar 21 '14 at 14:45

• @Lopsided: Sorry for the delay, I'm not here often. Briefly: you'll have to profile your Java process and find out why it's hanging. Garbage collection is one area where I've had issues (and successes) in tuning. Look up JVM garbage collection ergonomics and see oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. I found that increasing the heap helped markedly.

  – Dr. Edward Morbius
  Apr 26 '14 at 10:40
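As the comments on dmesg suggest, the hung-task reports can be pulled out of whatever kernel log text is available. The sketch below counts the reports per task name. The function name hung_task_summary is hypothetical, and where the kernel log ends up on disk depends on the syslog setup.

```shell
# Count "blocked for more than N seconds" reports per task name.
# Reads kernel log text on stdin and prints "COUNT TASKNAME" lines.
hung_task_summary() {
    # Strip everything except the task name; the trailing PID is dropped.
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than.*/\1/p' |
        LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }'
}

# Live usage against the kernel ring buffer:
#   dmesg | hung_task_summary
```

If the same few tasks dominate the counts (for example the journal and flush threads), that points toward storage writeback rather than the application itself.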


















          6














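sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5

These settings take effect immediately. To keep them across reboots, also add them to /etc/sysctl.conf and load that file with:

sudo sysctl -p

This solved it for me.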
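For context, vm.dirty_background_ratio is the percentage of available memory at which the kernel starts writing dirty pages out in the background, and vm.dirty_ratio is the percentage at which a process doing writes is made to write dirty data out itself. The sketch below puts both values into a configuration fragment so they can persist. The function name and the file path in the usage note are assumptions for illustration, not part of the original answer.

```shell
# Write the two writeback settings to a sysctl configuration fragment.
# $1 is the target file (hypothetical path; pick one that suits the system).
write_dirty_settings() {
    cat > "$1" <<'EOF'
# Start background writeback once dirty pages reach 5% of available memory.
vm.dirty_background_ratio = 5
# Make writing processes flush data themselves once dirty pages reach 10%.
vm.dirty_ratio = 10
EOF
}
```

Run as root, something like `write_dirty_settings /etc/sysctl.d/60-dirty-writeback.conf` followed by `sysctl -p /etc/sysctl.d/60-dirty-writeback.conf` would apply the values from that file.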


























• You should explain what each of those settings does.

  – kasperd
  Feb 21 '16 at 16:36

• This fixed a similar issue I was having in a Docker environment. I found an explanation here: blackmoreops.com/2014/09/22/…. "By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing..."

  – Peter M
  Feb 29 '16 at 16:35


















          2














I recently ran into this error on one of our production clusters:

Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
..

On checking the sar logs, I found that I/O wait had increased during the same period.

On checking the hardware (physical disks), I saw that medium errors and other SCSI errors had been logged on one of the physical disks, which in turn was blocking I/O because resources could not be allocated.

11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x

So in our cluster this was due to a hardware error.

It would be good to check for a core file and, if an IPMI utility is installed, to run the ipmiutil/ipmitool sel elist command to check for the issue.

Regards,
VT
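The sar check described above can be narrowed down to the samples where I/O wait was high. This is a sketch only: the function name high_iowait and the default threshold of 20 are arbitrary choices for illustration, and it assumes the `sar -u` text output with a `%iowait` column in its header row.

```shell
# Print the timestamp and %iowait of every sample above a threshold.
# $1 is the threshold in percent (default 20, an arbitrary example value).
# Reads `sar -u` text on stdin and finds the column from the header row,
# because its position depends on the sysstat version and time format.
high_iowait() {
    awk -v limit="${1:-20}" '
        /%iowait/ { for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i; next }
        col && $1 != "Average:" && $col + 0 > limit { print $1, $col }
    '
}

# Live usage (LC_ALL=C keeps the number format predictable):
#   LC_ALL=C sar -u | high_iowait 20
```

Comparing the printed timestamps with those of the kernel messages shows whether the hung-task reports line up with I/O wait spikes.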














answered Jul 5 '12 at 21:43 by Pierre Carrier (the answer beginning "Yes, it could."), edited Jul 5 '12 at 21:48












            • There is nothing being recorded to /var/log/dmesg so I just pasted what the Console showed.. when this appears the system is 100% hung.

              – Tee
              Jul 5 '12 at 21:51











            • This message comes from the kernel, it will appear in dmesg (if it was logged recently enough) as this command prints the kernel logging ring buffer. Hopefully your syslog setup will also log it somewhere in /var/log, but I couldn't know where.

              – Pierre Carrier
              Jul 5 '12 at 22:19












            • The message will NOT appear in /var/log/dmesg, but may turn up when you run the dmesg command. The file is created during the boot process and generally only captures boot-time kernel messages (which would otherwise eventually scroll out of the kernel ring buffer. You could also install/enable sysstat and look at resource utilization as reported there. I'm suspecting disk I/O / iowait, likely related to swapping (sysstat will help in identifying this).

              – Dr. Edward Morbius
              Jul 6 '12 at 19:23












            • @Dr.EdwardMorbius So how do we fix this? I'm having a major issue related to this with our Zimbra server which was running great in a production environment until recently.

              – Lopsided
              Mar 21 '14 at 14:45











            • @Lopsided: Sorry for the delay, I'm not here often. Briefly: you'll have to profile your Java process and find out why it's hanging. Garbage collection is one area I've had issues (and successes) in tuning. Look up JVM garbage collection ergodymics and see oracle.com/technetwork/java/javase/gc-tuning-6-140523.html I found increasing heap helped markedly.

              – Dr. Edward Morbius
              Apr 26 '14 at 10:40






























            6














            sudo sysctl -w vm.dirty_ratio=10
            sudo sysctl -w vm.dirty_background_ratio=5


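            The -w changes take effect immediately but are lost on reboot. To make them persistent, add both settings to /etc/sysctl.conf and reload it with:



            sudo sysctl -p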


            solved it for me....
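
For context on what the two settings control, going by the kernel's documentation for the vm sysctls: vm.dirty_background_ratio is the percentage of available memory at which the background flusher threads start writing dirty pages out, and vm.dirty_ratio is the percentage at which a process doing writes is itself made to write dirty data out, and so blocks. The sketch below is one way to look at the current state before and after changing them. The function name dirty_report is hypothetical, and it assumes /proc/meminfo and /proc/sys/vm are readable.

```shell
#!/bin/sh
# Hypothetical helper: show how much dirty data is waiting for writeback.
# $1: file in /proc/meminfo format (defaults to /proc/meminfo).
dirty_report() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Current thresholds, as percentages:
#   cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio
# Dirty pages and in-flight writeback right now:
#   dirty_report
```

If Dirty stays large for long stretches while the disk is busy, lowering the ratios as above makes writeback start earlier and in smaller batches.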


























            • 6





              You should explain what each of those settings does.

              – kasperd
              Feb 21 '16 at 16:36






            • 5





              This fixed a similar issue I was having in a docker environment. I found an explanation here: blackmoreops.com/2014/09/22/…. "By default Linux uses up to 40% of the available memory for file system caching. After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous. For flushing out this data to disk this there is a time limit of 120 seconds by default. In the case here the IO subsystem is not fast enough to flush the data withing..."

              – Peter M
              Feb 29 '16 at 16:35























            edited Apr 21 at 19:32
            Glutanimate

            answered Feb 21 '16 at 11:48
            Nick


















            2














            I recently ran into this error on one of our production clusters:

            Nov 11 14:56:41 xxx kernel: INFO: task xfsalloc/3:2393 blocked for more than 120 seconds.
            Nov 11 14:56:41 Xxxx kernel: Not tainted 2.6.32-504.8.1.el6.x86_64 #1
            Nov 11 14:56:41 xxx: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.




            ..



            The sar logs showed that I/O wait had increased at the same time.

            Checking the hardware, I found medium errors and other SCSI errors logged against one of the physical disks, which in turn was blocking I/O because resources could not be allocated.




            11/11/15 19:52:40: terminatated pRdm 607b8000 flags=0 TimeOutC=0 RetryC=0 Request c1173100 Reply 60e06040 iocStatus 0048 retryC 0 devId:3 devFlags=f1482005 iocLogInfo:31140000
            11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x
            11/11/15 19:52:40: DM_ProcessDevWaitQueue: Task mgmt in process devId=x




            So in our cluster this was caused by a hardware error.

            It would be worth checking for a core file and, if an IPMI utility is installed, running the ipmiutil/ipmitool sel elist command to look for hardware events.



            Regards,
            VT
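
As a rough companion to the log excerpt above, here is a sketch that reports how the hung-task watchdog is configured, using the same /proc file the kernel message names. The function name hung_task_timeout is invented for this example, and the fallback branch assumes that a missing file means the kernel was built without hung-task detection.

```shell
#!/bin/sh
# Hypothetical helper: report the hung-task watchdog timeout.
# $1: file holding the timeout (defaults to the real sysctl file).
hung_task_timeout() {
    f="${1:-/proc/sys/kernel/hung_task_timeout_secs}"
    if [ -r "$f" ]; then
        secs=$(cat "$f")
        if [ "$secs" -eq 0 ]; then
            # 0 is the value the kernel message says disables the warning.
            echo "hung-task warnings disabled"
        else
            echo "warn after ${secs}s"
        fi
    else
        echo "hung-task detection not available"
    fi
}

# Hardware event log check mentioned above (needs root and a BMC):
#   ipmitool sel elist
```

Setting the timeout to 0 only silences the message; it does nothing about the stalled I/O that triggers it.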











































            answered Nov 12 '15 at 15:27
            Varun Thomas

































