torque reports error when posting job to client nodes

The system has two machines: one (macondo02) runs pbs_server and pbs_sched, and the other (macondo01) runs pbs_mom. I have confirmed that the host can see the guest:



$ pbsnodes -a
macondo01
state = free
np = 64
ntype = cluster
status = rectime=1403183300,varattr=,jobs=,state=free,netload=1102560564743,gres=,loadave=0.00,ncpus=64,physmem=131988228kb,availmem=263457400kb,totmem=266160896kb,idletime=705,nusers=6,nsessions=17,sessions=2817 59201 59937 18341 21924 27356 30089 31663 32133 32934 34374 7341 42678 58843 59605 59606 59741,uname=Linux macondo01 3.2.0-38-generic #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 x86_64,opsys=linux


However, whenever I submit a job through qsub, the job does not run, and I get the following error messages in the pbs_server log:



06/19/2014 23:00:19;0040;PBS_Server;Svr;macondo02.edu.au;Scheduler was sent the command new
06/19/2014 23:00:19;0008;PBS_Server;Job;54.macondo02.edu.au;Job Modified at request of Scheduler@macondo02.uq.edu.au
06/19/2014 23:00:19;0008;PBS_Server;Job;54.macondo02.edu.au;Job Run at request of Scheduler@macondo02.uq.edu.au
06/19/2014 23:00:19;0040;PBS_Server;Svr;macondo02.edu.au;Scheduler was sent the command recyc
06/19/2014 23:00:20;0010;PBS_Server;Job;54.macondo02.uq.edu.au;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=7680kb resources_used.vmem=23876kb resources_used.walltime=00:00:01
06/19/2014 23:00:24;000d;PBS_Server;Job;54.macondo02.uq.edu.au;Post job file processing error; job 54.macondo02.uq.edu.au on host macondo01/0
06/19/2014 23:00:24;0100;PBS_Server;Job;54.macondo02.uq.edu.au;dequeuing from batch, state COMPLETE
06/19/2014 23:00:24;0040;PBS_Server;Svr;macondo02.uq.edu.au;Scheduler was sent the command term


Apparently the failure occurs when the job is posted from the host (i.e. macondo02) to the guest (i.e. macondo01).
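One way to check which of the log lines above actually report an error is to decode the second field, which Torque writes as a hexadecimal event-type bitmask. The Python sketch below assumes the line layout `date time;event type;daemon;object class;object name;message` and assumes that bit 0x0001 is Torque's PBSEVENT_ERROR flag; `parse_line`, `error_records` and `LogRecord` are hypothetical names, not part of Torque.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: Torque's PBSEVENT_ERROR flag is bit 0x0001 of the event-type field.
PBSEVENT_ERROR = 0x0001


@dataclass
class LogRecord:
    timestamp: datetime
    event_type: int    # hex bitmask from the log, e.g. 0x000d
    daemon: str        # e.g. "PBS_Server"
    object_class: str  # e.g. "Job" or "Svr"
    object_name: str   # e.g. "54.macondo02.uq.edu.au"
    message: str


def parse_line(line: str) -> LogRecord:
    """Split one ';'-separated server log line into its six fields."""
    # maxsplit=5 keeps any ';' inside the free-text message intact
    stamp, event, daemon, obj_class, obj_name, message = line.strip().split(";", 5)
    return LogRecord(
        timestamp=datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        event_type=int(event, 16),
        daemon=daemon,
        object_class=obj_class,
        object_name=obj_name,
        message=message,
    )


def error_records(lines):
    """Yield the records whose event type has the (assumed) error bit set."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_line(line)
        if record.event_type & PBSEVENT_ERROR:
            yield record
```

Under that assumption, of the eight lines quoted above only the `000d` "Post job file processing error" line has the error bit set.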



I have several ideas:
1. I know it is necessary to set up passwordless SSH between the host and the guest using NFS. I have done that for my own regular user and used this user to submit the qsub job, but the error still occurs.
2. In the error log I saw another user called Scheduler@macondo02.uq.edu.au; however, I cannot find any information about this user in /etc/group, nor can I give it passwordless access to macondo01.



Any suggestions would be appreciated!










      Tags: torque pbs






      asked Jun 19 '14 at 13:26









      Chenming Zhang

          1 Answer
































          Try checking /var/log/syslog or the PBS log files on the machine where the job ran, which was host macondo01.



          You're looking for something like this, most likely an error while copying the job's log file:



          pbs_mom: LOG_ERROR::sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool...
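A minimal Python sketch for pulling such lines out of syslog or the MOM log. The regular expression is modeled only on the truncated message quoted above, so the exact wording in your Torque version is an assumption; `find_copy_errors` and `scan_file` are hypothetical names.

```python
import re

# Modeled on the pbs_mom message quoted above; other Torque versions may word it differently.
SYS_COPY_ERROR = re.compile(r"pbs_mom: LOG_ERROR::sys_copy, command '(?P<cmd>[^']*)")


def find_copy_errors(lines):
    """Return the copy command captured from every sys_copy error line."""
    commands = []
    for line in lines:
        match = SYS_COPY_ERROR.search(line)
        if match:
            commands.append(match.group("cmd"))
    return commands


def scan_file(path):
    """Read a log file (e.g. /var/log/syslog on macondo01) and return its copy errors."""
    # errors="replace" keeps one undecodable byte from aborting the scan
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_copy_errors(log)
```

For example, call `scan_file("/var/log/syslog")` on macondo01. The MOM's own logs are often under /var/spool/torque/mom_logs/, but that location depends on how Torque was installed.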


          You can find the actual log from that run in /var/spool/torque/undelivered/.
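To look at the most recent undelivered files first, a small Python sketch like the following may help. It assumes the default /var/spool/torque spool location named above (a Torque build with a different PBS_HOME would use another path), and `undelivered_files` is a hypothetical helper.

```python
from pathlib import Path

# Default location named in the answer; adjust if PBS_HOME differs on your system.
UNDELIVERED_DIR = Path("/var/spool/torque/undelivered")


def undelivered_files(directory=UNDELIVERED_DIR):
    """Return the regular files in the undelivered directory, newest first."""
    directory = Path(directory)
    if not directory.is_dir():
        return []  # nothing to report if the directory does not exist
    files = [p for p in directory.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```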



          The problem might be with the PBS_SCP command, which requires passwordless SSH access to the destination machine. Typically it runs a command like this:

          $PBS_SCP -rpB <path to source> <user>@<destination.host>:<path to destination>
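To test that precondition by hand, a rough Python sketch is below. It assumes the copy goes from the execution host (macondo01) back to the host where qsub was run (macondo02) as the job owner, which matches the `<user>@<destination.host>` form above; the `/usr/bin/scp` default is taken from the log line quoted earlier, and `stageout_command` and `passwordless_ssh_ok` are hypothetical names.

```python
import subprocess


def stageout_command(source, user, host, destination, scp="/usr/bin/scp"):
    """Build an argv list shaped like the `$PBS_SCP -rpB ...` call above."""
    # -r: copy recursively, -p: preserve times and modes, -B: batch mode (never prompt)
    return [scp, "-rpB", source, f"{user}@{host}:{destination}"]


def passwordless_ssh_ok(user, host, timeout=10):
    """Return True if `ssh user@host true` succeeds without any prompt."""
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",              # fail instead of asking for a password
        "-o", f"ConnectTimeout={timeout}",  # do not hang on an unreachable host
        f"{user}@{host}",
        "true",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh missing or the whole call timed out
    return result.returncode == 0
```

Run `passwordless_ssh_ok("youruser", "macondo02")` on macondo01 as the submitting user. A False result suggests that `scp -B` would fail the same way. Note that BatchMode also makes ssh fail when macondo02's host key is not yet in that user's known_hosts file, which can cause the same symptom.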






                edited Apr 22 '15 at 15:56

























                answered Apr 22 '15 at 14:42









                Tombart
