JobStatusLite crashing while tracking jobs - unable to locate daemon #9703

@amaltaro

Impact of the bug
WMAgents

Describe the bug
I've seen this problem twice over the last 24h, so here is an issue to get it fixed.
JobStatusLite crashes while instantiating the htcondor schedd daemon object:

Unable to locate local daemon

Maybe it's just a coincidence, but it first happened on submit5 and now on submit4. It might be worth checking with Maria whether there was a condor_schedd outage.

How to reproduce it
Not sure

Expected behavior
The component should try to recreate the schedd object. If that fails again, it should gracefully skip the cycle and try again in the next component execution.
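
The behavior described above could look roughly like the sketch below. It is not the actual WMCore fix: `get_schedd_with_retry`, `track_cycle`, and their parameters are hypothetical names, and it assumes the failure surfaces as a `RuntimeError` (which matches the "Unable to locate local daemon" message here, but may differ between htcondor binding versions). The schedd constructor is passed in as a callable so the sketch does not depend on htcondor being installed.

```python
import logging
import time


def get_schedd_with_retry(schedd_factory, retries=1, delay=5.0, sleep=time.sleep):
    """Call schedd_factory up to retries + 1 times; return None if all attempts fail.

    ASSUMPTION: a failed lookup raises RuntimeError.
    """
    for attempt in range(retries + 1):
        try:
            return schedd_factory()
        except RuntimeError as ex:
            logging.warning("Attempt %d to locate the schedd failed: %s", attempt + 1, ex)
            if attempt < retries:
                # Give the daemon a moment before recreating the object
                sleep(delay)
    return None


def track_cycle(schedd_factory, track_jobs, **retry_kwargs):
    """Run one tracking cycle, or skip it (return None) if no schedd is available."""
    schedd = get_schedd_with_retry(schedd_factory, **retry_kwargs)
    if schedd is None:
        logging.error("Schedd unavailable; skipping this tracking cycle.")
        return None
    return track_jobs(schedd)
```

In the plugin, `schedd_factory` would presumably be `htcondor.Schedd`, and the caller would treat a `None` result as "nothing tracked this cycle" instead of raising, so the worker thread stays alive until the next poll.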

Additional context and error message
Traceback from the logs:

2020-05-19 15:47:55,290:140664759113472:INFO:BossAirAPI:About to look for 1219 loadedJobs.
2020-05-19 15:47:55,315:140664759113472:ERROR:BossAirAPI:Unhandled exception while tracking jobs for plugin SimpleCondorPlugin!
Unable to locate local daemon
2020-05-19 15:47:55,434:140664759113472:ERROR:BaseWorkerThread:Error in worker algorithm (1):
Backtrace:
  <WMCore.BossAir.StatusPoller.StatusPoller object at 0x7fef1131cf90> <@========== WMException Start ==========@>
Exception Class: BossAirException
Message: Unhandled exception while tracking jobs for plugin SimpleCondorPlugin!
Unable to locate local daemon
        ModuleName : WMCore.BossAir.BossAirAPI
        MethodName : track
        ClassInstance : None
        FileName : /data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py
        ClassName : None
        LineNumber : 486
        ErrorNr : 0

Traceback: 
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py", line 473, in track
    localRunning, localChanges, localCompletes = pluginInst.track(jobs=jobsToTrack[plugin])

  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/Plugins/SimpleCondorPlugin.py", line 216, in track
    schedd = htcondor.Schedd()

<@---------- WMException End ----------@>  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/WorkerThreads/BaseWorkerThread.py", line 182, in __call__
    tSpent, results, _ = algorithmWithDBExceptionHandler(parameters)

  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/Database/DBExceptionHandler.py", line 39, in wrapper
    return f(*args, **kwargs)
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/Utils/Timers.py", line 24, in wrapper
    res = func(*arg, **kw)
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/StatusPoller.py", line 68, in algorithm
    self.checkStatus()
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/StatusPoller.py", line 92, in checkStatus
    runningJobs = self.bossAir.track()
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py", line 486, in track
    raise BossAirException(msg)

2020-05-19 15:47:55,435:140664759113472:INFO:Harness:>>>Terminating worker threads

Status

Waiting
