Impact of the bug
WMAgents
Describe the bug
I've seen this problem twice over the last 24h, so here is an issue to get it fixed.
JobStatusLite crashes when it tries to get an instance of the HTCondor schedd daemon object, failing with:
Unable to locate local daemon
Maybe it's just a coincidence, but it first happened on submit5 and now on submit4 (it might be worth checking with Maria whether there was a condor_schedd outage or something similar).
How to reproduce it
Not sure
Expected behavior
The component should try to recreate the schedd object. If that fails again, it should gracefully skip the current cycle and try again in the next component cycle.
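A minimal sketch of that expected behavior, in Python since WMCore is written in it. It assumes the schedd factory is passed in as a zero-argument callable (in the agent this would be htcondor.Schedd), so the retry logic can be exercised without the HTCondor bindings installed. The names get_schedd, track_cycle and the query callable are hypothetical, not the actual SimpleCondorPlugin API; the empty three-tuple only mirrors the (running, changes, completes) unpacking visible in BossAirAPI.track in the traceback below.

```python
import logging
import time


def get_schedd(schedd_factory, retries=1, delay=1.0, logger=None):
    """Try to create the schedd object, retrying `retries` extra times.

    schedd_factory: zero-argument callable (assumption: htcondor.Schedd
    in the real component). Returns the schedd, or None if all attempts fail.
    """
    logger = logger or logging.getLogger(__name__)
    for attempt in range(retries + 1):
        try:
            return schedd_factory()
        except Exception as exc:
            # The exact exception class raised for "Unable to locate local
            # daemon" depends on the bindings version, so catch broadly here.
            logger.warning("Attempt %d to get the schedd failed: %s", attempt + 1, exc)
            if attempt < retries:
                time.sleep(delay)
    return None


def track_cycle(jobs, schedd_factory, query, logger=None):
    """Hypothetical tracking cycle that skips instead of crashing.

    query: hypothetical callable (schedd, jobs) -> (running, changes, completes).
    """
    logger = logger or logging.getLogger(__name__)
    schedd = get_schedd(schedd_factory, retries=1, delay=0, logger=logger)
    if schedd is None:
        # Skip this cycle; the same jobs are tracked again next time.
        logger.error("Could not get the schedd object, skipping this cycle.")
        return [], [], []
    return query(schedd, jobs)
```

Whether empty results are safe to hand back depends on how BossAirAPI.track treats jobs that appear in none of the three lists, which this sketch does not verify; raising a dedicated exception that StatusPoller logs without terminating the worker threads is an alternative.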
Additional context and error message
Traceback from the logs:
2020-05-19 15:47:55,290:140664759113472:INFO:BossAirAPI:About to look for 1219 loadedJobs.
2020-05-19 15:47:55,315:140664759113472:ERROR:BossAirAPI:Unhandled exception while tracking jobs for plugin SimpleCondorPlugin!
Unable to locate local daemon
2020-05-19 15:47:55,434:140664759113472:ERROR:BaseWorkerThread:Error in worker algorithm (1):
Backtrace:
<WMCore.BossAir.StatusPoller.StatusPoller object at 0x7fef1131cf90> <@========== WMException Start ==========@>
Exception Class: BossAirException
Message: Unhandled exception while tracking jobs for plugin SimpleCondorPlugin!
Unable to locate local daemon
ModuleName : WMCore.BossAir.BossAirAPI
MethodName : track
ClassInstance : None
FileName : /data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py
ClassName : None
LineNumber : 486
ErrorNr : 0
Traceback:
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py", line 473, in track
    localRunning, localChanges, localCompletes = pluginInst.track(jobs=jobsToTrack[plugin])
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/Plugins/SimpleCondorPlugin.py", line 216, in track
    schedd = htcondor.Schedd()
<@---------- WMException End ----------@>
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/WorkerThreads/BaseWorkerThread.py", line 182, in __call__
    tSpent, results, _ = algorithmWithDBExceptionHandler(parameters)
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/Database/DBExceptionHandler.py", line 39, in wrapper
    return f(*args, **kwargs)
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/Utils/Timers.py", line 24, in wrapper
    res = func(*arg, **kw)
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/StatusPoller.py", line 68, in algorithm
    self.checkStatus()
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/StatusPoller.py", line 92, in checkStatus
    runningJobs = self.bossAir.track()
  File "/data/srv/wmagent/v1.3.0/sw/slc7_amd64_gcc630/cms/wmagent/1.3.0/lib/python2.7/site-packages/WMCore/BossAir/BossAirAPI.py", line 486, in track
    raise BossAirException(msg)
2020-05-19 15:47:55,435:140664759113472:INFO:Harness:>>>Terminating worker threads
Projects
Status
Waiting