
atlas-connect-l - Re: [Atlas-connect-l] login.usatlas.org down?

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: Matt LeBlanc <matt.leblanc AT cern.ch>
  • To: Lincoln Bryant <lincolnb AT uchicago.edu>
  • Cc: Matthew Epland <matthew.epland AT cern.ch>, "atlas-connect-l AT lists.bnl.gov" <atlas-connect-l AT lists.bnl.gov>
  • Subject: Re: [Atlas-connect-l] login.usatlas.org down?
  • Date: Mon, 7 Jan 2019 17:33:00 +0100

Hi Lincoln,

It looks like I may have broken the condor daemon?

login:~ mleblanc$ condor_q
Error:

Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.

Cheers,
Matt

On Mon, Jan 7, 2019 at 5:10 PM Matt LeBlanc <matt.leblanc AT cern.ch> wrote:
Hi Lincoln,

No problem! I'll empty it, and start a few at a time in a little while from now.

Cheers,
Matt
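The "few at a time" approach can be sketched roughly as below. This is a minimal illustration, not what was actually run on login.usatlas.org: `submit_throttled`, `queue_size`, `submit`, and `wait` are hypothetical names, and the two callbacks stand in for however the queue is really counted and jobs are really submitted (for example by wrapping `condor_q` and `condor_submit`). The default cap of 10,000 is an assumption taken from the "<20k or <10k" figure in the quoted request.

```python
import time
from typing import Callable, Iterable, List


def submit_throttled(
    jobs: Iterable[str],
    queue_size: Callable[[], int],
    submit: Callable[[List[str]], None],
    max_queued: int = 10_000,
    batch_size: int = 500,
    wait: Callable[[], None] = lambda: time.sleep(60),
) -> int:
    """Submit jobs in batches without letting the queue exceed max_queued.

    queue_size and submit are hypothetical callbacks supplied by the caller;
    nothing here talks to HTCondor directly.
    """
    pending = list(jobs)
    submitted = 0
    while pending:
        # How many more jobs fit under the cap right now?
        room = max_queued - queue_size()
        if room <= 0:
            # Queue is full: pause until some jobs have left it.
            wait()
            continue
        n = min(batch_size, room, len(pending))
        batch, pending = pending[:n], pending[n:]
        submit(batch)
        submitted += n
    return submitted
```

Keeping the queue check and the submission behind callbacks means the throttling logic can be tried out against a fake queue before it is pointed at a real scheduler.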

On Mon, Jan 7, 2019 at 5:06 PM Lincoln Bryant <lincolnb AT uchicago.edu> wrote:
Matt,

I found at least one problem that is fixed now. However there are a
_lot_ of jobs in queue (70k). Any way you could temporarily remove some
of the queued jobs and reduce it to say <20k or <10k? I think our
glidein system is timing out trying to query the condor queue for your
jobs.

Thanks,
Lincoln

On Mon, 2019-01-07 at 14:21 +0000, Lincoln Bryant wrote:
> Hi Matt,
>
> Will take a look.
>
> --Lincoln
>
> On 1/7/2019 4:24 AM, Matt LeBlanc wrote:
> > Hi Lincoln,
> >
> > All of my open sessions crashed a few minutes ago similarly to how
> > they broke on Saturday. I am able to log in already, though my
> > condor jobs appear to be stuck idle in the queue.
> >
> > Cheers,
> > Matt
> >
> > On Sat, Jan 5, 2019 at 8:08 AM Lincoln Bryant <lincolnb AT uchicago.edu> wrote:
> > > On 1/4/2019 4:36 PM, Matthew Epland wrote:
> > > > Hello,
> > > >
> > > > Is login.usatlas.org down? My ssh connection just broke and I
> > > > cannot reconnect.
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > Hi Matt,
> > >
> > > We had a hypervisor issue at UChicago. You should be able to log in
> > > again. I am working on restoring Condor services now.
> > >
> > > --Lincoln
> > >
> > >
> > > _______________________________________________
> > > Atlas-connect-l mailing list
> > > Atlas-connect-l AT lists.bnl.gov
> > > https://lists.bnl.gov/mailman/listinfo/atlas-connect-l
> > >
> >
> >
> > _______________________________________________
> > Atlas-connect-l mailing list
> > Atlas-connect-l AT lists.bnl.gov
> > https://lists.bnl.gov/mailman/listinfo/atlas-connect-l
> > _______________________________________________
> > ATLAS Midwest Tier2 mailing list
> > http://mwt2.usatlasfacility.org


--
Matt LeBlanc
University of Arizona
Office: 40/1-C11 (CERN)
https://cern.ch/mleblanc/




