atlas-connect-l AT lists.bnl.gov
Subject: Atlas-connect-l mailing list
List archive
- From: Matthew Epland <matthew.epland AT cern.ch>
- To: Lincoln Bryant <lincolnb AT uchicago.edu>
- Cc: "atlas-connect-l AT lists.bnl.gov" <atlas-connect-l AT lists.bnl.gov>
- Subject: Re: [Atlas-connect-l] login.usatlas.org down?
- Date: Sun, 13 Jan 2019 11:25:16 -0500
Hi guys,
It looks like something broke with the Condor system again this morning; everything went offline around 8 AM CST according to Grafana. I can't connect to the cluster when running condor_q.
Yesterday the login machine was acting strangely as well. It hung when trying to source ~/.bashrc at login, and I couldn't rsync or scp files from remote machines, though I could rsync them out from a session on login. Both issues appear to have sorted themselves out, however.
Thanks,
Matt Epland
On Mon, Jan 7, 2019 at 11:36 AM Lincoln Bryant <lincolnb AT uchicago.edu> wrote:
Ah, no that's me restarting Condor.
Anyhow, I found a firewall issue that is now resolved.
Can you try submitting again?
Thanks,
Lincoln
On Mon, 2019-01-07 at 17:33 +0100, Matt LeBlanc wrote:
> Hi Lincoln,
>
> It looks like I may have broken the condor daemon?
>
> login:~ mleblanc$ condor_q
> Error:
>
> Extra Info: You probably saw this error because the condor_schedd is not
> running on the machine you are trying to query. If the condor_schedd is not
> running, the Condor system will not be able to find an address and port to
> connect to and satisfy this request. Please make sure the Condor daemons are
> running and try again.
>
> Extra Info: If the condor_schedd is running on the machine you are trying to
> query and you still see the error, the most likely cause is that you have
> setup a personal Condor, you have not defined SCHEDD_NAME in your
> condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
> setting. You must define either or both of those settings in your config
> file, or you must use the -name option to condor_q. Please see the Condor
> manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.
>
> Cheers,
> Matt
>
> On Mon, Jan 7, 2019 at 5:10 PM Matt LeBlanc <matt.leblanc AT cern.ch> wrote:
> > Hi Lincoln,
> >
> > No problem! I'll empty it, and start a few at a time in a little
> > while from now.
> >
> > Cheers,
> > Matt
> >
> > On Mon, Jan 7, 2019 at 5:06 PM Lincoln Bryant <lincolnb AT uchicago.edu> wrote:
> > > Matt,
> > >
> > > I found at least one problem that is fixed now. However, there are a
> > > _lot_ of jobs in the queue (70k). Any way you could temporarily remove
> > > some of the queued jobs and reduce it to, say, <20k or <10k? I think our
> > > glidein system is timing out trying to query the condor queue for your
> > > jobs.
> > >
> > > Thanks,
> > > Lincoln
> > >
> > > On Mon, 2019-01-07 at 14:21 +0000, Lincoln Bryant wrote:
> > > > Hi Matt,
> > > >
> > > > Will take a look.
> > > >
> > > > --Lincoln
> > > >
> > > > On 1/7/2019 4:24 AM, Matt LeBlanc wrote:
> > > > > Hi Lincoln,
> > > > >
> > > > > > All of my open sessions crashed a few minutes ago similarly to how
> > > > > > they broke on Saturday. I am able to log in already, though my
> > > > > > condor jobs appear to be stuck idle in the queue.
> > > > >
> > > > > Cheers,
> > > > > Matt
> > > > >
> > > > > > On Sat, Jan 5, 2019 at 8:08 AM Lincoln Bryant <lincolnb AT uchicago.edu> wrote:
> > > > > > On 1/4/2019 4:36 PM, Matthew Epland wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > > Is login.usatlas.org down? My ssh connection just broke and I
> > > > > > > > cannot reconnect.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Matt
> > > > > > >
> > > > > > Hi Matt,
> > > > > >
> > > > > > > We had a hypervisor issue at UChicago. You should be able to log in
> > > > > > > again. I am working on restoring Condor services now.
> > > > > >
> > > > > > --Lincoln
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Atlas-connect-l mailing list
> > > > > > Atlas-connect-l AT lists.bnl.gov
> > > > > > https://lists.bnl.gov/mailman/listinfo/atlas-connect-l
> > > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > ATLAS Midwest Tier2 mailing list
> > > > > http://mwt2.usatlasfacility.org
> > >
> >
> >
> > --
> > Matt LeBlanc
> > University of Arizona
> > Office: 40/1-C11 (CERN)
> > https://cern.ch/mleblanc/
> >
>
>
Matthew Epland
Duke University Department of Physics
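The advice earlier in the thread (keep the queue well under 10k to 20k jobs and start a few at a time) can be sketched as a small throttling helper. This is only an illustration under stated assumptions: the function names, the default cap of 10,000, and the batch size of 500 are hypothetical and were not specified by anyone in the thread, and the current queue length would have to be obtained separately (for example by inspecting condor_q output).

```python
from typing import Iterator, List, Sequence


def submit_budget(queued: int, cap: int = 10_000, batch: int = 500) -> int:
    """Return how many more jobs may be submitted right now.

    queued: jobs currently in the queue (obtained elsewhere).
    cap:    assumed upper limit on queued jobs (hypothetical default).
    batch:  assumed maximum number of jobs to add per round (hypothetical).
    """
    if queued < 0 or cap < 0 or batch <= 0:
        raise ValueError("queued and cap must be >= 0 and batch must be > 0")
    # Never exceed the cap, and never add more than one batch at a time.
    return max(0, min(batch, cap - queued))


def chunked(jobs: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split a list of job descriptions into consecutive batches of `size`."""
    if size <= 0:
        raise ValueError("size must be > 0")
    for start in range(0, len(jobs), size):
        yield list(jobs[start:start + size])
```

With 70k jobs already queued the budget is zero, so nothing new would be submitted until the queue drains below the cap; after that, work is released one batch per round.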
- [Atlas-connect-l] login.usatlas.org down?, Matthew Epland, 01/04/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/04/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/05/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/07/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matthew Epland, 01/13/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/13/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/13/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matthew Epland, 01/13/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/14/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/14/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/14/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matthew Epland, 01/21/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Matt LeBlanc, 01/21/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/21/2019
- Re: [Atlas-connect-l] login.usatlas.org down?, Lincoln Bryant, 01/21/2019
Archive powered by MHonArc 2.6.24.