Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
- From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
- To: Rob Gardner <rwg AT hep.uchicago.edu>
- Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
- Subject: Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
- Date: Thu, 23 Jan 2014 17:28:47 -0800
Hi Rob,
I was able to submit a cron job that runs every hour, and I didn't set any condition on where the jobs go. What is the easiest and best way to monitor them?
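On the monitoring question, condor_q (queued jobs) and condor_history (finished jobs) are the usual HTCondor tools. For an unattended hourly probe, one low-effort option is to give each probe its own job log via the "log =" submit command and read the most recent event from it. This is a minimal sketch under that assumption, not an established AtlasConnect tool: the job_state function is hypothetical, while the three-digit codes it checks are HTCondor's standard job-log event numbers.

```shell
#!/usr/bin/env bash
# job_state LOGFILE  (hypothetical helper)
# Print a one-word state for an HTCondor job (user) log, i.e. the file named
# by the "log =" submit command. Assumes one probe job per log file.
# Event codes used: 000 submitted, 001 executing, 005 terminated,
# 009 aborted, 012 held. Other events (e.g. 006 image-size updates) are ignored.
job_state() {
  local log=$1 code
  # Event headers start with the code, e.g. "005 (123.000.000) ...";
  # keep the code of the last header we care about.
  code=$(awk '/^(000|001|005|009|012) [(]/ { c = $1 } END { print c }' "$log")
  case "$code" in
    000) echo SUBMITTED ;;
    001) echo RUNNING ;;
    005) # A normal exit with status 0 counts as a healthy probe.
         if grep -qF 'return value 0)' "$log"; then echo OK; else echo FAILED; fi ;;
    009) echo ABORTED ;;
    012) echo HELD ;;
    *)   echo UNKNOWN ;;
  esac
}
```

A cron wrapper could call job_state on each probe's log and append one "time site state" line per site to a plain-text status board; that board format is only an illustration.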
Another thing:
============
If I want to know which slots are available at sites like MWT2 and AGLT2, how do I check?
I tried checking the status with HTCondor.
The condor_status command doesn't give me anything, and condor_status_all gives me some info (below). But this command doesn't show me the number of pools available. Also, apart from Fresno, if I want to force my jobs to run at either MWT2 or AGLT2, how do I do that?
Any suggestion is appreciated.
[hbawa@login testjobs]$ condor_status_all
Summary of available resources for all available HTCondor pools.
Total Owner Claimed Unclaimed Matched Preempting Backfill
LOCAL POOL:
POOL uc3-mgt.mwt2.org:
  504     0     221       283       0          0        0
POOL appcloud.uchicago.edu:
Error: communication error
CEDAR:6001:Failed to connect to <128.135.119.232:9618>
Error: Couldn't contact the condor_collector on appcloud.uchicago.edu
(<128.135.119.232:9618>).
Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.
If you are the system administrator, check that the condor_collector is
running on appcloud.uchicago.edu (<128.135.119.232:9618>), check the
ALLOW/DENY configuration in your condor_config, and check the MasterLog and
CollectorLog files in your log directory for possible clues as to why the
condor_collector is not responding. Also see the Troubleshooting section of
the manual.
  299     0     106       193       0          0        0
   96     0       0        96       0          0        0
   31     0       0        31       0          0        0
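The condor_status_all summary prints one row of slot counts per pool under the Total/Owner/Claimed/Unclaimed/... header, so the Unclaimed column is the number of idle slots in that pool. To query a single pool directly, condor_status accepts -pool with the pool's central manager host, combined with -total (totals only) or a -constraint expression such as 'State == "Unclaimed"'. Whether MWT2 or AGLT2 slots are visible from the login host at all depends on the flocking setup, which this thread does not describe. The hypothetical helper below is a small sketch that pulls one column out of such a summary by name, since the exact set of columns differs between HTCondor versions.

```shell
#!/usr/bin/env bash
# column_value HEADER ROW NAME  (hypothetical helper)
# Print the entry of ROW that sits under column NAME of HEADER, e.g. the
# Unclaimed count from a condor_status summary. Columns are matched from the
# right, so a row with a leading label (such as "Total") still lines up.
column_value() {
  awk -v hdr="$1" -v row="$2" -v name="$3" 'BEGIN {
    nh = split(hdr, h, " ")
    nr = split(row, r, " ")
    off = nr - nh              # number of leading label fields in ROW
    for (i = 1; i <= nh; i++)
      if (h[i] == name) { print r[i + off]; exit 0 }
    exit 1                     # NAME is not a column of HEADER
  }'
}
```

For the uc3-mgt.mwt2.org row above, asking for Unclaimed with the header line as HEADER gives 283.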
Harinder
On Thu, Jan 23, 2014 at 9:54 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Harinder,
Yes, but expand your test to also send jobs to MWT2 and AGLT2 (and later, other sites). This tests not only the site, but also the services in between that get the jobs to the site. Of course, we don’t want to be excessive in the testing, so a little judgement is needed.
- Rob
On Jan 23, 2014, at 11:43 AM, Dr. Harinder Singh Bawa <harinder.singh.bawa AT gmail.com> wrote:
Hi Rob,
Could you please correct me if I didn't understand your suggestion:
What I am doing now is sending a cron job every hour from "atlasconnect", forcing it to run on "csufresno tier-3" in order to test the Fresno-t3 site. Is that what you are looking for?
Harinder
On Thu, Jan 23, 2014 at 6:47 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Dave,
Can I request that on http://rccf.usatlas.org:8000/home.s we list only channels where there is ongoing work? We’ve not heard anything from the Argonne sites or UC Irvine for a long time, so let’s remove them for the time being.
Status of others:
TACC - okay, just getting started. I am going to try a user-based Bosco submission later today. Later, I will need to get with Peter to gather his Parrot magic for ATLAS jobs. Also, we’ll need to set up a squid service at TACC at some point.
WT2 - ?
SWT2 - ?
Harinder: one thing we need is a little functional test that sends lightweight “site check” jobs through each channel on a periodic basis, say once per hour, and then reports the results to a site status board of some sort. Sort of an AtlasConnect Exerciser. A very simple script run out of cron which submits 5-minute tutorial-like jobs through each AtlasConnect channel would suffice as a start (using regular expressions in the Condor ClassAd to select specific resources).
Thanks,
- Rob
---
Rob Gardner • Twitter: @rwg • Skype: rwg773 • g+: rob.rwg • +1 312-804-0859 • University of Chicago
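The exerciser described above could start roughly as follows: a minimal sketch, assuming one shared base submit file and one regular expression per channel. The ClassAd regexp() function and condor_submit's -a (-append) option, which inserts extra submit commands just before the queue statement and may be given several times, are real HTCondor features. The channel list, the choice of TARGET.Machine as the attribute to match, the domain patterns, and all function and file names are assumptions. An hourly crontab entry (minute field 0, all other fields *) that sources this file and calls exercise would give the once-per-hour cadence.

```shell
#!/usr/bin/env bash
# Sketch of one "AtlasConnect Exerciser" pass. CHANNELS, write_base_submit,
# exercise and the probe-* file names are all hypothetical.

# One "SITE REGEX" pair per line. The regexes are matched against each slot's
# Machine attribute; both that attribute and the domains are assumptions.
CHANNELS='MWT2 mwt2[.]org
AGLT2 aglt2[.]org'

# write_base_submit FILE: a site-agnostic probe job. /bin/hostname stands in
# for a ~5 minute tutorial-style payload and reports where the job landed.
write_base_submit() {
  cat > "$1" <<'EOF'
universe            = vanilla
executable          = /bin/hostname
transfer_executable = false
output              = probe.$(Cluster).out
error               = probe.$(Cluster).err
queue
EOF
}

# exercise SUBFILE [STAMP]: submit SUBFILE once per channel. condor_submit -a
# adds a per-channel requirements expression and job log ahead of "queue".
# With DRY_RUN=1 each command is printed (arguments joined by spaces, not
# shell-quoted) instead of being executed.
exercise() {
  local subfile=$1 stamp=${2:-$(date -u +%Y%m%dT%H%M)} site regex
  local -a cmd
  while read -r site regex; do
    [ -n "$site" ] || continue
    cmd=(condor_submit
         -a "requirements = regexp(\"$regex\", TARGET.Machine)"
         -a "log = probe-$site-$stamp.log"
         "$subfile")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$site: ${cmd[*]}"
    else
      "${cmd[@]}" > /dev/null || echo "$stamp $site submit failed" >&2
    fi
  done <<< "$CHANNELS"
}
```

Running DRY_RUN=1 exercise probe.sub prints one condor_submit line per channel without submitting anything; each probe's job log (probe-SITE-STAMP.log) is then what a status-board step would read.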