atlas-connect-l AT lists.bnl.gov
Subject: Atlas-connect-l mailing list
List archive
Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
- From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
- To: David Lesny <ddl AT illinois.edu>
- Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
- Subject: Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
- Date: Fri, 24 Jan 2014 09:01:26 -0800
Hi Dave,
Thanks!
I understand that jobs can be sent to a specific RCC Factory using a "Requirement" in the condor_submit command file:
Requirements = ( IS_RCC_MWT2 )
Requirements = ( IS_RCC_AGLT2 )
Requirements = ( IS_RCC_FRESNOSTATE )
But where can I see the list of RCC Factories? Are there only the three above, or more?
Harinder
On Fri, Jan 24, 2014 at 6:58 AM, David Lesny <ddl AT illinois.edu> wrote:
Harinder,
There is a way to send a job to a specific RCC Factory. This is done by using a "Requirement" in the condor_submit command file:
Requirements = ( IS_RCC_MWT2 )
Requirements = ( IS_RCC_AGLT2 )
Requirements = ( IS_RCC_FRESNOSTATE )
If any one of the above requirements is used, it will restrict the job to that particular Factory (i.e. site).
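A minimal sketch of a submit description file that uses one of these requirements, written by a small shell function so the Factory name is a parameter. The executable `site_check.sh`, the function name `make_submit_file` and the output file names are hypothetical placeholders, not part of AtlasConnect.

```shell
#!/bin/sh
# Hypothetical helper: write an HTCondor submit description file that pins
# a job to one RCC Factory through the IS_RCC_<factory> requirement.
# site_check.sh and the file names below are placeholders.
make_submit_file() {
    factory=$1    # e.g. MWT2, AGLT2, FRESNOSTATE
    outfile=$2    # path of the submit description file to write
    # \$(Cluster) stays literal so HTCondor, not the shell, expands it.
    cat > "$outfile" <<EOF
universe = vanilla
executable = site_check.sh
output = site_check.$factory.\$(Cluster).out
error = site_check.$factory.\$(Cluster).err
log = site_check.$factory.log
Requirements = ( IS_RCC_$factory )
queue 1
EOF
}
```

For example, `make_submit_file MWT2 mwt2.sub && condor_submit mwt2.sub` would queue one job restricted to MWT2.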
There are also a few environment variables defined for you when the job runs at a site:
Rob, do we have a twiki started for AtlasConnect specifically, or some other place where we can spell out these features?
• $IS_RCC=True
• $IS_RCC_<factory>=True
• $_RCC_Factory=<factory>
• $_RCC_Port=<RCC Factory Port>
• $_RCC_MaxIdleGlideins=nnn
• $_RCC_IterationTime=<minutes>
• $_RCC_MaxQueuedJobs=nnn
• $_RCC_MaxRunningJobs=nnn
• $_RCC_BoscoVersion=<bosco version>
Most of these are internal and not something a person would need.
Others could be useful within a job, for example:
case $_RCC_Factory in
(MWT2) echo "Running at Midwest Tier 2" ;;
(AGLT2) echo "Running at Great Lakes Tier 2" ;;
(FRESNOSTATE) echo "Running at Fresno State" ;;
(*) echo "Unknown site" ;;
esac
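For debugging, a job could also record its whole RCC environment when it starts, so every output file shows where and how it ran. The sketch below assumes the variable names listed above are exported into the job environment; the function name `report_rcc_env` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the RCC-related environment of the current job.
# Returns non-zero when IS_RCC is not "True" (i.e. not started by an RCC Factory).
report_rcc_env() {
    if [ "${IS_RCC:-}" != "True" ]; then
        echo "IS_RCC is not set: not running under an RCC Factory"
        return 1
    fi
    echo "Factory: ${_RCC_Factory:-unknown}"
    # Every IS_RCC* and _RCC_* variable, sorted for stable output.
    env | grep -E '^(IS_RCC|_RCC_)' | sort
}
```

A job wrapper might call `report_rcc_env || true` before the real payload, so a missing RCC environment is logged without failing the job.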
I have some of this in the generic RCC twiki, but this information can get lost and a bit confusing if done generically.
thanks, dave
On 1/23/2014 7:28 PM, Dr. Harinder Singh Bawa wrote:
Hi Rob,
I was able to submit a cron job that runs every hour, without giving any condition on where the jobs go. What is the easiest and best way to monitor them?
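One low-tech way to monitor hourly test jobs, offered only as a sketch and not as an existing AtlasConnect tool: give each submission a job event log (the `log =` submit command) and count the event records HTCondor writes into it. Each record starts with a three-digit event code: 000 for submission, 001 for the start of execution, 005 for termination, 009 for an abort and 012 for a hold. The function name `summarize_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize an HTCondor job event (user) log by counting
# event records. Each record begins with a three-digit event code followed by
# "(cluster.proc.subproc)", e.g. "005 (101.000.000) ... Job terminated."
summarize_log() {
    awk '
        /^000 \(/ { submitted++ }   # job submitted
        /^001 \(/ { started++ }     # job started executing
        /^005 \(/ { finished++ }    # job terminated
        /^009 \(/ { aborted++ }     # job aborted (e.g. by condor_rm)
        /^012 \(/ { held++ }        # job held
        END {
            printf "submitted=%d started=%d finished=%d aborted=%d held=%d\n",
                submitted, started, finished, aborted, held
        }' "$1"
}
```

Run against one log per site after each cycle, a growing gap between the submitted and finished counts at one site would point at that channel.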
Other thing:
============
If I would like to know which slots are available at sites like MWT2 and AGLT2, how do I check?
I tried to check the status using Condor.
The condor_status command doesn't give me anything, while condor_status_all gives me some information (below). But this command doesn't tell me the number of available pools. Also, apart from Fresno, if I would like to force my jobs to go to either MWT2 or AGLT2, how do I do that?
Any suggestion is appreciated.
[hbawa@login testjobs]$ condor_status_all
Summary of available resources for all available HTCondor pools.
Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
LOCAL POOL:
POOL uc3-mgt.mwt2.org:
  504      0      221        283        0           0         0
POOL appcloud.uchicago.edu:
Error: communication error
CEDAR:6001:Failed to connect to <128.135.119.232:9618>
Error: Couldn't contact the condor_collector on appcloud.uchicago.edu (<128.135.119.232:9618>).

Extra Info: the condor_collector is a process that runs on the central manager of your Condor pool and collects the status of all the machines and jobs in the Condor pool. The condor_collector might not be running, it might be refusing to communicate with you, there might be a network problem, or there may be some other problem. Check with your system administrator to fix this problem.

If you are the system administrator, check that the condor_collector is running on appcloud.uchicago.edu (<128.135.119.232:9618>), check the ALLOW/DENY configuration in your condor_config, and check the MasterLog and CollectorLog files in your log directory for possible clues as to why the condor_collector is not responding. Also see the Troubleshooting section of the manual.
  299      0      106        193        0           0         0
   96      0        0         96        0           0         0
   31      0        0         31        0           0         0
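In a summary like this, the Unclaimed column counts slots that are not currently claimed by any job, which is the closest thing to "available slots". The hedged sketch below pulls Total and Unclaimed out of each data row; it assumes exactly the seven count columns shown above (other HTCondor releases may print more), and it cannot recover which pool a row belongs to when, as here, the labels sit on separate lines. The function name `summarize_slots` is hypothetical. For a single pool, `condor_status -pool <collector host> -total` prints a comparable totals table, though its exact columns depend on the HTCondor version.

```shell
#!/bin/sh
# Hypothetical helper: read slot-summary text (such as condor_status_all
# output) on stdin and print Total and Unclaimed for every data row.
# Assumes the last seven fields of a data row are the counts
#   Total Owner Claimed Unclaimed Matched Preempting Backfill
summarize_slots() {
    awk '
        NF >= 7 {
            numeric = 1
            for (i = NF - 6; i <= NF; i++)
                if ($i !~ /^[0-9]+$/) numeric = 0
            # Header, label and error lines are not all-numeric, so they are skipped.
            if (numeric)
                printf "total=%d unclaimed=%d\n", $(NF - 6), $(NF - 3)
        }'
}
```

Usage would be along the lines of `condor_status_all | summarize_slots`.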
Harinder
On Thu, Jan 23, 2014 at 9:54 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Harinder,
Yes, but expand your test to send jobs to MWT2 and AGLT2 as well (and later, other sites). This tests not only the site but also the services in between that get the jobs to the site. Of course, we don’t want to be excessive in the testing, so a little judgement is needed.
- Rob
On Jan 23, 2014, at 11:43 AM, Dr. Harinder Singh Bawa <harinder.singh.bawa AT gmail.com> wrote:
Hi Rob,
Could you please correct me if I haven't understood your suggestion:
What I am doing now is sending a cron job every hour from "atlasconnect", forcing it to run on "csufresno tier-3" in order to test the Fresno-t3 site. Is that what you are looking for?
Harinder
On Thu, Jan 23, 2014 at 6:47 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Dave,
Can I request that on http://rccf.usatlas.org:8000/home.s we list only channels where there is ongoing work? We’ve not heard anything from the Argonne sites or UC Irvine for a long time, so let’s remove them for the time being.
Status of others:
TACC - okay, just getting started. I am going to try a user-based Bosco submission later today. Later, we will need to get with Peter to gather his Parrot magic for ATLAS jobs. Also, we’ll need to set up a squid service at TACC at some point.
WT2 - ?
SWT2 - ?
Harinder: one thing we need is a little functional test that sends lightweight “site check” jobs through each channel on a periodic basis, say once per hour, and then reports the results to a site status board of some sort. Sort of an AtlasConnect Exerciser. A very simple script run out of cron, which submits 5-minute tutorial-like jobs through each AtlasConnect channel, would suffice as a start (using regular expressions in the Condor ClassAd to select specific resources).
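A minimal sketch of such an exerciser, under several assumptions: a base submit file `site_check.sub` (hypothetical) that already names a short test executable, output and log files, and a `queue` statement; the three Factory names mentioned in this thread; and the `IS_RCC_<factory>` requirement that David Lesny describes elsewhere in this thread, used in place of the regular-expression match suggested here. It relies on `condor_submit -append`, which adds one submit command just before the queue statement, so a single base file serves every channel. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch of an "AtlasConnect Exerciser": submit one short site-check job
# through each RCC Factory. Intended to run hourly from cron, e.g.
#   0 * * * * $HOME/exerciser/run_exerciser.sh >> $HOME/exerciser/cron.log 2>&1
# Factory list, file names and paths are hypothetical placeholders.

# Factories to exercise; extend as new AtlasConnect channels come online.
FACTORIES=${FACTORIES:-"MWT2 AGLT2 FRESNOSTATE"}
# Base submit description shared by all sites (executable, output, log, queue).
SUBMIT_FILE=${SUBMIT_FILE:-site_check.sub}
# Overridable so the loop can be dry-run without HTCondor installed.
CONDOR_SUBMIT=${CONDOR_SUBMIT:-condor_submit}

exercise_all() {
    rc=0
    for f in $FACTORIES; do
        # -append inserts the requirement just before the queue statement,
        # restricting this submission to a single Factory.
        if ! $CONDOR_SUBMIT -append "Requirements = ( IS_RCC_$f )" "$SUBMIT_FILE"; then
            echo "submission to $f failed" >&2
            rc=1
        fi
    done
    return $rc
}
```

The script body would end with a call to `exercise_all`. Setting `CONDOR_SUBMIT=echo` prints the commands instead of submitting them, which is a cheap way to check the loop by hand; failures are reported per Factory so one broken channel does not stop the others.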
Thanks,
- Rob
---Rob Gardner • Twitter: @rwg • Skype: rwg773 • g+: rob.rwg • +1 312-804-0859 • University of Chicago
--
David Lesny
Senior Research Physicist
High Energy Physics
University of Illinois at Urbana-Champaign
Office: 217-333-4972 | Fax: 217-333-4990
Skype: ddlesny | mwt2-ddlesny
- [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Rob Gardner, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, David Lesny, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Lincoln Bryant, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Dr. Harinder Singh Bawa, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Rob Gardner, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Dr. Harinder Singh Bawa, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, David Lesny, 01/24/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Rob Gardner, 01/24/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Dr. Harinder Singh Bawa, 01/24/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, David Lesny, 01/24/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, David Lesny, 01/24/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Dr. Harinder Singh Bawa, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, Rob Gardner, 01/23/2014
- Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser, David Lesny, 01/23/2014
Archive powered by MHonArc 2.6.24.