
atlas-connect-l - Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: David Lesny <ddl AT illinois.edu>
  • To: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>, Rob Gardner <rwg AT hep.uchicago.edu>
  • Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
  • Subject: Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
  • Date: Fri, 24 Jan 2014 08:58:46 -0600

Harinder,

There is a way to send a job to a specific RCC Factory: use a "Requirements" expression in the condor_submit command file, one of:

Requirements = ( IS_RCC_MWT2 )
Requirements = ( IS_RCC_AGLT2 )
Requirements = ( IS_RCC_FRESNOSTATE )

If any one of the above requirements is used, it will restrict the job to that particular Factory (i.e., site).
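For illustration, one of the Requirements lines above could be dropped into a complete submit file along these lines. This is only a sketch: the helper name make_submit, the executable sitecheck.sh, and the output/error/log file names are hypothetical and not part of AtlasConnect; only the IS_RCC_<factory> expression comes from this message.

```shell
#!/bin/sh
# make_submit (hypothetical helper): print a minimal HTCondor submit
# description that pins the job to one RCC Factory.
# Argument: the factory name, e.g. MWT2, AGLT2 or FRESNOSTATE.
make_submit() {
    factory="$1"
    # sitecheck.sh and the file names below are placeholders.
    cat <<EOF
universe     = vanilla
executable   = sitecheck.sh
output       = sitecheck_${factory}.out
error        = sitecheck_${factory}.err
log          = sitecheck_${factory}.log
Requirements = ( IS_RCC_${factory} )
queue
EOF
}

# Example: show the submit description for MWT2.
make_submit MWT2
```

Redirecting the output to a file (for example make_submit AGLT2 > sitecheck_AGLT2.sub) gives something that could then be handed to condor_submit, and a loop over MWT2, AGLT2 and FRESNOSTATE would produce one file per site.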


There are also a few environment variables defined for you when the job runs at a site:




  • $IS_RCC=True
  • $IS_RCC_<factory>=True
  • $_RCC_Factory=<factory>
  • $_RCC_Port=<RCC Factory Port>
  • $_RCC_MaxIdleGlideins=nnn
  • $_RCC_IterationTime=<minutes>
  • $_RCC_MaxQueuedJobs=nnn
  • $_RCC_MaxRunningJobs=nnn
  • $_RCC_BoscoVersion=<bosco version>

Most of these are internal and not something a person would need.

Others could be useful to use within a job, for example...

case $_RCC_Factory in

  (MWT2)        echo "Running at Midwest Tier 2"     ;;
  (AGLT2)       echo "Running at Great Lakes Tier 2" ;;
  (FRESNOSTATE) echo "Running at Fresno State"       ;;
  (*)           echo "Unknown site"                  ;;

esac
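Along the same lines, a job could check $IS_RCC first so that the same script also behaves sensibly when it is run by hand outside an RCC Factory. A minimal sketch, assuming the variables are set as listed above; the function name site_report and the "site=... host=..." output format are made up for this example.

```shell
#!/bin/sh
# site_report (hypothetical helper): print one line saying where the job runs.
# Relies only on $IS_RCC and $_RCC_Factory as described in this message.
site_report() {
    if [ "${IS_RCC:-}" = "True" ]; then
        # Running under an RCC Factory: report the factory name if it is set.
        echo "site=${_RCC_Factory:-unknown} host=$(uname -n)"
    else
        # Not started through RCC (e.g. an interactive test on the login node).
        echo "site=local host=$(uname -n)"
    fi
}

site_report
```

Putting a line like this at the top of a job's output makes it easy to tell afterwards which site actually ran it.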



Rob, do we have a twiki started specifically for AtlasConnect, or some other place where we can spell out these features?
I have some of this in the generic RCC twiki, but this information can get lost and become a bit confusing if done generically.




thanks, dave



On 1/23/2014 7:28 PM, Dr. Harinder Singh Bawa wrote:
Hi Rob,

I could submit a cron job that runs every hour, and I didn't give any condition as to where the jobs go ... What is the easiest and best way to monitor them?



Other thing:
============
If I would like to know which slots are available at sites like MWT2 and AGLT2, how do I see that?

I tried to check the status using condor.


The condor_status command doesn't give me anything, and condor_status_all gives me some info (below). But this command doesn't give me the number of pools available. Also, apart from Fresno, if I would like to force my jobs to either MWT2 or AGLT2, how do I do that?



Any suggestion is appreciated.


[hbawa@login testjobs]$ condor_status_all
Summary of available resources for all available HTCondor pools.
                     Total Owner Claimed Unclaimed Matched Preempting Backfill
LOCAL POOL:
POOL uc3-mgt.mwt2.org:
                       504     0     221       283       0          0        0
POOL appcloud.uchicago.edu:
Error: communication error
CEDAR:6001:Failed to connect to <128.135.119.232:9618>
Error: Couldn't contact the condor_collector on appcloud.uchicago.edu 
(<128.135.119.232:9618>). 

Extra Info: the condor_collector is a process that runs on the central 
manager of your Condor pool and collects the status of all the machines and 
jobs in the Condor pool. The condor_collector might not be running, it might 
be refusing to communicate with you, there might be a network problem, or 
there may be some other problem. Check with your system administrator to fix 
this problem. 

If you are the system administrator, check that the condor_collector is 
running on appcloud.uchicago.edu (<128.135.119.232:9618>), check the 
ALLOW/DENY configuration in your condor_config, and check the MasterLog and 
CollectorLog files in your log directory for possible clues as to why the 
condor_collector is not responding. Also see the Troubleshooting section of 
the manual. 
POOL uct2-bosco.uchicago.edu:11120?sock=collector:
                       299     0     106       193       0          0        0
POOL uct2-bosco.uchicago.edu:11018?sock=collector:
                        96     0       0        96       0          0        0
POOL uct2-bosco.uchicago.edu:11121?sock=collector:
                        31     0       0        31       0          0        0
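
The summary above can be condensed to one line per reachable pool. A minimal sketch, assuming the output keeps the layout shown here (a "POOL <name>:" line followed by a single row of seven counters); the function name summarize_pools and its output format are hypothetical.

```shell
#!/bin/sh
# summarize_pools (hypothetical helper): read condor_status_all-style text
# on stdin and print one line per pool that reported a row of counters.
summarize_pools() {
    awk '
        # Remember the pool name from a "POOL <name>:" line (trailing colon stripped).
        /^POOL / { pool = $2; sub(/:$/, "", pool); next }
        # Counter row: Total Owner Claimed Unclaimed Matched Preempting Backfill.
        pool != "" && NF == 7 && $1 ~ /^[0-9]+$/ {
            printf "%s total=%s claimed=%s unclaimed=%s\n", pool, $1, $3, $4
            pool = ""
        }
    '
}
```

It would be used as condor_status_all | summarize_pools. Pools that only return an error, such as appcloud.uchicago.edu above, produce no line, so a missing pool in the result is itself a hint that its collector could not be reached.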





Harinder


On Thu, Jan 23, 2014 at 9:54 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Harinder,

Yes, but expand your test to also send a job to MWT2 and AGLT2 (and later, other sites). This tests not only the site but also the services in between that get the jobs to the site. Of course, we don’t want to be excessive in the testing, so a little judgement is needed.

- Rob

On Jan 23, 2014, at 11:43 AM, Dr. Harinder Singh Bawa <harinder.singh.bawa AT gmail.com> wrote:

Hi Rob,

Could you please correct me if I didn't understand your suggestion:

What I am doing now is sending a cron job every hour from "atlasconnect", forcing it to run on "csufresno tier-3" in order to test the Fresno-t3 site. Is that what you are looking for?


Harinder


On Thu, Jan 23, 2014 at 6:47 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Dave,

Can I request that on http://rccf.usatlas.org:8000/home.s we list only channels where there is on-going work? We’ve not heard anything from the Argonne sites or UC Irvine for a long time, so let’s remove them for the time being.

Status of others:

TACC - okay, just getting started. I am going to try a user-based Bosco submission later today. Later, will need to get with Peter to gather his Parrot magic for ATLAS jobs. Also - we’ll need to set up a squid service at TACC at some point.
WT2 - ?
SWT2 - ?


Harinder: one thing we need is a little functional test that sends lightweight “site check” jobs through each channel on a periodic basis, say once per hour, and then reports the results to a site status board of some sort. Sort of an AtlasConnect Exerciser. A very simple script run out of cron that submits 5-minute tutorial-like jobs through each AtlasConnect channel would suffice as a start (using regular expressions in the Condor ClassAd to select specific resources).

Thanks,

- Rob






---
Rob Gardner • Twitter: @rwg • Skype: rwg773 • g+: rob.rwg • +1 312-804-0859 • University of Chicago




--
Dr. Harinder Singh Bawa

                                          





--
David Lesny


Senior Research Physicist

High Energy Physics
University of Illinois at Urbana-Champaign

Office: 217-333-4972  |  Fax: 217-333-4990

Skype: ddlesny | mwt2-ddlesny



