atlas-connect-l - Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
  • To: David Lesny <ddl AT illinois.edu>
  • Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
  • Subject: Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
  • Date: Fri, 24 Jan 2014 09:01:26 -0800

Hi Dave,

Thanks!
I understand that jobs can be sent to a specific RCC Factory using a "Requirement" in the condor_submit command file:

Requirements = ( IS_RCC_MWT2 )
Requirements = ( IS_RCC_AGLT2 )
Requirements = ( IS_RCC_FRESNOSTATE )

But where can I see the list of RCC Factories? Are there only the three above, or are there more?

Harinder




On Fri, Jan 24, 2014 at 6:58 AM, David Lesny <ddl AT illinois.edu> wrote:
Harinder,

There is a way to send a job to a specific RCC Factory. This is done by using a "Requirement" in the condor_submit command file

Requirements = ( IS_RCC_MWT2 )
Requirements = ( IS_RCC_AGLT2 )
Requirements = ( IS_RCC_FRESNOSTATE )

If any one of the above requirements is used, it will restrict the job to that particular Factory (i.e. site).
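As a minimal sketch of how such a line sits inside a complete condor_submit command file, the script below writes one file pinned to a single Factory. Only the Requirements line comes from this message; the executable name sitecheck.sh, the output/error/log file names, and the helper make_submit_file are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: write a condor_submit command file restricted to one RCC Factory.
# sitecheck.sh and all file names are hypothetical, not part of AtlasConnect.
make_submit_file() {
    factory=$1      # one of MWT2, AGLT2, FRESNOSTATE (from the message above)
    outfile=$2      # where to write the submit file
    cat > "$outfile" <<EOF
universe     = vanilla
executable   = sitecheck.sh
output       = sitecheck_${factory}.out
error        = sitecheck_${factory}.err
log          = sitecheck_${factory}.log
Requirements = ( IS_RCC_${factory} )
queue
EOF
}

make_submit_file MWT2 sitecheck_MWT2.sub
```

The resulting file would then be handed to condor_submit as usual; changing the first argument selects a different Factory.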


There are also a few environment variables defined for you when the job runs at a site:




$IS_RCC=True
$IS_RCC_<factory>=True
$_RCC_Factory=<factory>
$_RCC_Port=<RCC Factory Port>
$_RCC_MaxIdleGlideins=nnn
$_RCC_IterationTime=<minutes>
$_RCC_MaxQueuedJobs=nnn
$_RCC_MaxRunningJobs=nnn
$_RCC_BoscoVersion=<bosco version>

Most of these are internal and not something a person would need.

Others could be useful to use within a job, for example...

case $_RCC_Factory in

  (MWT2)        echo "Running at Midwest Tier 2"     ;;
  (AGLT2)       echo "Running at Great Lakes Tier 2" ;;
  (FRESNOSTATE) echo "Running at Fresno State"       ;;
  (*)           echo "Unknown site"                  ;;

esac
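A job that may also run outside an RCC Factory could guard on these variables before using them. The following is a sketch under that assumption: only IS_RCC and _RCC_Factory are taken from the list above, while the function name site_report and the output format are made up for illustration.

```shell
#!/bin/sh
# Sketch only: report where the job landed, tolerating unset variables.
# IS_RCC and _RCC_Factory are the variables listed above; the rest is hypothetical.
site_report() {
    if [ "${IS_RCC:-}" = "True" ]; then
        # Fall back to "unknown" if the Factory name is missing.
        echo "site=${_RCC_Factory:-unknown} node=$(uname -n)"
    else
        echo "site=none (not under an RCC Factory)"
    fi
}

site_report
```

Writing this line at the top of a job's output makes it easy to tell afterwards which site actually ran it.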




Rob, do we have an AtlasConnect-specific twiki started, or some other place where we can spell out these features?
I have some of this in the generic RCC twiki, but this information can get lost and a bit confusing if done generically.




thanks, dave




On 1/23/2014 7:28 PM, Dr. Harinder Singh Bawa wrote:
Hi Rob,

I could submit a cron job which runs every hour and I didn't give any condition as to where the jobs go ... Which is the easiest and best way to monitor?



Other thing:
============
If I would like to know which slots are available at sites like MWT2 and AGLT2, how do I see that?

I tried to check the status using Condor:


The condor_status command doesn't give me anything, and condor_status_all gives me some info (below). But this command doesn't give me the number of pools available. Also, apart from Fresno, if I would like to force my jobs to either MWT2 or AGLT2, how do I do that?



Any suggestion is appreciated.


[hbawa@login testjobs]$ condor_status_all
Summary of available resources for all available HTCondor pools.
                     Total Owner Claimed Unclaimed Matched Preempting Backfill
LOCAL POOL:
                       504     0     221       283       0          0        0
Error: communication error
CEDAR:6001:Failed to connect to <128.135.119.232:9618>
Error: Couldn't contact the condor_collector on appcloud.uchicago.edu 

Extra Info: the condor_collector is a process that runs on the central 
manager of your Condor pool and collects the status of all the machines and 
jobs in the Condor pool. The condor_collector might not be running, it might 
be refusing to communicate with you, there might be a network problem, or 
there may be some other problem. Check with your system administrator to fix 
this problem. 

If you are the system administrator, check that the condor_collector is 
running on appcloud.uchicago.edu (<128.135.119.232:9618>), check the 
ALLOW/DENY configuration in your condor_config, and check the MasterLog and 
CollectorLog files in your log directory for possible clues as to why the 
condor_collector is not responding. Also see the Troubleshooting section of 
the manual. 
                       299     0     106       193       0          0        0
                        96     0       0        96       0          0        0
                        31     0       0        31       0          0        0





Harinder


On Thu, Jan 23, 2014 at 9:54 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Harinder,

Yes, but expand your test to also send a job to MWT2 and AGLT2 (and later, other sites). This tests not only the site, but also the services in between that get the jobs to the site. Of course, we don’t want to be excessive in the testing, so a little judgement is needed.

- Rob

On Jan 23, 2014, at 11:43 AM, Dr. Harinder Singh Bawa <harinder.singh.bawa AT gmail.com> wrote:

Hi Rob,

Could you please correct me if I didn't understand your suggestion:

What I am doing now is sending a cron job every hour from "atlasconnect", forcing it to run on "csufresno tier-3" in order to test the Fresno-t3 site. Is that what you are looking for?


Harinder


On Thu, Jan 23, 2014 at 6:47 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Dave,

Can I request that on http://rccf.usatlas.org:8000/home.s we list only channels where there is ongoing work? We’ve not heard anything from the Argonne sites or UC Irvine for a long time, so let’s remove them for the time being.

Status of others:

TACC - okay, just getting started. I am going to try a user-based Bosco submission later today. Later, we will need to get with Peter to gather his Parrot magic for ATLAS jobs. Also, we’ll need to set up a squid service at TACC at some point.
WT2 - ?
SWT2 - ?


Harinder: one thing we need is a little functional test that sends lightweight “site check” jobs through each channel on a periodic basis, say once per hour, and then reports the results to a site status board of some sort. Sort of an AtlasConnect Exerciser. A very simple script run out of cron which submits 5-minute tutorial-like jobs through each AtlasConnect channel would suffice as a start (using regular expressions in the Condor ClassAd to select specific resources).
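A rough sketch of what such an exerciser loop could look like is given below. It selects each channel with the IS_RCC_<factory> requirement mentioned elsewhere in this thread rather than with a ClassAd regular expression, and it does not cover reporting to a status board. The job script sitecheck.sh, the file names, and the DRY_RUN switch are all hypothetical; with DRY_RUN=1 (the default here) nothing is actually submitted.

```shell
#!/bin/sh
# Sketch only: submit one lightweight job per AtlasConnect channel.
# Factory names come from this thread; everything else is a placeholder.
FACTORIES="MWT2 AGLT2 FRESNOSTATE"
DRY_RUN=${DRY_RUN:-1}   # 1 = only write submit files and print what would run

exercise_channels() {
    for factory in $FACTORIES; do
        sub="exerciser_${factory}.sub"
        # One submit file per channel, pinned with the per-Factory requirement.
        {
            echo "executable   = sitecheck.sh"
            echo "output       = exerciser_${factory}.out"
            echo "error        = exerciser_${factory}.err"
            echo "log          = exerciser_${factory}.log"
            echo "Requirements = ( IS_RCC_${factory} )"
            echo "queue"
        } > "$sub"
        if [ "$DRY_RUN" = "1" ]; then
            echo "would submit $sub"
        else
            condor_submit "$sub"
        fi
    done
}

exercise_channels
```

Saved as, say, exerciser.sh (path hypothetical), an hourly crontab entry of the form `0 * * * * DRY_RUN=0 /path/to/exerciser.sh` would match the once-per-hour cadence suggested above.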

Thanks,

- Rob




<screenshot_1077.png>


---
Rob Gardner • Twitter: @rwg  Skype: rwg773 • g+: rob.rwg • +1 312-804-0859 • University of Chicago




--
Dr. Harinder Singh Bawa

                                          









--

David Lesny

Senior Research Physicist

High Energy Physics
University of Illinois at Urbana-Champaign

Office: 217-333-4972  |  Fax: 217-333-4990

Skype: ddlesny | mwt2-ddlesny







