atlas-connect-l - Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
  • To: Rob Gardner <rwg AT hep.uchicago.edu>
  • Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
  • Subject: Re: [Atlas-connect-l] Removal of inactive sites from CycleServer, AtlasConnect Exerciser
  • Date: Thu, 23 Jan 2014 17:28:47 -0800

Hi Rob,

I was able to submit a cron job that runs every hour, but I didn't specify any condition on where the jobs go. What is the easiest and best way to monitor them?
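For the monitoring question, one minimal approach (a sketch under assumptions, not an existing ATLAS Connect tool) is to record the finish time and exit code of each hourly test job per site, then mark each site OK, FAILING, STALE (no result within two hours, roughly one missed cron cycle) or NO DATA. The `CheckRun` record, the status labels, the two-hour threshold and the site names are all hypothetical; how the exit codes are collected (job logs, condor_history, etc.) is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CheckRun:
    """One finished site-check job (hypothetical record format)."""
    site: str
    finished: datetime
    exit_code: int


def site_status(runs, expected_sites, now, max_age=timedelta(hours=2)):
    """Map each site to 'OK', 'FAILING', 'STALE' or 'NO DATA'.

    Only the most recent run per site is considered. With hourly
    submissions, max_age=2h tolerates one missed cycle.
    """
    latest = {}
    for run in runs:
        prev = latest.get(run.site)
        if prev is None or run.finished > prev.finished:
            latest[run.site] = run

    # Sites we expect to test but have no result for at all.
    status = {site: "NO DATA" for site in expected_sites}
    for site, run in latest.items():
        if now - run.finished > max_age:
            status[site] = "STALE"
        elif run.exit_code != 0:
            status[site] = "FAILING"
        else:
            status[site] = "OK"
    return status
```

For example, a site whose latest job finished 55 minutes ago with exit code 0 is reported OK, while a site whose latest result is four hours old is reported STALE regardless of its exit code.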



Another thing:
==============
If I want to know which slots are available at sites like MWT2 and AGLT2, how can I see that?

I tried to check the status using HTCondor.

The condor_status command doesn't return anything, and condor_status_all gives some information (below). However, this command doesn't show the number of pools available. Also, apart from Fresno, if I want to force my jobs to run at either MWT2 or AGLT2, how do I do that?
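To force jobs to MWT2 or AGLT2, one option is a Requirements expression using the ClassAd `regexp()` function, along the lines Rob suggests for the exerciser. The helper below is a hypothetical sketch that builds such an expression from a list of site names. The slot attribute that carries the site name (`Site` here) is an assumption and needs to be checked against what the ATLAS Connect slots actually advertise; the pattern is anchored, so it matches the names exactly.

```python
import re

# Site names are restricted to characters that are safe both in a
# regular expression and inside a ClassAd string literal.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def site_requirements(sites, site_attr="Site"):
    """Return a ClassAd Requirements expression matching any of `sites`.

    `site_attr` is a hypothetical slot attribute holding the site name.
    """
    if not sites:
        raise ValueError("need at least one site")
    for name in sites:
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unexpected characters in site name: {name!r}")
    # Anchored alternation, e.g. ^(MWT2|AGLT2)$
    pattern = "^(" + "|".join(sites) + ")$"
    return f'regexp("{pattern}", TARGET.{site_attr})'
```

`site_requirements(["MWT2", "AGLT2"])` returns `regexp("^(MWT2|AGLT2)$", TARGET.Site)`, which would go on a `requirements = ...` line in the submit file. The same regexp without the `TARGET.` prefix could in principle be passed to `condor_status -constraint` to list matching slots, but only for slots known to the collector being queried.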



Any suggestion is appreciated.


[hbawa@login testjobs]$ condor_status_all
Summary of available resources for all available HTCondor pools.
                     Total Owner Claimed Unclaimed Matched Preempting Backfill
LOCAL POOL:
                       504     0     221       283       0          0        0
Error: communication error
CEDAR:6001:Failed to connect to <128.135.119.232:9618>
Error: Couldn't contact the condor_collector on appcloud.uchicago.edu 

Extra Info: the condor_collector is a process that runs on the central 
manager of your Condor pool and collects the status of all the machines and 
jobs in the Condor pool. The condor_collector might not be running, it might 
be refusing to communicate with you, there might be a network problem, or 
there may be some other problem. Check with your system administrator to fix 
this problem. 

If you are the system administrator, check that the condor_collector is 
running on appcloud.uchicago.edu (<128.135.119.232:9618>), check the 
ALLOW/DENY configuration in your condor_config, and check the MasterLog and 
CollectorLog files in your log directory for possible clues as to why the 
condor_collector is not responding. Also see the Troubleshooting section of 
the manual. 
                       299     0     106       193       0          0        0
                        96     0       0        96       0          0        0
                        31     0       0        31       0          0        0
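The numeric rows of the condor_status_all summary can be totaled with a small parser, for example to track unclaimed slots over time. This sketch assumes each data row is exactly seven integers in the column order of the header; since the pool labels of the last three rows are missing from the output above, rows are only reported in the order they appear.

```python
import re

COLUMNS = ("Total", "Owner", "Claimed", "Unclaimed",
           "Matched", "Preempting", "Backfill")

# A data row is exactly seven whitespace-separated integers.
_ROW = re.compile(r"\s*(\d+(?:\s+\d+){6})\s*")


def parse_summary(text):
    """Return one dict per numeric summary row, in the order they appear."""
    rows = []
    for line in text.splitlines():
        m = _ROW.fullmatch(line)
        if m:
            values = [int(v) for v in m.group(1).split()]
            rows.append(dict(zip(COLUMNS, values)))
    return rows


def column_total(rows, column):
    """Sum one column (e.g. 'Unclaimed') over all parsed rows."""
    return sum(row[column] for row in rows)
```

Applied to the output above, this yields four rows, and `column_total(rows, "Unclaimed")` gives 603 (283 + 193 + 96 + 31), a plain sum of the rows as printed.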





Harinder


On Thu, Jan 23, 2014 at 9:54 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Harinder,

Yes, but expand your test to send jobs to MWT2 and AGLT2 as well (and later, other sites). This tests not only the site but also the services in between that get the jobs to the site. Of course, we don’t want to be excessive with the testing, so a little judgement is needed.

- Rob

On Jan 23, 2014, at 11:43 AM, Dr. Harinder Singh Bawa <harinder.singh.bawa AT gmail.com> wrote:

Hi Rob,

Could you please correct me if I misunderstood your suggestion?

What I am doing now is sending a cron job every hour from "atlasconnect", forcing it to run on "csufresno tier-3" in order to test the Fresno-t3 site. Is that what you are looking for?


Harinder


On Thu, Jan 23, 2014 at 6:47 AM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Hi Dave,

Can I request that on http://rccf.usatlas.org:8000/home.s we list only channels where there is ongoing work? We’ve not heard anything from the Argonne sites or UC Irvine for a long time, so let's remove them for the time being.

Status of others:

TACC - okay, just getting started. I am going to try a user-based Bosco submission later today. Later, we will need to get together with Peter to gather his Parrot magic for ATLAS jobs. Also, we’ll need to set up a Squid service at TACC at some point.
WT2 - ?
SWT2 - ?


Harinder: one thing we need is a little functional test that sends lightweight “site check” jobs through each channel on a periodic basis, say once per hour, and then reports the results to a site status board of some sort. Sort of an AtlasConnect Exerciser. A very simple script run from cron that submits 5-minute, tutorial-like jobs through each AtlasConnect channel would suffice as a start (using regular expressions in the Condor ClassAd to select specific resources).
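A minimal sketch of such an exerciser, assuming one regular expression per channel matched against a hypothetical `Site` slot attribute: it writes one submit file per channel for a roughly 5-minute `/bin/sleep 300` job and runs `condor_submit` on each. The channel names, patterns, file names and the `Site` attribute are placeholders, and it assumes `/bin/sleep` exists on the worker nodes. With `dry_run=True` it only writes the files and returns the commands.

```python
import subprocess
from pathlib import Path

# Placeholder channels: name -> regular expression matched against a
# hypothetical "Site" attribute in each slot's ClassAd.
CHANNELS = {
    "MWT2": "^MWT2",
    "AGLT2": "^AGLT2",
    "Fresno-T3": "^CSUFresno",
}


def make_submit(channel, pattern, site_attr="Site"):
    """Return an HTCondor submit description for one ~5-minute check job."""
    return "\n".join([
        "universe = vanilla",
        "executable = /bin/sleep",      # assumes /bin/sleep exists on the worker
        "arguments = 300",              # 300 s, roughly a 5-minute job
        "transfer_executable = false",  # use the worker's own /bin/sleep
        f"log = exerciser.{channel}.log",
        f"output = exerciser.{channel}.$(Cluster).out",
        f"error = exerciser.{channel}.$(Cluster).err",
        f'requirements = regexp("{pattern}", TARGET.{site_attr})',
        "queue 1",
    ]) + "\n"


def run_exerciser(workdir, channels=CHANNELS, dry_run=True):
    """Write one submit file per channel; submit them unless dry_run is set.

    Returns the condor_submit command lines, one per channel.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    commands = []
    for channel, pattern in channels.items():
        sub_file = workdir / f"exerciser.{channel}.sub"
        sub_file.write_text(make_submit(channel, pattern))
        cmd = ["condor_submit", str(sub_file)]
        commands.append(cmd)
        if not dry_run:
            # Run from workdir so the relative log/output paths land there.
            subprocess.run(cmd, cwd=workdir, check=False)
    return commands
```

In practice the call with `dry_run=False` would be wrapped in a small script scheduled hourly from cron, with each channel's job outcome fed into whatever status board is chosen; that wiring is not shown here.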

Thanks,

- Rob






---
Rob Gardner • Twitter: @rwg  Skype: rwg773 • g+: rob.rwg • +1 312-804-0859 • University of Chicago




--
Dr. Harinder Singh Bawa

                                          
California State University, Fresno





