atlas-connect-l - Re: [Atlas-connect-l] 10K job test

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: Lincoln Bryant <lincolnb AT uchicago.edu>
  • To: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
  • Cc: atlas-connect-l AT lists.bnl.gov
  • Subject: Re: [Atlas-connect-l] 10K job test
  • Date: Wed, 4 Dec 2013 18:02:49 -0600

Hi Harinder,

1. You can use this command to see where jobs are running on the ATLAS Connect login host:
condor_q -name login.atlas.ci-connect.net -pool uct2-bosco.mwt2.org:11120?sock=collector -run

If any jobs are running on your nodes, they will show up in the HOST(S) column. Note that a condor_q against the ATLAS Connect login node and a condor_q against your local login node will not match exactly, because the HTCondor glideins keep running for roughly 30 minutes longer than the actual jobs.

Here's an example output:
$ condor_q -name login.atlas.ci-connect.net -pool uct2-bosco.mwt2.org:11120?sock=collector -run
-- Schedd: login.atlas.ci-connect.net : <128.135.158.156:56549?PrivAddr=%3c10.1.5.82:56549%3e&PrivNet=mwt2.org>
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
  15.63  ivukotic       12/4  12:41   0+03:11:34 2658@iut2-c073.iu.edu
  15.65  ivukotic       12/4  12:41   0+03:11:34 2445@iut2-c086.iu.edu
  15.75  ivukotic       12/4  12:41   0+03:11:34 30539@iut2-c080.iu.edu
  15.79  ivukotic       12/4  12:41   0+03:11:32 22538@iut2-c056.iu.edu
  15.81  ivukotic       12/4  12:41   0+03:11:32 10917@iut2-c085.iu.edu
  15.82  ivukotic       12/4  12:41   0+03:11:32 23417@iut2-c095.iu.edu
  15.83  ivukotic       12/4  12:41   0+03:11:32 28904@iut2-c092.iu.edu
  15.84  ivukotic       12/4  12:41   0+03:11:32 16445@iut2-c086.iu.edu
  15.85  ivukotic       12/4  12:41   0+03:11:32 17005@iut2-c121.iu.edu
  15.87  ivukotic       12/4  12:41   0+03:11:32 20936@iut2-c105.iu.edu
  15.502 ivukotic       12/4  12:41   0+02:45:29 32357@iut2-c114.iu.edu
  15.599 ivukotic       12/4  12:41   0+02:38:26 5192@iut2-c090.iu.edu
  15.600 ivukotic       12/4  12:41   0+02:38:26 6644@iut2-c044.iu.edu
  15.601 ivukotic       12/4  12:41   0+02:38:25 6177@iut2-c046.iu.edu
  15.604 ivukotic       12/4  12:41   0+02:38:26 20530@iut2-c054.iu.edu
  15.605 ivukotic       12/4  12:41   0+02:38:26 8155@iut2-c117.iu.edu
  15.609 ivukotic       12/4  12:41   0+02:38:26 11505@iut2-c106.iu.edu
  15.629 ivukotic       12/4  12:41   0+02:38:03 28297@iut2-c098.iu.edu
  15.630 ivukotic       12/4  12:41   0+02:38:03 15129@iut2-c118.iu.edu
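
If you would rather tally that listing than read it by eye, here is a minimal Python sketch that counts running jobs per host and per owner. It assumes the column layout shown in the example output; the helper names (parse_rows, jobs_per_host, owners_on_site) and the host suffix you pass in are illustrative and not part of HTCondor or ATLAS Connect. The list archive renders "@" as " AT ", so the parser accepts both forms.

```python
import re
import subprocess
from collections import Counter

# Command copied from the message; run it where HTCondor is installed.
CONDOR_Q_RUN = [
    "condor_q",
    "-name", "login.atlas.ci-connect.net",
    "-pool", "uct2-bosco.mwt2.org:11120?sock=collector",
    "-run",
]

# One job row: ID, OWNER, SUBMITTED (date and time), RUN_TIME, HOST(S).
# HOST(S) looks like "<slot>@<hostname>"; " AT " is accepted as well.
ROW = re.compile(
    r"^\s*(?P<job_id>\d+\.\d+)\s+(?P<owner>\S+)\s+\S+\s+\S+\s+\S+\s+"
    r"(?P<slot>[^\s@]+)(?:@|\s+AT\s+)(?P<host>\S+)\s*$"
)


def parse_rows(text):
    """Return (owner, host) pairs for every job row; other lines are skipped."""
    pairs = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            pairs.append((match.group("owner"), match.group("host")))
    return pairs


def jobs_per_host(text):
    """Count running jobs on each execute host."""
    return Counter(host for _, host in parse_rows(text))


def owners_on_site(text, host_suffix):
    """Count running jobs per owner on hosts whose name ends with host_suffix."""
    return Counter(
        owner for owner, host in parse_rows(text) if host.endswith(host_suffix)
    )


def fetch_condor_q_run():
    """Run the condor_q command above and return its standard output."""
    result = subprocess.run(CONDOR_Q_RUN, capture_output=True, text=True, check=True)
    return result.stdout


# Three rows taken from the example output (one shown with "@", two with " AT ").
SAMPLE = """\
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
  15.63  ivukotic       12/4  12:41   0+03:11:34 2658 AT iut2-c073.iu.edu
  15.65  ivukotic       12/4  12:41   0+03:11:34 2445 AT iut2-c086.iu.edu
  15.84  ivukotic       12/4  12:41   0+03:11:32 16445@iut2-c086.iu.edu
"""

if __name__ == "__main__":
    print(jobs_per_host(SAMPLE))
    print(owners_on_site(SAMPLE, ".iu.edu"))
```

To look only at your own site, call owners_on_site(fetch_condor_q_run(), suffix) with whatever domain your worker nodes report. Since the OWNER column in this view is the user who submitted on the ATLAS Connect side, the same tally also gives a rough per-user breakdown for a site.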

2. We can check whether the submitter's username is already recorded somewhere in the job. If it is not, we can inject that information into the job and write some documentation on how to retrieve it.

Hope that helps.

Cheers,
Lincoln

On Dec 4, 2013, at 5:24 PM, Dr. Harinder Singh Bawa wrote:

Hi Rob, Lincoln,

I have a few questions and would appreciate your answers. I am seeing 206 jobs being run on our Fresno T3 cluster under the name "fresnoatlas", which is the account registered as the Connect client.

3474.0   fresnoatlas    12/4  14:45   0+00:00:00 I  0   0.0  condor_exec.exe -d
3475.0   fresnoatlas    12/4  14:45   0+00:00:00 I  0   0.0  condor_exec.exe -d
3476.0   fresnoatlas    12/4  14:45   0+00:00:00 I  0   0.0  condor_exec.exe -d
3477.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d
3478.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d
3479.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d
3480.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d
3481.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d
3482.0   fresnoatlas    12/4  14:46   0+00:00:00 I  0   0.0  condor_exec.exe -d

206 jobs; 30 idle, 176 running, 0 held

*********************************************************************************

This was the question I asked before: say you submit 10k jobs from ATLAS Connect.

From our side:

1. How do we see how many jobs are being allotted to the Fresno T3? Using condor_q -global tells me that fresnoatlas got 206 jobs, but is that all I need to look at?

2. "fresnoatlas" is the account registered in the Connect client, so if I understand correctly it is a kind of route to the Fresno T3. How do we know which users had their jobs running? Is there any monitoring or bookkeeping we can do from the Condor point of view?


Harinder

On Wed, Dec 4, 2013 at 2:49 PM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:
Thanks Lincoln!

On Dec 4, 2013, at 4:45 PM, Lincoln Bryant <lincolnb AT uchicago.edu> wrote:

On it -- we need to install the extra packages from the other Connect sites.

--Lincoln

On Dec 4, 2013, at 4:44 PM, Rob Gardner wrote:

As well as the “distribution” command.

On Dec 4, 2013, at 4:36 PM, Rob Gardner <rwg AT hep.uchicago.edu> wrote:

Just a heads up — I’ve submitted 10k jobs (each just sleeps 5 minutes) from login.usatlas.org.

Also, Lincoln, the historygram command is not installed.

---
Rob Gardner • Skype rwg773 • 312-804-0859 • University of Chicago


_______________________________________________
Atlas-connect-l mailing list
Atlas-connect-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/atlas-connect-l


--
Dr. Harinder Singh Bawa

                                          