atlas-connect-l - [Atlas-connect-l] Jobs on idle state (Not running) on Fresno T3

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
  • To: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
  • Subject: [Atlas-connect-l] Jobs on idle state (Not running) on Fresno T3
  • Date: Thu, 10 Apr 2014 14:10:11 -0700

Hello,

I am writing in connection with the earlier thread in which jobs on the Fresno T3 were killed because of disk space. We have not found any such problem on our side and are continuing to monitor it, although the failures could be due to several other factors.

Since Jay is investigating the history of the previous jobs, I wanted to test the system and submitted jobs from "atlasconnect", forcing them to run on "FresnoState". However, my jobs have been in the idle state for a long time.

I would like to know whether "FresnoState" has been blacklisted or blocked for the time being. We wanted to test with some jobs and monitor them. I am also getting an error message when I check the queue (shown further below).

***************************************************************
The job submit file is:

Jobs = 230
getenv         = False
executable     = SkimSlimLarge.sh
output         = output/SkimSlimLarge.out.$(Process)
error          = error/SkimSlimLarge.error.$(Process)
log            = log/SkimSlimLarge.log.$(Process)
arguments = $(Process) $(Jobs)
transfer_input_files = filter-and-merge-d3pd.py,x509up_u55261,inputFileListLarge,branchesList,cutCode
universe       = vanilla

Requirements = ( IS_RCC_FRESNOSTATE )
WhenToTransferOutput = ON_EXIT
+ProjectName = "atlas-org-fresno-state"
queue $(Jobs)


Moreover, when I try to check the queue as I used to, I get the following message:

*****************
[hbawa@login log]$ condor_q -name login.atlas.ci-connect.net -pool uct2-bosco.mwt2.org:11120?sock=collector -run
Error: Couldn't contact the condor_collector on 

Extra Info: the condor_collector is a process that runs on the central 
manager of your Condor pool and collects the status of all the machines and 
jobs in the Condor pool. The condor_collector might not be running, it might 
be refusing to communicate with you, there might be a network problem, or 
there may be some other problem. Check with your system administrator to fix 
this problem. 

If you are the system administrator, check that the condor_collector is 
running on uct2-bosco.mwt2.org:11120?sock=collector, check the ALLOW/DENY 
configuration in your condor_config, and check the MasterLog and CollectorLog 
files in your log directory for possible clues as to why the condor_collector 
is not responding. Also see the Troubleshooting section of the manual

--
Dr. Harinder Singh Bawa

                                          


