atlas-connect-l AT lists.bnl.gov
Subject: Atlas-connect-l mailing list
[Atlas-connect-l] Jobs on idle state (Not running) on Fresno T3
- From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa AT gmail.com>
- To: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
- Subject: [Atlas-connect-l] Jobs on idle state (Not running) on Fresno T3
- Date: Thu, 10 Apr 2014 14:10:11 -0700
Hello,
I am writing in connection with the thread about jobs on the Fresno T3 being killed due to disk space. We did not find any such problem and are continuing to monitor it. The problem could be due to many factors, though.
Since Jay is investigating the history of the previous jobs, I wanted to test the system and have submitted jobs from "atlasconnect", forcing them to run on "FresnoState". But my jobs have been in the idle state for a long time.
I wanted to know whether "FresnoState" has been blacklisted or blocked for the moment. We wanted to test with some jobs and monitor them.
***************************************************************
The job submit description is:
Jobs = 230
getenv = False
executable = SkimSlimLarge.sh
output = output/SkimSlimLarge.out.$(Process)
error = error/SkimSlimLarge.error.$(Process)
log = log/SkimSlimLarge.log.$(Process)
arguments = $(Process) $(Jobs)
transfer_input_files = filter-and-merge-d3pd.py,x509up_u55261,inputFileListLarge,branchesList,cutCode
universe = vanilla
Requirements = ( IS_RCC_FRESNOSTATE )
WhenToTransferOutput = ON_EXIT
+ProjectName = "atlas-org-fresno-state"
queue $(Jobs)
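
For readers less familiar with how the macros in this submit description expand, here is a minimal Python sketch. It is not HTCondor's own parser: it handles only plain `name = value` lines and a `queue` statement, uses a shortened copy of the file above, and the helper names `parse_submit` and `expand` are made up for illustration. It shows that `queue $(Jobs)` creates 230 jobs and that `$(Process)` runs from 0 to 229.

```python
import re

# Shortened copy of the submit description above (illustration only).
SUBMIT_TEXT = """\
Jobs = 230
executable = SkimSlimLarge.sh
output = output/SkimSlimLarge.out.$(Process)
arguments = $(Process) $(Jobs)
queue $(Jobs)
"""


def parse_submit(text):
    """Split the text into macro definitions and the queue count expression."""
    macros = {}
    queue_expr = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("queue"):
            queue_expr = line[len("queue"):].strip()
            continue
        key, _, value = line.partition("=")
        # Macro names are treated as case-insensitive here.
        macros[key.strip().lower()] = value.strip()
    return macros, queue_expr


def expand(value, macros, process):
    """Replace $(Name) references; $(Process) is the per-job index."""
    def repl(match):
        name = match.group(1).lower()
        if name == "process":
            return str(process)
        # Unknown macros are left untouched.
        return macros.get(name, match.group(0))
    return re.sub(r"\$\((\w+)\)", repl, value)


macros, queue_expr = parse_submit(SUBMIT_TEXT)
n_jobs = int(expand(queue_expr, macros, 0))
first = expand(macros["arguments"], macros, 0)
last = expand(macros["arguments"], macros, n_jobs - 1)
print(n_jobs, "|", first, "|", last)
```

Under these assumptions, the first job receives the arguments `0 230` and the last one `229 230`, so each job can work out which slice of the input list belongs to it.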
Moreover, when I check the queue as I usually do, I get the following message:
*****************
[hbawa@login log]$ condor_q -name login.atlas.ci-connect.net -pool uct2-bosco.mwt2.org:11120?sock=collector -run
Error: Couldn't contact the condor_collector on
Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.
If you are the system administrator, check that the condor_collector is
running on uct2-bosco.mwt2.org:11120?sock=collector, check the ALLOW/DENY
configuration in your condor_config, and check the MasterLog and CollectorLog
files in your log directory for possible clues as to why the condor_collector
is not responding. Also see the Troubleshooting section of the manual
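
The error text lists a network problem as one possible cause. A minimal Python sketch of one way to test that separately from HTCondor is below. The helper names `split_pool_address` and `tcp_reachable` are hypothetical, and the sketch assumes the pool address always has the `host:port?sock=name` form used in the command above. A successful TCP connection only shows that the port is open; it does not prove that the collector will answer or authorize the query.

```python
import socket


def split_pool_address(address):
    """Split 'host:port?sock=name' into (host, port, socket name)."""
    hostport, _, query = address.partition("?")
    host, _, port = hostport.rpartition(":")
    sock_name = None
    for item in query.split("&"):
        key, _, value = item.partition("=")
        if key == "sock":
            sock_name = value
    return host, int(port), sock_name


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port, sock_name = split_pool_address(
    "uct2-bosco.mwt2.org:11120?sock=collector"
)
print(host, port, sock_name)

# To probe the network path from the login node (needs network access):
# print(tcp_reachable(host, port))
```

If `tcp_reachable` returns False from the login node, the issue is more likely on the network or firewall side than in the job requirements.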