
[Atlas-connect-l] Can't submit condor jobs from US ATLAS


  • From: Christopher Meyer <chris.meyer AT cern.ch>
  • To: "atlas-connect-l AT lists.bnl.gov" <atlas-connect-l AT lists.bnl.gov>
  • Subject: [Atlas-connect-l] Can't submit condor jobs from US ATLAS
  • Date: Wed, 30 Mar 2016 21:56:24 +0000

Dear Experts,

I can't currently submit jobs to the condor queue on US ATLAS; does anyone know what might be going wrong? I'm using the same submission script I've used for a while.

I've attached a few DAGMan files that may be helpful. In particular, the .out file contains this error:

03/30/16 16:48:15 Warning: failed to get attribute DAGNodeName
03/30/16 16:48:15 ERROR: log file /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.nodes.log is on NFS.
03/30/16 16:48:15 Error: log file /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.nodes.log on NFS
03/30/16 16:48:15 **** condor_scheduniv_exec.273467.0 (condor_DAGMAN) pid 791917 EXITING WITH STATUS 1
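(For reference: the refusal above is DAGMan's documented safeguard against unreliable file locking on NFS-hosted node logs. If the site's NFS locking is known to be safe, HTCondor provides a configuration knob to downgrade the check; the sketch below assumes pool-admin access, and the config file path varies by site.)

```
# Local condor configuration (e.g. /etc/condor/condor_config.local; path varies by site).
# Downgrades DAGMan's fatal "log file ... is on NFS" error to a warning.
# Only do this if NFS file locking is known to work reliably on the mount.
DAGMAN_LOG_ON_NFS_IS_ERROR = False
```

The alternative, needing no admin rights, is to place the DAG and its node log files on a local (non-NFS) filesystem before running condor_submit_dag.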

Let me know if more information would be useful.

Thanks!
Chris

Attachment: dag.mc12_8TeV.lib.err
Description: Binary data

000 (273467.000.000) 03/30 16:48:15 Job submitted from host: <192.170.227.199:11460?addrs=192.170.227.199-11460>
...
001 (273467.000.000) 03/30 16:48:15 Job executing on host: <192.170.227.199:11460?addrs=192.170.227.199-11460>
...
005 (273467.000.000) 03/30 16:48:15 Job terminated.
	(1) Normal termination (return value 1)
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	0  -  Run Bytes Sent By Job
	0  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	0  -  Total Bytes Received By Job
...

Attachment: dag.mc12_8TeV.lib.out
Description: Binary data

Attachment: dag.mc12_8TeV.dagman.out
Description: Binary data

# Filename: /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.condor.sub
# Generated by condor_submit_dag /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV 
universe	= scheduler
executable	= /usr/bin/condor_dagman
getenv		= True
output		= /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.lib.out
error		= /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.lib.err
log		= /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.dagman.log
remove_kill_sig	= SIGUSR1
+OtherJobRemoveRequirements	= "DAGManJobId =?= $(cluster)"
# Note: default on_exit_remove expression:
# ( ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
# attempts to ensure that DAGMan is automatically
# requeued by the schedd if it exits abnormally or
# is killed (e.g., during a reboot).
on_exit_remove	= (ExitSignal =?= 11 || (ExitCode =!= UNDEFINED && ExitCode >=0 && ExitCode <= 2))
copy_to_spool	= False
arguments	= "-p 0 -f -l . -Lockfile /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.lock -AutoRescue 1 -DoRescueFrom 0 -Dag /home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV -Suppress_notification -CsdVersion $CondorVersion:' '8.4.4' 'Feb' '04' '2016' '$ -Dagman /usr/bin/condor_dagman"
environment	= _CONDOR_DAGMAN_LOG=/home/cjmeyer/condor_output/sys/mc12_8TeV/dag.mc12_8TeV.dagman.out;_CONDOR_SCHEDD_ADDRESS_FILE=/var/lib/condor/spool/.schedd_address;_CONDOR_SCHEDD_DAEMON_AD_FILE=/var/lib/condor/spool/.schedd_classad;_CONDOR_MAX_DAGMAN_LOG=0
queue
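The log paths in the submit file above all live under /home/cjmeyer, which the DAGMan error suggests is an NFS mount. A quick way to confirm that (a sketch, assuming GNU coreutils' stat is available; the directory argument is hypothetical):

```shell
# Print the filesystem type of the directory holding the DAG logs.
# DAGMan's safeguard trips when this reports an NFS filesystem.
dir="${1:-.}"          # directory to check; defaults to the current one
stat -f -c %T "$dir"   # e.g. "nfs" on an NFS mount, "xfs" or "ext2/ext3" locally
```

If this reports "nfs", moving the DAG's working directory to local disk (or having an admin relax the NFS check) should let the submission proceed.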

Attachment: dag.mc12_8TeV
Description: Binary data




Archive powered by MHonArc 2.6.24.
