sphenix-software-l - [Sphenix-software-l] reminder: condor changes today

  • From: pinkenburg <pinkenburg AT bnl.gov>
  • To: "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>, phenix-off-l <phenix-off-l AT lists.bnl.gov>
  • Subject: [Sphenix-software-l] reminder: condor changes today
  • Date: Tue, 8 Jan 2019 12:32:18 -0500

Hi folks,

Today our condor setup will be changed; here is the mail from last week again:

Hi folks,

RCF is making significant changes to the farm setup. In the end there will be no more dedicated machines, just quotas in a shared cluster (where jobs can spill over to unused resources). In general, the new environment will be less restrictive than the current general queue: jobs will be allowed to run for 3 days and use 1.4 GB of memory by default (previously 6 hours and sometimes <1 GB, depending on the generosity of the host experiment).

Changes you need to know about:
The default condor job file is a lot simpler now: no more experiment= lines or job types. The new policy is explained at:

https://www.racf.bnl.gov/docs/sw/condor/newpolicy

You can find an example job file here (but please do not set GetEnv = True):

https://www.racf.bnl.gov/docs/sw/condor/quickstart

New is the possibility to request memory and cores (request_memory and request_cpus) if needed. I am not aware of us running multi-threaded jobs where you could make use of more than one core, but the memory setting might be important: your job is allowed to use 30% more memory than requested before being evicted, so any job which needs more than 1.8 GB has to set this (you can use MB or GB).

Technically, condor assumes 2 GB/core and will allocate as many cores as necessary to satisfy the requested memory. If you request 20 GB, you will have to wait until a machine has 10 slots free before the job can run, and your user quota will be charged as if you ran on that many cores. With this change the himem job type is no longer necessary, but you will need to know within 30% how much memory your job needs (and yes, you can go above 20 GB).
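As a rough illustration, a submit file under the new policy might look like the sketch below. The executable name, file names, and the 4 GB figure are placeholders, not part of the policy; only request_memory and request_cpus are the new knobs described above:

```
# Sketch of a submit file under the new policy -- names and paths are placeholders.
Universe        = vanilla
Executable      = myjob.sh
Output          = myjob.out
Error           = myjob.err
Log             = myjob.log

# Ask for 4 GB; eviction kicks in ~30% above this (around 5.2 GB).
# Condor assumes 2 GB/core, so this request occupies 2 slots of quota.
request_memory  = 4 GB
# One core is the default; shown here only for completeness.
request_cpus    = 1

# Note: do not set GetEnv = True (see above).
Queue
```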

Next Tuesday our interactive machines will be switched over as well. I assume the main thing you need to change is removing the requirement to run on phenix machines, which was needed for long-running jobs. If you leave it in, the number of machines that can run your jobs is drastically reduced.
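For illustration, the line to look for in old submit files is a requirements expression of roughly this shape. The attribute name CPU_Experiment here is only an assumption; check your own files for whatever phenix-machine constraint you actually used:

```
# Old-style constraint to delete -- the attribute name is illustrative only.
# Requirements = (CPU_Experiment == "phenix")
```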

Chris

P.S. If you find your job is not starting for some reason, use condor_q -better-analyze <condor id> to get an idea which requirement prevents it from running. You can then use condor_qedit to change the requirements, e.g. to set the virtual memory requirement to >10:

condor_qedit <condor id> Requirements 'VirtualMemory > 10'

(condor_qedit takes the attribute name and its new value as separate arguments; quote the value so the shell does not treat > as a redirect.)

Once condor finds machines which match your requirements, it will start your jobs there. This way you do not have to resubmit stuck jobs.

--
*************************************************************

Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000

*************************************************************



