sphenix-software-l AT lists.bnl.gov
Subject: sPHENIX discussion of software
- From: pinkenburg <pinkenburg AT bnl.gov>
- To: "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>, phenix-off-l <phenix-off-l AT lists.bnl.gov>
- Subject: [Sphenix-software-l] condor changes next Tuesday
- Date: Wed, 2 Jan 2019 16:38:11 -0500
Hi folks,
RCF is making significant changes to the farm setup. In the end there won't be any dedicated machines anymore, just quotas in a shared cluster (where jobs can spill over to unused resources). In general the new environment will be less restrictive than the current general queue. Jobs will be allowed to run for 3 days and use 1.4 GB of memory by default (previously 6 hours and sometimes <1 GB, depending on the generosity of the host experiment).
Changes you need to know about:
The default condor job file is a lot simpler now: no more experiment= or job types. The new policy is explained in:
https://www.racf.bnl.gov/docs/sw/condor/newpolicy
You can find an example job file here (but please do not set GetEnv=True):
https://www.racf.bnl.gov/docs/sw/condor/quickstart
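For orientation, here is a minimal sketch of what such a simplified job file might look like, written as a small Python helper that builds the text. It is not a copy of the quickstart example; the helper name make_submit, the script name run_job.sh and the job.out/job.err/job.log file names are placeholders invented for illustration. It contains no experiment= line, no job type and no GetEnv, and the memory and core requests (explained below) are added only when asked for.

```python
def make_submit(executable, request_memory=None, request_cpus=None):
    """Build the text of a minimal condor job file (illustrative sketch only)."""
    lines = [
        "Universe   = vanilla",
        f"Executable = {executable}",
        "Output     = job.out",   # placeholder file names
        "Error      = job.err",
        "Log        = job.log",
    ]
    # Optional resource requests; leave them out to get the defaults.
    if request_memory is not None:
        lines.append(f"request_memory = {request_memory}")
    if request_cpus is not None:
        lines.append(f"request_cpus = {request_cpus}")
    lines.append("Queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # run_job.sh is a hypothetical job script
    print(make_submit("run_job.sh", request_memory="4GB"))
```

Saving the printed text to a file and passing it to condor_submit would be the usual next step; check the quickstart page above for the settings RCF actually recommends.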
What is new is the possibility to request memory and cores (request_memory and request_cpus) if needed. I am not aware of us running multi-threaded jobs which could make use of more than one core, but the memory might be important. Your job is allowed to use 30% more memory than requested before being evicted, so with the 1.4 GB default all jobs which need > 1.8 GB need to set request_memory (you can use MB or GB). Technically condor assumes 2 GB/core and will allocate as many cores as necessary to satisfy the requested memory (if you request 20 GB, you will have to wait until a machine has 10 slots free before this job can run), and your user quota will be charged as if you ran on that many cores. With that change the himem job type is no longer necessary, but you will need to know within 30% how much memory your job needs (and yes, you can go above 20 GB).
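The memory arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this mail (2 GB per core, 30% headroom, 1.4 GB default), not RCF's actual accounting code; the function names are made up, and rounding up to whole cores is an assumption.

```python
import math

GB_PER_CORE = 2.0          # memory condor assumes per core
HEADROOM = 0.30            # allowed overshoot before eviction
DEFAULT_REQUEST_GB = 1.4   # default memory request


def cores_charged(request_gb):
    """Cores (slots) needed to satisfy a memory request, rounded up (assumption)."""
    return max(1, math.ceil(request_gb / GB_PER_CORE))


def eviction_limit_gb(request_gb=DEFAULT_REQUEST_GB):
    """Memory use above which the job would be evicted."""
    return request_gb * (1.0 + HEADROOM)


if __name__ == "__main__":
    print(cores_charged(20))                 # 10 slots, as in the example above
    print(f"{eviction_limit_gb():.2f} GB")   # limit for the default request
```

In other words, a larger request_memory both delays the job until enough slots are free on one machine and counts against your quota as that many cores, so it pays to request only what the job needs.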
Next Tuesday our interactive machines will be switched to this setup. I assume the main thing you need to change is removing the requirement to run on phenix machines, which was needed for long-running jobs. If you leave this in, the number of machines which can run your jobs is drastically reduced.
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************