sphenix-software-l AT lists.bnl.gov
Subject: sPHENIX discussion of software
[Sphenix-software-l] please check your memory requests for your condor jobs
- From: pinkenburg <pinkenburg AT bnl.gov>
- To: "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>
- Subject: [Sphenix-software-l] please check your memory requests for your condor jobs
- Date: Sun, 2 Mar 2025 12:54:07 -0500
Hi folks,
we have around 60k condor slots but are running only 20k jobs. The reason is that jobs are requesting more memory than the 4GB we have per core. Sometimes this is necessary (the sim jobs need 10GB), but condor allocates resources according to what you request, not what your job actually uses (that is, if you request 20GB and your job uses 2GB, condor will still allocate 20GB).
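To make the arithmetic concrete, here is a rough sketch of how many cores' worth of memory a single request ties up. It assumes memory is the only limiting resource and that each job uses one core; the function names are made up for illustration and are not part of condor.

```python
import math

MEM_PER_CORE_GB = 4  # from the message above: 4GB of memory per core


def core_equivalents(request_gb, mem_per_core_gb=MEM_PER_CORE_GB):
    """Cores' worth of memory a request ties up (hypothetical helper).

    Assumes memory is the only constraint; a job always occupies
    at least one core.
    """
    return max(1, math.ceil(request_gb / mem_per_core_gb))


def idle_cores(request_gb, cores_used=1):
    """Cores left without memory because of this request (assumption:
    the job itself only uses `cores_used` cores)."""
    return core_equivalents(request_gb) - cores_used
```

Under these assumptions a 20GB request ties up five cores' worth of memory and leaves four of them idle, regardless of whether the job actually uses 2GB or 20GB.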
As of late last week, I think we have sorted out the random evictions and the weird memory accounting by condor (where high I/O was counted as memory). The current setting is that your job can exceed its requested memory by 20% before it gets evicted. Additionally, the memory usage that condor periodically prints into the condor log seems fairly accurate.
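As a sketch of the 20% rule, the check below computes the usage at which a job would be evicted for a given request. Whether the real comparison is strict, and which usage figure condor compares against, are assumptions here; the helper names are hypothetical.

```python
EVICTION_FACTOR = 1.2  # from the message above: 20% over the request


def eviction_limit_mb(request_mb):
    """Memory usage (MB) above which the job would be evicted."""
    return request_mb * EVICTION_FACTOR


def would_be_evicted(request_mb, usage_mb):
    """True if usage exceeds the request by more than 20%
    (assumption: strict comparison)."""
    return usage_mb > eviction_limit_mb(request_mb)
```

For a 4096MB request the limit comes out at roughly 4915MB, so there is some headroom without having to pad the request itself.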
I went through a few running jobs and found that they request (sometimes way) too much memory, which is a large contributor to our idle cores.
Right now this really jeopardizes the sims for ppg02.
Please be mindful and check whether your memory requests are backed by your actual usage.
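One possible way to compare a request against actual usage is to pull the reported memory values out of the job's condor log. The sketch below assumes the log contains lines of the form `<number>  -  MemoryUsage of job (MB)`; check your own log for the exact wording before relying on it. The function names are hypothetical.

```python
import re

# Assumed log line shape: "<number>  -  MemoryUsage of job (MB)"
MEMORY_USAGE_RE = re.compile(r"(\d+)\s+-\s+MemoryUsage of job \(MB\)")


def peak_memory_mb(log_text):
    """Largest memory usage (MB) reported in the log text, or None."""
    values = [int(v) for v in MEMORY_USAGE_RE.findall(log_text)]
    return max(values) if values else None


def request_to_peak_ratio(request_mb, log_text):
    """How many times larger the request is than the observed peak.

    Returns None if the log reports no (or zero) memory usage.
    """
    peak = peak_memory_mb(log_text)
    if not peak:
        return None
    return request_mb / peak
```

A ratio well above 1 suggests the request could be lowered. A finished job's log only reflects the input it ran on, so it is worth checking a few representative jobs.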
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************