[Sphenix-software-l] how to minimize disk ops in your condor jobs
- From: pinkenburg <pinkenburg AT bnl.gov>
- To: "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>
- Subject: [Sphenix-software-l] how to minimize disk ops in your condor jobs
- Date: Thu, 19 Sep 2024 12:28:15 -0400
Hi folks,
we are seeing pretty severe bottlenecks in Lustre. The monitoring indicates that it is not so much the I/O as the sheer number of disk ops that slows things down, keeping the servers busy seeking data. This is an inherent feature of reading and analyzing data from TFiles, so there is not much we can do about it in terms of coding.
One possible remedy is to copy those files to the local disk at the start of your condor job. Using cp lets all the caching work and is therefore a lot easier on Lustre (it also works better than rsync). I added a little section to our wiki on how to do this in the script you submit to condor:
https://wiki.sphenix.bnl.gov/index.php?title=Condor#Minimizing_Disk_Ops
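To make the pattern concrete, here is a rough sketch of such a job wrapper in Python (a plain shell script works just as well). The paths, the macro name and the argument convention below are placeholders, not the actual script from the wiki:

    #!/usr/bin/env python3
    # Sketch of a condor job wrapper: copy the input from lustre to the
    # node-local scratch area, run on the local copy, copy the result back
    # in one go. All names below are placeholders - adapt to your setup.
    import os
    import shutil
    import subprocess
    import sys

    input_file = sys.argv[1]   # e.g. a DST on /sphenix/lustre01/...
    output_dir = sys.argv[2]   # final destination on lustre

    # HTCondor sets _CONDOR_SCRATCH_DIR to the job's local scratch directory
    scratch = os.environ.get("_CONDOR_SCRATCH_DIR", "/tmp")
    local_input = os.path.join(scratch, os.path.basename(input_file))
    local_output = os.path.join(scratch, "output.root")

    # one sequential copy instead of many small reads over lustre
    shutil.copy(input_file, local_input)

    # run the analysis on the local copy (placeholder macro name)
    subprocess.check_call(
        ["root", "-b", "-q",
         'Fun4All_macro.C("%s","%s")' % (local_input, local_output)])

    # copy the result back to lustre in a single write, then clean up
    shutil.copy(local_output, output_dir)
    os.remove(local_input)
    os.remove(local_output)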
I have been doing this for the simulations for many years now and it works just fine even at the scale of 50k jobs. I didn't want to make the above example too complicated - in the sims we actually pass the output location as an argument via list files, so the same script can be reused everywhere.
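Purely to illustrate that idea (the file format below is made up, not what the sims actually use), the same wrapper could take a list file plus a job number and look up its input file and output location there:

    # hypothetical list file, one line per job: <input file> <output directory>
    import sys

    listfile, jobnum = sys.argv[1], int(sys.argv[2])
    with open(listfile) as f:
        input_file, output_dir = f.readlines()[jobnum].split()
    # ... then proceed exactly as in the sketch above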
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************