- From: pinkenburg <pinkenburg AT bnl.gov>
- To: "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>
- Subject: [Sphenix-software-l] MDC status
- Date: Fri, 20 Nov 2020 19:53:29 -0500
Hi folks,
I produced 10 million min-bias HIJING events (guess what, HIJING has a 0.04% chance to run into an infinite loop in the subroutine luzdis). They are located in /sphenix/sim/sim01/sphnxpro/MDC1/sHijing_HepMC/data (1000 files with 10,000 events each). sHijing now has a rudimentary command line parser, so one can set the number of events, the seed and the output filename from the command line (no more juggling multiple XML files where one typically forgets to update the seed and ends up with identical files).
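For the curious, the parser is just the usual getopt-style loop; here is a minimal sketch of the idea (the option letters are my illustration, not necessarily the real sHijing flags - check the source for those):

#include <unistd.h>
#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
  int nevents = 1;                     // number of events to generate
  long seed = 0;                       // random seed
  std::string outfile = "sHijing.dat"; // output filename
  int c;
  while ((c = getopt(argc, argv, "n:s:o:")) != -1)
  {
    switch (c)
    {
      case 'n': nevents = std::atoi(optarg); break;
      case 's': seed = std::atol(optarg); break;
      case 'o': outfile = optarg; break;
      default:
        std::cerr << "usage: " << argv[0] << " -n <nevents> -s <seed> -o <outfile>" << std::endl;
        return 1;
    }
  }
  // ... run HIJING with these settings ...
  return 0;
}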
The macro and code for pass1 (G4 hits production) is ready. I enabled the BBC, the Micromegas and the EPD (since the EPD is outside of our regular acceptance it doesn't interfere with our baseline). The flow and Fermi motion afterburners are enabled. We only save the detector hits (no absorber hits) and the truth info.
The output file size (based on 2 jobs) is 12-13GB for 100 events, and the running time is on the order of 15-20 hours. So we are looking at 100,000 condor jobs (10 million events at 100 events/job), which need 228 CPU years for processing (100,000 jobs x ~20 hours) and a total storage of 1.3PB (100,000 x 13GB). This wouldn't be a big problem to run under normal circumstances, but the memory consumption of each job is 20GB, and this drastically reduces the number of simultaneous jobs we can run. Our memory is allocated in quanta of 2GB per core, so a 20GB job will only start if a machine has 10 idle cores when the scheduler checks a node. The farm is always busy, so the chances of this happening are limited. It'll get better with time, since each of our jobs that finishes frees up 10 cores and the next job will just take over. The other problem is that we have a lot of old hardware where 10 cores are a substantial fraction of a node (I don't think we have nodes with fewer than 10 cores - they are not that old). Basically our throughput is hard to predict, so I just submitted 5000 jobs and threw all our condor slots into the sphenix queue to see what we can get out of RCF. But I am not terribly optimistic that 10 million events will be possible in a month's time if we cannot reduce the memory consumption.
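For orientation, the pass1 macro follows the standard Fun4All pattern; here is a stripped-down sketch (illustrative only - the real macro in the MDC1 repo, which pulls in the common subsystem macros, is the authoritative version):

#include <fun4all/Fun4AllServer.h>
#include <fun4all/Fun4AllDstOutputManager.h>
#include <phhepmc/Fun4AllHepMCInputManager.h>

R__LOAD_LIBRARY(libfun4all.so)

void Fun4All_G4_Pass1(const int nEvents = 100,
                      const std::string &hepmcfile = "sHijing.dat",  // illustrative filename
                      const std::string &outfile = "G4Hits.root")
{
  Fun4AllServer *se = Fun4AllServer::instance();

  // HepMC input from the sHijing production
  Fun4AllHepMCInputManager *in = new Fun4AllHepMCInputManager("HEPMCin");
  in->fileopen(hepmcfile);
  se->registerInputManager(in);

  // ... Geant4 setup via the common subsystem macros: BBC, Micromegas
  // and EPD enabled, flow and Fermi motion afterburners on, detector
  // hits and truth info only (no absorber hits) ...

  // write the G4 hits DST
  Fun4AllDstOutputManager *out = new Fun4AllDstOutputManager("DSTOUT", outfile);
  se->registerOutputManager(out);

  se->run(nEvents);
  se->End();
}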
Anyway - given that our G4 code has been stable for a long time, we can likely use those files for the MDC, and I would like to start the production as soon as possible. From a test, I have two G4 hits files available under
/sphenix/sim/sim01/sphnxpro/MDC1/sHijing_HepMC/G4Hits
Please have a look at those. They lack the flow and Fermi motion afterburners but are otherwise identical to what is being run right now.
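If you just want to peek at what is on them, something along these lines should do (a minimal sketch, assuming the standard Fun4All DST input manager; the macro name is made up):

#include <fun4all/Fun4AllServer.h>
#include <fun4all/Fun4AllDstInputManager.h>

R__LOAD_LIBRARY(libfun4all.so)

void InspectDST(const std::string &infile)
{
  Fun4AllServer *se = Fun4AllServer::instance();

  Fun4AllDstInputManager *in = new Fun4AllDstInputManager("DSTin");
  in->fileopen(infile);
  se->registerInputManager(in);

  se->run(1);              // read one event
  se->Print("NODETREE");   // dump the node tree to see what the DST contains
  se->End();
}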
The first version of the reconstruction pass macro (pass2), which can run on the DSTs produced by pass1, is also running. There will definitely be some tuning and changes in the reconstruction code, but once the HIJING production spits out some hits files we can run this to produce input for the topical groups (and anyone who wants to analyze this). The processing for the two above-mentioned hits files is ongoing; the DSTs will be written to (in the hope that their memory stays more or less at the 6GB it is at right now):
/sphenix/sim/sim01/sphnxpro/MDC1/sHijing_HepMC/DST
Give it till tomorrow before you look. Those jobs run more than just the tracking, and they seem to take about 10 minutes/event (based on 2 events), so ~15 hours for those 100 events. On the positive side - the event-wise memory leaks which we have been fixing over the last few days are not too critical when you run over only 100 events.
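Structurally, pass2 is the same Fun4All skeleton with a DST input manager in front of the reconstruction modules; again a stripped-down, illustrative sketch rather than the real macro:

#include <fun4all/Fun4AllServer.h>
#include <fun4all/Fun4AllDstInputManager.h>
#include <fun4all/Fun4AllDstOutputManager.h>

R__LOAD_LIBRARY(libfun4all.so)

void Fun4All_Reco_Pass2(const int nEvents = 0,  // Fun4All convention: 0 = run over all events
                        const std::string &infile = "G4Hits.root",  // illustrative filenames
                        const std::string &outfile = "DST.root")
{
  Fun4AllServer *se = Fun4AllServer::instance();

  // read the G4 hits DST produced by pass1
  Fun4AllDstInputManager *in = new Fun4AllDstInputManager("DSTin");
  in->fileopen(infile);
  se->registerInputManager(in);

  // ... register the reconstruction modules (tracking, calorimeters, ...)
  // via the common subsystem macros ...

  // write the reconstructed DST
  Fun4AllDstOutputManager *out = new Fun4AllDstOutputManager("DSTOUT", outfile);
  se->registerOutputManager(out);

  se->run(nEvents);
  se->End();
}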
Just a reminder, the MDC git repo is
https://github.com/sPHENIX-Collaboration/MDC1.
The production Fun4All macros (pass1 and pass2) are located in the macros/detectors/sPHENIX directory:
https://github.com/sPHENIX-Collaboration/MDC1/tree/main/macros/detectors/sPHENIX
They do call the common subsystem macros, so they stay in sync with our latest and greatest (until we tag the show and make a production build). Feel free to submit PRs with changes (or let me know).
Have a good weekend,
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov ; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************