
sphenix-tracking-l - Re: [Sphenix-tracking-l] [EXTERNAL] memory plots for 10k events

sphenix-tracking-l AT lists.bnl.gov

Subject: sPHENIX tracking discussion

  • From: "Osborn, Joe" <osbornjd AT ornl.gov>
  • To: pinkenburg <pinkenburg AT bnl.gov>, sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-tracking-l] [EXTERNAL] memory plots for 10k events
  • Date: Fri, 14 Jan 2022 16:14:02 +0000

Hi Chris,

 

So I guess we have two problems to add to our list:

 

  1. Why does the memory initially jump up to 8 GB right out of the gate?
  2. Why does the memory grow by a factor of ~2?

 

I have two follow-up questions regarding this:

 

  1. How do we go about debugging where the problem(s) exist?
  2. Do you have log files for these jobs so we can try to track down why ~2% of the events are aborted (or is this reproducible in some simple way)?

 

 

---------------------------

 

Joe Osborn, Ph.D.

Associate Research Scientist

Oak Ridge National Laboratory

osbornjd AT ornl.gov

(859)-433-8738

 

 

From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Date: Friday, January 14, 2022 at 10:48 AM
To: sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
Subject: [EXTERNAL] [Sphenix-tracking-l] memory plots for 10k events

Hi folks,

I ran over 10k events from the latest production under prmon with the
current tracking (Wednesday's, to be specific). The plots are attached. It
doesn't seem to make a difference whether the tracks are written out or
not (with output/no output). There goes my pet theory, which was based on
https://github.com/pinkenburg/rootmemory, though those tests were done
without reading ROOT objects, which is what our tracking does.

Our resident memory grows by quite a bit, roughly a factor of 2 (RSS
only is shown in PrMon_wtime_vs_rss_with_output.png). If the vmem turns
out to be a feature, we need to adjust the swap space of our nodes by
quite a bit.
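
To put a number on the growth without reading it off the plot, something along these lines might work. This is only a sketch: it assumes prmon's tab-separated text output with an `rss` column, the function name `rss_growth` and the `skip` parameter are hypothetical, and the sample values are synthetic, not taken from these jobs.

```python
import csv
import io


def rss_growth(prmon_text: str, skip: int = 0) -> float:
    """Return max(rss) / rss[baseline] for a prmon-style time series.

    Assumes tab-separated text with a header row containing an 'rss'
    column. 'skip' drops leading samples so the baseline can be taken
    after the initial startup jump.
    """
    rows = list(csv.DictReader(io.StringIO(prmon_text), delimiter="\t"))
    rss = [float(row["rss"]) for row in rows[skip:]]
    if not rss or rss[0] <= 0:
        raise ValueError("need at least one positive rss sample")
    return max(rss) / rss[0]


# Synthetic samples for illustration only (not measured data).
sample = (
    "Time\twtime\trss\tvmem\n"
    "1\t0\t8000000\t12000000\n"
    "2\t60\t12000000\t20000000\n"
    "3\t120\t16000000\t30000000\n"
)
print(rss_growth(sample))
```

Taking the baseline after the startup jump (via `skip`) keeps the initial allocation separate from the later growth.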

Another observation is that asking for 10k events results in reading
10195 events, so something in our chain is discarding events at the ~2%
level (returning ABORT_EVENT).
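
The 2% figure follows directly from the two event counts. Here is a minimal sketch of the arithmetic, assuming aborted events do not count toward the requested total (the function name `abort_fraction` is hypothetical):

```python
def abort_fraction(requested: int, read: int) -> float:
    """Fraction of the events read that were discarded.

    Assumes every event read beyond the requested count was aborted.
    """
    if read <= 0 or requested > read:
        raise ValueError("need 0 < requested <= read")
    return (read - requested) / read


frac = abort_fraction(10_000, 10_195)
print(f"{10_195 - 10_000} aborted, {frac:.2%} of events read")
```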

Chris

--
*************************************************************

Christopher H. Pinkenburg       ;    pinkenburg AT bnl.gov
                                ;    http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory  ;    phone: (631) 344-5692
Physics Department Bldg 510 C   ;    fax:   (631) 344-3253
Upton, NY 11973-5000

*************************************************************



