sphenix-tracking-l - Re: [Sphenix-tracking-l] memory plots for 10k events

  • From: pinkenburg <pinkenburg AT bnl.gov>
  • To: sphenix-tracking-l AT lists.bnl.gov
  • Subject: Re: [Sphenix-tracking-l] memory plots for 10k events
  • Date: Fri, 14 Jan 2022 17:26:31 -0500

Hi Hugo,

https://web.sdcc.bnl.gov/jenkins-sphenix/job/sPHENIX/job/test-default-detector-valgrind-pipeline/1184/valgrindResult/pid=49960,0x1c1230/

It hides behind
0x1c1230 2,128 bytes in 38 blocks are definitely lost in loss record 37,000 of 43,535

If a loss record has multiple blocks, those are suspicious because it is the same leak repeated: either in a loop (we have a few from "new G4RotationMatrix()") or once per event in process_event. The per-event ones are the leaks we need to fix.
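
For illustration, here is a minimal standalone sketch of those two patterns (hypothetical code, not the actual sPHENIX source; the names RotationMatrix, build_geometry and process_event are made up). Run under valgrind, the loop leak shows up as one loss record with many blocks pointing at the same line, while the per-event leak grows with the number of events processed.

// Hypothetical sketch of a per-loop leak and a per-event leak.
#include <cstdio>

struct RotationMatrix { double m[9] = {0}; };  // stand-in for G4RotationMatrix

void build_geometry()
{
  for (int i = 0; i < 38; ++i)
  {
    // leaked once per iteration: valgrind reports "definitely lost in 38
    // blocks", all from this one call site
    RotationMatrix* rot = new RotationMatrix();
    (void) rot;  // handed to something that never takes ownership
  }
}

int process_event(int /*event*/)
{
  // leaked once per event: the lost memory scales with the number of events,
  // which is the kind of leak that actually has to be fixed
  double* scratch = new double[1024];
  (void) scratch;
  return 0;
}

int main()
{
  build_geometry();
  for (int event = 0; event < 10; ++event)
  {
    process_event(event);
  }
  std::printf("done\n");
  return 0;
}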

Chris


On 1/14/2022 5:14 PM, Hugo Pereira Da Costa via sPHENIX-tracking-l wrote:

Hi Chris, others


I'm trying to see if the error you quote below has disappeared from the valgrind report of https://github.com/sPHENIX-Collaboration/coresoftware/pull/1387

How can I do that? Looking at the valgrind report at https://web.sdcc.bnl.gov/jenkins-sphenix/job/sPHENIX/job/test-default-detector-valgrind-pipeline/1184/valgrindResult/pid=49960/ I see a list of all possible leaks, but no way of knowing which piece of code they correspond to. Am I missing some magic link?


Thanks,


Hugo



On 1/14/2022 9:31 AM, pinkenburg via sPHENIX-tracking-l wrote:
Hi Tony, Joe,

It's time to look at valgrind again; I haven't hunted memory leaks in a long time.

Valgrind from Jenkins points to an event-by-event leak in TpcClusterizer.cc:

https://web.sdcc.bnl.gov/jenkins-sphenix/job/sPHENIX/job/test-default-detector-valgrind-pipeline/1180/valgrindResult/pid=41720,0x1b3fbd/

I don't understand this - unique_ptrs should not leak memory, but then there is ROOT :) Given the number of TPC clusters we have in HIJING events, this might be a substantial part of the lost memory.
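
One way an allocation made through a unique_ptr can still end up "definitely lost" is sketched below (hypothetical code, not the actual TpcClusterizer; the Cluster type, clusterContainer and clusterize are made up). The smart pointer only frees the object while it owns it; once release() hands the raw pointer to a container that never deletes its entries, valgrind reports one lost block per cluster.

// Hypothetical sketch: unique_ptr whose ownership is released into a
// raw-pointer container that is cleared without deleting the entries.
#include <map>
#include <memory>

struct Cluster { double x = 0, y = 0, z = 0; };

// stand-in for a ROOT-side container holding raw pointers
std::map<unsigned int, Cluster*> clusterContainer;

void clusterize(unsigned int key)
{
  auto cluster = std::make_unique<Cluster>();
  cluster->x = 1.0;
  // ownership leaves the unique_ptr here; if the container is cleared (or
  // the node reset) without deleting the entries, every cluster leaks
  clusterContainer.emplace(key, cluster.release());
}

int main()
{
  for (unsigned int key = 0; key < 100; ++key)
  {
    clusterize(key);
  }
  clusterContainer.clear();  // drops the pointers, never frees the clusters
  return 0;
}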

I think there is a similar warning coming from GenFit - but I assume once GenFit is ditched, this will go away by itself.

In terms of the ABORT_EVENT - these jobs run only on the new hardware since the files are on Lustre. I hope that by the end of today we can run on other nodes using MinIO to read those files. But over the weekend I'll just run one job with verbosity enabled to print out the return codes. That should tell us the culprit, and then we can skip to the event which triggers it for debugging (if the reason is not immediately apparent).
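
To make concrete what printing the return codes buys us, here is a standalone sketch of the mechanism in question (hypothetical code, not the Fun4All source; the module names and driver loop are made up). Each module's process_event() returns a code, and the driver drops the whole event as soon as any module returns ABORT_EVENT, so logging which module returned the code identifies the culprit and explains why more events are read than requested.

// Hypothetical sketch of a per-event module loop with ABORT_EVENT handling.
#include <cstdio>
#include <vector>

enum ReturnCodes { EVENT_OK = 0, ABORT_EVENT = 1 };

struct Module
{
  const char* name;
  int (*process_event)(int event);
};

int good_module(int /*event*/) { return EVENT_OK; }

int picky_module(int event)
{
  // pretend ~2% of events fail some internal check
  return (event % 50 == 0) ? ABORT_EVENT : EVENT_OK;
}

int main()
{
  std::vector<Module> modules = {{"GoodModule", good_module},
                                 {"PickyModule", picky_module}};
  const int wanted = 100;
  int read = 0;
  int kept = 0;
  for (int event = 0; kept < wanted; ++event)
  {
    ++read;
    bool aborted = false;
    for (const auto& mod : modules)
    {
      if (mod.process_event(event) == ABORT_EVENT)
      {
        std::printf("event %d aborted by %s\n", event, mod.name);
        aborted = true;
        break;  // the remaining modules are skipped for this event
      }
    }
    if (!aborted) ++kept;
  }
  std::printf("asked for %d events, had to read %d\n", wanted, read);
  return 0;
}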

Chris


On 1/14/2022 11:16 AM, Anthony Frawley wrote:
Hello Chris,
Ouch. So even after the rapid rise to 8 GB resident memory, it doubles again. Do we have any memory profiling tools?
Tony

From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Sent: Friday, January 14, 2022 10:46 AM
To: sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
Subject: [Sphenix-tracking-l] memory plots for 10k events
 
Hi folks,

I ran over 10k events from the latest production under prmon with the current tracking (Wednesday's build, to be specific). The plots are attached. It doesn't seem to make a difference whether the tracks are written out or not (with output/no output). There goes my pet theory, which was based on https://github.com/pinkenburg/rootmemory, though those tests were done without reading ROOT objects, which is what our tracking does.

Our resident memory grows by quite a bit, roughly a factor of 2 (RSS only, shown in PrMon_wtime_vs_rss_with_output.png). If the vmem turns out to be a feature, we need to adjust the swap space of our nodes by quite a bit.

Another observation is that asking for 10k events results in reading
10195 events - something in our chain is discarding events at a 2% level
(returning ABORT_EVENT).

Chris


_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l


-- 
*************************************************************

Christopher H. Pinkenburg	;    pinkenburg AT bnl.gov
				;    http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory	;    phone: (631) 344-5692
Physics Department Bldg 510 C	;    fax:   (631) 344-3253
Upton, NY 11973-5000

*************************************************************


