Skip to Content.
Sympa Menu

sphenix-tracking-l - Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)

sphenix-tracking-l AT lists.bnl.gov

Subject: sPHENIX tracking discussion

List archive

Chronological Thread  
  • From: Christof Roland <christof.roland AT cern.ch>
  • To: Christof Roland <christof.roland AT cern.ch>, "Christof Roland via sPHENIX-tracking-l" <sphenix-tracking-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)
  • Date: Fri, 21 Jan 2022 16:12:38 +0100

Hi again, 

I am still struggling with the memory leak. 
This may turn out to be a little more intricate than we thought. 

I get the Reset() of TrkrClusterHitAcssocv3 to be executed now, but all attempts to get rid of the memory
failed so far. The only way I get the memory to be wiped is if I call clusterhitassoc->clear() at the end 
of the processing of each hitset inside each given thread processing it. 

Any attempts clearing the memory outside the threads fail.
So this is probably the place to look, but i am not sure I understand yet how this works.
Each thread gets a pointer to the multimap that is supposed to store the association information.  

std::multimap<TrkrDefs::cluskey, TrkrDefs::hitkey>  *clusterhitassoc

This is then filled for each cluster with the entries from a vector containing the hitkeys constituting the cluster
explicit 
 if(my_data.do_assoc && my_data.clusterhitassoc){
for (unsigned int i = 0; i < hitkeyvec.size(); i++){
 my_data.clusterhitassoc->insert(std::make_pair(ckey, hitkeyvec[i]));
}
      }

no explicit memory allocation or new operators involved.
Does anybody have an idea how to work around this?

cheers

   Christof 


On 20. Jan 2022, at 21:00, Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:

Hi, 

I ran over 1000 - 4000 events. 

I put a print statement in the Reset() of TrkrClusterHitAssocv3 and it never shows. 

I suspect there is some confusion as well with the versioning and where you are using an explite
pointer to the base class and where you acctually use v3. 

I also found a loop that should clear and set to zero the multimaps used in the HitAssoc, but as of now 
it doesn't get called. 

But at least turning on and off the hit assiciation turns on and off the big memory leak. 
There is another small one from the ACTS layer association. But that one is a small fraction of a GB
over 4k events. I will investigate that one if have the hit assoc one stuffed. 

Btw. we may want to get rid of the hit assoc container completely at some point. 
For reco we doent need it and for truth association we will probably make it obsolete as well 
wheee we rework the logic here. 

cheers

   Christof 


On 20. Jan 2022, at 18:31, pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:

Hi Tony, Christof,

I wonder if we are barking up the wrong tree here. The memory leak plot ran over 10k events. If I look at the scale of Christofs plot it is likely from a single input file with 400 events. Maybe we are seeing just the map blowing up by accommodating the largest event it saw.

But clearing the map this way might do some good since I think we have the problem that a map memory footprint  doesn't really shrink if you .clear() them.

Chris

On 1/20/2022 12:19 PM, Anthony Frawley wrote:
Dear Christof and Chris,

TrkrHitTruthAssociation has the same reset instruction. I assume that Christof is running over already produced hits files, so would not see a memory leak from that.

The clear command for a multimap reduces the size to zero. So evidently the problem is that something is not releasing the memory.

I came across a suggestion that may be worth trying:


The suggestion is to create an empty map (in local scope, presumably) and use swap to exchange the content with persistent map:
The example was:

=============

Try:

{
    std::map<int,int> empty_map;
    empty_map.swap(_map);
}

(At least, this is the usual way to convince a standard library container actually to release its memory.)

=============

It would be worth trying this in the reset method of the container.

Tony



From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Sent: Thursday, January 20, 2022 10:17 AM
To: pinkenburg <pinkenburg AT bnl.gov>
Cc: sphenix-tracking-l AT lists.bnl.gov <sphenix-tracking-l AT lists.bnl.gov>
Subject: Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)
 
Hi Chris, 

ok, I just ran a job explicitely resetting the maps in the clusterizer. This didn't change things. 
See below
Its the Reset() of TrkrClusterHitAssocv3.cc
The filling of it is well contained in an if statement in the clusterizer, so I'll check there if something else is 
alloacted in there.

What is the recommended way of resetting a map?

cheers

   Christof 


<prmon_clusterizer_reset.jpg>

On 20. Jan 2022, at 16:00, pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:

Hi Christof,

the event is cleared in a separate step - after processing by calling the Reset() method for every object under the DST node (we have some opaque foreach type way to iterate over the node tree, what is being executed here is offline/framework/phool/PHNodeReset.cc).

I think the problem is likely in a Reset() method of the container. Which one we are talking about is opaque to me but if I look at the containers in trackbase, they just clear their map without calling the Reset() method of their members. If a member of this map allocates memory it leaks right there.

Chris


On 1/20/2022 7:49 AM, Christof Roland via sPHENIX-tracking-l wrote:
Hi Everybody, 

I tried a few stunts to find the memory leak in the TpcClusterizer and what finally 
stopped it was turning off the creation of the ClusterHitAssoociation map. 
With this disabled the memory along 1000 events is pretty stable. See PrMon plot below.

It looks like thes ClusterHitsAssoc map is not properly reset. Looking in the Clusterizer code 
neither the clustermap nor the assoc map are reset. Which makes sense since the INTT and 
MVTX clusterizers also write to the same maps. 

While ClusterMap apparently gets reset properly, the assoc map doesn't.
Where do we do this actually?

Cheers

   Christof 


<prmon_clusterizer_nohitassoc.jpg>

_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l

-- 
*************************************************************

Christopher H. Pinkenburg	;    pinkenburg AT bnl.gov
				;    http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory	;    phone: (631) 344-5692
Physics Department Bldg 510 C	;    fax:   (631) 344-3253
Upton, NY 11973-5000

*************************************************************
_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l


-- 
*************************************************************

Christopher H. Pinkenburg	;    pinkenburg AT bnl.gov
				;    http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory	;    phone: (631) 344-5692
Physics Department Bldg 510 C	;    fax:   (631) 344-3253
Upton, NY 11973-5000

*************************************************************
<PrMon_wtime_vs_rss_with_output.png>_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l

_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l




Archive powered by MHonArc 2.6.24.

Top of Page