sphenix-tracking-l AT lists.bnl.gov
Subject: sPHENIX tracking discussion
Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)
- From: pinkenburg <pinkenburg AT bnl.gov>
- To: Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
- Subject: Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)
- Date: Fri, 21 Jan 2022 11:56:40 -0500
Hi Christof,
this might be a threading issue. I just checked - map::insert does not seem to be thread safe. Can you put a mutex around this insert and see if this makes the leak go away? Or just run single-threaded?
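A minimal sketch of the mutex suggestion, assuming several threads share one multimap through a pointer as described in the quoted message. The aliases `cluskey` and `hitkey` stand in for `TrkrDefs::cluskey` and `TrkrDefs::hitkey`, and `fill_assoc` and `run_threads` are hypothetical names used only for illustration, not code from the clusterizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for TrkrDefs::cluskey and TrkrDefs::hitkey.
using cluskey = std::uint64_t;
using hitkey = std::uint32_t;
using AssocMap = std::multimap<cluskey, hitkey>;

// Worker: inserts one cluster's hit keys into the shared map.
// The lock serializes every insert, so the map is never modified concurrently.
void fill_assoc(AssocMap* assoc, std::mutex* guard, cluskey ckey,
                const std::vector<hitkey>& hitkeyvec)
{
  assert(assoc != nullptr && guard != nullptr);
  for (hitkey hkey : hitkeyvec)
  {
    std::lock_guard<std::mutex> lock(*guard);
    assoc->insert(std::make_pair(ckey, hkey));
  }
}

// Runs n_threads workers against one shared map and returns its final size.
std::size_t run_threads(unsigned int n_threads, unsigned int hits_per_cluster)
{
  AssocMap assoc;
  std::mutex guard;
  std::vector<hitkey> hitkeyvec(hits_per_cluster);
  for (unsigned int i = 0; i < hits_per_cluster; ++i)
  {
    hitkeyvec[i] = i;
  }

  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < n_threads; ++t)
  {
    workers.emplace_back(fill_assoc, &assoc, &guard,
                         static_cast<cluskey>(t), std::cref(hitkeyvec));
  }
  for (auto& worker : workers)
  {
    worker.join();
  }
  return assoc.size();
}
```

Locking per insert is the simplest form to test with; if contention turns out to matter, taking the lock once per cluster would be the obvious variation.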
Chris
On 1/21/2022 10:12 AM, Christof Roland wrote:
Hi again,
I am still struggling with the memory leak. This may turn out to be a little more intricate than we thought.
I get the Reset() of TrkrClusterHitAssocv3 to be executed now, but all attempts to get rid of the memory have failed so far. The only way I get the memory to be wiped is if I call clusterhitassoc->clear() at the end of the processing of each hitset, inside the thread processing it. Any attempts at clearing the memory outside the threads fail.
So this is probably the place to look, but I am not sure I understand yet how this works.
Each thread gets a pointer to the multimap that is supposed to store the association information.
I put a print statement in the Reset() of TrkrClusterHitAssocv3 and it never shows.
std::multimap<TrkrDefs::cluskey, TrkrDefs::hitkey> *clusterhitassoc
This is then filled for each cluster with the entries from a vector containing the hitkeys constituting the cluster:
if (my_data.do_assoc && my_data.clusterhitassoc) {
  for (unsigned int i = 0; i < hitkeyvec.size(); i++) {
    my_data.clusterhitassoc->insert(std::make_pair(ckey, hitkeyvec[i]));
  }
}
No explicit memory allocation or new operators are involved.
Does anybody have an idea how to work around this?
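One possible workaround, sketched here under assumptions and not taken from the clusterizer code: give each thread its own private buffer of (cluster key, hit key) pairs and merge the buffers into the multimap only after all threads have joined, so the shared container is never written concurrently. The aliases `cluskey` and `hitkey` and the functions `collect_assoc` and `cluster_and_merge` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for TrkrDefs::cluskey and TrkrDefs::hitkey.
using cluskey = std::uint64_t;
using hitkey = std::uint32_t;
using AssocPairs = std::vector<std::pair<cluskey, hitkey>>;

// Each worker writes only to its own buffer; no shared container is touched.
void collect_assoc(AssocPairs& local, cluskey ckey,
                   const std::vector<hitkey>& hitkeyvec)
{
  for (hitkey hkey : hitkeyvec)
  {
    local.emplace_back(ckey, hkey);
  }
}

// Starts one worker per buffer, then merges the buffers single-threaded.
std::multimap<cluskey, hitkey> cluster_and_merge(
    unsigned int n_threads, const std::vector<hitkey>& hitkeyvec)
{
  assert(n_threads > 0);
  // Sized up front so the buffers never move while the threads run.
  std::vector<AssocPairs> buffers(n_threads);
  std::vector<std::thread> workers;
  for (unsigned int t = 0; t < n_threads; ++t)
  {
    workers.emplace_back(collect_assoc, std::ref(buffers[t]),
                         static_cast<cluskey>(t), std::cref(hitkeyvec));
  }
  for (auto& worker : workers)
  {
    worker.join();
  }

  // Only this thread touches the multimap.
  std::multimap<cluskey, hitkey> assoc;
  for (const auto& buffer : buffers)
  {
    assoc.insert(buffer.begin(), buffer.end());
  }
  return assoc;
}
```

The per-thread buffers are destroyed when the function returns, so their memory has the same lifetime as the processing step that created them.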
cheers
Christof
On 20. Jan 2022, at 21:00, Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:
Hi,
I ran over 1000 - 4000 events.
I suspect there is some confusion as well with the versioning, and with where you are using an explicit pointer to the base class and where you actually use v3.
I also found a loop that should clear and set to zero the multimaps used in the HitAssoc, but as of now it doesn't get called.
But at least turning the hit association on and off turns the big memory leak on and off. There is another small one from the ACTS layer association, but that one is a small fraction of a GB over 4k events. I will investigate that one once I have the hit assoc one fixed.
Btw. we may want to get rid of the hit assoc container completely at some point. For reco we don't need it, and for truth association we will probably make it obsolete as well when we rework the logic here.
cheers
Christof
On 20. Jan 2022, at 18:31, pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:
TrkrHitTruthAssociation has the same reset instruction. I assume that Christof is running over already produced hits files, so he would not see a memory leak from that.
Hi Tony, Christof,
I wonder if we are barking up the wrong tree here. The memory leak plot ran over 10k events. If I look at the scale of Christof's plot, it is likely from a single input file with 400 events. Maybe we are just seeing the map blowing up to accommodate the largest event it saw.
But clearing the map this way might do some good, since I think we have the problem that a map's memory footprint doesn't really shrink when you .clear() it.
Chris
On 1/20/2022 12:19 PM, Anthony Frawley wrote:
Dear Christof and Chris,
The clear command for a multimap reduces the size to zero. So evidently the problem is that something is not releasing the memory.
I came across a suggestion that may be worth trying:
https://stackoverflow.com/questions/6770316/memory-leak-with-map
The suggestion is to create an empty map (in local scope, presumably) and use swap to exchange its content with the persistent map. The example was:
=============
Try:
{
  std::map<int,int> empty_map;
  empty_map.swap(_map);
}
(At least, this is the usual way to convince a standard library container actually to release its memory.)
It would be worth trying this in the reset method of the container.
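A minimal sketch of what the swap idiom could look like inside a reset method. `HitAssocSketch` is a hypothetical class that only loosely mirrors the association container discussed here, and the key aliases stand in for `TrkrDefs::cluskey` and `TrkrDefs::hitkey`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for TrkrDefs::cluskey and TrkrDefs::hitkey.
using cluskey = std::uint64_t;
using hitkey = std::uint32_t;

// Hypothetical container; not the actual TrkrClusterHitAssocv3 interface.
class HitAssocSketch
{
 public:
  void addAssoc(cluskey ckey, hitkey hkey) { m_map.insert({ckey, hkey}); }
  std::size_t size() const { return m_map.size(); }

  // Swap the contents into an empty temporary. The old nodes are destroyed
  // when empty_map goes out of scope at the end of this function.
  void Reset()
  {
    std::multimap<cluskey, hitkey> empty_map;
    empty_map.swap(m_map);
    assert(m_map.empty());
  }

 private:
  std::multimap<cluskey, hitkey> m_map;
};
```

Whether the resident size of the process actually drops afterwards also depends on the allocator returning freed pages to the operating system, so the effect is something to measure (for example with PrMon) rather than assume.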
Tony
From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Sent: Thursday, January 20, 2022 10:17 AM
To: pinkenburg <pinkenburg AT bnl.gov>
Cc: sphenix-tracking-l AT lists.bnl.gov <sphenix-tracking-l AT lists.bnl.gov>
Subject: Re: [Sphenix-tracking-l] TpcClusterizer memory leak (found)
Hi Chris,
ok, I just ran a job explicitly resetting the maps in the clusterizer. This didn't change things. See below.
It's the Reset() of TrkrClusterHitAssocv3.cc.
The filling of it is well contained in an if statement in the clusterizer, so I'll check whether something else is allocated in there.
What is the recommended way of resetting a map?
cheers
Christof
<prmon_clusterizer_reset.jpg>
On 20. Jan 2022, at 16:00, pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:
Hi Christof,
the event is cleared in a separate step - after processing by calling the Reset() method for every object under the DST node (we have some opaque foreach type way to iterate over the node tree, what is being executed here is offline/framework/phool/PHNodeReset.cc).
I think the problem is likely in a Reset() method of the container. Which one we are talking about is opaque to me but if I look at the containers in trackbase, they just clear their map without calling the Reset() method of their members. If a member of this map allocates memory it leaks right there.
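To illustrate the failure mode described above with a minimal sketch: a container whose map stores owning raw pointers leaks if it only calls clear(), because clear() destroys the pointers and not the objects they point to. `HitSetSketch` and `ContainerSketch` are hypothetical classes written for this example, not the trackbase containers.

```cpp
#include <cassert>
#include <map>

// Hypothetical member object that owns heap memory and counts live instances.
struct HitSetSketch
{
  static int live;
  int* buffer;
  HitSetSketch() : buffer(new int[16]) { ++live; }
  ~HitSetSketch()
  {
    delete[] buffer;
    --live;
  }
  HitSetSketch(const HitSetSketch&) = delete;
  HitSetSketch& operator=(const HitSetSketch&) = delete;
};
int HitSetSketch::live = 0;

// Hypothetical container holding owning raw pointers.
class ContainerSketch
{
 public:
  ~ContainerSketch() { Reset(); }

  // Creates a new member for key unless one already exists.
  void add(int key)
  {
    auto result = m_map.emplace(key, nullptr);
    if (result.second)
    {
      result.first->second = new HitSetSketch();
    }
  }

  // Leaking variant: drops the pointers, the members are never deleted.
  void clearOnly() { m_map.clear(); }

  // Non-leaking variant: delete every member before clearing the map.
  void Reset()
  {
    for (auto& entry : m_map)
    {
      delete entry.second;
    }
    m_map.clear();
    assert(m_map.empty());
  }

 private:
  std::map<int, HitSetSketch*> m_map;
};
```

If the map stores plain values such as hit keys, as in the association multimap, there is nothing to delete and clear() already destroys every entry; the sketch only applies to containers whose members own memory themselves.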
Chris
On 1/20/2022 7:49 AM, Christof Roland via sPHENIX-tracking-l wrote:
Hi Everybody,
I tried a few stunts to find the memory leak in the TpcClusterizer, and what finally stopped it was turning off the creation of the ClusterHitAssociation map. With this disabled, the memory over 1000 events is pretty stable. See the PrMon plot below.
It looks like the ClusterHitAssoc map is not properly reset. Looking in the clusterizer code, neither the cluster map nor the assoc map is reset, which makes sense since the INTT and MVTX clusterizers also write to the same maps.
While the ClusterMap apparently gets reset properly, the assoc map doesn't. Where do we actually do this?
Cheers
Christof
<prmon_clusterizer_nohitassoc.jpg>
_______________________________________________ sPHENIX-tracking-l mailing list sPHENIX-tracking-l AT lists.bnl.gov https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l
-- ************************************************************* Christopher H. Pinkenburg ; pinkenburg AT bnl.gov ; http://www.phenix.bnl.gov/~pinkenbu Brookhaven National Laboratory ; phone: (631) 344-5692 Physics Department Bldg 510 C ; fax: (631) 344-3253 Upton, NY 11973-5000 *************************************************************