  • From: "Huang, Jin" <jhuang AT bnl.gov>
  • To: Christof E Roland <cer AT mit.edu>, "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-software-l] Quick study on storage containers
  • Date: Fri, 8 Apr 2022 18:02:37 +0000

Hi Christof and Hugo,

 

Just to share one thought on this:

 

> we can get rid of maps wherever we can.

 

Many of the memory concerns focus on the map. However, there are two aspects mixed together here (a short sketch contrasting them follows the list below):

  1. At each insertion into the map, we make a dynamic allocation of memory on the heap for a new object: a hitset, hit, or cluster.
     • These are small objects of varying sizes. It appears to me this could be one cause of the large memory usage, just for heap management, and it affects both maps and TObjArrays.
     • A vector resolves this somewhat by allocating a contiguous block of memory for storage, and that is probably one cause of its efficiency (besides saving the indexing overhead in point 2).
     • This problem has nothing to do with the map itself, though.
  2. The map provides efficient random access into a non-contiguous key space.
     • To do that, it allocates memory and internal structure to store the key and a pointer to our object. Nonetheless, we are not storing our objects inside the map.
     • For a contiguous key space, a vector will probably do better in both memory and speed.
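
As a purely illustrative sketch of these two aspects (assuming a hypothetical 12-byte Hit struct, not the actual sPHENIX hitset/hit classes), the two allocation patterns could be contrasted like this:

#include <cstdint>
#include <map>
#include <vector>

// Stand-in for our small payload objects: 3 x uint32 = 12 bytes.
struct Hit
{
  std::uint32_t key;
  std::uint32_t adc;
  std::uint32_t crossing;
};

// Aspect 1: one small heap allocation per insertion, plus a map node
// (key + pointer + tree bookkeeping) per entry. Objects end up scattered
// over the heap. (Ownership/cleanup omitted in this sketch.)
void fill_map(std::map<std::uint32_t, Hit*>& hits, std::uint32_t n)
{
  for (std::uint32_t i = 0; i < n; ++i)
    hits.emplace(i, new Hit{i, 0, 0});
}

// Contiguous alternative: one growing block, no per-object allocation,
// essentially sizeof(Hit) per entry once capacity is reserved.
void fill_vector(std::vector<Hit>& hits, std::uint32_t n)
{
  hits.reserve(n);
  for (std::uint32_t i = 0; i < n; ++i)
    hits.push_back(Hit{i, 0, 0});
}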

 

Therefore, before we blame the map for the memory usage, it appears useful to test a setup that would:

  • keep the current map structure untouched for fast indexing in the non-contiguous key space
  • but, instead of allocating our objects individually on the heap, place them in a std::vector or in a TClonesArray (for better ROOT IO)
  • TClonesArray can be configured to handle the ROOT IO, and at readback the std::map index is rebuilt. I think Hugo used this strategy for the PHENIX muon spectrometer.

 

This approach spends (key + pointer) of memory per object to retain the fast-indexing benefit of the map, and it is the least invasive change to the current code. For contiguously keyed objects, we probably want to just switch to a vector or a TClonesArray. A minimal sketch of this layout follows.
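
As a minimal sketch of this layout (hypothetical Hit struct and plain std::vector storage; the real classes, and the TClonesArray/ROOT-IO variant, would differ in detail), the container could look like:

#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Hit
{
  std::uint32_t key;   // non-contiguous key, e.g. an encoded hit key
  std::uint32_t adc;
  std::uint32_t crossing;
};

class HitStore
{
 public:
  // Insertion: the object goes into contiguous storage,
  // the map stores only key -> index.
  void add(const Hit& h)
  {
    m_index[h.key] = m_hits.size();
    m_hits.push_back(h);
  }

  // Fast random access by non-contiguous key through the index map.
  Hit* find(std::uint32_t key)
  {
    auto it = m_index.find(key);
    return (it == m_index.end()) ? nullptr : &m_hits[it->second];
  }

  // After reading the hits back from file (only the vector, or a
  // TClonesArray in the ROOT-IO variant, needs to be streamed),
  // rebuild the transient index once.
  void rebuild_index()
  {
    m_index.clear();
    for (std::size_t i = 0; i < m_hits.size(); ++i)
      m_index[m_hits[i].key] = i;
  }

 private:
  std::vector<Hit> m_hits;                      // persistent, contiguous storage
  std::map<std::uint32_t, std::size_t> m_index; // transient key -> index map
};

In the TClonesArray version, the vector member would be replaced by the array of TObject-derived hits and rebuild_index() would run once right after readback.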

 

Cheers

 

Jin

 

______________________________

 

Jin HUANG

 

Physicist, Ph.D.

Brookhaven National Laboratory

Physics Department, Bldg 510 C

Upton, NY 11973-5000

 

Office: 631-344-5898

Cell:   757-604-9946

______________________________

 

-----Original Message-----
From: sPHENIX-software-l <sphenix-software-l-bounces AT lists.bnl.gov> On Behalf Of Christof E Roland via sPHENIX-software-l
Sent: Friday, April 8, 2022 7:46 AM
To: sphenix-software-l AT lists.bnl.gov
Subject: [Sphenix-software-l] Quick study on storage containers

 

Hi,

 

Following our discussion on Tuesday, I did a quick study of our storage containers, especially the ones using maps, i.e. the hitsetcontainer.

The hitsetcontainer is the construct that stores what is, from the reco-software point of view, the closest representation of raw data we have defined.

 

I filled the hitsetcontainer with either 5k or ~5.5 million fake hits per event and looked at the evolution of the jobs' memory footprint with the prmon tool. From the difference between the low- and high-occupancy case I calculate the job-memory "price tag" of storing a hit, whose payload is 3 uint32 = 12 bytes.

 

Storing 5 million extra hits in a std::map increases the job memory by 0.5 GB. This corresponds to a ~100-byte effective memory price tag per hit, including all the overhead and pre-allocated memory around std::map.

 

Using TObjArrays it is about 60 bytes.

 

For std::vectors it is ~14 bytes which, given the precision with which you can read off the GB-scale job memory in prmon, is consistent with the 12 bytes you expect.
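
For reference, a tiny sketch of the arithmetic behind the ~100-byte figure, using the numbers quoted in this email (the TObjArray and vector price tags are the measured values above, not derived here):

#include <cstdio>

int main()
{
  const double extra_hits    = 5e6;    // extra fake hits in the high-occupancy fill
  const double extra_rss_map = 0.5e9;  // extra job memory with std::map, in bytes
  const double payload       = 3 * 4;  // 3 x uint32 = 12 bytes per hit

  // Effective bytes per stored hit, all container overhead included.
  const double price_tag_map = extra_rss_map / extra_hits;  // ~100 bytes/hit
  std::printf("std::map: ~%.0f bytes/hit vs %.0f bytes of payload\n",
              price_tag_map, payload);
  // Measured for comparison: TObjArray ~60 bytes/hit, std::vector ~14 bytes/hit.
  return 0;
}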

 

I guess we should investigate to what extent we can get rid of maps wherever we can.

 

Cheers

 

   Christof

 

 



