- From: pinkenburg <pinkenburg AT bnl.gov>
- To: sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
- Subject: [Sphenix-tracking-l] memory tracking
- Date: Fri, 22 Apr 2022 19:59:15 -0400
Hi folks,
I picked apart prmon and did some reading on what we should look at for memory (though I am still trying to wrap my head around a few things). It turns out pss (proportional set size) is the most accurate value for our purpose. It is the actually used memory (rss), except that shared memory is divided by the number of processes using it - e.g. a 300 MB shared library mapped by three processes adds 300 MB to each process's rss but only 100 MB to each pss. For us typically rss = pss, but I don't know if multi-threading can change this.
vmem is largely irrelevant; prmon just blindly adds up the size of everything, even if it has never actually been allocated. The general recommendation seems to be to just ignore it.
In /proc/self/smaps one can actually see which library accounts for how much memory, how much is on the heap (everything allocated with "new"), and how much is allocated via mmap (which shows up as anon - anonymous memory mappings, as opposed to files mapped into memory such as libraries).
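As an illustration (this is not the Fun4AllMonitoring implementation, just a minimal standalone sketch), here is how such a breakdown can be obtained: walk /proc/self/smaps, track which mapping the current lines belong to, and sum each mapping's Pss into heap, anonymous mmap, and per-file (library) buckets:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main()
{
  std::ifstream smaps("/proc/self/smaps");
  std::string line;
  std::string category = "other";
  std::map<std::string, uint64_t> psskb;
  while (std::getline(smaps, line))
  {
    std::istringstream iss(line);
    std::string first;
    iss >> first;
    if (first.empty())
    {
      continue;
    }
    if (first.back() != ':')
    {
      // mapping header: "start-end perms offset dev inode [pathname]"
      // field lines ("Pss:", "Rss:", ...) end their first token with ':'
      std::string perms, offset, dev, inode, path;
      iss >> perms >> offset >> dev >> inode >> path;
      if (path == "[heap]") category = "heap";
      else if (path.empty()) category = "anon (mmap)";
      else category = path;  // shared library or other mapped file
    }
    else if (first == "Pss:")
    {
      // "Pss:   1234 kB" - add to the bucket of the current mapping
      uint64_t kb = 0;
      iss >> kb;
      psskb[category] += kb;
    }
  }
  for (const auto &bucket : psskb)
  {
    std::cout << bucket.first << ": " << bucket.second << " kB" << std::endl;
  }
  return 0;
}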
The attached plots show the clustering based on what Christof has. cluster_orig.png is his macro, instrumented with a memory dump for every event, for 100 events. The library contributions are negligible, so I didn't implement splitting them up into single-library contributions. In my understanding the heap is what we control with our code. Its behavior looks reasonable, with ups and downs depending (I assume) on the number of hits. The mmap part stays flat but then jumps for a large event and stays up there, adding 1.5 GB - that's bad - giving us a total of around 4 GB.
I pulled out the libraries and methods from the G4 macros and just added everything by hand (cluster_orig_dedicated.png). It looks like I save 500 MB. For the heap I could argue that some superfluous code is no longer executed, but I don't understand the mmap savings. At least the features stay the same.
Now let me go back to my pet peeve: turning on output writing (where everything is saved, including the large trkrhits). The heap allocation shows no structure anymore (and neither does the mmapped part), but the memory use shoots up to 6.5 GB.
All this needs more investigation - this was a single process on an empty node - but I do wonder if ROOT I/O is a major part of our memory problem.
If you want to play with this (from tomorrow morning's build on): it is a singleton which you can set up in your macro, anywhere before you run the first event, and Fun4All picks it up from there. The attached macro reads the resulting text file and makes those plots. The setup looks like this:
#include <fun4all/Fun4AllMonitoring.h>
Fun4AllMonitoring *moni = Fun4AllMonitoring::instance();
moni->OutFileName("clustering_orig.log");
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************
Attachments (PNG images):
- cluster_orig_dedicated_output.png
- cluster_orig_dedicated.png
- cluster_orig.png
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <TGraph.h>
#include <TLegend.h>
#include <TMultiGraph.h>

// read the Fun4AllMonitoring text output and plot the per-event pss,
// split into heap, mmap and other (libraries) contributions plus their sum
void memplot(const std::string &filename = "clustering_orig.log")
{
  std::ifstream infile(filename);
  std::string instring;
  // echo the header line, then skip it
  std::getline(infile, instring);
  std::cout << instring << std::endl;
  uint64_t event;
  uint64_t heappss;
  uint64_t mmappss;
  uint64_t otherpss;
  std::vector<double> eventvec;
  std::vector<double> heappssvec;
  std::vector<double> mmappssvec;
  std::vector<double> otherpssvec;
  std::vector<double> allpssvec;
  while (infile >> event >> heappss >> mmappss >> otherpss)
  {
    std::cout << "event: " << event << ", heap: " << heappss
              << ", mmap: " << mmappss << ", other: " << otherpss << std::endl;
    eventvec.push_back(event);
    // scale to GB for the plot axis
    heappssvec.push_back(heappss / 1000000.);
    mmappssvec.push_back(mmappss / 1000000.);
    otherpssvec.push_back(otherpss / 1000000.);
    allpssvec.push_back((heappss + mmappss + otherpss) / 1000000.);
  }
  infile.close();

  // one graph per contribution, all collected in a multigraph
  auto mg = new TMultiGraph("memgraphs", "Heap/Other Memory Use");
  auto leg = new TLegend(0.15, 0.65, 0.3, 0.85, "", "NDC");
  leg->SetBorderSize(0);
  leg->SetTextFont(42);
  struct GraphDef
  {
    const std::vector<double> &values;
    int linecolor;
    int markerstyle;
    const char *label;
  };
  const GraphDef defs[] = {
      {allpssvec, 6, 24, "Sum"},
      {heappssvec, 2, 25, "Heap"},
      {mmappssvec, 7, 27, "mmap"},
      {otherpssvec, 3, 26, "libraries"}};
  for (const GraphDef &def : defs)
  {
    TGraph *gr = new TGraph(eventvec.size(), eventvec.data(), def.values.data());
    gr->SetLineColor(def.linecolor);
    gr->SetLineWidth(4);
    gr->SetMarkerColor(4);
    gr->SetMarkerStyle(def.markerstyle);
    mg->Add(gr);
    leg->AddEntry(gr, def.label, "lp");
  }
  mg->Draw("acp");
  // axis titles belong on the multigraph, not the individual graphs
  mg->GetXaxis()->SetTitle("Event");
  mg->GetYaxis()->SetTitle("Memory (GB)");
  leg->Draw();
}
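A usage example for the macro above (the file name memplot.C is an assumption; the mail does not give one):

root -l 'memplot.C("clustering_orig.log")'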