Skip to Content.
Sympa Menu

sphenix-tracking-l - Re: [Sphenix-tracking-l] [EXTERNAL] We have a timing problem...

sphenix-tracking-l AT lists.bnl.gov

Subject: sPHENIX tracking discussion

List archive

Chronological Thread  
  • From: Christof Roland <christof.roland AT cern.ch>
  • To: pinkenburg <pinkenburg AT bnl.gov>
  • Cc: sphenix-tracking-l AT lists.bnl.gov
  • Subject: Re: [Sphenix-tracking-l] [EXTERNAL] We have a timing problem...
  • Date: Wed, 3 Nov 2021 17:32:26 +0100

Hi Chris, Joe, 

i tried to run an event that run suspiciously long through callgrind, 
but the graph I get from the output.dot file is corrupted. You can display it as is
and as soon as you zoom in to make sense of the connections the figure erases itself. 
Highly annoying. 

A few timers and print statements give a little more insight on the PHActsSiliconSeeding at least:
  
Finished PHActsSiliconSeeding process_event
PHActsSiliconSeeding Acts seed time 15.8078
PHActsSiliconSeeding circle fit time 1186.91
PHActsSiliconSeeding total event time 1202.72

the circle fit runs ~110 fits in ~10ms per fit approximately. 10ms for a circle fit is huge. 
A full ACTS fit is supposed to be done in a fraction of a millisecond.
So smething seems to get stuck in the clrcle fitting. 

Also the distribution of processing time is not what it used to be. It has a low most probable value
and massive outliers, see figure below.

PHActsSiliconSeeding Time per event in seconds:


On 3. Nov 2021, at 15:18, pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov> wrote:

Hi,

in the meantime I got at least some of the MB sims ready for tracking, the logs with the timer printouts at the end are
/sphenix/sim/sim01/sphnxpro/MDC2/sHijing_HepMC/fm_0_20/Tracks/log/condor-0000000002-*.out
It uses the mdc2.2 build which is fairly recent.

By far the biggest contributor is PHActsSiliconSeeding which takes many minutes. You can filter the time per event via this:

cat /sphenix/sim/sim01/sphnxpro/MDC2/sHijing_HepMC/fm_0_20/Tracks/log/condor-0000000002-*.out | grep 'PHActsSiliconSeeding_TOP: per event time'

Chris

On 11/2/2021 10:38 AM, Christof Roland via sPHENIX-tracking-l wrote:
Hi, 

the effect is big enough so we should be able to figure it out with the profiler
or some timers in the code. let me have a look. 

cheers

   Christof

On 2. Nov 2021, at 15:33, Osborn, Joe <osbornjd AT ornl.gov> wrote:

Hi Christof,
 
Before I went on family leave, I was wondering if the local coordinates PR was the cause of this also. However, I had run jobs with a previous ana build (before this PR was merged) and found the same results as after the PR was merged.
 
There is some improvement that can be made to the local coordinates storage code by e.g. storing the clusters as a map of <surfaceID, cluster> so that we don’t have to do the surface lookup each time, but I don’t think this would cause such a significant slow down.
 
Joe
 
---------------------------
  Joe Osborn, Ph.D. Associate Research Scientist Oak Ridge National Laboratory osbornjd AT ornl.gov (859)-433-8738     From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of Christof Roland via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Date: Tuesday, November 2, 2021 at 9:42 AM
To: Anthony Frawley via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Subject: [EXTERNAL] [Sphenix-tracking-l] We have a timing problem...
Hi Everybody, 

following up on the discussion on timeing performance of our current code 
I ran a few benchmark jobs. 1000 jobs one event each hijing 0-20 + 50kHz of pileup.
These are jobs submitted from sphnx02 so I assume even if there are a few 
slow machines this will not dominate the results. 

Time per event is here:
InttClusterizer                                 1000 0.014891 sec
MvtxClusterizer                         1000 0.088210 sec
TpcClusterizer                                  1000 2.677374 sec
TpcClusterCleaner                               1000 0.045161 sec
PHActsSiliconSeeding                    1000 509.086792 sec
PHActsVertexPropagator          1000 0.044235 sec
PHCASeeding                                     1000 3.974943 sec
PHSimpleKFProp                          1000 3.371190 sec
PrePropagatorPHTpcTrackSeedCircleFit    1000 0.108033 sec
PHTpcTrackSeedCircleFit         1000 0.107930 sec
PHTrackCleaner                          1000 0.007340 sec
PHGhostRejection                                1000 0.211178 sec
PHSiliconTpcTrackMatching               1000 1.440864 sec
PHActsFirstTrkFitter                    1000 2.409158 sec
PHSimpleVertexFinder                    1000 0.044909 sec
PHRaveVertexing                         1000 0.409105 sec
PHGenFitTrackProjection         1000 0.000386 sec
SvtxEvaluator                                   1000 418.793579 sec

It looks like all modules touching the actual hits are horribly slow now. 
This looks like in our recent changes to the local coordinate storage we introduced 
an inefficient loop to look up things. 

I'll try to run a job through callgrind now to see if I can trace this down some more. 

cheers

   Christof 

_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l


_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l

-- 
*************************************************************

Christopher H. Pinkenburg	;    pinkenburg AT bnl.gov
				;    http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory	;    phone: (631) 344-5692
Physics Department Bldg 510 C	;    fax:   (631) 344-3253
Upton, NY 11973-5000

*************************************************************
_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l



Archive powered by MHonArc 2.6.24.

Top of Page