[Sphenix-tracking-l] Fw: running job A on 2GB/core
- From: Anthony Frawley <afrawley AT fsu.edu>
- To: "sphenix-tracking-l AT lists.bnl.gov" <sphenix-tracking-l AT lists.bnl.gov>
- Subject: [Sphenix-tracking-l] Fw: running job A on 2GB/core
- Date: Mon, 13 Jun 2022 14:27:22 +0000
From: pinkenburg <pinkenburg AT bnl.gov>
Sent: Sunday, June 12, 2022 11:26 AM
To: Anthony Frawley <afrawley AT fsu.edu>
Subject: Re: [Sphenix-tracking-l] running job A on 2GB/core
Hi Tony,
The first jobs have finished by now (logs in /sphenix/sim/sim01/sphnxpro/mdc2/logs/shijing_hepmc/fm_0_20/pass4_jobA/log). The old hardware is jobs 0-9999, the new hardware jobs 10000-19999, but there is no big difference. PHSiliconTpcTrackMatching is the big one:
MakeActsGeometry_TOP: accumulated time (ms): 0.524259
MakeActsGeometry_TOP: per event time (ms): 0.00130089
-------------------------------------- ** --------------------------------------
PHActsSiliconSeeding_TOP: accumulated time (ms): 1.15422e+06
PHActsSiliconSeeding_TOP: per event time (ms): 2864.06
-------------------------------------- ** --------------------------------------
PHActsTrkFitter_TOP: accumulated time (ms): 301.048
PHActsTrkFitter_TOP: per event time (ms): 0.747019
-------------------------------------- ** --------------------------------------
PHCASeeding_TOP: accumulated time (ms): 3.02426e+06
PHCASeeding_TOP: per event time (ms): 7504.36
-------------------------------------- ** --------------------------------------
PHMicromegasTpcTrackMatching_TOP: accumulated time (ms): 162.668
PHMicromegasTpcTrackMatching_TOP: per event time (ms): 0.403642
-------------------------------------- ** --------------------------------------
PHSiliconSeedMerger_TOP: accumulated time (ms): 1.29976e+06
PHSiliconSeedMerger_TOP: per event time (ms): 3225.21
-------------------------------------- ** --------------------------------------
PHSiliconTpcTrackMatching_TOP: accumulated time (ms): 1.25269e+08
PHSiliconTpcTrackMatching_TOP: per event time (ms): 310842
-------------------------------------- ** --------------------------------------
PHSimpleKFProp_TOP: accumulated time (ms): 2.69038e+06
PHSimpleKFProp_TOP: per event time (ms): 6675.89
-------------------------------------- ** --------------------------------------
PHTpcDeltaZCorrection_TOP: accumulated time (ms): 178434
PHTpcDeltaZCorrection_TOP: per event time (ms): 442.763
-------------------------------------- ** --------------------------------------
PHTpcResiduals_TOP: accumulated time (ms): 4.37003
PHTpcResiduals_TOP: per event time (ms): 0.0108438
-------------------------------------- ** --------------------------------------
TpcLoadDistortionCorrection_TOP: accumulated time (ms): 0.463211
TpcLoadDistortionCorrection_TOP: per event time (ms): 0.00114941
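For ranking these modules across many logs, a minimal parsing sketch (assuming log lines of the form "<Module>_TOP: per event time (ms): <value>" as printed above; the *.out filename glob under the log directory is an assumption, adjust to the real filenames):

#!/usr/bin/env python3
# Minimal sketch: collect Fun4All per-event timings from job A logs and rank modules.
# Assumes lines like "PHCASeeding_TOP: per event time (ms): 7504.36".
import glob
import re
from collections import defaultdict

PATTERN = re.compile(r"^(\S+)_TOP: per event time \(ms\): ([0-9.eE+-]+)")
LOGDIR = "/sphenix/sim/sim01/sphnxpro/mdc2/logs/shijing_hepmc/fm_0_20/pass4_jobA/log"

per_module = defaultdict(list)
for path in glob.glob(f"{LOGDIR}/*.out"):  # filename pattern is a guess
    with open(path) as f:
        for line in f:
            m = PATTERN.match(line.strip())
            if m:
                per_module[m.group(1)].append(float(m.group(2)))

# Modules sorted by mean per-event time, largest first.
for name, times in sorted(per_module.items(),
                          key=lambda kv: -sum(kv[1]) / len(kv[1])):
    mean_s = sum(times) / len(times) / 1000.0
    print(f"{name:30s} {mean_s:10.1f} s/event over {len(times)} logs")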
I copied the first 10 input segments from job0 to gpfs. The macros are in
https://github.com/sPHENIX-Collaboration/MDC2/tree/main/submit/fm_0_20/pass4_jobA/rundir
The Fun4All_G4_sPHENIX_jobA.C macro runs out of the box; you just need to set Enable::Production to false (leaving it true will likely cause a complaint when the job tries to copy the output onto itself).
Chris
On 6/11/2022 3:10 PM, Anthony Frawley wrote:
Hi Chris,
I ran a test with pp pileup events (which I can do conveniently). I got 6.7 s per event (including only the processes that run in Job A). That seems OK, but it is not the same thing as AuAu. I would need to see the time numbers from the log for the AuAu case for clues as to what is going on.
Tony
From: pinkenburg <pinkenburg AT bnl.gov>
Sent: Saturday, June 11, 2022 12:18 PM
To: Anthony Frawley <afrawley AT fsu.edu>
Subject: Re: [Sphenix-tracking-l] running job A on 2GB/core
Hi Tony,
I don't want to send this to the list. On the negative side, we seem to have taken a massive hit in terms of CPU. What I am seeing is that no jobs have finished yet, and just dividing the running time by the number of processed events gives 490 sec/event. That node is older, and I am now running some comparisons on the new hardware, but just from tailing the log we are in the many-minutes-per-event range (with the jobs getting close to 100% CPU).
Chris
On 6/10/2022 3:57 PM, Anthony Frawley wrote:
Hi Chris,
Thanks, this is good news. It shows that we will have flexibility in the machines we use for all of our job types, including the most time-consuming one (Job A).
Tony
From: sPHENIX-tracking-l <sphenix-tracking-l-bounces AT lists.bnl.gov> on behalf of pinkenburg via sPHENIX-tracking-l <sphenix-tracking-l AT lists.bnl.gov>
Sent: Friday, June 10, 2022 2:47 PM
To: sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
Subject: [Sphenix-tracking-l] running job A on 2GB/core
Hi folks,
I got the 2GB/core test node. It's an older machine with 24 logical cores. Our current job A's sit comfortably between 1.5 and 1.7 GB; once the copying of the input files is done, the swap daemon goes back to sleep and the jobs make full use of the CPU:
29130 sphnxpro 20 0 4812960 1.6g 11916 R 100.0 3.4 7:15.86 root.exe
29192 sphnxpro 20 0 4744488 1.5g 8860 R 100.0 3.3 7:11.88 root.exe
29286 sphnxpro 20 0 4814684 1.6g 6960 R 100.0 3.4 6:40.13 root.exe
29301 sphnxpro 20 0 4805920 1.5g 8680 R 100.0 3.3 6:39.17 root.exe
29402 sphnxpro 20 0 4717164 1.5g 13112 R 100.0 3.2 6:24.74 root.exe
29598 sphnxpro 20 0 4709088 1.5g 32476 R 100.0 3.3 5:19.09 root.exe
29632 sphnxpro 20 0 4774348 1.5g 30968 R 100.0 3.2 5:16.41 root.exe
29871 sphnxpro 20 0 4763380 1.5g 32488 R 100.0 3.3 5:13.16 root.exe
29896 sphnxpro 20 0 4868604 1.7g 33208 R 100.0 3.6 5:16.76 root.exe
30113 sphnxpro 20 0 4801508 1.6g 59212 R 100.0 3.5 4:01.72 root.exe
30218 sphnxpro 20 0 4647632 1.5g 60536 R 100.0 3.2 4:03.74 root.exe
30328 sphnxpro 20 0 4733160 1.6g 60520 R 100.0 3.3 4:06.29 root.exe
24952 sphnxpro 20 0 4815524 1.3g 6944 R 99.7 2.8 15:01.31 root.exe
28995 sphnxpro 20 0 4764232 1.5g 13212 R 99.7 3.3 7:25.74 root.exe
29574 sphnxpro 20 0 4772152 1.5g 32632 R 99.7 3.2 5:15.08 root.exe
30264 sphnxpro 20 0 4774560 1.6g 60756 R 99.3 3.4 4:02.89 root.exe
30860 sphnxpro 20 0 4467088 1.3g 94504 R 99.3 2.9 2:09.04 root.exe
29171 sphnxpro 20 0 4785864 1.6g 6960 R 99.0 3.4 7:07.58 root.exe
29769 sphnxpro 20 0 4682344 1.5g 30956 R 98.7 3.1 5:11.67 root.exe
30404 sphnxpro 20 0 4762580 1.6g 60776 R 98.7 3.5 4:03.98 root.exe
29509 sphnxpro 20 0 4878296 1.7g 33248 R 95.7 3.6 5:19.22 root.exe
30262 sphnxpro 20 0 4773848 1.6g 60136 R 93.4 3.4 4:01.12 root.exe
29542 sphnxpro 20 0 4665276 1.4g 33316 R 87.4 3.1 5:18.91 root.exe
30379 sphnxpro 20 0 4792176 1.6g 59192 R 83.1 3.5 4:02.14 root.exe
All older nodes have regular hard disks, so the reading takes time, which leads to <100% CPU for some jobs. I let it run for 10k jobs to see how this holds up.
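For a quick check of the per-job footprint against the 2 GB/core budget, a minimal sketch that sums the resident memory (VmRSS) of every root.exe process via /proc, matching the process name seen in the top output above (stdlib only; the output formatting is illustrative):

#!/usr/bin/env python3
# Minimal sketch: sum the resident memory (VmRSS) of all root.exe processes
# on the node, to check each job against the ~2 GB/core budget.
import os

GIB_KB = 1024 * 1024  # VmRSS in /proc is reported in kB

total_kb = 0
njobs = 0
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/comm") as f:
            if f.read().strip() != "root.exe":
                continue
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    rss_kb = int(line.split()[1])
                    total_kb += rss_kb
                    njobs += 1
                    print(f"pid {pid}: {rss_kb / GIB_KB:.2f} GB resident")
                    break
    except OSError:
        continue  # process exited while scanning

if njobs:
    print(f"{njobs} jobs, {total_kb / GIB_KB:.2f} GB total, "
          f"{total_kb / njobs / GIB_KB:.2f} GB/job on average")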
Not sure if this is visible to everybody; if you want to see how this is doing, the Grafana page for spool0346 is:
https://monitoring.sdcc.bnl.gov/grafana/d/000000026/linux-farm-collectd?orgId=1&refresh=1m&var-Experiment=spool&var-Hostname=spool0346_sdcc_bnl_gov&var-Interface=eth1&from=now-2d&to=now
Chris
--
*************************************************************
Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu
Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000
*************************************************************
_______________________________________________
sPHENIX-tracking-l mailing list
sPHENIX-tracking-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-tracking-l