sphenix-tracking-l - [Sphenix-tracking-l] running job A on 2GB/core

sphenix-tracking-l AT lists.bnl.gov

Subject: sPHENIX tracking discussion

  • From: pinkenburg <pinkenburg AT bnl.gov>
  • To: sphenix-tracking <sphenix-tracking-l AT lists.bnl.gov>
  • Subject: [Sphenix-tracking-l] running job A on 2GB/core
  • Date: Fri, 10 Jun 2022 14:47:10 -0400

Hi folks,

I got the 2GB/core test node. It's an older machine with 24 logical cores. Our current job A instances sit comfortably between 1.5 and 1.7GB each. Once the copying of the input files is done, the swap daemon goes back to sleep and the jobs make full use of the CPU:

29130 sphnxpro  20   0 4812960   1.6g  11916 R 100.0  3.4   7:15.86 root.exe
29192 sphnxpro  20   0 4744488   1.5g   8860 R 100.0  3.3   7:11.88 root.exe
29286 sphnxpro  20   0 4814684   1.6g   6960 R 100.0  3.4   6:40.13 root.exe
29301 sphnxpro  20   0 4805920   1.5g   8680 R 100.0  3.3   6:39.17 root.exe
29402 sphnxpro  20   0 4717164   1.5g  13112 R 100.0  3.2   6:24.74 root.exe
29598 sphnxpro  20   0 4709088   1.5g  32476 R 100.0  3.3   5:19.09 root.exe
29632 sphnxpro  20   0 4774348   1.5g  30968 R 100.0  3.2   5:16.41 root.exe
29871 sphnxpro  20   0 4763380   1.5g  32488 R 100.0  3.3   5:13.16 root.exe
29896 sphnxpro  20   0 4868604   1.7g  33208 R 100.0  3.6   5:16.76 root.exe
30113 sphnxpro  20   0 4801508   1.6g  59212 R 100.0  3.5   4:01.72 root.exe
30218 sphnxpro  20   0 4647632   1.5g  60536 R 100.0  3.2   4:03.74 root.exe
30328 sphnxpro  20   0 4733160   1.6g  60520 R 100.0  3.3   4:06.29 root.exe
24952 sphnxpro  20   0 4815524   1.3g   6944 R  99.7  2.8  15:01.31 root.exe
28995 sphnxpro  20   0 4764232   1.5g  13212 R  99.7  3.3   7:25.74 root.exe
29574 sphnxpro  20   0 4772152   1.5g  32632 R  99.7  3.2   5:15.08 root.exe
30264 sphnxpro  20   0 4774560   1.6g  60756 R  99.3  3.4   4:02.89 root.exe
30860 sphnxpro  20   0 4467088   1.3g  94504 R  99.3  2.9   2:09.04 root.exe
29171 sphnxpro  20   0 4785864   1.6g   6960 R  99.0  3.4   7:07.58 root.exe
29769 sphnxpro  20   0 4682344   1.5g  30956 R  98.7  3.1   5:11.67 root.exe
30404 sphnxpro  20   0 4762580   1.6g  60776 R  98.7  3.5   4:03.98 root.exe
29509 sphnxpro  20   0 4878296   1.7g  33248 R  95.7  3.6   5:19.22 root.exe
30262 sphnxpro  20   0 4773848   1.6g  60136 R  93.4  3.4   4:01.12 root.exe
29542 sphnxpro  20   0 4665276   1.4g  33316 R  87.4  3.1   5:18.91 root.exe
30379 sphnxpro  20   0 4792176   1.6g  59192 R  83.1  3.5   4:02.14 root.exe
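
For anyone who wants to check a listing like this against the 2GB/core budget without reading it by eye, here is a minimal sketch. It assumes the default `top` column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) and that a bare RES number is in KiB. The names `res_to_gib` and `summarize` and the four-line sample are illustrative only, not part of any production script.

```python
# Hypothetical helper: summarize `top` lines for root.exe jobs.
# Assumes the default top column layout; the sample is a few lines from the listing above.
SAMPLE = """\
29130 sphnxpro  20   0 4812960   1.6g  11916 R 100.0  3.4   7:15.86 root.exe
30860 sphnxpro  20   0 4467088   1.3g  94504 R  99.3  2.9   2:09.04 root.exe
29896 sphnxpro  20   0 4868604   1.7g  33208 R 100.0  3.6   5:16.76 root.exe
30379 sphnxpro  20   0 4792176   1.6g  59192 R  83.1  3.5   4:02.14 root.exe
"""

# Multipliers that convert a suffixed top memory field to GiB.
UNITS = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0, "t": 1024.0}


def res_to_gib(field):
    """Convert a top memory field ('1.6g' or '94504') to GiB; bare numbers are KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return float(field[:-1]) * UNITS[suffix]
    return float(field) / 1024**2


def summarize(text, command="root.exe", limit_gib=2.0):
    """Return job count, largest RES, lowest %CPU and PIDs above the memory limit."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a full 12-column line for the wanted command.
        if len(parts) < 12 or parts[11] != command:
            continue
        rows.append((int(parts[0]), res_to_gib(parts[5]), float(parts[8])))
    if not rows:
        return {"jobs": 0, "max_res_gib": None, "min_cpu": None, "over_limit": []}
    return {
        "jobs": len(rows),
        "max_res_gib": max(res for _, res, _ in rows),
        "min_cpu": min(cpu for _, _, cpu in rows),
        "over_limit": [pid for pid, res, _ in rows if res > limit_gib],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Feeding it the full listing instead of the sample should report 24 jobs, matching the 24 logical cores, with nothing over the 2GB limit.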

All older nodes have regular hard disks, so reading the input takes time, which leads to <100% CPU for some jobs. I'll let it run for 10k jobs to see how this holds up.

I'm not sure if this is visible to everybody, but if you want to see how it is doing, the Grafana page for spool0346 is:
https://monitoring.sdcc.bnl.gov/grafana/d/000000026/linux-farm-collectd?orgId=1&refresh=1m&var-Experiment=spool&var-Hostname=spool0346_sdcc_bnl_gov&var-Interface=eth1&from=now-2d&to=now

Chris

--
*************************************************************

Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
; http://www.phenix.bnl.gov/~pinkenbu

Brookhaven National Laboratory ; phone: (631) 344-5692
Physics Department Bldg 510 C ; fax: (631) 344-3253
Upton, NY 11973-5000

*************************************************************