atlas-connect-l - [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: Lincoln Bryant <lincolnb AT uchicago.edu>
  • To: Matt LeBlanc <matt.leblanc AT cern.ch>
  • Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
  • Subject: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers
  • Date: Thu, 26 Mar 2020 18:13:28 +0000

Hi Matt,

Apologies, but I've held your jobs on ATLAS Connect because they seem to be OOMing our worker nodes.

From the PS tree of a worker:
ruc.mwt2 198552  9.1 82.7 3956304672 162588084 ? Rl  12:47   1:59      |                   |       \_ eventloop_batch_worker 96 ./config.root

That's roughly 155 GiB of RSS memory (162588084 KiB, 82.7% of the node's RAM) for that one job.
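
For reference, here is a minimal sketch of how the RSS figure can be read off that ps line. It assumes the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS ...) with RSS reported in KiB; the function name `rss_gib` is just illustrative, and the tree-drawing characters have been dropped from the sample line.

```python
# Sketch: convert the RSS column of a `ps aux`-style line to GiB.
# Assumption: fields are USER PID %CPU %MEM VSZ RSS ..., with RSS in KiB.
PS_LINE = (
    "ruc.mwt2 198552  9.1 82.7 3956304672 162588084 ? Rl  12:47   1:59 "
    "eventloop_batch_worker 96 ./config.root"
)


def rss_gib(ps_line: str) -> float:
    """Return the resident set size of one ps line in GiB."""
    fields = ps_line.split()
    rss_kib = int(fields[5])      # sixth column is RSS (KiB)
    return rss_kib / (1024 ** 2)  # KiB -> GiB


if __name__ == "__main__":
    print(f"{rss_gib(PS_LINE):.1f} GiB")  # prints "155.1 GiB"
```

That works out to about 155 GiB resident for a single eventloop_batch_worker process.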

We have limited access to the machine room right now due to the COVID-19-related lockdown at the University, so I immediately held all of your jobs to prevent any other workers from rebooting.

Could you look into the memory utilization when you get a chance?

Thanks,
Lincoln
