atlas-connect-l AT lists.bnl.gov
Subject: Atlas-connect-l mailing list
[Atlas-connect-l] Held ATLAS Connect jobs OOMing workers
- From: Lincoln Bryant <lincolnb AT uchicago.edu>
- To: Matt LeBlanc <matt.leblanc AT cern.ch>
- Cc: atlas-connect-l <atlas-connect-l AT lists.bnl.gov>
- Subject: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers
- Date: Thu, 26 Mar 2020 18:13:28 +0000
Hi Matt,
Apologies, but I've held your jobs on ATLAS Connect because they appear to be OOMing our worker nodes.
From the ps tree of one worker:
ruc.mwt2 198552 9.1 82.7 3956304672 162588084 ? Rl 12:47 1:59 | | \_ eventloop_batch_worker 96 ./config.root
That's over 150 GiB of RSS memory for that one job.
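For anyone checking the figure, here is a minimal sketch of the arithmetic. It assumes the line above follows the standard `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, ...) and that RSS is reported in KiB; the function name `rss_gib_from_ps_line` is hypothetical and only for illustration.

```python
# Sketch only: assumes `ps aux`-style columns, with RSS (6th field) in KiB.
PS_LINE = (
    r"ruc.mwt2 198552 9.1 82.7 3956304672 162588084 ? Rl 12:47 1:59 "
    r"| | \_ eventloop_batch_worker 96 ./config.root"
)


def rss_gib_from_ps_line(line: str) -> float:
    """Return the resident set size of one ps line in GiB."""
    fields = line.split()
    rss_kib = int(fields[5])      # RSS column, in KiB (assumed)
    return rss_kib / (1024 ** 2)  # KiB -> GiB


if __name__ == "__main__":
    print(f"{rss_gib_from_ps_line(PS_LINE):.1f} GiB")  # prints "155.1 GiB"
```

Under those assumptions the job's resident memory comes to roughly 155 GiB, in line with the figure quoted above.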
We have limited access to the machine room right now because of the COVID-19 lockdown at the University, so I immediately held all of your jobs to keep any other workers from rebooting.
Could you look into the memory utilization when you get a chance?
Thanks,
Lincoln
- [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Lincoln Bryant, 03/26/2020
- Re: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Matt LeBlanc, 03/26/2020
- Re: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Matt LeBlanc, 03/27/2020
- Re: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Matt LeBlanc, 03/27/2020
- Re: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Matt LeBlanc, 03/27/2020
- Re: [Atlas-connect-l] Held ATLAS Connect jobs OOMing workers, Matt LeBlanc, 03/26/2020