sphenix-maps-l - Re: [Sphenix-maps-l] [EXTERNAL] ebdc11, 12...

sphenix-maps-l AT lists.bnl.gov

Subject: sPHENIX MAPS tracker discussion

  • From: "Schambach, Jo" <schambachjj AT ornl.gov>
  • To: "Purschke, Martin" <purschke AT bnl.gov>, sphenix-mvtx-l <sphenix-maps-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-maps-l] [EXTERNAL] ebdc11, 12...
  • Date: Tue, 10 Jan 2023 19:21:18 +0000

Hi Martin,
That sounds OK with me. I will be at BNL starting next week, so we can get
started on that.
Thanks,
Jo

-----Original Message-----
From: sPHENIX-MAPS-l <sphenix-maps-l-bounces AT lists.bnl.gov> On Behalf Of
Martin Purschke via sPHENIX-MAPS-l
Sent: Tuesday, January 10, 2023 1:54 PM
To: sphenix-mvtx-l <sphenix-maps-l AT lists.bnl.gov>
Subject: [EXTERNAL] [Sphenix-maps-l] ebdc11, 12...

MVTXers -

Cameron and I spoke briefly yesterday about getting more MVTX nodes up and
running (first to see that all FELIXes work in their future hosts).

I'd like to keep the nodes in the lower rack in their current state (that is,
their current OS) a little longer, since I am running the imminent MDC with a
multiple of 6 machines for proper load balancing (running 48, and 42 in the
past when we were dedicating more machines to other tasks). That doesn't
prevent us from installing the FELIXes to do the smoke test, and then cloning
the OS once the MDC is done.

There are a few issues with that cloning.

First, ebdc11 is not yet in a final state. It needs to switch to a
lustre-aware kernel so it can talk to the buffer boxes (that is, lustre-mount
the file system). This kernel version difference will affect the FELIX
driver. seb01, for example, has

> [root@seb01 ~]# uname -a
> Linux seb01 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Fri Jun 17
> 18:46:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

This isn't too big a deal, it just takes rebuilding the driver against this
kernel version.
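The kernel dependency can be checked mechanically. Below is a minimal Python sketch, assuming the FELIX driver is an out-of-tree kernel module built against one exact kernel release. `is_lustre_kernel` and `driver_needs_rebuild` are hypothetical helpers, and the `_lustre` tag test is inferred only from the seb01 `uname -a` output quoted above, not from any documented naming rule.

```python
import platform


def is_lustre_kernel(release: str) -> bool:
    """Return True if a kernel release string carries a lustre tag.

    Assumption (hypothetical rule): lustre-patched EL7 kernels include
    "_lustre" in the release, as in seb01's "3.10.0-1160.49.1.el7_lustre.x86_64".
    """
    return "_lustre" in release


def driver_needs_rebuild(built_for: str, running: str) -> bool:
    """Return True if a module built for `built_for` should be rebuilt for `running`.

    Out-of-tree kernel modules are generally compiled against one specific
    kernel release, so any mismatch is treated here as "rebuild required".
    """
    return built_for != running


if __name__ == "__main__":
    running = platform.release()  # same string `uname -r` prints on Linux
    print(f"running kernel: {running}")
    print(f"lustre-aware:   {is_lustre_kernel(running)}")
```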

Second, and that's a bit more serious, ebdc11 has been installed with the
Red Hat system-setup defaults. It uses the device mapper (for no apparent
reason with just one NVMe system disk), and that makes it basically not
cloneable. I never do that for that exact reason (in addition, it defeats,
or at least greatly complicates, any attempt to get at the system disk from a
rescue system to fix a problem when the machine won't boot).

So I'd say that instead of cloning, we set up a new machine the right way,
with lustre support and all, check that it works just like ebdc11 + lustre
etc., and then start cloning from that master. Once we are happy, we re-clone
ebdc11 last, keeping it as a reference until then.

BTW, we should also revisit the host naming / numbering scheme since I just
pulled the number 11 out of a hat back then. We don't have to call them ebdc
if a different name makes more sense and avoids confusion.

Best,

Martin


--
Martin L. Purschke, Ph.D. ; purschke AT bnl.gov
; http://www.phenix.bnl.gov/~purschke
Brookhaven National Laboratory ; phone: +1-631-344-5244
Physics Department Bldg 510 C ; fax: +1-631-344-3253
Upton, NY 11973-5000 ; skype: mpurschke
-----------------------------------------------------------------------
_______________________________________________
sPHENIX-MAPS-l mailing list
sPHENIX-MAPS-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-maps-l




