  • From: Martin Purschke <purschke AT bnl.gov>
  • To: sphenix-mvtx-l <sphenix-maps-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-maps-l] ebdc11, 12...
  • Date: Tue, 10 Jan 2023 14:18:55 -0500

Hi Yasser,

(dropping it back to the list)

I can still give up 2 machines before I start cutting into the "x6" cohorts. So we can make one new master and another one to practice cloning.

I haven't cloned an actual RH machine - the only difference compared to, say, CentOS or another RH clone is the embedding into the RH "entitlement" scheme, which I don't know much about. That is needed to get at the repositories etc. (Only the bufferboxes run a real RH distro because of the HPSS interface stuff, and that was complicated for those 6 machines.)
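For reference, registering a (cloned) RHEL machine with the entitlement scheme would presumably go through subscription-manager, something like the sketch below (the ORG_ID and KEY_NAME values are placeholders for site-specific credentials, not anything we actually have set up):

# Register the machine with Red Hat's subscription service
# (ORG_ID and KEY_NAME are placeholders for site-specific values).
subscription-manager register --org=ORG_ID --activationkey=KEY_NAME

# Attach an available subscription and verify repository access.
subscription-manager attach --auto
subscription-manager repos --list-enabled
yum repolist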

As a forward-looking question - since CERN and FNAL have jointly decided to use AlmaLinux as their standard distribution, is there an incentive for us to do the same? If so, the time to switch would be now, before we clone and waste effort.
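For what it's worth, AlmaLinux publishes an in-place migration script (almalinux-deploy) for converting an existing EL8-or-later install; our el7 nodes would need a fresh install instead, but on a newer base the conversion would look roughly like this (a sketch, untested on our machines):

# Fetch and run the AlmaLinux in-place migration script
# (EL8 or later only; back up the system first).
curl -O https://raw.githubusercontent.com/AlmaLinux/almalinux-deploy/master/almalinux-deploy.sh
sudo bash almalinux-deploy.sh
cat /etc/redhat-release   # should report AlmaLinux afterwards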

Best
Martin

On 1/10/23 14:05, Yasser Corrales Morales wrote:
Hi Martin,

I think your proposal looks fine to me, but there is something that is not clear to me: do we need to wait until you finish your MDC test before setting up the second machine?

Also, I think we should prefer using MVTX_FLX0-5 (the same names as in the ALF DB) for the servers, but I will let Jo comment in case he suggests a different name.

Cheers,

Yasser.

On 1/10/23 11:53, Martin Purschke via sPHENIX-MAPS-l wrote:
MVTXers -

Cameron and I spoke briefly yesterday about getting more MVTX nodes up and running (first to see that all FELIXes work in their future hosts).

I'd like to hold on to the nodes in the lower rack in their current state (== OS) for a little bit longer, since I run the imminent MDC with a multiple of 6 machines for proper load-balancing (we ran 48, then 42, in the past as we dedicated more machines to other tasks). That doesn't prevent us from installing the FELIXes to do the smoke test, and then starting to clone the OS once the MDC is done.

There are a few issues with that cloning.

First, ebdc11 is not yet in a final state. It needs to switch to a lustre-aware kernel so it can talk to the buffer boxes (that is, lustre-mount the file system). This kernel version difference will affect the FELIX driver. seb01, for example, has

[root@seb01 ~]# uname -a
Linux seb01 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Fri Jun 17 18:46:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

This isn't too big a deal, it just takes rebuilding the driver against this kernel version.
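The rebuild would be the standard out-of-tree kernel-module procedure, roughly as sketched below (assuming the FELIX driver uses the usual kbuild setup; the driver source path is a placeholder, and the matching kernel-devel package has to come from wherever the lustre kernel is provided):

# Install the devel package matching the lustre kernel, then rebuild
# the module against it (/path/to/felix-driver is a placeholder).
yum install kernel-devel-3.10.0-1160.49.1.el7_lustre.x86_64
cd /path/to/felix-driver
make -C /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/build M=$PWD modules
make -C /lib/modules/3.10.0-1160.49.1.el7_lustre.x86_64/build M=$PWD modules_install
depmod -a 3.10.0-1160.49.1.el7_lustre.x86_64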

Second, and that's a bit more serious: ebdc11 has been installed with the Red Hat system-setup defaults. It uses the device mapper (for no apparent reason with just one NVMe system disk), and that makes it basically not cloneable. I never do that, for that exact reason (in addition, it defeats, or at least super-complicates, any attempt to get at the system disk from a rescue system to fix a problem when the machine won't boot).
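To check whether a node got the device-mapper layout, and to reach such a disk from a rescue system anyway, the standard LVM tools apply; a sketch (rhel-root is the Red Hat default volume name and may differ on a given node):

# Show the block-device stack; LVM volumes appear with TYPE "lvm".
lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT

# From a rescue system: scan for volume groups, activate them,
# then mount the root logical volume (VG/LV names are placeholders).
vgscan
vgchange -ay
mount /dev/mapper/rhel-root /mnt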

So I'd say that instead of cloning we build a new machine the right way, with lustre support and all, see that it works just like ebdc11 + lustre etc., and then start cloning from that master. Once we are happy, we re-clone ebdc11, which we can keep as a reference until last.
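For the cloning step itself, one option is a plain block-level copy of the master's system disk, roughly as follows (a sketch only; the master hostname and the /dev/nvme0n1 device path are placeholders, disk sizes must match, and the target has to be booted from rescue media with its disk unmounted):

# Stream the master's NVMe system disk onto the target's disk
# (run on the target from a rescue image; MASTER is a placeholder).
ssh root@MASTER "dd if=/dev/nvme0n1 bs=64M" | dd of=/dev/nvme0n1 bs=64M

# Afterwards, fix the per-host identity on the clone
# (hostname, network configuration, ssh host keys).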

BTW, we should also revisit the host naming / numbering scheme since I just pulled the number 11 out of a hat back then. We don't have to call them ebdc if a different name makes more sense and avoids confusion.

Best,

    Martin


--
Yasser Corrales Morales

Staff Scientist at the Relativistic Heavy Ion Group

Laboratory for Nuclear Science

Massachusetts Institute of Technology



--
Martin L. Purschke, Ph.D.       ; purschke AT bnl.gov
                                ; http://www.phenix.bnl.gov/~purschke
Brookhaven National Laboratory  ; phone: +1-631-344-5244
Physics Department Bldg 510 C   ; fax: +1-631-344-3253
Upton, NY 11973-5000            ; skype: mpurschke
-----------------------------------------------------------------------



