
sphenix-maps-l - Re: [Sphenix-maps-l] ebdc11, 12...

sphenix-maps-l AT lists.bnl.gov

Subject: sPHENIX MAPS tracker discussion

  • From: Ming Liu <ming AT bnl.gov>
  • To: Martin Purschke <purschke AT bnl.gov>, sphenix-mvtx-l <sphenix-maps-l AT lists.bnl.gov>
  • Cc: "Kim, Andrey" <andrey.kim AT uconn.edu>
  • Subject: Re: [Sphenix-maps-l] ebdc11, 12...
  • Date: Tue, 10 Jan 2023 13:59:33 -0500

Thank you Martin, that makes sense.

We have two helpers (Andrey and Jared) here in January; they both leave ~Feb 4, so
it would be great if we could do the FLX server smoke test before then.

Ming



-----


Dr. Ming Xiong Liu
P-3, MS H846
Physics Division

Office: 505.667.7125
Mobile: 505.412.7396
Los Alamos National Laboratory





On 1/10/23, 1:54 PM, "sPHENIX-MAPS-l on behalf of Martin Purschke via
sPHENIX-MAPS-l" <sphenix-maps-l-bounces AT lists.bnl.gov on behalf of
sphenix-maps-l AT lists.bnl.gov> wrote:


MVTXers -


Cameron and I spoke briefly yesterday about getting more MVTX nodes up
and running (first to check that all FELIXes work in their future hosts).


I'd like to keep the nodes in the lower rack in their current
state (== OS) a little longer, since I'm running the imminent MDC on
a multiple of 6 machines for proper load-balancing (running 48, and 42
in the past when we were dedicating more machines to other tasks). That
doesn't prevent us from installing the FELIXes to do the smoke test, and
then starting to clone the OS once the MDC is done.


There are a few issues with that cloning.


First, ebdc11 is not yet in its final state. It needs to switch to a
Lustre-aware kernel so it can talk to the buffer boxes (that is,
Lustre-mount the file system). This kernel version difference will
affect the FELIX driver. seb01, for example, has


> [root@seb01 ~]# uname -a
> Linux seb01 3.10.0-1160.49.1.el7_lustre.x86_64 #1 SMP Fri Jun 17 18:46:08
> UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


This isn't too big a deal; it just means rebuilding the driver against
this kernel version.
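As a rough illustration of the kernel point above, here is a minimal Python sketch (not something from this thread) that flags whether a host runs a Lustre-patched kernel and whether a driver built for one kernel release would need rebuilding on another. The helper names `is_lustre_kernel` and `driver_needs_rebuild` are hypothetical, and keying on the `_lustre` tag is an assumption based on the seb01 release string shown above.

```python
import platform


def is_lustre_kernel(release: str) -> bool:
    """Heuristic: treat a release containing '_lustre' as Lustre-patched.

    Assumption: matches the seb01 naming (3.10.0-1160.49.1.el7_lustre.x86_64);
    other kernels may be tagged differently.
    """
    return "_lustre" in release


def driver_needs_rebuild(built_for: str, running: str) -> bool:
    """Conservatively require a rebuild whenever the kernel release differs.

    A kernel module is compiled against one specific kernel release; RHEL's
    kABI/weak-modules mechanism can sometimes relax this, which this sketch ignores.
    """
    return built_for != running


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(f"running kernel: {running}")
    print(f"lustre-aware:   {is_lustre_kernel(running)}")
```

On a host like seb01, `is_lustre_kernel(platform.release())` would be true for the release shown above, while a stock el7 kernel would not be.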


Second, and more serious, ebdc11 was installed with
the Red Hat system-setup defaults. It uses the device mapper (for no
apparent reason, with just one NVMe system disk), and that makes it
basically not cloneable. I never set machines up that way for exactly that
reason (in addition, it defeats or at least super-complicates any attempt
to get at the system disk from a rescue system to fix a problem when the
machine won't boot).
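A quick way to tell whether a machine's root file system sits on the device mapper (for example an LVM logical volume created by the installer) is to look at what is mounted at `/`. The sketch below is hypothetical, not an existing tool from this thread; it assumes a Linux host with `/proc/mounts` and treats `/dev/mapper/*` or `/dev/dm-*` sources as device-mapper devices.

```python
import os
from typing import Optional


def root_device(mounts_text: str) -> Optional[str]:
    """Return the source device mounted at '/' in /proc/mounts-style text.

    Some kernels (e.g. EL7's 3.10) list a pseudo 'rootfs' entry before the
    real root, so the last matching entry wins.
    """
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            device = fields[0]
    return device


def root_on_device_mapper(mounts_text: str) -> bool:
    """True if '/' lives on a device-mapper device (e.g. an LVM volume).

    Assumption: device-mapper nodes appear as /dev/mapper/* or /dev/dm-*.
    """
    device = root_device(mounts_text)
    return device is not None and device.startswith(("/dev/mapper/", "/dev/dm-"))


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            text = f.read()
        print(f"root device:      {root_device(text)}")
        print(f"on device-mapper: {root_on_device_mapper(text)}")
```

A plain partition such as `/dev/nvme0n1p2` mounted at `/` reports `False`, which is the layout that clones easily and is reachable from a rescue system.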


So I'd say that instead of cloning ebdc11, we set up a new machine the right way,
with Lustre support and all, check that it works just like ebdc11 (plus Lustre
etc.), and then start cloning from that master. Once we are happy,
we re-clone ebdc11, which we can keep as a reference until last.


BTW, we should also revisit the host naming / numbering scheme since I
just pulled the number 11 out of a hat back then. We don't have to call
them ebdc if a different name makes more sense and avoids confusion.


Best,


Martin




--
Martin L. Purschke, Ph.D. ; purschke AT bnl.gov
; http://www.phenix.bnl.gov/~purschke
;
Brookhaven National Laboratory ; phone: +1-631-344-5244
Physics Department Bldg 510 C ; fax: +1-631-344-3253
Upton, NY 11973-5000 ; skype: mpurschke
-----------------------------------------------------------------------
_______________________________________________
sPHENIX-MAPS-l mailing list
sPHENIX-MAPS-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/sphenix-maps-l
