phys-npps-mgmt-l - Re: [Phys-npps-mgmt-l] Rucio for ePIC

  • From: Torre Wenaus <wenaus AT gmail.com>
  • To: "Van Buren, Gene" <gene AT bnl.gov>
  • Cc: Torre Wenaus via Phys-npps-mgmt-l <phys-npps-mgmt-l AT lists.bnl.gov>
  • Subject: Re: [Phys-npps-mgmt-l] Rucio for ePIC
  • Date: Fri, 28 Apr 2023 17:42:35 -0400

Thanks Gene. I suppose this applies to any data access layer sitting over large servers, but with xrootd you have other options.
  Torre

On Fri, Apr 28, 2023 at 5:39 PM Van Buren, Gene <gene AT bnl.gov> wrote:
Hi, all

I'm unfamiliar with a lot of these arguments, but STAR makes heavy use of xrootd, so I figured I'd share one of SDCC's xrootd practices that STAR isn't thrilled about (and Jerome is with me on this)...

STAR has had a mix of distributed computing nodes in xrootd plus a few larger dedicated xrootd servers. The larger servers are probably easier to maintain, but they carry two main risks for us (roughly quantified in the sketch below):
1) When a larger server node is having issues (e.g. network load or other slowness), many users and their jobs suffer instead of a few.
2) When a larger server loses data (fails, node goes down), a lot of data is lost instead of a little.
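
For a sense of scale, here is a back-of-the-envelope sketch in Python. The numbers are invented for illustration and are not actual STAR or SDCC figures:

    # Same total data and job load spread over many small servers vs. a few
    # large ones: one server failure then costs D/N of the data and degrades
    # J/N of the jobs, so both risks grow as the server count shrinks.
    TOTAL_TB = 1000   # hypothetical data resident in xrootd
    JOBS = 2000       # hypothetical concurrent client jobs

    for n_servers in (100, 5):
        print(f"{n_servers:3d} servers: one failure loses "
              f"~{TOTAL_TB / n_servers:.0f} TB and degrades "
              f"~{JOBS / n_servers:.0f} jobs")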

The SDCC is unfortunately pushing us towards the larger servers moving forward.

-Gene

> On Apr 28, 2023, at 12:20 PM, Torre Wenaus via Phys-npps-mgmt-l <phys-npps-mgmt-l AT lists.bnl.gov> wrote:
>
> ah, I forgot that one on the list I'm assembling, thanks Brett
>
> On Fri, Apr 28, 2023 at 12:19 PM Brett Viren <bv AT bnl.gov> wrote:
> Thanks Kolja,
>
> I wonder if the S3 solution also provides the "read-ahead" feature that
> makes XrootD so effective at latency hiding.
>
> -Brett.
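
Whether SDCC's S3 deployment provides anything comparable is exactly the open question here; without a transparent client-side feature, an application has to approximate read-ahead by hand with ranged GETs. A minimal sketch, assuming boto3, with a placeholder endpoint and hypothetical bucket/object names:

    import boto3

    # Placeholder endpoint; any S3-compatible service works the same way.
    s3 = boto3.client("s3", endpoint_url="https://s3.example.bnl.gov")

    def read_with_readahead(bucket, key, offset, length,
                            readahead=4 * 1024 * 1024):
        """Fetch [offset, offset+length) plus `readahead` extra bytes, so a
        following sequential read can be served from the local buffer
        instead of another round trip; this is the latency hiding that
        XRootD's client-side read-ahead provides transparently."""
        end = offset + length + readahead - 1
        resp = s3.get_object(Bucket=bucket, Key=key,
                             Range=f"bytes={offset}-{end}")
        buf = resp["Body"].read()
        return buf[:length], buf[length:]  # requested bytes, prefetched tail

    # Example: first KB of a (hypothetical) object, plus a 4 MB prefetch.
    data, prefetched = read_with_readahead("epic-bucket", "run123/events.root",
                                           offset=0, length=1024)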
>
> Kolja Kauder <kkauder AT gmail.com> writes:
>
> > I _think_ Jerome essentially was just intrigued by the new and shiny S3 at the beginning. Work on adding an
> > XrootD layer did happen but stalled for reasons I'm not privy to and was put "on pause" in July '22.
> > Importantly, the following is an inference, not a thing I've actually heard, but I strongly suspect that SDCC
> > will point out that S3 has been working well for ECCE, ATHENA, and now ePIC, so why divert resources from that?
> >
> > On Fri, Apr 28, 2023 at 8:44 AM Torre Wenaus via Phys-npps-mgmt-l <phys-npps-mgmt-l AT lists.bnl.gov> wrote:
> >
> >     Thanks a lot Brett. Kolja is well versed in the (Jerome) arguments and can comment.
> >       Torre
> >   
> >     On Fri, Apr 28, 2023 at 8:42 AM Brett Viren <bv AT bnl.gov> wrote:
> >   
> >         Hi Torre,
> >       
> >         Mostly on XRootD:
> >       
> >         DUNE @ SDCC is small fry in comparison to EIC @ SDCC, both in resource
> >         needs and in that BNL is not the host, but XRootD and Rucio are needed
> >         in DUNE as well.  Maybe this "synergy" is somehow useful in your
> >         arguments.
> >       
> >         I don't know the backstory but it seems really weird that XRootD is a
> >         sticking point.  At least I fail to see any effort/resource argument
> >         against it.  Here's why:
> >       
> >         Back when LBNE (pre DUNE) and Daya Bay had RACF nodes, Ofer gave us
> >         great service to set up and operate XRootD servers and clients.  These
> >         groups were in the "free tier" of RACF service with only about a dozen
> >         nodes each.  I think the effort to scale from that to what EIC needs
> >         would be rather less than linear in the number of nodes.  We also had
> >         the extra complication that the XRootD storage nodes doubled as batch or
> >         interactive nodes.  I expect keeping these roles separate would make
> >         provisioning even easier.  And given that EIC is not in the "free
> >         tier", it seems historically inconsistent for the XRootD request to
> >         be denied.
> >
> >         It may be useful to know SDCC's arguments for refusing the request.
> >         Maybe they can be included in what you are assembling.
> >
> >         -Brett.
> >       
> >         Torre Wenaus via Phys-npps-mgmt-l <phys-npps-mgmt-l AT lists.bnl.gov>
> >         writes:
> >       
> >         > Hi all,
> >         > In addition to getting past the BNL embarrassment of not offering
> >         > XRootD to EIC despite years of requests, I'd like to explore ways
> >         > NPPS could help ePIC with a further and much more impactful change
> >         > in their data management: adopting Rucio. JLab is working on this
> >         > now. At BNL it sits (rightly) in the wings while we focus on
> >         > sPHENIX. If people have thoughts on how we can help, including
> >         > leveraging sPHENIX work without impeding it, I'd like to hear them.
> >         >   Torre
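
To make "adopting Rucio" concrete: the Rucio Python client gives an experiment declarative data management along the following lines. A minimal sketch, assuming a configured Rucio server and account; the scope, dataset, file, and RSE names are hypothetical:

    from rucio.client import Client

    client = Client()  # reads server/account from the local rucio.cfg

    # Group files into a dataset under an experiment scope.
    client.add_dataset(scope="epic", name="sim.campaign23.dst")
    client.attach_dids(scope="epic", name="sim.campaign23.dst",
                       dids=[{"scope": "epic", "name": "file1.root"}])

    # Declarative placement: keep two copies matching an RSE expression,
    # and let Rucio schedule the transfers and recover from failures.
    client.add_replication_rule(
        dids=[{"scope": "epic", "name": "sim.campaign23.dst"}],
        copies=2,
        rse_expression="tier=1&type=TAPE",
    )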
> >
> > --
> > ________________________
> > Kolja Kauder, Ph.D.
> > NPPS, EIC
> > Brookhaven National Lab, Upton, NY
> > +1 (631) 344-5935
> > he/him/his
> > ________________________



--
-- Torre Wenaus, BNL NPPS Group, ATLAS Experiment
-- BNL 510A 1-222 | 631-681-7892


