
phys-npps-mgmt-l - Re: [Phys-npps-mgmt-l] CSI/NPPS LDRD A proposal

phys-npps-mgmt-l AT lists.bnl.gov

Subject: NPPS Leadership Team


  • From: Brett Viren <bv AT bnl.gov>
  • To: "Laycock, Paul" <laycock AT bnl.gov>
  • Cc: Torre Wenaus via Phys-npps-mgmt-l <phys-npps-mgmt-l AT lists.bnl.gov>, Tadashi Maeno <tmaeno AT cern.ch>
  • Subject: Re: [Phys-npps-mgmt-l] CSI/NPPS LDRD A proposal
  • Date: Fri, 21 May 2021 09:54:10 -0400

Hi Paul,

"Laycock, Paul" <laycock AT bnl.gov> writes:

> There were some talks at CHEP, ...

Thanks for these two. Creating "semantic meaning" inside AI/ML latent
space is a particularly interesting technique, especially for the
GAN-as-fast-sim application.

The UCluster talk helps solidify some thoughts for me.

tl;dr: Training and inference are two very separate problem spaces and
their dichotomy should inform our strategy.

Here's my take on that dichotomy:

- Distributed training is a "one-time problem". Each network
architecture pattern requires its own R&D. It is suited to HPC.

UCluster's graph-NN architecture is conceptually perfect for
distribution, while a more monolithic network would have very different
challenges to distribute. Maybe the classic GAN can put the
discriminator (D) and the generator (G) on two separate GPUs, but any
further distribution must attack the monolithic networks themselves.
This zoology of R&D makes for good job security for CSI types, but at
some point it begins to look a lot like engineering (not that there's
anything wrong with that).

- Distributed inference is an "all-the-time problem". It is a more
general and practical problem that shares space with hyper-parameter
optimization and accelerating heuristic algorithms (e.g. FFT). It is
suited to HTC(+GPU).
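
To make the "all-the-time" side concrete: once a network is trained,
inference over many events and a hyper-parameter scan over many
configurations both reduce to the same embarrassingly parallel map of
independent tasks. A minimal sketch in Python, assuming exactly that;
the toy linear "model" (WEIGHTS), validation_loss and all helper names
are hypothetical placeholders, and threads stand in for HTC job slots
or GPU workers.

```python
"""Sketch: inference and hyper-parameter scans as embarrassingly parallel maps."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Apply fn to each item independently; results keep input order.

    Threads stand in for HTC job slots here; on a real farm each item
    would more likely be a separate batch job or GPU worker.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def chunks(seq: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split seq into consecutive chunks of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Hypothetical "trained model": a fixed linear scorer standing in for a network.
WEIGHTS = (0.5, -1.0, 2.0)


def infer_chunk(events: Sequence[Sequence[float]]) -> List[float]:
    """Run the toy model on one chunk of events (one 'job')."""
    return [sum(w * x for w, x in zip(WEIGHTS, ev)) for ev in events]


def distributed_inference(events: Sequence[Sequence[float]],
                          chunk_size: int = 2, max_workers: int = 4) -> List[float]:
    """Score all events chunk by chunk, then flatten back into event order."""
    per_chunk = parallel_map(infer_chunk, chunks(events, chunk_size), max_workers)
    return [score for chunk in per_chunk for score in chunk]


def validation_loss(learning_rate: float) -> float:
    """Hypothetical objective with its minimum at learning_rate = 0.01."""
    return (learning_rate - 0.01) ** 2


def scan(grid: Sequence[float]) -> float:
    """Hyper-parameter scan: same map pattern, different payload."""
    losses = parallel_map(validation_loss, grid)
    return min(zip(losses, grid))[1]
```

For example, distributed_inference([(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
scores the two events independently, and scan([0.1, 0.01, 0.001]) picks
the learning rate with the lowest toy loss. Nothing here depends on the
network architecture, which is the point: the engineering effort goes
into scheduling and throughput rather than per-architecture R&D.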

I feel this second problem is more important to actually applying AI/ML
and "getting the science out" of the data. It may be derided as "mere
engineering" by some, but without building bridges we all get wet.

So, our meta problem is how to get funding using the sexy "one-time"
problem while actually solving the "all-the-time" problem, which I think
is the real bottleneck.

-Brett.





