  • From: Torre Wenaus <wenaus AT gmail.com>
  • To: NPPS leadership team <Phys-npps-mgmt-l AT lists.bnl.gov>
  • Subject: [Phys-npps-mgmt-l] Fwd: Draft ML services abstract for FOA proposal
  • Date: Thu, 7 Apr 2022 10:33:21 -0400

FOA draft. Comments, volunteers appreciated. Brett and Paul, can I add you as (effort ~free) participants? We should be mentioning DUNE in the proposal.
  Torre

---------- Forwarded message ---------
From: Torre Wenaus <wenaus AT gmail.com>
Date: Thu, Apr 7, 2022 at 9:57 AM
Subject: Draft ML services abstract for FOA proposal
To: Tadashi Maeno <tmaeno AT cern.ch>, Alexei Klimentov <aak AT bnl.gov>, Paul Nilsson <nnilsson AT bnl.gov>, Kaushik De <kaushik AT uta.edu>, Fernando H. Barreiro Megino <fernando.harald.barreiro.megino AT cern.ch>, Wen Guan <wguan.icedew AT gmail.com>, Rui Zhang <rui.zhang AT cern.ch>, Meifeng Lin <mlin AT bnl.gov>, Vincent Pascuzzi <vpascuzzi AT bnl.gov>


Hi all,
Here's a draft abstract for an FOA proposal. The abstract has to go to Hong tomorrow, followed by a 2-page LOI in the next 1-2 weeks. Funding level guidance is $500k-$750k median, $1.5M max (total over three years, I think?). Thoughts and comments appreciated. We have no choice about submitting one; Dmitri Denisov has told us we need to. We certainly have the capability for a strong one, as I try to outline here, and it fits perfectly in the ecosystem topic area of the call. I'm just a bit cynical about the cost/benefit of such things! The benefit will be large if it goes through; the cost will dominate if we write a proposal and it is turned down. But you don't win if you don't try. And as Dmitri pointed out, the visibility conferred by such a proposal is important in itself.

The call is here

Comments, questions appreciated!

A Scalable and Distributed Machine Learning Service for Data-Intensive Applications

 

AI/ML applications are in a period of rapid growth and innovation in HEP as in the wider world, built on a powerful and still evolving open source tool set. Developing such applications in HEP typically takes place on the desktop or a local cluster, sometimes aided by GPU acceleration on modestly scaled resources. Ready access to large scale resources (regional to global grids, HPCs, opportunistic clouds) could greatly accelerate the development and refinement of existing applications requiring substantial processing, by shortening optimization and training latencies by orders of magnitude. More importantly, such access could enable a transformative expansion of creativity and innovation in conceiving and developing AI/ML applications. Lifting the practical constraint of working at the scale of owned/local resources would unshackle not just the applications themselves but the very conceiving of them in the first place.

 

We propose to draw on our world-leading expertise and capability in scientific workflow management at the largest scales, as realized in the PanDA workload management system and its wider ecosystem including the intelligent Data Delivery Service (iDDS), to make a transformative contribution to the HEP AI/ML ecosystem: an experiment-agnostic system and services that offer AI/ML developers and users a low threshold of entry, together with powerful automation and monitoring tools, to bring their AI/ML development to large scale computing resources. Our aim is to support the full development process -- conceptualization, brainstorming, prototyping, iterative development, optimization, training, and application in analysis and production -- on the dynamic landscape of the widest array of processing resources available to a researcher at a given time. Keys to this are: a distributed system able to operate transparently and coherently across distinct heterogeneous resources; portability of development environments and applications; support for the full, rapidly evolving suite of open source AI/ML tools, particularly those for leveraging large scale resources for optimization and training; uniform and easy to use authentication; control, automation and monitoring systems that put the user in well informed control of the system; interactive and close-to-interactive response latencies to support the fastest possible iterative development; and integration with the analysis tool suites in broad use, particularly the Python scientific software stack including Jupyter.
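
For concreteness, here is a minimal sketch of the kind of user-facing interface we have in mind: a single portable task description brokered transparently to whatever resources are available. The names, the TrainingTask structure, and the local process-pool stand-in are hypothetical illustrations under stated assumptions, not the existing PanDA/iDDS client API:

# Hypothetical sketch (illustration only, not the PanDA/iDDS API): one portable
# task description that a service could broker transparently to grid, HPC,
# cloud or local resources. A local process pool stands in for the remote
# backends so the example runs as-is.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass, field

@dataclass
class TrainingTask:
    """Portable description of an ML training job."""
    script: str                        # user code, shipped in a portable environment
    environment: str = "ml-env:2022"   # e.g. a container image tag (placeholder)
    gpus: int = 1
    inputs: list = field(default_factory=list)

def run_locally(task: TrainingTask) -> str:
    # Stand-in for execution on a remote resource; a real service would stage
    # the environment and inputs and launch the user's script there.
    return f"ran {task.script} with {task.gpus} GPU(s) in {task.environment}"

def submit(tasks):
    # Stand-in broker: a real implementation would choose among grid sites,
    # HPC allocations and cloud instances; here we just use local processes.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(run_locally, tasks))

if __name__ == "__main__":
    tasks = [TrainingTask(script="train_gan.py", gpus=4, inputs=["calo_showers.h5"])]
    for result in submit(tasks):
        print(result)

In the proposed system the same task description would be usable from the desktop command line or from within a Jupyter notebook, with the same monitoring view regardless of where the work actually runs.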

 

The PanDA ecosystem possesses these key attributes, and we have used them to implement support for AI/ML hyperparameter optimization workflows. These are used today in an ATLAS production context to leverage GPU resources across the ATLAS grid to optimize the 300 networks making up the FastCaloGAN component of the production ATLAS fast simulation AtlFast3, requiring 100 GPU-days for one optimization pass. Through the project proposed here we will build on this capability to create an experiment-agnostic software stack supporting a suite of highly scalable AI/ML services and workflows able to leverage HPCs (including LCFs), opportunistic clouds (including Amazon EC2 and Google Cloud Platform), campus and grid clusters, extending down to the desktop command line and the Jupyter environment.
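
For illustration, here is a minimal sketch of the hyperparameter optimization pattern, with a toy objective and local processes standing in for network trainings dispatched as GPU jobs across the grid (as for FastCaloGAN). All names, the objective, and the numbers are illustrative assumptions only:

# Hypothetical sketch of the HPO pattern: sample trial points, evaluate them in
# parallel, keep the best. A toy loss surface and a local process pool stand in
# for real network trainings distributed as GPU jobs.
import random
from concurrent.futures import ProcessPoolExecutor

def evaluate_trial(params):
    """Stand-in for training one network and returning its validation loss."""
    lr, batch_size = params["lr"], params["batch_size"]
    # Toy loss surface with a minimum near lr=1e-3, batch_size=256.
    return (lr - 1e-3) ** 2 * 1e6 + (batch_size - 256) ** 2 * 1e-4

def sample_params(rng):
    return {"lr": 10 ** rng.uniform(-5, -1),
            "batch_size": rng.choice([64, 128, 256, 512])}

def random_search(n_trials=32, seed=0):
    rng = random.Random(seed)
    trials = [sample_params(rng) for _ in range(n_trials)]
    with ProcessPoolExecutor() as pool:   # grid/HPC/cloud resources in production
        losses = list(pool.map(evaluate_trial, trials))
    return min(zip(losses, trials), key=lambda t: t[0])

if __name__ == "__main__":
    loss, params = random_search()
    print(f"best loss {loss:.4g} with {params}")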

 

PI: Torre Wenaus

 

Expected/hoped-for participants (likely to evolve):

lab:

NPPS: Torre Wenaus, Tadashi Maeno, Alexei Klimentov, Paul Nilsson

CSI: Meifeng Lin, Vince Pascuzzi

JLab: We may seek their participation, reciprocally with our involvement in their real-time reco FOA proposal. They may be an avenue to involving minority-serving institutions with good software capability; they have several such relationships. We have not suggested this to them yet.

 

uni:

UTA: Kaushik De, Fernando Barreiro Megino

U Wisconsin Madison: Wen Guan, Rui Zhang, Tuan M. Pham?

others?


 

Estimated support request (translating to funding level):

25% of the PI, as required

other NPPS: keep it modest? the NPPS part is already big by requirement

CSI: ?

Universities: the rest, constituting at least 30%? 40%? of the total funding request


  Torre

--
-- Torre Wenaus, BNL NPPS Group, ATLAS Experiment
-- BNL 510A 1-222 | 631-681-7892 |  wenaus AT gmail.com | npps.bnl.gov | wenaus.com
-- NPPS Mattermost room: https://chat.sdcc.bnl.gov/npps/channels/town-square



