  • From: Brett Viren <bv AT bnl.gov>
  • To: Torre Wenaus <wenaus AT gmail.com>
  • Cc: NPPS leadership team <Phys-npps-mgmt-l AT lists.bnl.gov>
  • Subject: Re: [Phys-npps-mgmt-l] Leadership team meeting tomorrow
  • Date: Thu, 05 Jun 2025 12:09:30 -0400

Hi Torre,

I'll attend the meeting tomorrow, but here are some initial thoughts:

Torre Wenaus <wenaus AT gmail.com> writes:

> * MCP is moving fast, Tadashi already has a PanDA demonstrator, which
> we will learn from in implementing services in the ePIC streaming
> workflow testbed. As we define APIs and services for the testbed, we
> want to make them MCP-capable from the beginning.
>
> * question is, what LLM to use?

BNL pays for / provides Azure AI, which includes OpenAI's LLMs and also
DeepSeek[1]. I don't know the details, but Xin Qian in our group has an
API key through BNL, and I've briefly tested it for him using an Open
WebUI instance. That works. (At the time I needed a "lightllm" proxy in
between, but the latest Open WebUI is supposed to access Azure AI
directly.)
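
For reference, access then looks like any OpenAI-compatible client. A
minimal sketch with the openai Python package (the endpoint, API
version and deployment name below are placeholders, not our actual
settings):

    # Minimal Azure OpenAI query via the openai Python package (>= 1.x).
    # Endpoint, API version and deployment name are placeholders.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key="...",                              # the BNL-provided key
        api_version="2024-02-01",
        azure_endpoint="https://example.openai.azure.com",
    )
    resp = client.chat.completions.create(
        model="o4-mini",   # the Azure deployment name
        messages=[{"role": "user", "content": "Hello from NPPS"}],
    )
    print(resp.choices[0].message.content)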

So far the cost to the group has been around $10, dominated by a fixed
charge just for access. PanDA's cost will of course depend on which
LLMs are used and on the number and nature of the queries. My minor
playing so far has used the cheaper models (o4-mini, I think it was),
and individual queries cost a few cents, again depending on details.
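
Purely as a back-of-envelope illustration (the per-query figure is a
guess in the "few cents" range, not a measurement):

    # Toy cost model: fixed access charge plus per-query usage.
    fixed_access = 10.00    # USD, roughly what the group has paid so far
    cost_per_query = 0.03   # USD, a guess for a cheap model like o4-mini
    n_queries = 1000        # hypothetical monthly volume
    print(f"~${fixed_access + n_queries * cost_per_query:.2f}/month")
    # -> ~$40.00/month at these assumptions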

> Claude costs money. We have our own GPU box, npps0.bnl.gov.

Google's Gemini 2.5 Flash is fairly competitive with Claude Sonnet 4,
but with an enormous 1M-token context. Claude is "smarter". This
opinion is based on code generation and code analysis tasks.

I think any small LLMs that fit on an RTX 4090 will have a tough time
competing with "large LLMs" from OpenAI, etc. for the most challenging
queries.

But the types of tasks needed here may not need a "large LLM", and
"small LLMs" are getting better almost literally daily. I use Llama 3
70B almost every day in place of Google searching, and Qwen3 just came
out; its quantized version seems about as good while needing less GPU
RAM.
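
For example, assuming the GPU box ran something like Ollama to serve
quantized models (the host and model tag below are just illustrative),
local models can be queried through the same OpenAI-compatible
interface:

    # Query a locally served model through Ollama's OpenAI-compatible API.
    # Host and model tag are illustrative, not an existing service.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://npps0.bnl.gov:11434/v1",  # Ollama's default port
        api_key="unused",   # required by the client, ignored by Ollama
    )
    resp = client.chat.completions.create(
        model="llama3:70b",
        messages=[{"role": "user", "content": "Summarize what MCP is."}],
    )
    print(resp.choices[0].message.content)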

> Can we set up a reverse proxied service serving the BNL network? The
> BNL network is not enough, but it’s a start until there’s something
> else (general CERN service, but doesn’t help EIC; a means of paying
> for a cloud service, impossible at BNL and may get quickly expensive
> anyway; …)
>
> this is a question to (at least) Brett: Dmitri Smirnov points to such
> a service Brett helped him set up long ago, https://www.cnipol.bnl.gov
> . Can we set up an nppsbot.bnl.gov service in the same mold (unless
> someone has a better idea)

Yes, "www.phy.bnl.gov" Apache has actually several reverse proxied
endpoints fronting web servers on internal hosts. This can be done
either on a domain name or a URL sub-path basis.
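
In Apache terms it's essentially a ProxyPass / ProxyPassReverse pair
per endpoint. To make the sub-path idea concrete, here is a toy
stdlib-Python sketch (hostnames and paths are made up, not a proposal):

    # Toy sub-path reverse proxy: forward /nppsbot/... on the public
    # host to a web server on an internal host.  Names are hypothetical.
    import http.server
    import urllib.request

    BACKEND = "http://internal-host.bnl.gov:8080"  # hypothetical backend
    PREFIX = "/nppsbot"

    class Proxy(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith(PREFIX):
                self.send_error(404)
                return
            # Strip the public prefix and fetch from the internal backend.
            with urllib.request.urlopen(BACKEND + self.path[len(PREFIX):]) as r:
                body = r.read()
                self.send_response(r.status)
                self.send_header("Content-Type",
                                 r.headers.get("Content-Type", "text/plain"))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

    http.server.HTTPServer(("", 8000), Proxy).serve_forever()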

However, I'm not sure this is necessarily the best approach for this
case. I worry mostly about the security and resource-usage implications
of exposing GPU resources to the wild internet.

Also, the architecture of what is needed is still murky in my mind. It
would help to have some "architecture" / connectivity diagram showing
the PanDA server, the LLM and the MCP endpoints. Maybe something like
that is in a presentation already?

In any case, I'll help with this however I can. This MCP business
seems interesting to me in general.
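
(For the curious: making a service "MCP-capable" roughly means exposing
its operations as tools an LLM client can discover and call. A minimal
sketch with the Python MCP SDK might look like this; the tool is a
made-up placeholder, not a real PanDA call.)

    # Minimal MCP tool server using the Python SDK (package "mcp").
    # The tool below is a placeholder, not an actual PanDA interface.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("npps-demo")

    @mcp.tool()
    def job_status(job_id: int) -> str:
        """Return a (fake) status string for the given job id."""
        return f"job {job_id}: status unknown (placeholder)"

    if __name__ == "__main__":
        mcp.run()   # serves MCP over stdio by default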


-Brett.


[1] Which opens a bizarre policy conundrum: we are not allowed to use
DeepSeek, yet it is provided to us under BNL's official licensing of
Azure AI. It may be that this version is "blessed".
