sphenix-emcal-l AT lists.bnl.gov
Subject: sPHENIX EMCal discussion
[Sphenix-emcal-l] running the entire emcal - status
- From: Martin Purschke <purschke AT bnl.gov>
- To: "sphenix-emcal-l AT lists.bnl.gov" <sphenix-emcal-l AT lists.bnl.gov>
- Date: Wed, 10 May 2023 23:48:20 -0400
Dear Emcal aficionados,
First off, this is not really a victory mail yet. The findings reported earlier today still hold 100%: there is a significant chance that one of the 8 servers fails to get events, which invalidates the entire run.
It is also true that it's not always the same server. I started 3 runs, and seb02, seb07, and seb04, respectively, failed to start, one in each run.
Here's what run control has to say in the last run I started -
[phnxrc@seb02 emcal_shadow]$ rc_status
seb00 5969 16578 6051.96 1 0
seb01 5969 16579 6052.33 1 0
seb02 5969 16580 6052.69 1 0
seb03 5969 16575 5674.24 1 0
seb04 5969 1 0.00397491 1 0
seb05 5969 16580 5045.85 1 0
seb06 5969 16580 5045.85 1 0
seb07 5969 16580 5045.85 1 0
You see, seb04 dropped out here. We need to figure out what is going on, and why the servers fail to get data so often.
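A quick way to spot a dropped-out server without eyeballing the table might look like the sketch below. It assumes, without confirmation from the rc_status documentation, that column 1 is the host name and column 3 is the event count; the function name find_stalled_sebs and the default threshold of 100 events are made up for illustration.

```shell
#!/bin/bash
# Sketch only. ASSUMPTION: in rc_status output, column 1 is the host and
# column 3 is the number of events seen so far. Adjust if that is wrong.
# Reads rc_status-style lines on stdin and prints every host whose
# event count is below the given threshold (default: 100).
find_stalled_sebs() {
    local min_events="${1:-100}"
    awk -v min="$min_events" 'NF >= 3 && ($3 + 0) < (min + 0) { print $1 }'
}
```

Piping the listing above through it (rc_status | find_stalled_sebs) would print seb04 under those assumptions.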
With that said, I felt that the way we are starting a bunch of servers is quite error-prone and also inconvenient. I temporarily started (but did not boot-enable) the same systemd-level rcdaq service that restarts the rcdaq_servers when they get shut down (or crash, or go away for any other reason). (The TPC and INTT have been using this for a few weeks now.)
That start step was the *only* part that required us to actually log in to a given seb machine. Now the entire configuration can be done remotely, that is, from any given node, such as (soon) the operator consoles.
To support that (and not wanting to change stuff in your directory), I cloned the entire operations/emcal area into operations/emcal_shadow where I made the changes. If/once we are ok with that, we could preserve the existing emcal somewhere and rename emcal_shadow to emcal - your prerogative.
So first off, I made a "generic" setup file, similar to the one that John had already made. It keys on the value of RCDAQHOST or, if that's not defined, on the hostname. I called it "rcdaq_setup_emcal_generic.sh" - well, whatever we want to call it.
Here is a script to set up all 8 rcdaqs in one fell swoop:
#! /bin/bash
for i in {0..7}; do
export RCDAQHOST=seb0$i
bash rcdaq_setup_emcal_generic.sh
done
By setting the RCDAQHOST variable, everything in the script acts on the RCDAQ instance on that machine.
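Given how often a single server spoils a run, a variant of the loop that reports which hosts failed during setup could be handy. This is only a sketch: the function name setup_all_sebs is hypothetical, and it assumes the setup script exits with a nonzero status when something goes wrong, which I have not verified for rcdaq_setup_emcal_generic.sh.

```shell
#!/bin/bash
# Sketch only. ASSUMPTION: the setup script returns a nonzero exit status
# on failure. Runs the given script once per seb with RCDAQHOST set for
# that invocation only, and lists the hosts where it failed.
setup_all_sebs() {
    local script="$1"
    local failed=()
    local i host
    for i in {0..7}; do
        host="seb0$i"
        if ! RCDAQHOST="$host" bash "$script"; then
            failed+=("$host")
        fi
    done
    if [ "${#failed[@]}" -gt 0 ]; then
        echo "setup failed on: ${failed[*]}"
        return 1
    fi
    echo "setup ok on all 8 sebs"
}
```

It would be called as: setup_all_sebs rcdaq_setup_emcal_generic.sh. Setting RCDAQHOST as a prefix on the command, rather than exporting it, keeps the variable from lingering in the calling shell.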
I took out the section that figured out whether a server is running; it is no longer needed.
The "generic" aspect is at the beginning of the script -
#! /bin/sh
H=$RCDAQHOST
[ -z "$H" ] && H=$(hostname)
CONFIGFILE="/home/phnxrc/operations/emcal_shadow/${H}_emcal.scf"
so it picks the right config file.
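For what it's worth, the same fallback can be written in one line with the shell's default-value expansion. The sketch below wraps it in a function, config_for_host, a name made up here so the logic can be tried on its own; the path is the one from the script above.

```shell
#!/bin/sh
# Sketch only: equivalent to the H=... / [ -z "$H" ] pair above.
# ${RCDAQHOST:-...} uses the fallback when RCDAQHOST is unset OR empty,
# which matches the -z test in the original.
config_for_host() {
    h="${RCDAQHOST:-$(hostname)}"
    echo "/home/phnxrc/operations/emcal_shadow/${h}_emcal.scf"
}
```

With RCDAQHOST=seb03 set, config_for_host prints /home/phnxrc/operations/emcal_shadow/seb03_emcal.scf; with it unset, the local hostname is used instead.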
Let me state the obvious - since you already had the individual sebXX_setup.sh files, we might as well re-use those - I just shied away from editing 8 files rather than one.
I also made a convenience script "all_rcdaq_statuses.sh" that starts, for feel-good value, 8 individual rcdaq_status.pl processes with the right RCDAQHOST def:
#! /bin/bash
for i in {0..7}; do RCDAQHOST=seb0$i rcdaq_status.pl & done
That results then in an array of the status GUIs shown in the screenshot below.
I found a 10Hz forced-accept modebit setup loaded; I saved this as 10Hz_pulse_FA.scheduler that you can reload with
gtm_load_modebits 5 10Hz_pulse_FA.scheduler
I like a bit more control and loaded "0Hz.scheduler" with no FAs. That allows me to control the rate better with ext. triggers.
Ok, coming back to the beginning - we need to increase the successful-start probability big time, or we won't be able to run.
Best,
Martin
--
Martin L. Purschke, Ph.D. ; purschke AT bnl.gov
; http://www.phenix.bnl.gov/~purschke
;
Brookhaven National Laboratory ; phone: +1-631-344-5244
Physics Department Bldg 510 C ; fax: +1-631-344-3253
Upton, NY 11973-5000 ; skype: mpurschke
-----------------------------------------------------------------------
Attachment:
eight_statuses.jpg
Description: JPEG image