sphenix-software-l - [Sphenix-software-l] Fwd: SDCC Facility network intervention of Tuesday Dec 1, 2020 (6:00-18:00 Eastern Time)

sphenix-software-l AT lists.bnl.gov

Subject: sPHENIX discussion of software

  • From: pinkenburg <pinkenburg AT bnl.gov>
  • To: PHENIX Current Participants <phenix-p-l AT lists.bnl.gov>, phenix-off-l <phenix-off-l AT lists.bnl.gov>, "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>, "sphenix-l AT lists.bnl.gov" <sphenix-l AT lists.bnl.gov>
  • Subject: [Sphenix-software-l] Fwd: SDCC Facility network intervention of Tuesday Dec 1, 2020 (6:00-18:00 Eastern Time)
  • Date: Tue, 1 Dec 2020 17:48:39 -0500

Hi folks,

rcf is back

Chris



-------- Forwarded Message --------
Subject: Re: SDCC Facility network intervention of Tuesday Dec 1, 2020 (6:00-18:00 Eastern Time)
Date: Tue, 1 Dec 2020 17:46:10 -0500
From: alezayt <alezayt AT rcf.rhic.bnl.gov>
To: liaisons AT rcf.rhic.bnl.gov, sdcc_liaisons AT bnl.gov
CC: RCF Staff <rcfstaff AT rcf.rhic.bnl.gov>

Hi All,

It is my pleasure to announce that as of 17:20 Eastern Time today we have seen the recovery of all SDCC production systems to a functional state, and thus can declare the Facility wide network intervention of Dec 1, 2020 completed successfully.

All primary goals of the intervention listed in the original message below have been fulfilled within the designated time window, and as a result the central networking system of the SDCC Facility is ready for the transparent transition to the B515/B725 datacenter operations later in FY21.

Please let us know if you see any issues with the SDCC services and subsystems as the workload is scaled back and the regular user activities are resumed.

Thanks!

Cheers,
Alex.


On 11/25/2020 5:20 PM, alezayt wrote:
Hi All,

This is a reminder that on Tuesday Dec 1, 2020 during the time window of 6:00-18:00 Eastern Time the SDCC Facility and ITD Network Engineering Group are going to carry out the scheduled network intervention for which the details were given in the earlier announcement (message quoted below).

The SciZone network connectivity interruption is expected to begin at 6:00am Eastern Time on Dec 1, 2020.


The updated summary of the impact of this intervention on the central DISK storage systems hosted at SDCC looks as follows:

- Systems that are expected to stay up throughout the intervention (for external clients only) service wise are: BNL FTS, Belle II CDB, Belle II DDM, Belle Dirac, Belle Rucio, CVMFS (real hardware based components).

- ATLAS T1 dCache and Belle II dCache stay up network wise but go down service wise due to interruption of SciZone DNS.

- RHIC GPFS goes down because of interruption of connectivity of client clusters in SDCC RHEV.

- BNLBox goes down because of interruption of its storage backend (separate announcements were made to all BNLBox users earlier this week).


Since the SciZone based methods of communication are going to be disrupted during the intervention (including SDCC mail service and Mattermost Chat service), please feel free to reach out to me directly on the day of intervention using, if necessary:

- My Gmail address: alexander.s.zaytsev AT gmail.com

- My cell phone number: +16312941084


Thanks!

Cheers,
Alex.

P.S.: Happy Thanksgiving!


On 11/13/2020 4:59 PM, alezayt wrote:
Hi All,

As was mentioned during the SDCC Liaison meetings over the last 2 months, on Tuesday Dec 1, 2020 between 6:00 and 18:00 Eastern Time the SDCC Facility is going to go through a disruptive network intervention. This intervention is a crucial step on the way to commissioning the new B725 based datacenter for the SDCC Facility and to a smooth transition to B515/B725 datacenter interoperation, which is anticipated to happen in 2021Q3.


The main goals of the intervention are:

- The complete replacement of the existing B515 datacenter network Science Core modular Arista switch pair approaching end of life with the 400 GbE enabled modular Arista switch pair that would serve the B515 side of SDCC Facility for the next 5 years.

- Retirement of the other components of SDCC central network equipment that are approaching end of life and no longer needed in the consolidated SDCC SciZone architecture ("Merged RHIC/ATLAS FrontEnd").

- Consolidation of the remaining components of the SDCC central network equipment serving general infrastructure ("Service Blocks") and completion of the construction of the unified Science Zone (the process that started back in 2017).


Due to the scale and complexity of the operations that need to be performed on the network infrastructure of the Facility during this intervention, it is likely that a substantial portion of the 12h time window reserved for it is going to be used, resulting in a multi-hour network connectivity interruption for the SDCC Science Zone (SciZone), which would imply:

- The draining of all HPC clusters and CPU Farms hosted at SDCC over a necessary grace period before the day of intervention, followed by the period of recovery after the intervention,

- Unavailability of many services hosted in SDCC Facility such as: user SSH gateways, NX systems, mail server, user authentication services, during the intervention window,

- No NTP and DNS service inside the

- Unavailability of a significant portion of SDCC web infrastructure (everything behind the SDCC reverse proxies including Mattermost chat) during the intervention window.

- Loss of connectivity over direct fiber uplinks between B515 SciCore and its clients deployed in other buildings on BNL Campus, such as STAR CH HPSS uplink (B10006), CAD HPSS uplink (B911), CFN uplink to B515 SciZone (B735), NSLS II direct uplink to B515 SciZone (B74x), B725 ACL uplink to B515 SciZone (B725/area 1-181).


A limited set of central Storage systems hosted at SDCC, such as ATLAS dCache, Belle II dCache and BNLBox are expected to survive and continue to serve external clients only (those on BNL Campus and outside BNL) during this intervention.


Please propagate this information to the respective user communities, and let me know if you have questions or concerns about the way in which a specific SDCC system is going to be affected during this intervention.


Also, since SDCC mail service and Mattermost chat are both going to be unavailable during the intervention, please use my Gmail address

alexander.s.zaytsev AT gmail.com

for electronic communications during the intervention window.


Sincerely yours,
Alexandr ZAYTSEV
---
BNL, SDCC

