atlas-connect-l - [Atlas-connect-l] Failed jobs due to xrootd issues?

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: Christopher Meyer <chris.meyer AT cern.ch>
  • To: <atlas-connect-l AT lists.bnl.gov>
  • Subject: [Atlas-connect-l] Failed jobs due to xrootd issues?
  • Date: Wed, 10 Feb 2016 20:06:53 +0000

Dear Experts,

A few days ago I submitted a large number of jobs with Condor through ATLAS Connect, and a number of them failed with errors like those in the attached log file. Running over some of the same files directly from login.usatlas.org (using the root:// path) works fine.

There was only one failure of the first type, on:
iut2-c191.iu.edu

The second type failed on these nodes:
uct2-c166.mwt2.org
uct2-c173.mwt2.org
uct2-c127.mwt2.org
uct2-c086.mwt2.org
uct2-c108.mwt2.org
uct2-c180.mwt2.org
uct2-c154.mwt2.org
uct2-c078.mwt2.org
uct2-c045.mwt2.org
uct2-c170.mwt2.org
uct2-c133.mwt2.org
uct2-c103.mwt2.org
uct2-c172.mwt2.org
uct2-c080.mwt2.org
uct2-c171.mwt2.org

Does anyone have an idea what might be going wrong, or whether there's something I can do to protect against this?
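
One generic way to guard a job against transient read failures like these is to retry the failing step with an increasing delay. The following is a minimal Python sketch, not something taken from ATLAS Connect or ROOT: `retry_with_backoff` and its parameters are hypothetical names, and the wrapped `action` is assumed to raise an exception (here `OSError`) when the remote read fails, which means the caller has to turn a failed open or bad read into an exception itself.

```python
import random
import time


def retry_with_backoff(action, attempts=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call action() until it succeeds or the attempts are used up.

    action     -- zero-argument callable doing the remote work (hypothetical)
    attempts   -- total number of tries, including the first one
    base_delay -- delay in seconds before the second try; doubles each time
    retry_on   -- exception types treated as transient
    sleep      -- injectable so the function can be tested without waiting
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                # Out of tries: let the caller see the last error.
                raise
            # Exponential backoff plus up to 10% jitter, so that many jobs
            # failing at once do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
```

A job could wrap its per-file processing in a function and pass that in as `action`. Retrying only helps with transient problems; a server that stays unreachable will still fail once the attempts are used up.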

Thanks!
Chris

Attached log:
160208 22:57:07 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160208 22:57:07 860954 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2097152 bytes) from substream 0.

160209 01:09:48 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:09:49 862863 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:12:20 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:12:21 890796 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:14:52 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:14:54 891121 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:17:24 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:17:25 891723 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:19:59 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:20:00 892066 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:22:34 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:22:35 892686 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
160209 01:25:06 860939 Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.218:22920]). Retrying ...
160209 01:25:06 860939 Xrd: SendGenCommand: Too many redirections for request
kXR_readv. Aborting command.
160209 01:25:07 860939 Xrd: GetParallelStreamCount: Unknown logical conn 16
160209 01:25:07 860939 Xrd: GetParallelStreamToUse: Unknown logical conn 16
160209 01:25:07 860939 Xrd: WriteToServer: Unknown logical conn 16
160209 01:25:07 860939 Xrd: WriteToServer: Unknown logical conn 16
160209 01:25:07 860939 Xrd: SendGenCommand: Too many redirections for request
kXR_read. Aborting command.
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:5930277, branch:mcevt_weight, entry:102, badread=1, nerrors=1,
basketnumber=3
160209 01:25:07 893647 Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2018455 bytes) from substream 0.
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:103, badread=0, nerrors=2,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:104, badread=0, nerrors=3,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:105, badread=0, nerrors=4,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:106, badread=0, nerrors=5,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:107, badread=0, nerrors=6,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:108, badread=0, nerrors=7,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:109, badread=0, nerrors=8,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:110, badread=0, nerrors=9,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Error in <TBranchElement::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00125.merge.output.root
at byte:0, branch:mcevt_weight, entry:111, badread=0, nerrors=10,
basketnumber=3
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=8978230, len=47858, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=8951901, len=26329, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=8978230, len=47858, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=8951901, len=26329, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=7346413, len=2946, fNbytes=0, fObjlen=0, trying to repair
Warning in <TBasket::ReadBasketBuffers>: basket: has fNevBuf=0 but
fEntryOffset=0, pos=8978230, len=47858, fNbytes=0, fObjlen=0, trying to repair
160208 23:03:10 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160208 23:03:11 1986861Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2075614 bytes) from substream 0.
160208 23:05:50 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160208 23:05:51 1999588Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2090445 bytes) from substream 0.
160208 23:08:28 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160208 23:08:29 2006639Xrd: XrdClientMessage::ReadRaw: Failed to read data
(2093621 bytes) from substream 0.
160208 23:34:33 2017103Xrd: XrdClientSock::RecvRaw: Error reading from
socket: Connection reset by peer
160208 23:34:33 2017103Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160208 23:37:06 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160208 23:37:07 1986860Xrd: XrdClientSock::RecvRaw: Error reading from
socket: Connection reset by peer
160208 23:37:07 1986860Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160208 23:39:42 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [uct2-s5.mwt2.org:1094]). Retrying ...
160208 23:59:47 2064244Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160209 00:00:59 2064245Xrd: XrdClientSock::RecvRaw: Error reading from
socket: Connection reset by peer
160209 00:00:59 2064245Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160209 00:03:30 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160209 00:03:31 1986852Xrd: CheckErrorStatus: Server [149.165.225.219:22782]
declared: session not found(error code: 3011)
160209 00:06:09 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160209 00:10:24 2066378Xrd: XrdClientMessage::ReadRaw: Failed to read data
(1219792 bytes) from substream 0.
160209 00:26:18 2066377Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160209 00:27:02 2066655Xrd: XrdClientSock::RecvRaw: Error reading from
socket: Connection reset by peer
160209 00:27:02 2066655Xrd: XrdClientMessage::ReadRaw: Failed to read header
(8 bytes).
160209 00:29:35 1986852Xrd: ReadPartialAnswer: Failed to read msg from
connmgr (server [149.165.225.219:22782]). Retrying ...
160209 00:29:36 1986852Xrd: SendGenCommand: Too many redirections for request
kXR_open. Aborting command.
160209 00:29:36 1986852Xrd: OpenFileWhenRedirected: File open failed.
160209 00:29:36 1986852Xrd: SendGenCommand: Too many redirections for request
kXR_readv. Aborting command.
Error in <TXNetFile::ReadBuffer>: The remote file is not open
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:1581521412, branch:EF_2e12Tvh_loose1, entry:17599, badread=1,
nerrors=1, basketnumber=88
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-13882) ;
trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNevBufSize is incorrect
(-943686495) ; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:-210612374, branch:EF_2mu13, entry:17599, badread=1, nerrors=2,
basketnumber=88
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-23254) ;
trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-772873197)
; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNevBufSize is incorrect
(-793977865) ; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:-7386830850312805830, branch:EF_e24vhi_medium1, entry:17599,
badread=1, nerrors=3, basketnumber=88
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-1526804528)
; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-747086601)
; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:8499033983717550177, branch:EF_e60_medium1, entry:17599, badread=1,
nerrors=4, basketnumber=88
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-19356) ;
trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNbytes is incorrect (-1264579726)
; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNevBufSize is incorrect
(-1687862690) ; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:-692427372, branch:EF_mu18_tight_mu8_EFFS, entry:17599, badread=1,
nerrors=5, basketnumber=88
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-28449) ;
trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-1730323467)
; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNevBufSize is incorrect
(-793977865) ; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:336374917, branch:EF_mu24i_tight, entry:17599, badread=1, nerrors=6,
basketnumber=88
Error in <TBasket::Streamer>: The value of fKeylen is incorrect (-15819) ;
trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fObjlen is incorrect (-1398860667)
; trying to recover by setting it to zero
Error in <TBasket::Streamer>: The value of fNevBufSize is incorrect
(-1794306456) ; trying to recover by setting it to zero
Error in <TBranch::GetBasket>: File:
root://fax.mwt2.org//atlas/rucio/user.btannenw:user.btannenw.003316._00092.merge.output.root
at byte:5981155918917247241, branch:EF_mu36_tight, entry:17599, badread=1,
nerrors=7, basketnumber=88
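
When comparing many failed jobs, it can help to tally which servers and which ROOT errors show up in a log like the one above. The following is a rough Python sketch under stated assumptions: the function name `summarize_log` and both regular expressions are illustrative, written only against the message shapes visible in this log (`server [host:port]` and `Error in <Class::Method>`), and may need adjusting for other client versions.

```python
import re
from collections import Counter

# Server named in lines such as
#   "connmgr (server [149.165.225.218:22920]). Retrying ..."
# Case-insensitive so that "Server [...] declared: ..." is counted too.
SERVER_RE = re.compile(r"server \[([^\]]+)\]", re.IGNORECASE)

# Origin of ROOT error lines such as "Error in <TBranch::GetBasket>: ..."
ROOT_ERROR_RE = re.compile(r"Error in <([^>]+)>")


def summarize_log(text):
    """Return (servers, root_errors) as Counters of occurrences in text.

    The whole log is searched at once, so entries that the mailer wrapped
    across two lines are still matched.
    """
    servers = Counter(SERVER_RE.findall(text))
    root_errors = Counter(ROOT_ERROR_RE.findall(text))
    return servers, root_errors
```

Calling `summarize_log(open(path).read())` on each job's log, where `path` is a placeholder for the log file, would give per-job counts that can be compared across worker nodes.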


