star-hp-l AT lists.bnl.gov
Subject: STAR HardProbes PWG
List archive
- From: "Mooney, Isaac" <isaac.mooney AT yale.edu>
- To: Robert Líčeník <licenik AT ujf.cas.cz>, STAR HardProbes PWG <star-hp-l AT lists.bnl.gov>
- Subject: Re: [Star-hp-l] Run 14 data restoration
- Date: Tue, 4 Jul 2023 15:51:38 +0000
Hi Robert,
I’m not an RCF expert, but I believe this problem is with xrootd or a specific node/set of nodes, rather than with the data itself. From https://www.star.bnl.gov/public/comp/prod/localdata/ProdDDstreams_pico.html I see that the production you’re trying to access is close to 100% on DD [presplit is ~93% but the others are ~100%]. So it shouldn’t be a problem with missing data. I was also working with a student who had the same problem a few days ago on a completely different dataset, so there may be some problems with the servers right now. I would recommend bringing this up on Mattermost and seeing what the experts say, although this problem has come up before as well. I’m quoting some of their responses to these problems below.
To summarize, it seems that either there is a problem with xrootd on a node, which could be fixed, or there is a transient problem that would be solved just by trying again. Maybe give that a try and see if it’s better now, while also bringing it up on Mattermost in case there is a more systemic problem that should be brought to someone’s attention.
Thanks,
Isaac
Jerome: "Yes so the scheduler tries a few times. Error 3011 can be issued for several reasons:
- the dataserver is not available or overloaded
- the network is saturated (yes, it can happen)
- the file is really not there
We have no other ways than retrying."
Gene: "I just tried opening a few other random files from different productions on the same node as that file (rcas6078.rcf.bnl.gov) and got the same error. The node appears to be up, according to ganglia, so it may be a problem specific to xrootd on that node. I'll submit a ticket. Thanks for the tip."
Leve: "My best guess (I don't administrate the xrootd nodes) is that perhaps xrootd servers (it's one name in the url but they are load balanced) were overloaded at the time."
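For anyone who wants to retry inside their own code rather than rely only on the scheduler, here is a minimal Python sketch of the "just try again" approach described above. It is only an illustration: `XrdOpenError`, `open_with_retry`, and the `opener` callable are hypothetical names, not part of ROOT, xrootd, or the STAR libraries, and the choice of exponential backoff is an assumption on my part. You would have to wrap whatever call actually opens the file so that it raises an exception carrying the error code.

```python
import time

XRD_NO_SERVERS = 3011  # error code reported in the logs in this thread


class XrdOpenError(Exception):
    """Hypothetical exception that carries an xrootd error code."""

    def __init__(self, code, message=""):
        super().__init__(f"xrootd error {code}: {message}")
        self.code = code


def open_with_retry(opener, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call opener(url), retrying on error 3011 with exponential backoff.

    `opener` is any callable that returns a file handle or raises
    XrdOpenError. Other error codes are re-raised immediately.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except XrdOpenError as err:
            # Give up on non-3011 errors, or when the attempts are used up.
            if err.code != XRD_NO_SERVERS or attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)
```

If the error persists after a few attempts, that points toward a node-level xrootd problem like the one Gene describes rather than a transient overload, and is worth reporting.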
On Jul 4, 2023, at 8:42 AM, Robert Líčeník via Star-hp-l <star-hp-l AT lists.bnl.gov> wrote:
Hello everyone,
for the record, I am talking about the low, mid, and presplit luminosity parts of the dataset. We are not using the high lumi part, because there is no centrality definition. I don't think the issue is that the files are missing. When I submit jobs (using the catalog query), this is a snippet of the output:
Analyzing XML...XML OK
Executing : get_file_list.pl -keys fdid,storage,site,node,path,filename,events -cond production=P18ih,library=SL20d,filetype=daq_reco_picoDst,trgsetupname=AuAu_200_production_mid_2014||AuAu_200_production_low_2014||AuAu_200_production_2014,storage!=HPSS,filename~st_physics -limit 0....................377391 Entries Recovered
[2023.07.04 07:51:58 EDT] Dataset size is 377391 files
Removing files not on site BNL
[2023.07.04 07:52:05 EDT] Dataset size is 377391 files
-------Processing recovered dataset for xrootd/xrootddev/rfio-------
Dropping HPSS files
[2023.07.04 07:52:11 EDT] Dataset size is 377391 files
Dropping files with duplicate LFN (files that are the same)
[2023.07.04 07:52:36 EDT] Started with 377391 files, current size is 277442files, 99949 duplicate files dropped.
Splitting dataset entries by size (minSize=10 ,maxSize=10 )
[2023.07.04 07:52:41 EDT] Dataset size is 277442 files
I also tested it by running get_file_list.pl manually and I get exactly the same number of files (/direct/star+u/licenrob/licenrob/jets/pico_2014midlowpre.list). This number is very close to the number of files on HPSS (/direct/star+u/licenrob/licenrob/jets/pico_2014midlowpre_hpss.list), more precisely 98.5%. So far, everything seems reasonable.
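As a cross-check of the numbers in the scheduler output above (377391 entries, 99949 duplicates dropped, 277442 remaining), the deduplication step can be sketched in Python as follows. This is only an illustration under an assumption: I treat `path` plus `filename` (two of the keys from the `-keys` list in the query) as the logical file name, which may not be exactly how the scheduler defines an LFN. The function name `drop_duplicate_lfns` is hypothetical.

```python
def drop_duplicate_lfns(entries):
    """Keep the first catalog entry for each logical file name (LFN).

    Assumption: the LFN is path + "/" + filename, so copies of the
    same file on different nodes count as duplicates.
    """
    seen = set()
    unique = []
    for entry in entries:
        lfn = entry["path"] + "/" + entry["filename"]
        if lfn not in seen:
            seen.add(lfn)
            unique.append(entry)
    return unique


# Toy example: the same file stored on two nodes, plus one other file.
# Node names and paths here are made up for illustration.
entries = [
    {"node": "nodeA", "path": "/reco/077", "filename": "f1.picoDst.root"},
    {"node": "nodeB", "path": "/reco/077", "filename": "f1.picoDst.root"},
    {"node": "nodeA", "path": "/reco/077", "filename": "f2.picoDst.root"},
]
unique = drop_duplicate_lfns(entries)
print(len(entries), "entries,", len(unique), "unique LFNs")

# The counts quoted from the scheduler log are self-consistent:
assert 377391 - 99949 == 277442
```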
However, when running my analysis code, many files appear to be inaccessible. Here is a snippet of one of the log files:
StInfo: Read in picoDst file /tmp/licenrob/40DEAA0A4347821C32A01647C174D2E3_3/INPUTFILES/st_physics_15077003_raw_5000009.picoDst.root
230609 13:52:53 24687 Xrd: CheckErrorStatus: Server [xrdstar.rcf.bnl.gov:1095] declared: No servers are available to read the file.(error code: 3011)
230609 13:52:53 24687 Xrd: Open: Error opening the file /home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_adc_15077004_raw_2000013.picoDst.root on host xrdstar04.rcf.bnl.gov:1095
230609 13:52:53 24687 Xrd: Open: Open failed for unknown reason.
Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_adc_15077004_raw_2000013.picoDst.root
StInfo: Read in picoDst file root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_adc_15077004_raw_1500013.picoDst.root
230609 13:52:59 24687 Xrd: CheckErrorStatus: Server [xrdstar.rcf.bnl.gov:1095] declared: No servers are available to read the file.(error code: 3011)
230609 13:52:59 24687 Xrd: Open: Error opening the file /home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_adc_15077004_raw_5000013.picoDst.root on host xrdstar03.rcf.bnl.gov:1095
230609 13:52:59 24687 Xrd: Open: Open failed for unknown reason.
Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_adc_15077004_raw_5000013.picoDst.root
230609 13:53:03 24687 Xrd: CheckErrorStatus: Server [xrdstar.rcf.bnl.gov:1095] declared: No servers are available to read the file.(error code: 3011)
230609 13:53:03 24687 Xrd: Open: Error opening the file /home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_1000022.picoDst.root on host xrdstar03.rcf.bnl.gov:1095
230609 13:53:03 24687 Xrd: Open: Open failed for unknown reason.
Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_1000022.picoDst.root
230609 13:53:06 24687 Xrd: CheckErrorStatus: Server [xrdstar.rcf.bnl.gov:1095] declared: No servers are available to read the file.(error code: 3011)
230609 13:53:06 24687 Xrd: Open: Error opening the file /home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_5000012.picoDst.root on host xrdstar04.rcf.bnl.gov:1095
230609 13:53:06 24687 Xrd: Open: Open failed for unknown reason.
Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_5000012.picoDst.root
StInfo: Read in picoDst file /tmp/licenrob/40DEAA0A4347821C32A01647C174D2E3_3/INPUTFILES/st_physics_15077004_raw_5000005.picoDst.root
StInfo: Read in picoDst file root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_5500019.picoDst.root
230609 13:53:08 24687 Xrd: CheckErrorStatus: Server [xrdstar.rcf.bnl.gov:1095] declared: No servers are available to read the file.(error code: 3011)
230609 13:53:08 24687 Xrd: Open: Error opening the file /home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_4500008.picoDst.root on host xrdstar03.rcf.bnl.gov:1095
230609 13:53:08 24687 Xrd: Open: Open failed for unknown reason.
Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/AuAu_200_production_2014/ReversedFullField/P18ih.SL20d/2014/077/15077004/st_physics_15077004_raw_4500008.picoDst.root
StInfo: Read in picoDst file /tmp/licenrob/40DEAA0A4347821C32A01647C174D2E3_3/INPUTFILES/st_physics_15077004_raw_2500006.picoDst.root
StInfo: Total 5 files have been read in.
You can see that some files have been read in with no problems, while some were not read due to the 3011 error and "unknown reason".
The file /direct/star+u/licenrob/licenrob/jets/errors.log contains the result of grep -i "Error opening the file" *.log and you can see that the error appears for around 91k files, which is around a third of the statistics. This already looks pretty concerning to me. To me this looks like either the files or the servers are somehow corrupted. However, if Tanmay is able to find and analyze all the files, I guess the problem is somewhere on my side and my request to restore the files is therefore invalid. I apologize for that.
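To turn the grep output into a list of distinct failing files and a per-host tally, something like the following Python sketch could be used. It assumes the log lines look exactly like the "Xrd: Open: Error opening the file ... on host ..." lines quoted above; the function name `summarize_open_errors` and the short sample paths are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Xrd: Open: Error opening the file <path> on host <host>:<port>
ERROR_RE = re.compile(
    r"Xrd: Open: Error opening the file (?P<path>\S+)"
    r" on host (?P<host>[^\s:]+):(?P<port>\d+)"
)


def summarize_open_errors(lines):
    """Return (set of failed file paths, Counter of error lines per host)."""
    failed = set()
    per_host = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            failed.add(match.group("path"))      # distinct files
            per_host[match.group("host")] += 1   # every error line counts
    return failed, per_host


# Shortened sample lines in the format seen in the logs above.
sample = [
    "230609 13:52:53 24687 Xrd: Open: Error opening the file /reco/f1.picoDst.root on host xrdstar04.rcf.bnl.gov:1095",
    "230609 13:52:59 24687 Xrd: Open: Error opening the file /reco/f2.picoDst.root on host xrdstar03.rcf.bnl.gov:1095",
    "StInfo: Total 5 files have been read in.",
    "230609 13:53:06 24687 Xrd: Open: Error opening the file /reco/f1.picoDst.root on host xrdstar04.rcf.bnl.gov:1095",
]
failed, per_host = summarize_open_errors(sample)
print(len(failed), "distinct files failed;", dict(per_host))
```

Counting distinct paths rather than error lines matters if jobs were resubmitted, since the same file can then appear in several logs. The sorted set of failed paths could also serve as the list of affected files that was requested earlier in the thread.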
Thanks and happy 4th of July,
Robert
_______________________________________________
On Mon, Jul 3, 2023 at 6:03 PM Tanmay Pani via Star-hp-l <star-hp-l AT lists.bnl.gov> wrote:
Hi Robert and all,
Could you please let me know which of the low, mid, or high luminosity productions have a reduced dataset? I was running over mid lumi until last week, and I didn't notice any reduced dataset.
But I can rerun get_file_list.pl to recheck.
Thanks,
Tanmay
_______________________________________________
On Mon, Jul 3, 2023 at 10:34 AM Barbara Trzeciak via Star-hp-l <star-hp-l AT lists.bnl.gov> wrote:
Hi Robert, All,
that should be the same data that Tanmay plans to use for this analysis aiming for the QM talk. If indeed a large part of it is not available, it's important to restore it.
Cheers,
Barbara
_______________________________________________
On Mon, Jul 3, 2023 at 4:30 PM Nihar Sahoo via Star-hp-l <star-hp-l AT lists.bnl.gov> wrote:
Hi Robert,
Can you please also prepare a file list or the RunIds of the missing files (as Rosi also suggested)?
And let's discuss this at the hp-pwg meeting this week.
Best
Nihar
On 2023-07-03 19:39, Robert Líčeník wrote:
> Hi Nihar,
>
> this is the dataset which we have been using from the beginning. We
> have used these picoDsts since they were created.
> Only recently we have discovered that a significant part of the
> statistics is no longer available. We need these picoDst files so we
> can achieve higher precision of our results.
> I am wondering whether anyone else is using this dataset, so they can
> confirm that what we see is true.
>
> Thanks,
> Robert
>
> On Mon, Jul 3, 2023 at 10:44 AM Nihar Sahoo <nihar AT rcf.rhic.bnl.gov>
> wrote:
>
>> Hi Robert,
>>
>> Is this a new data set you are going to use ? Have you used mudst
>> files
>> earlier or PicoDst for run14 ?
>> Can you please mention why do you need this PicoDst files?
>>
>> Thank you
>> Nihar
>>
>> On 2023-07-03 14:04, Robert Líčeník via Star-hp-l wrote:
>>> Hello HP conveners,
>>>
>>> we have noticed that a large part of the 2014 dataset picoDst
>> files
>>> are no longer available. This is the P18ih production with SL20d
>>> picoDst conversion of AuAu at 200 GeV without HFT in tracking.
>> Could
>>> you please officially request the restoration of this important
>>> dataset?
>>>
>>> Please let me know if you have any clarifying questions regarding
>> our
>>> request.
>>> Thank you,
>>> Robert
>>> _______________________________________________
>>> Star-hp-l mailing list
>>> Star-hp-l AT lists.bnl.gov
>>> https://lists.bnl.gov/mailman/listinfo/star-hp-l
_______________________________________________
Star-hp-l mailing list
Star-hp-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/star-hp-l