
star-fcv-l - Re: [Star-fcv-l] [Starsoft-l] Hard to accessing full statics of datasets, any solution?

star-fcv-l AT lists.bnl.gov

Subject: STAR Flow, Chirality and Vorticity PWG

  • From: "ChuanFu" <fuchuan AT mails.ccnu.edu.cn>
  • To: "Cameron Racz" <cracz001 AT ucr.edu>, "Ding Chen" <dchen087 AT ucr.edu>, "STAR Software issues of broad in" <starsoft-l AT lists.bnl.gov>
  • Cc: jeromel <jeromel AT bnl.gov>, STAR Flow,Chirality and Vorticity PWG <star-fcv-l AT lists.bnl.gov>
  • Subject: Re: [Star-fcv-l] [Starsoft-l] Hard to accessing full statics of datasets, any solution?
  • Date: Wed, 1 Dec 2021 13:32:27 +0800

Dear Racz and Ding,
I also met a similar issue when the input picoDsts come from local storage (root://xrdstar.rcf.bnl.gov:1095//home/starlib/home/starreco/reco/....).
The number of lost events is reduced noticeably when I use the following method (for 3.85 GeV):
1) Get the full data list (~13000 picoDst) using the following code:
get_file_list.pl -keys path,filename -cond production=P19ie,library=SL20d,trgsetupname=production_3p85GeV_fixedTarget_2018,filetype=daq_reco_picoDst,filename~st_physics,storage=LOCAL -limit 0 -delim "/" > 3p85_local.list
2) Divide the full data list into 4 sublists (sublist1, sublist2, sublist3, sublist4)
3) Submit jobs using '.xml' (<job fileListSyntax="xrootd" maxFilesPerProcess="10" simulateSubmission="false">) with input sublist1,
after 1~2 hours (depending on how many of your jobs are running; if your jobs have not started running, you need to wait longer),
then submit jobs with input sublist2; after another 1~2 hours, submit jobs with input sublist3, and so on.
The purpose of this is to avoid having too many picoDsts (e.g., more than 500 jobs' worth) read from local storage at the same time.
Here is my submission scripts: /star/u/fuchuan/3_85FXT/Analysis/v0Tree_Proton_Lm/submitAll.sh (and submit.xml)
4) After all jobs are finished (about half a day), you can find in your log files the input picoDsts that were not read, and resubmit a list of those picoDsts.
Here is my script for finding the unreadable picoDsts: /star/u/fuchuan/3_85FXT/Analysis/v0Tree_Proton_Lm/Find3011Err.sh
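The staggered workflow in steps 2-4 could be sketched roughly as below. This is only an illustration, not the contents of submitAll.sh: the submission command is merely echoed as a placeholder for a real star-submit invocation, a small dummy file list is fabricated if 3p85_local.list is absent, and the logs/ directory and the exact "3011" log text are assumptions about your own setup.

```shell
#!/bin/sh
# Sketch of the staggered-submission workflow (steps 2-4 above).
# The submit command is only echoed here -- replace it with your
# real star-submit call for your submit.xml.

FULL_LIST=3p85_local.list
NSUB=4

# Demo scaffolding only: fabricate a small list if the real one is absent.
[ -f "$FULL_LIST" ] || seq -f "/home/starlib/demo/st_physics_%g.picoDst.root" 12 > "$FULL_LIST"

# Step 2: divide the full list into NSUB sublists (sublist00, sublist01, ...).
split -n l/"$NSUB" -d "$FULL_LIST" sublist

# Step 3: submit one sublist at a time, waiting between batches so that
# not too many picoDsts are read from local storage simultaneously.
for f in sublist[0-9]*; do
    echo "submit jobs with input $f"   # placeholder for the real submission
    # sleep 7200                       # uncomment: wait ~2 hours per batch
done

# Step 4: after all jobs finish, pull the files that hit the "3011" read
# error out of the logs and build a resubmission list.
grep -h "3011" logs/*.log 2>/dev/null | grep -o 'root://[^ ]*\.picoDst\.root' > resubmit.list || true
```

The per-batch wait is what spreads the XRootD load; the grep pattern in step 4 must be adapted to whatever your jobs actually print when a file fails to open.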

I am not sure whether the above method is useful for you, but you could try it if you do not have a better one.
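A minimal submit.xml built around the attributes quoted in step 3 might look like the sketch below; this is not the actual submit.xml from the path above, and the macro name, list path, and log locations are placeholders:

```xml
<?xml version="1.0" encoding="utf-8"?>
<job fileListSyntax="xrootd" maxFilesPerProcess="10" simulateSubmission="false">
  <!-- placeholder macro; each process receives its share of the list via $FILELIST -->
  <command>root4star -q -b RunAnalysis.C\(\"$FILELIST\"\)</command>
  <!-- point this at sublist1, sublist2, ... for each staggered submission -->
  <input URL="filelist:/star/u/youruser/sublist1" nFiles="all"/>
  <stdout URL="file:/star/u/youruser/logs/$JOBID.log"/>
  <stderr URL="file:/star/u/youruser/logs/$JOBID.err"/>
</job>
```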

Best regards,
Chuan
 
------------------ Original ------------------
From:  "Cameron Racz via Star-fcv-l"<star-fcv-l AT lists.bnl.gov>;
Date:  Wed, Dec 1, 2021 11:12 AM
To:  "Ding Chen"<dchen087 AT ucr.edu>; "STAR Software issues of broad in"<starsoft-l AT lists.bnl.gov>;
Cc:  "jeromel"<jeromel AT bnl.gov>; "STAR Flow,Chirality and Vorticity PWG"<star-fcv-l AT lists.bnl.gov>;
Subject:  Re: [Star-fcv-l] [Starsoft-l] Hard to accessing full statics of datasets, any solution?
 
To add some more data to this: for my analysis of production_3p85GeV_fixedTarget_2018 (library SL20d), the number of picoDsts I can access fluctuates wildly between attempts to analyze it. I should see around 275M good events, and my most recent attempt accessed fewer than 25M successfully.

Since my flow analysis requires multiple iterations over the same data, it’s becoming difficult to get any meaningful results. I will also need to reliably access the 7.2 GeV data that Ding mentions in order to fully prepare for the Quark Matter conference, and this data problem is really slowing down progress on that.

Cameron Racz
Graduate Student
Dept. of Physics & Astronomy
University of California, Riverside




On Nov 30, 2021, at 9:32 PM, Ding Chen via Starsoft-l <starsoft-l AT lists.bnl.gov> wrote:

Dear FCV and experts,

I want to report that it is hard to access the full statistics of many datasets, and it's not just me.

For Run19 19.6 GeV, analyzers find themselves needing to re-submit more than 8 times to get more than 80% of the statistics.

For Run18 FXT 3 (3.85) GeV data, analyzers find the statistics are 20% lower and need to re-submit multiple times to reach 90% of the statistics.

For Run18 FXT 7.2 (26.5) GeV data, I can only get less than 50% of the full statistics.

Adding to that, many analyzers are plagued by the notorious "3011" error, which kills the whole job if even one file has such an error.

When the 7.2 GeV data was stored on NFS, I had no issue accessing the full statistics. I suspect it's due to some issue with the distributed disk (DD), or the communication with it, but I'm no expert on that.

Since this impacts many analyses, I'd like to know why this problem occurs and, more importantly, whether there is any solution.

Best regards,
Ding
--
Ding Chen
Graduate student - University of California, Riverside

_______________________________________________
Starsoft-l mailing list
Starsoft-l AT lists.bnl.gov
https://lists.bnl.gov/mailman/listinfo/starsoft-l



