sphenix-software-l - Re: [Sphenix-software-l] embedding filesize too large

sphenix-software-l AT lists.bnl.gov

Subject: sPHENIX discussion of software

  • From: "Huang, Jin" <jhuang AT bnl.gov>
  • To: "Pinkenburg, Christopher" <pinkenbu AT bnl.gov>, "sphenix-software-l AT lists.bnl.gov" <sphenix-software-l AT lists.bnl.gov>
  • Subject: Re: [Sphenix-software-l] embedding filesize too large
  • Date: Wed, 7 Oct 2015 15:14:37 +0000

Hi, Chris,

Thanks for running the test production.

> The macro should contain the stripping of the truth info:
> /gpfs02/phenix/prod/sPHENIX/preCDR/pro.1-beta.5/embedding/emcstudies/pidstudies/spacal2d/G4Setup.C

Meanwhile, it appears the main production macro used PHG4ParticleGenerator,
which does not set the embedding flag for the new particle. This would drop the
truth info from the incoming simulation. A suggested alternative, from pull
request 66, is
https://github.com/blackcathj/macros/blob/SinglePart_master_embed_prod/macros/g4simulations/Fun4All_G4_sPHENIX.C

which uses PHG4SimpleEventGenerator, sets the embedding flag properly, and
reuses the Hijing vertex.


> The next step would be to not save the absorber hits but then we lose all
> information about the energy in the absorber.

Yes. In my test production presented in pull request 66, the output size is
the one after stripping away the absorber hits: it reduces to 30 MB/event in
my test output. The macro is the one mentioned above.


> Basically we need a module
> which analyzes them and writes some summary (one G4Hit with all energy?).

In this particular embedding study (embedded emcstudies/pidstudies), the
absorber hits are not used at all. The data are processed as in the
experiment, where only hits in the scintillator volume are used.

A general-purpose absorber-hit module would be nice, but it probably needs
more discussion, depending on which embedding studies would use it, and
probably after the pre-CDR.


> But in the end the whole approach we took is a dead end - we just multiply
> the size of the hijing input. We can either come up with a way to synchronize
> this so we read the original hijing files to get the hijing hits instead of
> saving them or we start to write out cells or towers which would reduce the
> output to a manageable size.

Saving SIM towers would save a lot of space, in particular for the CEMC (15 MB
out of 30 MB/event) and HCal IN (6 MB out of 30 MB/event). However, this choice
would also cut the truth tracing for the calorimeters at this stage, and the
evaluator would not run on this tower output.

I prefer not to do so. However, in this particular embedding study (embedded
emcstudies/pidstudies), I could choose to drop the truth tracing. Therefore,
if the file size is still too large after dropping the absorber hits, I will
be happy to make and verify a macro that further reduces the output size by
saving only towers for the calorimeters.
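
As a back-of-envelope check of the tower option, the per-event numbers quoted
in this message can be combined as below. This is only a sketch of the
arithmetic: the function names are made up for illustration, 1 GB = 1000 MB is
an assumption, and the 5000-event job length is taken from the original
message quoted further down.

```python
# Per-event sizes quoted in this message, in MB/event.
SIZE_AFTER_ABSORBER_STRIP_MB = 30.0  # test output after dropping absorber hits
TOWER_SAVINGS_MB = {"CEMC": 15.0, "HCal IN": 6.0}  # saved by storing towers


def size_with_towers(base_mb, savings_mb):
    """Estimated per-event size once calorimeter hits are replaced by towers."""
    return base_mb - sum(savings_mb.values())


def total_size_gb(per_event_mb, n_events):
    """Estimated total output size in GB (assumes 1 GB = 1000 MB)."""
    return per_event_mb * n_events / 1000.0


per_event = size_with_towers(SIZE_AFTER_ABSORBER_STRIP_MB, TOWER_SAVINGS_MB)
print(f"with towers: {per_event:.0f} MB/event")                      # 9 MB/event
print(f"5000-event job: {total_size_gb(per_event, 5000):.0f} GB")    # 45 GB
```

Under these numbers, saving only towers for the CEMC and HCal IN would leave
roughly 9 MB/event, at the cost of the calorimeter truth tracing described
above.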

Another choice would be to strip away, at the end of production, the
calorimeter PHG4Hits that are not embedded. These hits would then have to be
recovered in the first step of the analysis stage, using a new module to merge
in the Hijing hits. I am OK with this choice too, and I will be happy to make
this module.
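
The merge step could look roughly like the sketch below. It is not the
sPHENIX API: the `Hit` record, the `merge_hits` function and the per-event
dictionaries are all hypothetical stand-ins, written only to show that the
merge needs the embedded output and the original Hijing file to be matched
event by event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for a calorimeter hit record."""
    detector: str
    energy: float
    embedded: bool


def merge_hits(embedded_by_event, hijing_by_event):
    """Re-attach Hijing hits to the embedded-only hits of each event.

    Both arguments map an event id to a list of Hit objects. A missing
    Hijing event means the two inputs are out of sync, so it is an error.
    """
    merged = {}
    for event_id, embedded in embedded_by_event.items():
        if event_id not in hijing_by_event:
            raise KeyError(f"no Hijing hits for event {event_id}")
        merged[event_id] = list(hijing_by_event[event_id]) + list(embedded)
    return merged


embedded = {7: [Hit("CEMC", 1.5, True)]}
hijing = {7: [Hit("CEMC", 0.2, False), Hit("HCal IN", 0.1, False)]}
merged = merge_hits(embedded, hijing)
print(len(merged[7]))  # 3 hits: 2 from Hijing plus 1 embedded
```

The explicit error on a missing event reflects the synchronization
requirement: the scheme only works if the analysis reads exactly the Hijing
events that were used in production.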

I think the longer-term solution would be to save the towers plus a truth
association object, stored in the DST, that directly links calorimeter towers
to the primary particles. That would keep the output compact.
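
To make the idea concrete, such an association object could be as small as a
map from tower to the energy deposited by each primary. The class below is a
hypothetical illustration only (names, ids and layout are assumptions, not an
existing sPHENIX class).

```python
from collections import defaultdict


class TowerTruthAssociation:
    """Hypothetical map: tower id -> {primary particle id: deposited energy}."""

    def __init__(self):
        self._edep = defaultdict(lambda: defaultdict(float))

    def add(self, tower_id, primary_id, energy):
        """Accumulate energy deposited in a tower by one primary particle."""
        self._edep[tower_id][primary_id] += energy

    def fractions(self, tower_id):
        """Return {primary id: fraction of the tower energy}, or {} if empty."""
        contributions = self._edep.get(tower_id, {})
        total = sum(contributions.values())
        if total <= 0:
            return {}
        return {pid: e / total for pid, e in contributions.items()}


assoc = TowerTruthAssociation()
assoc.add(tower_id=1, primary_id=10, energy=3.0)
assoc.add(tower_id=1, primary_id=11, energy=1.0)
print(assoc.fractions(1))  # {10: 0.75, 11: 0.25}
```

Only one number per (tower, primary) pair is stored, instead of every
individual hit, which is where the size reduction would come from.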


Cheers,

Jin



______________________________

Jin HUANG

Brookhaven National Laboratory
Physics Department, Bldg 510 C
Upton, NY 11973-5000

Office: 631-344-5898
Cell: 757-604-9946
______________________________

> -----Original Message-----
> From: sphenix-software-l-bounces AT lists.bnl.gov
> [mailto:sphenix-software-l-bounces AT lists.bnl.gov] On Behalf Of pinkenburg
> Sent: Wednesday, October 7, 2015 10:07 AM
> To: sphenix-software-l AT lists.bnl.gov
> Subject: [Sphenix-software-l] embedding filesize too large
>
> Hi Jin,
>
> (I just sent this to the list since this is a real issue with the embedding
> for everyone)
>
> after seeing what size 10 events produce I started just a few jobs yesterday
> evening. The output size just blows the bank (450 events so far with a 60GB
> output file) and - probably because of the i/o - is glacially slow (for G4
> these are single particle sims, the 5000 events should have been done by
> now). The macro should contain the stripping of the truth info:
>
> /gpfs02/phenix/prod/sPHENIX/preCDR/pro.1-beta.5/embedding/emcstudies/pidstudies/spacal2d/G4Setup.C
>
> The next step would be to not save the absorber hits but then we lose all
> information about the energy in the absorber. Basically we need a module
> which analyzes them and writes some summary (one G4Hit with all energy?).
> We should also just remove all the saving of the forward stuff (steel doors,
> black holes). We should go over the original hijing files, since they are
> the vast majority of what is in the embedding output we can see where we get
> the biggest savings.
>
> But in the end the whole approach we took is a dead end - we just multiply
> the size of the hijing input. We can either come up with a way to synchronize
> this so we read the original hijing files to get the hijing hits instead of
> saving them or we start to write out cells or towers which would reduce the
> output to a manageable size.
>
> Chris
>
> P.S. so far we are using 230TB of space on a filesystem which is meant as
> buffer for the production. This puts us where this becomes an issue for the
> production we want to start. What saves us is that the working groups are
> not using the space they have requested. There are still
> 50000 jobs in the queue. We have to do something.
>
> --
> *************************************************************
>
> Christopher H. Pinkenburg ; pinkenburg AT bnl.gov
> ; http://www.phenix.bnl.gov/~pinkenbu
>
> Brookhaven National Laboratory ; phone: (631) 344-5692
> Physics Department Bldg 510 C ; fax: (631) 344-3253
> Upton, NY 11973-5000
>
> *************************************************************
>
> _______________________________________________
> Sphenix-software-l mailing list
> Sphenix-software-l AT lists.bnl.gov
> https://lists.bnl.gov/mailman/listinfo/sphenix-software-l
