The project is organized around four milestones covering six stages, which are detailed in the Work Plan.

**Milestone 1. Monte Carlo simulations of anisotropic flow**

The use of the hydrodynamic approach to study the expansion of the nuclear matter produced in high-energy nuclear collisions started with Landau in 1953. Because of their simplicity and elegance, hydrodynamic models have made it possible to calculate many observables for different types of interactions. The evolution of a hydrodynamic system is determined by the initial conditions and the equation of state (EoS). The equation of state relates the pressure to the energy density and the baryon density, and thereby determines how pressure gradients drive the fluid flow.

A phase transition between QGP and hadronic gas softens the equation of state: as the temperature crosses the critical temperature, the energy density and entropy density rise abruptly, while the pressure grows only slowly. In peripheral collisions, the system expands preferentially in the direction of the reaction plane rather than perpendicular to it, because of the asymmetry of the pressure gradient. As the system expands, it becomes less and less deformed.

To generate simulated data we will use several Monte Carlo simulation codes. One of them is Ultrarelativistic Quantum Molecular Dynamics (UrQMD), a simulation tool developed for the study of multifragmentation, collective flow, particle correlations, etc. It includes a hydrodynamic model based on a nuclear equation of state. In the initial condition of the collision, the nucleons are described by a Gaussian probability distribution. Nucleon-nucleon interactions are based on a classical Skyrme equation of state, with additional Yukawa and Coulomb potentials. To simulate anisotropic flow, several nucleus-nucleus interactions at different energies will be generated. Proton-proton interactions will also be simulated for a comparative study.

Another Monte Carlo code, HIJING, will be used alongside UrQMD for data generation and comparison. HIJING is a Monte Carlo simulation package for the study of p-p, p-A, and A-A interactions. Parton production in nucleus-nucleus interactions is modeled using a string fragmentation model. The initial conditions are set using the Glauber geometric model.

**Milestone 2. Developing numerical methods and algorithms on GPU for the study of nuclear flow**

The study of nuclear flow in high-energy physics requires running a set of algorithms to determine the parameters of interest. Usually, these parameters are determined from the Fourier expansion of the azimuthal distribution of particles. The most interesting ones are the directed flow v1, the elliptic flow v2 and the rarely calculated, but sometimes used, v4. Algorithms that compute these parameters currently exist, but they run only on CPU. We expect that a GPU implementation of these algorithms will improve performance and substantially decrease the time required for the study. To our knowledge, there is no other national or international effort to implement these algorithms on GPU.
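For reference, the Fourier expansion of the azimuthal distribution is commonly written in the form below. The symbols are assumptions of this note rather than notation fixed by the text: $\varphi$ is the azimuthal angle of a particle, $\Psi_{RP}$ the reaction-plane angle, and the angle brackets denote an average over particles and events.

$$
\frac{dN}{d\varphi} \propto 1 + 2\sum_{n=1}^{\infty} v_n \cos\left[n\,(\varphi - \Psi_{RP})\right],
\qquad
v_n = \left\langle \cos\left[n\,(\varphi - \Psi_{RP})\right] \right\rangle
$$

In this form, v1, v2 and v4 are the coefficients with n = 1, 2 and 4.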

On CPU, the algorithms used to analyze the data depend on how the data is obtained. Algorithms for experimental data are, in general, larger and more complex, because all data is in the laboratory system and the algorithms must perform preliminary computations before determining the flow parameters (v1, v2, v3, v4): estimating the reaction plane, applying acceptance corrections, computing the particle distribution with respect to the reaction plane, etc. If the data comes from a simulation code, the algorithms are much simpler, because the particle coordinate system is already aligned with the reaction plane and no further computations are needed.
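The difference between the two cases can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the toy sampler and the input values (v2 = 0.1, a plane angle of 0.7 rad) are assumptions chosen for illustration. For simulated-style data the reaction plane is taken at angle zero, while for laboratory-style data the plane is first estimated from the particles with the standard event-plane formula.

```python
import math
import random


def flow_coefficient(phis, n, psi=0.0):
    """Return v_n = <cos(n * (phi - psi))> averaged over the given angles."""
    return sum(math.cos(n * (phi - psi)) for phi in phis) / len(phis)


def event_plane_angle(phis, n):
    """Estimate the n-th order event plane angle from the particles themselves."""
    qx = sum(math.cos(n * phi) for phi in phis)
    qy = sum(math.sin(n * phi) for phi in phis)
    return math.atan2(qy, qx) / n


def sample_angles(count, v2, psi, rng):
    """Toy generator: draw angles from 1 + 2*v2*cos(2*(phi - psi)) by rejection."""
    angles = []
    peak = 1.0 + 2.0 * abs(v2)
    while len(angles) < count:
        phi = rng.uniform(-math.pi, math.pi)
        density = 1.0 + 2.0 * v2 * math.cos(2.0 * (phi - psi))
        if rng.uniform(0.0, peak) <= density:
            angles.append(phi)
    return angles


rng = random.Random(12345)

# Simulated-style data: the reaction plane lies along the x axis (psi = 0).
sim = sample_angles(20000, v2=0.1, psi=0.0, rng=rng)
v2_sim = flow_coefficient(sim, n=2)

# Laboratory-style data: the plane orientation is unknown and estimated first.
lab = sample_angles(20000, v2=0.1, psi=0.7, rng=rng)
psi_est = event_plane_angle(lab, n=2)
v2_lab = flow_coefficient(lab, n=2, psi=psi_est)
```

The sketch omits acceptance corrections, event-plane resolution corrections and the removal of autocorrelations, all of which a real analysis of experimental data would have to include.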

A reasonable first implementation of the algorithms on GPU is therefore one that processes simulated data. The algorithms will then be developed further and modified to support the analysis of experimental data.

In the development of the algorithms on GPU, the programming and execution model is SIMD (Single Instruction, Multiple Data), which CUDA realizes as SIMT (Single Instruction, Multiple Threads). This means that a single function, running in many threads (instances), processes the data on the GPU at a given time. In CUDA terminology, this function is called a kernel.

A first approach is to implement the simplest possible case on GPU: the data (simulated or experimental) is loaded into host RAM (Random Access Memory) and then copied into device RAM (DRAM), after which the CUDA kernel is launched to process it. Once all the data is processed, the final stage is to copy the results back into host RAM.
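The data-parallel structure of such a kernel can be illustrated on the CPU. The following is a plain Python emulation, not CUDA code: each emulated "thread" processes a strided subset of the particles and writes one partial sum, and the partial sums are then combined on the host. The names `kernel_partial_sum` and `launch`, the thread count and the toy input are hypothetical and serve only to mirror the copy-in, launch, copy-out sequence described above.

```python
import math


def kernel_partial_sum(thread_id, num_threads, phis, n, partial):
    """Work of one emulated thread: sum cos(n * phi) over a strided subset."""
    acc = 0.0
    for i in range(thread_id, len(phis), num_threads):
        acc += math.cos(n * phis[i])
    partial[thread_id] = acc


def launch(phis, n, num_threads=64):
    """Emulate the three steps: copy in, run all threads, copy the result out."""
    device_phis = list(phis)           # stands in for the host-to-device copy
    partial = [0.0] * num_threads      # stands in for a device output buffer
    for tid in range(num_threads):     # on a GPU these would run in parallel
        kernel_partial_sum(tid, num_threads, device_phis, n, partial)
    host_partial = list(partial)       # stands in for the device-to-host copy
    return sum(host_partial) / len(device_phis)


# Deterministic toy angles; any list of azimuthal angles would do.
phis = [0.3 * math.sin(k) for k in range(1000)]
v2_parallel = launch(phis, n=2)
v2_serial = sum(math.cos(2 * phi) for phi in phis) / len(phis)
```

Because the work of each thread is independent, the decomposition gives the same result as the serial loop up to floating-point rounding, which is what makes this kind of computation a natural candidate for a GPU kernel.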

This naive implementation may not bring a large performance gain, so several optimization techniques will be considered.

One of the most important performance challenges for CUDA developers is the optimization of memory accesses. Three memory spaces will be considered: global memory, where coalesced access patterns matter; on-chip shared memory; and page-locked (pinned) host memory, which speeds up transfers between host and device. The different memory spaces have different bandwidths and latencies, which affects the design, development and optimization of the algorithms and their performance on GPU.

**Milestone 3. Interfacing Grid with GPU**

This task aims to develop the software tools and to design the computer architecture needed to combine the power of the Grid (large storage capacity) with the power of GPUs for running parallel processes. Both computing architectures are very promising, but Grid computing in HEP, whether AliEn or gLite, has limitations for advanced parallel computing, and GPUs have limitations for storing large quantities of data. The present project proposes integrating GPUs into the Grid in order to combine the strengths of the two computing architectures for research in HEP, and in particular for the study of nuclear flow in nuclear collisions.

The working group of this project includes members who are experts in handling large quantities of data on the Grid and in managing large numbers of Grid jobs for the ALICE, H1 and ATLAS Collaborations. We intend to use both the local and the distributed storage capacity of the Grid.

Two major tasks must be completed to achieve this goal. The first is to implement in CUDA the process of storing large numbers of big files and transferring them to the Grid. The second is to develop a framework, based on a classical programming language, for managing the big files stored on the local disk and on the Grid, and for managing the transfer process on the Grid. This is not at all a trivial task, and the LHC collaborations have invested considerable time and effort in it. To date, no such integration of GPUs into the Grid has been carried out, and much work will be needed to develop the CUDA API for use on the Grid.

This task is split into three subtasks:

• Configuring GPU servers to work with the gLite middleware

• Developing software tools for managing the jobs running on GPU and the files stored on disk

• Developing software tools for storing files in gLite and retrieving them from it.

**Milestone 4. Benchmarking and comparison study**

A fair evaluation of the obtained results requires testing and benchmarking.

The methods and algorithms will be tested against data obtained from UrQMD simulations and against experimental data from international experiments at RHIC: BRAHMS and STAR. To measure the performance on GPU, we will measure the execution time of the algorithms for different nuclear collisions (gold-gold and proton-proton) and compare it across architectures: CPU vs. GPU.
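A timing comparison of this kind could be organized as in the following Python sketch. It is only an illustration under stated assumptions: `best_time`, `v2_reference` and `v2_candidate` are hypothetical names, both functions run on the CPU, and the second one merely stands in for a GPU implementation that does not exist here.

```python
import math
import time


def best_time(func, data, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(data)
        best = min(best, time.perf_counter() - start)
    return best, result


def v2_reference(phis):
    """Reference implementation: explicit accumulation loop."""
    total = 0.0
    for phi in phis:
        total += math.cos(2.0 * phi)
    return total / len(phis)


def v2_candidate(phis):
    """Candidate implementation; here only a stand-in for a GPU version."""
    return math.fsum(math.cos(2.0 * phi) for phi in phis) / len(phis)


phis = [0.001 * k for k in range(50000)]
t_ref, r_ref = best_time(v2_reference, phis)
t_new, r_new = best_time(v2_candidate, phis)
speedup = t_ref / t_new  # values above 1 mean the candidate is faster
```

Taking the best of several repeats reduces the influence of system noise, and comparing the two results before comparing the times guards against benchmarking an implementation that computes something different.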

Tests of the validity of the developed algorithms will include histograms and plots of the parameters of interest. These are needed for comparison with results already obtained in the field of heavy-ion physics. Examples of such histograms include: flow parameters vs. transverse momentum v1,2(pt), flow parameters vs. rapidity v1,2(y), and particle multiplicity (protons, pions) vs. transverse momentum.
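A histogram such as v2(pt) amounts to averaging the flow term within transverse-momentum bins. The Python sketch below shows one way to do this; the function name `profile_vn`, the bin edges and the four toy particles are made up for illustration and are not taken from any data set.

```python
import math


def profile_vn(pts, phis, n, edges, psi=0.0):
    """Return per-bin <cos(n*(phi - psi))> and counts for bins [edges[i], edges[i+1])."""
    nbins = len(edges) - 1
    sums = [0.0] * nbins
    counts = [0] * nbins
    for pt, phi in zip(pts, phis):
        for b in range(nbins):
            if edges[b] <= pt < edges[b + 1]:
                sums[b] += math.cos(n * (phi - psi))
                counts[b] += 1
                break
    # Empty bins are reported as NaN rather than as zero flow.
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return means, counts


# Toy input: four particles with made-up (pt, phi) values and two pt bins.
pts = [0.2, 0.4, 1.2, 1.6]
phis = [0.0, math.pi / 2, 0.0, math.pi]
means, counts = profile_vn(pts, phis, n=2, edges=[0.0, 1.0, 2.0])
```

The same function applies to rapidity by passing rapidity values and rapidity bin edges in place of the transverse-momentum ones.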

Although the tests and comparisons will use simulated data and experimental data from past experiments, the results may offer insights and provide basic building blocks for the analysis of nuclear flow in present and future experiments, such as ALICE and CBM.