Concept and objectives:
Project background and concepts
In this project we aim to create a library of methods and algorithms for several applications, built with new tools developed in GPU (Graphics Processing Unit) programming languages and connected to GRID technology through PKI (Public Key Infrastructure) certificates.
Logical Structure of the GUIDE project
The overall objective is outlined below.
GUIDE will involve and address the specific needs of a number of new multi-disciplinary scientific communities (space science, high energy physics, computational chemistry, IT security, etc.) and thus stimulate the use and expansion of the emerging national HPC infrastructure and its services.
GUIDE will capitalize on the existing human network and underlying research infrastructure to further strengthen scientific collaboration and to promote more effective high-quality research and cooperation among the participating research and SME (small and medium-sized enterprise) communities. This will be achieved through the inclusion of the new Virtual Science Communities and the setting up of the infrastructure, together with a set of coordinated actions aimed at establishing HPC (High Performance Computing) in Romania.
Objective 1 – Creating multi-disciplinary virtual organizations communities
A number of applications from diverse end-user communities are already running on the partner computing infrastructure. These applications are typically data-intensive but not massively parallel. The core objective of the GUIDE project is to engage multi-disciplinary research communities from the partners in close collaboration in a number of scientific fields with specific needs for massively parallel execution on powerful computing resources. The project aims to enable application porting and support for these major scientific fields on the partner HPC infrastructure. This strategy is expected to have a structuring effect on crucial partner communities, and also to pave the way for the involvement of new communities in HPC usage. The target applications come from fields typically in need of massive processing capabilities: space science, high energy physics, computational chemistry and IT security. Furthermore, a number of other disciplines will be supported. Opening up access to the partners' infrastructure for beneficiary partners who do not possess an HPC facility is seen as crucial for enabling new large-scale science and research competitiveness. The metric for this sub-objective is the number of applications deployed: 4 applications (methods, algorithms), to be achieved by the end of the project.
Overall, this project objective relates strongly to the following objective of the topic of the call:
Deployment of GRID and GPU in research communities in order to enable multidisciplinary collaboration and address specific needs.
Objective 2 – Deploying integrated infrastructure for virtual research communities
GUIDE will provide and operate the integrated GRID and GPU infrastructure, and specifically the HPC eInfrastructure, for partners. In the project context this focuses on operating the HPC infrastructure and specific end-user services for the benefit of new user communities.
The aim in this context is to form the partnership infrastructure, primarily consisting of the current resources of ISS (Institute of Space Science), the coordinator (CO), and to support the integration of upcoming large HPC procurements at TRANS SPED Ltd, partner 1 (P1), into the partnership infrastructure. The purpose of the infrastructure is to serve the new user communities, which typically do not have easy access to eInfrastructure facilities. Moreover, this objective includes the effective integration of the partnership end-to-end eInfrastructure. (e-Infrastructure is the term used for the technology and organisations that support research undertaken in this way. It embraces networks, grids, data centers and collaborative environments, and can include supporting operations centers, service registries, single sign-on, training and help-desk services. Most importantly, it is the integration of these that defines e-infrastructure.) Special attention will be paid to integrating and automating management tools so as to minimise the human overhead and enable transparent management of the partnership resources. The know-how regarding operations will be shared with other communities.
Overall, this project objective addresses the objectives of the topic of the call:
– Deploying end-to-end eInfrastructure services and tools, in support of virtual organisations in order to integrate and increase their capacities.
– Building virtual research facilities by coalition of existing resources (in project context this refers to network, High Performance Computing Resources and the user layer), in order to augment the capacities of research communities for experimentation.
State of the art:
GRID activities in Europe are reaching maturity levels that could not have been imagined only half a decade ago.
The national R&D institutes and public institutions are interconnected by a powerful backbone of up to 10 GB/s, which links research and educational institutions and forms RoEDU Net, part of the pan-European GÉANT network.
Grid activities in Romania have passed through the early phases of proof-of-principle demonstrations and the setting up of production Grid operations, and are currently maturing towards a longer-term, sustainable pan-European model of hierarchical Grid organisation (European Grid Initiative, EGI) mirroring that of GÉANT in the field of research networking.
The data-intensive user communities, as well as those requiring high throughput, are catered for in the Grid, while applications requiring massively parallel execution are not fully catered for and therefore need High-Performance Computing platforms (especially GPUs, Graphics Processing Units).
Since 2006 a new technology has been emerging based on the Graphics Processing Unit (GPU). That year NVIDIA launched an Application Programming Interface (API) named Compute Unified Device Architecture (CUDA) for NVIDIA graphics cards. One year later, in 2007, an open project began to emerge: Open Computing Language (OpenCL), which can be used on different CPU and GPU computing platforms (NVIDIA, ATI, IBM, etc.). The main advantage of GPU programming over traditional CPU programming is that graphics cards deliver a lot of computing power at a very low price. Today a large number of applications (scientific, financial, etc.) are being ported to or developed for GPUs, including Monte Carlo and data analysis tools for High Energy Physics, Computational Materials Science, Space Imagery and Information Security.
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs and other processors, giving software developers portable and efficient access to the power of those heterogeneous processing platforms. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism.
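The data-parallel model described above can be illustrated with a minimal sketch. It is a pure-Python emulation, not the OpenCL API: the names vector_add_kernel and launch are hypothetical, and a thread pool stands in for the device. The point it shows is that a kernel is written for a single work-item and then executed once per index of a global index space.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(global_id, a, b, out):
    """Work done by ONE work-item: it handles exactly one element."""
    out[global_id] = a[global_id] + b[global_id]


def launch(kernel, global_size, *args, workers=4):
    """Hypothetical launcher: run `kernel` once per index in range(global_size).

    A thread pool stands in for the compute device in this emulation.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any work-item exception.
        list(pool.map(lambda gid: kernel(gid, *args), range(global_size)))


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
launch(vector_add_kernel, len(a), a, b, out)
```

Because every work-item writes to a different element of out, no synchronisation between work-items is needed, which is the property that lets such kernels scale across many GPU cores.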
OpenCL gives any application access to the graphics processing unit for non-graphical computing and is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL is managed by the non-profit technology consortium Khronos Group.
Modern processor architectures have embraced parallelism as an important pathway to increased performance. Facing technical challenges with higher clock speeds in a fixed power envelope, Central Processing Units (CPUs) now improve performance by adding multiple cores. Graphics Processing Units (GPUs) have also evolved from fixed-function rendering devices into programmable parallel processors.
As today’s computer systems often include highly parallel CPUs, GPUs and other types of processors, it is important to enable software developers to take full advantage of these heterogeneous processing platforms.
For this reason, with the help of several partners, the Khronos Group has released OpenCL.
OpenCL was initially developed by Apple Inc., which holds the trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Intel and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008 the Khronos Compute Working Group was formed with representatives from CPU, GPU, embedded-processor and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008. This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.
The advantage of high performance computing with GPUs is that one can obtain the same computing power as a regular CPU cluster at a lower cost and with lower power consumption. The most important milestones in the development of these algorithms will be identifying the most effective way to parallelize a sequential CPU algorithm, and running benchmark tests on three different computing architectures: one based on classical CPUs, one based on GPUs, and a hybrid one.
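As an illustration of the benchmarking step, the sketch below times a sequential routine against a chunked, parallelizable version of the same routine and checks that both return the same result. It is only a sketch under stated assumptions: the workload (a sum of squares), the function names and the thread-based "chunked" backend are hypothetical stand-ins, and a real CPU, GPU or hybrid implementation would be registered in the same backends dictionary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sum_squares_sequential(data):
    """Reference sequential CPU implementation."""
    total = 0
    for x in data:
        total += x * x
    return total


def sum_squares_chunked(data, chunks=4):
    """Same computation split into independent chunks (stand-in for a parallel backend)."""
    size = max(1, (len(data) + chunks - 1) // chunks)
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        return sum(pool.map(sum_squares_sequential, parts))


def benchmark(backends, data, repeats=3):
    """Return {name: (result, best_time_in_seconds)} for each backend."""
    results = {}
    for name, fn in backends.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            value = fn(data)
            best = min(best, time.perf_counter() - start)
        results[name] = (value, best)
    return results


data = list(range(10_000))
results = benchmark(
    {"sequential": sum_squares_sequential, "chunked": sum_squares_chunked},
    data,
)
```

Comparing the returned values before comparing the timings guards against a fast but incorrect port, which is the usual first check when a sequential algorithm is parallelized.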
GUIDE will build on the state of the art in networking and Grid developments in Romania, using an equivalent approach to support national inclusion into the pan-European HPC ecosystem and to empower the regional user communities in the use of HPC.
GUIDE will go beyond the current state-of-the-art in national eInfrastructure development and use by building a national HPC facility, empowering the users in the region to use the facility as well as interface to pan-European user communities, and helping form stable, sustainable private-academic GRID and HPC initiatives in Romania, with the final result being the provision of fully integrated, state-of-the-art eInfrastructure services on the private-academic level.
Furthermore, GUIDE will enable HPC adoption in crucial strata of research communities in Romania, especially in the context of the specific applications, thus creating a culture of cooperation among researchers and SMEs across different fields and allowing coordination of high-quality research in target research fields which benefit from the use of HPC infrastructure.
Specifically on the user-community level, the project will have a structuring effect above all on the computational physics, computational chemistry, and life sciences fields in the region, fostering data sharing, common application deployment, and exploitation of scientific results. Currently, although a number of scientific collaborations exist in these target fields, the actual deployment of scientific services and target applications on the available HPC is very limited.
In the following paragraphs we briefly present the issues of coordination of high-quality research in the context of specific new user communities and added value gained from their structuring on the national level and their usage of the eInfrastructure.
GPU technologies (CUDA and OpenCL) implemented at the Institute for Space Sciences will be used in a number of ways. First, geometric modeling of natural surfaces (described by fractal geometry) will be used to evaluate the field of electromagnetic waves scattered from these surfaces. The second application field will be the study of N-body problems in nuclear physics at low and high energies and in astrophysics. The third will be Monte Carlo simulations of nucleus-nucleus interactions in high energy physics, and the fourth will be running parallel computing algorithms for astrophysics, cosmology and High Energy Physics.
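To indicate why N-body problems map well onto GPUs, the following minimal sketch computes gravitational accelerations by direct summation. It is an assumption-laden illustration rather than the project's code: the function name, the choice of units (G = 1) and the softening term are hypothetical, and only the astrophysical (gravitational) case is shown. The outer loop over bodies has no dependencies between iterations, so on a GPU each iteration could be assigned to its own thread.

```python
import math


def accelerations(positions, masses, G=1.0, softening=1e-9):
    """Direct-sum gravitational acceleration on every body, O(N^2) pair interactions.

    positions: list of (x, y, z) tuples; masses: list of floats.
    `softening` is added to r^2 to avoid division by zero for coincident bodies.
    """
    n = len(positions)
    acc = []
    for i in range(n):  # independent per body: one GPU thread per i
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
            az += G * masses[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc


# Two unit masses one unit apart attract each other with equal and opposite acceleration.
two_body = accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], softening=0.0)
```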
In the context of the astrophysical community, the main aim will be to turn the existing occasional research contacts into a full collaborative research effort within the international collaboration (ESA), and to establish a community which will exchange people (both students and experienced researchers) and know-how, share resources that already exist or will be deployed during the project, and work together on implementing and using the HPC algorithms and codes developed. Partnership on the efficient implementation, deployment and porting of widely used HPC libraries will also be one of the important aims of the high energy physics community.
Huge quantities of satellite images are available from various Earth Observation sites. These archives enable the creation of Satellite Image Time Series (SITS), each of which is a set of satellite images describing the same scene at different acquisition times. Such SITS are large, complex data sets, embedding spatial, spectral and temporal information. Developing effective methodologies for the analysis of SITS is a challenging issue.
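The structure of a SITS can be made concrete with a small sketch that turns a stack of co-registered images into one temporal sequence per pixel. The representation (each image as a list of rows of symbols) and the function name pixel_sequences are assumptions made for illustration only.

```python
def pixel_sequences(sits):
    """Map each pixel (row, col) to its sequence of values over acquisition times.

    sits: list of T images of identical size, each a list of rows.
    Returns {(row, col): (value_at_t0, value_at_t1, ...)}.
    """
    rows, cols = len(sits[0]), len(sits[0][0])
    for image in sits:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all images in a SITS must have the same size")
    return {
        (r, c): tuple(image[r][c] for image in sits)
        for r in range(rows)
        for c in range(cols)
    }


# Three acquisitions of the same 2 x 2 scene (values are arbitrary symbols).
sits = [
    [[1, 1], [2, 1]],
    [[2, 3], [2, 2]],
    [[3, 3], [1, 2]],
]
sequences = pixel_sequences(sits)
```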
Recent technological developments have made possible a continuing rise in the temporal, spatial and spectral resolutions of acquired satellite data. Hence, in addition to the increasing accuracy with which objects and phenomena on the ground are described, the volume of satellite data is also increasing. This requires automatic data processing to extract useful information from SITS. Such data can be investigated using computing techniques such as data mining, which discover relationships and regularities not explicitly contained in the data by extracting spatial, temporal and spatio-temporal patterns. These patterns reveal an “organization” of variables in the spatial, temporal and spectral domains that results from the structure and evolution of terrestrial coverage, objects or phenomena.
To allow a complete and systematic exploitation of the huge mass of data in satellite data archives, it is necessary to develop specific processing techniques and to take advantage of the facilities of modern computing systems adapted to the analysis of large amounts of data.
For the analysis of SITS, we developed an original data mining approach that characterizes pixel evolutions and sub-evolutions by extracting grouped frequent sequential patterns (GFSP) from the sequences generated by a SITS; it is implemented in the SPATPAM prototype. Generally, this kind of analysis is characterized by a large amount of input data, a search space that grows exponentially with the number of acquisition moments, and a number of solutions that exceeds the typical human ability to interpret them. In order to reduce the search and solution spaces, we introduced anti-monotone constraints, such as frequency and pixel connectivity, to extract GFSP that cover a minimum surface and affect pixels that are sufficiently connected.
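The two constraints can be sketched as follows. This is one plausible reading, not the SPATPAM implementation: we assume that a pattern covers a pixel when it occurs as a (not necessarily contiguous) subsequence of that pixel's sequence, that the surface is the number of covered pixels, and that connectivity is the average number of covered pixels among the 8 neighbours of each covered pixel. All function names and thresholds are hypothetical.

```python
def occurs(pattern, sequence):
    """True if `pattern` is a subsequence of `sequence` (order kept, gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)


def covered_pixels(pattern, sequences):
    """Set of pixel coordinates whose temporal sequence contains the pattern."""
    return {pixel for pixel, seq in sequences.items() if occurs(pattern, seq)}


def average_connectivity(pixels):
    """Average number of covered 8-neighbours per covered pixel (assumed measure)."""
    if not pixels:
        return 0.0
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    total = sum((r + dr, c + dc) in pixels for r, c in pixels for dr, dc in offsets)
    return total / len(pixels)


def satisfies_constraints(pattern, sequences, min_surface, min_connectivity):
    """Keep a pattern only if it covers enough, sufficiently connected, pixels."""
    pixels = covered_pixels(pattern, sequences)
    return len(pixels) >= min_surface and average_connectivity(pixels) >= min_connectivity


# Per-pixel sequences of a 2 x 2 scene observed at three acquisition times.
sequences = {
    (0, 0): (1, 2, 3),
    (0, 1): (1, 3, 3),
    (1, 0): (2, 2, 1),
    (1, 1): (1, 2, 2),
}
short_cover = covered_pixels((1,), sequences)
long_cover = covered_pixels((1, 3), sequences)
```

In this example the longer pattern (1, 3) covers a subset of the pixels covered by its prefix (1,). This is the anti-monotone behaviour of the frequency constraint: once a pattern fails the minimum surface, none of its extensions needs to be explored.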
Within this project, we propose to improve the way SITS data are adapted to the input format of the data mining pattern extraction algorithm, making the preprocessing stage of our approach more efficient. To speed up this stage, we propose parallel processing, dividing the input satellite images into several mini-images. We will use PROOF, the parallel processing framework of the ROOT system, which was developed for High Energy Physics applications and is able to handle and analyze large amounts of data. Furthermore, we intend to modify the preprocessing stage to replace the former input format of the data mining algorithm (an ASCII file, whose processing was time- and memory-consuming) by storing the sequence database of the SITS directly in memory.
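The mini-image decomposition can be sketched as below. Note that this sketch does not use PROOF or ROOT: a Python thread pool is swapped in purely to show the split, process and reassemble pattern, and the per-tile step (quantizing pixel values into a few symbols) is a hypothetical placeholder for the real preprocessing. The result is kept in memory rather than written to an intermediate file.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_rows, tile_cols):
    """Cut an image (list of rows) into mini-images, remembering each tile's origin."""
    tiles = []
    for r0 in range(0, len(image), tile_rows):
        for c0 in range(0, len(image[0]), tile_cols):
            tile = [row[c0:c0 + tile_cols] for row in image[r0:r0 + tile_rows]]
            tiles.append((r0, c0, tile))
    return tiles


def quantize_tile(item, levels=4, max_value=255):
    """Placeholder per-tile step: map each pixel value to one of `levels` symbols."""
    r0, c0, tile = item
    step = (max_value + 1) / levels
    return r0, c0, [[int(v // step) for v in row] for row in tile]


def preprocess(image, tile_rows, tile_cols, workers=4):
    """Process all mini-images in parallel and reassemble the result in memory."""
    out = [[None] * len(image[0]) for _ in image]
    tiles = split_into_tiles(image, tile_rows, tile_cols)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for r0, c0, tile in pool.map(quantize_tile, tiles):
            for dr, row in enumerate(tile):
                out[r0 + dr][c0:c0 + len(row)] = row
    return out


image = [
    [0, 64, 128, 255],
    [10, 70, 130, 200],
    [63, 127, 191, 254],
]
symbols = preprocess(image, tile_rows=2, tile_cols=3)
```

Each tile carries its own origin, so tiles can be processed in any order and on any worker, and image sizes that are not multiples of the tile size are handled by the shorter edge tiles.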
P1 (TRANS SPED)
Trans Sped is the official partner in Romania of major suppliers of products and services in the field of information technology. The quality, feasibility, international acknowledgement and reputation of such products and services come together under the Trans Sped name, the only local supplier of such solutions. Trans Sped intends to develop, in partnership, applications to test the vulnerabilities of PKI (Public Key Infrastructure).
TRANS SPED is an accredited certification authority, issuing digital certificates for qualified electronic signatures in accordance with Law 455/2001, GD 1259/2001 and the European Union Directive 1999/93/EC.
Trans Sped is the most experienced provider of security solutions based on qualified digital certificates, working on the local market since 2004. Trans Sped services are offered in collaboration with the German company TC TrustCenter (www.trustcenter.de), a global leader in providing certification services. This gives qualified certificates issued by Trans Sped international recognition and guarantees interoperability with various types of applications.
Trans Sped has agreements with the U.S. company ChosenSecurity Inc. and with leading manufacturers worldwide (E-Lock, Gemalto, Verizon), with whose support we can offer quality products and services internationally through local expertise.
Trans Sped provides solutions for integrating digital certificates into local networks, web applications, Microsoft Exchange / Lotus Notes and VPN networks, as well as a wide range of applications: from a desktop software wizard that allows electronic signatures to be generated and verified, through signing and encryption software for documents of any format, up to complex software solutions and services for integrating electronic signatures into applications of any kind.
Trans Sped carried out the most important implementation of certification solutions by a Romanian organization in the competitive U.S. market: Trans Sped certificates became part of the applications offered by Science Applications International Corporation, SAIC (www.saic.com).
A major project developed by the company serves the biopharmaceutical and medical industry. Together with ChosenSecurity, it was selected by SAIC (Science Applications International Corporation) to provide qualified certificates to the largest drug manufacturers, members of the SAFE-BioPharma Association. SAFE-BioPharma creates and manages virtual-environment identity and electronic signature standards for the pharmaceutical and medical industry (www.safe-biopharma.org). Certificates issued under this project are used for signing laboratory and test results or medical records.
In recognition of the quality of Trans Sped's services and technologies, its authority was cross-certified with the U.S. Federal Bridge, which allows the electronic signature certificates it issues to be accepted in dealings with all U.S. government institutions.
As seen in Figure 2, a group of four bridges forms a certification trust network:
- Federal Bridge of U.S. – includes all federal agencies:
  - Department of State
  - State of Illinois
  - ACES / Digital Signature Trust
  - U.S. Patent & Trademark Office
  - Wells Fargo
  - Government Printing Office
  - Department of Justice
  - United States Postal Service
  - Department of Defense
- CertiPath (aerospace and defense industry) – includes:
  - Northrop Grumman
  - Lockheed Martin
- HEBCA – Higher Education Bridge CA
- SAFE Bio-Pharma (which includes the Trans Sped authority) – the biopharmaceutical and medical industry
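The practical meaning of such a trust network can be illustrated as a reachability question on a graph: a certificate is accepted by a relying party if a chain of trust links connects the issuing authority to that party. The sketch below is a simplification under stated assumptions. It models only the links named in the text above, treats every link as bidirectional, and ignores certificate policies and path validation rules; links between the bridges themselves are deliberately not assumed. The names TRUST_LINKS and find_trust_path are hypothetical.

```python
from collections import deque

# Links taken from the text above; treating them as bidirectional is an assumption.
TRUST_LINKS = [
    ("Trans Sped", "SAFE Bio-Pharma"),
    ("Trans Sped", "Federal Bridge"),
    ("Federal Bridge", "Department of State"),
    ("Federal Bridge", "Department of Justice"),
    ("CertiPath", "Northrop Grumman"),
    ("CertiPath", "Lockheed Martin"),
]


def find_trust_path(links, issuer, relying_party):
    """Breadth-first search for the shortest chain of trust links, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[issuer]])
    seen = {issuer}
    while queue:
        path = queue.popleft()
        if path[-1] == relying_party:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path_to_doj = find_trust_path(TRUST_LINKS, "Trans Sped", "Department of Justice")
path_to_lockheed = find_trust_path(TRUST_LINKS, "Trans Sped", "Lockheed Martin")
```

With only the links listed here, a path exists from Trans Sped to a Federal Bridge member, while no path to a CertiPath member is found because no bridge-to-bridge link has been entered.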
To summarise, integrated state-of-the-art services will be provided on the scientific level by integrating the HPC infrastructure, the underlying network, middleware, and end-user applications software, to provide a seamless environment to research and private communities.
On the other hand, coordination of high-quality research in target research fields which benefit from eInfrastructure use will be carried out by bringing together a critical mass of researchers from diverse fields and providing them with access to state-of-the-art integrated infrastructure services.
The proposal writers have been active in the development of GPU and GRID technologies, and one of the partners (P1) is an SME.
Aligning the project activities with ESA programs:
The present project proposes to develop a new data centre facility by integrating the two computing technologies, GPU and GRID, at ISS.
The EUCLID mission could take advantage of the new facility for space imagery, high performance data processing and data storage. ISS is a member of the EUCLID mission.
The project is also in accordance with the Space Situational Awareness (SSA) Preparatory Programme to “develop pilot data centers to prepare for the provision of future services in different areas …” (http://www.esa.int/esaMI/Operations/SEMFSG6EJLF_0_iv.html). The new facility will take advantage of the two technologies: gridification of jobs in GRID and parallelization with GPU.