Big data analytics as a service infrastructure: challenges, desired properties and solutions [Version 0]
12 March 2016 | By
CERN’s accelerator complex generates a very large amount of data. A large volumen of heterogeneous data is constantly generated from control equipment and monitoring agents. These data must be stored and analysed. Over the decades, CERN’s researching and engineering teams have applied different approaches, techniques and technologies for this purpose. This situation has minimised the necessary collaboration and, more relevantly, the cross data analytics over different domains. These two factors are essential to unlock hidden insights and correlations between the underlying processes, which enable better and more efficient daily-based accelerator operations and more informed decisions. The proposed Big Data Analytics as a Service Infrastructure aims to: (1) integrate the existing developments; (2) centralise and standardise the complex data analytics needs for CERN’s research and engineering community; (3) deliver real-time, batch data analytics and information discovery capabilities; and (4) provide transparent access and Extract, Transform and Load (ETL), mechanisms to the various and mission-critical existing data repositories. This paper presents the desired objectives and properties resulting from the analysis of CERN’s data analytics requirements; the main challenges: technological, collaborative and educational and; potential solutions.