Computing

Data acquisition, processing, archiving and access

The CTA Observatory (CTAO) would not work without the software, powerful computers and vast amounts of storage required to combine and process what the individual telescopes see into usable scientific results. The challenge faced by the CTA Computing Department is to design and implement a system that supports everything from accepting observation proposals to scheduling observations, controlling the telescopes, processing and archiving the data at all levels, and disseminating data products and science tools to the public using open standards and FAIR (findability, accessibility, interoperability, and reusability) principles.

 

Because such a computing system does not exist as a stand-alone product, the work of the Computing Department covers all steps from architectural design to construction, validation, deployment and maintenance. The technical challenges and long lifetime of the Observatory will necessitate the development and adoption of new techniques and technologies to meet the scientific demands. Even once the system is built, long-term maintenance will not be simple: the software and hardware systems will need to be operated over the thirty-year lifetime of the Observatory, and the science data archive will continue to be operated for a further ten years after that. This means software systems engineering activities are as important as the code itself.

 

Because much of the software will be developed by diverse and distributed teams at CTA institutions and industrial partners, the department must coordinate and release a wide range of in-kind, in-house and externally developed contributions. Following the fundamental principle of “Several Sites – One Observatory”, the software will not be specific to a particular site but will form a distributed and centrally coordinated system.

The Computing Department will coordinate with multiple off-site data centre partners for its data processing and simulation needs and is directly responsible for the installation of the on-site data centres and control rooms at the two array sites. It will manage these activities centrally from the Science Data Management Centre (SDMC), which will be located in a new building complex on the DESY campus in Zeuthen, just outside Berlin.

»The Observatory will generate hundreds of petabytes (PB) of data in a year (~3 PB after compression).«

The responsibilities of the Computing Department’s software include:
  • Control of telescopes and other array elements on the sites, including the collection and monitoring of data
  • Handling of astronomical transient events
  • Long-term preservation of data products
  • Processing of data into a form suitable for scientific analysis by Observatory science users
  • Distribution and user support of high-quality scientific data products and software tools necessary for both data access and data analysis
  • Production of up-to-date simulated data required for data processing during all phases of the Observatory
  • Contributing to the preparation of early science operation and scientific validation of the Observatory
  • Managing the lifecycle of observation proposals (sketched below): from the web platform for their submission, through the optimal scheduling of observations for an accepted scientific programme, to the execution of the related observations and the processing and delivery of the data products to the science user
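
The proposal lifecycle described in the last item can be pictured as a simple state machine. The following is a minimal, purely illustrative sketch: the state names and allowed transitions are assumptions for illustration and are not taken from the actual CTAO proposal-handling design.

```python
from enum import Enum, auto

class ProposalState(Enum):
    """Hypothetical stages of an observation proposal (illustrative only)."""
    SUBMITTED = auto()   # received via the web platform
    ACCEPTED = auto()    # approved for the scientific programme
    SCHEDULED = auto()   # placed into the long-/mid-term schedule
    OBSERVED = auto()    # related observations executed on site
    PROCESSED = auto()   # data products produced by the pipelines
    DELIVERED = auto()   # data products made available to the science user

# Allowed forward transitions in this simplified model
TRANSITIONS = {
    ProposalState.SUBMITTED: {ProposalState.ACCEPTED},
    ProposalState.ACCEPTED:  {ProposalState.SCHEDULED},
    ProposalState.SCHEDULED: {ProposalState.OBSERVED},
    ProposalState.OBSERVED:  {ProposalState.PROCESSED},
    ProposalState.PROCESSED: {ProposalState.DELIVERED},
    ProposalState.DELIVERED: set(),
}

def advance(current: ProposalState, new: ProposalState) -> ProposalState:
    """Move a proposal to the next stage, rejecting invalid transitions."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.name} to {new.name}")
    return new
```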

 

The telescopes at the two CTA array sites will produce many hundreds of petabytes (PB) of data per year. After compression, a few PB per year will be written to the off-site data centres for processing and storage. In addition, a few tens of PB of simulated data will be produced and processed.
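
As a rough back-of-envelope illustration of what these volumes imply for sustained off-site transfer, the sketch below converts an assumed ~3 PB per year of compressed data (the figure quoted above) into an average data rate; the numbers are indicative only.

```python
# Back-of-envelope estimate: average rate needed to ship the compressed
# data volume off site. The 3 PB/year figure is taken from the text above;
# everything else is simple unit conversion.
PB = 1e15                      # bytes in a petabyte (SI)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

compressed_volume_per_year = 3 * PB            # ~3 PB/year after compression
avg_rate_bytes_per_s = compressed_volume_per_year / SECONDS_PER_YEAR
avg_rate_gbit_per_s = avg_rate_bytes_per_s * 8 / 1e9

print(f"Average sustained rate: {avg_rate_gbit_per_s:.2f} Gbit/s")
# -> roughly 0.8 Gbit/s averaged over a year, before accounting for
#    simulation data, reprocessing campaigns or peak observing nights.
```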

 

A schematic view of the control, data flows and computing environment is shown in the figure below.

»A workforce of a hundred software experts over five years is needed to build the software systems.«

The Computing Department is working to develop a package of hardware and software products to support this flow of data:

 

Software systems:
  • Array Control and Data Acquisition System (ACADA)
  • Data Processing and Preservation System (DPPS)
  • Science User Support System (SUSS)
  • Science Operations Support System (SOSS)

 

Hardware systems:
  • Off-Site Information and Communications Technology (Off-site ICT)
  • On-Site Information and Communications Technology (On-site ICT)
  • Array Clock System

 

Read more about these systems in the sections below.

Software Systems

 

The overview diagram below illustrates how the different systems interact with the primary processes behind the Observatory’s science operations: the submission and execution of a scientific proposal and the return of its processed data.

ACADA – Array Control and Data Acquisition System

 

The ACADA encompasses all of the software responsible for the supervision and control of telescopes and calibration instruments at both CTA array sites, including the efficient execution of scheduled and dynamically triggered observations. The system will manage the data acquisition and compression of the raw data, as well as the generation of automatic science alerts (an illustrative sketch of a candidate alert follows the list below). The ACADA also provides the user interface for the site operators and astronomers. Its sub-systems are as follows:

 

  • Resource Manager and Central Control
  • Human Machine Interface
  • Array Data Handler
  • Science Alert Generation Pipeline
  • Short-Term Scheduler
  • Transients Handler
  • Monitoring and Logging Systems
  • Array Alarm System
  • Array Configuration System
  • Reporting System
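
To make the alert-generation role concrete, the sketch below shows one way a candidate science alert produced on site might be represented before the Transients Handler acts on it. The field names and significance threshold are hypothetical and are not taken from the ACADA design.

```python
from dataclasses import dataclass

@dataclass
class ScienceAlertCandidate:
    """Hypothetical on-site science alert candidate (illustrative only)."""
    ra_deg: float          # right ascension of the candidate source
    dec_deg: float         # declination of the candidate source
    trigger_time_utc: str  # ISO-8601 timestamp of the detection
    significance: float    # detection significance in standard deviations

def worth_alerting(candidate: ScienceAlertCandidate,
                   threshold_sigma: float = 5.0) -> bool:
    """Assumed policy: only candidates above a significance threshold
    are forwarded for follow-up (the threshold value is illustrative)."""
    return candidate.significance >= threshold_sigma

# Example: a candidate just above the assumed threshold
alert = ScienceAlertCandidate(ra_deg=83.63, dec_deg=22.01,
                              trigger_time_utc="2030-01-01T03:21:09Z",
                              significance=5.4)
print(worth_alerting(alert))  # True
```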

 

DPPS – Data Processing and Preservation System

 

The main purpose of the DPPS is to transform the raw data products generated by the ACADA into low-level science data products suitable for science analysis, which are delivered to the SUSS for dissemination. It must ensure that all data products are preserved (replicated to at least two off-site data centres, as illustrated in the sketch after the list below), of traceable and reproducible provenance and of the highest scientific quality. The latter is achieved by planning for the periodic re-processing of all data using updated techniques. The DPPS also provides continuous monitoring and quality reporting for its sub-systems and produces high-level science quality metrics and reports related to the services provided. Finally, the DPPS provides the user interface to all of its sub-systems for specialists, such as the site operators and site astronomers. The DPPS will be implemented as a distributed system, deployed as a set of Data Processing and Preservation Nodes, which will run at the CTA-North and CTA-South data centres, at three CTA off-site data centres and at the SDMC Data Centre. The DPPS sub-systems are as follows:

 

  • Operations Management System
  • Computing Workload Management System
  • Bulk Archive and File Transfer Management System
  • Data Processing Pipeline System
  • Calibration Production Pipeline System
  • Data Quality Pipeline System
  • Simulation Production Pipeline System
  • Common Software Frameworks
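
The preservation requirement quoted above (every data product replicated to at least two off-site data centres) can be illustrated with a small audit over a file-to-replica mapping. This is a sketch only; the catalogue structure, file names and data-centre names are assumptions, not the actual Bulk Archive interface.

```python
# Sketch of a replication audit: flag any file with fewer than the
# required number of off-site replicas. The catalogue layout and site
# names are hypothetical.
REQUIRED_REPLICAS = 2

replica_catalogue = {
    "DL0/run00123.zst": {"offsite-dc-1", "offsite-dc-2"},
    "DL0/run00124.zst": {"offsite-dc-1"},                  # under-replicated
    "DL2/run00123.h5":  {"offsite-dc-2", "offsite-dc-3"},
}

def under_replicated(catalogue: dict[str, set[str]],
                     required: int = REQUIRED_REPLICAS) -> list[str]:
    """Return the files that do not yet meet the replication policy."""
    return [path for path, sites in catalogue.items() if len(sites) < required]

print(under_replicated(replica_catalogue))
# -> ['DL0/run00124.zst']
```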

 

SUSS and SOSS – Science User Support System and Science Operations Support System

 

The SUSS manages the software for the high-level science operations workflows, from proposals to data delivery and user support, and is the main access point for the exchange of science-related products with the science users. It also provides the software for observation planning with long-term and mid-term schedules, for the automatic generation and verification of high-level science data products, and for the Science Archive, the Science Analysis Tools and the Science Portal, through which the software applications, services, data and software products are made accessible. The SUSS sub-systems are as follows:

 

  • Proposal Handling
  • Long-Term and Mid-Term Scheduling
  • Automatic Data Product Preparation and Verification
  • Science Analysis Tools
  • Science Archive
  • Science Portal
  • Help Desk and User Support
  • Reporting/Diagnosis

 

The SOSS is a collection of software tools that support the systems involved in the science operations workflows, such as ACADA, DPPS and SUSS. It allows the respective systems to access and share science operations-related information and configurations. It includes the means to track the state of proposals and observations throughout their life cycle, as well as the state and science performance of the CTA Observatory throughout the science operations workflow.

»Approximately 2000 state-of-the-art computer cores are required to handle data online.«

Hardware Systems

 

ICT – Information and Communications Technology

 

The ICT work packages are responsible for the on-site data centres and networks, the data transfer to and from the off-site data centres and the data processing and storage at those centres.

 

Array Clock

 

The Array Clock is the central hardware and software that provides a distributed clock at each of the two sites and monitors the synchronization of the distributed timing system at the location of each array element. The precision clock is based on the White Rabbit standard.
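
As an illustration of what monitoring the synchronization might involve, the sketch below compares per-element clock offsets against a tolerance. The offsets, element names and the nanosecond tolerance are invented for the example and do not reflect actual Array Clock specifications.

```python
# Sketch: flag array elements whose measured clock offset from the
# central reference exceeds a tolerance. All numbers are illustrative.
TOLERANCE_NS = 2.0   # assumed tolerance, not a CTAO specification

measured_offsets_ns = {
    "LST-1": 0.3,
    "MST-5": -0.8,
    "SST-17": 3.1,   # out of tolerance in this made-up example
}

out_of_sync = {element: offset
               for element, offset in measured_offsets_ns.items()
               if abs(offset) > TOLERANCE_NS}

for element, offset in out_of_sync.items():
    print(f"WARNING: {element} clock offset {offset:+.1f} ns exceeds "
          f"{TOLERANCE_NS} ns tolerance")
```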

Organisation

 

For each of the systems described above, there is a work package that comprises all the personnel and activities needed for that system’s realization, integration and maintenance. The CTAO Computing Department is growing to meet the Observatory’s needs in constructing the CTA software products, establishing the necessary computing infrastructure for CTA’s data management, and organizing the support for observation planning, data processing and simulations, data archiving and science user support. The organigram of the Computing Department is shown here:

Computing Contacts:

CTA Computing Coordinator: Stefan Schlenstedt

ACADA Work Package Coordinator: Igor Oya

DPPS Work Package Coordinator: Karl Kosack

SUSS/SOSS Work Package Coordinator: Matthias Füßling

Off-site ICT Coordinator: Nadine Neyroud