ICOS

ICOS MetaOS. Yes but, what is a MetaOS?

by UPC | January 2025

Operating Systems (OS) were conceived in the mid-1960s and were originally designed to address two main challenges: providing transparent, efficient, robust and secure resource management, and making computers easier to use while enabling the portability of programs. Before the advent of OS, computers could also run programs, but all resource management and fault tolerance tasks had to be programmed specifically and included as part of the code itself, which placed a huge and complex burden on programmers and severely limited the portability of programs.

By analogy, a Meta-Operating System (MetaOS) for the continuum can be defined as a system of OS across the continuum. As with an OS, the goal of a MetaOS is to provide transparent, efficient, robust and secure management, facilitating the use of the continuum while enabling portability (as well as interoperability in this case). The cloud continuum is already widely used today; however, the programmer (or the operator) must build ad hoc solutions from a plethora of technologies that are not always compatible or easy to interconnect.

In this blog entry, we describe the complexity of cloud continuum management and present ICOS, an intelligent and secure MetaOS solution for the continuum.

The complexity of the cloud continuum

A typical cloud continuum scenario is illustrated in Fig. 1. It includes a number of cloud computing facilities, either public or private, and a varying number of devices at the edge, with more (servers, datacenters) or less (low-power devices) computing capacity. Cloud computing facilities provide virtually unlimited and ubiquitous computing and storage capacity, but their remoteness becomes a challenge: user data must be transported to the cloud and, therefore, applications incur the high latency inherent to distance, while users give up some of their privacy. Alternatively, devices at the edge (for instance, local datacenters, servers, or even nearby computing devices such as smartphones, in-car computers, or embedded computing facilities around the city) provide limited computing capacity, but their proximity reduces network data transfer and latency while preserving data privacy.

A balanced combination of the capabilities of both technologies, cloud and edge computing, according to the specific requirements of the workloads considered, is essential for an efficient and effective management of the continuum. But this is not an easy task: several critical and complex challenges must be addressed. For instance, and just to name a few:

  • The continuum is made up of a large number of heterogeneous computing devices and systems, some of which are mobile, and some of which are dynamically joining and leaving the system. This requires a sophisticated resource management mechanism, providing discovery, as well as onboarding and disconnecting capabilities. It also requires maintaining up-to-date information about the resource catalog, their technological characteristics, their interconnection topology, and their real-time availability.
  • Containerization is the trending technology that allows an application to run on different nodes with different architectures and operating systems. However, there are different platforms for managing containers at runtime, with Kubernetes and Docker being the most popular, but not the only existing technologies. Transparent application execution across the continuum requires that one application be executable on any node, regardless of the containerization technology on that node.
  • This problem becomes more challenging when dealing with multicomponent applications, where different components of the same application might eventually be allocated on different nodes with different technologies. Each component should have the option to be offloaded to any node, and all components should be able to communicate and coordinate transparently, regardless of the local host technology.
  • Data can be produced, consumed, and stored, at any node in the continuum. Furthermore, data will be live, so they could be transferred, transformed and/or replicated along the continuum, as long as privacy constraints are guaranteed. Effective data management mechanisms must be provided to keep track of the data assets, implement effective and efficient data transfer operations, and provide users with easy and transparent data access interfaces.
  • In such a complex and heterogeneous scenario, deciding the optimal workload placement (scheduling) along the continuum is a major challenge. Solutions need to balance the advantages of the cloud (more capabilities) and the edge (closer data), consider the interrelationship between the different application components, and consider the flow of data between the distributed nodes during runtime. Furthermore, solutions should be dynamic (rescheduling) in order to react in case some execution issue limits the expected performance.
  • In the continuum there are a large number of devices generating huge amounts of data, and a large number of decisions must constantly be made. This is a perfect scenario in which intelligence can be generated and then used to support the multiple decision-making processes along the continuum. Therefore, a seamless integration of an intelligence layer in the cloud continuum is of utmost importance.
  • And finally, an open, dynamic, mobile, and constantly changing environment like the continuum opens the door to a huge variety of unforeseen security risks. A thorough security architecture design, seamlessly integrated with the continuum, will be critical for the success of a MetaOS in the continuum.
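To make the placement challenge more concrete, the following minimal sketch shows one possible way to choose a node for an application component by trading off computing capacity against proximity to the data. It is only an illustration under stated assumptions: the `Node` and `Component` classes, their fields, the example values, and the `place` function are all hypothetical and are not taken from ICOS.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float         # free CPU cores (hypothetical unit)
    latency_ms: float       # latency from the data source to this node
    available: bool = True  # real-time availability, as a resource catalog might track it


@dataclass
class Component:
    name: str
    cpu_needed: float
    max_latency_ms: float   # highest latency the component can tolerate


def place(component, nodes):
    """Return the feasible node with the lowest latency, or None if there is none."""
    feasible = [
        n for n in nodes
        if n.available
        and n.cpu_free >= component.cpu_needed
        and n.latency_ms <= component.max_latency_ms
    ]
    if not feasible:
        return None
    # Prefer nodes close to the data; break ties by the most free CPU.
    return min(feasible, key=lambda n: (n.latency_ms, -n.cpu_free))


# Illustrative values only, not measurements.
nodes = [
    Node("cloud-1", cpu_free=64.0, latency_ms=80.0),
    Node("edge-server", cpu_free=8.0, latency_ms=10.0),
    Node("edge-sensor", cpu_free=0.5, latency_ms=2.0),
]

detector = Component("detector", cpu_needed=2.0, max_latency_ms=20.0)
trainer = Component("trainer", cpu_needed=32.0, max_latency_ms=200.0)

print(place(detector, nodes).name)  # edge-server
print(place(trainer, nodes).name)   # cloud-1
```

In this toy setting the latency-sensitive component lands at the edge and the compute-heavy one in the cloud. A real scheduler would also have to account for intercomponent affinity, data location, and rescheduling, which this sketch deliberately leaves out.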

ICOS MetaOS: Our intelligent and secure solution

ICOS has been conceived as a meta-operating system for the cloud continuum, that is, to facilitate easy and transparent use of the system, from the cloud to the edge, while providing efficiency, interoperability, robustness, and security. It has been designed to address the above-mentioned challenges and therefore offers the following features:

  • Comprehensive resource management subsystem with capabilities to dynamically onboard infrastructure, providing detailed information about the resource features and runtime availability. This information has been enhanced to include forecasted behavior (see more features below).
  • Sophisticated runtime subsystem designed to be technology-agnostic and interoperable. In the current implementation, Kubernetes and Docker containers are considered. Additionally, both intra-cluster and inter-cluster interoperability are supported for Kubernetes containers.
  • Effective orchestration subsystem for multicomponent applications, selecting the most appropriate nodes along the continuum for workload execution and considering intercomponent runtime affinity as well as the location of data assets.
  • Complex, user-specified policy management subsystem that allows performance requirements to be specified, and remediation actions to be taken in case of underperformance. The remediation actions include rescheduling application components to mitigate performance effects in the original layout.
  • Efficient data management subsystem to facilitate data access and optimize internal data flow across the continuum. The subsystem also considers potential data privacy constraints and performs optimized data transfer along the continuum.
  • Thorough telemetry management and intelligence subsystems with capabilities to train ML models, both centralized and federated, to provide intelligence in the decision-making procedures as well as forecasted system characteristics. Furthermore, they include a sophisticated mechanism for user-defined telemetry generation and federated training, which can generate a new user-level intelligence model on the fly.
  • Accurate security subsystem to ensure safe and seamless system operation.
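As a rough illustration of the policy management idea in the list above (performance requirements plus remediation actions), the sketch below checks a telemetry sample against user-specified thresholds and reports which remediation actions would be triggered. It is a sketch under stated assumptions only: the `Policy` class, the metric names, the action names, and the `evaluate` function are hypothetical and do not reflect the actual ICOS interfaces.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    metric: str       # telemetry metric to watch (hypothetical name)
    threshold: float  # maximum acceptable value for that metric
    action: str       # remediation to request when the threshold is exceeded


def evaluate(policies, telemetry):
    """Return (action, metric, value) for every policy violated by the sample."""
    actions = []
    for p in policies:
        value = telemetry.get(p.metric)
        # Metrics missing from the sample are skipped rather than treated as violations.
        if value is not None and value > p.threshold:
            actions.append((p.action, p.metric, value))
    return actions


# Illustrative policies and telemetry values, not real measurements.
policies = [
    Policy(metric="latency_ms", threshold=50.0, action="reschedule"),
    Policy(metric="cpu_load", threshold=0.9, action="scale_out"),
]

sample = {"latency_ms": 72.0, "cpu_load": 0.4}
print(evaluate(policies, sample))  # [('reschedule', 'latency_ms', 72.0)]
```

Here only the latency requirement is violated, so only a rescheduling request is produced. In a full system, that request would be handed to the orchestration subsystem to move the affected component.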

The ICOS MetaOS is still at a preliminary stage and some of its functionalities have limited scope, so there is still much work to be done. However, we believe this is one step forward in paving the way towards fully functional and operational meta-operating systems for the cloud continuum.

Funded by European Union
Part of EUCloudEdgeIoT.eu

This project has received funding from the European Union’s HORIZON research and innovation programme under grant agreement No 101070177.

©2025 ICOS Project