Functional Specification Document

1. Introduction

1.1 Purpose

This Functional Specification Document (FSD) describes the functional capabilities, features, user roles, and service offerings of Cirrus Cloud Platform (CCP) as deployed for the Client's sovereign cloud platform. It defines what the system does from a functional perspective and serves as the reference document for product understanding, testing, and stakeholder alignment.

1.2 Scope

This document covers the functional specification of the following CCP components:

  • Self-Service Console
  • Admin Console
  • Coredge Platform Services
  • Identity and Access Management (IAM)
  • Cluster Controller and Cluster Agent (Kubernetes)
  • CCP Core Microservices
  • Service Catalogue (MVP1 / MVP2 / MVP3)
  • User Onboarding and Platform Hierarchy
  • Pre-defined User Roles and Service-Specific Roles
  • High Availability, Regional Architecture, and Backup Strategy

1.3 Intended Audience

| Audience | Purpose |
| --- | --- |
| Cloud Operations Team | Understand platform capabilities and service offerings |
| Product / Business Team | Validate functional requirements against business needs |
| QA / Testing Team | Basis for functional test case development |
| Security Team | Review of access control and identity management functions |
| Infrastructure Team | Understand pre-requisites and deployment constraints |

1.4 Definitions and Acronyms

| Term / Acronym | Definition |
| --- | --- |
| CCP | Cirrus Cloud Platform – Cloud Management Platform and IaaS Orchestrator |
| CMP | Cloud Management Platform |
| BSS Portal | Business Support System Portal – the Client's customer-facing subscription and identity platform |
| AZ | Availability Zone |
| IAM | Identity and Access Management |
| RBAC | Role-Based Access Control |
| VPC | Virtual Private Cloud |
| MaaS | Metal as a Service |
| GSLB | Global Server Load Balancing |
| OpenFGA | Open Fine-Grained Authorization – AuthZ engine used within CCP |
| MVP | Minimum Viable Product |
| HA | High Availability |
| DR | Disaster Recovery |
| mTLS | Mutual Transport Layer Security |
| SMTP | Simple Mail Transfer Protocol |
| NTP | Network Time Protocol |
| DMZ | Demilitarized Zone |
| ETCD | Distributed key-value store used by Kubernetes |
| PVC | Persistent Volume Claim |
| ADFS | Active Directory Federation Services |

2. System Overview

2.1 Background

The Client is building a sovereign cloud platform for government and enterprise customers in the India region.

A combination of Cirrus Cloud Platform (Cloud Management Platform), Cirrus Cloud Platform (IaaS Orchestrator), and Cloud Orbiter (Kubernetes Orchestrator) will provide a unified cloud services platform layer for the Client's internal teams (Day 2 operations, business unit, security, FinOps, and cloud governance) and for its customers, for delivering and accessing various services.

2.2 Current State

The Client Cloud is a new deployment; Cirrus Cloud Platform will be used for the Cloud Management Platform layer.

2.3 Key Platform Capabilities

Cirrus Cloud Platform will deliver the following key features of the Cloud Management Platform:

  • Self-service access for automated provisioning and deployments
  • Visibility across environments
  • Centralized management
  • Improved compliance and security
  • Optimized cloud spend

3. Functional Components

The Cirrus Cloud Platform (Cloud Management Platform) / Cloud Orbiter (Kubernetes Orchestrator) / Cirrus Cloud Platform (IaaS Orchestrator) consists of the key functional components listed below.

3.1 Self-Service Console

The primary interface for end users. A user-friendly UI that allows users to provision and manage infrastructure resources such as VMs, storage, and load balancers, often through intuitive interfaces like drag-and-drop or simple forms. It also allows organization administrators to create new Projects/Cells, manage user access to each project/cell, define access control policies (who can access which resources), and ensure proper resource allocation and usage.

3.2 Admin Console (For Service Provider Only)

Provides an overall view of the entire OpenStack environment, with an administrative UI to manage OpenStack environments, allocate resources, and oversee system health. It offers a management view of all infrastructure resources such as VMs, volumes, load balancers, and container namespaces, and gives insight into the overall health of the OpenStack environment, allowing for proactive maintenance and troubleshooting.

3.3 Coredge Platform Services

Coredge Platform Services are composed of several microservices, each responsible for a specific set of functionalities; they communicate through well-defined REST APIs and internal routing mechanisms.

It provides a rich set of APIs for resource allocation, availability zones, VM flavors, and user images, empowering users to efficiently manage and allocate resources based on their specific needs.

It also includes specialized microservices for resource management, Kubernetes orchestration, and storage management. The platform has an in-built, robust API gateway that provides centralized access control and API logging, ensuring secure and authorized access to platform resources.
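
As an illustration of how a client might consume these APIs through the gateway, the following Python sketch obtains a token from the IAM server and then calls a platform endpoint. The gateway hostname, realm, client credentials, and endpoint paths are placeholders, not the platform's actual API contract.

```python
# Minimal sketch of calling a CCP platform API through the API gateway.
# Hostnames, realm, client credentials, and endpoint paths are assumptions
# for illustration only; the real API contract is defined by the platform.
import requests

GATEWAY = "https://ccp.example.internal"  # hypothetical gateway URL

# 1. Obtain an access token from the IAM server (Keycloak-style client-credentials grant).
token_resp = requests.post(
    f"{GATEWAY}/auth/realms/example-org/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "platform-client",   # hypothetical client
        "client_secret": "REDACTED",
    },
    timeout=10,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# 2. Call a platform API (e.g. list flavors) with the bearer token;
#    the gateway enforces access control and logs the request.
flavors = requests.get(
    f"{GATEWAY}/platform/v1/flavors",     # hypothetical endpoint
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=10,
)
flavors.raise_for_status()
for flavor in flavors.json().get("flavors", []):
    print(flavor.get("name"), flavor.get("vcpus"), flavor.get("ram"))
```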

3.4 IAM (Identity and Access Management) Server

The authentication server provides identity and access management for CMP cloud users and supports federation with external identity providers (such as the BSS Portal and ADFS). It is multi-tenant by default and can federate with customer-specific identity providers, ensuring secure and isolated access for every customer. For each customer organization, it creates a unique account to enforce identity segregation.
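
The sketch below illustrates the per-organization identity segregation idea using the Keycloak Admin REST API (the IAM module is Keycloak-based, per Section 4). The hostname, credentials, and realm name are placeholders; in practice this is driven by the CCP onboarding services rather than run by hand.

```python
# Illustrative sketch of identity segregation: each customer organization gets
# its own Keycloak realm ("account"). Hostname and admin credentials are
# placeholders, not production values.
import requests

KEYCLOAK = "https://iam.example.internal"  # hypothetical IAM endpoint

# Obtain an admin token (password grant against the master realm).
token_resp = requests.post(
    f"{KEYCLOAK}/realms/master/protocol/openid-connect/token",
    data={
        "grant_type": "password",
        "client_id": "admin-cli",
        "username": "admin",
        "password": "REDACTED",
    },
    timeout=10,
)
token_resp.raise_for_status()
admin_token = token_resp.json()["access_token"]

# Create an isolated realm for the new customer organization.
resp = requests.post(
    f"{KEYCLOAK}/admin/realms",
    json={"realm": "customer-org-42", "enabled": True},
    headers={"Authorization": f"Bearer {admin_token}"},
    timeout=10,
)
resp.raise_for_status()
```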

3.5 Cluster Controller (For Kubernetes Only)

The central entity managing all Kubernetes platform functionality; it connects and orchestrates customer Kubernetes clusters. It communicates with the Cluster Agent over ports 8030/8040, which enables communication with the clusters and allows centralized access to their Kubernetes APIs/CLI.

3.6 Cluster Agent (For Kubernetes Only)

Deployed on each target Kubernetes cluster to enable management via the Controller. The Cluster Agent initiates an outbound connection towards the Cluster Controller; once the handshake is complete, the Controller can issue commands to the cluster and act as a proxy for the Kubernetes CLI/APIs.
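
The following Python sketch shows the outbound, agent-initiated connection pattern described above. It is a conceptual illustration only: the controller URL, message format, and registration payload are assumptions, not the actual agent protocol.

```python
# Conceptual sketch of the Cluster Agent's outbound connection model (not the
# actual agent implementation): the agent dials the Controller, registers the
# cluster, and then serves proxied commands it receives.
import asyncio
import json
import websockets  # pip install websockets

CONTROLLER_URL = "wss://controller.example.internal:8040/agent"  # hypothetical

async def run_agent(cluster_id: str) -> None:
    async with websockets.connect(CONTROLLER_URL) as ws:
        # Handshake: the agent announces which cluster it manages.
        await ws.send(json.dumps({"type": "register", "cluster_id": cluster_id}))

        # After the handshake, the Controller pushes commands; the agent would
        # execute them against the local Kubernetes API and reply.
        async for message in ws:
            command = json.loads(message)
            # ... forward `command` to the local kube-apiserver here ...
            await ws.send(json.dumps({"type": "ack", "id": command.get("id")}))

if __name__ == "__main__":
    asyncio.run(run_agent("cluster-01"))
```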

4. CCP Core Modules

The table below lists all core microservice modules that form Cirrus Cloud Platform, along with their functional descriptions.

| S.No. | Module | Functional Description |
| --- | --- | --- |
| 1 | orbiter-api | API server for orbiter – exposes K8s/cluster APIs for K8s cluster management and application deployment |
| 2 | orbiter-controller | Controller for orbiter which handles the runtime; backend engine for orbiter-api |
| 3 | observability-ui | UI service for cluster observability; exposes cluster metrics such as CPU and RAM usage |
| 4 | frontend | Cluster management UI service; interacts with orbiter-api to expose cluster-level operations to end users, such as registration/removal of K8s clusters, application deployment on K8s clusters, and container registry |
| 5 | workflow-controller | Workflow provider for internal CCP workflows |
| 6 | console | UI service for CCP |
| 7 | admin-console | Admin UI for CCP |
| 8 | platform | Platform APIs for CCP comprising compute, volume, core-mgmt, network, and other functionalities |
| 9 | admin-platform | Admin Platform APIs for CCP to manage flavors, images, AZs, regions, and other virtual resources and constructs |
| 10 | celery | Multiple Celery services for different tasks |
| 11 | auth | Keycloak authentication service |
| 12 | core-mgmt | Project manager service to manage organizations, cells, user mapping, etc. |
| 13 | ordr_mgmt | Service to push CRUD events externally |
| 14 | socketio | Socket.IO service to push events/notifications to the console service |
| 15 | onboarding | Service to onboard users and organizations |
| 16 | platform-celery | Internal service to handle tasks asynchronously |
| 17 | notification | Notification service for sending notifications to external messaging platforms – SMS, email (SMTP), etc. |
| 18 | orbiter-auth | Authorization gateway for the system |
| 19 | orbiter-term | Terminal access for Kubernetes-based shell for clusters |
| 20 | storage-plugin | Provides storage capabilities by integrating with NetApp |
| 21 | baremetal-plugin | Provides baremetal server management by integrating with MaaS |
| 22 | client-plugin | Enables client-specific custom flows |
| 23 | orbiter-metering | Metering / showback / quota management and licensing |
| 24 | kafka | Internal messaging queue for inter-component communication |
| 25 | OpenFGA | AuthZ database for CMP authorizations |

4.1 Database Components

| S.No. | Database Component | Version |
| --- | --- | --- |
| 1 | Redis (Cache) | 7.2.5 |
| 2 | Redis (Session) | 6.2.5 |
| 3 | PostgreSQL | 15.7 |
| 4 | MongoDB | 5.0.3 |

5. Service Catalogue

Client Cloud requires delivery of the services below from the Cloud Management Platform (Cirrus Cloud Platform) in a phased manner:

| Category | Service | Phase |
| --- | --- | --- |
| Compute Services | Virtual Machine | MVP1 |
| Compute Services | Container as a Service | MVP1 |
| Compute Services | BareMetal as a Service (BMaaS) | MVP1 |
| Storage Services | Block Storage | MVP1 |
| Storage Services | Object Storage | MVP1 |
| Storage Services | File Storage | MVP1 |
| Network Services | Application Load Balancer (HTTP / HTTPS) | MVP1 |
| Network Services | Network Load Balancer (TCP) | MVP1 |
| Network Services | VPN Gateway – Site-to-Site Connection | MVP1 |
| Network Services | VPN Gateway – Point-to-Site Connection | MVP1 |
| Network Services | Firewall | MVP1 |
| Network Services | Public IP | MVP1 |
| Network Services | NAT Gateway (Internet Gateway) | MVP1 |
| Network Services | VPC (Virtual Private Cloud) | MVP1 |
| Monitoring Services | Log Analyzer | MVP1 |
| Monitoring Services | Operational Metric Collection | MVP1 |
| Monitoring Services | Alarm Service | MVP1 |
| Monitoring Services | Notification Service | MVP1 |
| Support Services | Basic Support Services | MVP1 |
| Support Services | Enterprise Support Services | MVP1 |
| Database Services | Managed Database as a Service (Oracle and MongoDB) | MVP1 |
| Security Services | Security Incident and Event Management | MVP1 |
| Security Services | Log Monitoring | MVP1 |
| Security Services | Cloud Workload Protection | MVP1 |
| Security Services | Web Application Firewall | MVP1 |
| Foundation Services | Identity and Access Management | MVP1 |
| Foundation Services | SMTP | MVP1 |
| Foundation Services | Identity Federation | MVP1 |
| Foundation Services | Multi Factor Authentication | MVP1 |
| Foundation Services | DNS | MVP1 |
| Foundation Services | NTP | MVP1 |
| Foundation Services | Privileged Access Management | MVP1 |
| Foundation Services | IP Address Management | MVP1 |
| Foundation Services | Active Directory Services | MVP1 |
| Foundation Services | Dual / Multifactor Authentication | MVP1 |
| Managed Services | Managed Services | MVP1 |
| Backup as a Service | Backup as a Service | MVP1 |
| Storage Services | Archival Storage | MVP2 |
| Database Services | Microsoft SQL-as-a-Service – Standard Edition | MVP2 |
| Database Services | Microsoft SQL-as-a-Service – Enterprise Edition | MVP2 |
| Database Services | Microsoft SQL-as-a-Service – Web Edition | MVP2 |
| Database Services | Managed Database as a Service | MVP2 |
| Database Services | Databases Licenses | MVP2 |
| Network Services | Content Delivery Network | MVP2 |
| Network Services | MPLS Connectivity (Partner Interconnect) | MVP2 |
| Network Services | MPLS Connectivity (Dedicated Interconnect) | MVP2 |
| Security Services | Cloud Based Hardware Security Module | MVP2 |
| Security Services | Distributed Denial of Services | MVP2 |
| Security Services | TLS / SSL Certificate Management | MVP2 |
| Security Services | Encryption Services | MVP2 |
| Security Services | Digital Forensics | MVP2 |
| Additional Services | Queue Services (Kafka as a Service) | MVP2 |
| Network Services | Bandwidth as a Service (QOS) (BWaaS) | MVP3 |
| Database Services | Managed Database as a Service – MariaDB | MVP3 |
| Database Services | Managed Database as a Service – NoSQL | MVP3 |
| Disaster Recovery as a Service (DRaaS) | Disaster Recovery as a Service (DRaaS) | MVP3 |
| Additional Services | Message Broker Services | MVP3 |
💡 Note: The above list of services may change in accordance with the guidance provided by the Client Business team. The Service Description Document for the above services can be referred to for detailed information about each service.

6. User Onboarding and Platform Hierarchy

6.1 Onboarding Flow

Onboarding of the Client's customers will be initiated on the BSS Portal, starting either with self-registration by customers or with help from the Client business team.

Step a

The customer will order/subscribe to CCP on the BSS Portal. Upon subscription, the BSS Portal will call CCP APIs to create the organization (illustrated after the list below). Cirrus Cloud Platform will automatically configure and create the resources below for the new organization:

  • Default User roles for an organization (Organization Administrator and Cell Administrator)
  • Default project / cell / VPC in default region
  • Default service catalogue
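
A hedged sketch of the kind of API call the BSS Portal could make to trigger this onboarding is shown below. The endpoint path, payload fields, and token handling are assumptions for illustration; the real interface will follow the agreed BSS Portal / CCP integration contract.

```python
# Hypothetical onboarding call from the BSS Portal to CCP on subscription.
# Endpoint, fields, and region name are placeholders, not the actual contract.
import requests

resp = requests.post(
    "https://ccp.example.internal/onboarding/v1/organizations",  # hypothetical
    json={
        "name": "customer-org-42",
        "bss_reference": "LSI-000123",   # BSS-side identity reference (placeholder)
        "default_region": "in-north-1",
    },
    headers={"Authorization": "Bearer <service-token>"},
    timeout=30,
)
resp.raise_for_status()
# CCP responds after creating the default roles, default cell/VPC in the
# default region, and the default service catalogue for the new organization.
print(resp.json())
```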

Step b

Mapping between the BSS Portal and CCP will be developed in accordance with the guidance provided by business teams and will be enforced for billing, governance, and resource hierarchy.

Step c

The BSS Portal will serve as the user identity store and provide authentication services. All customer user accounts can be created, modified, and deleted in the BSS Portal only.

Step d

Each customer account will be mapped with only a single Tenant in CCP. Multiple cells can be created within a single Tenant. Nesting of Tenants and cells is not allowed currently.

Step e

Quotas can be applied at the tenant and cell level. All cells will inherit the tenant quota by default.
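
A minimal sketch of this inheritance rule, assuming a cell uses its own quota values where set and falls back to the tenant quota otherwise (the exact semantics are defined by the platform):

```python
# Illustrative sketch (not CCP code) of quota inheritance in Step e.
from typing import Optional

def effective_quota(tenant_quota: dict, cell_quota: Optional[dict]) -> dict:
    """Cell-level overrides win; anything unset falls back to the tenant default."""
    return {**tenant_quota, **(cell_quota or {})}

tenant = {"vcpus": 100, "ram_gb": 512, "volumes": 50}
cell = {"vcpus": 20}  # only vCPU is overridden for this cell

print(effective_quota(tenant, cell))   # {'vcpus': 20, 'ram_gb': 512, 'volumes': 50}
print(effective_quota(tenant, None))   # inherits the full tenant quota
```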

Step f

Resource Hierarchy will be maintained as:

Tenant → Cell → Resources

The BSS Portal to CCP mapping is as follows:

| BSS Portal | CCP |
| --- | --- |
| Party | |
| Billing Account (BA) | |
| Logical Subscriber Identity (LSI) | |
| Tenant | Tenant |
| | Cell |
| | Resources |

Step g

Pre-defined roles will be mapped to the user identities.

6.2 Pre-defined User Roles

The pre-defined roles and their permissions are listed below.

Tenant Super Administrator (Root User)
  • Top-level role that can manage everything within a Tenant
  • Can create other Tenant Super Administrators
  • Can create Tenant Administrators

Tenant Administrator
  • Highest privileges within each tenant
  • Create Cell(s) and custom roles
  • Assign quota for the Cells
  • Raise requests for increasing tenant quota
  • Handle access requests for the tenant and its Cells
  • Can access usage and quota details

Tenant Viewer
  • Read-only rights for specific organization(s); required for auditing, compliance, and training purposes

Tenant Billing Admin
  • Access to quota usage, metering, and showback

Cell Administrator
  • Raise requests for increasing Cell quota
  • Full access to all resources in the Cell

Cell Viewer
  • Read-only rights for specific Cell(s); required for auditing, compliance, and training purposes

Cell User
  • Access to all services mapped with a Cell
  • No access to Cell quota requests

6.3 Service-Specific Pre-defined Roles

The service-specific pre-defined roles and their permissions are listed below.

Cell VM Admin
  • Access to VM-as-a-Service in a Cell
  • Access to create and manage Block Storage
  • Access to create and manage VM snapshots
  • Access to create and manage networks
  • Access to create and manage backups

Cell VM Reader
  • Read access to VM-as-a-Service in a Cell
  • Read access to Block Storage, networks, VM snapshots, and backups

Cell Block Storage Admin
  • Access to Block Storage-as-a-Service in a Cell

Cell Object Storage Admin
  • Access to Object Storage-as-a-Service in a Cell

Cell File Storage Admin
  • Access to File Storage-as-a-Service in a Cell

Cell Backup Admin
  • Access to Backup-as-a-Service in a Cell
  • Access to VM snapshots

Cell Network Admin
  • Admin access to Network-as-a-Service in a Cell
  • Admin access to VPC
  • Admin access to Firewall
  • Admin access to Public IP
  • Admin access to VPN

Cell Container Admin
  • Access to Container-as-a-Service in a Cell
  • Access to create and manage Block Storage and networks

Cell BareMetal Admin
  • Access to Bare Metal-as-a-Service in a Cell
  • Access to create and manage Block Storage
  • Access to create and manage File Storage
  • Access to create and manage networks
  • Access to create and manage backups

Cell Database Admin
  • Access to Database-as-a-Service in a Cell (includes all DBaaS services)
  • Access to create and manage Block Storage
  • Access to create and manage VM snapshots
  • Access to create and manage networks
  • Access to create and manage backups

Cell InfoSec Admin
  • Access to activity logs and audit logs
💡 Note: Additional roles can be created by customers on a custom basis as needed. Custom roles and service roles are planned for a future release.
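
Since OpenFGA is the AuthZ engine behind CCP (see Sections 1.4 and 4), a role check could conceptually be expressed as an OpenFGA check request. The store ID, relation names, and object identifiers below are assumptions for illustration, not the platform's actual authorization model.

```python
# Hedged sketch of a role check against the OpenFGA HTTP API that backs CCP
# authorization. Store ID, relation, and object format are placeholders.
import requests

OPENFGA = "http://openfga.example.internal:8080"  # hypothetical endpoint
STORE_ID = "01HXXXXXXXXXXXXXXXXXXXXXXX"           # placeholder store ID

resp = requests.post(
    f"{OPENFGA}/stores/{STORE_ID}/check",
    json={
        "tuple_key": {
            "user": "user:alice",
            "relation": "cell_administrator",   # assumed relation name
            "object": "cell:tenant-42/default", # assumed object format
        }
    },
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["allowed"])  # True if alice holds Cell Administrator on this cell
```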

7. Solution Design

The proposed architecture ensures high availability, fault tolerance, and efficient management for a multi-region Cloud Management platform. The platform is designed to support a CCP application with dual clusters per region, robust failover mechanisms, and global services. The architecture aligns with business continuity goals and optimal resource utilization.

Each region, consisting of multiple AZs, will run independent Cirrus Cloud Platform components per AZ for all the microservices that manage infrastructure in that AZ. Cirrus Cloud Platform root account services will run globally and are responsible for aggregating organization-specific data such as metering, quota, and project management. Furthermore, each region has two Cirrus Cloud Platform global services running in active-passive mode, with their databases also in active-passive mode. PostgreSQL and MongoDB clusters will run on virtual machines, with separate DB clusters in each zone operating as active-passive clusters.

7.1 Regional Architecture

Each region contains:

Cluster 1 (Primary) – Availability Zone 1

  • Hosts main application services
  • The web layer is deployed on 3 virtual machines hosted in the DMZ and acts as a reverse proxy to the applications hosted in the application layer
  • Contains the primary MongoDB database
  • Serves as an active cluster during normal operations

Cluster 2 (Standby) – Availability Zone 2

  • Hosts replica application services
  • The web layer is deployed on 3 virtual machines hosted in the DMZ and acts as a reverse proxy to the applications hosted in the application layer
  • Contains a replica of the MongoDB database
  • Remains ready to take over in case of failure in Cluster 1

Failover Mechanism

  • Traffic is routed to the passive cluster automatically, and a script promotes the database in the passive cluster if the active cluster is down (a sketch of such a promotion script follows this list)
  • MongoDB replica sets ensure data consistency during failover within a region
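
A minimal sketch of the kind of health-check-and-promote logic such a failover script could implement is shown below, assuming MongoDB replica sets and the pymongo driver. The actual script will be developed jointly by Coredge and the Client; member filtering and other safety checks are omitted here.

```python
# Illustrative sketch only: check for a healthy PRIMARY in the active cluster
# and, if none is reachable, force a reconfiguration on a surviving standby
# member so it can elect a primary. Hostnames are placeholders.
from pymongo import MongoClient
from pymongo.errors import PyMongoError

def primary_is_healthy(uri: str) -> bool:
    """Return True if a writable PRIMARY is reachable in the active cluster."""
    try:
        client = MongoClient(uri, serverSelectionTimeoutMS=5000)
        return client.admin.command("hello").get("isWritablePrimary", False)
    except PyMongoError:
        return False

def promote_standby(standby_uri: str) -> None:
    """Force a replica set reconfiguration on a surviving standby member."""
    client = MongoClient(standby_uri, directConnection=True)
    config = client.admin.command("replSetGetConfig")["config"]
    # Keep only the members that are still reachable (details omitted),
    # bump the version, and apply the new configuration with force=True.
    config["version"] += 1
    client.admin.command("replSetReconfig", config, force=True)

if not primary_is_healthy("mongodb://active.example.internal:27017"):
    promote_standby("mongodb://standby.example.internal:27017")
```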

7.2 Global Services

The global service provides multi-region capabilities, ensuring the following:

Organization Onboarding

  • Centralized onboarding process replicated across regions
  • Ensures consistent user experience and service availability

Metadata Management

  • Centralized metadata replicated across three regions

Metadata replicated as part of the global service component:

  • Organization and Project Metadata Mappings to Region
  • Quota management
  • User and Organization Mapping information
  • Aggregation of Metering and Usage data for Reporting and Notional Invoice

Active Backup Failover

  • A GSLB probe detects the right endpoint for external systems to connect to, allowing fallback to the Backup cluster when the Active cluster is unavailable (a probe-endpoint sketch follows this list)
  • An internal quorum based on a 2n+1 system ensures correct identification of the Active cluster being down
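
The sketch below shows a simple health endpoint that a GSLB probe could poll to decide which cluster to route to. The activity check is a placeholder; in practice it would consult the platform's quorum/leader state.

```python
# Illustrative GSLB probe target: return 200 only on the active cluster so the
# GSLB steers traffic here, and 503 on the standby so it falls back.
from flask import Flask, jsonify

app = Flask(__name__)

def this_cluster_is_active() -> bool:
    # Placeholder: consult the 2n+1 quorum / leader election state here.
    return True

@app.route("/healthz")
def healthz():
    if this_cluster_is_active():
        return jsonify(status="active"), 200
    # A non-200 answer makes the GSLB fall back to the backup cluster.
    return jsonify(status="standby"), 503

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```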

Disaster Recovery

  • Acts as a coordination point for global failover scenarios

7.3 Multi-AZ Failover

To ensure resilience within a region:

  • Both clusters are deployed across multiple availability zones (AZs)
  • If an AZ fails, services fail over within the region without impacting overall operations
  • Load balancers and DNS routing ensure seamless redirection of traffic to active services

7.4 Extended Cluster for Global Databases

Global services are region-specific and consist mostly of MongoDB collections storing Tenant/Project/User information, hosted on clustered microservices with MongoDB active-active replication using change streams.

OpenFGA PostgreSQL and MongoDB, which will be the database backends for global AuthZ and the global data service, will run in active-passive mode between the two regions. The system will write to the primary region's OpenFGA by default, as this is a read-heavy database.

There will be 3 VMs in each availability zone, forming a 6-node cluster, with an additional virtual machine that can be used as an arbiter/etcd node to switch over in case of AZ failure. Deploying a 3+3 node setup distributes database responsibility evenly across two availability zones and ensures that no single AZ holds a disproportionate share of the cluster's capacity or state. In the event of a failure in one AZ, the surviving AZ retains a full set of 3 nodes, ready to recover operations manually if quorum is lost. Even though quorum (typically 4/6) might break if an entire AZ fails, manual intervention allows safe failover, and administrators can force reconfiguration (e.g., re-initiate leader election) in the surviving AZ.
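
The quorum arithmetic behind this design can be checked in a few lines of Python: losing one AZ in a 3+3 layout drops the cluster below the majority of 4 out of 6, while an additional arbiter vote (7 voters in total) keeps a majority available.

```python
# Quick check of the quorum behaviour described above.
def quorum_survives(total_voters: int, voters_lost: int) -> bool:
    """True if the surviving members still form a strict majority."""
    majority = total_voters // 2 + 1
    return (total_voters - voters_lost) >= majority

print(quorum_survives(6, 3))  # False -> an AZ failure breaks quorum in a 3+3 layout
print(quorum_survives(7, 3))  # True  -> an extra arbiter/etcd vote preserves quorum
```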

Database Failover: A two-site solution for HA within a region has been considered due to the unavailability of a third location for deployment of the arbiter node. Failover will be executed with the help of a script, which will be developed in collaboration between Coredge and the Client.

7.5 Backup Strategy

Data from the active CCP cluster will be continuously backed up into a geo-replicated object storage bucket. Backups of the north region CCP will be stored in the south region and vice versa. A scheduled backup job will be configured for incremental backups every 30 minutes and a full backup every 24 hours, with a 3-month retention period. The backup data will consist of the following files:

  • Keycloak PostgreSQL DB
  • Config Mongo DB
  • Metrics Mongo DB
  • ETCD DB of K8s cluster running CCP

Database clusters hosted on virtual machines will be backed up using the Veritas backup agent every 30 minutes, with a full backup every 24 hours and a 3-month retention period.
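
A hedged sketch of the schedule and retention policy described above (not the actual backup job) is shown below: incremental backups every 30 minutes, a full backup every 24 hours, and pruning of anything older than the roughly 90-day retention window.

```python
# Illustrative encoding of the backup policy, assuming a 3-month window of
# about 90 days; the real job is implemented by the backup tooling.
from datetime import datetime, timedelta, timezone

INCREMENTAL_INTERVAL = timedelta(minutes=30)
FULL_INTERVAL = timedelta(hours=24)
RETENTION = timedelta(days=90)

def backup_kind(now: datetime, last_full: datetime) -> str:
    """Take a full backup once the 24-hour window has elapsed, otherwise incremental."""
    return "full" if now - last_full >= FULL_INTERVAL else "incremental"

def expired(backups: list[dict], now: datetime) -> list[dict]:
    """Backups older than the retention window, due for deletion from object storage."""
    return [b for b in backups if now - b["created_at"] > RETENTION]

now = datetime.now(timezone.utc)
print(backup_kind(now, last_full=now - timedelta(hours=25)))  # -> "full"
print(backup_kind(now, last_full=now - timedelta(hours=3)))   # -> "incremental"
```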

7.6 Implementation Considerations

1. Database Replication

  • MongoDB Replication: Cluster 1 hosts the primary database and Cluster 2 hosts a replica with automatic synchronization in real time
  • PostgreSQL Replication: Each region has an active/standby database pair for Keycloak and the CCP application (using logical replication)
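
For illustration, PostgreSQL logical replication between the active and standby databases can be set up with a publication on the active side and a subscription on the standby side, as sketched below. Connection strings and object names are placeholders; the production setup is handled by the deployment tooling.

```python
# Minimal sketch of configuring PostgreSQL logical replication with psycopg2.
import psycopg2

ACTIVE_DSN = "host=pg-active.example.internal dbname=keycloak user=repl password=REDACTED"
STANDBY_DSN = "host=pg-standby.example.internal dbname=keycloak user=repl password=REDACTED"

# On the active side: publish all tables.
with psycopg2.connect(ACTIVE_DSN) as conn:
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("CREATE PUBLICATION keycloak_pub FOR ALL TABLES;")

# On the standby side: subscribe to the publication.
# CREATE SUBSCRIPTION must run outside a transaction block, hence autocommit.
standby = psycopg2.connect(STANDBY_DSN)
standby.autocommit = True
with standby.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION keycloak_sub "
        f"CONNECTION '{ACTIVE_DSN}' "
        "PUBLICATION keycloak_pub;"
    )
standby.close()
```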

2. Networking

  • Intra-region: High-speed, low-latency networking between AZs ensures seamless failover and data synchronization
  • Inter-region: Dedicated network links or VPNs ensure secure and efficient communication between regions

3. Monitoring and Alerting

  • Integrated monitoring tools (e.g., Prometheus, Grafana) will track cluster and database health
  • Alerts will notify administrators of potential issues, triggering automated recovery workflows where possible

4. Security

  • Encryption in transit (mTLS) and at rest (AES-256) for all data
  • Role-based access control (RBAC) for applications and databases
  • Regular security assessments and compliance checks

8. Pre-Requisites

The pre-requisites below are required for deployment of CCP on a Kubernetes cluster:

  • Wildcard SSL certificates for CCP hosting and dynamic customer account URLs
  • Load Balancer and VIPs for each CCP endpoint
  • DNS Server and credentials to create dynamic domains based on customer accounts
  • Accessible Container registry to store container images
  • Kubernetes compliant Storage with High IOPS performance
  • Connectivity and credentials for SMTP server for email integration
  • NTP and DNS server connectivity
  • Connectivity and APIs to integrate with the BSS Portal platform

9. Constraints and Dependencies

The Cloud Management Platform solution (i.e., Cirrus Cloud Platform / CCP / Cloud Orbiter) will be deployed in the control plane of each availability zone. It should not be deployed in a workload pod.

10. Exclusions

The following tasks are out of scope for Cirrus Cloud Platform:

  • Any hardware procurement and its deployment
  • Any software procurement and associated licensing (operating system, database, backup software, management software) and its deployment
  • Penetration Testing
  • Performance Testing for any other component other than CCP
  • Day2 operations for underlying infrastructure (Compute, Storage, and Network)
  • Any application / configuration changes in the BSS Portal

11. RACI Matrix

The table below provides a high-level view of key activities/tasks and the corresponding stakeholders.

R = Responsible | A = Accountable | C = Consulted | I = Informed

| # | Task | R | A | C | I |
| --- | --- | --- | --- | --- | --- |
| 1 | CCP Major / Minor Upgrade | Coredge | Coredge | The Client | The Client |
| 2 | OS patching and upgrades on CCP cluster VMs | The Client | The Client | Coredge | Coredge |
| 3 | CCP Kubernetes Cluster Patching | Coredge | Coredge | The Client | The Client |
| 4 | Infrastructure for Management Cluster | The Client | The Client | Coredge | Coredge |
| 5 | Storage driver plugin details for PVCs in Management Cluster | The Client | The Client | Coredge | Coredge |
| 6 | SSL Certificates and LB configuration for all required domains | The Client | The Client | Coredge | Coredge |
| 7 | Service Description | The Client | The Client | Coredge | Coredge |
| 8 | Rate Card | The Client | The Client | Coredge | Coredge |