Functional Specification Document
1. Introduction
1.1 Purpose
This Functional Specification Document (FSD) describes the functional capabilities, features, user roles, and service offerings of Cirrus Cloud Platform (CCP) as deployed for the Client's sovereign cloud platform. It defines what the system does from a functional perspective and serves as the reference document for product understanding, testing, and stakeholder alignment.
1.2 Scope
This document covers the functional specification of the following CCP components:
- Self-Service Console
- Admin Console
- Coredge Platform Services
- Identity and Access Management (IAM)
- Cluster Controller and Cluster Agent (Kubernetes)
- CCP Core Microservices
- Service Catalogue (MVP1 / MVP2 / MVP3)
- User Onboarding and Platform Hierarchy
- Pre-defined User Roles and Service-Specific Roles
- High Availability, Regional Architecture, and Backup Strategy
1.3 Intended Audience
| Audience | Purpose |
|---|---|
| Cloud Operations Team | Understand platform capabilities and service offerings |
| Product / Business Team | Validate functional requirements against business needs |
| QA / Testing Team | Basis for functional test case development |
| Security Team | Review of access control and identity management functions |
| Infrastructure Team | Understand pre-requisites and deployment constraints |
1.4 Definitions and Acronyms
| Term / Acronym | Definition |
|---|---|
| CCP | Cirrus Cloud Platform – Cloud Management Platform and IaaS Orchestrator |
| CMP | Cloud Management Platform |
| BSS Portal | Business Support System Portal – the Client's customer-facing subscription and identity platform |
| AZ | Availability Zone |
| IAM | Identity and Access Management |
| RBAC | Role-Based Access Control |
| VPC | Virtual Private Cloud |
| MaaS | Metal as a Service |
| GSLB | Global Server Load Balancing |
| OpenFGA | Open Fine-Grained Authorization – AuthZ engine used within CCP |
| MVP | Minimum Viable Product |
| HA | High Availability |
| DR | Disaster Recovery |
| mTLS | Mutual Transport Layer Security |
| SMTP | Simple Mail Transfer Protocol |
| NTP | Network Time Protocol |
| DMZ | Demilitarized Zone |
| ETCD | Distributed key-value store used by Kubernetes |
| PVC | Persistent Volume Claim |
| ADFS | Active Directory Federation Services |
2. System Overview
2.1 Background
The Client is building a sovereign cloud platform for government and enterprise customers in the India region.
A combination of Cirrus Cloud Platform (Cloud Management Platform), Cirrus Cloud Platform (IaaS Orchestrator), and Cloud Orbiter (Kubernetes Orchestrator) will provide a unified cloud services platform layer through which the Client's internal teams (Day 2 operations, business unit, security, FinOps, and cloud governance) and customers deliver and access various services.
2.2 Current State
The Client Cloud is a new deployment, and Cirrus Cloud Platform will be used as the Cloud Management Platform layer.
2.3 Key Platform Capabilities
Cirrus Cloud Platform will deliver the following key features of the Cloud Management Platform:
- Self-service access for automated provisioning and deployments
- Visibility across environments
- Centralized management
- Improved compliance and security
- Optimized cloud spend
3. Functional Components
The Cirrus Cloud Platform (Cloud Management Platform) / Cloud Orbiter (Kubernetes Orchestrator) / Cirrus Cloud Platform (IaaS Orchestrator) consists of the key functional components listed below.
3.1 Self-Service Console
Primary interface for end users. A user-friendly UI that allows users to provision and manage infrastructure resources such as VMs, storage, and load balancers, often through intuitive interactions like drag-and-drop or simple forms. It also allows organization administrators to create new Projects/Cells, manage user access to a project/cell, define access control policies (who can access what resources), and ensure proper resource allocation and usage.
3.2 Admin Console (For Service Provider Only)
Provides an overall view of the entire OpenStack environment through an administrative UI for managing OpenStack environments, allocating resources, and overseeing system health. It offers a management view of all infrastructure resources (VMs, Volumes, Load Balancers, Container Namespaces, etc.) and insights into the overall health of the OpenStack environment, enabling proactive maintenance and troubleshooting.
3.3 Coredge Platform Services
The Coredge Platform Services is composed of several microservices, each responsible for a specific set of functionalities, and they communicate through well-defined REST APIs and internal routing mechanisms.
It provides a rich set of APIs for resource allocation, availability zones, VM flavors, and user images, empowering users to efficiently manage and allocate resources based on their specific needs.
Also includes specialized microservices for resource management, Kubernetes orchestration, and storage management. Platform has an in-built robust API gateway to provide centralized access control and API logging, ensuring secure and authorized access to platform resources.
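As a hedged illustration of gateway-mediated API access, the sketch below lists VM flavors through the platform APIs with a bearer token. The base URL, endpoint path, and token handling are assumptions for illustration, not the documented CCP contract.

```python
# Hypothetical example: listing VM flavors through the CCP API gateway.
# Endpoint path, host, and token source are illustrative placeholders.
import requests

CCP_API = "https://ccp.example.com/api/v1"   # assumed gateway base URL
TOKEN = "<bearer-token-from-IAM>"            # issued by the IAM server

resp = requests.get(
    f"{CCP_API}/flavors",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()       # gateway enforces access control and logs the call
for flavor in resp.json():
    print(flavor)
```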
3.4 IAM (Identity and Access Management) Server
The authentication server provides identity and access management for CMP cloud users and supports federation with external Identity Providers (such as the BSS Portal and ADFS). It is multi-tenant by default and can federate with customer-specific identity providers, ensuring secure and isolated access for every customer. For each customer organization, it creates a unique account to enforce identity segregation.
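Since the auth module is Keycloak-based (see Section 4), per-organization identity segregation could be realized as one realm per customer account. The sketch below uses Keycloak's standard admin REST API; the host, admin credentials, and realm name are placeholders, and the realm-per-organization mapping is an assumption.

```python
# Sketch: creating an isolated Keycloak realm per customer organization.
import requests

KC = "https://iam.example.com"   # assumed Keycloak base URL

# Obtain an admin token via the standard admin-cli password grant.
tok = requests.post(
    f"{KC}/realms/master/protocol/openid-connect/token",
    data={"grant_type": "password", "client_id": "admin-cli",
          "username": "admin", "password": "<admin-password>"},
    timeout=10,
).json()["access_token"]

# Create a dedicated realm for the new organization, giving it an
# isolated identity store and its own federation settings.
resp = requests.post(
    f"{KC}/admin/realms",
    headers={"Authorization": f"Bearer {tok}"},
    json={"realm": "org-acme", "enabled": True},
    timeout=10,
)
resp.raise_for_status()
```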
3.5 Cluster Controller (For Kubernetes Only)
Central entity managing all Kubernetes platform functionality, connecting and orchestrating customer Kubernetes clusters. It communicates with the Cluster Agent over ports 8030/8040, enabling centralized access to Kubernetes APIs/CLI across clusters.
3.6 Cluster Agent (For Kubernetes Only)
Deployed on each target Kubernetes cluster to enable management via the Controller. The Cluster Agent initiates an outbound connection towards the Cluster Controller; once the handshake is complete, the Controller can issue commands to the cluster and act as a proxy for the Kubernetes CLI/APIs.
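The sketch below illustrates this agent-initiated ("reverse") connection pattern: because the agent dials out, no inbound firewall rule is needed on the customer cluster. The URL uses port 8030 per the spec, but the message format, endpoint path, and registration handshake are hypothetical.

```python
# Illustrative sketch of an agent-initiated connection to the Controller.
import asyncio
import json
import websockets  # third-party: pip install websockets

CONTROLLER = "wss://controller.example.com:8030/agent"  # port per spec

async def run_agent(cluster_id: str):
    # The agent opens the connection outbound; the Controller then uses
    # this channel to proxy Kubernetes CLI/API traffic to the cluster.
    async with websockets.connect(CONTROLLER) as ws:
        await ws.send(json.dumps({"type": "register", "cluster": cluster_id}))
        async for msg in ws:
            command = json.loads(msg)
            print("received command:", command)

asyncio.run(run_agent("cluster-01"))
```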
4. CCP Core Modules
The table below lists all core microservice modules that form the Cirrus Cloud Platform, along with their functional descriptions.
| S.No. | Module | Functional Description |
|---|---|---|
| 1 | orbiter-api | API server for orbiter – exposes K8s/cluster APIs for K8s cluster management and application deployment |
| 2 | orbiter-controller | Controller for orbiter which handles the runtime. Backend engine for orbiter-api |
| 3 | observability-ui | UI service for cluster observability. It exposes cluster metrics like CPU, RAM usage etc. |
| 4 | frontend | Cluster management UI service. Interacts with orbiter-api to expose various cluster level operations to end user like registration/removal of K8s clusters, application deployment on K8s clusters, container registry etc. |
| 5 | workflow-controller | Workflow provider for internal CCP workflows |
| 6 | console | UI service for CCP |
| 7 | admin-console | Admin UI for CCP |
| 8 | platform | Platform APIs for CCP comprising compute, volume, core-mgmt, network, and other functionalities |
| 9 | admin-platform | Admin Platform APIs for CCP to manage flavors, images, AZs, regions and other virtual resources and constructs |
| 10 | celery | Multiple Celery services for different tasks |
| 11 | auth | Keycloak Authentication Service |
| 12 | core-mgmt | Project manager service to manage organizations, cells, user mapping etc. |
| 13 | ordr_mgmt | Service to push CRUD events externally |
| 14 | socketio | Socketio service to push events/notifications to console service |
| 15 | onboarding | Service to onboard users and organizations |
| 16 | platform-celery | Internal service to handle tasks asynchronously |
| 17 | notification | Notification service for sending notifications to external messaging platforms – SMS, email (SMTP) etc. |
| 18 | orbiter-auth | Authorization gateway for the system |
| 19 | orbiter-term | Terminal access for Kubernetes based shell for clusters |
| 20 | storage-plugin | For providing storage capabilities while integrating with NetApp |
| 21 | baremetal-plugin | For providing baremetal server management while integrating with MaaS |
| 22 | client-plugin | For enablement of client-specific custom flows |
| 23 | orbiter-metering | For metering / showback / quota management and licensing |
| 24 | kafka | Internal messaging queue for components communication |
| 25 | OpenFGA | AuthZ engine and database for CMP authorizations |
4.1 Database Components
| S.No. | Database Component | Version |
|---|---|---|
| 1 | Redis (Cache) | 7.2.5 |
| 2 | Redis (Session) | 6.2.5 |
| 3 | PostgreSQL | 15.7 |
| 4 | MongoDB | 5.0.3 |
5. Service Catalogue
Client Cloud requires delivery of the services below from the Cloud Management Platform (Cirrus Cloud Platform) in a phased manner:
| Category | Service | Phase |
|---|---|---|
| Compute Services | Virtual Machine | MVP1 |
| | Container as a Service | MVP1 |
| | BareMetal as a Service (BMaaS) | MVP1 |
| Storage Services | Block Storage | MVP1 |
| | Object Storage | MVP1 |
| | File Storage | MVP1 |
| Network Services | Application Load Balancer (HTTP / HTTPS) | MVP1 |
| | Network Load Balancer (TCP) | MVP1 |
| | VPN Gateway – Site-to-Site Connection | MVP1 |
| | VPN Gateway – Point-to-Site Connection | MVP1 |
| | Firewall | MVP1 |
| | Public IP | MVP1 |
| | NAT Gateway (Internet Gateway) | MVP1 |
| | VPC (Virtual Private Cloud) | MVP1 |
| Monitoring Services | Log Analyzer | MVP1 |
| | Operational Metric Collection | MVP1 |
| | Alarm Service | MVP1 |
| | Notification Service | MVP1 |
| Support Services | Basic Support Services | MVP1 |
| | Enterprise Support Services | MVP1 |
| Database Services | Managed Database as a Service (Oracle and MongoDB) | MVP1 |
| Security Services | Security Incident and Event Management | MVP1 |
| | Log Monitoring | MVP1 |
| | Cloud Workload Protection | MVP1 |
| | Web Application Firewall | MVP1 |
| Foundation Services | Identity and Access Management | MVP1 |
| | SMTP | MVP1 |
| | Identity Federation | MVP1 |
| | Multi Factor Authentication | MVP1 |
| | DNS | MVP1 |
| | NTP | MVP1 |
| | Privileged Access Management | MVP1 |
| | IP Address Management | MVP1 |
| | Active Directory Services | MVP1 |
| | Dual / Multifactor Authentication | MVP1 |
| Managed Services | Managed Services | MVP1 |
| Backup as a Service | Backup as a Service | MVP1 |
| Storage Services | Archival Storage | MVP2 |
| Database Services | Microsoft SQL-as-a-Service – Standard Edition | MVP2 |
| | Microsoft SQL-as-a-Service – Enterprise Edition | MVP2 |
| | Microsoft SQL-as-a-Service – Web Edition | MVP2 |
| | Managed Database as a Service | MVP2 |
| | Database Licenses | MVP2 |
| Network Services | Content Delivery Network | MVP2 |
| | MPLS Connectivity (Partner Interconnect) | MVP2 |
| | MPLS Connectivity (Dedicated Interconnect) | MVP2 |
| Security Services | Cloud Based Hardware Security Module | MVP2 |
| | Distributed Denial of Service Protection | MVP2 |
| | TLS / SSL Certificate Management | MVP2 |
| | Encryption Services | MVP2 |
| | Digital Forensics | MVP2 |
| Additional Services | Queue Services (Kafka as a Service) | MVP2 |
| Network Services | Bandwidth as a Service (QoS) (BWaaS) | MVP3 |
| Database Services | Managed Database as a Service – MariaDB | MVP3 |
| | Managed Database as a Service – NoSQL | MVP3 |
| Disaster Recovery as a Service (DRaaS) | Disaster Recovery as a Service (DRaaS) | MVP3 |
| Additional Services | Message Broker Services | MVP3 |
6. User Onboarding and Platform Hierarchy
6.1 Onboarding Flow
Onboarding of the Client's customers will be initiated on the BSS Portal, starting with self-registration by customers or with help from the Client business team.
Step a
The customer will order/subscribe to CCP on the BSS Portal. Upon subscription, the BSS Portal will call CCP APIs to create the organization. Cirrus Cloud Platform will automatically configure and create the resources below for the new organization (see the sketch after this list):
- Default User roles for an organization (Organization Administrator and Cell Administrator)
- Default project / cell / VPC in default region
- Default service catalogue
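A hedged sketch of this Step a integration follows: the BSS Portal calls a CCP onboarding API after a successful subscription. The path, payload fields, and identifiers are assumptions for illustration, not the documented CCP contract.

```python
# Hypothetical BSS Portal -> CCP call creating an organization with defaults.
import requests

resp = requests.post(
    "https://ccp.example.com/api/v1/onboarding/organizations",
    headers={"Authorization": "Bearer <service-token>"},
    json={
        "name": "acme-corp",
        "bss_subscriber_id": "LSI-12345",  # identifier passed by the BSS Portal
        "default_region": "in-north-1",    # default cell/VPC created here
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. organization id, default roles, default cell/VPC
```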
Step b
Mapping between the BSS Portal and CCP will be developed in accordance with the guidance provided by business teams and will be enforced for billing, governance, and resource hierarchy.
Step c
The BSS Portal will serve as the user identity store and provide authentication services. All customer user accounts can be created, modified, and deleted only in the BSS Portal.
Step d
Each customer account will be mapped to only a single Tenant in CCP. Multiple cells can be created within a single Tenant. Nesting of Tenants and cells is currently not allowed.
Step e
Quotas can be applied at the tenant and cell level. All cells inherit the tenant quota by default, as the sketch below illustrates.
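A minimal sketch of the Step e inheritance rule, assuming a cell without an explicit quota falls back to its tenant's quota; class and field names are illustrative.

```python
# Illustrative model of tenant/cell quota inheritance.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tenant:
    quota: dict  # e.g. {"vcpus": 100, "ram_gb": 512}

@dataclass
class Cell:
    tenant: Tenant
    quota: Optional[dict] = None  # None means "inherit from tenant"

    def effective_quota(self) -> dict:
        return self.quota if self.quota is not None else self.tenant.quota

t = Tenant(quota={"vcpus": 100, "ram_gb": 512})
print(Cell(tenant=t).effective_quota())                       # inherited
print(Cell(tenant=t, quota={"vcpus": 20}).effective_quota())  # overridden
```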
Step f
Resource Hierarchy will be maintained as:
Tenant → Cell → Resources
The BSS Portal to CCP mapping is as follows:
| BSS Portal | CCP |
|---|---|
| Party | |
| Billing Account (BA) | |
| Logical Subscriber Identity (LSI) | |
| Tenant | Tenant |
| | Cell |
| | Resources |
Step g
Pre-defined roles will be mapped with the user identities.
6.2 Pre-defined User Roles
| Role | Permissions / Description |
|---|---|
| Tenant Super Administrator | Root User |
| Tenant Administrator | This role has highest privileges in each tenant |
| Tenant Viewer | Read Only Rights for specific organization(s). This role is required for auditing, compliance, and training purposes. |
| Tenant Billing Admin | Access to Quota Usage, metering and showback |
| Cell Administrator | Can raise requests to increase the Cell quota |
| Cell Viewer | Read Only Rights for specific cell(s). This role is required for auditing, compliance, and training purposes. |
| Cell User | Access all services mapped with a cell |
6.3 Service-Specific Pre-defined Roles
| Role | Permissions / Description |
|---|---|
| Cell VM Admin | Access to VM-as-a-service in a Cell |
| Cell VM Reader | Read access to VM-as-a-service in a Cell |
| Cell Block Storage Admin | Access to Block Storage-as-a-service in a Cell |
| Cell Object Storage Admin | Access to Object Storage-as-a-service in a Cell |
| Cell File Storage Admin | Access to File Storage-as-a-service in a Cell |
| Cell Backup Admin | Access to Backup-as-a-service in a Cell |
| Cell Network Admin | Admin access to Network-as-a-service in a Cell |
| Cell Container Admin | Access to Container-as-a-service in a Cell |
| Cell BareMetal Admin | Access to Bare Metal-as-a-service in a Cell |
| Cell Database Admin | Access to Database-as-a-service in a Cell (includes all DBaaS services) |
| Cell InfoSec Admin | Access to Activity Logs, Audit logs |
7. Solution Design
The proposed architecture ensures high availability, fault tolerance, and efficient management for a multi-region Cloud Management platform. The platform is designed to support the CCP application with dual clusters per region, robust failover mechanisms, and global services. The architecture aligns with business continuity goals and promotes optimal resource utilization.
Each Region, consisting of multiple AZs, will run independent Cirrus Cloud Platform components per AZ for all the microservices that manage infrastructure in that AZ. Cirrus Cloud Platform Root account services will run globally and are responsible for aggregating organization-specific data such as metering, quota, and project management. Furthermore, each region runs two Cirrus Cloud Platform global services in active-passive mode, with their databases also in active-passive mode. PostgreSQL and MongoDB clusters will run on virtual machines, with separate DB clusters in each zone operating as active-passive clusters.
7.1 Regional Architecture
Each region contains:
Cluster 1 (Primary) – Availability Zone 1
- Hosts main application services
- Web layer is deployed in 3 virtual machines hosted in DMZ. Web layer acts as reverse proxy to access application hosted in the application layer
- Contains the primary MongoDB database
- Serves as an active cluster during normal operations
Cluster 2 (Standby) – Availability Zone 2
- Hosts replica application services
- Web layer is deployed in 3 virtual machines hosted in DMZ. Web layer acts as reverse proxy to access application hosted in the application layer
- Contains a replica of the MongoDB database
- Remains ready to take over in case of failure in Cluster 1
Failover Mechanism
- Traffic is routed to the passive cluster automatically, and a script promotes the database in the passive cluster if the active cluster goes down
- MongoDB replica sets ensure data consistency during failover within a region
7.2 Global Services
The global service provides multi-region capabilities, ensuring the following:
Organization Onboarding
- Centralized onboarding process replicated across regions
- Ensures consistent user experience and service availability
Metadata Management
- Centralized metadata replicated across three regions
Metadata replicated as a global service component includes:
- Organization and Project Metadata Mappings to Region
- Quota management
- User and Organization Mapping information
- Aggregation of Metering and Usage data for Reporting and Notional Invoice
Active Backup Failover
- A GSLB probe detects the right endpoint for external systems to connect to, allowing fallback to the Backup cluster when the Active cluster is unavailable
- An internal quorum based on a 2n+1 system ensures correct identification of the Active cluster being down (see the sketch below)
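A minimal sketch of the 2n+1 quorum rule referenced above: the Active cluster is declared down only when a strict majority of monitor nodes agree. The function and its inputs are illustrative.

```python
# Illustrative majority vote over 2n+1 monitors.
def active_is_down(votes: list[bool]) -> bool:
    """votes[i] is True if monitor i considers the Active cluster down."""
    assert len(votes) % 2 == 1, "quorum system uses 2n+1 monitors"
    return sum(votes) > len(votes) // 2

print(active_is_down([True, True, False]))   # 2 of 3 agree -> fail over
print(active_is_down([True, False, False]))  # no majority -> stay
```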
Disaster Recovery
- Acts as a coordination point for global failover scenarios
7.3 Multi-AZ Failover
To ensure resilience within a region:
- Both clusters are deployed across multiple availability zones (AZs)
- If an AZ fails, services fail over within the region without impacting overall operations
- Load balancers and DNS routing ensure seamless redirection of traffic to active services
7.4 Extended Cluster for Global Databases
Global services are region-specific and consist mostly of MongoDB collections storing Tenant/Project/User information, hosted on clustered microservices with MongoDB Active-Active replication using change streams.
OpenFGA's PostgreSQL (the DB backend for global AuthZ) and MongoDB (the backend for the global data service) will run in Active-Passive mode between two regions. The system will write to the primary region's OpenFGA by default, as this is a read-heavy database.
There will be 3 VMs in each availability zone, forming a 5-node cluster with an additional virtual machine that can be used as an arbiter/etcd node for switchover in case of AZ failure. Deploying a 3+3 node setup distributes database responsibility evenly across two availability zones and ensures that no single AZ holds a disproportionate share of the cluster's capacity or state. In the event of a failure in one AZ, the surviving AZ retains a full set of 3 nodes, ready to recover operations manually if quorum is lost. Even though quorum (typically 4/6) may break if an entire AZ fails, manual intervention allows safe failover: administrators can force reconfiguration (e.g., reinitiate leader election) in the surviving AZ.
Database Failover: A two-site solution for HA within a region has been adopted because no third region is available for deploying an arbiter node. Failover will be executed with the help of a script to be developed in collaboration between Coredge and the Client (a sketch follows).
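A hedged sketch of what such a script could do for MongoDB when an AZ loss breaks quorum: force a reconfiguration that keeps only the surviving AZ's members so a new primary can be elected. Host names and the member-selection criterion are placeholders; the actual script is to be jointly developed.

```python
# Sketch: forced MongoDB replica-set reconfiguration in the surviving AZ.
from pymongo import MongoClient

# Connect directly to a surviving node (it may be a non-primary).
SURVIVOR = "mongodb://mongo-az2-0.example.com:27017/?directConnection=true"

client = MongoClient(SURVIVOR)
cfg = client.admin.command("replSetGetConfig")["config"]

# Keep only members in the surviving AZ (selection criterion illustrative).
cfg["members"] = [m for m in cfg["members"] if "az2" in m["host"]]
cfg["version"] += 1

# 'force' permits reconfiguration without a primary, enabling a fresh
# leader election among the remaining members.
client.admin.command({"replSetReconfig": cfg, "force": True})
```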
7.5 Backup Strategy
Data from the Active CCP cluster will be continuously backed up into a geo-replicated object storage bucket. Backups of the north region CCP will be stored in the south region and vice versa. A scheduled backup job will be configured for an incremental backup every 30 minutes and a full backup every 24 hours, with a 3-month retention period. The backup data will consist of the following files:
- Keycloak PostgreSQL DB
- Config Mongo DB
- Metrics Mongo DB
- ETCD DB of K8s cluster running CCP
Database clusters hosted on virtual machines will be backed up using the Veritas backup agent, with incremental backups every 30 minutes, a full backup every 24 hours, and a 3-month retention period (see the schedule sketch below).
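An illustrative scheduler for the stated policy, assuming a long-running job: incremental backup every 30 minutes, full backup every 24 hours, and pruning of anything older than roughly 3 months. The backup and prune bodies are stubs; the real jobs would stream to the geo-replicated bucket in the opposite region.

```python
# Sketch of the incremental/full backup cadence with retention pruning.
import time
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)            # ~3-month retention period
INCREMENTAL_EVERY = timedelta(minutes=30)
FULL_EVERY = timedelta(hours=24)

def run_backup(kind: str):
    print(f"{datetime.utcnow().isoformat()} starting {kind} backup")
    # ... dump Keycloak PostgreSQL, Config/Metrics MongoDB, etcd snapshot ...

def prune(now: datetime):
    cutoff = now - RETENTION
    # ... delete objects in the backup bucket older than `cutoff` ...

last_full = datetime.min
while True:
    now = datetime.utcnow()
    if now - last_full >= FULL_EVERY:
        run_backup("full")
        last_full = now
    else:
        run_backup("incremental")
    prune(now)
    time.sleep(INCREMENTAL_EVERY.total_seconds())
```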
7.6 Implementation Considerations
1. Database Replication
- MongoDB Replication: Cluster 1 hosts the primary database and Cluster 2 hosts a replica with automatic synchronization in real time
- PostgreSQL Replication: Each region has an active-standby database for Keycloak and the CCP Application, using Logical Replication (see the sketch after this list)
2. Networking
- Intra-region: High-speed, low-latency networking between AZs ensures seamless failover and data synchronization
- Inter-region: Dedicated network links or VPNs ensure secure and efficient communication between regions
3. Monitoring and Alerting
- Integrated monitoring tools (e.g., Prometheus, Grafana) will track cluster and database health
- Alerts will notify administrators of potential issues, triggering automated recovery workflows where possible
4. Security
- Encryption in transit (mTLS) and at rest (AES-256) for all data
- Role-based access control (RBAC) for applications and databases
- Regular security assessments and compliance checks
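A hedged sketch of the logical-replication setup referenced in item 1, using standard PostgreSQL publication/subscription commands via psycopg2. Connection strings, object names, and the all-tables scope are placeholders; the production configuration would be tuned per database.

```python
# Sketch: PostgreSQL logical replication between active and standby.
import psycopg2

# On the active (publisher) database:
with psycopg2.connect("host=pg-active.example.com dbname=ccp user=admin") as pub:
    pub.autocommit = True
    with pub.cursor() as cur:
        cur.execute("CREATE PUBLICATION ccp_pub FOR ALL TABLES;")

# On the standby (subscriber) database (CREATE SUBSCRIPTION must run
# outside a transaction block, hence autocommit):
with psycopg2.connect("host=pg-standby.example.com dbname=ccp user=admin") as sub:
    sub.autocommit = True
    with sub.cursor() as cur:
        cur.execute(
            "CREATE SUBSCRIPTION ccp_sub "
            "CONNECTION 'host=pg-active.example.com dbname=ccp user=repl' "
            "PUBLICATION ccp_pub;"
        )
```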
8. Pre-Requisites
The pre-requisites below are required for deployment of CCP on a Kubernetes cluster:
- Wildcard SSL certificates for CCP hosting and dynamic customer account URLs
- Load Balancer and VIPs for each CCP endpoint
- DNS Server and credentials to create dynamic domains based on customer accounts
- Accessible Container registry to store container images
- Kubernetes compliant Storage with High IOPS performance
- Connectivity and credentials for SMTP server for email integration
- NTP and DNS server connectivity
- Connectivity and APIs to integrate with the BSS Portal platform
9. Constraints and Dependencies
The Cloud Management Platform solution (i.e., Cirrus Cloud Platform / CCP / Cloud Orbiter) will be deployed in the control plane of each availability zone. It must not be deployed in the workload pod.
10. Exclusions
The following tasks are out of scope for Cirrus Cloud Platform:
- Any hardware procurement and its deployment
- Any software procurement and associated licensing (operating system, database, backup software, management software) and its deployment
- Penetration Testing
- Performance Testing for any other component other than CCP
- Day 2 operations for underlying infrastructure (Compute, Storage, and Network)
- Any application / configuration changes in the BSS Portal
11. RACI Matrix
The table below provides a high-level view of key activities/tasks and the corresponding stakeholders.
R = Responsible | A = Accountable | C = Consulted | I = Informed
| # | Task | R | A | C | I |
|---|---|---|---|---|---|
| 1 | CCP Major / Minor Upgrade | Coredge | Coredge | The Client | The Client |
| 2 | OS patching and upgrades on CCP cluster VMs | The Client | The Client | Coredge | Coredge |
| 3 | CCP Kubernetes Cluster Patching | Coredge | Coredge | The Client | The Client |
| 4 | Infrastructure for Management Cluster | The Client | The Client | Coredge | Coredge |
| 5 | Storage driver plugin details for PVCs in Management Cluster | The Client | The Client | Coredge | Coredge |
| 6 | SSL Certificates and LB configuration for all required domains | The Client | The Client | Coredge | Coredge |
| 7 | Service Description | The Client | The Client | Coredge | Coredge |
| 8 | Rate Card | The Client | The Client | Coredge | Coredge |