Leo · 4 min read
Cloud is the new computer
The cloud should be treated as a single computer with a complete OS

In our previous discussion, FinOps Sucks, FinArch Works, we mentioned that the cloud should be viewed as a single supercomputer. With the widespread adoption of high-performance networks, the bandwidth between cloud nodes now surpasses the local disk read/write speeds of the previous generation of physical servers. From this point on, the cloud should no longer be seen as a cluster of virtual machines but as a supercomputer with virtually unlimited resources. (This is a consensus among cloud computing practitioners worldwide.) However, many practitioners understand this only superficially. How you view the cloud makes the key difference in how you design cloud-based software.
Treating the cloud as a single computer versus treating it as a cluster of virtual machines makes a huge difference. Let’s explore these differences.
Architecture Design
In a distributed cluster, services are deployed; on a single operating system, programs are installed. Services are always running, while programs are triggered by events. Here are the details:
Deployment vs. Startup
Services remain online once deployed, while programs start only when triggered. Services focus on runtime performance without concern for startup speed, whereas programs need to start quickly. For example, your grep command shouldn’t take 5 seconds to warm up.
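A minimal sketch of the two shapes, in Python: the first function follows the AWS Lambda handler convention and runs only when an event arrives, so its startup time is on the critical path; the second is an always-resident HTTP service whose startup cost is paid once and hidden from callers. The event fields, port, and names are illustrative assumptions.

```python
# Program style: started on demand by an event, does its work, exits.
# (Lambda-style handler signature; event fields are illustrative.)
def lambda_handler(event, context):
    pattern = event["pattern"]
    lines = event["lines"]
    return [line for line in lines if pattern in line]

# Service style: a process that stays resident and waits for requests,
# so its startup time is paid once and never seen by callers.
if __name__ == "__main__":
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"always on\n")

    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```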
Stateful vs. Stateless
Services are stateful and provide services through interfaces; programs are stateless and operate via input and output.
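As a small sketch of the program model: all state arrives as input and leaves as output, so nothing has to survive inside the process between runs.

```python
# Stateless "program" model: state enters via stdin and leaves via stdout;
# the process keeps nothing between runs.
import json
import sys

def main():
    job = json.load(sys.stdin)              # all required state comes in as input
    result = {"total": sum(job["values"])}  # compute
    json.dump(result, sys.stdout)           # all produced state goes out as output

if __name__ == "__main__":
    main()
```

Run it like any OS utility: `echo '{"values": [1, 2, 3]}' | python sum_job.py`.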
Scaling Resources Proportionally vs. On-Demand Allocation
Even with Kubernetes, services can only scale proportionally, with each unit needing equal resources. Programs, however, can scale resources on-demand without proportionality.
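A rough sketch of the difference, using a hypothetical `submit_job` batch API (not a real SDK call): each program run requests exactly the resources its own input needs, instead of adding identical replicas.

```python
# Hypothetical batch API: submit_job stands in for whatever scheduler you use.
def submit_job(command: str, vcpus: int, memory_gb: int) -> None:
    print(f"submit: {command} (vcpus={vcpus}, memory={memory_gb}GB)")

def size_for(input_bytes: int) -> tuple[int, int]:
    # Allocate in proportion to this particular input, not per replica.
    gb = max(1, input_bytes // (1024 ** 3))
    return min(16, gb), min(64, 4 * gb)

for dataset, size in [("a.parquet", 2 * 1024**3), ("b.parquet", 40 * 1024**3)]:
    vcpus, mem = size_for(size)
    submit_job(f"process {dataset}", vcpus=vcpus, memory_gb=mem)
```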
Optimizing for Best Performance vs. Minimizing Cost
Services aim to optimize single-instance performance as they occupy static resources, while programs aim to minimize total resource usage over time since they don’t persistently consume resources.
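A back-of-the-envelope comparison, with assumed prices and traffic, just to show why the two models optimize for different things:

```python
# Illustrative numbers only: prices and traffic are assumptions, not quotes.
hours_per_month = 730

# Always-on service: pays for reserved capacity regardless of load.
service_instances = 2
service_price_per_hour = 0.05
service_cost = service_instances * hours_per_month * service_price_per_hour

# Event-triggered program: pays only for compute-seconds actually used.
invocations = 1_000_000
seconds_per_invocation = 0.2
gb_memory = 0.5
price_per_gb_second = 0.0000167  # roughly Lambda-like, treated as an assumption
program_cost = invocations * seconds_per_invocation * gb_memory * price_per_gb_second

print(f"service: ${service_cost:.2f}/month")  # ~$73.00
print(f"program: ${program_cost:.2f}/month")  # ~$1.67
```

With these assumptions the service model optimizes how much it gets out of the two instances it is paying for anyway, while the program model optimizes total compute-seconds consumed.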
Installation and Deployment
Maintaining a set of distributed services differs significantly from maintaining a program on an operating system:
Cross-Cluster Compatibility vs. Cross-OS Compatibility
For cross-cluster applications, we maximize commonality so that deployment requires no code changes. Across operating systems, we use the different APIs each OS provides and encapsulate the differences behind a common interface. For example, high-performance network programs use epoll on Linux, kqueue on BSD, and I/O completion ports on Windows.
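Python's standard `selectors` module is one concrete example of this kind of encapsulation: `DefaultSelector` resolves to epoll on Linux and kqueue on BSD/macOS, while the program is written once against the selector interface.

```python
# The stdlib selectors module hides the per-OS readiness API (epoll, kqueue, ...)
# behind one interface; DefaultSelector picks the best mechanism available.
import selectors
import socket

sel = selectors.DefaultSelector()
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

print("backend:", type(sel).__name__)  # e.g. EpollSelector on Linux, KqueueSelector on BSD/macOS
for key, events in sel.select(timeout=0.1):
    conn, _ = key.fileobj.accept()
    conn.close()
```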
Treating VMs as Computers vs. Treating VMs as Processes
In a cluster view, VMs are like servers and should be treated like physical servers, hosting multiple services. However, if viewed as a process, a VM should host a single service with minimal configuration.
Cloud Functions as Processes vs. VMs as Long-Lifecycle Cloud Functions
From a process perspective, AWS Lambda functions start quickly but typically run on modest hardware. A VM, in turn, should be seen as a long-lifecycle process with more compute and memory, stronger local storage, and a longer runtime.
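A sketch of that view using boto3: the VM boots, runs one job, and terminates itself, like a heavyweight function invocation. The AMI ID, instance type, and job command are placeholders.

```python
# Sketch: launch a VM as a long-lifecycle "function": it boots, runs one job,
# then terminates itself. AMI ID, instance type, and job command are placeholders.
import boto3

user_data = """#!/bin/bash
/usr/local/bin/run-my-job   # placeholder for the actual workload
shutdown -h now             # with 'terminate' behavior below, shutdown removes the VM
"""

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                 # placeholder
    InstanceType="c6i.4xlarge",                      # more CPU/memory than a typical function
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    InstanceInitiatedShutdownBehavior="terminate",   # instance disappears when the job ends
)
```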
Updating a Set of Services vs. Updating an Application
Code is constantly updated. Upgrading running services involves preparatory and follow-up tasks, whereas upgrading a program is simpler, akin to software upgrades on an OS. Since programs have limited lifecycles, starting with new code suffices for upgrades.
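One way this looks in practice, assuming an AWS Lambda function named `report-job` and a prebuilt zip (both placeholders): upgrading is just publishing new code, and the next run starts from it with no drain or rollover choreography.

```python
# Sketch: upgrading a "program" is publishing new code; the next run uses it.
# Function name and zip path are placeholders.
import boto3

with open("build/report-job.zip", "rb") as f:
    boto3.client("lambda").update_function_code(
        FunctionName="report-job",
        ZipFile=f.read(),
    )
# No draining, no rolling restart: invocations that start after this call
# run the new code for their whole (short) lifecycle.
```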
Security and Data Privacy
Interface Security vs. Data Security
Services secure their interface calls, often by restricting access to known IP ranges, a simplistic approach that leaks as deployments change. A short-lived application program, by contrast, secures itself and the permissions on the data it depends on.
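A sketch of the second model with boto3 and STS: the short-lived program obtains temporary credentials scoped to exactly the data it needs. The role ARN, bucket, and prefix are placeholders.

```python
# Sketch: a short-lived program gets temporary credentials scoped to exactly
# the data it needs, instead of relying on IP allow-lists at the interface.
# Role ARN and bucket/prefix are placeholders.
import json
import boto3

scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/jobs/job-42/*",
    }],
}

creds = boto3.client("sts").assume_role(
    RoleArn="arn:aws:iam::123456789012:role/job-runner",   # placeholder
    RoleSessionName="job-42",
    Policy=json.dumps(scoped_policy),   # session policy narrows the role further
    DurationSeconds=900,                # credentials expire with the job
)["Credentials"]
```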
Logical Multi-User Isolation vs. Physical Multi-User Isolation
Services require extensive code-level isolation to handle multiple users sharing the same service, ensuring one user doesn’t monopolize resources. In contrast, each user’s application operates as its own process, with inherent data and resource isolation provided by the cloud OS.
Asset Management
CMDB vs. Billing Management
Clusters manage assets and the mapping from resources to business owners with a CMDB. Viewing the cloud as an OS, however, there is no static inventory of servers to record; resource usage is dynamic and is tracked through logs and the billing service.
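A sketch of that attribution through the billing service rather than a CMDB, using boto3's Cost Explorer client; the date range and the `team` cost-allocation tag are assumptions.

```python
# Sketch: attribute dynamic resource usage via the billing service instead of a
# static CMDB. Assumes resources carry a "team" cost-allocation tag.
import boto3

ce = boto3.client("ce")  # Cost Explorer
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "team"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```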
Default Exclusive Resource Use vs. Default Low Resource Use
Services typically monopolize the server/container’s resources. In an OS, even a daemon program uses minimal resources, only allocating what’s needed on demand.
Conclusion
Viewing from Different Angles
Abstracting and analogizing are steps we use to understand new concepts through our experiences. Our cognitive abilities determine how we abstract and analogize.
Given that operating systems are one of the greatest abstractions in computer science history, designing cloud systems from the “cloud is the new computer” perspective fully leverages cloud elasticity and native services, minimizing resource waste. The Cloud-First architecture proposed in What is a Cloud-First Architecture? is based on this abstraction. Interestingly, the cloud is not just a computer but a NUMA-architecture computer. Best practices for using the cloud are akin to programming on a NUMA-architecture supercomputer. We’ll delve into more details in later articles in this series.



