This post is part of the "Automation-Orchestration" architecture series. Posts of this series together comprise a whitepaper on Automation and Orchestration for Innovative IT-aaS Architectures.
Scalability criteria are often reduced to “load scalability”, meaning the ability of a solution to automatically adapt to increasing and decreasing consumption or execution demands. An automation solution clearly needs this capability, but scalability covers more than load. Hence, the following aspects will be discussed within this chapter:
- load scalability
- administrative scalability
- functional scalability
Back when cloud computing was still striving for the one and only definition of itself, one key criterion became clear very quickly: cloud should be defined mainly by virtually infinite resources to be consumed by whatever means. In other words, cloud computing promised to transform the IT environment into a landscape of endlessly scalable services.
Today, whenever IT executives and decision makers consider the deployment of any new IT service, scalability is one of the major requirements. Scalability offers the ability for a distributed system to easily expand and contract its resource pool to accommodate heavier or lighter loads or number of inputs. It also determines the ease with which a system or component can be modified, added, or removed to accommodate changing load.
Most IT decisions today are based on the question of whether a solution can meet this definition. This is why scalability should be a high-priority requirement when deciding on an automation solution, especially one intended as the automation, orchestration and deployment layer of your future (hybrid) IT framework. To determine a solution’s load scalability capabilities, one must examine the system’s architecture and how it handles the following key functions:
Expanding system resources
A scalable automation solution architecture allows adding and withdrawing resources seamlessly, on demand, and with no downtime of the core system.
The importance of a central management architecture will be discussed later in this paper; for now it is sufficient to understand that a centralized engine architecture derives its main load scalability characteristics from its technical process architecture.
As an inherent characteristic, such an architecture consists of thoroughly separated worker processes, each having the same basic capabilities but, depending on the system’s functionality, acting differently (i.e. each serving a different execution function). Depending on technical automation requirements, a worker process could be assigned to some of the following tasks (see figure below for an overview of such an architecture):
- worker process management (a “worker control process” capability that needs to be built redundantly to allow for seamless handover in case of failure)
- access point for log-on requests to the central engine (one of “Worker 1..n” below)
- requests from a user interface instance (“UI worker process”)
- workload automation synchronization (“Worker 1..n”)
- reporting (“Worker 1..n”)
- integrated authentication with remote identity directories (“Worker 1..n”)
At the same time, command and communication handling should be technically separated from automation execution handling and be spread across multiple “Command Processes” as well – all acting on and providing the same capabilities. This will keep the core system responsive and scalable in case of additional load.
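As a rough illustration of this separation, the following Python sketch routes incoming tasks either to a pool of identical worker processes or to a pool of identical command processes. All names (`Engine`, `Process`, the task kinds) are hypothetical and only mirror the task list above; a real engine would use separate operating system processes and network messaging rather than in-memory objects.

```python
from dataclasses import dataclass, field
from itertools import cycle

# Hypothetical task kinds, loosely based on the task list above.
EXECUTION_TASKS = {"logon", "ui_request", "workload_sync", "reporting", "authentication"}
COMMAND_TASKS = {"command", "endpoint_message"}


@dataclass
class Process:
    """Stand-in for one engine process; it only records what it was given."""
    name: str
    handled: list = field(default_factory=list)

    def handle(self, task: str) -> None:
        self.handled.append(task)


class Engine:
    """Keeps execution handling and command handling in separate pools."""

    def __init__(self, workers: int, command_processes: int) -> None:
        self.workers = [Process(f"worker-{i}") for i in range(1, workers + 1)]
        self.command_processes = [
            Process(f"cp-{i}") for i in range(1, command_processes + 1)
        ]
        # Round-robin iterators: every process in a pool has the same capabilities.
        self._next_worker = cycle(self.workers)
        self._next_cp = cycle(self.command_processes)

    def dispatch(self, task: str) -> str:
        """Hand the task to the next process of the matching pool."""
        if task in COMMAND_TASKS:
            target = next(self._next_cp)
        elif task in EXECUTION_TASKS:
            target = next(self._next_worker)
        else:
            raise ValueError(f"unknown task kind: {task}")
        target.handle(task)
        return target.name


engine = Engine(workers=2, command_processes=2)
print(engine.dispatch("reporting"))  # worker-1
print(engine.dispatch("command"))    # cp-1
print(engine.dispatch("logon"))      # worker-2
```

Because each pool is addressed only through its round-robin iterator, a burst of command traffic in this sketch never occupies a worker, which is the property the paragraph above describes.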
Scalable and highly available automation architecture
The architecture described in the figure above shows its core differentiators in each of the following load change scenarios:
Dynamically adding physical/virtual servers to alter CPU, memory and disk space or increasing storage for workload peaks without downtime
With the above architecture, reacting to changing load simply means adding or withdrawing physical (or virtualized/cloud-based) resources to or from the infrastructure running the core automation system. With processes acting redundantly on the “separation of concerns” principle, it is possible either to provide more resources to run other jobs or to add jobs to the core engine (even when it is physically running on a distributed infrastructure).
This should take place without downtime of the core automation system, ensuring not only rapid reaction to load changes but also high resource availability (to be discussed in a later chapter).
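The idea of adding and withdrawing capacity without stopping the core system can be sketched with a small thread-based worker pool. This is a simplified illustration under stated assumptions, not the design of any particular product: `ElasticPool` and its methods are hypothetical names, threads stand in for worker processes on additional servers, and jobs are assumed not to raise exceptions.

```python
import queue
import threading


class ElasticPool:
    """Worker pool whose size can change while jobs keep flowing."""

    def __init__(self) -> None:
        self.jobs: queue.Queue = queue.Queue()
        self.results: queue.Queue = queue.Queue()
        self._workers: list = []
        self._lock = threading.Lock()

    def _run(self, stop: threading.Event) -> None:
        # Poll with a short timeout so a stop request is noticed quickly.
        while not stop.is_set():
            try:
                job = self.jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            self.results.put(job())
            self.jobs.task_done()

    def add_worker(self) -> None:
        """Scale out: start one more worker; running workers are untouched."""
        stop = threading.Event()
        thread = threading.Thread(target=self._run, args=(stop,), daemon=True)
        with self._lock:
            self._workers.append((thread, stop))
        thread.start()

    def remove_worker(self) -> None:
        """Scale in: let one worker finish its current job, then retire it."""
        with self._lock:
            if not self._workers:
                return
            thread, stop = self._workers.pop()
        stop.set()
        thread.join()

    def size(self) -> int:
        with self._lock:
            return len(self._workers)


pool = ElasticPool()
pool.add_worker()
for n in range(5):
    pool.jobs.put(lambda n=n: n * n)
pool.add_worker()     # scale out while jobs are still queued
pool.jobs.join()      # wait until every queued job has finished
pool.remove_worker()  # scale in; the remaining worker keeps serving
print(pool.size())    # 1
```

The job queue is never closed or recreated when the pool size changes, so submitters in this sketch do not notice that capacity was added or withdrawn.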
Change the number of concurrently connected endpoints to one instance of the system
At any time during system uptime it might become necessary to handle load requirements by increasing the number of system automation endpoints (such as agents) connected to one specific job instance. This is possible only if concurrently acting processes are made aware of changing endpoint connections and are able to redistribute load among other running jobs seamlessly and without downtime. The architecture described above allows for such scenarios, whereas a separated, less integrated core engine would demand reconfiguration when adding endpoints beyond a certain number.
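One way to picture this redistribution is a balancer that assigns each new endpoint to the least-loaded process and evens out the load whenever a process is added. The sketch below is an assumption-laden illustration: `EndpointBalancer`, the process names and the "move one endpoint at a time" strategy are hypothetical and not taken from a specific automation product.

```python
class EndpointBalancer:
    """Tracks which process serves which endpoint and keeps the load even."""

    def __init__(self, processes: list) -> None:
        self.assignments = {name: set() for name in processes}

    def connect(self, endpoint: str) -> str:
        """Attach a new endpoint to the currently least-loaded process."""
        target = min(self.assignments, key=lambda p: len(self.assignments[p]))
        self.assignments[target].add(endpoint)
        return target

    def add_process(self, name: str) -> None:
        """Add capacity at runtime and move endpoints over to it."""
        self.assignments[name] = set()
        self._rebalance()

    def _rebalance(self) -> None:
        # Move one endpoint at a time from the busiest to the idlest process
        # until their loads differ by at most one.
        while True:
            busiest = max(self.assignments, key=lambda p: len(self.assignments[p]))
            idlest = min(self.assignments, key=lambda p: len(self.assignments[p]))
            if len(self.assignments[busiest]) - len(self.assignments[idlest]) <= 1:
                return
            endpoint = min(self.assignments[busiest])  # deterministic choice
            self.assignments[busiest].remove(endpoint)
            self.assignments[idlest].add(endpoint)

    def load(self) -> dict:
        return {name: len(eps) for name, eps in self.assignments.items()}


balancer = EndpointBalancer(["cp-1", "cp-2"])
for i in range(10):
    balancer.connect(f"agent-{i}")
print(balancer.load())  # {'cp-1': 5, 'cp-2': 5}
balancer.add_process("cp-3")
print(balancer.load())  # {'cp-1': 3, 'cp-2': 4, 'cp-3': 3}
```

No endpoint is dropped during the rebalance; each one is only handed from one process to another, which corresponds to redistribution without downtime.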
Endpoint reconnection following an outage
Even if the solution meets the criteria of maximum availability, outages may occur. A load scalable architecture is a key consideration when it comes to disaster recovery. This involves the concurrent boot-up of a significant number of remote systems including their respective automation endpoints. The automation solution therefore must allow for concurrent re-connection of several thousand automation endpoints within minutes of an outage in order to resume normal operations.
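To make the reconnection requirement more tangible, here is a minimal sketch of an engine accepting several thousand endpoint reconnections concurrently while capping how many handshakes run at the same moment. The function names, the limit of 500 parallel handshakes and the simulated delay are assumptions chosen for illustration; a real system would perform network I/O where the sketch sleeps.

```python
import asyncio


async def reconnect(endpoint_id: int, limiter: asyncio.Semaphore, state: dict) -> None:
    """Simulate one endpoint handshake; the sleep stands in for network I/O."""
    async with limiter:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.001)
        state["connected"].add(endpoint_id)
        state["active"] -= 1


async def reconnect_all(endpoint_ids, max_parallel: int = 500) -> dict:
    """Reconnect all endpoints concurrently, at most max_parallel at a time."""
    limiter = asyncio.Semaphore(max_parallel)
    state = {"active": 0, "peak": 0, "connected": set()}
    await asyncio.gather(*(reconnect(e, limiter, state) for e in endpoint_ids))
    return state


result = asyncio.run(reconnect_all(range(5000)))
print(len(result["connected"]), result["peak"] <= 500)  # 5000 True
```

Bounding the number of parallel handshakes is one common way to keep the core engine responsive during a reconnection storm; whether a given product uses this approach is not stated in this paper.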
While load scalability is the most commonly discussed topic when it comes to key IT decisions, other scalability criteria also differentiate automation solutions. One is “Administrative Scalability”, defined as the ability for an increasing number of organizations or users to easily share a single distributed system.
Organizational unit support
A system is considered administratively scalable when it is capable of:
- Logically separating organizational units within a single system. This capability is generally understood as “multi-client” or “multi-tenancy”.
- Providing one central administration interface (UI + API) for system maintenance and onboarding of new organizations and/or users.
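The two capabilities above can be sketched as a single system object that keeps every job inside exactly one client and exposes one central API for onboarding. The class and method names (`AutomationSystem`, `onboard_client`, `define_job`, `list_jobs`) and the client identifiers are hypothetical; the sketch only shows the logical separation, not authentication or persistence.

```python
class AutomationSystem:
    """One shared system; every object lives inside exactly one client."""

    def __init__(self) -> None:
        self._clients: dict = {}

    def onboard_client(self, client_id: str, users: list) -> None:
        """Central administration call: create a new, empty client."""
        if client_id in self._clients:
            raise ValueError(f"client {client_id} already exists")
        self._clients[client_id] = {"users": set(users), "jobs": {}}

    def _authorize(self, client_id: str, user: str) -> dict:
        # A user may only act inside a client he or she belongs to.
        client = self._clients.get(client_id)
        if client is None or user not in client["users"]:
            raise PermissionError(f"{user} has no access to {client_id}")
        return client

    def define_job(self, client_id: str, user: str, name: str, command: str) -> None:
        self._authorize(client_id, user)["jobs"][name] = command

    def list_jobs(self, client_id: str, user: str) -> list:
        return sorted(self._authorize(client_id, user)["jobs"])


system = AutomationSystem()
system.onboard_client("client-100", ["alice"])
system.onboard_client("client-200", ["bob"])
system.define_job("client-100", "alice", "nightly-backup", "run backup")
print(system.list_jobs("client-100", "alice"))  # ['nightly-backup']
print(system.list_jobs("client-200", "bob"))    # []
```

In this sketch, a call such as `system.list_jobs("client-100", "bob")` raises `PermissionError`, because every access path goes through the same client check.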
Endpoint connection from different network segments
Another aspect of administrative scalability is the ability of an automation solution to seamlessly connect endpoints from various organizational sources.
In large enterprises, multiple clients (customers) might be organizationally and network-wise separated. Organizational units are well hidden from each other or are routed through gateways when needing to connect. However, the automation solution is normally part of the core IT system serving multiple or all of these clients. Hence, it must allow for connectivity between processes and endpoints across the aforementioned separated sources. The established secure client network delineation must be kept in place, of course. One approach for the automation solution is to provide special dedicated routing (end)points capable of bridging the network separation via standard gateways and routers but only supporting the automation solution’s connectivity and protocol needs.
Seamless automation expansion for newly added resources
While the previously mentioned selection criteria for automation systems are based on “segregation”, another key decision criterion is based on harmonization and standardization.
An automation system can be considered administratively scalable when it is capable of executing the same, one-time-defined automation process on different endpoints within segregated environments.
The solution must be able to:
- Add an organization and its users and systems seamlessly from any segregated network source.
- Provide a dedicated management UI with those capabilities that is securely accessible to the organization’s admin users only.
and at the same time
- Define the core basic automation process only once and distribute it to new endpoints based on (organization-based) filter criteria.
The architecture thereby allows for unified process execution (implement once, serve all), administrative scalability and efficient automation.
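The “implement once, serve all” idea can be illustrated with a process definition that carries its own filter criteria, so that newly added endpoints are picked up without redefining the process. The data model below (`Endpoint`, `ProcessDefinition`, `select_targets`, and the attribute names) is an assumption made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    organization: str
    platform: str


@dataclass(frozen=True)
class ProcessDefinition:
    """An automation process defined once, plus criteria selecting its targets."""
    name: str
    steps: tuple
    criteria: dict  # endpoint attribute -> required value


def select_targets(definition: ProcessDefinition, endpoints: list) -> list:
    """Return the endpoints whose attributes match every filter criterion."""
    return [
        e for e in endpoints
        if all(getattr(e, attr) == value for attr, value in definition.criteria.items())
    ]


patching = ProcessDefinition(
    name="monthly-patching",
    steps=("snapshot", "install updates", "reboot", "verify"),
    criteria={"platform": "linux"},
)
endpoints = [
    Endpoint("web-01", "org-a", "linux"),
    Endpoint("db-01", "org-a", "windows"),
    Endpoint("web-07", "org-b", "linux"),
]
print([e.name for e in select_targets(patching, endpoints)])  # ['web-01', 'web-07']
```

Adding an endpoint for a new organization only extends the `endpoints` list; the definition itself stays untouched. Narrowing the criteria, for example to `{"platform": "linux", "organization": "org-b"}`, gives an organization-based filter.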
Functional scalability, defined as the ability to enhance the system by adding new functionality at minimal effort, is another scalability characteristic that should be included in the decision-making process.
The following are key components of functional scalability:
Enhance functionality through customized solutions
Once the basic processes are automated, IT operations staff can add significant value to the business by incorporating other dedicated IT systems into the automation landscape. Solution architects are faced with a multitude of different applications, services, interfaces, and integration demands that can benefit from automation.
A functionally scalable automation solution supports these scenarios out-of-the-box with the ability to:
- Introduce new business logic to existing automation workflows or activities by means of simple and easy-to-adopt mechanisms without impacting existing automation or target system functions.
- Allow for creation of interfaces to core automation workflows (through use of well-structured APIs) in order to ease integration with external applications.
- Add and use parameters and/or conditional programming/scripting to adapt the behavior of existing base automation functions without changing the function itself.
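The last point, adapting behavior through parameters without changing the base function, can be sketched with Python's `functools.partial`. The base step `transfer_file`, its parameters and the notification hook are hypothetical examples; the function only returns a log of what it would do instead of touching any real system.

```python
from functools import partial


def transfer_file(source: str, target: str, *, retries: int = 1, on_success=None) -> list:
    """Base automation step, defined once. Returns a log of its actions."""
    log = [f"copy {source} -> {target} (retries={retries})"]
    if on_success is not None:
        # Optional hook: new business logic plugs in here without editing the step.
        log.append(on_success(target))
    return log


def notify_operations(target: str) -> str:
    return f"notify operations: {target} arrived"


# Adapted variants: the base function itself stays unchanged.
robust_transfer = partial(transfer_file, retries=5)
transfer_and_notify = partial(transfer_file, on_success=notify_operations)

print(robust_transfer("a.csv", "/inbox/a.csv"))
# ['copy a.csv -> /inbox/a.csv (retries=5)']
print(transfer_and_notify("b.csv", "/inbox/b.csv"))
# ['copy b.csv -> /inbox/b.csv (retries=1)', 'notify operations: /inbox/b.csv arrived']
```

Existing callers of `transfer_file` are unaffected by the new variants, which mirrors the requirement of not impacting existing automation.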
Template-based implementation and template transfer
A functionally scalable architecture also enables the use of templates for implementing functionality and sharing/distributing it accordingly.
Once templates have been established, the automation solution should provide a way to transfer these templates between systems or clients. This could be supported through either scripting or solution packaging. An additional value-add (if available) is the ability to share readily tested implementations within an automation community.
Typical use-cases include but are not limited to:
- Development-test-production deployment of automation packages.
- Multi-client scenarios with well-segregated client areas with similar baseline functionality.
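A template transfer of this kind could look roughly like the following sketch, in which templates are exported from one system into a package and imported into another. The JSON package layout, the function names and the template contents are assumptions for illustration; actual products define their own package formats.

```python
import json


def export_package(templates: dict, names: list) -> str:
    """Serialize the selected templates into a transferable package."""
    selected = {name: templates[name] for name in names}
    return json.dumps({"format": 1, "templates": selected}, sort_keys=True)


def import_package(package: str, target: dict, overwrite: bool = False) -> list:
    """Load a package into the target system; return the names that were imported."""
    data = json.loads(package)
    imported = []
    for name, body in data["templates"].items():
        if name in target and not overwrite:
            continue  # keep the target's own version unless told otherwise
        target[name] = body
        imported.append(name)
    return imported


development = {
    "restart-service": {"steps": ["stop", "start", "check"]},
    "rotate-logs": {"steps": ["compress", "archive"]},
}
production = {"rotate-logs": {"steps": ["archive"]}}

package = export_package(development, ["restart-service", "rotate-logs"])
print(import_package(package, production))  # ['restart-service']
```

The `overwrite` flag makes the promotion decision explicit: by default, a template that already exists in the target system or client is left as it is.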
Customizable template libraries with graphical modeler
In today’s object- and service-based IT landscapes, products that rely on scripting are simply not considered functionally scalable. When parameterization, conditional programming, and/or object reuse are available through scripting only, scaling the automation implementation becomes time-consuming, complex, and unsustainable. Today’s functionally scalable solutions use graphical modelers to create the object instances, workflows, templates, and packages that enable business efficiency and rapid adaptation to changing business requirements.
Finally, consider the following question as a decision basis for selecting a highly innovative, cloud-ready, scalable automation solution:
What is the minimum and maximum workflow load the solution can execute without architectural change? If the answer is anywhere from 100 up to 4 to 5 million concurrent jobs per day without a change of the basic setup or architecture, one is good to go.
In other words: scalable automation architectures not only support the aforementioned key scalability criteria but also handle a relatively small number of concurrent flows just as well as a very large one. The infrastructure footprint needed for the deployed automation solution must obviously adapt accordingly.
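To put the quoted range into perspective, the daily figures can be converted into average rates per second. The calculation below assumes, purely for illustration, that jobs are spread evenly over 24 hours; real workloads have peaks, so the required peak capacity would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_jobs_per_second(jobs_per_day: int) -> float:
    """Average execution rate if jobs were spread evenly over one day."""
    return jobs_per_day / SECONDS_PER_DAY


for jobs_per_day in (100, 4_000_000, 5_000_000):
    rate = average_jobs_per_second(jobs_per_day)
    print(f"{jobs_per_day:>9,} jobs/day = {rate:.4f} jobs/s on average")
```

Under this assumption, 5 million jobs per day average out to roughly 58 jobs per second, while 100 jobs per day correspond to about one job every 14 minutes. The same architecture is expected to cover both ends of that range.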