Development Operations, or “DevOps”, is a software engineering philosophy that espouses the automation of infrastructure provisioning and software deployment. Infrastructure includes everything needed to run your application – language runtimes, application servers, message queue managers, database systems, load balancers, LDAP services, SMTP servers, etc. In other words, the “Full Stack”. It is into this environment that the custom application is installed and tested once built.
Building out the runtime environment in this context involves starting with vanilla hosts, either physical or virtual, and installing all the software required to run one or more applications at scale, including the custom application we are responsible for building.
This activity can be divided into three discrete phases: provisioning, deployment, and orchestration.
Provisioning is in the domain of operations and involves installing and configuring the system software required to support the application. Traditionally this could be a JVM, application server, queue manager and database server. With the trend towards containerized applications, this activity evolves into installing the Docker daemon and agents for a distributed orchestration system like Apache Mesos, Docker Swarm or Kubernetes.
In the past this was an entirely manual process performed by system administrators. As environments became virtualized, this activity scaled to the point of self-service, at least in the public cloud space. With the evolution towards cloud native architectures, the pool of hosts in use now runs into the hundreds if not thousands. This kind of scale demands tools such as Chef, Puppet, Ansible and SaltStack to automate the provisioning process.
Generic vs Domain-specific vs Active Provisioning Tools
I’m struggling to think of appropriate labels for each of these three categories and I’m not entirely satisfied with what I have here, but for the time being this is what I’ll go with.
Generic or basic provisioning is really just shell scripts remotely manipulating a host’s configuration via SSH. This is a generic remote execution method, not something tailored to configuration management. The interaction can be wrapped by a Gradle SSH plugin or a tool like Fabric, but otherwise represents the lowest level of tool complexity – although the tool’s simplicity may force complex scripts for tasks that more sophisticated tools accomplish with a single directive.
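As a minimal sketch of this lowest-complexity approach, the following shells out to `ssh` from Python; the host name, user and package list are purely illustrative. Note how every step is imperative – nothing here is declarative or idempotent, which is exactly what the more sophisticated tools add.

```python
import subprocess

def ssh_command(host, remote_cmd, user="deploy"):
    """Build the argument list for running remote_cmd on host via ssh."""
    return ["ssh", f"{user}@{host}", remote_cmd]

def provision(host, packages):
    """Imperatively install packages over SSH. Unlike a configuration
    management tool, this just runs commands and assumes the host was
    in the expected starting state; rerunning it may not be safe."""
    install = "sudo apt-get install -y " + " ".join(packages)
    return subprocess.run(ssh_command(host, install), check=True)
```

A tool like Fabric essentially wraps this same pattern with nicer ergonomics (host lists, parallel execution, sudo handling), but the underlying model is the same remote shell.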
Domain-specific is where the bulk of provisioning tools are categorized, including Chef, Puppet, Ansible, and SaltStack. These tools generally expose a fully fledged programming language (like Ruby or Python) and/or provide a DSL (domain-specific language) targeted at configuration management. The goal for all of them is “Infrastructure as Code”: being able to commit the instructions for infrastructure setup and configuration to source control. While all generic tools are agentless, these domain-specific tools may use agents that are provisioned to each managed node.
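To make the “Infrastructure as Code” idea concrete, a minimal Ansible playbook might look like the following; the inventory group, package names and service name are assumptions for illustration, not taken from any real environment. Each task declares a desired state rather than a command to run, so re-applying the playbook is idempotent.

```yaml
# Hypothetical playbook: ensure a JVM and an application server are present.
- hosts: appservers          # assumed inventory group
  become: yes
  tasks:
    - name: Install OpenJDK
      apt:
        name: openjdk-8-jdk
        state: present       # declarative: "present", not "install"
    - name: Ensure Tomcat is running
      service:
        name: tomcat8
        state: started
        enabled: yes
```

Compare this with the equivalent shell script: the playbook needs no checks for whether the package is already installed or the service already running – the tool reconciles actual state with declared state.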
BOSH and tools like it are a special breed of configuration management tools that cross over into the realm of active runtime monitoring and management, a scope generally reserved for the IaaS (for VMs) or PaaS (for containers) domains. This is a pretty confined category, and I wonder whether BOSH is its only member.
Matt Reider describes other provisioning/configuration management tools versus BOSH pretty well. A diagram he tweeted really illustrates how much more sophisticated a tool like BOSH is compared to the run-once static recipes of the majority of provisioning tools.
In a sense it is a meta-tool, since it can also punch out to static provisioning tools like Chef or Puppet, which offer rich configuration-management-specific DSLs and can themselves invoke basic shell scripts if needed. Inception meets Configuration Management.
Application deployment occurs when a stable build is installed into the runtime environment. In days past there was typically a “Build Master” who would gather up all changes, manage any needed merges and build the application to produce a binary, which was then manually installed into the runtime environment. More recently this role has been replaced by automation, which all falls under the banner of Continuous Integration (CI). CI has its own phases of evolution, with the ultimate being Continuous Deployment (CD). Of course not every commit triggers a new production or even test server deployment, but the concept of always being ready to do so is described as “Continuous Delivery”.
Build servers such as Jenkins or the Jazz Build Engine, in conjunction with build scripts described with Gradle, Ant, Maven POMs, Grunt or Gulp, become the new “build master”. Automated deployment can be performed either with dedicated deployment tools or with provisioning tools.
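The build server’s role can be sketched as a loop over ordered stages where any failing stage aborts the run, much as a CI server treats sequential build steps. The stage names and gradle commands below are placeholders, and the command runner is injectable so the sketch can be exercised without a real build.

```python
import subprocess

# Ordered pipeline stages; the gradle commands are placeholders for
# whatever the project's build scripts actually define.
STAGES = [
    ("build", "gradle assemble"),
    ("test", "gradle test"),
    ("package", "gradle distZip"),
]

def run_pipeline(stages, runner=None):
    """Run each stage in order, recording (name, returncode) pairs and
    stopping at the first failure, as a CI server does."""
    run = runner or (lambda cmd: subprocess.run(cmd, shell=True))
    results = []
    for name, cmd in stages:
        rc = run(cmd).returncode
        results.append((name, rc))
        if rc != 0:
            break  # a red stage aborts the pipeline
    return results
```

A real CI server adds the pieces around this loop: SCM polling or webhooks to trigger it, artifact archiving, test-report publishing, and notification of failures.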
Orchestration, or runtime management, must at a minimum ensure that the deployed application remains running and load is distributed among the healthy instances. This requires the ability to elastically scale the application, check that instances are healthy and restart those that are not (sometimes on a completely different physical node). Relocating application runtime instances requires a dynamic service registration and discovery mechanism. Other base features include workload isolation and network management. Beyond these core features, a full-featured distributed runtime management system includes such things as security, operational analytics, log aggregation and persistent state management.
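The keep-it-running guarantee can be pictured as a reconciliation loop: compare the desired instance count against the currently healthy instances and start replacements for the shortfall. This is a toy model of the idea, not any particular scheduler’s API; the dictionary shape of an “instance” is invented for the sketch.

```python
def reconcile(desired_count, instances, start_instance):
    """One reconciliation pass: discard unhealthy instances and start
    replacements (possibly on a different node) until the desired
    count of healthy instances is met."""
    healthy = [i for i in instances if i.get("healthy")]
    while len(healthy) < desired_count:
        healthy.append(start_instance())
    return healthy
```

Real orchestrators run a loop like this continuously, which is why a killed container or a failed node is repaired without operator intervention.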
As a whole, all these services can be described as a “cloud operating environment”. Applications need to be redesigned (use the 12 Factors as a guidepost) to be fully integrated into this environment. For Java applications, Spring Cloud and Netflix OSS (Hystrix) provide guidance. Examples of these distributed runtimes are Apache Mesos, Google’s Kubernetes, Docker Swarm, HashiCorp’s Nomad, Lattice, and to a certain extent CoreOS’s Fleet. In all of these cases additional supporting software is needed to fill in the functional gaps of log aggregation, metrics, service registration and discovery, and security. Generally this involves integrating several open source offerings with some proprietary ones and selling the result commercially, as with CoreOS’s Tectonic (CoreOS + Kubernetes), Mesosphere (Mesos + extras), Red Hat’s OpenShift (Kubernetes + extras), Docker Datacenter (Swarm + extras) and Cloud Foundry (Lattice + extras).
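As one hedged example of how an application presents itself to such a runtime, a minimal Kubernetes Deployment manifest might look like this; the name, labels and image are placeholders. The manifest states a desired replica count and lets the platform’s control loops maintain it – the same reconciliation idea described above, expressed declaratively.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app              # placeholder name
spec:
  replicas: 3                   # desired state; the platform keeps 3 running
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: sample-app
          image: example/sample-app:1.0   # placeholder image
```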
From an IBM standpoint, Bluemix is its branded PaaS, available publicly as well as in a dedicated hosted environment (Bluemix Dedicated) and on premise (Bluemix Local). IBM Platform Computing EGO (Enterprise Grid Orchestrator) is another offering, more similar to Mesosphere, which in fact offers integration with Mesos.
DevOps essentially encompasses two activities with as much automation as possible:
- Building and managing a runtime environment (infrastructure provisioning)
- Building, testing and installing a custom application into that environment (deployment)
Provisioning tools fall into three broad categories: generic, domain-specific, and active, with most of the tooling considered “domain-specific”.
Deployment is a continuum of Continuous Integration (CI) capabilities culminating in Continuous Deployment (CD). There are many language-specific tools that support CI/CD. This toolchain is orchestrated by a CI server that integrates with a source control management (SCM) system, launching tasks, building and packaging binaries, executing test suites, generating reports and documentation, and finally deploying automatically.
Orchestration in this context specifically refers to a distributed workload management system. The orchestration system, along with supporting software for network management, log aggregation, service registration and discovery, and optionally security, secret management and operational analytics, together form a Platform as a Service (PaaS).