
Model-Driven Infrastructure for Java Projects


Java software-development projects are our daily business. For us it is a common task to set up different environments that deal with the specific needs of our customers. While we treat every customer individually, we want to keep setup time as low as possible and leverage synergies between projects. This article shows how we achieve that using Puppet for configuration management.

Reusing infrastructure across projects

So how can we reuse infrastructure? Several approaches come to mind:

  • hiring a configuration manager and an admin who agree on a consistent setup and then prepare each environment manually
  • maintaining one working environment, making backups of it and restoring those on each environment – possibly also using virtual-machine snapshots
  • having only one server, setting up the infrastructure there once and then throwing all projects at it

As you know, each of these approaches has its own problems:

  • manual preparation is expensive and takes a long time to prepare and roll out. It is also error-prone, so when rolling out to more than one server you cannot be certain that all environments actually end up in the same state.
  • a backup of an existing system does not give you a well-defined state. You don't know, on a high level, what is installed in which version; you have to log into the system and perhaps ask your package manager to find out. Such a complex artefact as a backup or snapshot is also hard to extend or change, so it is of little use for reusing infrastructure between projects.
  • a single shared server is obviously only feasible for startups or proof-of-concepts. Once your project grows you will need things like test environments, failover and separation for security reasons.

Each of these approaches has drawbacks, so how do we tackle them?

  • we need a non-manual process that takes care of the infrastructure setup. That way we get an efficient, cost-saving approach with reproducible results.
  • we need well-defined states that can be changed or extended through configuration.
  • we should be able to reuse that state between installations, so that each installation can be configured differently and they don't share resources

That's quite a lot – so how do we get there?

Model-driven infrastructure for Java projects

Model-driven infrastructure combines several benefits of configuration management. A model describes the target state of a system; when the model is applied, the difference to the actual state is calculated along with the changes needed to reach the target state. These changes are then executed against the system in a final step, bringing it into a consistent and well-defined state.

Model-driven infrastructure defines “infrastructure as code” alongside the configuration of this code. That way we get powerful concepts from programming languages, such as modularization, to use in configuration management.
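As a minimal illustration of this declarative style, the following Puppet snippet models a target state rather than a sequence of commands (the package and service names are just examples):

[ruby]
# Target state: the JDK package is installed and Tomcat is running.
# Puppet calculates the difference to the actual system state and
# applies only the changes needed to get there.
package { 'openjdk-7-jdk':
  ensure => installed,
}

service { 'tomcat7':
  ensure  => running,
  enable  => true,
  require => Package['openjdk-7-jdk'],
}
[/ruby]

Note that nothing here says how to install the package or start the service – Puppet's providers figure that out per platform.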

Having infrastructure defined as code makes sense to us in many ways: software development is our daily business, so we can use all our development tools – IDEs, version-control systems, build servers etc. – for developing and testing our infrastructure.

Our infrastructure code is modular and thus reusable, and we use it for automatic rollout to our test and production environments. In fact it doesn't matter whether we use a model for one installation or a hundred – the model defines a target state, and the installation is done automatically.

Once we have developed an infrastructure model, it can be shared between our developers and projects to parallelize development and rollout. With similar setups across projects, our developers immediately find their way around even in totally new projects.

Most of the modules we use are open-source software, which we improve by contributing back. For stability we lock specific versions of these open-source projects, because we must have stable and reproducible results.

Infrastructure as code

Ok, this infrastructure modelling sounds good, but how do we actually do it? We'll show you, with code examples and some configuration.

As mentioned earlier, we use Puppet for modelling our infrastructure. Puppet provides a declarative language that lets you define your “infrastructure as code”: you describe your configurations and the relationships between resources. Puppet can also simulate deployments, giving you the agility to make small and big changes without disrupting your infrastructure.
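This simulation is Puppet's so-called noop mode: you can preview what applying a manifest would change, without touching the system. A sketch (the manifest filename is just an example):

[bash]
# show the changes the catalog would make, without applying them
puppet apply --noop site.pp
[/bash]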

Below you see an example configuration of the sources we pull the modules from – either the Puppet Forge or GitHub. This configuration is used by librarian-puppet to pull the modules from a repository server, so librarian-puppet can be seen as a kind of package manager for configuration management, or more specifically for Puppet:

[ruby]
forge "http://forge.puppetlabs.com"

mod 'puppetlabs/stdlib'
mod 'puppetlabs/apt'
mod 'rtyler/jenkins'
mod 'saz/pureftpd'
mod 'cloudfront/tomcat'
mod 'puppetlabs/mysql'

mod 'tomcat-manager',
   :git => 'git://github.com/UWS-Software-Service/puppet-tomcat-manager.git'
mod 'grails',
   :git => 'git://github.com/osoco/puppet-grails.git'
mod 'wget',
   :git => 'git://github.com/osoco/puppet-wget.git'
[/ruby]

We use quite a few modules here, because this sample model addresses a fairly common Java development stack: Tomcat with MySQL, Jenkins and Grails. We'd also like an FTP server running on that machine for easier access. Some of the other modules listed above are pulled in as transitive dependencies. You may also have noticed our own custom Puppet module, puppet-tomcat-manager – you can download it, fork it or use it via librarian-puppet. Pull requests are welcome.
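As a side note, librarian-puppet also lets you pin a Forge module to an exact version directly in the Puppetfile – a sketch, using one of the versions from our setup:

[ruby]
mod 'puppetlabs/mysql', '0.6.1'
[/ruby]

We prefer to keep the Puppetfile free of version numbers and do the pinning in a separate lock file instead.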

Ok, that's fine, but what happens when something changes on the Puppet Forge or GitHub? Below you can see how we lock the version of each module:

[ruby]
FORGE
  remote: http://forge.puppetlabs.com
  specs:
    cloudfront/tomcat (0.1.0)
    puppetlabs/apt (1.1.0)
      puppetlabs/stdlib (>= 2.2.1)
    puppetlabs/mysql (0.6.1)
      puppetlabs/stdlib (>= 2.2.1)
    puppetlabs/stdlib (3.2.0)
    rtyler/jenkins (0.2.3)
      puppetlabs/apt (>= 0.0.1)
      puppetlabs/stdlib (>= 2.0.0)
    saz/pureftpd (1.0.2)

GIT
  remote: git://github.com/UWS-Software-Service/puppet-tomcat-manager.git
  ref: master
  sha: 9e90e0cc0d5fa553ec04ebe1bae7ed6f74c6e3d4
  specs:
    tomcat-manager (0.1.0)

GIT
  remote: git://github.com/osoco/puppet-grails.git
  ref: master
  sha: 70ce24641a8e11bbfea8cda7b3bddcfcfd902117
  specs:
    grails (0.0.1)

GIT
  remote: git://github.com/osoco/puppet-wget.git
  ref: master
  sha: 6c48cc5e0d1de3c45c11c1f3d76369bac39b50cf
  specs:
    wget (0.0.1)
[/ruby]

We lock specific versions of modules that we know work, both in isolation and in combination with specific versions of other modules. That way those OSS geniuses can keep doing a great job, and we can keep using a specific locked version. When we feel like it, we try out newer versions in our own environments first and then let our customers benefit from the latest mature open-source modules.
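In day-to-day use this locking boils down to two librarian-puppet commands, in the same bundler-style workflow (a sketch; the module name is just an example):

[bash]
# install exactly the module versions recorded in Puppetfile.lock
librarian-puppet install

# deliberately move a module to a newer version;
# this rewrites the corresponding entries in Puppetfile.lock
librarian-puppet update puppetlabs/mysql
[/bash]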

Finally, with these building blocks in place, we can model our infrastructure, which consists of nodes with different configurations.

[ruby]
node basenode {
  include apt
  include stdlib
}

node 'project-1-DEV-virtual-machine' inherits basenode {
  include jenkins
  include mysql::server
  include mysql::java
  include pureftpd
  include tomcat
  include tomcat-manager

  grails { "grails-2.1.1":
    version => '2.1.1',
    destination => '/opt'
  }
}

node 'project-1-TEST-virtual-machine' inherits basenode {
  include mysql::server
  include mysql::java
  include tomcat
  include tomcat-manager

  $tomcatPort = "8181"

  exec { "provision ${tomcatPort}":
    command => "${tomcat::home}/provision.sh create ${tomcatPort}",
    cwd     => "${tomcat::home}",
    creates => "${tomcat::home}/${tomcatPort}",
    user    => $tomcat::user,
    require => [Class["tomcat"]];
  }

  file { "${tomcatPort}/bin/setenv.sh":
    path    => "${tomcat::home}/${tomcatPort}/bin/setenv.sh",
    owner   => $tomcat::user,
    replace => false,
    content => "JRE_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre",
    require => Exec["provision ${tomcatPort}"]
  }

  exec { "stop ${tomcatPort}":
    command => "${tomcat::home}/run.sh stop ${tomcatPort}",
    cwd     => "${tomcat::home}",
    user    => $tomcat::user,
    require => [Exec["provision ${tomcatPort}"]];
  }

  exec { "run ${tomcatPort}":
    command => "${tomcat::home}/run.sh start ${tomcatPort}",
    cwd     => "${tomcat::home}",
    user    => $tomcat::user,
    require => [Exec["stop ${tomcatPort}"]];
  }

  grails { "grails-1.3.9":
    version => '1.3.9',
    destination => '/opt'
  }
}
[/ruby]

As we said before, with a programming language for configuration management you have powerful ways of configuring your infrastructure. Looking at the code, it is quite readable thanks to concepts like inheritance, modularization and separation of concerns.
To go into a bit more detail: comparing the “project-1-DEV” node to the “project-1-TEST” node, the second models that a Tomcat instance is always running on port 8181, which is quite useful for a test system, for example. The first models that Jenkins is installed, so it can be seen more as a development server.

Model-driven infrastructure for Java projects with open-source software

In this article we have described what we do when we have to provision infrastructure for a Java project. With the described solution we are able to roll out setups automatically, define modules and reuse them across projects, and also use existing modules and extend them. All components used here are open source, so there are no license costs at all.

Moreover, this approach works for virtualized environments as well as for full cloud provisioning, i.e. automatically provisioning and configuring cloud instances from zero to fully operational – whether you're using VMware's private cloud or Amazon EC2's public cloud.