Improving The Software Distribution and Deployment Process

A more defined process is needed for the development, distribution, and deployment of software. Specifically, we need to revise the current process, in which the end product of software development is an archive file (gzipped tarball, Debian package, zip file, etc.) that is distributed on a CD-ROM or downloaded over the Internet via FTP or the Web, and finally installed and configured. Software development, distribution, and deployment is a group activity carried out through collaboration over the Internet; it should include application developers, component developers, software users, and software testers, auditors, and reviewers, among others.

Currently, the interaction between these collaborators is ad hoc, carried out via email, FTP, CVS, newsgroups, and Web sites such as freshmeat and SourceForge.

This ad hoc process is both a symptom of some unwanted features of existing software development practices and a factor that reinforces them. These features include:

  • Monolithic applications
  • Unstable applications
  • Long-term use of old versions of deployed software
  • The effective exclusion from the process of potentially useful participants

I contend that formalization of this communication, particularly across the software development-distribution-deployment process, would grease the wheels of productivity by removing communication barriers and simplifying tasks that are frustrating and unnecessary.

Problems

The existing process has unnecessary manual steps, loses useful information, and undermines the dynamic nature of software development and deployment.

Using application software

The difficulties that can arise when attempting to use a perfectly good piece of software are outrageous. The most heinous case arises when the software fails due to an implicit dependency on a feature that is present in the development environment but not the deployment environment (such as a file location, a system library version, or a system call). Dependencies on software components are also a common and major source of pain. While the mechanisms exist to describe the dependencies and automatically resolve them, the current development process allows them to be incorrectly modeled. Why don't the tools that we use to develop software automatically capture the environment and software dependencies that are being used? There are many reasons, but no compelling ones.
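
As a rough illustration of what "automatically capturing the environment" could mean, here is a minimal Python sketch. The function name `capture_environment` and the layout of the record are assumptions made for this example only; a real development tool would need to capture far more (file locations, system library versions, and so on).

```python
import platform
import sys


def capture_environment():
    """Return a snapshot of the environment this process is running in.

    Hypothetical helper: the record layout is an assumption for
    illustration, not a format defined anywhere in this essay.
    """
    return {
        "interpreter": platform.python_implementation(),
        "interpreter_version": platform.python_version(),
        "operating_system": platform.system(),
        "machine": platform.machine(),
        # Top-level modules that have actually been imported so far,
        # which approximates "the dependencies that are being used".
        "loaded_modules": sorted(
            name for name in sys.modules if "." not in name
        ),
    }
```

A tool could store such a snapshot alongside each published version, so that a deployment environment can be compared against the environment the software was actually developed in.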

The instability of deployed software is accentuated not only by the lack of modeling of the deployment environment and the incomplete capturing of dependencies, but also by the lack of versioning on all the artifacts involved in the process. Development tools do not support the ability to test the correctness of the deployment of software into a defined environment. Software is tested to work in the environment in which it is developed, and a great deal of extra work is required to ensure that it will work in any other instance of the deployment environment.

Reusing software

Similarly, finding a component that is suitable for use when developing an application is problematic. Although you may find a component that has the functionality you want, will it be compatible? Is it designed to run on the deployment environment you are programming for? Will it cause a software dependency conflict?

The result of these issues is that there are significant barriers to using components that are published by other developers for sharing. This leads to unnecessary software duplication and encourages monolithic applications.

Developer/user collaboration

Communication between the user of software and its developer (new feature requests, feedback, defect reports) is ad hoc at best and often not bothered with because it requires work on the user's part. Instead, it should be automatic, standard, and part of the user interface.

The proposal

The protocol

A protocol should be designed for the purpose of communication in the collaborative development-distribution-deployment process.

It should formalize:

  • What information is required by the deployment environment so it can reliably deploy software
  • The definitions of deployment environments
  • What information a software application developer needs to provide
  • What information a software component developer needs to provide
  • The method by which the user can communicate with the software developer
  • The responsibilities of each of the infrastructure components

The infrastructure

[Figure: Infrastructure diagram]

The infrastructure components that use this protocol are:

  • A combined software repository server/publishing service
  • A software deployment service
  • Software development tools

How is this different from existing practice?

  • The distribution of software takes place as a specific transaction type as part of the protocol, rather than a download through a general file transfer protocol (such as HTTP or FTP).
  • Instead of packaging all the software artifacts into an archive format for download, they are stored on software repositories; individual artifacts are accessed as needed.
  • The software is not stored on the deployment environment (at least not conceptually).
  • Information about the software is accessible through the protocol.
  • Communication is made through the protocol, with each type of communication specified (feature request, defect report, design diagram), rather than through a user interface (such as SourceForge, Bugzilla, etc.).
  • Deployment environments are well-defined and verifiable (e.g., LSB 1.3, JVM 1.4, Perl 5.04).
  • Enough information is available to verify the compatibility of applications or components with each other or a particular deployment environment.

Clarifications

Where is my software?

The biggest departure from existing practice is the concept that software is not stored on the deployment environment. Consequently, instantiation of a piece of software is dependent on:

  • The Internet (potentially slow and unreliable)
  • Another infrastructure component (the software repository)
  • The good will of software developers and maintainers to continue providing the service

Can anyone seriously consider not storing software in the deployment environment? Yes and No.

From a practical implementation perspective, No. There must be a mechanism which provides fast and reliable software instantiation. Given a transient network connection, there must be a local copy of the software.

From a conceptual perspective, Yes. Absolutely Yes. This is a deliberate and central idea.

Software should not be viewed as something that is copied and hoarded in an unconstrained and ill-formalized manner. In fact, it is the practical outcomes of this attitude that lead to many of the problems identified. The prevailing attitude fails to recognize a reality: the deployer already depends on the provider/maintainer, whether the deployer chooses to recognize it or not. The failure to recognize this reality serves to magnify the timeframe and extent of problems when software ceases to be actively maintained.

The flip side of the software hoarding attitude is the "throw it over the wall" developer attitude. With this approach, software is developed and tested on a sole instance of the deployment environment, and when finished (and only then) is packaged and made available for installation on many instances of the deployment environment.

It is precisely because it's hard to deploy software to multiple instances of the target deployment environment that it's necessary to overcome the difficulty. This can be achieved by automating the process, doing it frequently, and improving proficiency, rather than leaving it as an unpalatable task at the "end" of the software development process.

The crucial point is to force the issue. That is, to force reliable instantiation in the generalized deployment environment as part of the default development process. If this isn't done, it creates dormant problems that show themselves in complex, obscure, and unresolvable ways, leading to ridiculous solutions, such as "Just reinstall the operating system".

Design principles

  1. The deployment environment is given full responsibility for providing reliable and repeatable software instantiation
  2. Published software must provide enough description to enable the first principle
  3. The software repository/software publishing service has the responsibility to continue to provide software once a deployment environment or another software publisher has registered use
  4. The software has the responsibility to manage the compatibility of application data across versions of software components
  5. The protocol must explicitly define the types of communication between participants (software component access, feature request, defect report, etc.)
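
The fifth principle can be sketched as a closed set of message types that every participant must use. This is only an illustration: the names `MessageType`, `Message`, and `parse_message`, and the wire values such as "defect-report", are assumptions for this example and not part of any defined protocol.

```python
from dataclasses import dataclass
from enum import Enum


class MessageType(Enum):
    # The three kinds of communication named in the principle above.
    COMPONENT_ACCESS = "component-access"
    FEATURE_REQUEST = "feature-request"
    DEFECT_REPORT = "defect-report"


@dataclass(frozen=True)
class Message:
    type: MessageType
    component: str
    version: str
    body: str = ""


def parse_message(fields):
    """Build a Message from a plain dict, rejecting unknown types."""
    try:
        kind = MessageType(fields["type"])
    except ValueError:
        raise ValueError(f"unknown message type: {fields['type']!r}") from None
    return Message(
        kind, fields["component"], fields["version"], fields.get("body", "")
    )
```

Because the set of types is explicit, a repository can reject anything it does not understand instead of guessing, which is the difference between a protocol and free-form email.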

Infrastructure

The component infrastructure proposed is analogous to the Web in that it is a client/server architecture. This analogy equates Web servers, browsers, and HTML editors to software repositories, software deployment environments, and software development tools.

Software repositories/software publishing service

The software repository stores the software. There are multiple physical software repositories that create a distributed global software repository, in the same way that Web servers create a distributed global document repository. Software dependencies can cross physical software repositories (just as hyperlinks can cross Web servers).

Software deployment service

The deployment service resides with (but is conceptually outside) the deployment environment. The software deployment service maintains the information required so that it can instantiate software.

Software development tools

Software development tools search software repositories for useful components. They automatically capture the deployment environment and software dependencies and communicate them to software repositories. It is part of the software development process to publish software prior to instantiation.

Features

This essay is intended to communicate an idea rather than to be prescriptive. Even so, there are some key features that this mechanism would need in order to make the proposal workable.

Well-defined deployment environments

Having well-defined deployment environments is crucial to this mechanism of reliable software deployment. The features of the deployment environment must be defined. Software to be deployed in that environment can assume only those features and no others. Examples of deployment environments may include interpreters such as the Java Virtual Machine, Perl, or a particular version of an operating system distribution.
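
The rule that software "can assume only those features and no others" amounts to a subset check. The sketch below assumes that features can be named by simple strings; `EnvironmentDefinition` and `undeclared_features` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentDefinition:
    name: str
    version: str
    features: frozenset  # names of the features the environment guarantees


def undeclared_features(required, environment):
    """Return the features the software assumes but the environment
    definition does not guarantee, in sorted order."""
    return sorted(set(required) - environment.features)
```

An empty result means the software relies only on what the definition promises; anything else is an assumption that may fail on another instance of the environment.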

Verifiability of deployment environments

A deployment environment must be verifiable against its definition. For example, the Linux Standard Base (LSB) provides programs that can verify that a deployment environment conforms to LSB 1.3.

Verifiability of software instantiation

Software must be verified as having no implicit dependencies. This can be achieved through automated testing of the deployment on a vanilla environment with only the explicit dependencies available.
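
Testing on a vanilla environment is the real check. A cheaper static approximation, shown here for Python source only, is to compare the modules a program imports against the dependencies it declares. Note that this is a substitute technique for illustration and not the deployment test described above; the function names are assumptions.

```python
import ast


def imported_modules(source):
    """Collect the top-level module names imported by Python source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 excludes relative imports within the same package.
            names.add(node.module.split(".")[0])
    return names


def implicit_dependencies(source, declared, provided_by_environment):
    """Return imports that are neither declared by the software nor
    guaranteed by the deployment environment definition."""
    return sorted(
        imported_modules(source) - set(declared) - set(provided_by_environment)
    )
```

This catches only imports that are visible in the source; dependencies on file locations or system calls would still need the vanilla-environment test.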

Registration of artifact use

The deployment environment must register use of software so the software repository/software publishing service will continue to store that software (or a version of it) while it is needed.
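
A minimal in-memory sketch of this registration idea follows. The `Repository` class and its method names are hypothetical; a real publishing service would persist this state and authenticate the callers.

```python
class Repository:
    """Keeps an artifact available for as long as someone has registered use."""

    def __init__(self):
        self._artifacts = {}  # (name, version) -> content
        self._users = {}      # (name, version) -> set of registered users

    def publish(self, name, version, content):
        self._artifacts[(name, version)] = content
        self._users.setdefault((name, version), set())

    def register_use(self, name, version, user):
        if (name, version) not in self._artifacts:
            raise KeyError(f"{name} {version} is not published")
        self._users[(name, version)].add(user)

    def release_use(self, name, version, user):
        self._users[(name, version)].discard(user)

    def retire(self, name, version):
        """Remove the artifact only if nobody has registered use.

        Returns True if it was removed, False if it is still needed.
        """
        key = (name, version)
        if self._users.get(key):
            return False
        self._artifacts.pop(key, None)
        self._users.pop(key, None)
        return True
```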

Version control

Version control of all artifacts is a central design necessity if reliable software instantiation is required. Any change to an artifact must be reflected as a change in the version number of that software.
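
One way to enforce the rule that any change must produce a new version is to remember a content digest for each published version and refuse a different payload under the same number. This is a sketch under that assumption; `VersionedStore` is a hypothetical name.

```python
import hashlib


class VersionedStore:
    """Rejects changed content that is published under an existing version."""

    def __init__(self):
        self._digests = {}  # (name, version) -> SHA-256 hex digest

    def publish(self, name, version, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        key = (name, version)
        known = self._digests.get(key)
        if known is not None and known != digest:
            raise ValueError(
                f"{name} {version} already exists with different content; "
                "a changed artifact needs a new version number"
            )
        self._digests[key] = digest
        return digest
```

Republishing identical content is harmless and succeeds; only a silent change is refused.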

Support data management across software versions

It is the responsibility of the software to manage the data it requires on the deployment environment. However, support for the ability to automatically transform data or support old versions of interfaces needs to be factored into the protocol.
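
Automatic data transformation could be modeled as a chain of per-version migration steps. The sketch below assumes each version has at most one successor; the `migrate` function and the shape of the `migrations` table are assumptions for illustration.

```python
def migrate(data, from_version, to_version, migrations):
    """Apply migration steps until the data reaches to_version.

    migrations maps a version to a (next_version, step_function) pair.
    """
    version = from_version
    seen = set()
    while version != to_version:
        # Stop on a dead end or a cycle rather than looping forever.
        if version in seen or version not in migrations:
            raise LookupError(
                f"no migration path from {from_version} to {to_version}"
            )
        seen.add(version)
        version, step = migrations[version]
        data = step(data)
    return data
```

With steps registered for versions "1" and "2", data written by version "1" can be brought forward to version "3" without the user doing anything by hand.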

Authentication

Authentication must be part of the protocol, so that the software that is being used can be verified as being from a trusted (or at least known) source.
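
The simplest form of such a check is a keyed digest over the artifact. The sketch below uses HMAC with a shared secret purely to keep the example short; a real protocol would more plausibly use public-key signatures so that publishers never share a secret with deployers. The function names are assumptions.

```python
import hashlib
import hmac


def sign(content: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the artifact content."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def is_authentic(content: bytes, signature: str, key: bytes) -> bool:
    """Check the tag in constant time to avoid leaking partial matches."""
    return hmac.compare_digest(sign(content, key), signature)
```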

Details

The unreliable Internet

To implement this protocol using the Internet, a reliable mechanism will be required, probably involving caching on the local host or at least the local network segment. This is a crucial (and non-trivial) implementation detail because of the stance this proposal takes on the primary location of software storage.

Many approaches are possible to achieve the necessary reliability and speed, such as the approach used by the domain name system. It may be worth considering using Freenet as the central infrastructure component to deliver this requirement.
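
A local cache in front of the repository might look like the following sketch. It assumes that a published version never changes (see "Version control" above), so a cached copy never goes stale. `CachingClient` and the `fetch_remote` callable are hypothetical.

```python
class CachingClient:
    """Serves artifacts from a local cache, fetching each one at most once."""

    def __init__(self, fetch_remote):
        # fetch_remote(name, version) -> bytes; may raise OSError when the
        # network or the repository is unavailable.
        self._fetch_remote = fetch_remote
        self._cache = {}

    def get(self, name, version):
        key = (name, version)
        if key not in self._cache:
            # Only a cache miss depends on the network being reachable.
            self._cache[key] = self._fetch_remote(name, version)
        return self._cache[key]
```

Once an artifact has been fetched, instantiation no longer depends on the Internet, which addresses the practical objection while keeping the repository as the conceptual home of the software.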

Protocol technology

The most obvious technologies to use for protocol definition are CORBA or XML/HTTP.

Information stored and published by the protocol

Information that is required by the deployment environment should be supplied by software developers.

Examples of information stored on the software repository and made available via the software publishing service are:

  • Software component name and version
  • Deployment environment name and version
  • References to software dependencies
  • Executable code
  • Source code
  • Licensing agreement
  • Requirements specifications (e.g., use case diagrams)
  • Design specification (e.g., class diagrams)
  • Test programs
  • Indication of the correctness of the software (alpha, beta, never created an exception, feature complete)
  • Maintainer details
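
Part of the list above could be carried as a structured record. The following sketch picks a handful of the fields and serializes them as JSON; the class name, field names, and the choice of JSON are all assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ComponentRecord:
    name: str
    version: str
    environment: str  # deployment environment name and version
    dependencies: list = field(default_factory=list)
    license: str = ""
    status: str = "alpha"  # indication of correctness
    maintainer: str = ""


def to_wire(record):
    """Serialize a record to JSON with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```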

The communication of the user back to the software developer

Examples of information created by users and stored on the software repository are:

  • Defect reports
  • Feature requests
  • Usability feedback

Development workflow

During development, the artifacts that make up any given software application or component are produced over time. The protocol must recognize and support software that is only partially complete, but also classify it as partially complete.

For example, a piece of software may have source code and an executable, but no unit test case. Its classification would record that it is incomplete but usable (assuming the slightly controversial nature of the example is accepted).

Further, the status of the software should be maintained during its lifetime. For example, if the software has a known defect (in this version), this must be made explicit.
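
The classification described here can be sketched as a function of which artifacts exist and which defects are known. The artifact names and the rule that source, executable, and tests together count as "complete" are assumptions made for this example.

```python
def classify(artifacts, known_defects=()):
    """Classify a piece of software from the artifacts published so far."""
    present = set(artifacts)
    if "executable" not in present:
        completeness = "not yet usable"
    elif {"source", "executable", "tests"} <= present:
        completeness = "complete"
    else:
        completeness = "incomplete but usable"
    return {
        "completeness": completeness,
        # Known defects are always reported explicitly for this version.
        "known_defects": list(known_defects),
    }
```

For the example in the text, source code plus an executable without tests is classified as "incomplete but usable".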

The path

To implement this idea, support for the existing methods of software distribution and deployment must be part of the implementation. The protocol must support and provide a migration path for the existing archive files that are in use.

Stepping stones

Many of the features of the proposed deployment mechanism are transformations of features already available.

Much of the information required by the deployment environments is defined as fields in the RPM and Debian package file formats.
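
As an illustration of how much is already there, the sketch below reads a few fields from a Debian-style control stanza. It is deliberately simplified: continuation lines are skipped and alternative dependencies written with "|" are reduced to their first option. The function names are assumptions.

```python
import re

# name, optionally followed by "(operator version)"
_DEPENDENCY = re.compile(
    r"\s*([a-z0-9][a-z0-9+.-]*)\s*"
    r"(?:\(\s*(<<|<=|=|>=|>>)\s*([^)\s]+)\s*\))?"
)


def parse_control(text):
    """Parse "Field: value" lines into a dict, skipping blank lines,
    comments, and continuation lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line[0] in " \t#":
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def parse_depends(value):
    """Split a Depends value into (name, operator, version) tuples."""
    deps = []
    for item in value.split(","):
        match = _DEPENDENCY.match(item)
        if match:
            deps.append((match.group(1), match.group(2), match.group(3)))
    return deps
```

A migration tool could use output like this to populate the component name, version, and dependency references that the protocol requires.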

Much of the communication between developer and user has been decomposed by the SourceForge (Alexandria project) user interface. An implementation approach could start with creating a protocol (rather than a user interface) to access SourceForge functionality.

There is a correlation between the idea being put forward in this paper and the "Software Dock" work that has been done at the University of Colorado. There is plenty of analysis, published papers, and working code available from this project.

Summary

This essay essentially suggests the application of workflow automation and knowledge management disciplines to the software development-distribution-deployment process.

The main concrete outcome would be the creation of a protocol which would act as both a human interface and a machine interface between the developer and deployer of software.

This approach would result in less manual, more collaborative, and ultimately more productive software creation, leading to more reliable software.

Recent comments

03 Oct 2007 02:37 roblu

Software deployment on Windows
To be honest, I am very much a rookie at software deployment on the Linux platform. On Windows, I prefer policy-based software deployment/software distribution using Group Policy, for example with the tool Specops Deploy (http://www.specopssoft.com/products/specopsdeploy/).

A question: if you wanted to do policy-based software deployment on the Linux platform, is there a way to do that today?

Best,
Rob

20 Dec 2006 05:01 Theimprover

Re: or not.


> admittedly, i breezed through the essay,
> but:
>
> what exactly is wrong with:
> cvs co whatever
> ./configure --some-option
> make
> make install
> whatever they are, abolishing them will
> put port maintainers out of a job :p
>
> "Why don't the tools that we use to
> develop software automatically capture
> the environment and software
> dependencies that are being used?"
> Well I don't even use vim's
> autocompletion features, I sure wouldn't
> want it trying to calculate all
> dependencies :p. I'm quite happing
> writing that into configure.ac.
>
> "... lack of modeling of the
> deployment environment..." Posix
> not good enough? That's why I write
> configure scripts (or rather have
> autoconf generate them for me). Why not
> let the host environment decide how IT
> wants things to be done, rather than
> doing this at the application level?
>
> "Will it cause a software
> dependency conflict?" not under my
> gentoo system ;)
>
> "developer/user collaboration"
> As a user, I'm quite happy to
> investigate how the developers want me
> to talk to them (whether it be mailing
> list, newsgroup, irc, bugzilla, forums).
> As a developer, I like the choice in
> choosing a communcation medium with
> users. If i'd rather help them
> interactively on irc, i'll do that. If
> I want bug-reports to be contributed via
> sourceforge style interface that sends
> an email somewhere I'll do that. This
> really doesn't have to be standardized
> IMHO.
>
> Verifiability of software instantiation:
> I couldn't agree more, but it doesn't
> take much for the developer to work this
> out and add it to configure.
>
> The essay resounds of what we're
> beginning to see and soon might have to
> suffer in the future M$ model (remote
> services, integrated automatic error
> reports, "registration of artifact
> use").

This might be just a matter of opinions.

18 May 2003 07:02 Puttel

Re: The problem I see
Good reply

26 Feb 2003 01:18 pij

Re: The problem I see

> The problem I see is that of maintaining
> configuration.
> First of all, if you want to have your
> piece of software installed in
> /u123/misc/ you have to specify the path
> each time you run configure.


You can easily set up a config.site with all the common options you want to apply. This also enables caching of detected features across all installations.
The problem is, I'd have a hard job teaching my father the difference between a source and a binary distribution, not to mention teaching him to build all his software by himself. Even at work, at most 5% of the IT department is qualified to install software from a source distribution, and even fewer would be willing to bother.

12 Feb 2003 14:50 mystran

The problem I see
I don't see a problem with getting the latest package, running configure for it, running make && make install...


The problem I see is that of maintaining configuration.
First of all, if you want to have your piece of software installed in /u123/misc/ you have to specify the path each time you run configure. This is by far the easiest configure option to remember. Unfortunately, the harder-to-remember options are also sometimes subject to change. What worked last time doesn't necessarily work now, which means you can't really use a static script.


Another thing is "make install" which is supposed to install the compiled version. However, often this means your configuration files get overwritten with defaults, and sometimes old modules from the previous version are left floating in the filesystem, that might cause problems, when they don't work with the new version.


If I could have two things, I'd take a configuration repository for configure that remembers how it was called for the last version and the other thing would be that NO program EVER overwrite a config file on install if one is already present.


If you have to do these by hand every time you update something, you start reading release notes and weighing the bugs against the pain of upgrading. No good.
