For Better Software, Consider the Layers

(Adapted from a discussion I had earlier this year about why we don’t just “code the feature” and what software developers actually do)

Software is undeniably a many-layered thing… but I’ll spare you a bunch of weak comparisons with certain desserts.

Though most of the layers in software are invisible, their selection and our level of investment in them dictate a range of not-so-subtle factors that show up in the quality and experience of the end result, either immediately or over time.

On the surface, software interacts with the real world via a User Interface (UI), or perhaps by sending messages and emails to a user. These upper layers are primarily visual and need to mesh well with the user’s own ideas about what they are trying to achieve with the system. This is the world of User Experience (UX), and it is usually possible to describe these layers in terms of stories and to draw them as wireframes or more detailed designs.

Meanwhile, way, way below at some of the very lowest layers, we have code in various programming languages, formally compiled or interpreted, running as components on one or more machines, and communicating via network protocols. These lowest layers — though we may be forgiven for not realising it — are the actual software, as it executes. All of the layers above are simply abstractions; ways of hiding and dealing with the underlying complexity, to help us focus on portions of it and to relate it back to the real world.

What do those abstractions and layers look like? They are the architecture, modules, components, packages, subsystems and other sub-divisions we create to explain software. Though some may be drawn for us, we choose carefully where to draw these lines, and we try to name them so their impact on the real world result, or processing of it, is understandable and uses similar language. We often draw diagrams about them, talk about them and then build them in code.

Between the extremes of the top and the bottom layers, there is a journey… From the visual and less formal, towards achieving a result via the inevitable rigour and formality of the lower layers… and back again to the real world.

There simply is no way to avoid the need for that eventual lower-level formality, as the software must run on a machine that requires it and which understands no subtlety. What we do choose are the layers and abstractions we create en route, and how much we decide to invest in them, and those choices dictate the functional and non-functional quality of the end result as it is experienced: whether it works as intended, whether it always works as intended, how quickly it responds, whether it scales, how much further investment is required to change it later, and so on.

It is almost as if our acknowledgement of and attitude to the non-obvious layers, formalities and rigour has an impact on the visible quality of the end result we achieve. Again, though we may be forgiven for not realising it.

Choice and investment in the layers isn’t about black-and-white decisions, but rather a series of compromises and considered choices, usually made entirely within the engineering team. These choices can sometimes be hard to explain or justify outside of that team, which is why I’ve found this explanation of layers can be useful.

It is possible not to invest much at all and to pick a more direct path so that end results are delivered quickly. This is often sensible to validate ideas before further investment. It is also possible to over-invest, so things are delivered slowly but where much of the effort never impacts the end result and is wasted.

Factors worth considering investing in, each with a knock-on effect on the end result, include whether the chosen layers and abstractions are:

  • understandable, using language and concepts that reflect the real world;
  • testable, to minimise mistakes;
  • reusable, to minimise wasted code;
  • loosely coupled with others, so that changes in one don’t unnecessarily impact another;
  • documented, so that further work is easier;
  • structured so that they can be adapted when the need for change arises, as it inevitably will.
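To make a couple of these concrete (the “loosely coupled” and “testable” points in particular), here’s a minimal sketch in Python; the names are invented for illustration and it’s a sketch of the idea, not a recommendation of any particular design:

```python
from typing import Protocol


class MessageSender(Protocol):
    """The abstraction: the layer above only knows about this interface."""
    def send(self, recipient: str, body: str) -> None: ...


class EmailSender:
    """One concrete lower layer; could be swapped for SMS, push, etc."""
    def send(self, recipient: str, body: str) -> None:
        print(f"(pretend we emailed {recipient}: {body})")


class WelcomeService:
    """An upper layer, coupled only to the abstraction, so it can be
    tested with a fake sender and changed without touching email code."""
    def __init__(self, sender: MessageSender) -> None:
        self.sender = sender

    def welcome(self, user_email: str) -> None:
        self.sender.send(user_email, "Welcome aboard!")


if __name__ == "__main__":
    WelcomeService(EmailSender()).welcome("someone@example.com")
```

In tests, a fake MessageSender that simply records calls can stand in for EmailSender; that substitutability is exactly the kind of investment the list above is describing.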

Under-investment in various software layers, or their restructuring, can result in what we call “technical debt”. This is where the work required to make further changes at those levels is hampered either by illogical structure, redundant code, quick workarounds or lack of prior thought. Some degree of technical debt is understandable but, beyond a certain point, it will impact the overall result, either in terms of quality or the cost and timescales for making changes.

Luckily, with software, we can retrospectively invest in these layers and even change our choices about which layers to focus on. This allows us to begin lightly, with less investment, to get a result or to validate whether an idea has benefit, then pick layers to invest in for stability, quality, longer-term results, etc. This retrospective investment is what engineers sometimes call “hardening”, and the changes it implies are often referred to as “refactoring”. They may result in no apparent change, but they are retrospective investment that may drastically affect the quality of the end result, whether now or in the future.

Choice and investment in software layers, whether visible or non-visible, is an ongoing and never-ending process. It is also a discussion about resourcing, quality, deadlines, compromises and desired outcomes. All that is required is that we remember that the layers are (or should be) there in some form, and that time spent considering them affects the end result in very tangible ways.

Integration-Driven Development

It’s a familiar stereotype: The lone software nerd. Working in isolation. Rarely interacting with anyone.

In reality, this particular stereotype couldn’t be further from the truth. Everything we work on as software developers integrates with something or someone else.

It might be behind the scenes by providing or using an API, or by communicating with another module, an external system or a database. Or if we’re building a User Interface, that’s a visual integration with another person’s expectations and mental model of a task they want to perform. Granted, it’s much more creative and fluid than the ones mentioned earlier, but it’s an integration with another party nevertheless.

Nothing we develop exists in isolation: Integrations and interactions are crucial, and this usually means working with other people, either as end users or as the creators of the systems we need to integrate with. So much for that loner stereotype.

But we’re also human…

When we begin a development task, there’s a natural tendency to go depth-first and to work on a piece as we understand it at first. We like to work on the bit we feel in control of, and to feel “ready” before we expose it to others.

As it turns out, this seemingly productive focus doesn’t necessarily lead to our “best” work; once revealed, it can often prove detrimental, both for us and for the project. Why…?

  • We’re making assumptions about the integration and interactions that are inevitably needed and, by not validating them, risking the need for re-work.
  • We’re delaying exposing our work to others, and with it the point at which their own assumptions can be tested, risking re-work on their part.

The answer, contrary to our natural inclinations, seems to be to begin integrating and exposing our work as early as possible.

How to integrate early?

For non-visual work, we can actually build the integration, the API, the interface with the other module or system, as early as possible. This may involve a certain amount of mocking up and simulation to get it working, perhaps using fake data, but we can do this quickly and cheaply as those parts will be thrown away towards the end.

One crucial thing to remember… is to actually throw those mocks away and not to accidentally leave any behind. It’s a common mistake to go live with mock code remaining, so it’s best to mark it in some way, or to use obviously fake data, so that any leftovers are embarrassingly obvious.
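As a minimal sketch of what “obviously fake” might look like, here’s a hypothetical stand-in for a pricing service that hasn’t been built yet; the names and values are invented, and the whole thing is intended to be deleted before going live:

```python
class FakePricingClient:
    """TEMPORARY MOCK - DO NOT SHIP.
    Stands in for the real pricing service while we integrate early."""

    def price_for(self, product_id: str) -> dict:
        # Deliberately absurd values so leftover mock data is easy to spot.
        return {
            "product_id": product_id,
            "price": 999999.99,
            "currency": "FAKE",
            "source": "FakePricingClient",
        }


if __name__ == "__main__":
    print(FakePricingClient().price_for("demo-product"))
```

Grepping for “DO NOT SHIP” (or whatever marker your team prefers) before a release is a cheap way to catch any mocks left behind.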

For visual work, a lower-quality version with just the interactions working is a good starting point; a form of living wireframes, minimally demonstrable. Something people can play with before any visual polish is added.

There are undeniable challenges I’m ignoring here, as the incremental development from living wireframes to pixel-perfect renderings of designs can be a difficult transition, and one that glosses over fundamental differences between the two. Design work often leads to an internal restructuring of visual implementations. That said, these are problems worth tackling, as exposing something people can interact with early potentially prevents much rework anyway.

Validating assumptions about the complex and nuanced interaction with human beings is best done as early as possible.

Visual work also needs to integrate with and test assumptions about the data and other non-visual interactions upon which it will rely, if only informally, as mistakes here can also lead to rework.

What improves?

If we can convince ourselves to integrate early, and on an ongoing basis…

  • We expose our work earlier to others. It’s not only helpful but healthy. Perhaps time to get over our natural tendencies to perfect then reveal.
  • We often learn something unexpected that shapes how we build the bit we’re working on, and so avoids rework.
  • Expectations and assumptions can be tested; both ours and those of others.
  • A basic working version of the overall system is achievable earlier, whereby end-to-end assumptions can be tested, perhaps with stakeholders, and problems spotted.
  • We avoid the last-minute integration and “Oh, s***!” moments that so often lead to project delays.

What are the challenges?

Just like integration, change is inevitable.

Doesn’t this mean we should integrate later, once we “know everything”? Don’t early integrations just set us up for rework when change occurs?

I’d argue that early integration means we know who else is affected, and how, so we can swiftly make the change and fix the integration, rather than storing up the change for disclosure during late integration. It’s better to absorb the impact of the change early, including any knock-on effect. Learning can take place sooner and in all the places it is needed.

Due to earlier coupling, we do however need to be mindful of timings and upfront about the impact on other people, rather than just “breaking” integrations and waiting for others to discover them. Checking others are ready to absorb the change minimises disruption and provides a valuable way to communicate the change. We need to move sympathetically with the needs of others.

Rather than being inevitable victims of change, early integrations are also a valuable source of it. Without early integration, we may be delaying discovering the need for the change and storing up the disruption for later.

 

If we do it early and on an ongoing basis, this breadth-first early approach to integration not only avoids late discovery of problems and minimises the need for re-work… it also makes software development quite an interactive activity.

So much for the loner stereotype!

“Don’t worry about the details”,… said no software developer ever

As developers, we spend our days deep in details, usually a thousand and one of them. Combining and orchestrating them up to the point where they collectively deliver something tangible to other — often non-technical — people.

Up and down the stack we go, from the bigger picture down to concrete implementation details, and back up again.

Good software developers can traverse that stack all day long; speaking to people or explaining concepts at all levels… in language appropriate to that level, using abstractions and terminology that hide the detail below, but always being aware that the detail is still there,… waiting to be dealt with.

We know the bottom line is that the details have to work, or there is no bigger picture.

The trick is learning the right times to get into those details, how deep to go, and when to acknowledge that problems at one level have an impact on a wider scope. What we try not to do is get lost in the details in ways that aren’t relevant to someone who just wants to know about the overall solution or a particular layer of it. We try not to baffle you or drag you down the rabbit hole with us,… and sometimes we succeed.

In many other professions, you operate at one level and remain there. Not so with software, where we work with the details but orchestrate and explain them at a level appropriate to many audiences, whatever they might need. We deal with customers and other developers, and everyone in between.

So if you ask a software developer about a solution or talk to us about a problem… we’re usually traversing the stack as we talk, picking out what’s relevant… choosing language… hiding details, unless they impact you… storing up things irrelevant to you that need investigating later… trying to be open without being complex,… honest but not alarmist. And if we’re good, you can’t tell we’re doing it.

The details matter to all of us eventually. But it’s our job, as developers, to figure them out.

For a Coherent Software Product… Draw More Diagrams

Unless you’re building it entirely on your own — and even sometimes then — for a unified and coherent software product, you need prominently-shared diagrams of what you intend to build.

Most software products are built by more than one person; more than one mind. Most are built by more than one team and usually across more than one skill or discipline. That’s many backgrounds, many viewpoints… but still one intended product in which they all need to be unified.

No-one would imagine coding an API without a spec, yet we often build systems without a diagram of the architecture, components, boundaries or intended flows. Retrospective diagrams often never happen, or happen way beyond the point at which it became painfully obvious they were needed. We over-rely on verbal signs that we are aligned, without backing them up with a diagram.

Without a shared vision of the intended outcome, we’re likely to translate any purely verbal descriptions into different mental models, and onwards into different technical solutions… which then need to be aligned. “Oh I didn’t realise”, becomes all too common. This is particularly true in places where multiple disciplines overlap: Product owner, UX, front end, back end, operations, marketing, etc.

Everyone sees their piece of the puzzle, but diagrams confirm how they will all fit together. Without them, we introduce opportunities for hidden agendas and unnecessary extras to creep in. Human nature dictates that, where ambiguity exists, we tend to work silently to our own preferences rather than seeking a unified team answer. A diagram can provide that answer, or at least highlight the blank space in which it needs to go.

What to draw?

Whatever you find yourself discussing… but especially user interfaces, architecture, components, flows, and absolutely all inter-team or component boundaries.

How many drawings?

Just enough… but at least the highest levels of the product, then break down where more clarity is required and particularly where team boundaries exist.

Not too many though… or no-one will look at them. Find a happy balance.

What to draw with?

Informality, and a shared medium… Keep diagrams informal, because speed matters when it comes to capturing information. Use a tool for more formal diagrams, but a tool that everyone can access; the person drawing the diagram will not necessarily be the only one changing it or adding to it later.

Where to draw them?

Visibly! Stuck away in a repository or a wiki where people rarely (or begrudgingly) go, diagrams are next to useless. Get copies of them up on the walls too.

Update them!

An obsolete diagram is worse than no diagram. Think living diagrams!

When a change arises, update the diagram now, even if only with a scribbled annotation on a printout to be added to the original later. This is why informality often wins; there is no pristine copy that needs painstakingly updating.

Assign someone the action of updating the diagram, and make sure that they do.

Encourage diagrams

Make sure whiteboards are available, with pens (if you have to hunt, the moment will pass). Also make sure wall space is available, on which to stick diagrams that will live for longer.

Whiteboards aren’t just for visual disciplines; encourage engineers to draw components, flows and architecture too. Even short-lived diagrams, such as those drawn in one meeting, can unify people at a point in time. Get everyone comfortable with the idea of standing up with a whiteboard pen… and drawing. It should be a daily occurrence.

There should be a blank whiteboard usable by anyone who feels the need, at any time, and preferably within reach. This is a cheap resource, so over-provide. The improvement in communication, and in a coherent product, will be worth it.

Why Early UX Should Scratch Deeper than the UI

The early stages of software engineering projects aiming to expose anything more than a trivial User Interface (UI) quite sensibly begin with a heavy focus on the User Experience (UX), as this dictates a great deal of what the software needs to do. This is usually a stage of the project that produces many wireframes, mockups, and a sense of what the user — or different user “personas” — will be attempting to achieve with the software.

This UX-centric approach undeniably works really well, and tends to result in software that delivers a strong, usable user experience.

However, I’ve seen a UX-/UI-only focus in the early stages of some projects lead to blind spots and incorrect assumptions which can force costly rework later in the project. For this reason, I’d assert that early UX work should scratch deeper than just the UI, if only to validate assumptions and, in particular, to confirm what’s actually possible in terms of the underlying data and services.

Server-Side Constraints

Too often, in UX-centric early project stages, the assertion is that the server-side data, APIs and services are much less relevant than the UI. This is often a fair position, as server-side detail can distract from the goal of designing usable software, but there’s an underlying assumption that needs validating: That the server can deliver anything the UX/UI work dictates it needs.

This is often not the case… If the server side is built on serious data modelling or clever algorithms, those services and that model are often constrained in terms of what they can provide. There may be heavy computational costs involved in exposing certain data in a timely manner, if it’s even possible.

As always, using specific examples of server-side data in UX work can really help to clarify what’s possible. Even just involving a few token Data / Server-Side engineers can be enough to verbally “sign off” that the server will be able to support what’s required.

This form of early full-stack validation can prevent costly rework later, and can even help to direct server-side work appropriately by providing added context about what the UI needs.

Server-Side Data Model and User Mental Model

An added benefit in considering the server-side data model during UX work is that the user’s own mental model of what they are trying to achieve when they use the software is so often a version of that server-side model.

Considering, albeit leanly, how we might model data on the server can help to clarify the ways in which a user might expect (via their own mental model) to visualise, interact with and modify that data. After all, this is effectively what they’re going to be doing by using the software; accessing and manipulating server-side data and services indirectly via a UI.

Even just tying together the correct terminology, so UX and server-side work are talking the same language, can prevent costly disconnects between different parts of the team.

Again, precise examples of server-side data, or a little involvement of the relevant engineers, can really help here.
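For example, even a throwaway, lightweight model of the server-side data can anchor both the UX discussion and the shared terminology, without committing anyone to build anything yet. The fields below are entirely hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Booking:
    """A throwaway sketch of a server-side 'booking' concept, used only to
    keep UX and server-side discussions talking the same language."""
    reference: str        # what the user sees and quotes on the phone
    traveller_name: str
    departure: date
    status: str           # e.g. "pending", "confirmed", "cancelled"


example = Booking("BK-1042", "Sam Jones", date(2015, 6, 1), "confirmed")
print(example)
```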

Lean Approach to Scratching the Full Stack

But doesn’t involving server-side engineers and data scientists in UX discussions slow down those discussions? Doesn’t it negate the value of considering the software purely from the point of view of the user at first?

I’d say not, but only if it’s clear that the focus of this stage is still primarily UX, and that any deeper layers in the stack are to be validated only. Crucially, any details of how the eventual UI will interact with the server should be avoided, unless they too dictate what is possible in the UI. No-one needs to build the server at this stage (though that work might be proceeding in parallel), so long as they can talk in as much detail as required about what it, and its services, will look like.

 

I think this is another example of how, in software engineering, regularly considering the bigger picture, even when performing early focussed work, can avoid costly mistakes. In complex systems, the bigger and the smaller views are often related or constrained in unexpected ways.

Distil Complexity, Nail Theory, Build Software… Just Not All at Once

Trying to pat your head whilst rubbing your stomach is the standard way to demonstrate that we, as humans, find it difficult to perform certain activities concurrently.

The software project equivalent is attempting to overlap activities that teams find it difficult to handle in combination. It’s just a reality that some things are hard enough on their own, and so we hamper our efforts if we attempt to combine them.

It’s been my experience that projects where there is at least an attempt to separate, as much as possible, the following activities are those that experience fewer hiccups. It may still be necessary to iterate through these activities regularly, and they are not waterfall-style one-offs, but being aware of the complexity of doing more than one of them at once can help to simplify things for teams and to ensure one flows into the other:

  • Distilling Domain Complexity – Every software product is built to deliver a solution in a unique problem domain. Often, there is much to clarify in this domain, and much complexity to unravel and distil. This can be something of a voyage of discovery, particularly if it takes a while to extract that knowledge from a domain expert and to figure out which aspects of it the project will require. As building products for a market can be like chasing a moving target, it is highly likely that this domain-related activity will be revisited many times during a project;
  • Nailing Academic Theory – Beyond well-worn Software Architecture and Computer Science principles, a project may rely upon deeper academic theory, such as Artificial Intelligence, Machine Learning, Big Data… or non-tech project-specific areas such as psychology, natural language, etc. Each of these alone is a huge field, full of theory. The problem is, you can’t build software with theory, and so we need to pin down practical forms of that theory that will be implemented in software. This may involve prototyping, comparative analysis of different approaches, and finally a choice between palatable options.
    There is a tendency to linger over these decisions, and an attractiveness in remaining in that land of possibilities. However, software engineering requires knowns, not possibilities, and the sooner theory is nailed down — for this project or iteration anyway, as we can always make different decisions next time — the better.
  • Building Software – Software engineering is concerned with building stable, scalable, understandable software to meet a need. There is sufficient difficulty in doing that in isolation, without overlapping with the previous activities. For large chunks of time, teams need to focus on building software, unencumbered by academic theory or domain ambiguities. It is almost an unwritten assumption on some projects that most of the difficulty lies in the previous two activities (problem domain and academic theory), and that building the resulting software solution will be comparatively easy or, at least, will be more of a known activity. However, the complexities and practicalities of building software require that teams focus on the activity where possible.

Although we may revisit each of these activities many times during a project, identifying and managing the boundaries between them helps to isolate them from one another. One good way to do that is by creating and maintaining solid team documentation: Domain knowledge and concrete decisions based upon academic theory are perfect examples where documentation can form a neat output from one activity (albeit perhaps occasionally updated), to be consumed by another. Documentation needn’t be heavy and cumbersome; writing just enough, but before it is needed, should be encouraged.

I wonder how often, in the projects we’ve seen go awry, we could identify examples of these activities bleeding into one another too much. And, where projects have gone comparatively smoothly, I’d be interested in hearing whether it was because these activities were understood to be complex in combination.

I guess it’s all part of improving the task of building complex software, with a human team.

In 2015, Tech Will Continue To Be the Enabler, Not the Whole Story

“The Year of Big Data”

“The Year of the Cloud”

“The Year of the Internet of Things”

These are snippets of New Year headlines predicting the year ahead… but from previous years, some as far back as 2011, not just from this January.

These technologies have been around, in some form, for a number of years and are inevitably constantly evolving. As with all reasonably new technologies, we have a habit, in January, of hailing this to be the year that they will make their mark. I’d argue all of these have already made their mark in some way, and will continue to do so as the relevant tech improves. But the real story — the real mark — is when organisations commit to and make genuine use of these technologies in business and human contexts.

The real Big Data story isn’t Hadoop, or Apache Spark (though they are the tech enablers)… it’s when a business consumes and makes use of petabytes of incoming and historical customer data to make timely decisions that optimise and improve (and, of course, monetise) the experience of each customer. Just warehousing large amounts of data and, theoretically, being able to perform distributed computations against it isn’t really a Big Data story; we need to make use of it, in ways that impact and have the buy-in of the wider business. Many businesses have begun doing this in recent years, and that is the real story.

The real Cloud story isn’t Heroku or Amazon Web Services (AWS), or SaaS, PaaS or anything else ending “aaS” (though they will continue to be the tech enablers)… it was the point at which businesses could make real deployments of their apps & services, beyond physically renting rack space or buying their own hardware, and could scale those deployments up and down at will to suit their ever-changing needs. It was the point at which budding entrepreneurs could use it to do the same, and bootstrap a startup from almost nothing.

The real story of the Internet of Things (IoT) probably isn’t any of the tech-based articles we’ve read so far (though those early devices are examples of tech enablers), but will probably be when we actually monitor and respond to our environmental impact, manage infrastructure, optimise energy usage and diagnose or treat medical patients… on a large scale, via the use of connected devices. Some of the devices we’ve seen so far are heading in this direction, but they seem to be more about proofs of concept and novelty, and less about genuine benefit at this stage. So perhaps these are still early days for IoT and we’ve yet to see it make its real mark or write its real story, which will involve way more than the devices themselves.

So whilst it’s great to hail this year (and previous years) as the one in which certain technologies will make their mark, it’s worth remembering that they are merely the tech enablers in a wider business & human story, which is where their true mark will be made. Or else, what is technology really for?

The Best-Laid Plans of Task-Based Engineering

When faced with the challenge of building software of any considerable size, we rather sensibly tend to begin by breaking down the complexity and fleshing out a design or architecture. Then we break down the work involved in building that design into tasks or user “stories”.

Sometimes we group stories into wider “epics”, themes or stages, but generally we estimate, schedule, and conduct the work of building software at a task level: Tasks are what we assign, place on agile boards, and discuss our progress in terms of.

(I’ll refer to everything as “tasks” from here on, but I also mean user stories, and much of what I’m referring to also applies to bug fixes and other ticketed items)

Tasks are abstractions of the underlying engineering work required, and a task-based view of engineering makes the process of building complex software more understandable, manageable and distributable as a team. But it also tends to ignore certain truths about software complexity, tasks themselves and the ways we actually work on them, both on our own and together in teams. In particular, the tools we currently use to manage task-based engineering are rather restrictive and cumbersome.

So, given that it seems we need task-based engineering, what is it about the way we approach it, and current task-based tools, that can hamper our efforts?

Tasks Abstract Away Complexity

A task is simply an abstraction of the work involved in building a software feature, a use case, a component or maybe just of fixing a bug. As with all abstractions, it necessarily hides detail that, for all but the simplest of tasks, may be discovered later when we actually work on it, despite our best efforts at planning time.

Sometimes, as the true complexity of the work involved in a task becomes clear, we realise that we should restructure the task to break it into more manageable chunks, and that sub-tasks should ideally replace some (but perhaps not all) of the original parent task. This discovery of structure and complexity may occur several times, to an arbitrary depth, and perhaps only after part of the original task has already been completed. We may even discover aspects of the original design, and the tasks slated to build it, are wrong.

Rather than implying poor planning, this is simply a part of the discovery process and the complexity inherent in building software, particularly in an agile manner where we’re trying not to pre-think too much up-front. We need to accommodate that inevitable change, reflect it in the tasks we are working on and in those slated for the future.

Most current task management tools aren’t great at letting us change anything more than very small numbers of tasks, or at reshaping task-based plans. They certainly don’t help us to visualise the results we end up with. In a sense, they leave us with quite brittle and sizeable task-based plans. With some persistence it is usually possible to reflect the required changes, but this may require time that isn’t available, and we risk the task-based view drifting away from the engineering reality.

What can also be particularly tough with current tools is arbitrarily defining sub-tasks to any depth, as we discover complexity and break down a piece of work further. We may need to have parent and child tasks representing slightly overlapping and part-completed functionality, whilst being able to see estimates for all of that work. I’m not sure that current tools allow much of this without significant discipline in our use of them. In this sense, they are too brittle to allow us to reflect the engineering reality beyond our (now outdated) initial plan.
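To make that concrete, here’s a minimal sketch (task names, estimates and fields are all invented) of how nested tasks with a partially-completed parent might roll up estimates and remaining effort to any depth:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A toy model of arbitrarily-nested tasks, where estimates and
    remaining effort roll up from any depth of sub-task."""
    name: str
    estimate_days: float = 0.0   # effort for this task's own work
    done_days: float = 0.0       # effort already completed on it
    subtasks: List["Task"] = field(default_factory=list)

    def total_estimate(self) -> float:
        return self.estimate_days + sum(t.total_estimate() for t in self.subtasks)

    def remaining(self) -> float:
        own = max(self.estimate_days - self.done_days, 0.0)
        return own + sum(t.remaining() for t in self.subtasks)


# A parent task that was part-completed before we discovered more structure.
importer = Task("Data importer", estimate_days=3, done_days=2, subtasks=[
    Task("Handle malformed rows", estimate_days=2),
    Task("Resumable imports", estimate_days=4, subtasks=[
        Task("Checkpoint format", estimate_days=1),
    ]),
])

print(importer.total_estimate())  # 10.0
print(importer.remaining())       # 8.0
```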

Tasks Need Surrounding Context

Our planning process envisioned the whole (a delivery, a completed system, or a bunch of features for a sprint), then broke it down into tasks. Quite often, rather a bewilderingly large number of them. The tasks somehow lose our view of the whole, and it is hard to determine whether those tasks, collectively, still make up a meaningful delivery or omit anything crucial.

It seems like we need to retain some visualisation or record of the structure by which we arrived at the individual tasks, so we can work backwards from the tasks to the whole. Even when working on an individual task, we often need to get enough surrounding context to work on it, as each task rarely describes that in sufficient detail.

Being able to see where tasks fit into the bigger picture is particularly necessary when our plan inevitably changes over time, to avoid us getting lost in a changing sea of tasks, the intended outcome of which we can no-longer visualise clearly.

Current task management tools don’t allow us to retain the more complex structure of groups of tasks. We are limited to “epics”, tags, simple sub-tasks and basic categorisation. We need better ways of visualising how the tasks fit together to form the whole that we’re working towards, allowing us to spot omissions and to track progress. After all, the collective structure of the tasks will probably be as complex as the structure of the overall system we’re building.

Tasks Need Collaboration (Sometimes)

No matter how much we break them down, a task often requires unexpected collaboration with someone else (working together, or a hand-off then a hand-back), integration with their work, or reaching an agreement. That micro-collaboration is hard to model, but crucial to a task’s completion, though we often assume (perhaps hope?) it is irrelevant.

Whether or not one person “owns” the task and the responsibility for its completion, several people may need to do work on it and must find time in their workload. This is particularly true with user stories, as we don’t tend to break them down into chunks any smaller than a story we can define in business / product terms.

Current task management tools suggest a single assignee per task, or multiple assignees but with no record of individual effort required or how the collaboration will occur. We actually need to reflect a more complex sense of ownership and collaboration, even if most of the work will be done by one person with fleeting collaboration at some point. Failing to capture this in a basic form can be a cause of tasks being unexpectedly “blocked” during sprints, or the logical next steps towards their completion being unclear.

Tasks Have Dependencies (Sometimes)

Sometimes, a task depends on other tasks being completed (or part-completed), or a bunch of tasks need to happen in a certain logical order. Often a task that is scheduled to be completed this week or this sprint is blocked by an implied dependency that won’t be resolved anytime soon. We must model dependencies, or risk impacting our progress if we discover them late, or if many tasks get blocked on a key resource that we should have planned around earlier.

Current task management tools can model dependencies clumsily at the task level, but aren’t good at helping us to visualise them. This can often be another cause of tasks being blocked during sprints. They certainly aren’t good at modelling dependencies between groups of tasks, and we tend to have to visualise those ourselves: e.g. All tasks in Stage 2 depend on the tasks in Stage 1.

A visual representation of dependencies at a task and group level, and the ability to move them around, would give us a clearer view of how best to approach those tasks in a sensible order.
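Even without better tools, modelling dependencies as plain data gives us something we can order and reason about. A minimal sketch using Python’s standard library (graphlib, available from Python 3.9; the task names are invented):

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks it depends on (the Stage 2 items depend on Stage 1).
dependencies = {
    "schema migration": [],
    "import API":       ["schema migration"],
    "import UI":        ["import API"],
    "reporting":        ["schema migration"],
    "beta release":     ["import UI", "reporting"],
}

# One sensible order in which to tackle the work, dependencies first.
print(list(TopologicalSorter(dependencies).static_order()))
```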

Task Completion Often Isn’t Clean

Often a task is almost completed, with a small piece of work to be finished, perhaps when unblocked by another person or after a dependency or decision is resolved. The inability to easily mark the task as partially complete, or to define the remaining work (perhaps in a sub-task), hampers visibility of the true state of the task, particularly on agile boards.

Creating a new sub-task can be cumbersome, and it needs adding into the plan or sprint, in correct relation to the original task, so it isn’t forgotten or confusing when seen in isolation. Often the administrative effort leads to us just leaving the original task open until the remaining work is completed.

Current task management tools assume task completion is clean and atomic, leading to tasks remaining “in progress” when they are often 90% complete; the tool should allow us to reflect that easily. This often hampers visibility of remaining work, particularly close to the end of a sprint.

So What?

Clearly, I’m pointing out some broad problems with simplistic views of task-based engineering here and suggesting some very general solutions. Not all of these problems will apply to a project but, particularly with larger teams, I’ve seen many of them and their impact is real. Some teams tackle these problems well as part of the way they work, but others clearly struggle with the realities of task-based engineering.

A quick summary:

  • Tasks are valuable abstractions for understanding and managing engineering complexity, but the methods and tools we currently use to manage them are rather rigid. We need to be able to visualise and change large numbers of tasks easily, see their overall status and visualise how they combine to form a planned delivery or feature;
  • We need to be able to break tasks down to any level, describe what overlaps and what’s already complete, so we can model our fluid discovery of complexity but retain an ability to estimate and gauge the remaining effort involved. Again, current tools and methods are too brittle;
  • We need to be able to model and visualise task and task group dependencies, where they matter (they don’t always), or even just record that entire swathes of tasks must take place in a certain loose order. Current tools make this too cumbersome;
  • We need to model the necessary collaboration within individual tasks, both simple and more complex, so we can be aware of it. Current tools are too cumbersome to make this effective or even worth the effort, and agile boards are too simplistic.
  • Most crucially… All of this must remain light, easy to visualise and easy to change. Current approaches and tools are way too cumbersome, and often work against us.

As they say, “Your mileage may vary”. So I’d be really interested in hearing how this ties into your own views of task-based software engineering, whether you encounter any of these (or other) problems, and if you think the kind of broad changes I’m suggesting might improve things.

One thing is certain: Task-based engineering is here to stay!

Beyond the Geek Apprenticeship

When you’ve immersed yourself deeply enough in programming to write a ray tracer in 68000 assembly language in your teens, it’s safe to say you probably aren’t going to become an accountant. So whilst I know some folks struggle to find theirs, my own career path was always rather obvious.

Since graduating from uni a not-insignificant number of years ago (!), I’ve been lucky enough to have worked for some truly inspiring folks at some amazing companies and on some very cool problems. I’ve had a chance to help build software for many different industries and sectors. I think that gives you a really valuable perspective on what we, as software engineers, can actually offer.

More importantly, it also gives you a great sense of what’s possible with this universal toolkit called “software”. For that’s all it is; a toolkit, and nothing more. As I’ve said before, it’s what we do with that toolkit that counts.

I’m mostly interested now in exploring what’s possible with technology, and what we can build with it to benefit each other. I think there’s a certain obligation to look up from what we’re building occasionally and to ask how it helps us. If the answer isn’t obvious, it probably doesn’t. Self-serving industries, with little benefit for those outside of them, no-longer interest me.

I realised that the earlier part of my career was probably naturally something of a travelling tech apprenticeship: You work in various corporate scenarios, learning, building and applying what you’ve discovered.

Beyond any apprenticeship, the bigger question always comes: What will you do with this now?

Unlike my initial career choice, the answers aren’t as obvious… but, if I’m open to finding and exploring them, they’re probably going to be at least as interesting.

Scaling Software Products, in a Nutshell: Why, When and How

Startups building software products tend to think about scalability as an afterthought, quite understandably. There are far more pressing things to focus on first, like identifying a product that solves a problem, for a paying and addressable market, delivering value to that market, and then collecting revenue from them for doing it. All of that must take centre stage, at least initially, particularly if you’re following a lean approach to getting started and are launching to validate an idea with an MVP. (Hint: Too much thought about scalability probably violates the “M” for minimum).

At an early point in the lifecycle of a product however, someone inevitably asks “Does it scale?” — Meaning, will the product continue to work if we have 10 times, or 100 times, or 1,000 times the current number of simultaneous users? And, more interestingly… at what point will it start to fail?

Not surprisingly, the initial answer is often “I don’t know”. And this is a concern.

So here, in a digestible nugget, is what I’ve learned so far about the huge topic of Software Scalability in 20 years as a software engineer (I’m sure I have much more to learn). I’ve focussed mainly on server-side scalability of web applications, and particularly those deployed on Amazon Web Services (AWS) or Heroku (or similar), as that’s my area of expertise and close to what most startups are looking to scale. But many of the ideas here also apply to other types of software product.

SO WHY EVEN THINK ABOUT SCALABILITY?

In two words: Customer Experience.

If your customers have a bad experience when using your product — a slow response, a timeout, an error message — they will, in all likelihood, look elsewhere. With poor scalability, you can literally lose customers, often before you’ve gained them.

The basic, and often overlooked, software engineering fact that drives the need to consider scalability is this: what works well in your Development environment (usually on your laptop), and seems to do similarly well when initially deployed to AWS or Heroku with very little traffic (and by traffic, I mean users), may not work so well when you publish your first press release, get mentioned on Hacker News, or in the days after you give your first kick-ass presentation and everyone flocks to your site to find out more. These are all happy events… but only if your product scales to meet the increased demand, which naturally requires increased resources.

Being able to offer every visiting user a reasonably good experience of your product is as key to success as figuring out what your product should be. A great product that is slow to respond is unappealing at best. A product that is hidden behind a timeout message is essentially worthless.

WHEN TO SCALE?

Ideally, just before you need to.

Metrics – Just like forecasting the weather, gathering and using metrics is the only real way to tell how your product is doing and whether it is handling the current load, and to predict where the limits and the degradation in user experience might begin to occur.

Instrumenting your product code (by capturing your own timing stats, perhaps using Aspect-Oriented Programming or a third-party library), measuring throughput (hits/sec, average response time, etc) and then capturing and aggregating those metrics using a product such as New Relic, or building your own performance dashboard, are all great ways to expose the current performance of your product in near real time.
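By way of illustration, even a few lines of home-grown instrumentation can capture per-call timings to feed such a dashboard. A minimal sketch (the metric name and function are invented):

```python
import time
from collections import defaultdict
from functools import wraps

timings = defaultdict(list)  # metric name -> list of durations in seconds


def timed(metric_name):
    """Record how long the wrapped function takes, under the given metric name."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[metric_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("search.request")
def handle_search(query):
    time.sleep(0.01)  # stand-in for real work
    return []


handle_search("widgets")
samples = timings["search.request"]
print(f"{len(samples)} calls, avg {sum(samples) / len(samples):.4f}s")
```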

Spare Capacity – Metrics only show how your product is currently doing. What you’re more interested in is the spare capacity: How much more traffic you can handle before problems start to occur. If you are comfortably handling 1,000 requests/minute right now with a decent average response time and customer experience, but you aren’t sure whether the 1,001st request will start to cause timeouts, then you know nothing.

This is where Performance Testing, in a separate but (crucially) production-like environment, lets you figure out by experimentation where the limits of your system lie. Regularly performance testing and benchmarking your product, graphing the results and determining at what point a single server begins to give an unacceptably-degraded user experience, lets you forecast the capacity and breaking point of multiple servers handling the load between them. This, in turn, gives you a picture of the spare capacity you currently have. Even better, baking all of this into your product metrics, and reporting a percentage capacity remaining, lets you know how close to the need to scale you currently are.
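A back-of-the-envelope version of that capacity calculation might look like this (all the numbers here are hypothetical; yours would come from your own benchmarking):

```python
# From benchmarking: one production-like server degrades past ~600 requests/minute.
PER_SERVER_LIMIT_RPM = 600


def capacity_remaining(current_rpm: float, servers: int) -> float:
    """Fraction of headroom left across the current fleet (0.0 = at the limit)."""
    total_capacity = PER_SERVER_LIMIT_RPM * servers
    return max(0.0, 1.0 - current_rpm / total_capacity)


# Handling 1,000 requests/minute on 3 servers leaves roughly 44% headroom.
print(f"{capacity_remaining(1000, 3):.0%}")
```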

It is worth saying that any such notion of capacity needs regularly reviewing, as changes to your product, the hardware it is deployed on, or other random factors can change capacity in unforeseen ways.

Automation – Metrics alone aren’t enough. It’s no use having great metrics if, at 3am when your product gets mentioned on Hacker News, you don’t notice the hits escalating, average response times climbing and the number of timeouts going through the roof. Automation and scaling policies allow you to define conditions under which you’d like to scale up, even when you aren’t around to make the decision yourself. Unless you plan on watching them 24/7, or getting out of bed to answer the Pager Duty call, your metrics, and in turn their notion of dwindling spare capacity, must trigger some sort of scaling action.
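In practice this usually lives in your hosting provider’s configuration (an AWS auto-scaling policy, for example) rather than in your own code, but the decision such a policy encodes can be sketched in a few lines; the thresholds below are invented:

```python
# Invented thresholds: scale up below 30% spare capacity, down above 70%.
SCALE_UP_BELOW = 0.30
SCALE_DOWN_ABOVE = 0.70


def decide_instances(current: int, spare_capacity: float) -> int:
    """Given the current fleet size and the fraction of capacity left
    (from your metrics), return the fleet size we'd like next."""
    if spare_capacity < SCALE_UP_BELOW:
        return current + 1
    if spare_capacity > SCALE_DOWN_ABOVE and current > 1:
        return current - 1
    return current


print(decide_instances(current=2, spare_capacity=0.12))  # 3: scale up
print(decide_instances(current=3, spare_capacity=0.85))  # 2: scale back down
```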

Cycles – When to scale is a never-ending “when”, and it is often cyclical: As well as wanting to scale up to meet increasing demand, you might want to scale back down again in quiet times to avoid wasting resources. Any scaling policy should also define how much spare capacity is too much, if only so your AWS bill doesn’t hit the roof and wipe out your profit.

Metrics, coupled with automation, tell you when to scale, but not how.

HOW TO SCALE

Scaling UP

Scaling up (often called Vertical Scaling) involves running your product on a “bigger” server, meaning a combination of more processing power (CPUs / cores), and/or more memory. How much more of each you require is determined by which limits you are hitting. If your product is CPU-bound when under load, you’ll need more processing power. If it hits memory limits, then add more memory. Usually, a mix of both is required to scale up effectively.

Scaling up is easy to accomplish because it requires no architectural changes to your software. You just deploy your product to a larger instance (on AWS or Heroku), or buy a bigger server if you are hosting the product yourself. The only problem is that it needs to be a pre-planned affair, unless you can seamlessly upgrade your instance as the need for additional resource grows.

Scaling up should, however, be your first move. You should be running your product on a box that will — on its own — meet projected capacity for a while without needing to scale further.

Scaling OUT

Your next move is to scale out (often called Horizontal Scaling). This involves distributing traffic across multiple servers. Hosting providers such as AWS provide load balancing technology to achieve this, spreading the load across however many servers you currently have running.

Scaling out introduces some architectural challenges but, unlike scaling up, it can be adaptive: You just add additional resources to meet demand, and remove them again when you no-longer need that capacity (though there are challenges in scaling down too; see later).

One of the main challenges with scaling out is that the state of your application is no-longer held in memory within one server, and is distributed between them. If you are caching certain information, for performance reasons, one server’s view of that information may be out-of-date if a change is applied via another server. Simple solutions involve techniques such as “sticky sessions”, whereby individual users are always load balanced to the same server instance, whilst it is running. AWS offer this as an easily-configurable option. But the state of your application may not be easily subdivided by user. You may need to use your database as a place to update shared data transactionally, such that race conditions don’t cause you to lose or mis-calculate data.
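To make the race-condition point concrete, here’s a tiny sketch using SQLite (the schema is invented): pushing the arithmetic into a single UPDATE lets the database serialise concurrent changes, rather than each server reading a value, modifying it in memory and writing it back over someone else’s update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('signups', 0)")


def record_signup(connection):
    # A read-modify-write in application code would race across servers;
    # doing the arithmetic inside the UPDATE lets the database serialise it.
    connection.execute("UPDATE counters SET value = value + 1 WHERE name = 'signups'")
    connection.commit()


record_signup(conn)
print(conn.execute("SELECT value FROM counters").fetchone())  # (1,)
```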

Other things to consider are the use of a message bus (such as RabbitMQ or Amazon’s SQS), such that information may be shared between servers; even something as simple as telling all instances when to invalidate specific data in their cache can be easy to implement via messaging.
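As a minimal sketch of that cache-invalidation idea, assuming a RabbitMQ broker running locally and the pika client (the exchange name and cache key are invented):

```python
import pika

# Publisher side: tell every instance that a piece of cached data is stale.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="cache_invalidation", exchange_type="fanout")
channel.basic_publish(exchange="cache_invalidation", routing_key="", body="user:42")
connection.close()

# Each server instance would bind its own queue to the same fanout exchange and,
# on receiving "user:42", simply drop that key from its local cache.
```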

The architectural challenges introduced by scaling out are beyond the scope of this article, but the benefits of tackling them and being able to add spare capacity whenever you want are certainly worth it.

Incidentally, I mentioned scaling down being a challenge. You can’t simply remove spare capacity from behind a load balancer, because the instances may be in the process of handling a request from a user, resulting in an error or timeout. AWS now offer “connection draining”, such that instances that are about to be terminated are given a configured period of time to finish handling existing requests, but no new requests are routed to them. This makes scaling down as effortless as scaling up, if correctly configured.

Heterogeneous Environments

So far, both methods of scaling have involved either resizing or duplicating servers running the same software, and performing the same role. However, certain activities performed by your product, such as expensive computations, may form a larger part of its performance bottleneck. It may make sense to scale portions of your product separately, so that the spare capacity and scalability of one aspect can be controlled independently of another.

This might be most easily achieved by separating the product into Services, and scaling the number of instances of each of those services separately to meet demand. Some services may share the same process, but others may run stand-alone, perhaps connected to the other services via a message bus, socket connection, or just by virtue of sharing the same database.

If certain services perform computationally-costly activities, you could assign them to compute-optimised instances and scale the number of instances based upon the size of the queue of waiting work, or the average turnaround time for tasks, or another metric you’d like to optimise.

Separating concerns in this manner becomes more cost-effective as the overall size of a product deployment increases. There is little point, initially, in doing this with a small deployment and everything should probably run in the same process, or at least one of several identical processes, at first.

DELAYING THE NEED TO SCALE

Sometimes, it is entirely possible to delay the need to scale. In web applications, if certain computations — particularly those involving just aggregation or simple processing before presentation — can be offloaded to the browser, without increasing bandwidth or affecting user experience, this can save server resources and means the server-side cost per user is reduced. This, in turn, delays or reduces the need to scale.

Where activities are not time-sensitive, scheduling when they occur to coincide with times of lower traffic can also reduce the server resources required at any given time, and delay (or entirely avoid) the need to scale.

SUMMARY

Software Scalability is clearly a huge topic. I hope this is a useful overview, particularly if you don’t come from a software background, and that it leads you to delve deeper into specific areas. At the very least, you might start building metrics into your product and use them to figure out when Scalability is likely to become a more pressing concern for you.