Scaling Software Products, in a Nutshell: Why, When and How

Startups building software products tend to treat scalability as an afterthought, quite understandably. There are far more pressing things to focus on first: identifying a product that solves a problem for a paying, addressable market, delivering value to that market, and then collecting revenue in return. All of that must take centre stage, at least initially, particularly if you’re following a lean approach to getting started and are launching an MVP to validate an idea. (Hint: too much thought about scalability probably violates the “M” for minimum.)

At some point early in the lifecycle of a product, however, someone inevitably asks “Does it scale?” — meaning, will the product continue to work if we have 10, 100, or 1,000 times the current number of simultaneous users? And, more interestingly… at what point will it start to fail?

Not surprisingly, the initial answer is often “I don’t know”. And this is a concern.

So here, in a digestible nugget, is what I’ve learned about the huge topic of software scalability in 20 years (so far) as a software engineer; I’m sure I have much more to learn. I’ve focussed mainly on the server-side scalability of web applications, particularly those deployed on Amazon Web Services (AWS) or Heroku (or similar), as that’s my area of expertise and closest to what most startups are looking to scale. But many of the ideas here also apply to other types of software product.


Why Scale?

In two words: Customer Experience.

If your customers have a bad experience when using your product — a slow response, a timeout, an error message — they will, in all likelihood, look elsewhere. With poor scalability, you can literally lose customers, often before you’ve gained them.

The basic — and often overlooked — software engineering fact that drives the need to consider scalability is this: what works well in your development environment (usually on your laptop), and seems to do similarly well when initially deployed — perhaps to AWS or Heroku with very little traffic, and by traffic I mean users — may not work so well when you publish your first press release, get mentioned on Hacker News, or in the days after you give your first kick-ass presentation and everyone flocks to your site to find out more. These are all happy events… but only if your product scales to meet the increased demand, which naturally requires increased resources.

Being able to offer every visiting user a reasonably good experience of your product is as key to success as figuring out what your product should be. A great product that is slow to respond is unappealing at best. A product that is hidden behind a timeout message is essentially worthless.


When to Scale?

Ideally, just before you need to.

Metrics – Much like forecasting the weather, gathering and using metrics is the only real way to tell how your product is doing: whether it is handling the current load, and where the limit — and the degradation in user experience — might begin to occur.

Instrumenting your product code (by capturing your own timing stats, perhaps using Aspect-Oriented Programming or a third-party library), measuring throughput (hits/sec, average response time, etc.) and then capturing and aggregating those metrics using a product such as New Relic, or building your own performance dashboard, are all great ways to expose the current performance of your product in near real time.
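To make the instrumentation idea concrete, here is a minimal sketch of rolling your own timing stats in Python — a decorator that records how long each call takes, plus a helper that summarises them. The names (`timed`, `summary`) are illustrative, not from any particular library; a real product would feed these numbers into a dashboard or a service like New Relic.

```python
import time
from collections import deque

# Keep a rolling window of recent call durations (seconds).
_samples = deque(maxlen=10_000)

def timed(fn):
    """Record the wall-clock duration of every call to fn."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            _samples.append(time.perf_counter() - start)
    return wrapper

def summary():
    """Aggregate the recorded samples into simple headline metrics."""
    if not _samples:
        return {"count": 0, "avg_ms": 0.0}
    return {
        "count": len(_samples),
        "avg_ms": 1000 * sum(_samples) / len(_samples),
    }

@timed
def handle_request():
    pass  # stand-in for real request handling

for _ in range(100):
    handle_request()
```

The same pattern extends to per-endpoint timings, percentiles, and error counts — whatever your dashboard needs.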

Spare Capacity – Metrics only show how your product is currently doing. What you’re more interested in is the spare capacity: How much more traffic you can handle before problems start to occur. If you are comfortably handling 1,000 requests/minute right now with a decent average response time and customer experience, but you aren’t sure whether the 1,001st request will start to cause timeouts, then you know nothing.

This is where performance testing, in a separate but (crucially) production-like environment, lets you figure out by experimentation where the limits of your system lie. Regularly performance testing and benchmarking your product, graphing the results, and determining at what point a single server begins to give an unacceptably degraded user experience lets you forecast the capacity and breaking point of multiple servers handling the load between them. This, in turn, gives you a picture of the spare capacity you currently have. Even better, baking all of this into your product metrics, and reporting a percentage of capacity remaining, lets you know how close you currently are to needing to scale.
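Turning benchmark results into a “percentage capacity remaining” figure can be as simple as the sketch below. The per-server limit and the safety margin are assumptions you would plug in from your own performance tests; the function name is mine, not a standard API.

```python
def capacity_remaining(current_rpm: float, per_server_limit_rpm: float,
                       servers: int, safety_margin: float = 0.8) -> float:
    """Percentage of usable capacity still spare.

    per_server_limit_rpm is the request rate at which benchmarking showed a
    single server's response times degrade unacceptably; safety_margin
    discounts that limit so you aim to scale well before the edge.
    """
    usable = per_server_limit_rpm * servers * safety_margin
    return max(0.0, 100.0 * (1 - current_rpm / usable))

# e.g. benchmarks say one server degrades at 2,000 req/min; we run two
# servers and currently see 1,000 req/min:
spare = capacity_remaining(1000, 2000, servers=2)
```

Reporting `spare` alongside your other metrics gives you the early-warning number this section describes.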

It is worth saying that any such notion of capacity needs regularly reviewing, as changes to your product, the hardware it is deployed on, or other random factors can change capacity in unforeseen ways.

Automation – Metrics alone aren’t enough. It’s no use having great metrics if, at 3am when your product gets mentioned on Hacker News, you don’t notice the hits escalating, average response times climbing and the number of timeouts going through the roof. Automation and scaling policies allow you to define conditions under which you’d like to scale up, even when you aren’t around to make the decision yourself. Unless you plan on watching your metrics 24/7, or getting out of bed to answer the PagerDuty call, they — and, in turn, their notion of dwindling spare capacity — must trigger some sort of scaling action.

Cycles – When to scale is a never-ending “when”, and it is often cyclical: as well as scaling up to meet increasing demand, you might want to scale back down again in quiet times to avoid wasting resources. Any scaling policy should also define how much spare capacity is too much, if only so your AWS bill doesn’t go through the roof and wipe out your profit.
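A scaling policy of the kind described above can be sketched as a tiny decision function. The thresholds here are purely illustrative; the important detail is the gap between the scale-up and scale-down thresholds, which stops the system from “flapping” up and down around a single boundary.

```python
def scaling_decision(spare_capacity_pct: float,
                     scale_up_below: float = 25.0,
                     scale_down_above: float = 70.0) -> str:
    """Hypothetical policy: scale up when spare capacity dips below one
    threshold, scale back down when it sits above another, otherwise hold.
    Keeping the two thresholds apart avoids oscillating between states."""
    if spare_capacity_pct < scale_up_below:
        return "scale_up"
    if spare_capacity_pct > scale_down_above:
        return "scale_down"
    return "hold"
```

In practice you would wire this decision into whatever automation your platform offers (Auto Scaling groups on AWS, for instance), rather than acting on it by hand at 3am.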

Metrics, coupled with automation, tell you when to scale, but not how.


Scaling UP


Scaling up (often called Vertical Scaling) involves running your product on a “bigger” server: one with more processing power (CPUs/cores), more memory, or both. How much more of each you require is determined by which limits you are hitting. If your product is CPU-bound under load, you’ll need more processing power. If it hits memory limits, add more memory. Usually, a mix of both is required to scale up effectively.

Scaling up is easy to accomplish because it requires no architectural changes to your software. You just deploy your product to a larger instance (on AWS or Heroku), or buy a bigger server if you are hosting the product yourself. The only problem is that it needs to be a pre-planned affair, unless you can seamlessly upgrade your instance as the need for additional resource grows.

Scaling up should, however, be your first move. You should be running your product on a box that will — on its own — meet projected capacity for a while without needing to scale further.

Scaling OUT


Your next move is to scale out (often called Horizontal Scaling). This involves distributing traffic across multiple servers. Hosting providers such as AWS provide load balancing technology to achieve this, spreading the load across however many servers you currently have running.
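What a load balancer does can be illustrated with a toy round-robin sketch — each incoming request is handed to the next server in rotation. Real load balancers (such as AWS’s) add health checks, weighting and much more; this just shows the core idea.

```python
import itertools

class RoundRobinBalancer:
    """Toy illustration of load balancing: hand each incoming request
    to the next server in the rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        # The request itself is ignored here; a real balancer would also
        # track server health and current connection counts.
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
targets = [lb.route(f"req-{i}") for i in range(6)]
```

With three servers behind the balancer, six requests are spread evenly, two per server.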

Scaling out introduces some architectural challenges but, unlike scaling up, it can be adaptive: you just add resources to meet demand, and remove them again when you no longer need the capacity (though there are challenges in scaling down too; see later).

One of the main challenges with scaling out is that the state of your application is no longer held in memory within a single server, but is distributed between several. If you are caching certain information for performance reasons, one server’s view of that information may be out of date if a change is applied via another server. Simple solutions involve techniques such as “sticky sessions”, whereby individual users are always load balanced to the same server instance whilst it is running; AWS offer this as an easily-configurable option. But the state of your application may not be easily subdivided by user. You may need to use your database as the place to update shared data transactionally, such that race conditions don’t cause you to lose or miscalculate data.
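One simple way to implement sticky routing (conceptually — AWS’s cookie-based version works differently) is to hash the session ID, so the same user always lands on the same server while the set of servers is stable:

```python
import hashlib

def sticky_route(session_id: str, servers: list) -> str:
    """Route a session to a server by hashing its ID, so repeat requests
    from the same session land on the same server (while the server list
    is unchanged). A sketch of the idea, not AWS's implementation."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["app-1", "app-2", "app-3"]
first = sticky_route("user-42", servers)
second = sticky_route("user-42", servers)  # same session, same server
```

Note the caveat in the comment: when servers are added or removed, a plain modulo hash reshuffles most sessions, which is why production systems often use consistent hashing or balancer-managed session cookies instead.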

Other things to consider include the use of a message bus (such as RabbitMQ or Amazon’s SQS) so that information may be shared between servers; even something as simple as telling all instances when to invalidate specific data in their caches can be easy to implement via messaging.
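The cache-invalidation pattern just mentioned looks something like the sketch below. The `Bus` class here is an in-process stand-in for a real message bus (RabbitMQ, SQS, etc.), used only to show the shape of the pattern: one server publishes an invalidation message, and every subscribed server drops the stale entry from its local cache.

```python
class Bus:
    """In-process stand-in for a message bus such as RabbitMQ or SQS."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:
            handler(message)

class AppServer:
    """Each app server keeps a local cache and listens for invalidations."""

    def __init__(self, bus):
        self.cache = {}
        bus.subscribe(self.on_message)

    def on_message(self, message):
        if message["type"] == "invalidate":
            self.cache.pop(message["key"], None)

bus = Bus()
servers = [AppServer(bus), AppServer(bus)]
for s in servers:
    s.cache["price:123"] = 9.99  # both servers have cached a value

# One server applies an update and tells everyone to invalidate:
bus.publish({"type": "invalidate", "key": "price:123"})
```

After the publish, no server is left serving the stale cached value; each will re-fetch from the database on the next request.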

The architectural challenges introduced by scaling out are beyond the scope of this article, but the benefits of tackling them and being able to add spare capacity whenever you want are certainly worth it.

Incidentally, I mentioned scaling down being a challenge. You can’t simply remove spare capacity from behind a load balancer, because the instances may be in the process of handling a request from a user, resulting in an error or timeout. AWS now offer “connection draining”, such that instances that are about to be terminated are given a configured period of time to finish handling existing requests, but no new requests are routed to them. This makes scaling down as effortless as scaling up, if correctly configured.
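The idea behind connection draining — generically, not AWS’s specific implementation — is a loop like this: stop routing new requests to the instance, then wait (up to a deadline) for its in-flight requests to finish before terminating it. The function and parameter names are mine, for illustration.

```python
import time

def drain(in_flight, poll, timeout_s=30.0, interval_s=0.01):
    """Wait for an instance's in-flight requests to complete, up to a
    deadline. `in_flight` reports how many requests remain; `poll` lets
    existing requests make progress. Returns True if fully drained."""
    deadline = time.monotonic() + timeout_s
    while in_flight() > 0 and time.monotonic() < deadline:
        poll()
        time.sleep(interval_s)
    return in_flight() == 0

# Simulate three outstanding requests, each finishing on one poll:
pending = [1, 2, 3]
clean = drain(in_flight=lambda: len(pending),
              poll=lambda: pending and pending.pop())
```

If the deadline passes with requests still outstanding, a real system has to choose between waiting longer and cutting those requests off — which is exactly the trade-off the configured draining period represents.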

Heterogeneous Environments


So far, both methods of scaling have involved either resizing or duplicating servers running the same software and performing the same role. However, certain activities performed by your product — such as expensive computations — may form a larger part of its performance bottleneck. It may make sense to scale portions of your product separately from others, such that the spare capacity and scalability of one aspect may be controlled separately from that of another.

This might be most easily achieved by separating the product into Services, and scaling the number of instances of each of those services separately to meet demand. Some services may share the same process, but others may run stand-alone, perhaps connected to the other services via a message bus, socket connection, or just by virtue of sharing the same database.

If certain services perform computationally-costly activities, you could assign them to compute-optimised instances and scale the number of instances based upon the size of the queue of waiting work, or the average turnaround time for tasks, or another metric you’d like to optimise.
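A queue-driven policy for those compute workers might look like this sketch: run enough workers to drain the current backlog within a target time, clamped to sensible bounds. All of the numbers and names are illustrative assumptions you would tune for your own workload.

```python
import math

def desired_workers(queue_length: int, tasks_per_worker_per_min: float,
                    target_drain_minutes: float = 5.0,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Hypothetical policy: enough compute-optimised workers to drain the
    queue within the target time, never fewer than min_workers (so the
    queue is always being serviced) nor more than max_workers (cost cap)."""
    needed = math.ceil(
        queue_length / (tasks_per_worker_per_min * target_drain_minutes))
    return max(min_workers, min(max_workers, needed))
```

For example, with 600 queued tasks and workers that each clear 10 tasks a minute, a five-minute drain target calls for 12 workers; an empty queue still keeps one worker warm, and a huge backlog is capped at 20.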

Separating concerns in this manner becomes more cost-effective as the overall size of a product deployment increases. There is little point, initially, in doing this with a small deployment; everything should probably run in the same process, or at least in one of several identical processes, at first.


Delaying the Need to Scale

Sometimes, it is entirely possible to delay the need to scale. In web applications, if certain computations — particularly those involving just aggregation or simple processing before presentation — can be offloaded to the browser without increasing bandwidth or affecting user experience, this saves server resources and reduces the server-side cost per user. This, in turn, delays or reduces the need to scale.

Where activities are not time-sensitive, scheduling them to coincide with times of lower traffic can also reduce the server resources required at any given time, and delay (or entirely avoid) the need to scale.
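Deferring non-urgent work to a quiet window can be as simple as the sketch below. The 01:00–05:00 window is an assumption — you would pick yours from your own traffic metrics — and the function name is illustrative.

```python
from datetime import datetime, time as dtime

# Assumed low-traffic window, taken from (hypothetical) traffic metrics.
QUIET_START, QUIET_END = dtime(1, 0), dtime(5, 0)

def run_or_defer(now: datetime, urgent: bool) -> str:
    """Urgent tasks run immediately; non-time-sensitive work waits for
    the off-peak window, freeing server resources during busy hours."""
    if urgent or QUIET_START <= now.time() <= QUIET_END:
        return "run_now"
    return "defer_to_quiet_window"
```

A batch report requested mid-afternoon gets deferred to the small hours; the same request flagged urgent runs straight away.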


Software Scalability is clearly a huge topic. I hope this is a useful overview, particularly if you don’t come from a software background, and that it leads you to delve deeper into specific areas. At the very least, you might start building metrics into your product and use them to figure out when Scalability is likely to become a more pressing concern for you.

Never mind Big Data, what about Bad Data?

With all this talk of “Big Data” — extracting meaning and patterns from vast amounts of data — we often forget to be concerned about bad data: data about us that is incorrect, either slightly or catastrophically. It’s a real concern because correcting bad data can be a nightmare, and one that the companies curating all of that data about us seem ill-prepared and often unwilling to tackle.

I’ve had to attempt to correct bad data about me, or my environment, quite a few times, and each was something of a battle.

When I moved into a flat I purchased in 1997, there were three flats in the building. Two of the flats (unfortunately mine, and another) were being billed for the wrong gas meter. No-one had ever done the check that I did: turn the supply off at the meter, and confirm the supply in your flat is affected. It took almost a year to correct what the gas company saw as “static data” (i.e. unchanging, and therefore assumed to be correct). They seemed ill-prepared to validate the assumptions they had made when gathering that data originally, and it took help from the regulatory body to get it corrected.

I’ve had lesser problems, but still annoying ones: I live by myself in a flat in London, with neighbours in a flat below. I get a discount on my Council Tax for living alone. But each year the Council sends me a sternly worded letter saying that the Electoral Roll data leads them to believe my downstairs neighbours live with me, and that I should have my discount withdrawn. The data on how the flats are named is clearly unclean. I won’t even get into the fact that they bill me the rate for owning the garden out back, even though my neighbours actually own that; we saw getting the building re-banded as too much of an additional problem. Handling incorrect data doesn’t fall into any of the standard procedures that their call centre staff can deal with, so you invariably have to wait to have your case passed to the one person they grant the required data-changing powers to, and hope they can help. Meanwhile, of course, they continue to bill you, and chase you, as if the data were correct.

I’ve used the same contents insurance provider for about 10 years. Each year, they slightly inflate the policy to include new factors, as most insurers will by default if you don’t query them. And each year I have to call and tell them that I don’t need insurance for contents in a shed, what with being on an upper floor and having no shed. They seem to find such corrections hard, but adding the incorrect data in the first place seems to be a part of their regular procedures.

One of the responsibilities of gathering this data on us should be helping us to correct it when it is found to be incorrect, and in a way that puts the onus on the company, not the individual, to validate the data they hold. Until then, I see every letter or email containing an incorrect fact as the potential start of yet another bad data saga. It really shouldn’t need to be that painful.

Startup Founder Stories are Signposts to Success, not Prescriptions for it

The stories of successful startup founders collectively point at — but don’t reliably prescribe — the ingredients for our own startup success.

There’s a particular genre of books in the startup world, consisting mainly of the “How I Made It” accounts of successful startup founders: one per chapter, and usually not much else. I find these books interesting up to a point, but it’s like reading an account of how someone created a great painting… you’re left feeling you’d much rather go and create your own painting than read about theirs.

There is one fundamental flaw in seeing these accounts as anything other than inspiration and signposts: they are accounts of how someone made it with that product, in that market, with those co-founders and — most crucially — at that specific point in time. Almost all of those ingredients will be fundamentally different for our own startup story. Even just the passage of a few months changes things immensely.

Sure, there’s much to be taken away in terms of how and why they made specific decisions then. But the decisions themselves are less relevant than their thought process at the time. When we make our own decisions, the factors to be taken into account will be different, and the best decision may be the opposite of theirs. The specifics of their decisions may in fact blind us, when what we should be trying to distil are the principles of good business and good startup common sense behind them.

So I read these accounts and try to figure out, in aggregate, the good advice and principles within them. I’m less interested in the specifics now, because they are a matter of history. It is far more interesting to figure out what I can do that might influence my own startup outcomes in the future. Historic accounts help, but only to a point.


Your Startup Environment May Be Killing Innovation

The ideas, concepts and insights that startups — particularly those building a software product — deal with on a daily basis are often abstract. They can be somewhat intangible and fleeting until afforded the time to distil, define, discuss and pin them down. To make them real.

The innovation sought by startups is usually a by-product of depth and duration: how far they can dive into ideas and concepts, and how long they get to spend there with them. This is true not only for solopreneurs, but also for startups of any size working, sharing ideas and building a product together.

But we’re all human… even startup superheroes. Our daily ability to conquer the inertia required to start, and “stay started”, on such difficult work is finite. As is our tolerance for environmental issues that prevent us from working effectively. Friction and obstacles to working efficiently wear down this tolerance throughout the day, until we reach a point where we can no longer get started. Beyond this point, the only way back is rest: taking a break and, perhaps, just waiting until tomorrow. No amount of coffee or incentives can change that human reality. This is also why working too many hours often turns out to be counterproductive.

Anything that startups can do to delay staff from exhausting their tolerance, and reaching the point beyond which deeper work just isn’t going to happen each day, surely serves to boost innovation in environments that might otherwise stifle it.

Noise & Interruptions – Background noise and interruptions — to an individual or to groups — cause us to continually re-surface. Each time we start working again, we have to return to where we left off, with a depleted tolerance for interruptions. There have been many studies on the cost of this context switching. It kills the depth, and the time spent at depth, that are crucial to innovation. The work we achieve in an uninterrupted hour will be far more valuable than the work achieved in six ten-minute segments.

Considerate & Scheduled Interactions – Interactions within a startup are necessary and a valuable part of company culture, but they should either be scheduled, saved up to be tackled in one go, or follow some simple guidelines about what’s urgent versus what can more efficiently be postponed. Spending too long in someone else’s headspace is a sure way never to get time in your own. This may require some education about priorities, at all levels. After all, what one person considers urgent may in fact be far less urgent when the cost of the interruption is considered company-wide.

A hat / headphones rule (or other visible signal) can help: staff can use these to indicate that they are “in the zone”, allowing colleagues to choose whether to save an interruption for later. Any such rule must also suggest that a reasonable amount of time be spent not in the zone, allowing discussion and exchanges to be a little more fluid.

Headphones Don’t Conquer All – Having said the above, headphones aren’t always the answer to noise or interruptions, and you can only spend so long with headphones on before your choice of music, or simply your physical tolerance for wearing them, gets on your nerves. The environment itself, minus headphones, needs to be sufficiently conducive to focus and innovation at least a fair amount of the time. It is simply not comfortable to spend 6+ hours per day wearing headphones, no matter how cool they may make you look.

Places to Meet, Whenever – Companies of more than one person need places to meet, discuss and share. Whiteboards or flipcharts are essential, as is the ability to leave them up between meetings, such that unexpected conversations about the content on them can develop over time. Meeting facilities need to be available, where possible, without having to book or schedule them. Having even just a corner where each team can gather around a whiteboard on-demand, whenever they feel like it, can make a huge difference. The ad-hoc need to discuss an idea, a task someone is working on, or just to have the daily stand-up can arise at any time. Being blocked out of a meeting room because of other meetings can kill this essential interaction and the innovation it feeds.

Chairs and Desks – There is a certain amount of friction to physical comfort caused by the ergonomics of desks, chairs, lighting and other environmental factors. Staff need to want to spend a while in the company environment, rather than wanting to flee from it to find somewhere more comfortable. A bad physical environment can be as detrimental to innovation as constant interruptions.

Lower-Cost Facilities? – Communal spaces and open-plan offices can work well, particularly from a financial standpoint, but the ways in which they affect innovation need taking seriously. There is a real cost for the use of those seemingly-cheaper facilities if they mean that noise, interruptions or simply physical comfort can’t be effectively tackled.

Money Needn’t Be An Issue – Having said that, startups needn’t have a huge budget in order to address many of these issues, and it is possible to be frugal and clever with some simple physical choices: Different chairs, a few rules, a bit of thoughtful prioritising and scheduling.

Taking these environmental blockers to innovation seriously and throwing a little thought, if not money, at them can surely help startups to reap rewards and fuel innovation.

Twitter isn’t about Followers, it’s about finding your Tribe

Pretty much anyone using Twitter who arrived with the notion that it is all about micro-blogging and information sharing quickly became aware of the secondary — and much more measurable — activity going on there… the quest for followers.

Hardly a day goes by without me being followed-then-unfollowed by a bunch of accounts hoping to use that technique as a way to gain followers of their own, or without a “follow me and I’ll follow you back” message.

I see this single-minded quest for followers rather like collecting pebbles: it’s nice if you like pebbles, and certainly something you can measure and compare, but otherwise rather a dead-end activity. You have to want to do something with the pebbles.

Many folks have written about how to gain Twitter followers, the main advice being something along the lines of: “Stick to one topic, and tweet about it regularly”. This is understandable, as people will follow accounts that provide regular content on a specific, predictable, requested topic. They tend to unfollow when they get a lot of irrelevant or unwanted content from an account.

I often do things that lose me Twitter followers, and several people have told me this bluntly. I mainly tweet about technology and software using posts from this blog and retweets from other similar accounts. But occasionally, I mention other aspects of my life, such as UK-specific politics, LGBT issues, or things I’ve watched on TV. I’m told this pollutes my tweets and that it loses me followers. Mentioning LGBT topics certainly does.

So why doesn’t this (albeit temporary) loss of Twitter followers worry me? Because I see Twitter as more than a quest for followers. It’s a way to find your tribe. I use that term to mean like-minded people rather than simple followers of specific topics.

Therefore, occasionally I lose a few followers. They simply weren’t my tribe. Something I tweeted about didn’t resonate with them. They followed to hear about one topic, or on the assumption that I’m this type of person, or that type of person… and then unfollowed when they delved deeper. Just as in the real world, this is ok.

What matters more is that, over time, I’m left with followers who genuinely want to hear what I tweet… all aspects of it… and with whom I enjoy sharing content. I usually also genuinely want to follow them back, because sharing is two-way. After all, I am not a TV channel or a radio station.

My tweets reflect more completely who I am and what I’m about. Some won’t like bits of it but, over time, I find my tribe. And this, more than the quest for followers, is what Twitter is about for me.