Simply Adding a Second Network to Fill Coverage Gaps Won't Fix Your Downtime Problems


By Ged Robinson, Regional Director, Goodmill Systems

Everything we rely on will fail. Eventually. And sometimes

This is a simple truth which we in the critical technology community don't acknowledge enough.

For us at Goodmill Systems, this fact is our primary motivator. We know that when it comes to providing critical-grade wireless broadband for users who can't tolerate downtime, the only real game in town is to use multiple independent networks together, in parallel.

Like many things this is easier said than done, but Goodmill knows how.

When it comes to network availability, there are two main reasons why a network may be up or down at any given time or in any given place:

  • Resilience: How reliable is it? How often does it fail?
  • Coverage: Does it give service where you need it?

Unfortunately, existing networks, even ones engineered and designed for the emergency services, don't reach 100% on either of these metrics. They sometimes fail and become unavailable, and none of them provides 100% coverage.

Major failures tend to be high-profile and widely reported, and there have been some very noteworthy examples of public network failures in the UK alone in the last couple of years.

Coverage is less easily determined and more contested as a metric. For example, when we see a figure of "95% coverage" are we talking about 95% of population? Or 95% of geography?

If we are the emergency services, we are most definitely interested in geography, and Goodmill's own tests and measurements confirm what we know as a matter of common sense: no single broadband network gives the levels of coverage that police officers, fire officers and paramedics need.

It is true that public networks have improved hugely since they first appeared; they are now, in general, very reliable. It is also true, however, that they are not at the grade these users expect, and that pushing them further in that direction gets exponentially more expensive and difficult.

Improving from four nines (99.99%) to five nines (99.999%) is a lot more expensive and difficult than going from three nines (99.9%) to four nines.
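To see why each extra nine is harder to reach, it helps to convert availability into the downtime it permits. The short Python sketch below does that arithmetic, assuming a 365-day year; it illustrates the general rule only and is not a Goodmill calculation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime for an availability of `nines` nines (4 -> 99.99%)."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001 of the time may be lost
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    availability = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability:.{n - 2}f}%): "
          f"{allowed_downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

Three nines allows roughly 526 minutes (almost 9 hours) a year, four nines about 53 minutes and five nines only about 5 minutes: every extra nine removes 90% of whatever downtime allowance was left.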

We at Goodmill strongly believe that the future of critical communications lies in wireless broadband networks, and more specifically that it lies in combining existing networks together to address the coverage and resilience challenges.

Let's look at some actual data from some testing we have performed.

This was a drive test in and around the city of Newcastle in the North East of England. The total drive lasted around 2 hours 15 minutes and was confined to the urban and suburban areas of the city (we have other data sets from other routes, all of which are consistent with the findings from this one).

Simultaneous network measurements were made for two of the most popular UK cellular networks, which we will call simply "network 1" and "network 2" here.

In the maps below, green indicates coverage and red indicates no coverage. We can easily see that, for the most part, where network 1 was down, network 2 was up, and vice versa.

It is this difference in coverage gaps that the Goodmill Systems solution exploits.

The Goodmill w24 router in the vehicle connects to both networks simultaneously. In this simple test we configured the solution to prefer network 1 if it was available, or else to use network 2.

Both networks were being continuously monitored and logged in detail.
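The prefer-network-1 policy used in this test can be sketched in a few lines of Python. This is a simplified model, not Goodmill's w24 software: it assumes coverage is sampled as a pair of up/down flags at regular intervals, and `select_network`, `analyse_trace` and the sample trace are hypothetical. Besides picking a network, it counts how often the selected network changes, because every change is a switching event.

```python
from typing import List, Optional, Tuple

# One sample per time step: (network 1 up?, network 2 up?)
Sample = Tuple[bool, bool]


def select_network(net1_up: bool, net2_up: bool) -> Optional[int]:
    """Prefer network 1 whenever it is available, otherwise use network 2."""
    if net1_up:
        return 1
    if net2_up:
        return 2
    return None  # both networks down: no connectivity at all


def analyse_trace(trace: List[Sample]) -> Tuple[int, int]:
    """Return (number of network switches, number of samples with no network)."""
    switches = 0
    outages = 0
    last_active: Optional[int] = None
    for net1_up, net2_up in trace:
        active = select_network(net1_up, net2_up)
        if active is None:
            outages += 1
            continue  # keep remembering the last network that carried traffic
        if last_active is not None and active != last_active:
            switches += 1
        last_active = active
    return switches, outages


# Hypothetical trace for illustration only, not the Newcastle data.
trace = [(True, True), (False, True), (False, True), (True, True),
         (True, False), (False, False), (False, True)]
print(analyse_trace(trace))  # (3, 1)
```

In this toy trace, combining the two networks leaves only one sample with no connectivity at all, but it takes three switches to get there.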

When we analyse the numbers from the logs, we can produce some metrics and extrapolate to a 24-hour period:

Based on this route alone, with its coverage areas and gaps:

  • network 1 would cause us about 28 minutes of downtime per day
  • network 2 would cause us about 47 minutes of downtime per day
  • if we combined their coverage areas, we might predict enough coverage to reduce that downtime to about 2 minutes per day
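The extrapolation itself is simple arithmetic: the fraction of the drive spent without service, multiplied by the 1,440 minutes in a day. As a sketch, here it is in Python, applied to the availability figures reported for this drive (98.04% for network 1, 96.69% for network 2 and 99.86% for the combined connection); the function name is ours, for illustration only.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_day(availability_percent: float) -> float:
    """Extrapolate a measured availability to expected downtime over 24 hours."""
    return (1 - availability_percent / 100) * MINUTES_PER_DAY


# Availability figures measured on the Newcastle drive.
measured = {"network 1": 98.04, "network 2": 96.69, "combined": 99.86}
for name, availability in measured.items():
    print(f"{name}: {downtime_minutes_per_day(availability):.1f} minutes per day")
```

This gives about 28.2, 47.7 and 2.0 minutes per day respectively.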

This sounds impressive and important. And it’s pretty easy to understand. But it's not the end of the story.

Each time you switch between network 1 and network 2 to stay in coverage, the switching event itself is liable to cause downtime.

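We need to consider a little about networking and how mobile data networks are built. Each mobile data network is its own network, separate from the others, with its own configuration and its own physical connections to the rest of the connected world. As a result, traffic sent over one network generally reaches its destination from a different address than traffic sent over another.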

Let's consider a practical case: a Police Traffic car is engaged in a pursuit. The Control Room is in command of the pursuit and is watching live video from a camera on the Traffic car's dashboard.

The video stream is at first being transported over network 1, but then the pursuit moves into an area with poor or no coverage from network 1, and we need to switch the stream to network 2.

If we don't do something to protect the video stream during that switch, it will almost certainly be badly interrupted and will probably drop. The equipment connected to network 2 knows nothing about the ongoing video stream, so when the stream stops arriving from network 1 and suddenly appears to come from network 2, it is not recognised as a valid stream and will not reach the Control Room. The stream has to be re-established, and the video solution has to take care of that re-establishment. In the worst case this may even be a manual task; in the typical case, the Control Room in command of the pursuit will see an interruption of many seconds.

A critical event might be completely missed during an interruption.
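A deliberately simplified Python model makes this failure mode concrete. It assumes, as is typical, that each operator gives the vehicle a different IP address, and that the receiving equipment associates a stream with the address it was set up from. `StreamReceiver` and the two addresses (taken from the documentation-only ranges in RFC 5737) are hypothetical; real video and transport protocols differ in detail.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Address = Tuple[str, int]  # (IP address, port) the packets appear to come from


@dataclass
class StreamReceiver:
    """Toy control-room receiver: a stream is bound to the peer that set it up."""
    sessions: Dict[str, Address] = field(default_factory=dict)
    delivered: int = 0
    dropped: int = 0

    def establish(self, stream_id: str, peer: Address) -> None:
        # Stands in for the setup/handshake that ties the stream to one peer address.
        self.sessions[stream_id] = peer

    def receive(self, stream_id: str, source: Address) -> bool:
        # Packets from any other address are not recognised as part of the stream.
        if self.sessions.get(stream_id) == source:
            self.delivered += 1
            return True
        self.dropped += 1
        return False


VIA_NETWORK_1: Address = ("192.0.2.10", 5004)     # hypothetical address on network 1
VIA_NETWORK_2: Address = ("198.51.100.20", 5004)  # hypothetical address on network 2

rx = StreamReceiver()
rx.establish("dashcam", VIA_NETWORK_1)
rx.receive("dashcam", VIA_NETWORK_1)    # True: delivered
rx.receive("dashcam", VIA_NETWORK_2)    # False: dropped after the switch
rx.establish("dashcam", VIA_NETWORK_2)  # the stream must be set up again
rx.receive("dashcam", VIA_NETWORK_2)    # True: flowing again
```

Until `establish` is called again, every packet arriving over network 2 is discarded. In a real system, that re-establishment is the multi-second gap the Control Room sees.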

Worse still, in practice we know that most traffic crossing mobile networks is sent in Virtual Private Network (VPN) tunnels for security reasons. This amplifies the problem. Whilst some kinds of connection might survive the switching event described above, VPNs almost certainly will not. VPNs are very fragile network connections, and this is, at least partially, by design. From a network security point of view, data that suddenly arrives over a different path is regarded as suspicious, and a VPN server will typically not accept such a switch and will drop the connection. The VPN tunnel itself must then be re-established before any actual traffic, such as our video stream, can flow. So, in practical cases, when we switch from network 1 to network 2, our VPN will collapse and need to come back up, and only then can our video stream try to re-establish.

It's clear that whilst we want to switch between different networks in order to get enhanced network coverage, it's not a trivial thing to do this smoothly and elegantly, without interruption.

To put some numbers on it using the drive test above:

During the drive of approximately 2 hours 15 minutes, it was necessary to switch from one network to the other 42 times.

If we assume a reconnection time for the traffic stream (i.e., VPN re-establishment plus video stream re-establishment) of only 5 seconds per switch, this would mean 3 minutes 30 seconds of downtime caused purely by network switching.

There was continuous coverage: we were never totally out of coverage from both networks. Even so, switching between the two would disrupt ongoing connections for 3.5 minutes. That's over 2.5% of the drive time, and extrapolated to a 24-hour period it would mean about 37 minutes. In this case we would have been better off using network 1 alone; even with its coverage gaps, it caused less downtime than this!

(By the way, 5 seconds is a very quick re-establishment time. Something nearer 10 seconds, at a minimum, is more realistic.)
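The switching penalty can be checked with the same kind of arithmetic. This Python sketch uses the figures from the drive (42 switches in roughly 2 hours 15 minutes) and assumes, for simplicity, that every switch costs the same fixed outage covering both VPN and video stream re-establishment. It runs both the 5-second case and the more realistic 10-second case; the names are illustrative only.

```python
from typing import Tuple

DRIVE_SECONDS = 2 * 3600 + 15 * 60  # approx. 2 h 15 min drive
SWITCHES = 42                       # network switches observed on the drive
MINUTES_PER_DAY = 24 * 60


def switching_downtime(outage_per_switch_s: float) -> Tuple[float, float, float]:
    """Return (minutes lost on the drive, share of drive time, minutes per 24 h)."""
    lost_s = SWITCHES * outage_per_switch_s
    share = lost_s / DRIVE_SECONDS
    return lost_s / 60, share, share * MINUTES_PER_DAY


for outage in (5, 10):  # optimistic and more realistic reconnection times
    lost_min, share, per_day = switching_downtime(outage)
    print(f"{outage} s per switch: {lost_min:.1f} min lost "
          f"({share:.1%} of the drive), about {per_day:.0f} min per day")
```

At 5 seconds per switch this gives 3.5 minutes lost, about 2.6% of the drive and roughly 37 minutes per day, already more than the 28 minutes per day that network 1 loses on its own. At 10 seconds the figure doubles to roughly 75 minutes per day.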

Goodmill solves these problems. The Goodmill solution includes sophisticated mechanisms to protect ongoing traffic streams, even VPNs, from the effects of network switching.

We can combine up to 4 independent mobile broadband networks together in this way, switching between them freely with negligible impact on ongoing data streams.

In this test drive, the Goodmill solution took two networks with 98.04% and 96.69% availability and combined them into one connectivity pipe with 99.86% availability.

Put another way, compared with network 2 alone, that is a reduction in connection downtime of roughly 96%: from about 47 minutes to 2 minutes per day.
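The size of the reduction follows directly from the availability figures. Here is a minimal Python sketch (the function name is ours, for illustration only) comparing the combined connection with each network on its own:

```python
def downtime_reduction(single_percent: float, combined_percent: float) -> float:
    """Fractional reduction in downtime when moving from one network to the combined pipe."""
    single_down = 100 - single_percent      # % of time without connectivity on one network
    combined_down = 100 - combined_percent  # % of time without connectivity when combined
    return 1 - combined_down / single_down


print(f"vs network 2: {downtime_reduction(96.69, 99.86):.1%}")
print(f"vs network 1: {downtime_reduction(98.04, 99.86):.1%}")
```

Against network 2 this comes to about 96%; against network 1, the better of the two on this route, it is still about 93%.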

The takeaway is that using multiple networks in critical environments is not trivial to do well. If all we do is bluntly switch between networks as we move in and out of coverage areas, we will not improve the situation.

Our experience and analysis show that doing multi-network broadband incorrectly brings no gains and can actually make the situation worse.

Goodmill Systems' multi-network broadband solution is designed and built from the ground up for the needs and expectations of the most demanding critical users.

We are already delivering the uptime and reliability benefits described above to police, fire, ambulance, military and business-critical users in many countries. This is the future of critical communications, and as William Gibson said, "The future is already here — it's just not very evenly distributed".

Goodmill Systems can help you roll out your mobile broadband services with confidence, because we know that for critical users there is no substitute for being "Always Online".