Why maintenance and monitoring are vital
How vital is a well-defined maintenance and monitoring programme?
On 4 October 2021 social media imploded. Facebook, Instagram and WhatsApp were down around the world for around 6 hours, leading to the biggest outage ever recorded by Downdetector, with 10.6 million reports of problems.
Users flocked to Twitter (ironically) to post conspiracy theories about a massive hack, and Facebook even had to, embarrassingly, turn to its Twitter account to apologise to users.
Was the downtime the result of a disgruntled employee or hacking, as has been the case in some other recent outages?
No. The outage was simply caused by poor maintenance processes.
Facebook confirmed that the outage was the result of a routine maintenance job in which a command was issued that took down all the connections to its network. So not foul play, but equally devastating for the company.
The consequences of the outage were wide-reaching:
- Users around the world were unable to keep in contact with friends and loved ones, something that is a necessity in current global circumstances.
- Companies that rely on social media for commerce were unable to trade for 6 hours and lost vital income.
- Facebook lost both revenue and share value because of the outage: an estimated $500,000 in lost revenue for every hour it was down, and $6bn was wiped off Mark Zuckerberg’s personal fortune as the share price dived.
- More embarrassingly, employees who were needed to fix the problem were unable to gain access to the building because their ID cards were linked to the systems that were down.
So, all in all, an expensive mistake.
The consequences of poor maintenance are not limited to large organisations like Facebook. Any unplanned downtime can have devastating effects on a business. The average cost of unplanned downtime is $216,000 per hour and in some industries can be much higher.
Poor maintenance, or patching, is one of the major causes of downtime for companies, but avoiding downtime is not the only reason for having a robust maintenance or patch management process.
Patch management improves the security of your systems. With 60% of data breaches caused by poor patching processes, where vulnerabilities are known but not patched, this is an own goal for organisations. The Microsoft Exchange Server hack in 2021 is a case in point: 250,000 Exchange servers fell victim to it, yet 20 days after the security patch was released 8% of servers had still not been patched.
Compliance is another reason for having a robust patch management process. In the event of a data breach an organisation must demonstrate its “reasonable compliance” with GDPR/DPA 2018, and an organisation that has not patched known system vulnerabilities will struggle to do so.
Patching is not just about security and compliance. Having the latest releases of software improves system uptime because the system is more stable. End users will also have the latest versions of their programs, with the latest functionality to improve productivity and collaboration.
Yet, as vital as patch management is, 70% of companies lack awareness of when maintenance is due on their devices. So how can an organisation create a robust patch management process?
The first step is to know what devices you have in the organisation and what updates need to be installed on each device. An asset registry is the best tool to do this, but it must be kept up to date as part of your change management processes to ensure all live devices are in your asset registry and retired devices are removed.
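As a rough illustration of this first step, the sketch below models a very small asset registry in Python. The class names, field names, device names and update IDs are all hypothetical; a real registry would normally live in a dedicated asset management tool rather than in a script.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Asset:
    """One live device in the registry (field names are illustrative)."""
    hostname: str
    pending_updates: List[str] = field(default_factory=list)


class AssetRegistry:
    """Minimal in-memory registry: live devices go in, retired devices come out."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}

    def add(self, asset: Asset) -> None:
        # Called from change management when a device goes live.
        self._assets[asset.hostname] = asset

    def retire(self, hostname: str) -> None:
        # Called from change management when a device is decommissioned.
        self._assets.pop(hostname, None)

    def needing_updates(self) -> Dict[str, List[str]]:
        # Map each device that still has outstanding updates to those updates.
        return {
            asset.hostname: list(asset.pending_updates)
            for asset in self._assets.values()
            if asset.pending_updates
        }


registry = AssetRegistry()
registry.add(Asset("laptop-01", ["EXAMPLE-UPDATE-1"]))  # hypothetical update ID
registry.add(Asset("printer-02"))
print(registry.needing_updates())
```

The point of the sketch is that additions and retirements go through the same place, so the list of devices that need patching is always derived from the current set of live devices.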
The next step is to decide when the devices will be patched. The NCSC advises that patches labelled “critical” or “high risk” should be applied within 14 days of release. In most cases “critical” or “high risk” patches fix known security vulnerabilities, so in reality they should be applied much sooner, within hours of release if possible. As there may be downtime or a performance hit while patches are being installed, it is important to schedule patching for when the majority of users will not be using the system and when important processes are not taking place. For most companies this will mean patching out of hours.
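The 14-day rule can be written down as a simple date calculation. The Python sketch below is only an illustration: the function names and the severity labels used as keys are assumptions, and your own policy may well set a much shorter window for critical patches.

```python
from datetime import date, timedelta
from typing import Optional

# Severity labels treated as urgent (assumed spellings, matched case-insensitively).
URGENT_SEVERITIES = {"critical", "high risk"}
# Latest acceptable delay for urgent patches, taken from the 14-day guidance above.
MAX_DAYS_FOR_URGENT = 14


def patch_deadline(release_date: date, severity: str) -> Optional[date]:
    """Return the latest date an urgent patch should be applied, or None
    if the severity falls outside the 14-day rule."""
    if severity.strip().lower() in URGENT_SEVERITIES:
        return release_date + timedelta(days=MAX_DAYS_FOR_URGENT)
    return None


def is_overdue(release_date: date, severity: str, today: date) -> bool:
    """True when an urgent patch is still outstanding after its deadline."""
    deadline = patch_deadline(release_date, severity)
    return deadline is not None and today > deadline


print(patch_deadline(date(2021, 3, 2), "critical"))
print(is_overdue(date(2021, 3, 2), "critical", today=date(2021, 3, 22)))
```

A check like this only flags what is late. It does not choose the out-of-hours window itself, which still has to be agreed with the business.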
You also need to decide who will complete the patching. It needs to be someone with enough knowledge to implement the patches without supervision, as the work will be completed out of hours. That person also needs to be available to work out of hours, and a backup needs to be in place for when they are on holiday or absent.
As patching takes place out of hours, there needs to be a fall-back or escalation process in case the patches are not successfully implemented. In addition, end-users need to be informed of the downtime and of what actions they need to take (e.g. leaving machines switched on or backing up data).
All patches must be tested to ensure they have been successfully implemented. There is no point waiting until users start work the next day to discover that patching has caused other issues within the network.
What process will be followed if a device is missed from the schedule, either because the end-user didn’t follow the instructions or because there was an issue applying the patch, and how will you determine which devices were missed? This is a key activity, as one unpatched device can leave your whole network vulnerable.
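One simple way to find missed devices is to compare the list of devices that were scheduled for patching with the results reported after the patching window. The Python sketch below assumes a hypothetical results format (hostname mapped to an outcome string such as "success") and is not tied to any particular patching tool.

```python
from typing import Dict, Iterable, List


def find_missed_devices(
    scheduled: Iterable[str], results: Dict[str, str]
) -> Dict[str, List[str]]:
    """Split scheduled devices that are not confirmed as patched into two
    groups: those that never reported back and those that reported a failure."""
    missed: Dict[str, List[str]] = {"not_attempted": [], "failed": []}
    for hostname in sorted(set(scheduled)):
        outcome = results.get(hostname)
        if outcome is None:
            # No report at all, e.g. the machine was switched off.
            missed["not_attempted"].append(hostname)
        elif outcome != "success":
            # The patch was attempted but did not apply cleanly.
            missed["failed"].append(hostname)
    return missed


scheduled = ["pc-01", "pc-02", "pc-03"]              # hypothetical hostnames
results = {"pc-01": "success", "pc-03": "error"}     # hypothetical report
print(find_missed_devices(scheduled, results))
```

Keeping the two groups separate is a deliberate choice: a device that was switched off needs a reminder to the end-user, whereas a failed patch needs a technical follow-up.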
All of this can be a costly and time-consuming process, which is often why it is missed, but it does not need to be.
At Giotech, we can assist with a robust monitoring and maintenance system for your network: proactively discovering issues on the network equipment before they cause downtime, resolving hardware alerts, ensuring successful backups are completed and monitoring the performance of your system.
We also take the strain out of your patch management by completing a comprehensive, out-of-hours patching process across your network and applications, increasing your security and demonstrating your compliance.