Affecting Other - Hosted VoIP Service
27/01/2020 14:23 - 18/02/2020 11:02
Last Updated 30/01/2020 17:22
To our VoIP customers,
We are currently (14:23hrs) receiving reports of issues with various VoIP platform features, including BLF, Call Waiting, and Call Transfers.
These are under investigation by our technical team and our provider BT, alongside the VoIP platform vendor's support team.
This is being investigated as a matter of urgency and we will post updates as soon as we have any further information.
Kind Regards,
Your MTH Team
UPDATE: 14:45hrs
We are now getting reports that this issue is also affecting phone registrations intermittently; this may interrupt live calls, stop a phone receiving a call, and show "No Service" on the handset periodically.
Please be assured that this has now been raised to a High priority issue and escalated with the various teams involved.
Further updates will follow as soon as we have information for you.
UPDATE: 15:38hrs
BT have identified the issue at play here; traffic is moving from Primary to Secondary Servers and during the sync it is lost. BT are now on a Tech Bridge with high level Cisco engineers to establish the root cause of this and work on a resolution.
For clarification, BT's Cloud voice platform is built on a multi-million pound BroadSoft infrastructure; Cisco acquired BroadSoft back in 2018, which is why their world-expert infrastructure and networking teams have been brought in on this.
UPDATE: 16:38hrs
We have now escalated this to a Critical Outage, due to the extreme length of the service outage; MTH can only apologise for the impact this is having on all of your businesses.
BT have updated us; unfortunately, there is not much to report. Traffic is still bouncing between Application Servers (the entry point to the VoIP network), which remains a major problem and the cause of the issue all clients are facing.
All involved are actively working on a Technical Bridge with the platform provider (Cisco BroadSoft); however, no workaround has been identified as yet.
As soon as we have any hint of an update, we'll be in touch.
UPDATE: 17:08hrs
We appreciate everyone's patience here; it's not easy when you're experiencing this extent of downtime.
The update here is mainly the same as 16:38hrs; the Primary Application Server is in overload control, causing traffic to move to the Secondary Application Server. This secondary device is then trying to process this and dropping into overload itself and passing traffic back to the Primary Server.
Currently Cisco are working on a potential solution to resolve the routing issue; the alternative is to roll the platform back to a previous version, which should bring stability back. However, this option would take a few hours to complete.
When we hear more, we will update everyone.
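For anyone wanting to picture the overload loop described in this update, here is a minimal sketch. It is an illustration only, not BT's or Cisco's actual failover logic: the `AppServer` class, the capacity figures, and the rule "hand all traffic to the other server when overloaded" are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppServer:
    name: str
    capacity: int  # calls per tick before overload control kicks in (hypothetical figure)

    def overloaded(self, offered: int) -> bool:
        return offered > self.capacity


def simulate_failover(offered: int, primary: AppServer, secondary: AppServer, ticks: int) -> list[str]:
    """Return the name of the server carrying the traffic at each tick.

    Traffic starts on the primary. Whenever the current server is
    overloaded, all traffic is handed to the other server for the next tick.
    """
    current, other = primary, secondary
    path = []
    for _ in range(ticks):
        path.append(current.name)
        if current.overloaded(offered):
            current, other = other, current
    return path


primary = AppServer("primary", capacity=100)
secondary = AppServer("secondary", capacity=100)

# Offered load above either server's capacity: traffic never settles.
flapping = simulate_failover(150, primary, secondary, ticks=6)
# Offered load within capacity: traffic stays on the primary.
stable = simulate_failover(80, primary, secondary, ticks=6)
print(flapping)
print(stable)
```

In this simplified model, the traffic only settles once the offered load drops below what a single server can handle, which is consistent with the bouncing described above.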
UPDATE: 19:20hrs
We've just received an update from BT:
- They're still working to fix the issue on their VoIP platform. The platform is currently experiencing a total loss of service; Cisco and BroadSoft senior engineers are working across the platform, restarting applications in an attempt to restore service.
- This continues to be treated as a Major Critical issue with the utmost urgency, and all key personnel and relevant resources are engaged on it.
We fully expect to receive further updates throughout the evening until a resolution has been reached; as always, as soon as we have anything, we'll update you here.
UPDATE: 20:20hrs
Since the last update at 19:20hrs, we’ve seen stable VoIP service from c. 19:30hrs; this has been carefully monitored and tested, and will continue to be throughout the evening.
Additionally, we’ve received the following from BT, Cisco and BroadSoft:
We have reinstated service on our VoIP platform. Testing and close monitoring will continue through the night.
We continue to treat this issue with the utmost urgency and all relevant resources are engaged.
We sincerely apologise for the inconvenience this has caused you and your customers.
We are continuing to monitor and test the platform throughout this evening; we have been given a confirmed issue resolved status, however will complete our own testing before signing it off ourselves later this evening.
UPDATE: 22:30hrs
Following a further 2hrs of MTH testing and checking, and a successful 3hrs of stable service, we’re finally signing this VoIP platform issue off as Resolved.
We can only apologise for the massive inconvenience and impact this has had on all your businesses, and assure you the whole situation is being investigated at the highest level.
If anyone experiences any further issues following this resolution, please first of all reboot your phone handsets; if this doesn’t resolve the issue, please do contact our support team directly and we will assist and resolve as promptly as possible.
UPDATE: 09:00hrs 28/01/2020
Unfortunately we have been advised this morning that issues are still being seen across the platform, despite the overnight works last night; urgent investigations are still underway.
We have checked with various clients and service is seemingly OK for our client base; however, the platform is still in an "at risk" state.
Further updates will follow as soon as we have any information.
UPDATE: 09:35hrs
BT have advised that a large number of their customers are experiencing further issues already this morning, specifically with making and receiving calls and with phones de-registering from the platform. Senior BT engineers are continuing to work with Cisco and BroadSoft to isolate and resolve these issues.
Further update to be provided c. 10:30am. Please accept our sincere apologies for the impact this is causing.
UPDATE: 10:30hrs
BT Update:
We continue to investigate issues on the platform. While we have received reports of issues, not all customers are impacted. All relevant technical resource is present on a technical bridge.
We'll provide further updates by 12:00hrs.
UPDATE: 11:00hrs
Further information from BT is that they're looking into applying a new 'tactical fix' due to the overnight fix failing. They are also looking into the possibility of rolling back the Broadworks R22 upgrade.
As soon as we have any update regarding the intended course of action, this will be posted below.
UPDATE: 12:30hrs
BT Update:
We’re sorry for the continued disruption to service. We estimate this impact to be around 40%, with many of these customers able to make calls, but not use some of the key features. Over 1.5 million calls have been successfully made so far today. We of course recognise the criticality of these features to our customers and are working through a plan to resolve.
We believe the issue is due to the planned R22 upgrade which was deployed over the weekend and as a result we are building parallel plans to roll back to the previous version. This is likely to take around 8 hours, so we have taken the decision to defer this until 8pm. At that point, if we don’t have certainty on a resolution which will resolve the issue in a loaded environment, we’ll execute our tried and tested roll back plans to ensure service is fully restored before 7:30 am tomorrow.
UPDATE: 15:30hrs
Early indications are that the issue appears to be resolved, however we are giving it more time before confirming. If anyone is having issues, please contact the MTH Support team.
UPDATE: 16:30hrs
While MTH clients haven't seen any issues through today, BT, Cisco and Broadworks have reported a further influx of client issue reports, so MTH are keeping this service notification open.
UPDATE: 17:00hrs
BT Update:
Due to the further impact experienced at 16:30, the decision has been made that a full rollback of the R22 upgrade will be completed this evening to resolve all outstanding issues.
The platform will be reverted to the state it was in on the evening of Friday 24th.
Rollback will commence at 17:30; no customer impact is expected during the rollback process. Rollback is estimated to take 8 hours and BTW engineers will be monitoring throughout the evening / early hours.
MTH have a scheduled call with senior engineers and directors at 07:30hrs to confirm platform status prior to the working day. Updates will be posted following this call.
UPDATE: 07:50hrs 29/01/2020
Work to roll back to the previous version of software has continued through the night, but at 2am the team encountered a number of issues. In order to ensure customers have service this morning the decision was taken to re-instate the newer version of operating firmware.
While the rollback continues, calls and services will be working normally although we are likely to see similar issues as Monday and yesterday. In parallel to the roll back, we have a team assembled to resolve these issues and implement a fix.
Business Zone is available but any orders will be queued and not progressed. Business Portal is unavailable during the roll back procedure. This is not a fault and is part of the process, and we currently anticipate this will be completed in the next 3 hours.
All necessary resource is available on the technical bridge as is support from vendors at the most senior level.
UPDATE: 09:35hrs
We can finally confirm some good progress is being made to revert the platform rollback to R21. Unfortunately this has resulted in the MTH VoIP portal currently being unavailable, however we expect this to be back online this afternoon.
Please be aware "call gapping" is in place now to alleviate any further customer impact; this could result in a small number of calls returning busy to control peak volumes and allow for complete resolution to carry on.
Updates to follow.
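For those unfamiliar with the term, call gapping can be pictured roughly as follows. This is a minimal sketch under assumptions: BT have not published how their gapping is configured, so the `CallGapper` class, the one-second gap, and the arrival times are all hypothetical and purely illustrative.

```python
class CallGapper:
    """Admit at most one new call per `gap_seconds`; the rest get busy tone."""

    def __init__(self, gap_seconds: float) -> None:
        self.gap_seconds = gap_seconds  # hypothetical gap interval
        self._last_admitted = None  # time of the most recently admitted call

    def offer(self, now: float) -> str:
        """Decide whether a call arriving at time `now` (in seconds) is let through."""
        if self._last_admitted is None or now - self._last_admitted >= self.gap_seconds:
            self._last_admitted = now
            return "admitted"
        return "busy"


gapper = CallGapper(gap_seconds=1.0)
arrivals = [0.0, 0.2, 0.9, 1.0, 1.5, 2.5]  # call arrival times in seconds
results = [gapper.offer(t) for t in arrivals]
print(results)
```

During quiet periods every call arrives outside the gap and is admitted; only when calls bunch together at peak times do some return busy, which is why a small number of callers may hear busy tone.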
UPDATE: 10:10hrs
BT Update:
Work to re-instate the newer version of operating firmware continues and progress is being made. In order to ensure that service is maintained throughout the process it is taking longer than initially expected. As a result provision activities remain suspended and MTH VoIP Portal is unavailable. This is expected to remain for some hours yet and may not be resolved until later this afternoon.
In order to mitigate the issues we have seen over the last two days we have implemented some minimal call gapping. This will mean that during our busiest periods some customers may get busy tone. Initial feedback this morning suggests that this approach is working.
In parallel to the roll back, we have a team assembled to resolve these issues and implement a fix.
All necessary resource is available on the technical bridge as is support from vendors at the most senior level.
Please accept our sincere apologies for the impact this is causing.
UPDATE: 11:40hrs
Latest from BT:
Work to re-instate the newer version of operating firmware continues. Provision activities remain suspended and MTH VoIP Portal is unavailable. We are making progress and we expect to restore both provision activities and MTH VoIP Portal by 14:00hrs.
We have implemented some minimal call gapping in order to mitigate the issues we have seen over the last two days. This will mean that during our busiest periods some customers may get busy tone. This approach continues to be effective.
In parallel to the roll back, we have a team assembled to resolve the issues with the firmware and implement a fix. This remains our top priority.
All necessary resource is available on the technical bridge as is support from vendors at the most senior level. We will update you by 3:00 p.m. or sooner if the situation changes.
UPDATE: 12:15hrs
We've been monitoring the situation today and everything has remained stable for MTH and all its clients; this has been the case since 19:30hrs on 27/01/2020.
Despite the continued stability for MTH and clients, we have been keeping on top of the updates and keeping everyone posted here, due to the extensive issues being reported elsewhere. Work is still ongoing to completely resolve the platform issues; the MTH VoIP portal will be available later this afternoon.
Next update expected c. 15:00hrs
UPDATE: 14:10hrs
Latest from BT:
The newer version of operating firmware has now been re-instated. Provision activities have restarted and the MTH VoIP Portal is now available.
Minimal call gapping will continue to be applied in order to mitigate the issues we have seen over the last two days. This will mean that during our busiest periods some of your customers may get busy tone. This approach continues to be effective.
A team from across BT and our vendor continue working on resolving the issues with the firmware in order to implement a fix. This remains our top priority.
Close monitoring of the platform will continue. We will update you by 5 p.m. or sooner if the situation changes.
Please accept our sincere apologies for the impact this is causing.
UPDATE: 17:09hrs
Latest update from BT:
The platform has remained stable since the new firmware was re-instated. Provision orders have all progressed and Business Portal is fully restored. Close monitoring of the platform continues and none of the call drop and function issues experienced on Monday and Tuesday have been observed today.
The BT and vendor team are now fully focussed on resolving the issues with the firmware; progress has been made but it's too early to confirm a resolution timescale at this time.
Minimal call gapping will continue to be applied in order to mitigate the issues we have seen over the last two days. This will mean that during our busiest periods some of your customers may get busy tone. This approach continues to be effective.
We do not foresee any further issues with performance through the night period. We will provide a further update at 8 a.m. tomorrow. However if there is any change to this we will update you accordingly.
Please accept our sincere apologies for the impact this has caused you.
MTH will update further here as soon as we receive information; additionally, we have had zero clients report issues since 19:30hrs on 27/01/2020. These updates are for completeness.
UPDATE: 08:20hrs 30/01/2020
We have seen the platform remain stable overnight, without any issues seen or reported. The new version of the firmware is successfully loaded across all clusters with all the latest patches. Parameter changes were made last night on all clusters and these will be monitored closely throughout the day to ensure that they have had the expected impact.
Update from BT:
The BT and Cisco team are now fully focussed on monitoring the platform following yesterday’s changes. While we are confident in the work carried out yesterday, it is too early to state that the issue is resolved and we will not be making that statement today, as we need to monitor stability over a longer period.
Minimal call gapping will continue to be applied. As we gain confidence this will be relaxed and we will keep you updated on this. This will mean that during our busiest periods some of your customers may get busy tone.
Clearly your customers experienced a lot of issues on Monday and Tuesday most of which should now be resolved. If they experience issues today can we please ask that you report these on a new ticket with all the necessary detail. This will allow us to prioritise these issues and work closely with our technical team to resolve.
We do not foresee any further issues but will provide a further update at 12 noon today. However if there is any change to this we will update you accordingly.
Please accept our sincere apologies for the impact this has caused you.
Updates will be posted below at midday.
UPDATE: 12:07hrs
The platform has remained stable, with no issues processing calls. During the morning busy period (08:15 to 08:45hrs) the platform handled twice the call volume of yesterday without any issue.
We appreciate over the last few days there have been a number of sporadic issues, and service interruptions; here's the latest progress update:
• Voicemail – is not working correctly if the call is delivered via auto-attendant or a hunt group. We have identified the issue and a fix is being worked on now.
• SIM Ring – is not working correctly and only ringing on one phone. Our technical team are working on this.
• Transferring Calls – if delivered via auto attendant the call was failing on transfer. This is now resolved.
• BLF (Busy Lamp Field) – this was not working correctly. This is now resolved.
• Call Waiting – Despite being disabled it was still alerting to a second call. This is now resolved.
• Phones de-registering – Phones were de-registering and calls were dropped. This is now resolved.
Both the outstanding issues are being worked on with utmost priority.
Further updates will be provided at 13:00hrs today.
UPDATE: 13:09hrs
The platform has remained stable without issues; call processing has been flowing continually all day.
Following the outstanding issue update earlier, the two outstanding issues are as below:
• Voicemail – is not working correctly if the call is delivered via auto-attendant or a hunt group. We have a proposed fix which is currently in test. We expect to deploy this today.
• SIM Ring – is not working correctly and only ringing on one phone. This is currently with our vendor who are working on a fix.
Further updates will be posted at the end of today, c. 17:00hrs.
UPDATE: 14:30hrs
In addition to the issues advised above, an issue has also been identified with provisioning of new devices to the platform. Initial indications are that this is only impacting companies built before September 2019.
Details from BT:
The platform is receiving the requests to build device configurations but is currently very slow to respond. It will appear to customers that the device is there in the portal however the background configuration is not and handsets will not register. Some are getting "no config" error on the display.
We have been advised that the platform should eventually catch up; however, BT and Cisco are working on the cause of the build-up of requests and the slow response. We do not have an ETA on this fix, but will notify you as soon as possible.
UPDATE PENDING