Dear all,
Following the cluster failures on Friday, we attempted to replicate the issue over the weekend by loading all the compute nodes in the cluster as heavily as possible. So far we have not seen any further failures. We will therefore return the cluster to service shortly, but because the root cause of the original failures is still unknown, we may see further sudden and unexpected losses of service. If you submit jobs to the cluster, please keep careful track of which jobs you have submitted, and check that each job has run successfully before using any of its output.
HP Enterprise (who manufacture and support our hardware) are currently investigating the cause of the failures, and we hope to have a full resolution as soon as possible.
Best wishes, Paul.