Dear all,
Following the cluster failures on Friday we have attempted to replicate
the issue over the weekend by loading up all the compute nodes within
the cluster to the highest level possible. So far we have not seen any
further failures. As such we will shortly return the cluster to service
but, as the root cause of the original failures is still unknown, we may
see further sudden and unexpected losses of service. If you submit jobs
to the cluster, please keep careful track of which jobs you have
submitted and validate that they have run successfully
before using any of the output.
HP Enterprise (who manufacture and support our hardware) are currently
investigating the cause of the failures and we hope to have a full
resolution as quickly as possible.
Best wishes,
Paul.
--
Paul Elliott, UNIX Systems Administrator
York Neuroimaging Centre (YNiC), University of York