eScience
All systems are operational
UCloud: DeiC Interactive HPC - Operational
 ↳ Provider: SDU (Kubernetes) - Operational
 ↳ Provider: AAU (OpenStack) - Operational
Hippo: DeiC Large Memory HPC - Operational

Past Events

Dec 19, 2023 07:00: The AAU OpenStack cluster will undergo scheduled maintenance and running VMs will be restarted as part of the process.
Dec 5, 2023 14:30: We are deploying an update to the user interface to resolve an issue with logs not appearing.
12:44: The issue with Syncthing has been resolved. You may need to manually restart the instance by visiting the "Manage synchronization" page.
10:17: We are currently investigating an issue with duplicate Syncthing instances.
Dec 4, 2023 15:27: The second stage of the upgrade has been completed. We will be monitoring the system over the next few days to make sure the new updates are working as intended.
11:54: Our storage system has been upgraded. We will continue with the second part of the upgrade shortly.
07:37: We are taking UCloud offline to begin scheduled maintenance.
07:00: We will perform a major maintenance operation on the UCloud system. The entire system is expected to be down all day and all jobs running at the "DeiC Interactive HPC (SDU)" provider will be terminated in the morning.
Nov 24, 2023 07:56: Maintenance on the AAU OpenStack cluster is still ongoing and might affect running VMs.
Nov 19, 2023 13:42: Everything should now be back online. We are monitoring the systems for possible anomalies.
12:15: Maintenance finished early and we are now bringing systems back online.
07:12: We will start taking the systems offline shortly.
07:00: Unfortunately, we will have to temporarily shut down both Hippo and the UCloud platform on Sunday (19/11/23). The services will be unavailable from the morning until sometime in the afternoon. During this time it will not be possible to access the services, our status page, or any of the jobs. This will also shut down all jobs running on the "DeiC Interactive HPC (SDU)" provider on UCloud.
Nov 15, 2023 17:19: Around 10:12 this morning there was a major incident in one of the SDU server rooms, which resulted in a complete loss of power. All our storage systems and our internet connectivity were down, resulting in all services being inaccessible. Around 16:20 power was restored and services slowly came back online.
Nov 9, 2023 07:00: There will be a maintenance operation on the OpenStack cluster hosting the VMs accessible via UCloud (the uc-xxx machine types). During the maintenance window it might not be possible to start new VMs, but already running machines will not be affected.
Oct 7, 2023 12:27: We have deployed a fix to Hippo and we are monitoring the behavior of the system. Unfortunately a number of jobs have failed due to the issue.
11:26: There appears to be an issue with the Hippo system after the update. We are looking into the problem. For now all nodes have been put in maintenance mode.
Oct 6, 2023 12:08: Hippo is now handling jobs again. The new nodes will continue to be reserved for a day or two to run some more benchmarks.
10:50: The Hippo frontend is now accessible again. The maintenance reservation is expected to be removed this afternoon.
Oct 4, 2023 08:23: The Hippo cluster is now down for maintenance. The system will be updated and expanded during the next few days. The system is expected to be back online again no later than Monday, October 9th.
Sep 26, 2023 09:40: Today all virtual machines will temporarily be turned off to install security updates on the physical hosts. UCloud virtual machines will automatically be restarted after the service is completed.
Sep 14, 2023 09:33: The Hippo frontend node had to be rebooted due to a kernel crash.
Aug 16, 2023 10:17: Maintenance is complete.
08:26: We are performing minor hardware maintenance this morning. This will affect Syncthing jobs on UCloud and the Hippo frontend will be temporarily offline. Everything is expected to be running again around 10am.
Jul 25, 2023 15:34: All systems should be operational again. This was again caused by an external event, which unfortunately is out of our hands.
15:12: We just had another power fluctuation in the server room, which caused most machines to reboot. We are working on restoring services.
Jun 28, 2023 11:47: Job status on UCloud should now be in sync with the provider. Public links and IPs which were bound to failed jobs have been released and should be available for use again.
11:20: Hippo is now back online.
11:19: UCloud is now back online. We are aware that several user jobs were shut down in the process. This status might not be correctly displayed in UCloud at the moment. We are working on restoring the correct status in the interface.
11:01: UCloud and Hippo are currently down. Our current understanding of the situation is that this was caused by a temporary power fluctuation.
Jun 20, 2023 12:26: We have completed a minor update to the UCloud platform.
Jun 16, 2023 12:21: The issue has been resolved.
12:17: We are investigating an issue related to the interactive interface of jobs.
Jun 9, 2023 13:07: The filesystem is fully operational again and everything should be back to normal.
09:24: The network issue appears to have affected our new file system on UCloud. This means that drives which have been migrated to the new system are unavailable. We are working on resolving this issue. Update: This has also caused some issues on internal systems which use the file system. This has affected the ability to launch jobs even if the jobs themselves do not directly require this file system.
08:22: We are currently investigating an issue with the network.
May 31, 2023 11:15: The power fluctuation was caused by an accident with a power cable.
09:31: Hippo has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:28: UCloud has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:22: The issue has been identified as a power outage. Both UCloud and Hippo have been affected.
09:13: We are currently experiencing issues with our infrastructure. We are working on resolving the issue.
08:22: We have deployed a minor patch to the user interface of UCloud.
May 29, 2023 10:17: There is currently an issue scheduling jobs on UCloud.
May 24, 2023 15:30: Update complete.
15:29: We are updating UCloud. You might briefly lose connection.
May 23, 2023 16:17: Update is complete.
16:15: Update is starting.
12:15: We will be performing maintenance on the "DeiC Interactive HPC (SDU)" provider today at 16:15. We expect the provider to be down for approximately 15 minutes.
May 19, 2023 09:25: Jobs start as expected again. If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
09:06: We are currently experiencing problems with scheduling new jobs. We are working on returning to full operations.
May 17, 2023 09:10: If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
May 16, 2023 16:08: Update complete.
16:07: We are pushing an update to the DeiC Interactive HPC (SDU) provider.
May 13, 2023 09:45: There is currently an issue with UCloud that is preventing some jobs from running correctly.
May 11, 2023 09:00: Jobs should be starting again.
08:51: We are deploying a small fix to address the issue.
08:40: We are investigating an issue which is causing jobs to not start in a timely manner.
May 10, 2023 16:31: Restart complete and fix confirmed.
16:27: We will shortly perform another restart to fix an issue affecting GPU machines.
16:08: The update is complete.
16:00: We will be performing a minor update to UCloud and the DeiC Interactive HPC (SDU) provider. You will experience a brief disruption to the service while we are restarting the services.
15:12: The situation has been resolved.
14:57: We are investigating issues related to compute.
May 9, 2023 13:54: The issue has been resolved.
13:50: We are aware of an issue which prevents you from starting and stopping jobs on DeiC Interactive HPC (SDU).
Apr 12, 2023 12:18: The issue with the filesystem has been solved.
11:55: We are experiencing problems with the filesystem on UCloud.
09:51: The issue has been resolved and we will continue to monitor the system for related issues.
09:39: Some drives are currently returning an internal server error. We are investigating the issue.
Mar 22, 2023 06:54: The issue with jobs being slow to start appears to have improved. We will continue to monitor the situation.
Mar 21, 2023 21:31: Maintenance is complete. We will continue to monitor the system. We have noticed that some jobs are slower than usual to start; we will attempt to determine the root cause in the morning.
18:16: Maintenance has started on UCloud and will be ongoing for the next few hours. We will update when the maintenance is complete. We do not expect any disruption to user jobs.
Mar 15, 2023 14:55: The u1-standard machine is currently running at 100% utilization. As a result, you may have to wait before your job starts. Update: UCloud had resources available again around 18:00 Wednesday.
Mar 10, 2023 12:53: Nearly 100% of all UCloud resources are currently in use. This is causing some jobs to immediately transition to a failed state due to our scheduler refusing to accept any new jobs. Update: UCloud had resources available again at around 21:00 Friday.
Mar 3, 2023 10:02: UCloud is currently experiencing higher load than usual. This means that you may experience longer wait times than usual. In particular, it can be hard to start jobs which require large machine types. If you are having trouble starting a job, try selecting a smaller machine type. The high load is also causing the output of applications to not always appear. We are working on a fix for this issue. This particularly affects applications such as MinIO and Rsync Server, which depend on this output. If you are running one of these applications and the output does not appear, please contact support. The support team should be able to retrieve any information you may need from the output.