eScience
All systems are operational
UCloud: DeiC Interactive HPC - Operational
 ↳ Provider: SDU (Kubernetes) - Operational
 ↳ Provider: AAU (OpenStack) - Operational
 ↳ Provider: AAU (Kubernetes) - Operational
Hippo: DeiC Large Memory HPC - Operational

Past Events

Apr 18, 2024 14:01: The maintenance period at DeiC Interactive HPC (AAU) has been extended until 30/04/24.
Apr 2, 2024 07:00: The DeiC Interactive HPC (AAU) provider will be performing maintenance between 02/04/24 and 30/04/24. To minimise disruption to research, several upgrades will be performed in succession. A 4-week window has been allocated from 02/04 to 30/04. All running instances should be unaffected by the upgrades and should remain accessible throughout the upgrade process. These upgrades are essential for the operation and security of the OpenStack platform, which provides the DeiC Interactive HPC with virtual machines. While these upgrades are underway, there will be periods where it will not be possible to submit any commands to the OpenStack platform. For example, users will not be able to use the UCloud interface to create, start, stop or delete any instances. The duration of these periods of limited access will not be known until the upgrade process is completed, and the potential effect of each upgrade will be assessed before it is performed.
Mar 25, 2024 09:13: The issue has been resolved.
09:10: We are currently investigating an issue affecting the DeiC Interactive HPC (SDU) provider.
Mar 7, 2024 10:36: The issue has been resolved.
10:25: At 9:45 UCloud started becoming unresponsive. We have stabilized the situation and are currently working on resolving an issue affecting the web interfaces of jobs.
Feb 29, 2024 13:58: Some of the GPU nodes on UCloud had to be rebooted due to a misconfiguration. A couple of user jobs were affected by this.
Dec 19, 2023 07:00: The AAU OpenStack cluster will undergo scheduled maintenance and running VMs will be restarted as part of the process.
Dec 5, 2023 14:30: We are deploying an update to the user-interface to resolve an issue related to logs not appearing.
12:44: The issue with Syncthing has been resolved. You may need to manually restart the instance by visiting the "Manage synchronization" page.
10:17: We are currently investigating an issue with duplicate Syncthing instances.
Dec 4, 2023 15:27: The second stage of the upgrade has been completed. We will be monitoring the system over the next few days to make sure the new updates are working as intended.
11:54: Our storage system has been upgraded. We will continue with the second part of the upgrade shortly.
07:37: We are taking UCloud offline to begin scheduled maintenance.
07:00: We will perform a major maintenance operation on the UCloud system. The entire system is expected to be down all day and all jobs running at the "DeiC Interactive HPC (SDU)" provider will be terminated in the morning.
Nov 24, 2023 07:56: Maintenance on the AAU OpenStack cluster is still ongoing and might affect running VMs.
Nov 19, 2023 13:42: Everything should now be back online. We are monitoring the systems for possible anomalies.
12:15: Maintenance finished early and we are now bringing systems back online.
07:12: We will start taking the systems offline shortly.
07:00: Unfortunately, we will have to temporarily shut down both Hippo and the UCloud platform on Sunday (19/11/23). The services will be unavailable from the morning until sometime in the afternoon. During this time it will not be possible to access the services, our status page, or any of the jobs. This will also shut down all jobs running on the "DeiC Interactive HPC (SDU)" provider on UCloud.
Nov 15, 2023 17:19: Around 10:12 this morning there was a major incident in one of the SDU server rooms, which resulted in a complete loss of power. All our storage systems and our internet connectivity were down, resulting in all services being inaccessible. Around 16:20 power was restored and services slowly came back online.
Nov 9, 2023 07:00: There will be a maintenance operation on the OpenStack cluster hosting the VMs accessible via UCloud (the uc-xxx machine types). During the maintenance window it might not be possible to start new VMs, but already running machines will not be affected.
Oct 7, 2023 12:27: We have deployed a fix to Hippo and we are monitoring the behavior of the system. Unfortunately a number of jobs have failed due to the issue.
11:26: There appears to be an issue with the Hippo system after the update. We are looking into the problem. For now, all nodes have been put in maintenance mode.
Oct 6, 2023 12:08: Hippo is now handling jobs again. The new nodes will continue to be reserved for a day or two to run some more benchmarks.
10:50: The Hippo frontend is now accessible again. The maintenance reservation is expected to be removed this afternoon.
Oct 4, 2023 08:23: The Hippo cluster is now down for maintenance. The system will be updated and expanded during the next few days. The system is expected to be back online again no later than Monday, October 9th.
Sep 26, 2023 09:40: Today all virtual machines will temporarily be turned off to install security updates on the physical hosts. UCloud virtual machines will automatically be restarted after the service is completed.
Sep 14, 2023 09:33: The Hippo frontend node had to be rebooted due to a kernel crash.
Aug 16, 2023 10:17: Maintenance is complete.
08:26: We are performing minor hardware maintenance this morning. This will affect Syncthing jobs on UCloud and the Hippo frontend will be temporarily offline. Everything is expected to be running again around 10am.
Jul 25, 2023 15:34: All systems should be operational again. This was again caused by an external event, which unfortunately is out of our hands.
15:12: We just had another power fluctuation in the server room, which caused most machines to reboot. We are working on restoring services.
Jun 28, 2023 11:47: Job status on UCloud should now be in sync with the provider. Public links and IPs which were bound to failed jobs have been released and should be available for use again.
11:20: Hippo is now back online.
11:19: UCloud is now back online. We are aware that several user jobs were shut down in the process. This status might not be correctly displayed in UCloud at the moment. We are working on restoring the correct status in the interface.
11:01: UCloud and Hippo are currently down. Our current understanding of the situation is that this was caused by a temporary power fluctuation.
Jun 20, 2023 12:26: We have completed a minor update to the UCloud platform.
Jun 16, 2023 12:21: The issue has been resolved.
12:17: We are investigating an issue related to the interactive interface of jobs.
Jun 9, 2023 13:07: The filesystem is fully operational again and everything should be back to normal.
09:24: The network issue appears to have affected our new file system on UCloud. This means that drives which have been migrated to the new system are unavailable. We are working on resolving this issue. Update: This has also caused some issues on internal systems which use the file system. This has affected the ability to launch jobs even if the jobs themselves do not directly require this file system.
08:22: We are currently investigating an issue with the network.
May 31, 2023 11:15: The power fluctuation was caused by an accident with a power cable.
09:31: Hippo has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:28: UCloud has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:22: The issue has been identified as a power outage. Both UCloud and Hippo have been affected.
09:13: We are currently experiencing issues with our infrastructure. We are working on resolving the issue.
08:22: We have deployed a minor patch to the user-interface of UCloud.
May 29, 2023 10:17: There is currently an issue scheduling jobs on UCloud.
May 24, 2023 15:30: Update complete.
15:29: We are updating UCloud. You might briefly lose connection.
May 23, 2023 16:17: Update is complete.
16:15: Update is starting.
12:15: We will be performing maintenance on the "DeiC Interactive HPC (SDU)" provider today at 16:15. We expect the provider to be down for approximately 15 minutes.
May 19, 2023 09:25: Jobs are starting as expected again. If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
09:06: We are currently experiencing problems with scheduling new jobs. We are working on returning to full operations.
May 17, 2023 09:10: If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
May 16, 2023 16:08: Update complete.
16:07: We are pushing an update to the DeiC Interactive HPC (SDU) provider.
May 13, 2023 09:45: There is currently an issue with UCloud that is preventing some jobs from running correctly.
May 11, 2023 09:00: Jobs should be starting again.
08:51: We are deploying a small fix to address the issue.
08:40: We are investigating an issue which is causing jobs to not start in a timely manner.
May 10, 2023 16:31: Restart complete and fix confirmed.
16:27: We will shortly perform another restart to fix an issue affecting GPU machines.
16:08: The update is complete.
16:00: We will be performing a minor update to UCloud and the DeiC Interactive HPC (SDU) provider. You will experience a brief disruption to the service while we are restarting the services.
15:12: The situation has been resolved.
14:57: We are investigating issues related to compute.
May 9, 2023 13:54: The issue has been resolved.
13:50: We are aware of an issue that prevents users from starting and stopping jobs on DeiC Interactive HPC (SDU).