eScience
All systems are operational
UCloud: Operational
Hippo (HPC Type 3): Operational

Past Events

Sep 26, 2023 09:40: Today all virtual machines will temporarily be turned off to install security updates on the physical hosts. UCloud virtual machines will automatically be restarted after the service is completed.
Sep 14, 2023 09:33: The Hippo frontend node had to be rebooted due to a kernel crash.
Aug 16, 2023 10:17: Maintenance is complete.
08:26: We are performing minor hardware maintenance this morning. This will affect Syncthing jobs on UCloud and the Hippo frontend will be temporarily offline. Everything is expected to be running again around 10am.
Jul 25, 2023 15:34: All systems should be operational again. This was again caused by an external event, which unfortunately is out of our hands.
15:12: We just had another power fluctuation in the server room, which caused most machines to reboot. We are working on restoring services.
Jun 28, 2023 11:47: Job status on UCloud should now be in sync with the provider. Public links and IPs which were bound to failed jobs have been released and should be available for use again.
11:20: Hippo is now back online.
11:19: UCloud is now back online. We are aware that several user jobs were shut down in the process. This status might not be correctly displayed in UCloud at the moment. We are working on restoring the correct status in the interface.
11:01: UCloud and Hippo are currently down. Our current understanding of the situation is that this was caused by a temporary power fluctuation.
Jun 20, 2023 12:26: We have completed a minor update to the UCloud platform.
Jun 16, 2023 12:21: The issue has been resolved.
12:17: We are investigating an issue related to the interactive interface of jobs.
Jun 9, 2023 13:07: The filesystem is fully operational again and everything should be back to normal.
09:24: The network issue appears to have affected our new file system on UCloud. This means that drives which have been migrated to the new system are unavailable. We are working on resolving this issue. Update: This has also caused some issues on internal systems which use the file system. This has affected the ability to launch jobs even if the jobs themselves do not directly require this file system.
08:22: We are currently investigating an issue with the network.
May 31, 2023 11:15: The power fluctuation was caused by an accident with a power cable.
09:31: Hippo has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:28: UCloud has completed its restart and is functional again. We will monitor the system to look for any remaining issues.
09:22: The issue has been identified as a power outage. Both UCloud and Hippo have been affected.
09:13: We are currently experiencing issues with our infrastructure. We are working on resolving the issue.
08:22: We have deployed a minor patch to the user-interface of UCloud.
May 29, 2023 10:17: There is currently an issue scheduling jobs on UCloud.
May 24, 2023 15:30: Update complete.
15:29: We are updating UCloud. You might briefly lose connection.
May 23, 2023 16:17: Update is complete.
16:15: Update is starting.
12:15: We will be performing maintenance on the "DeiC Interactive HPC (SDU)" provider today at 16:15. We expect the provider to be down for approximately 15 minutes.
May 19, 2023 09:25: Jobs start as expected again. If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
09:06: We are currently experiencing problems with scheduling new jobs. We are working on returning to full operations.
May 17, 2023 09:10: If a UCloud job is stuck in "Job is starting soon" for more than 5 minutes, try restarting the job before contacting support. If you are requesting a full machine with 64 cores, the job might not start simply because there are no available nodes at the moment.
May 16, 2023 16:08: Update complete.
16:07: We are pushing an update to the DeiC Interactive HPC (SDU) provider.
May 13, 2023 09:45: There is currently an issue with UCloud that is preventing some jobs from running correctly.
May 11, 2023 09:00: Jobs should be starting again.
08:51: We are deploying a small fix to address the issue.
08:40: We are investigating an issue which is causing jobs to not start in a timely manner.
May 10, 2023 16:31: Restart complete and fix confirmed.
16:27: We will shortly perform another restart to fix an issue affecting GPU machines.
16:08: The update is complete.
16:00: We will be performing a minor update to UCloud and the DeiC Interactive HPC (SDU) provider. You will experience a brief disruption to the service while we are restarting the services.
15:12: The situation has been resolved.
14:57: We are investigating issues related to compute.
May 9, 2023 13:54: The issue has been resolved.
13:50: We are aware of an issue which prevents you from starting and stopping jobs on DeiC Interactive HPC (SDU).
Apr 12, 2023 12:18: The issue with the filesystem has been solved.
11:55: We are experiencing problems with the filesystem on UCloud.
09:51: The issue has been resolved and we will continue to monitor the system for related issues.
09:39: Some drives are currently returning an internal server error. We are investigating the issue.
Mar 22, 2023 06:54: The issue with jobs being slow to start appears to have improved. We will continue to monitor the situation.
Mar 21, 2023 21:31: Maintenance is complete. We will continue to monitor the system. We have noticed that some jobs are slower than usual to start; we will attempt to determine the root cause behind this in the morning.
18:16: Maintenance has started on UCloud and will be ongoing for the next few hours. We will update when the maintenance is complete. We do not expect any disruption to user jobs.
Mar 15, 2023 14:55: The u1-standard machine is currently running at 100% utilization. As a result, you may have to wait before your job starts. Update: UCloud had resources available again around 18:00 Wednesday.
Mar 10, 2023 12:53: Nearly 100% of all UCloud resources are currently in use. This is causing some jobs to immediately transition to a failed state due to our scheduler refusing to accept any new jobs. Update: UCloud had resources available again at around 21:00 Friday.
Mar 3, 2023 10:02: UCloud is currently experiencing higher load than usual. This means that you may experience longer wait times than usual. In particular, it can be hard to start jobs which require large machine types. If you are having trouble starting a job, then try selecting a smaller machine type. The high load is also causing the output of applications to not always appear. We are working on a fix for this issue. This particularly affects applications such as MinIO and Rsync Server, which depend on this output. If you are running one of these applications and the output does not appear, then please contact support. The support team should be able to retrieve any information you may need from the output.
Feb 14, 2023 10:50: We have applied several patches since yesterday. Preliminary results suggest that the service is more stable now, but we will continue to monitor the situation.
Feb 13, 2023 15:01: We are still trying to solve stability issues with UCloud caused by the maintenance operation last week. Some apps are more affected by this issue, such as RStudio and JupyterLab. If you experience a disconnect (or 403 errors), please close the app and press the "Open interface" button again.
Feb 9, 2023 15:14: We are still experiencing issues with UCloud related to the maintenance earlier this week. We are aware of the problem and actively trying to find a solution.
Feb 8, 2023 14:21: We have deployed some mitigations against the observed errors. We will continue to observe the system for errors.
12:46: We are observing some errors related to the maintenance yesterday. We are aware of these problems and are monitoring the situation closely.
Feb 7, 2023 12:56: Access to UCloud has been restored. The cluster currently has fewer nodes than normal, as we are still in the process of moving some nodes to a new system. We expect this capacity to return to normal levels by the end of tomorrow. GPU machines in the Type 1 (SDU) system will remain unavailable until they have been fully migrated (expected by end of tomorrow).
12:04: The migration has been completed. We are running some additional tests to check the system.
07:58: We have started scheduled maintenance to migrate the DeiC Interactive HPC provider to a new and improved system.
Feb 1, 2023 11:07: Performance has stabilised but we are still seeing slower than usual queries. We are working on improving performance.
09:54: We are experiencing issues related to the compute sub-system of UCloud.
Jan 10, 2023 15:46: We have observed some sporadic file-system instability on UCloud. We are monitoring the situation, but we are still considering the system operational.
Dec 22, 2022 12:04: Around 11:31 we experienced around 10 minutes of downtime due to a network reconfiguration.
Dec 16, 2022 12:44: The Virtual Machines hosted at AAU may be slow and less responsive between Dec 14 and Dec 31 as AAU ITS is performing scheduled maintenance.
Dec 6, 2022 10:16: An issue with application output has been resolved.
09:13: The last application section has returned correctly to UCloud.
08:31: UCloud is now operational. We are investigating an issue with some applications not showing up correctly in the "apps" interface.
07:51: UCloud is currently experiencing some issues due to unforeseen issues during maintenance.
Dec 4, 2022 12:38: We experienced system-wide DNS issues from around 03:27 this morning until 12:11, when the problem was identified and solved.
Nov 18, 2022 15:22: The problem with the storage system has been solved.
14:30: Our Ceph storage system is experiencing problems, which is affecting the UCloud platform.
Nov 17, 2022 15:30: The problem with the storage system has been solved.
15:20: Our Ceph storage system is experiencing problems, which is affecting the UCloud platform.
Nov 7, 2022 14:00: The issue with the storage system caused a compute node to crash and consequently the jobs running on the node were cancelled. UCloud should otherwise be working again.
13:52: The problem with the storage system has been solved.
13:34: Our Ceph storage system is experiencing problems, which is affecting the UCloud platform.
Oct 24, 2022 14:10: The problem with the storage system has been solved.
13:42: Our Ceph storage system is experiencing problems, which is affecting the UCloud platform.
Oct 7, 2022 11:11: UCloud is operational again.
10:22: UCloud is experiencing some problems with applications.
Oct 3, 2022 12:12: All Hippo nodes are accepting jobs again.
08:58: The Hippo compute nodes are down for maintenance, but the frontend is available. The system should be fully operational again around noon.