eScience
All systems are operational
UCloud: DeiC Interactive HPC - Operational
 ↳ Provider: SDU (Kubernetes) - Operational
 ↳ Provider: AAU (OpenStack) - Operational
 ↳ Provider: AAU (Kubernetes) - Operational
Hippo: DeiC Large Memory HPC - Operational

Past Events

Jul 10, 2024 11:11: The system is now stable.
10:57: We are investigating an issue with UCloud.
Jun 17, 2024 11:08: The issue has been resolved.
11:07: We are currently investigating an issue with the SDU/K8s provider. We do not expect this issue to last more than a few minutes.
Jun 11, 2024 10:42: The issue has been fixed.
10:24: There is currently an issue with GPUs on the AAU K8s cluster.
Jun 7, 2024 16:55: It looks like the configuration change has solved the issue with the u3-gpu nodes.
12:08: We believe the issue with the u3-gpu nodes is caused by a recent firmware update. We are testing a configuration change to the Linux kernel that potentially will solve the problem.
Jun 6, 2024 14:30: We are experiencing a hardware issue on one of the u3-gpu nodes. Sometimes a GPU gets stuck and the machine has to be rebooted. We are monitoring the situation and trying to resolve it.
May 27, 2024 13:06: Update is complete and Slurm is scheduling jobs again.
12:06: The update is taking a bit longer than expected. We are currently running some tests to check that everything works.
08:00: We have scheduled maintenance for the Hippo Slurm cluster in the morning, where Slurm and the Linux distribution will be updated. A reservation has been added to the system between 08:00 and 12:00.
May 17, 2024 07:49: A new version of the UI has been deployed containing small tweaks and various bug fixes.
May 16, 2024 10:38: We have deployed a fix for the following issues: IP addresses should now be usable in jobs again (please reload your UI). New Syncthing jobs should now be able to start. A few UI issues related to Syncthing have also been fixed.
09:02: We are still working on investigating a Syncthing issue.
08:11: We have rolled out another update with various bug fixes. We have released some links and IP addresses which were left in a bound state since the update. These are now usable again.
May 15, 2024 15:40: There has been an issue with job submissions for the past couple of hours, but it has now been fixed.
11:07: Notifications for jobs are now functional again. This also fixes a rare issue which caused too many notifications to be sent to a few users.
08:25: Syncthing is operational once again. Your jobs should have restarted automatically. If they have not, please try to restart them manually or contact support.
May 14, 2024 15:35: The upgrade has been completed and access to UCloud has been restored. There are currently a few known problems, such as Syncthing not working and notifications for jobs being unreliable. We will work on fixing the remaining issues during the week.
15:06: The upgrade is almost complete and we are now checking that everything is working as expected.
09:06: The infrastructure part of the upgrade has been completed. We are now preparing the new version of UCloud for deployment.
08:00: We are now starting the scheduled maintenance on the UCloud system.
07:00: UCloud will be down for maintenance on the 14th of May. The expected downtime will be between 08:00 and 16:00. All jobs on DeiC Interactive HPC (SDU/AAU K8s) will be terminated when the maintenance period begins. All VMs at the AAU OpenStack provider will also be restarted on the same day due to maintenance operations.
May 2, 2024 15:43: The AAU Kubernetes cluster is now back up and running.
13:24: The AAU Kubernetes cluster is currently down.
Apr 18, 2024 14:01: The maintenance period at DeiC Interactive HPC (AAU) has been extended until 30/04/24.
Apr 2, 2024 07:00: The DeiC Interactive HPC (AAU) provider will be performing maintenance between 02/04/24 and 30/04/24. To minimise the number of disruptions to research, it has been decided to perform several upgrades successively. A 4-week window has been allocated from 02/04 to 30/04. All running instances should be unaffected by the upgrades, and should be accessible throughout the upgrade processes. These upgrades are essential for the operation and security of the OpenStack platform, which serves the DeiC Interactive HPC with virtual machines. While these upgrades are underway, there will be periods where it will not be possible to submit any commands to the OpenStack platform. For example, users will not be able to use the UCloud interface to create, start, stop or delete any instances. The duration of this period of limited access is not known before the upgrade process is completed, and the potential effect of each upgrade will be assessed before it is performed.
Mar 25, 2024 09:13: The issue has been resolved.
09:10: We are currently investigating an issue affecting the DeiC Interactive HPC (SDU) provider.
Mar 7, 2024 10:36: The issue has been resolved.
10:25: At 9:45 UCloud started becoming unresponsive. We have stabilized the situation and are currently working on resolving an issue affecting the web-interfaces of jobs.
Feb 29, 2024 13:58: Some of the GPU nodes on UCloud had to be rebooted due to a misconfiguration. A couple of user jobs were affected by this.
Dec 19, 2023 07:00: The AAU OpenStack cluster will undergo scheduled maintenance and running VMs will be restarted as part of the process.
Dec 5, 2023 14:30: We are deploying an update to the user-interface to resolve an issue related to logs not appearing.
12:44: The issue with Syncthing has been resolved. You may need to manually restart the instance by visiting the "Manage synchronization" page.
10:17: We are currently investigating an issue with duplicate Syncthing instances.
Dec 4, 2023 15:27: The second stage of the upgrade has been completed. We will be monitoring the system over the next few days to make sure the new updates are working as intended.
11:54: Our storage system has been upgraded. We will continue with the second part of the upgrade shortly.
07:37: We are taking UCloud offline to begin scheduled maintenance.
07:00: We will perform a major maintenance operation on the UCloud system. The entire system is expected to be down all day and all jobs running at the "DeiC Interactive HPC (SDU)" provider will be terminated in the morning.
Nov 24, 2023 07:56: Maintenance on the AAU OpenStack cluster is still ongoing and might affect running VMs.
Nov 19, 2023 13:42: Everything should now be back online. We are monitoring the systems for possible anomalies.
12:15: Maintenance finished early and we are now bringing systems back online.
07:12: We will start taking the systems offline shortly.
07:00: Unfortunately, we will have to temporarily shut down both Hippo and the UCloud platform on Sunday (19/11/23). The services will be unavailable from the morning until sometime in the afternoon. During this time it will not be possible to access the services, our status page, or any of the jobs. This will also shut down all jobs running on the "DeiC Interactive HPC (SDU)" provider on UCloud.
Nov 15, 2023 17:19: Around 10:12 this morning there was a major incident in one of the SDU server rooms, which resulted in a complete loss of power. All our storage systems and our internet connectivity were down, resulting in all services being inaccessible. Around 16:20 power was restored and services slowly came back online.
Nov 9, 2023 07:00: There will be a maintenance operation on the OpenStack cluster hosting the VMs accessible via UCloud (the uc-xxx machine types). During the maintenance window it might not be possible to start new VMs, but already running machines will not be affected.
Oct 7, 2023 12:27: We have deployed a fix to Hippo and we are monitoring the behavior of the system. Unfortunately a number of jobs have failed due to the issue.
11:26: There appears to be an issue with the Hippo system after the update. We are looking into the problem. For now all nodes have been put in maintenance mode.
Oct 6, 2023 12:08: Hippo is now handling jobs again. The new nodes will continue to be reserved for a day or two to run some more benchmarks.
10:50: The Hippo frontend is now accessible again. The maintenance reservation is expected to be removed this afternoon.
Oct 4, 2023 08:23: The Hippo cluster is now down for maintenance. The system will be updated and expanded during the next few days. The system is expected to be back online again no later than Monday, October 9th.
Sep 26, 2023 09:40: Today all virtual machines will temporarily be turned off to install security updates on the physical hosts. UCloud virtual machines will automatically be restarted after the service is completed.
Sep 14, 2023 09:33: The Hippo frontend node had to be rebooted due to a kernel crash.
Aug 16, 2023 10:17: Maintenance is complete.
08:26: We are performing minor hardware maintenance this morning. This will affect Syncthing jobs on UCloud and the Hippo frontend will be temporarily offline. Everything is expected to be running again around 10am.