On December 1st there will be scheduled maintenance in the SDU data center, which requires a complete shutdown of all servers. For this reason UCloud and all other services offered by SDU eScience will be unavailable during the entire working day. We expect systems to be back online late in the afternoon.
Dec 1, 07:00 - Dec 1, 15:16
Updates
All services should be back up and running.
Dec 1, 15:16
November, 2025
Hardware problems for u2-gpu
The u2-gpu machines are currently experiencing hardware problems. User jobs are able to run, but they can be killed at any point due to maintenance.
Nov 18, 09:00 - Dec 3, 11:32
Updates
The hardware has been replaced and all GPUs are working again.
Dec 3, 11:33
The machine has been powered off and hardware replacements should be performed later today.
Dec 3, 09:21
UCloud is experiencing issues
UCloud is currently experiencing issues. We are working on fixing the problem.
Nov 17, 19:26 - Nov 17, 20:21
Updates
UCloud partially unavailable
During the night UCloud had an internal issue, which made it impossible to access jobs and files.
Nov 14, 00:12 - Nov 14, 06:48
Updates
UCloud has been updated
UCloud has been updated with the latest round of bug fixes and improvements to the UI. As always, this may have caused a few minutes of disruption to the service. Sorry for the inconvenience.
Nov 11, 09:37 - Nov 11, 09:36
Updates
October, 2025
UCloud jobs will be terminated
On Sunday, October 26th, several UCloud jobs that have been running for more than 14 days will be terminated due to hardware maintenance.
Oct 26, 10:00 - Oct 26, 18:07
Updates
The jobs have been terminated.
Oct 26, 18:07
SDU/K8s unavailable
The SDU/K8s provider was unavailable between 12/10/25 15:56:34 and 12/10/25 17:04:25 due to a software bug. A bug fix has been released now (13/10/25 07:00).
Oct 12, 15:56 - Oct 12, 17:04
Updates
September, 2025
UCloud is being restarted for an update
UCloud is being restarted for an update. The update will take a few minutes.
Sep 30, 08:50 - Sep 30, 08:52
Updates
Two H100 nodes rebooted
Two of the H100 nodes were rebooted this morning due to hardware maintenance, the first one around 09:00 and the second one around 10:30. A couple of jobs were stopped during the reboot.
Sep 26, 08:00 - Sep 26, 11:05
Updates
Compute nodes unresponsive
Around 8:50 this morning we started decommissioning an old storage system, which unfortunately affects the SDU/K8s compute nodes, making them partially unresponsive.
Sep 22, 08:52 - Sep 22, 10:45
Updates
Things should finally be returning to normal. If jobs are stuck for more than 10 minutes, start a new one.
Sep 22, 10:31
Power fluctuation caused machines to reboot
A fluctuation in the power grid caused around 30 machines to reboot in the UCloud server room. Jobs running on the machines were terminated during the reboot.
Sep 9, 21:45 - Sep 10, 06:30
Updates
Issues with nodea0-19
The machine 'nodea0-19' has been rebooted due to an error with one of the GPU cards. We are monitoring the node for a potential hardware issue.
Sep 2, 07:35 - Sep 19, 13:19
Updates
The faulty GPU has been replaced.
Sep 19, 13:19
Hardware maintenance has been initiated.
Sep 19, 12:46
A support case has been opened to get the card replaced.
Sep 10, 07:49
The error has reappeared. The card will most likely need to be replaced.
Sep 4, 07:48
August, 2025
UCloud degraded
Between 00:21 and 08:00 UCloud was running with elevated error rates, causing the job page to not load correctly. The issue has been resolved.
Aug 29, 00:21 - Aug 29, 08:01
Updates
Jobs unable to start on SDU/K8s
We are aware of a situation causing jobs to not start. We believe this is related to the storage system. We are currently investigating the situation.
Aug 20, 12:26 - Aug 20, 15:22
Updates
'Type 1 - KU' storage allocations unavailable
Projects receiving storage allocations from "Type 1 - KU" for SDU/K8s were temporarily unavailable. The allocations should be available once again now.
Aug 15, 11:03 - Aug 15, 11:02
Updates
SDU/K8s usage tracked incorrectly for storage
We are aware of an issue causing usage to be tracked incorrectly for storage. We will restart the system to fix this issue.
Aug 14, 11:27 - Aug 14, 11:39
Updates
We have completed a reset of usage numbers in storage which we believe will fix the issue. The system is now back online after a few minutes of downtime.
Aug 14, 11:39
SDU/K8s: some jobs are slow to start
We are experiencing slower storage performance on some nodes, which is leading to jobs being slow to start. Jobs that would ordinarily start within a minute can now take up to a few minutes before they start. We are monitoring the situation.
Aug 14, 08:54 - Aug 15, 13:07
Updates
SDU/K8s downtime
An issue leading to a crash has been resolved and is under investigation.
Aug 13, 17:30 - Aug 13, 17:55
Updates
Syncthing temporarily disabled
Syncthing has temporarily been disabled while we investigate an issue with the system. We do not believe that the system will be re-enabled today (13/08/25). We hope to have an update with ETA or fix tomorrow (14/08/25).
Aug 13, 15:58 - Aug 15, 13:07
Updates
Syncthing has been re-enabled. We will monitor the situation. If instability is reintroduced then we may need to disable it again. Status updates will be posted here if this becomes needed.
Aug 15, 13:08
We are currently testing a fix internally and hope to deploy the fix tomorrow (15/08/25).
Aug 14, 14:08
Jobs unable to start
We are investigating an issue related to jobs not starting.
Aug 13, 14:52 - Aug 13, 15:58
Updates
The issue is no longer present. We will continue to monitor the system.
Aug 13, 15:58
SDU/K8s maintenance
UPDATE: The maintenance has been moved from the 12th of August to the 13th of August.
The SDU/K8s (DeiC Interactive HPC, SDU) service provider will be down on the 13th of August (13/08/2025). All jobs will be killed prior to the maintenance and will not be automatically restarted. The maintenance is expected to take place between 08:00 and 16:00.
Aug 13, 08:00 - Aug 13, 13:45
Updates
Maintenance has been completed. We will be monitoring the system over the coming hours and days.
Aug 13, 13:45
Preparation phase of the maintenance took longer than expected. The primary work has started now and the provider is no longer accessible as noted in the original maintenance notice.
Aug 13, 09:56
UCloud interactive interfaces not working
We are investigating an issue which causes the "Open interface" button to not appear on the SDU/K8s provider.
Aug 8, 08:50 - Aug 8, 09:03
Updates
The issue has been resolved.
Aug 8, 09:03
UCloud is experiencing issues
We are looking into issues with UCloud, which cause most functionality to not work.
Aug 7, 21:15 - Aug 8, 08:50
Updates
The problem has been identified and the system is operational again.
Aug 8, 08:45
AAU/K8s maintenance
The AAU/K8s (DeiC Interactive HPC, AAU) service provider will be down on the 5th of August (05/08/2025). All jobs will be killed prior to the maintenance and will not be automatically restarted. The maintenance is expected to take place between 08:00 and 16:00.
Aug 5, 08:00 - Aug 5, 10:29
Updates
Maintenance has completed and the system is available. We are monitoring the system for possible bugs.
Aug 5, 10:29
July, 2025
Short network outage
This morning we had an issue with the internal network, which caused disconnects across the entire infrastructure for around 15 minutes. We are monitoring the system for related issues.
Jul 9, 08:43 - Jul 9, 16:00
Updates
We believe most things should be working again, but we are still monitoring the system for potential issues.
Jul 9, 11:53
It seems there are still some services that have not properly recovered.
Jul 9, 10:20
June, 2025
Slow response times
We are currently investigating an issue causing slow response times.
Jun 24, 12:52 - Jun 24, 13:17
Updates
The issue has been resolved.
Jun 24, 13:17
UCloud jobs might be slower than normal
On Monday evening (June 23rd) a machine in our Ceph storage system failed. This is affecting u1-standard and u1-fat nodes, where jobs might run slower than usual, because the storage system is under heavy load while rebuilding data redundancy.
Jun 23, 22:15 - Jul 16, 11:42
Updates
Issues with H100 GPU node
We are experiencing hardware issues with one of the u3-gpu nodes on UCloud, resulting in the machine sometimes powering off. We are monitoring the situation.
Jun 16, 07:48 - Aug 14, 15:00
Updates
The machine has been repaired and returned to production.
Aug 14, 15:00
The issue persists and the machine has been taken out of production until the problem has been solved.
Jul 3, 09:30
UCloud apps interface not working
This morning we had an issue where accessing the interface of UCloud apps simply resulted in "Not Found". The problem should now be solved.
Jun 11, 06:00 - Jun 11, 07:57
Updates
Internal error when submitting jobs
UCloud has experienced issues with job creation beginning around Friday 23:28. This has likely caused some jobs to be terminated with an incorrect "insufficient funds" message. The issue has now been resolved and we are monitoring the situation.
Jun 6, 23:28 - Jun 8, 10:30
Updates
Unexpected machine reboots
This morning 15 machines rebooted in the UCloud server room due to a fluctuation in the power grid. All machines are running again, but jobs running on the machines were terminated.
Jun 2, 06:25 - Jun 2, 06:30
Updates
May, 2025
Bug fixes for user-interface
We have deployed a new version of UCloud's interface containing various fixes. This caused more update messages than intended. We apologize for the temporary notification spam.
May 16, 08:11 - May 16, 08:11
Updates
Minor UCloud maintenance
During the weekend of 3-4 May we are performing minor maintenance on the infrastructure behind UCloud in the morning. This might cause minor disruptions of a few minutes while services are being restarted.
May 3, 06:00 - May 4, 10:00
Updates
April, 2025
UCloud issue
UCloud was down between 10:30 and 10:55. The outage was caused by an issue with the persistence layer of UCloud. It did not affect already running jobs.
Apr 22, 10:30 - Apr 22, 10:55
Updates
Power fluctuation caused machines to reboot
A transient power fluctuation in the UCloud data center caused 32 machines to reboot. All running jobs were terminated, but the machines are back online and accepting new jobs.
Apr 12, 17:41 - Apr 12, 18:00
Updates
Disruption of service
UCloud experienced disruption of service between 08:39 and 08:52. The issue has been resolved now.
Apr 2, 08:39 - Apr 2, 08:52
Updates
March, 2025
Issues with AAU OpenStack provider
Users may experience instability in starting and stopping virtual machines on the AAU provider. The issue has intermittently caused some jobs to fail to start with a message that "all available resources have been allocated" even though resources are available.
Mar 5, 12:00 - Mar 6, 15:00
Updates
February, 2025
SDU/K8s unavailable for maintenance
The SDU/K8s provider is currently unavailable for maintenance. The maintenance is expected to take a few minutes and is required to fix an issue causing incorrect "insufficient funds" messages.
Feb 24, 12:45 - Feb 24, 13:26
Updates
The provider is now running again. We are still running a background maintenance job and the system might restart several times during the next hour.
Feb 24, 13:04
Elevated error rates on the SDU/K8s provider
The SDU/K8s provider experienced elevated error rates between 09:05 and 09:25. The issue has now been resolved.
Feb 18, 09:05 - Feb 18, 09:25
Updates
AAU/K8s increased failure rate
The AAU/K8s provider is currently experiencing timeouts and increased failure rates.
The timeouts can cause slower than usual response times from UCloud even when not attempting to start something on AAU/K8s.
Feb 11, 08:04 - Feb 11, 08:28
Updates
The situation has been stabilized and the provider should now function as expected.
Feb 11, 08:28
Maintenance for AAU providers
Maintenance window for the AAU providers (Kubernetes and OpenStack). Running jobs at the Kubernetes provider will be terminated and running VMs in the OpenStack provider will be rebooted. Services will return to normal at the end of the day.
Feb 11, 07:00 - Feb 11, 16:00
Updates
Maintenance for the AAU Kubernetes provider has been completed.
Feb 11, 11:15
Disruption of service to the UCloud platform
We have solved an issue which caused disruption of service to the UCloud platform itself. This caused several job related endpoints to fail unexpectedly. The issue started around 04:00 and was resolved at around 07:56.
Feb 11, 04:00 - Feb 11, 07:56
Updates
AAU/K8s and AAU/OpenStack Ceph maintenance
In order to ensure stable operations and data security, the storage system that is utilised by several AAU based HPC platforms is undergoing rebalancing. The systems affected include: AAU/Kubernetes and AAU/OpenStack.
This process runs automatically, and can result in reduced performance (read and write speeds) on all local platforms.
This process is planned to be completed by 10/02/2025.
Feb 3, 08:11 - Feb 10, 09:00
Updates
January, 2025
UCloud degraded performance
We are currently working on resolving an issue related to the orchestration components of UCloud.
Jan 27, 08:34 - Jan 27, 13:06
Updates
The issue has been resolved.
Jan 27, 13:06
System update on Hippo
We will have a maintenance window to perform system updates on the Hippo cluster. The work will start in the morning and it is expected to be completed in the afternoon.
Jan 27, 08:00 - Jan 27, 15:21
Updates
Update has been completed and jobs are running again.
Jan 27, 15:21
UCloud maintenance
On Sunday morning (January 26th) we will perform extraordinary maintenance on some core parts of the UCloud system, including some database migrations. To complete these tasks we will have to take UCloud offline for a few hours. Running jobs will continue to run, but they will be inaccessible while UCloud is offline.
Jan 26, 06:30 - Jan 26, 10:16
Updates
Maintenance has completed.
Jan 26, 10:16
The database migration has been completed and UCloud is online again. However, there might be smaller instabilities while we perform the rest of the tasks.
Jan 26, 08:41
Unable to apply for certain machine types
We are currently investigating an issue in the grant system which makes it impossible to approve applications for certain machine types, primarily affecting the AAU/K8s service provider.
Jan 2, 11:09 - Jan 2, 13:16
Updates
The issue has been resolved.
Jan 2, 13:17
2025 allocations unavailable
We are aware of an issue which caused the 2024 allocations to be correctly deactivated but without the 2025 allocations being activated. We have manually activated the 2025 allocations now and will continue to do testing to ensure that all 2025 allocations are usable.
Jan 2, 10:02 - Jan 2, 10:00
Updates
December, 2024
Hippo is unavailable and other services are also affected
We have lost a core switch in the SDU data center, which means that part of the network is unavailable, in particular Hippo, but some of the UCloud compute nodes are also not able to access the internet.
Dec 10, 06:39 - Dec 10, 08:49
Updates
All services should be up and running again.
Dec 10, 08:49
Maintenance in the UCloud data center
In the morning on 9 Dec 2024 we have to perform maintenance in the UCloud data center. During the maintenance operation we will most likely have to shut down the u1-madlab, u2-madlab and u2-gpu machines for a couple of hours. Running jobs will be terminated.
Dec 9, 08:00 - Dec 9, 10:46
Updates
Maintenance has completed and all machines should now be available again.
Dec 9, 10:46
The Hippo frontend is accessible again.
Dec 9, 10:19
Unfortunately we have also been forced to power off the frontend for Hippo. The system will be unavailable for an hour or two.
Dec 9, 09:39