Updates


Event Date Summary

Some configuration changes have helped stabilize /scratch further, so we are resolving this incident.

The Cedar /scratch file system has been more stable since the changes made Friday evening. There is still one server we need to monitor closely, but overall we expect fewer performance and reliability issues with /scratch going forward. If you continue to see issues with /scratch, please report them.

The work has been completed, and we are now monitoring the system.

We are applying configuration changes to the systems serving the /scratch file systems. This work is expected to take roughly one hour and will cause a few brief file system outages while it takes place. We will provide an update once the work has been completed.

Good morning. Staff continue to work on diagnosing the file system issues that users have been experiencing. We will provide more updates as they become available.

Diagnostics are still ongoing, and we will be running tests throughout the night to gather more information and determine the root cause of the storage server slowdowns. We appreciate your patience while we work through the situation.

We are still working to locate the root cause of the storage issues. Thank you for your patience while we work through the various systems.

We’ve identified a performance issue with one of our file servers and are actively working to resolve it. We will update the ticket every hour or as changes occur.


Incident description

Service: Cedar
Incident status: Closed
Start Date:
End Date: No closed date
Created by James Peltier on

Title


File system problem


Summary


We have recently identified issues with the file system performance and stability of our storage servers, which have caused intermittent delays and hangs. Our team is actively addressing the problem and working toward a solution.

As the storage hardware is reaching the end of its lifecycle, we are seeing an increase in error rates. This may lead to temporary disruptions while we transition to a new and more reliable system. We understand this can be frustrating, and we appreciate your patience during this time. Our goal is to minimize any impact on your work, and we’ll keep you updated as we progress with the hardware replacement.

If you experience issues lasting longer than one hour and this outage notice hasn’t been updated, please feel free to submit a support request so we can investigate promptly.

Thank you for your continued support as we work to improve the system.


Updated by James Peltier on