Created by Darren Boss on
Title
Control plane networking outage
Summary
The database that underpins virtual networking in Arbutus is being moved to faster storage, which should reduce the number of issues that affect Arbutus networking. Expected network interruption is ~5–10 min. Full details of the expected impact are below.
Expected Impact During the Change
- Existing VM traffic (data plane): Unaffected during StatefulSet and pod deletion — OVS flow tables are cached in the kernel on each compute node and continue to forward traffic without the OVN control plane
- OVN control plane unavailability (~5–10 min): During pod and PVC deletion and while new pods initialize, no new OVN logical flow updates will be processed
- New VM boots, port creation, network/router operations, floating IP associations, security group changes: Temporarily queued or delayed — Neutron API remains available but changes will not be reflected in the data plane until the sync completes
- Data plane blackout (~15–20 min): Once OVS agents on compute nodes detect loss of the OVN controller connection, cached flows are flushed — existing VM traffic will be interrupted for the duration of the Neutron sync repair
- Sync / restoration: neutron-ovn-db-sync-util repopulates ~2,400 ports, ~307 routers, ~531 security groups, ~298 floating IPs, and ~313 networks from Percona XtraDB; estimated sync duration is 5–10 minutes based on a dry-run conducted in advance
- Estimated total maintenance window: ~20–30 minutes from start to full data plane restoration
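
For readers unfamiliar with the sync step, the following is a minimal sketch of how a neutron-ovn-db-sync-util invocation could be assembled. It is not the exact command used for this change. The two config file paths are assumed defaults and may differ in the Arbutus deployment, and the helper name `build_sync_command` is hypothetical. The sketch also assumes that the advance dry-run corresponds to the utility's `log` sync mode and the actual restoration to its `repair` mode.

```python
from __future__ import annotations

import shlex

# Assumed config locations; the real deployment may mount these elsewhere.
NEUTRON_CONF = "/etc/neutron/neutron.conf"
ML2_CONF = "/etc/neutron/plugins/ml2/ml2_conf.ini"

# Sync modes accepted by this sketch: "log" only reports differences
# between the Neutron database and OVN, "repair" also fixes them.
SYNC_MODES = ("log", "repair")


def build_sync_command(mode: str) -> list[str]:
    """Return the argv list for neutron-ovn-db-sync-util (hypothetical helper)."""
    if mode not in SYNC_MODES:
        raise ValueError(f"unsupported sync mode: {mode!r}")
    return [
        "neutron-ovn-db-sync-util",
        "--config-file", NEUTRON_CONF,
        "--config-file", ML2_CONF,
        "--ovn-neutron_sync_mode", mode,
    ]


if __name__ == "__main__":
    # Print both variants without executing anything.
    for mode in SYNC_MODES:
        print(shlex.join(build_sync_command(mode)))
```

The sketch only prints the commands, so it is safe to run anywhere. In a containerized deployment the utility would presumably be executed inside a Neutron server container with access to both the Neutron database and the OVN northbound and southbound databases.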
Updated by Darren Boss on