Operations

Backups

Backup policies do the remembering for you: which VMs, how often, how many copies to keep, and where they go.

Incremental forever. After the first full copy, only changed data moves. Nightly backups of large VMs become a matter of minutes.
Application-consistent. The platform can quiesce the guest before the copy, so databases and journals recover cleanly, not "probably".
Off-site targets. Send copies to object storage or network shares outside the cluster. Transfers and stored data can be encrypted with keys you control.
Immutable retention. Targets can enforce write-once storage: for the retention period, nobody — not even an administrator, not ransomware with stolen credentials — can delete or alter a backup.
Restore to the original VM, to a new VM, or, using the off-site copy, to a different cluster entirely.
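The way retention and immutability interact can be made concrete with a small sketch. It is illustrative only: `BackupPolicy`, its field names, and `prunable` are hypothetical and not the platform's API. The assumption is that a restore point may be deleted only once it falls outside the newest copies to keep and its write-once window has expired.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record: which VMs, how often, how many copies, where."""
    vms: tuple[str, ...]
    interval: timedelta                      # how often a backup runs
    keep_copies: int                         # how many restore points to retain
    target: str                              # e.g. an object-storage bucket or share
    immutable_for: timedelta = timedelta(0)  # write-once window on the target


def prunable(policy: BackupPolicy, restore_points: list[datetime],
             now: datetime) -> list[datetime]:
    """Return the restore points that may be deleted, newest first.

    A point qualifies only if it is outside the newest `keep_copies`
    AND its immutability window has already expired.
    """
    newest_first = sorted(restore_points, reverse=True)
    surplus = newest_first[policy.keep_copies:]
    return [p for p in surplus if now - p >= policy.immutable_for]
```

With nightly backups, three copies to keep and a seven-day write-once window, a restore point that is surplus but only five days old stays on the target until the window closes.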

Disaster recovery

Backup gets your data back; DR gets your service back. The platform replicates chosen VMs to a second site, groups them into recovery plans with a boot order and per-VM network mapping, lets you rehearse the recovery without touching production, and keeps immutable copies for ransomware recovery. It has its own guide: Disaster recovery →
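As a rough illustration of what a recovery plan holds, the sketch below models plan entries with a boot order and a per-VM network mapping, then groups them into start-up waves. `PlanEntry`, `boot_waves`, and the field names are hypothetical. The real plan format is described in the disaster recovery guide.

```python
from dataclasses import dataclass, field


@dataclass
class PlanEntry:
    """Hypothetical recovery-plan entry for one replicated VM."""
    vm: str
    boot_order: int  # lower numbers start first
    # Maps a production network name to its counterpart at the second site.
    network_map: dict[str, str] = field(default_factory=dict)


def boot_waves(entries: list[PlanEntry]) -> list[list[str]]:
    """Group VMs into waves; all VMs in a wave share one boot_order value."""
    waves: dict[int, list[str]] = {}
    for entry in entries:
        waves.setdefault(entry.boot_order, []).append(entry.vm)
    # Waves are returned in ascending boot order; names are sorted within a wave.
    return [sorted(waves[order]) for order in sorted(waves)]
```

A plan that puts the database and cache in wave 1, the application server in wave 2 and the web front end in wave 3 starts dependencies before the services that need them.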

Upgrades

Platform upgrades are rolling: one node drains and updates while the others carry the load, then the next, until the cluster is uniform. Built-in preflight checks verify cluster health — every node reachable, storage healthy, no conflicting operations — and refuse to start otherwise. You choose the moment of each step or let it run unattended. A version inventory shows exactly what every node runs, so drift is visible at a glance.
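The control flow of a rolling upgrade can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the platform's implementation: `rolling_upgrade`, `PreflightError`, and the callback parameters are all hypothetical names, and the preflight checks are assumed to run once before the first node is touched.

```python
from typing import Callable


class PreflightError(RuntimeError):
    """Raised when a preflight check fails; no node has been touched yet."""


def rolling_upgrade(
    nodes: list[str],
    checks: dict[str, Callable[[], bool]],
    drain: Callable[[str], None],
    update: Callable[[str], None],
    confirm: Callable[[str], bool] = lambda node: True,  # unattended by default
) -> list[str]:
    """Upgrade nodes one at a time and return the nodes that were upgraded."""
    # Refuse to start unless every preflight check passes.
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        raise PreflightError("preflight failed: " + ", ".join(failed))

    done: list[str] = []
    for node in nodes:
        if not confirm(node):  # operator chose to pause before this step
            break
        drain(node)            # move workloads off; the other nodes carry the load
        update(node)
        done.append(node)
    return done
```

Passing a `confirm` callback that asks the operator before each node corresponds to choosing the moment of each step. The default runs unattended.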

Monitoring and alerting

Built-in metrics for every host, VM and container, with history kept at full resolution short-term and summarised long-term. No external database needed.
Capacity planning watches growth and projects when you'll cross 80, 90 and 100 percent of CPU, memory or storage — with enough notice to order hardware.
Alerting evaluates your rules continuously (CPU above X for Y minutes, host offline, storage filling) and delivers to e-mail, chat or any webhook, with cooldowns so one incident doesn't become two hundred messages.
Bring your own monitoring. A standard metrics endpoint exposes cluster, workload and platform health for Prometheus-compatible collectors; a ready-made Grafana dashboard ships with the platform. Audit and system logs forward to syslog/SIEM.
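Two of the behaviours above, capacity projection and alert cooldowns, are easy to picture in code. The sketch below is illustrative only: it assumes a simple linear projection between the oldest and newest usage sample and a per-rule cooldown timer. `project_crossings` and `Cooldown` are hypothetical names, and the platform's actual forecasting method is not specified here.

```python
from typing import Optional


def project_crossings(
    samples: list[tuple[float, float]],
    thresholds: tuple[float, ...] = (80.0, 90.0, 100.0),
) -> dict[float, Optional[float]]:
    """Estimate days from the last sample until usage crosses each threshold.

    `samples` holds (day, percent_used) pairs, oldest first, spanning more
    than one day. Returns 0.0 for thresholds already crossed and None when
    usage is flat or shrinking.
    """
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (d1 - d0)  # percentage points per day
    result: dict[float, Optional[float]] = {}
    for t in thresholds:
        if u1 >= t:
            result[t] = 0.0
        elif rate <= 0:
            result[t] = None
        else:
            result[t] = (t - u1) / rate
    return result


class Cooldown:
    """Suppress repeat notifications for the same rule within a time window."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds
        self._last_sent: dict[str, float] = {}

    def should_send(self, rule: str, now: float) -> bool:
        last = self._last_sent.get(rule)
        if last is not None and now - last < self.seconds:
            return False  # still cooling down; drop the duplicate
        self._last_sent[rule] = now
        return True
```

For example, storage that grew from 50 to 65 percent over 30 days is projected to reach 80 percent in another 30 days, and a rule that fires every minute during an incident produces one message per cooldown window rather than one per evaluation.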

When things go wrong

A host dies: HA restarts its VMs on surviving hosts; their data was never stored on the dead host alone. Replace the hardware whenever convenient and re-add it.
A disk dies: the storage pool re-replicates affected data automatically; you swap the disk at leisure.
A VM misbehaves: console access works even when the guest's network doesn't; snapshots give you a way back before risky changes.
You misbehave: the audit trail records who did what and when, which also helps when undoing it.