Operations
Backups
Backup policies do the remembering for you: which VMs, how often, how many copies to keep, and where they go.
- Incremental forever. After the first full copy, only changed data moves. Nightly backups of large VMs become a matter of minutes.
- Application-consistent. The platform can quiesce the guest before the copy, so databases and journals recover cleanly, not "probably".
- Off-site targets. Send copies to object storage or network shares outside the cluster. Transfers and stored data can be encrypted with keys you control.
- Immutable retention. Targets can enforce write-once storage: for the retention period, nobody — not even an administrator, not ransomware with stolen credentials — can delete or alter a backup.
- Flexible restores. Restore to the original VM, to a new VM, or, using the off-site copy, to a different cluster entirely.
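To make "which VMs, how often, how many copies to keep, and where they go" concrete, here is a minimal Python sketch of a policy record and a retention check. The `BackupPolicy` class, its field names, the `prune` helper and the keep-newest-N rule are assumptions made for illustration; they are not the platform's API, and the platform's real retention logic may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record: which VMs, how often, how many copies, where."""
    vms: tuple[str, ...]
    interval: timedelta
    keep_copies: int
    target: str  # illustrative off-site target, e.g. a bucket or share


def prune(policy: BackupPolicy, backups: list[datetime]):
    """Split backup timestamps into (kept, expired) under a keep-newest-N rule."""
    newest_first = sorted(backups, reverse=True)
    return newest_first[:policy.keep_copies], newest_first[policy.keep_copies:]


# Example: a nightly policy that keeps the three most recent copies.
policy = BackupPolicy(
    vms=("erp-db", "erp-app"),
    interval=timedelta(days=1),
    keep_copies=3,
    target="s3://example-offsite-bucket",
)
start = datetime(2024, 1, 1, 2, 0)
history = [start + i * policy.interval for i in range(5)]
kept, expired = prune(policy, history)
```

With five nightly backups and `keep_copies=3`, the two oldest timestamps land in `expired`. On an immutable target, copies that expire this way would still be protected until their retention period has passed.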
Disaster recovery
Backup gets your data back; DR gets your service back. The platform replicates chosen VMs to a second site on a schedule you set. Recovery plans group replicated VMs with a boot order and per-VM network mapping, so "bring up the ERP stack at the DR site" is one action — and, importantly, one you can rehearse:
- Test failover brings the plan up at the DR site in isolation, without touching production or replication. Run it quarterly; keep the report.
- Real failover promotes the replicas and starts the plan when the primary site is genuinely gone.
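A recovery plan as described above can be pictured as a list of VMs, each with a boot order and a network mapping. The Python sketch below is illustrative only: `PlanEntry`, `boot_stages` and the VM and network names are hypothetical, and the assumption that VMs sharing a boot-order number start together is ours, not a documented platform behaviour.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class PlanEntry:
    """Hypothetical recovery-plan entry for one replicated VM."""
    vm: str
    boot_order: int    # lower numbers start first
    network_map: dict  # production network name -> DR-site network name


def boot_stages(plan):
    """Group VMs into start-up stages, ordered by boot_order.

    Assumption: VMs with the same boot_order may start in the same stage.
    """
    ordered = sorted(plan, key=lambda e: e.boot_order)
    return [
        [entry.vm for entry in group]
        for _, group in groupby(ordered, key=lambda e: e.boot_order)
    ]


# Example: an ERP stack where the database must be up before the rest.
erp_plan = [
    PlanEntry("erp-app", 2, {"prod-app": "dr-app"}),
    PlanEntry("erp-db", 1, {"prod-db": "dr-db"}),
    PlanEntry("erp-web", 3, {"prod-dmz": "dr-dmz"}),
    PlanEntry("erp-cache", 2, {"prod-app": "dr-app"}),
]
stages = boot_stages(erp_plan)
dr_net = erp_plan[1].network_map["prod-db"]
```

Here `stages` puts `erp-db` first, then `erp-app` and `erp-cache` together, then `erp-web`. A test failover and a real failover would walk the same stages; only the isolation of the DR networks differs.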
Upgrades
Platform upgrades are rolling: one node drains and updates while the others carry the load, then the next, until the cluster is uniform. Built-in preflight checks verify cluster health (every node reachable, storage healthy, no conflicting operations), and the upgrade refuses to start if any check fails. You can trigger each step yourself or let the whole sequence run unattended. A version inventory shows exactly what every node runs, so drift is visible at a glance.
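The control flow of a rolling upgrade can be sketched in a few lines of Python. Everything here is hypothetical: the node dictionaries, the `preflight`, `rolling_upgrade` and `version_inventory` functions, and the `fake_upgrade` stand-in are illustrative names, and the check for conflicting operations is left out for brevity. It shows the shape of the process, not the platform's implementation.

```python
def preflight(nodes):
    """Return a list of problems; an empty list means the upgrade may start."""
    problems = []
    for node in nodes:
        if not node["reachable"]:
            problems.append(f"{node['name']}: unreachable")
        if not node["storage_healthy"]:
            problems.append(f"{node['name']}: storage unhealthy")
    return problems


def rolling_upgrade(nodes, target_version, upgrade_node):
    """Upgrade one node at a time; refuse to start if preflight fails."""
    problems = preflight(nodes)
    if problems:
        raise RuntimeError("preflight failed: " + "; ".join(problems))
    for node in nodes:
        if node["version"] == target_version:
            continue  # already up to date
        node["draining"] = True   # workloads move to the other nodes
        upgrade_node(node, target_version)
        node["draining"] = False  # node rejoins and takes load again


def version_inventory(nodes):
    """Map each node name to the version it currently runs."""
    return {node["name"]: node["version"] for node in nodes}


def fake_upgrade(node, version):
    """Stand-in for the real per-node update step (assumption)."""
    node["version"] = version


cluster = [
    {"name": "node1", "version": "2.0", "reachable": True, "storage_healthy": True},
    {"name": "node2", "version": "2.1", "reachable": True, "storage_healthy": True},
    {"name": "node3", "version": "2.0", "reachable": True, "storage_healthy": True},
]
rolling_upgrade(cluster, "2.1", fake_upgrade)
inventory = version_inventory(cluster)
```

After the run, every entry in `inventory` reports the same version, which is what "the cluster is uniform" means in practice. If any node were unreachable, `rolling_upgrade` would raise before touching a single node.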
Monitoring and alerting
- Built-in metrics for every host, VM and container, with history kept at full resolution short-term and summarised long-term. No external database needed.
- Capacity planning watches growth and projects when you'll cross 80, 90 and 100 percent of CPU, memory or storage — with enough notice to order hardware.
- Alerting evaluates your rules continuously (CPU above X for Y minutes, host offline, storage filling) and delivers to e-mail, chat or any webhook, with cooldowns so one incident doesn't become two hundred messages.
- Bring your own monitoring. A standard metrics endpoint exposes cluster, workload and platform health for Prometheus-compatible collectors; a ready-made Grafana dashboard ships with the platform. Audit and system logs forward to syslog/SIEM.
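The "CPU above X for Y minutes" rule with a cooldown can be illustrated with a small state machine. This is a sketch under stated assumptions, not the platform's alerting engine: the `AlertRule` class, its field names and the exact reset semantics (a single sample at or below the threshold ends the breach) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical rule: notify when a metric stays above a threshold."""
    threshold: float   # "X": value that counts as a breach
    sustain_s: float   # "Y": seconds the breach must last before alerting
    cooldown_s: float  # minimum seconds between two notifications
    _breach_start: Optional[float] = None
    _last_sent: Optional[float] = None

    def observe(self, t: float, value: float) -> bool:
        """Feed one sample taken at time t; return True if a notification is due."""
        if value <= self.threshold:
            self._breach_start = None  # breach over, start counting afresh
            return False
        if self._breach_start is None:
            self._breach_start = t
        if t - self._breach_start < self.sustain_s:
            return False  # not sustained long enough yet
        if self._last_sent is not None and t - self._last_sent < self.cooldown_s:
            return False  # still inside the cooldown window
        self._last_sent = t
        return True


# Example: CPU above 90 % for 5 minutes, at most one message per 15 minutes.
rule = AlertRule(threshold=90.0, sustain_s=300, cooldown_s=900)
sent = [t for t in range(0, 1800, 60) if rule.observe(t, 95.0)]
```

With a sample every minute and CPU pinned at 95 % for half an hour, the rule fires at 300 s and again at 1200 s: two notifications instead of one per sample, which is the point of the cooldown.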
When things go wrong
- A host dies: HA restarts its VMs on surviving hosts; their data was never stored only on the dead host. Replace the hardware whenever convenient and re-add it.
- A disk dies: the storage pool re-replicates affected data automatically; you swap the disk at leisure.
- A VM misbehaves: console access works even when the guest's network doesn't; snapshots give you a way back before risky changes.
- You misbehave: the audit trail records who did what and when, which also helps when undoing it.