VMware IO stress
VMware IO stress is a VMware chaos fault that drives disk IO on the Linux VM VM_NAME. It runs NUMBER_OF_WORKERS IO worker processes, each consuming FILESYSTEM_UTILIZATION_PERCENTAGE percent of the available filesystem space (or FILESYSTEM_UTILIZATION_BYTES GB when set), for TOTAL_CHAOS_DURATION seconds and then stops the workers. The fault uses VMware Tools (Guest Operations API) to run the stress workload inside the guest as VM_USER_NAME.
Use this fault to test how a workload on a VMware-hosted VM behaves when storage throughput is saturated: whether IO latency stays inside the SLA, whether databases queue writes correctly, whether vSphere DRS reacts to datastore latency, and whether monitoring detects the saturation within the alerting SLA.
If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.
Use cases
Run this fault when you want to answer concrete questions like:
- IO pressure on a vSphere VM: When disk throughput saturates, does application latency stay inside the SLA?
- Database resilience: Does the database queue writes correctly when fsync latency spikes?
- Datastore impact: Do co-tenant VMs on the same datastore degrade?
- Monitoring fidelity: Do vCenter datastore counters fire alerts inside the SLA?
Prerequisites
- Kubernetes version: 1.21 or later for the chaos infrastructure cluster.
- vCenter reachable: The chaos infrastructure can reach GOVC_URL over port 443.
- VMware Tools running on the guest: Verify with vmware-toolbox-cmd -v inside the VM.
- Stress binary installed inside the guest: Go to VMware Linux binary installation to install the IO stress prerequisite.
- Free space in the target filesystem: The guest filesystem has enough free space to absorb FILESYSTEM_UTILIZATION_BYTES or FILESYSTEM_UTILIZATION_PERCENTAGE worth of writes.
- vCenter chaos role: GOVC_USERNAME is mapped to the chaos role described in VMware permissions.
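The guest-side prerequisites can be checked with a small script before the experiment runs. The following is a minimal sketch, not part of the fault itself: the function name preflight is hypothetical, and it assumes the stress binary is stress-ng (the process name used in the manual recovery step) and that the script is run inside the guest.

```python
import os
import shutil


def preflight(mount_path, required_bytes, binaries=("vmware-toolbox-cmd", "stress-ng")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in binaries:
        # shutil.which searches PATH the same way the shell would.
        if shutil.which(name) is None:
            problems.append(f"{name} not found on PATH")
    if not os.path.isdir(mount_path):
        problems.append(f"{mount_path} is not a directory")
    else:
        free = shutil.disk_usage(mount_path).free
        if free < required_bytes:
            problems.append(f"{mount_path} has {free} bytes free, need {required_bytes}")
    return problems


if __name__ == "__main__":
    # Hypothetical values: require 1 GiB of headroom in the current directory.
    for problem in preflight(".", 1024**3):
        print("FAIL:", problem)
```

Run it as VM_USER_NAME so that PATH and directory permissions match what the fault will see.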
Supported environments
| Platform | Support status |
|---|---|
| Linux VMs hosted on vSphere / vCenter (any distro with VMware Tools) | Supported |
| Linux VMs without VMware Tools | Not supported |
| Windows VMs | Not supported (use VMware Windows disk stress) |
Permissions required
On vCenter. Map GOVC_USERNAME to the chaos role described in VMware permissions. The role needs:
- Virtual machine → Guest operations → Program execution, Modifications, Queries.
On the guest OS. VM_USER_NAME must be able to execute the IO stress binary and write into the working directory.
Authentication
| Layer | Tunables |
|---|---|
| vCenter | GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, GOVC_INSECURE |
| Guest OS | VM_USER_NAME, VM_PASSWORD |
Store each credential as a text secret in Harness Secret Manager and reference the secret identifier when configuring the experiment.
Fault tunables
Configure the following fault parameters when you add VMware IO stress to an experiment in Chaos Studio. Defaults are shown for reference.
Required parameters
| Tunable | Description | Default |
|---|---|---|
| VM_NAME | Name of the target VM as it appears in vCenter. | (required) |
| VM_USER_NAME | OS user account on the target VM. | (required) |
| VM_PASSWORD | Password for VM_USER_NAME. | (required) |
Stress parameters
| Tunable | Description | Default |
|---|---|---|
| FILESYSTEM_UTILIZATION_PERCENTAGE | Percentage of available filesystem to write to. Ignored when FILESYSTEM_UTILIZATION_BYTES is set. | 10 |
| FILESYSTEM_UTILIZATION_BYTES | Amount of data to write in GB. Takes precedence over FILESYSTEM_UTILIZATION_PERCENTAGE when set. | "" |
| NUMBER_OF_WORKERS | Number of IO worker processes. | 4 |
| VOLUME_MOUNT_PATH | Filesystem path on the guest where the workers write. | "" (defaults to the working dir of the stress process) |
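The precedence between the two size tunables can be illustrated with a short calculation. This is a sketch under stated assumptions, not the fault's actual implementation: the helper name planned_write_bytes is hypothetical, "available filesystem" is taken to mean free space, and 1 GB is taken as 1024**3 bytes (the fault may use 10**9).

```python
import shutil

GB = 1024**3  # assumption: the fault may define a GB as 10**9 bytes instead


def planned_write_bytes(free_bytes, percentage="10", gigabytes=""):
    """Resolve how many bytes the workers are asked to write.

    gigabytes (FILESYSTEM_UTILIZATION_BYTES) wins whenever it is non-empty;
    otherwise percentage (FILESYSTEM_UTILIZATION_PERCENTAGE) of free space is used.
    """
    if str(gigabytes).strip():
        return int(float(gigabytes) * GB)
    return int(free_bytes * float(percentage) / 100)


if __name__ == "__main__":
    # Compare the request against the free space of the current directory.
    free = shutil.disk_usage(".").free
    print("free bytes:   ", free)
    print("planned bytes:", planned_write_bytes(free, percentage="10"))
```

Comparing the planned size against the free space on VOLUME_MOUNT_PATH before a run helps avoid filling the filesystem.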
Chaos parameters
| Tunable | Description | Default |
|---|---|---|
| TOTAL_CHAOS_DURATION | Total duration of the fault in seconds. | 30 |
| CHAOS_INTERVAL | Delay in seconds between iterations. | 10 |
| SEQUENCE | Order in which multiple targets are stressed: parallel or serial. | parallel |
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |
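When aligning probes or dashboards with a run, it can help to work out when the stress window opens and closes. The sketch below assumes only what the table states, namely that RAMP_TIME is applied once before and once after the fault. It ignores CHAOS_INTERVAL and any setup overhead, and the function name fault_timeline is hypothetical.

```python
def fault_timeline(total_chaos_duration=30, ramp_time=0):
    """Return (stress_start, stress_end, run_end) in seconds from experiment launch."""
    stress_start = ramp_time                       # wait before the fault
    stress_end = stress_start + total_chaos_duration
    run_end = stress_end + ramp_time               # wait after the fault
    return stress_start, stress_end, run_end


if __name__ == "__main__":
    # Hypothetical values: a 60 s fault with a 15 s ramp on each side.
    print(fault_timeline(total_chaos_duration=60, ramp_time=15))
```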
vCenter authentication
| Tunable | Description | Default |
|---|---|---|
| GOVC_URL | vCenter server URL. | "" |
| GOVC_USERNAME | vCenter user mapped to the chaos role. | "" |
| GOVC_PASSWORD | Password for GOVC_USERNAME. | "" |
| GOVC_INSECURE | Skip SSL certificate verification when set to true. | true |
Tunables that apply to every fault are documented in common tunables for all faults.
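A quick sanity check of the tunables before submitting an experiment can catch empty or out-of-range values early. The following is an illustrative sketch only: validate_tunables is a hypothetical helper, and treating the GOVC_* credentials as mandatory and the percentage as limited to (0, 100] are assumptions rather than documented validation rules.

```python
REQUIRED = (
    "VM_NAME", "VM_USER_NAME", "VM_PASSWORD",
    "GOVC_URL", "GOVC_USERNAME", "GOVC_PASSWORD",
)


def validate_tunables(tunables):
    """Return a list of problems found in a dict of tunable name -> string value."""
    errors = []
    for name in REQUIRED:
        if not str(tunables.get(name, "")).strip():
            errors.append(f"{name} is empty")

    # The percentage only matters when the size in GB is unset.
    if not str(tunables.get("FILESYSTEM_UTILIZATION_BYTES", "")).strip():
        pct = str(tunables.get("FILESYSTEM_UTILIZATION_PERCENTAGE", "10"))
        try:
            if not 0 < float(pct) <= 100:
                errors.append("FILESYSTEM_UTILIZATION_PERCENTAGE must be in (0, 100]")
        except ValueError:
            errors.append("FILESYSTEM_UTILIZATION_PERCENTAGE is not a number")

    workers = str(tunables.get("NUMBER_OF_WORKERS", "4"))
    if not workers.isdigit() or int(workers) < 1:
        errors.append("NUMBER_OF_WORKERS must be a positive integer")
    return errors
```

An empty list means the values look plausible. It does not confirm that vCenter accepts the credentials.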
Fault execution in brief
The fault authenticates to vCenter, opens a Guest Operations session on VM_NAME as VM_USER_NAME, and launches NUMBER_OF_WORKERS IO workers that write FILESYSTEM_UTILIZATION_PERCENTAGE percent of the filesystem (or FILESYSTEM_UTILIZATION_BYTES GB) under VOLUME_MOUNT_PATH for TOTAL_CHAOS_DURATION seconds. It then stops the workers and removes the scratch files.
Expected behavior during fault execution
- Disk read/write throughput on VM_NAME saturates the underlying datastore.
- Application IO latency may rise.
- vCenter datastore counters (datastore.totalWriteLatency.average, datastore.numberWriteAveraged.average) reflect the activity.
- After the duration ends, the workers exit and IO returns to baseline.
The chaos pod stops the IO workers via Guest Operations and removes the scratch files. IO latency and datastore activity return to baseline within seconds.
Signals to watch
- Disk IO: Use a Prometheus probe on node_disk_io_time_seconds_total from a node exporter inside the VM.
- Application: Use an HTTP probe and assert latency stays inside the SLA.
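Because node_disk_io_time_seconds_total is a cumulative counter of seconds spent doing IO, its rate of change over a window approximates the fraction of time the disk was busy. The sketch below shows that arithmetic on two raw samples, which is roughly what a PromQL rate() over the counter computes. The function name disk_busy_fraction and the sample values are hypothetical.

```python
def disk_busy_fraction(first, second):
    """Estimate disk utilization between two counter samples.

    Each sample is (unix_time_seconds, node_disk_io_time_seconds_total).
    Returns a value near 0.0 for an idle disk and near 1.0 for a saturated one.
    """
    (t0, busy0), (t1, busy1) = first, second
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if busy1 < busy0:
        raise ValueError("counter reset between samples")
    return (busy1 - busy0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical samples taken 10 s apart during the chaos window.
    print(disk_busy_fraction((100.0, 50.0), (110.0, 59.0)))
```

A value that climbs toward 1.0 during the chaos window and falls back afterwards is consistent with the fault taking effect.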
Verify the fault execution effect
- Inspect vCenter datastore performance. In the vCenter UI, open Datastore → Monitor → Performance for the datastore backing the VM.
- SSH into the VM and run iostat -x 1. Throughput on the target disk should spike during the chaos window.
Recovery and cleanup
- End of duration: The chaos pod stops the workers and removes scratch files via Guest Operations.
- Abort the experiment: Stopping the experiment also stops the workers.
- Manual recovery: SSH into the VM and run sudo pkill -f stress-ng, then rm -rf any leftover scratch files under VOLUME_MOUNT_PATH.
Limitations
- Disk full risk: Setting FILESYSTEM_UTILIZATION_PERCENTAGE close to 100 can fill the filesystem before the fault ends.
- VMware Tools required: Without VMware Tools, the fault cannot run.
- Datastore contention: Stressing IO can affect co-tenant VMs on the same datastore.
- Single VM per run: Each fault run targets one VM_NAME.
Troubleshooting
VMware IO stress fails with no space left on device in Harness Chaos Engineering
The chosen VOLUME_MOUNT_PATH does not have enough free space for FILESYSTEM_UTILIZATION_BYTES or FILESYSTEM_UTILIZATION_PERCENTAGE. Use df -h inside the guest to check free space, then either point VOLUME_MOUNT_PATH at a roomier filesystem or reduce the request.
VMware IO stress fails with VMware Tools not running
The Guest Operations API requires VMware Tools to be installed and running on the target VM. Install or restart open-vm-tools / VMware Tools on the guest and retry.
Related faults
- VMware CPU hog: Stress CPU instead of disk IO.
- VMware memory hog: Stress memory instead of disk IO.