
VMware memory hog


VMware memory hog is a VMware chaos fault that consumes MEMORY_CONSUMPTION_PERCENTAGE percent of RAM (or MEMORY_CONSUMPTION_MEBIBYTES mebibytes when set) through NUMBER_OF_WORKERS worker processes on the Linux VM VM_NAME for TOTAL_CHAOS_DURATION seconds, then stops the workers. The fault uses VMware Tools (Guest Operations API) to run the stress workload inside the guest as VM_USER_NAME.

Use this fault to test how a workload on a VMware-hosted VM behaves when memory headroom shrinks: whether the OOM killer fires on the right process, whether GC-heavy applications pause, whether vSphere DRS reacts, and whether monitoring detects the saturation within the alerting SLA.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.


Use cases

Run this fault when you want to answer concrete questions like:

  • Memory pressure on a vSphere VM: When RAM utilization climbs, does the OOM killer target the expected process?
  • GC behavior: Does the JVM/CLR pause for an unacceptable duration under memory pressure?
  • DRS reaction: Does vSphere DRS migrate the VM to a host with more headroom?
  • Monitoring fidelity: Do vCenter performance counters and downstream alerts fire inside the alerting SLA?

Prerequisites

  • Kubernetes version: 1.21 or later for the chaos infrastructure cluster.
  • vCenter reachable: The chaos infrastructure can reach GOVC_URL over port 443.
  • VMware Tools running on the guest: Verify with vmware-toolbox-cmd -v inside the VM.
  • Stress binary installed inside the guest: Go to VMware Linux binary installation to install the memory stress prerequisite.
  • vCenter chaos role: GOVC_USERNAME is mapped to the chaos role described in VMware permissions.

Supported environments

Platform | Support status
Linux VMs hosted on vSphere / vCenter (any distro with VMware Tools) | Supported
Linux VMs without VMware Tools | Not supported
Windows VMs | Not supported (use VMware Windows memory hog)

Permissions required

On vCenter. Map GOVC_USERNAME to the chaos role described in VMware permissions. For this Advanced fault, the role needs:

  • Virtual machine → Guest operations → Program execution, Modifications, Queries.

On the guest OS. VM_USER_NAME must be able to execute the memory stress binary and pkill.


Authentication

Layer | Tunables
vCenter (control plane) | GOVC_URL, GOVC_USERNAME, GOVC_PASSWORD, GOVC_INSECURE
Guest OS (target VM) | VM_USER_NAME, VM_PASSWORD

Store each credential as a text secret in Harness Secret Manager and reference the secret identifier when configuring the experiment.


Fault tunables

Configure the following fault parameters when you add VMware memory hog to an experiment in Chaos Studio. Defaults are shown for reference.

Required parameters

Tunable | Description | Default
VM_NAME | Name of the target VM as it appears in vCenter. | (required)
VM_USER_NAME | OS user account on the target VM. | (required)
VM_PASSWORD | Password for VM_USER_NAME. | (required)

Stress parameters

Tunable | Description | Default
MEMORY_CONSUMPTION_PERCENTAGE | Percentage of guest RAM to consume (0-100). Ignored when MEMORY_CONSUMPTION_MEBIBYTES is set. | 30
MEMORY_CONSUMPTION_MEBIBYTES | Memory to consume in mebibytes. Takes precedence over MEMORY_CONSUMPTION_PERCENTAGE when set. | ""
NUMBER_OF_WORKERS | Number of worker processes that hold the memory. | 4
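
The precedence rule between the two memory tunables can be sketched in a few lines of Python. This is an illustration of the documented rule only, not the fault's actual implementation; the function name resolve_target_mib and the total_ram_mib argument are hypothetical, and rounding down to a whole mebibyte is an assumption.

```python
from typing import Optional


def resolve_target_mib(total_ram_mib: int,
                       percentage: int = 30,
                       mebibytes: Optional[int] = None) -> int:
    """Return how much memory to consume, in MiB.

    Hypothetical helper: mirrors the documented precedence only.
    """
    if mebibytes is not None:
        # MEMORY_CONSUMPTION_MEBIBYTES takes precedence when it is set.
        if mebibytes <= 0:
            raise ValueError("mebibytes must be positive")
        return mebibytes
    # Otherwise fall back to MEMORY_CONSUMPTION_PERCENTAGE (0-100).
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: round down to a whole mebibyte.
    return total_ram_mib * percentage // 100
```

For a guest with 8192 MiB of RAM, the default percentage of 30 resolves to 2457 MiB, while passing mebibytes=1024 resolves to 1024 MiB regardless of the percentage.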

Chaos parameters

Tunable | Description | Default
TOTAL_CHAOS_DURATION | Total duration of the fault in seconds. | 30
CHAOS_INTERVAL | Delay in seconds between successive iterations when running for more than one cycle. | 10
SEQUENCE | Order in which multiple targets are stressed: parallel or serial. | parallel
RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0
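
Because RAMP_TIME is applied both before and after the fault, the experiment runs longer than TOTAL_CHAOS_DURATION alone. The sketch below lays out the resulting phases. It is a rough timeline under stated assumptions: the function name fault_timeline and the phase labels are hypothetical, and the calculation ignores pod scheduling, vCenter authentication time, and any effect of CHAOS_INTERVAL.

```python
from typing import List, Tuple


def fault_timeline(total_chaos_duration: int = 30,
                   ramp_time: int = 0) -> List[Tuple[str, int, int]]:
    """Return (phase, start_second, end_second) tuples for one fault run.

    Hypothetical helper: models only RAMP_TIME and TOTAL_CHAOS_DURATION.
    """
    if total_chaos_duration < 0 or ramp_time < 0:
        raise ValueError("durations must be non-negative")
    fault_start = ramp_time
    fault_end = fault_start + total_chaos_duration
    return [
        ("ramp_before", 0, fault_start),          # wait before the fault
        ("fault", fault_start, fault_end),        # memory is held
        ("ramp_after", fault_end, fault_end + ramp_time),  # wait after
    ]


def wall_clock_seconds(total_chaos_duration: int = 30,
                       ramp_time: int = 0) -> int:
    """Approximate end-to-end duration in seconds."""
    return fault_timeline(total_chaos_duration, ramp_time)[-1][2]
```

With the defaults the run lasts about 30 seconds; with RAMP_TIME set to 5 it lasts about 40, which is worth accounting for when sizing probe timeouts.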

vCenter authentication

Tunable | Description | Default
GOVC_URL | vCenter server URL. | ""
GOVC_USERNAME | vCenter user mapped to the chaos role. | ""
GOVC_PASSWORD | Password for GOVC_USERNAME. | ""
GOVC_INSECURE | Skip SSL certificate verification when set to true. | true

Tunables that apply to every fault are documented in common tunables for all faults.


Fault execution in brief

Authenticates to vCenter, opens a Guest Operations session on VM_NAME as VM_USER_NAME, launches NUMBER_OF_WORKERS memory-stress workers that hold MEMORY_CONSUMPTION_PERCENTAGE percent of RAM (or MEMORY_CONSUMPTION_MEBIBYTES MiB) for TOTAL_CHAOS_DURATION seconds, then stops the workers.
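
To reproduce a similar load by hand inside the guest, the tunables can be mapped onto a stress-ng command line. This page does not document the exact command the fault runs, so the mapping below is an assumption: it uses the stress-ng options --vm, --vm-bytes, and --timeout, and the helper name build_stress_command is hypothetical. How stress-ng distributes --vm-bytes across workers can differ between versions, so check the manual page for the version installed on the guest.

```python
from typing import List, Optional


def build_stress_command(workers: int = 4,
                         percentage: int = 30,
                         mebibytes: Optional[int] = None,
                         duration_s: int = 30) -> List[str]:
    """Build an argv list for a manual stress-ng run.

    Assumption: this approximates the fault; it is not the fault's own command.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # An absolute size wins over the percentage, as with the fault tunables.
    size = f"{mebibytes}m" if mebibytes is not None else f"{percentage}%"
    return [
        "stress-ng",
        "--vm", str(workers),          # number of memory workers
        "--vm-bytes", size,            # amount of memory to map
        "--timeout", f"{duration_s}s", # stop after this many seconds
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a shell on the guest.
    print(" ".join(build_stress_command()))
```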


Expected behavior during fault execution

  • Available memory on the target VM drops for the duration.
  • Workloads under high memory pressure may hit GC pauses, start swapping, or be OOM-killed.
  • vCenter performance counters (mem.usage.average, mem.swapout.average) reflect the drop.
  • After the duration ends, the workers exit and memory returns to baseline.

When the fault ends

The chaos pod stops the stress workers via Guest Operations. Memory returns to baseline within seconds.

Signals to watch

  • VM memory: Use a Prometheus probe on node_memory_MemAvailable_bytes.
  • Application: Use an HTTP probe and assert error rate stays under threshold.

Verify the fault execution effect

  1. Inspect vCenter Memory performance.

    In the vCenter UI, open the VM → Monitor → Performance, then switch to the Memory view.

  2. SSH into the VM and run free -h.

    The available column should drop during the chaos window.
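
The available column reported by free is derived from the MemAvailable field in /proc/meminfo, so the same check can be scripted. The following is a minimal sketch to run on the guest before and during the chaos window; the function names are hypothetical and the script is not part of the fault.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into {field: value}, values as reported (kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo_text: str) -> float:
    """Return MemAvailable as a percentage of MemTotal."""
    info = parse_meminfo(meminfo_text)
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def read_available_percent(path: str = "/proc/meminfo") -> float:
    """Read the live value on a Linux guest."""
    with open(path) as handle:
        return available_percent(handle.read())
```

Calling read_available_percent() once before the experiment and again during the chaos window gives two numbers to compare; the second should be noticeably lower if the fault is holding memory.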


Recovery and cleanup

  • End of duration: The chaos pod stops the workers via Guest Operations.
  • Abort the experiment: Stopping the experiment from Chaos Studio also stops the workers.
  • Manual recovery: If any workers survived, SSH into the VM and run sudo pkill -f stress-ng.

Limitations

  • OOM risk: Setting MEMORY_CONSUMPTION_PERCENTAGE close to 100 may OOM-kill critical processes; start conservatively.
  • Swap behavior varies: Guest swap configuration affects behavior; pages may swap instead of OOM.
  • VMware Tools required: Without VMware Tools, the fault cannot run.
  • Single VM per run: Each fault run targets one VM_NAME.

Troubleshooting

VMware memory hog fails with "VMware Tools not running"

The Guest Operations API requires VMware Tools to be installed and running on the target VM. Install or restart open-vm-tools / VMware Tools on the guest and retry.

VM became unresponsive during memory hog

If MEMORY_CONSUMPTION_PERCENTAGE was very high, the OOM killer may have terminated VMware Tools or SSH. Power-cycle the VM via vCenter (or use ESXi reset) and reduce MEMORY_CONSUMPTION_PERCENTAGE.