VMware process kill

Last updated on Jun 22, 2026

VMware process kill is a VMware chaos fault that kills the processes whose PIDs are listed in PROCESS_IDS on the Linux VM named VM_NAME. The fault runs for TOTAL_CHAOS_DURATION seconds and, after sending the signal, waits VERIFICATION_WINDOW seconds to confirm that the targeted processes are gone. By default the fault sends SIGTERM; set FORCE=true to send SIGKILL instead. The fault acts inside the guest as VM_USER_NAME through VMware Tools (the Guest Operations API).

Use this fault to test how a workload running on a VMware-hosted VM behaves when a critical process is killed: whether the supervisor (systemd, supervisord, runit) restarts it inside the SLA, whether replicas absorb the load, whether monitoring detects the regression within the alerting SLA, and whether on-call alerts fire correctly.

Run your first experiment

If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.

Use cases

Crash resilience: When a critical PID dies, does the supervisor restart it inside the SLA?
Replica absorption: When one replica's process dies, do peers absorb the traffic inside the SLO budget?
Alert fidelity: Do downstream alerts fire inside the alerting SLA?

Prerequisites

Kubernetes version: 1.21 or later for the chaos infrastructure cluster.
vCenter reachable: The chaos infrastructure can reach GOVC_URL over port 443.
VMware Tools running on the guest: Verify with vmware-toolbox-cmd -v.
Process IDs: You know the PID(s) to kill, or your workload includes a wrapper that reports the PID(s) of supervised processes.
Sudo permissions: VM_USER_NAME can kill the target PID(s) (process owner or root via sudo).
vCenter chaos role: GOVC_USERNAME is mapped to the chaos role per VMware permissions.
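
The PID and permission prerequisites can be checked from a shell on the guest while logged in as VM_USER_NAME. The following is a minimal sketch, not part of the fault itself: `can_signal_all` is a hypothetical helper name, and it relies on `kill -0`, which delivers no signal and only reports whether the PID exists and the caller is allowed to signal it.

```shell
# Hypothetical pre-flight helper (not part of the fault).
# Usage: can_signal_all <comma-separated PIDs>
# Returns 0 only if every PID exists and the current user may signal it.
can_signal_all() {
    [ -n "$1" ] || { echo "no PIDs given" >&2; return 2; }
    rc=0
    # Turn "101,202" into "101 202" and let the shell split on whitespace.
    for pid in $(printf '%s\n' "$1" | tr ',' ' '); do
        # kill -0 sends nothing; it only checks existence and permission.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "cannot signal PID $pid (missing or not permitted)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

If VM_USER_NAME only gains the permission through sudo, the same check would have to run under sudo to give a meaningful answer.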

Supported environments

Platform	Support status
Linux VMs hosted on vSphere / vCenter (any distro with VMware Tools)	Supported
Linux VMs without VMware Tools	Not supported
Windows VMs	Not supported (use Windows process kill)

Permissions required

On vCenter. Map GOVC_USERNAME to the chaos role described in VMware permissions. The role needs Guest Operations (Program execution, Modifications, Queries).

On the guest OS. VM_USER_NAME must own the target processes or have sudo to kill them.

Authentication

Layer	Tunables
vCenter	`GOVC_URL`, `GOVC_USERNAME`, `GOVC_PASSWORD`, `GOVC_INSECURE`
Guest OS	`VM_USER_NAME`, `VM_PASSWORD`

Store each credential as a text secret in Harness Secret Manager and reference the secret identifier when configuring the experiment.

Fault tunables

Required parameters

Tunable	Description	Default
`VM_NAME`	Name of the target VM as it appears in vCenter.	(required)
`VM_USER_NAME`	OS user account on the target VM.	(required)
`VM_PASSWORD`	Password for `VM_USER_NAME`.	(required)
`PROCESS_IDS`	Comma-separated list of PIDs to kill on the target VM.	(required)
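
Because PROCESS_IDS expects a comma-separated list, it can help to build the value from whatever tool lists the PIDs on the guest. A minimal sketch, assuming the PIDs arrive one per line on standard input; `join_pids` is a hypothetical helper name, not something the fault provides.

```shell
# Hypothetical helper: read PIDs one per line on stdin and print them
# as a single comma-separated value suitable for PROCESS_IDS.
join_pids() {
    # Keep only lines made up entirely of digits, then join them with commas.
    grep -E '^[0-9]+$' | paste -sd, -
}
```

For example, `pgrep -x nginx | join_pids` would print the current PIDs of a process named nginx in that format (nginx is only an illustration).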

Chaos parameters

Tunable	Description	Default
`FORCE`	If `true`, send `SIGKILL` instead of `SIGTERM`.	`false`
`TOTAL_CHAOS_DURATION`	Total duration of the fault in seconds.	`30`
`VERIFICATION_WINDOW`	Time window in seconds after the kill during which the fault verifies the outcome.	`10`
`RAMP_TIME`	Wait period in seconds before and after the fault. Go to ramp time to read how it is applied.	`0`

vCenter authentication

Tunable	Description	Default
`GOVC_URL`	vCenter server URL.	`""`
`GOVC_USERNAME`	vCenter user mapped to the chaos role.	`""`
`GOVC_PASSWORD`	Password for `GOVC_USERNAME`.	`""`
`GOVC_INSECURE`	Skip SSL certificate verification when set to `true`.	`true`

Tunables that apply to every fault are documented in common tunables for all faults.

Fault execution in brief

The fault authenticates to vCenter, opens a Guest Operations session on VM_NAME as VM_USER_NAME, sends SIGTERM (or SIGKILL when FORCE=true) to each PID in PROCESS_IDS, waits VERIFICATION_WINDOW seconds, and reports success once every targeted PID is gone.
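
The signal-and-verify part of that sequence can be rehearsed locally on a test machine. The sketch below is an emulation under stated assumptions, not the fault's actual implementation: `kill_and_verify` is a hypothetical name, it runs directly in a shell on the guest rather than through the Guest Operations API, and it skips the vCenter authentication step entirely.

```shell
# Local emulation of "signal each PID, wait, verify"; NOT the fault's code.
# Usage: kill_and_verify <comma-separated PIDs> <force: true|false> <window seconds>
kill_and_verify() {
    pids=$(printf '%s\n' "$1" | tr ',' ' ')
    force=${2:-false}
    window=${3:-10}

    # FORCE=true maps to SIGKILL; anything else maps to SIGTERM.
    if [ "$force" = "true" ]; then sig=KILL; else sig=TERM; fi

    for pid in $pids; do
        kill -s "$sig" "$pid" 2>/dev/null || echo "PID $pid: signal not delivered" >&2
    done

    # Give the processes the verification window to exit.
    sleep "$window"

    # Succeed only if none of the targeted PIDs can still be signalled.
    for pid in $pids; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "PID $pid is still running" >&2
            return 1
        fi
    done
    return 0
}
```

Running it with `false` against a process that traps SIGTERM is a quick way to see why FORCE exists: the verification step fails because the PID survives the signal.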

Expected behavior during fault execution

Each PID in PROCESS_IDS receives the kill signal.
A supervisor (systemd, supervisord, runit) typically respawns the process according to its own restart policy.
Application metrics may dip while the process restarts; replicas may absorb traffic if the workload is clustered.
After the duration ends, the fault exits without further action; supervised processes are expected to be running normally.

When the fault ends

The fault does not restart processes. Recovery depends on the guest's process supervisor or the user's manual intervention.

Signals to watch

Process up: Use a command probe running pgrep -x <name> and assert the process is back inside the SLA.
Workload health: Use an HTTP probe on a user-visible endpoint.
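
The "process is back inside the SLA" check amounts to retrying a command until it succeeds or a deadline passes. A minimal sketch of that polling logic for manual use on the guest; `wait_until` is a hypothetical helper name, and the one-second polling interval is an assumption rather than anything the probe prescribes.

```shell
# Hypothetical polling helper.
# Usage: wait_until <limit seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the limit is reached.
wait_until() {
    limit=$1
    shift
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        # Give up when the time budget is spent.
        [ "$elapsed" -ge "$limit" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For instance, `wait_until 30 pgrep -x <name>` succeeds if the process reappears within roughly 30 seconds of the kill.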

Verify the fault execution effect

SSH into the VM during the chaos window.
```
ps -p <PID>
```
The PID should briefly disappear and be replaced by a new PID for the same command when the supervisor restarts it.
Inspect the supervisor log.
```
journalctl -u <unit> -n 50
```

Recovery and cleanup

Supervised processes: The supervisor restarts them automatically.
Unsupervised processes: Restart them manually (sudo systemctl start <unit> or your own runner).
Abort: Stopping the experiment from Chaos Studio also stops further iterations of the fault.

Limitations

PID-based: The fault targets exact PIDs, not process names. If the workload's PID changes between iterations, you must look up the new PID.
No auto-restart: The fault does not restart killed processes; supervision is the user's responsibility.
VMware Tools required: Without VMware Tools, the fault cannot run.
Single VM per run: Each fault run targets one VM_NAME.

Troubleshooting

VMware process kill fails with "no such process" in Harness Chaos Engineering

The PIDs in PROCESS_IDS may have changed since you looked them up. SSH into the VM, look up the current PIDs (pgrep <name>), update PROCESS_IDS, and retry.

Killed process did not restart

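The fault only kills the process; restarting it is up to the guest's supervisor. Check journalctl -u <unit> and confirm that the systemd unit file sets a Restart= policy that covers the signal you sent. systemd treats a process terminated by SIGTERM as a clean exit, so Restart=on-failure typically does not restart the service after the default SIGTERM, although it does after SIGKILL (FORCE=true). Restart=always covers both cases.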
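
One way to give a unit a restart policy that applies regardless of which signal ended the process is a systemd drop-in. The sketch below only writes the drop-in file; `write_restart_dropin`, the file name restart.conf, and the 2-second RestartSec value are illustrative assumptions, not settings the fault requires.

```shell
# Hypothetical helper: write a systemd drop-in that restarts the service
# after any exit, including termination by SIGTERM or SIGKILL.
# Usage: write_restart_dropin <drop-in directory>
write_restart_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=2
EOF
}
```

On a real guest the directory would be /etc/systemd/system/<unit>.service.d, written as root and followed by systemctl daemon-reload so that systemd picks up the change.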

Related faults

VMware service stop: Stop a service (which the supervisor manages) instead of killing a PID directly.
