VMware: Cilium Inter-Node Communication Broken
Note
This is not official documentation for Automation Suite.
Issue Description
Cilium inter-node communication drops intermittently, even though Cilium appears healthy at first glance, so nodes cannot reach other nodes. Cilium endpoints report as up, but various application pods crash.
Root Cause
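A compatibility issue between VMware virtual hardware and the Cilium and Calico networking components. Because the fix under Resolution is to disable checksum offloading, the problem appears to be tied to TX checksum offloading on the network interfaces.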
Resolution
Disable TX checksum offloading on the following network interfaces:
sudo ethtool -K ens192 tx-checksum-ip-generic off
sudo ethtool -K cilium_host tx-checksum-ip-generic off
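To confirm the change took effect, the current offload state can be read back with `ethtool -k` (lowercase `-k` shows features, uppercase `-K` sets them). The following is a minimal sketch, not part of the original procedure: the helper names `offload_state` and `check_offload` are hypothetical, and the interface names are the ones used above and may differ on your nodes. Settings made with `ethtool -K` do not persist across reboots, so the check is worth repeating after a restart.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and
# prints the state ("on" or "off") of the tx-checksum-ip-generic feature.
offload_state() {
    sed -n 's/^[[:space:]]*tx-checksum-ip-generic:[[:space:]]*\([a-z]*\).*/\1/p'
}

# Hypothetical helper: reports the feature state for each interface given.
# Prints "unknown" if the interface does not exist or ethtool fails.
check_offload() {
    for iface in "$@"; do
        state=$(ethtool -k "$iface" 2>/dev/null | offload_state)
        echo "$iface: tx-checksum-ip-generic=${state:-unknown}"
    done
}
```

After sourcing the snippet, run `check_offload ens192 cilium_host`. Both interfaces should report `off` once the Resolution commands have been applied.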
Key Observations When This Issue Occurs
- Calls fail at random
- kubectl commands are very slow