See moby/moby#47728 for context.
I've just had a case where the NetworkDB looked correct, but the packets were still being routed to the wrong node.
Details: Node S, running the source container, has IP 172.17.2.2. Node T, running the target container, has IP 172.16.3.2.
The target container has the endpoint IP 192.168.74.111. On both nodes, the following command returns the same data:
docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t overlay 2>&1 | grep \\.111
time="2024-10-18T16:18:25Z" level=debug msg="key:c3d4202a63eac0339c97051eaf16464f21e505e63782c5944d5c6ab7a120fd59 value:{EndpointIP:192.168.74.111/24 EndpointMAC:02:42:c0:a8:4a:6f TunnelEndpointIP:172.16.3.2} owner:f05333e5c866"
-> TunnelEndpointIP is correct.
Still, I could verify via tcpdump that node S tries to send the packets to a different cluster node with IP 172.17.7.2. This shows that the NetworkDB is not the complete truth.
I dug further. For the next bit, it's important to know that the MAC addresses of the containers' interfaces are always 02:42 followed by the four IP octets in hex, so the MAC address of the target container is 02:42:c0:a8:4a:6f. Or just get it via nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz arp -n | grep \\.111
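For reference, the MAC can be derived directly from the endpoint IP (a minimal shell sketch, using the endpoint IP from above):

# build 02:42:<IP as hex> from the four IP octets
printf '02:42:%02x:%02x:%02x:%02x\n' $(echo 192.168.74.111 | tr '.' ' ')
# -> 02:42:c0:a8:4a:6f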
Using nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f
I could verify that the forwarding database in fact has the wrong IP 172.17.7.2 as the destination:
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 6f
02:42:c0:a8:4a:6f dev vxlan0 master br0
02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2 link-netnsid 0 self permanent
I manually fixed the fdb entry via
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb delete 02:42:c0:a8:4a:6f dev vxlan0 dst 172.17.7.2
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb add 02:42:c0:a8:4a:6f dev vxlan0 dst 172.16.3.2 self permanent
This resulted in the packets now being received by the correct node T, but still not by the container running on it. A few minutes later, it started working without further changes...
On another node, I had a problem that at first looked like the same issue, but turned out to be very different: containers on that node couldn't reach the Docker Swarm service running on node T.
However, they could access the service's container via its IP; only the connection via the service VIP didn't work.
Checking the endpoint table showed nothing wrong:
docker run --net host dockereng/network-diagnostic:onlyclient -port 2000 -v -net ebbnwg5bhz9y9g40r8yum5bdt -t sd 2>&1 | grep mimir-lb
This returned the correct entry:
{Name:mimir-lb.1.o2ta5bj935uwseekyddngin78 ServiceName:mimir-lb ServiceID:yt4kbu3b59erq6rgmzbvxrd86 VirtualIP:192.168.74.42 EndpointIP:192.168.74.111 IngressPorts:[] Aliases:[] TaskAliases:[d5a90b876bde] ServiceDisabled:false}
But similar to the other issue, the NetworkDB might not be in sync with what was actually configured on the node, so I checked the actual mapping between the VIP and the container:
# nsenter --net=/var/run/docker/netns/lb_ebbnwg5bh iptables -t mangle -L -n -v --line-numbers | grep \\.42
4 7402 444K MARK all -- * * 0.0.0.0/0 192.168.74.42 MARK set 0x1db
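The mark is printed in hex there; converting it to decimal gives the fwmark to look up in ipvsadm:

printf '%d\n' 0x1db
# -> 475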
Checking fwmark 475 in ipvsadm:
# nsenter --net=/var/run/docker/netns/lb_ebbnwg5bh ipvsadm -L -n
...
FWM 475 rr
-> 192.168.74.111:0 Masq 1 5 0
This also looks correct.
The fdb entry for that container is also correct on this node.
To capture the relevant packets, I executed

tcpdump -i any udp port 4789 -w out.pcap

on this node and looked for packets from or to the local overlay endpoint to or from the target container's IP.
There was a single outgoing packet and some retransmissions, but nothing incoming.
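Side note: since the addresses of interest are inside the VXLAN payload, a display filter on the inner IPs helps when digging through the capture. A sketch, assuming tshark is available (the IP is the target container's endpoint IP):

# decode UDP/4789 as VXLAN and match the inner endpoint IP
tshark -r out.pcap -d udp.port==4789,vxlan -Y 'ip.addr == 192.168.74.111'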
Capturing on node T confirmed it: the packets are being received, but no response is being sent. Capturing on node T inside the target container's network namespace showed that no packets are being received there.
On the target node, executing

nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep <MAC address of the overlay network load balancer on the source node>

revealed that this entry was incorrect. I removed it:
# nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb delete 02:42:c0:a8:4a:ea dev vxlan0 dst 172.17.4.3
After the incorrect entry was removed, a correct one was automatically created within a minute or two, resolving the issue.
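To watch the correct entry being re-created, something like this works (the MAC is the load balancer's from the delete command above):

watch -n 5 "nsenter --net=/var/run/docker/netns/1-ebbnwg5bhz bridge fdb show | grep 02:42:c0:a8:4a:ea"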
On different nodes, I had issues with the Docker Swarm DNS server (a quick way to check the resolution is sketched after this list):
- On several nodes, I got anywhere from a few to over a dozen IP addresses for a specific service, where only a single one should have been returned.
- On a different node, resolution of a specific service failed with SERVFAIL.
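For reference, the behavior can be checked from inside any container attached to the overlay network; 127.0.0.11 is Docker's embedded DNS server, and mimir-lb is the service from above:

# the bare service name should resolve to exactly one address: the service VIP
nslookup mimir-lb 127.0.0.11
# tasks.<service> is expected to return one address per task instead
nslookup tasks.mimir-lb 127.0.0.11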
On the nodes returning too many IPs, I checked everything I had checked in the previous issues; it all looked good. I also tried /leavenetwork and /joinnetwork, but to no avail. In fact, that resulted in the only correct IP address no longer being returned by nslookup.
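For context: /leavenetwork and /joinnetwork are endpoints of the libnetwork diagnostic server, the same one the dockereng/network-diagnostic client above talks to on port 2000. I invoked them roughly like this (assuming the diagnostic server is enabled on that port; nid is the full network ID):

curl "http://127.0.0.1:2000/leavenetwork?nid=ebbnwg5bhz9y9g40r8yum5bdt"
curl "http://127.0.0.1:2000/joinnetwork?nid=ebbnwg5bhz9y9g40r8yum5bdt"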
I ran out of things to check, so I restarted the Docker daemon. That fixed it.
On the node with the SERVFAIL, I also checked everything, but didn't try /leavenetwork and /joinnetwork.
After restarting the Docker daemon there, things were broken even more:
- the task of a global service couldn't start its container because of this error: network sandbox join failed: subnet sandbox join failed for "192.168.74.0/24": error creating vxlan interface: file exists
- docker network inspect monitoring sometimes failed with "network monitoring not found", then worked again, then failed again. It looked like the network was being deleted and recreated in a loop.

ip -d link show type vxlan

showed a DOWN vxlan interface with the VXLAN ID of the overlay network.
Deleting that vxlan interface fixed these issues.
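For completeness, the fix boiled down to (the interface name is a placeholder; take the real one from the ip output above):

# find the stale interface whose VXLAN ID matches the overlay network
ip -d link show type vxlan
# delete it so the daemon can recreate it cleanly
ip link delete <stale vxlan interface>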