What is asymmetric routing?
Asymmetric routing is a situation in which packets take one path to go from source to destination, but replies take a different path to return. Notice I called it a “situation” and not an “issue”? That’s because it’s not always a problem. It only becomes a problem where there’s something stateful in the path, like a NAT device or a firewall. Stateful devices expect the path that traffic takes between two devices to be the same in both directions (the “symmetry” part).
Quick review: what’s a “stateful” firewall?
Stateful firewalls, as the name suggests, do more than just filter individual packets according to rules. Put simply, they keep track of the state of each connection passing through them, so they can tell whether a packet belongs to a conversation they have already seen and take action when something looks out of the ordinary.
In this case, that means that when a firewall sees the first packets in a new session, it creates an entry in an internal table specifically for that session. Assuming that all the packets pass the appropriate policy and inspection rules on the firewall, the firewall knows that it’s OK to forward these packets. When the session is done, the firewall removes the entry from this table.
This ensures that the destination can’t send arbitrary packets back to the source (a vector for malicious attacks). The only packets that are allowed through are ones that match the entry in the session table. It’s basically guaranteeing that you can connect to a website on the Internet, but that website can’t connect back to you unexpectedly.
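To make the session table idea concrete, here’s a minimal Python sketch. It is not how any particular firewall is implemented. The names Flow and SessionTable are made up for illustration, and the server port 443 in the usage note is an assumption (the usual HTTPS port).

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """One direction of a conversation (hypothetical type for illustration)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reversed(self) -> "Flow":
        # The reply direction simply swaps source and destination.
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


class SessionTable:
    """Simplified stand-in for a stateful firewall's session table."""

    def __init__(self):
        self._sessions = set()

    def outbound(self, flow: Flow) -> bool:
        # First packet from the inside: record the session, then forward.
        self._sessions.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        # Forward a packet from the outside only if it answers a known session.
        return flow.reversed() in self._sessions

    def close(self, flow: Flow) -> None:
        # Session finished: remove the entry.
        self._sessions.discard(flow)
```

With this sketch, a reply from 10.10.10.55 port 443 to 192.168.100.10 port 59574 is forwarded only after the matching outbound flow has been recorded. Anything else arriving from the outside is rejected.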
What does asymmetric routing look like?
Consider the following network diagram.
The workstation at the bottom sends a packet to the server along the green path, and it goes through the firewall on the left. But, because of a routing issue in the network, the server’s response follows the red path back, taking it through the firewall on the right. And that, friends, is asymmetric routing.
This is actually a pretty common scenario. Suppose you have a redundant pair of VPN firewalls connecting your office network to a cloud provider such as AWS or Azure. You might not even think of these devices as firewalls, because their primary function is terminating the VPNs.
So unless you’re very careful with your routing, you could easily develop an asymmetric routing issue. And in the case of a VPN to your cloud provider, all the IP addresses will be private, just like the example.
Also, if you’ve configured any kind of multipath load balancing between these networks, there are four possible scenarios:
- The up and down paths both go through the firewall on the left
- They both go through the firewall on the right
- The up could go through the left and down through the right
- The up could go through the right and the down through the left
So, half of your sessions will have up and down paths using different firewalls.
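If you want to check that “half” figure, here’s a quick Python sketch that enumerates the four combinations. It assumes the up and down paths are picked independently and that each firewall is equally likely, which a real load balancer may not do. The function names are made up for illustration.

```python
from itertools import product

FIREWALLS = ("left", "right")  # labels taken from the diagram description


def path_combinations():
    """Return every (up, down) firewall pairing."""
    return list(product(FIREWALLS, repeat=2))


def asymmetric_fraction():
    """Fraction of pairings whose up and down paths use different firewalls."""
    combos = path_combinations()
    asymmetric = [c for c in combos if c[0] != c[1]]
    return len(asymmetric) / len(combos)
```

Two of the four pairings use different firewalls in each direction, so asymmetric_fraction() returns 0.5 under these assumptions.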
This is important because what you’ll see in practice is even more confusing than the asymmetric routing issue example that we’ll be digging into. That’s because half of the sessions will work, and the other half will fail.
How do we fix asymmetric routing issues?
Intermittent and inconsistent asymmetric routing problems are always hard to find. To get around that, we’re going to construct an appropriate filter in Wireshark that finds just the non-working session.
To start, let’s focus on just one session using a filter like this:
ip.addr==10.10.10.55 && tcp.port==59574
This filter selects all packets with the specified IP address (either source or destination) and the specified TCP port (again, either in the source or the destination).
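The “either source or destination” behavior is the part that trips people up, so here’s the same matching logic as a small Python sketch. This is not Wireshark code. The Packet type and the matches function are hypothetical, and the server port 443 and the second client port in the usage note are assumed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Just the fields the filter looks at (hypothetical type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def matches(pkt: Packet, ip: str = "10.10.10.55", port: int = 59574) -> bool:
    # ip.addr matches if EITHER address equals ip,
    # tcp.port matches if EITHER port equals port.
    ip_ok = ip in (pkt.src_ip, pkt.dst_ip)
    port_ok = port in (pkt.src_port, pkt.dst_port)
    return ip_ok and port_ok
```

A packet from 192.168.100.10:59574 to 10.10.10.55:443 matches, and so does the reply in the other direction. A packet between the same two hosts on client port 59580 does not.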
Test #1: PING
Now let’s do a few experiments and look at the results in Wireshark. First, we’ll PING the server from the workstation.
This looks completely normal. The source device, 192.168.100.10, is sending PING packets to 10.10.10.55, and the destination device is responding. From this capture, it looks like we have good routing and good connectivity between the source and destination networks. Moving on.
Test #2: HTTPS
Next, let’s try to establish an HTTPS session.
That’s not working. Looks like we’ve found a problem.
Let’s get forensic
The first question: why does PING work?
This is where things can get a little confusing because not every protocol is stateful. In particular, ICMP, the protocol that carries PING packets, is not stateful. It’s common to simply allow all PING requests and responses through the firewall, particularly for internal traffic.
Looking back at our network diagram, the PING request from 192.168.100.10 to 10.10.10.55 goes to the firewall on the left. The firewall creates a session table entry for this session and waits for the reply traffic. However, the reply packet comes back through the firewall on the right. The one on the right is configured to simply allow all ICMP packets, which is common for internal firewalls. The second firewall also creates a session table entry and forwards the packet back to the original source.
There are other stateless protocols that will behave similarly. For example, DNS and NTP will work perfectly well in this network despite the asymmetric forwarding, because both protocols typically use UDP as their transport.
Now, look at the packet capture for the HTTPS session. The source device (the client) sends a TCP SYN packet (packet number 3). The firewall on the left creates a session table entry, forwards the packet, and waits for other packets that are part of the same session to come along. The SYN packet reaches the webserver at 10.10.10.55, and it responds with a TCP SYN ACK packet (packet number 5).
But remember, this packet is getting forwarded along the other path. In this case, the firewall on the right shouldn’t have created a session table entry and forwarded the packet because this packet has its ACK flag set, so it’s not actually the first packet of the session. But it’s not unusual for firewalls to just blindly forward anything with a SYN flag.
To summarize what we’ve seen so far:
- The SYN packet sent by the client reached the server
- The SYN ACK sent by the server in response reached the client
- The client sent the third part of the standard TCP 3-way handshake, an ACK. We see this ACK packet in the trace as packet number 6
The client device then tries to start the HTTPS TLS session on top of this TCP session, and it fails. We see the “client hello” in packet number 7. Then we see a lot of “PSH ACK” messages, which indicate that the client device is desperately trying to get this session started, but not seeing the responses that it expects.
At this point, many people looking at this trace will guess that there’s something wrong with the server’s HTTPS configuration. Maybe there’s a problem with the certificate, or maybe the webserver process isn’t communicating properly with the network stack. These are good guesses because the trace shows everything’s apparently working up until the point of establishing the TLS session.
But if we look at the trace a little more closely, we see packet number 20. The server is retransmitting the SYN ACK. Why would it retransmit this packet? It’s retransmitting because it never saw the third packet of the 3-way handshake (packet number 6).
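Spotting a repeated SYN ACK by eye is easy to miss, so here’s a rough Python sketch of the check. It works on a simplified list of segments, not a real capture file. The Segment type, the server port 443, and the sequence numbers in EXAMPLE_TRACE are invented for illustration. Only the packet numbers, addresses, and client port come from the trace described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Minimal view of a TCP segment in a trace (hypothetical type)."""
    number: int
    src: str
    sport: int
    dst: str
    dport: int
    flags: frozenset
    seq: int


SYN_ACK = frozenset({"SYN", "ACK"})


def synack_retransmissions(trace):
    """Return (first, repeat) packet-number pairs for repeated SYN ACKs."""
    first_seen = {}
    repeats = []
    for seg in trace:
        if seg.flags != SYN_ACK:
            continue
        # Same endpoints and same sequence number means the same SYN ACK again.
        key = (seg.src, seg.sport, seg.dst, seg.dport, seg.seq)
        if key in first_seen:
            repeats.append((first_seen[key], seg.number))
        else:
            first_seen[key] = seg.number
    return repeats


CLIENT, SERVER = "192.168.100.10", "10.10.10.55"
# Invented sequence numbers; packet numbers follow the trace in the text.
EXAMPLE_TRACE = [
    Segment(3, CLIENT, 59574, SERVER, 443, frozenset({"SYN"}), 100),
    Segment(5, SERVER, 443, CLIENT, 59574, SYN_ACK, 1000),
    Segment(6, CLIENT, 59574, SERVER, 443, frozenset({"ACK"}), 101),
    Segment(7, CLIENT, 59574, SERVER, 443, frozenset({"PSH", "ACK"}), 101),
    Segment(20, SERVER, 443, CLIENT, 59574, SYN_ACK, 1000),
]
```

Calling synack_retransmissions(EXAMPLE_TRACE) returns [(5, 20)], which is exactly the clue from the trace: packet 20 repeats the SYN ACK first seen as packet 5.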
Why didn’t it see packet number 6? Remember the firewall on the left is forwarding packets from the client to the server. This firewall is keeping track of the session. It wants to see a SYN from client to server, followed by a SYN ACK from server to client, and then an ACK from client to server. But the SYN ACK packet went along the other path. So, from the firewall’s point of view, the ACK sent from client to server was wrong. That’s not part of the session initiation. So the firewall dropped this packet.
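Here’s that logic from the left firewall’s point of view as a tiny Python state machine. It’s a deliberately simplified sketch, not any vendor’s implementation. The HandshakeTracker class, the state names, and the “c2s”/“s2c” direction labels are made up for illustration, and real firewalls handle many more cases (retransmitted SYNs, resets, timeouts).

```python
class HandshakeTracker:
    """Tracks one TCP handshake as a stateful firewall might (simplified)."""

    def __init__(self):
        self.state = "NEW"

    def observe(self, direction, flags):
        """Return True to forward the packet, False to drop it.

        direction is "c2s" (client to server) or "s2c" (server to client).
        """
        flags = frozenset(flags)
        if self.state == "NEW" and direction == "c2s" and flags == {"SYN"}:
            self.state = "SYN_SEEN"
            return True
        if self.state == "SYN_SEEN" and direction == "s2c" and flags == {"SYN", "ACK"}:
            self.state = "SYNACK_SEEN"
            return True
        if self.state == "SYNACK_SEEN" and direction == "c2s" and flags == {"ACK"}:
            self.state = "ESTABLISHED"
            return True
        if self.state == "ESTABLISHED":
            return True
        # Anything out of order is not part of a valid session initiation.
        return False
```

Feed it the SYN and then the ACK with no SYN ACK in between, which is what the left firewall saw, and the ACK is dropped. Feed it all three handshake packets in order and everything is forwarded.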
Finally, I want to show you something else from this same packet capture. Notice that, in the previous image, I included a filter:
ip.addr==10.10.10.55 && tcp.port==59574
I did this so we could just see one session. But I actually looked at 2 sessions at the same time.
Now you have to be careful because it’s easy to confuse the two sessions. And in a real network, there are probably dozens of sessions going on at the same time. In the noise of all these packets, it’s even easier to miss that tell-tale packet number 20.
Conclusions
There are a few good lessons in this. The first is that just because PING works doesn’t mean your network is routing traffic correctly.
The second lesson is that when you’re looking at a packet trace, don’t stop at the point where things appear to have broken. In this case, packet number 20, far past the point of breakdown, was the critical clue that packet number 6 was not delivered. You can’t assume that the packet was received just because the packet’s in your trace.
And the third lesson is to be careful about your filters. If you’re looking at a particular TCP session, make sure to use a filter that shows you just the session that you’re interested in.
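One way to narrow a filter down to a single session is to pin both addresses and both ports. Here’s a small Python helper that builds such a display filter string using the same ip.addr and tcp.port fields as the filter above. The function name session_filter is made up, and the server port 443 in the usage note is an assumption (the usual HTTPS port).

```python
def session_filter(client_ip: str, client_port: int,
                   server_ip: str, server_port: int) -> str:
    """Build a display filter that pins both endpoints of one TCP session."""
    parts = [
        f"ip.addr=={client_ip}",
        f"ip.addr=={server_ip}",
        f"tcp.port=={client_port}",
        f"tcp.port=={server_port}",
    ]
    return " && ".join(parts)
```

For example, session_filter("192.168.100.10", 59574, "10.10.10.55", 443) produces ip.addr==192.168.100.10 && ip.addr==10.10.10.55 && tcp.port==59574 && tcp.port==443. Wireshark also numbers each TCP conversation in its tcp.stream field, which can serve the same purpose once you know which stream you want.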
Being able to visualize your network connections in real time is a great way to better understand how assets are connected and to spot possible routing issues that need troubleshooting, as we’ve outlined here.
Hi,
The question is now, how to fix the asymmetric routing if detected?
On my firewall I can see many packets get dropped per second for reasons like:
First packet isn’t SYN with flags like: PUSH-ACK
I mean how to find out the way that packets take in and out when using only one firewall
Thank you for this.. well done and highly informative blog post
lots of devs are oblivious to networking details like that (“the guts”) but with devops, you get immersed more and more
It has to be easier than this.
I realize that you posted this a year ago, but what are the solutions to asymmetric routing issues? We have a SonicWALL firewall, and it has options on its interfaces to handle asymmetric routing, but when I enabled them, it didn’t help.
However, when I disabled the firewall’s “Enable TCP sequence number randomization” option, the traffic all works as expected.
Why would that fix an asymmetric route issue?
So, where’s the fix?