Saturday, May 14, 2016

Client connected to itself... Isn't it strange? (Effect of TCP Simultaneous Connect)

Yes, I observed this strange behavior at work and would like to share it. The symptom was that the client was receiving an unexpected junk response that the server had never sent.

Background

I was working on a system in which the client application keeps reconnecting to the server until a connection succeeds. At one point we observed that the client had established a connection even though the server had not yet come up. Later, when the server was brought up, it failed to start with "Address already in use". That gave us a hint that the client must have connected to some other application running on the same address as the server.

We then checked the output of netstat at that point, and this was the result:

    netstat -nA inet | fgrep :32246
    tcp        0      0 127.0.0.1:32246         127.0.0.1:32246         ESTABLISHED

This shows that the client had connected to itself (source and destination IP:port are the same). No other application was listening on port 32246. 127.0.0.1:32246 was the address the client was trying to connect to.

Cause Analysis

In our case the client kept trying to reconnect to the server, and every attempt failed because the server was still down. For each new attempt the kernel assigns the client a new source port from the range of ephemeral ports (defined in the file /proc/sys/net/ipv4/ip_local_port_range), in our case in increasing order. So at some point the client may be given the same source port as the destination port, provided the destination port also lies in the ephemeral range.

Below is the ephemeral port range on the machine where the issue was observed:

    cat /proc/sys/net/ipv4/ip_local_port_range
    9000    65500

The server port used in my setup was 32246, which also lies in the ephemeral port range. So it was possible that at some point the source port would be the same as the destination port.
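The range check above can be sketched in a few lines of C. This is only an illustration: the function names `port_in_range_text` and `port_in_ephemeral_range` are made up for this post, and the second one assumes a Linux system where the /proc file exists.

```c
#include <stdio.h>

/* Parse a range in the format of /proc/sys/net/ipv4/ip_local_port_range
 * ("low<whitespace>high") and report whether `port` lies inside it.
 * Returns 1 if inside, 0 if outside, -1 if the text cannot be parsed.
 * (Hypothetical helper, named for this example only.) */
int port_in_range_text(const char *range_text, int port)
{
    int low, high;
    if (sscanf(range_text, "%d %d", &low, &high) != 2 || low > high)
        return -1;
    return (port >= low && port <= high) ? 1 : 0;
}

/* Linux-only wrapper: read the live range from /proc and check `port`. */
int port_in_ephemeral_range(int port)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? port_in_range_text(line, port) : -1;
}
```

With the range shown above, `port_in_range_text("9000 65500", 32246)` reports that the server port is inside the range, which is the precondition for the self-connect.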

Reason

Now you may wonder how the client can connect at all, even if it has chosen the same port as the server, when no server is listening on that port. This is because of a TCP feature called simultaneous connect, documented in RFC 793. It allows two clients to connect to each other without either of them entering the listening state. Connection establishment in this case differs from the usual 3-way handshake: here both clients perform an active OPEN, as shown in the table below:
Table-1: Simultaneous Connect
This gets triggered only if two clients are trying to reach each other. In our case a single client is trying to connect to the server, but because the source and destination ports are the same, the client's own SYN is delivered back to it. TCP treats this as two endpoints trying to connect to each other, and hence simultaneous connect gets triggered.
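The state transitions involved can be sketched as a tiny state machine. This is a simplified model written for this post, not kernel code: it keeps only the four RFC 793 states that matter here, ignores sequence numbers, and assumes every ACK acknowledges the receiver's SYN. All names are made up for the example.

```c
/* Subset of the RFC 793 connection states needed for this sketch. */
enum tcp_state { CLOSED, SYN_SENT, SYN_RECEIVED, ESTABLISHED };

/* Segment flags, reduced to the two bits that matter here. */
enum { FLAG_SYN = 1, FLAG_ACK = 2 };

/* Active OPEN: the endpoint sends a SYN and waits. */
enum tcp_state active_open(void) { return SYN_SENT; }

/* Advance one endpoint on receipt of a segment carrying `flags`. */
enum tcp_state on_segment(enum tcp_state s, int flags)
{
    switch (s) {
    case SYN_SENT:
        if ((flags & FLAG_SYN) && (flags & FLAG_ACK))
            return ESTABLISHED;   /* usual handshake: SYN,ACK from a listener */
        if (flags & FLAG_SYN)
            return SYN_RECEIVED;  /* bare SYN: peer is also opening, reply SYN,ACK */
        return s;
    case SYN_RECEIVED:
        if (flags & FLAG_ACK)
            return ESTABLISHED;   /* our SYN has been acknowledged */
        return s;
    default:
        return s;
    }
}

/* Walk the simultaneous-connect exchange for two endpoints A and B.
 * Returns 1 if both end up ESTABLISHED. */
int simultaneous_connect_succeeds(void)
{
    enum tcp_state a = active_open();        /* A sends SYN */
    enum tcp_state b = active_open();        /* B sends SYN */
    a = on_segment(a, FLAG_SYN);             /* A gets B's SYN, replies SYN,ACK */
    b = on_segment(b, FLAG_SYN);             /* B gets A's SYN, replies SYN,ACK */
    a = on_segment(a, FLAG_SYN | FLAG_ACK);  /* A gets B's SYN,ACK */
    b = on_segment(b, FLAG_SYN | FLAG_ACK);  /* B gets A's SYN,ACK */
    return a == ESTABLISHED && b == ESTABLISHED;
}
```

In the self-connect case A and B are the same socket, so the single client plays both roles and walks through SYN_SENT, SYN_RECEIVED and ESTABLISHED on its own.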

Experiment

Execute the following loop in a shell:

    while true
    do
       telnet 127.0.0.1 32775
    done

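Even though no telnet server is running on port 32775, at some point the connection will succeed:

    telnet: connect to address 127.0.0.1: Connection refused
    Trying 127.0.0.1...
    telnet: connect to address 127.0.0.1: Connection refused
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
    hello
    hello

The second "hello" is the first one coming straight back: whatever the client sends on a self-connected socket is what it receives. This explains the junk response described at the beginning.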
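The telnet loop depends on the kernel eventually picking the destination port by chance. A deterministic variant, sketched below, binds the socket to a port first and then connects to that same address. This is an assumption-laden sketch: it relies on Linux loopback behavior (other TCP stacks may refuse such a connect), and the helper names `self_connect` and `echo_through_self` are made up for this post.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket that is connected to itself on 127.0.0.1.
 * Bind to a kernel-chosen port, then connect to that same address.
 * Returns the connected fd, or -1 on error. */
int self_connect(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel choose a free port */

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) != 0 ||  /* learn the port */
        connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send `msg` on the self-connected socket and read it back into `out`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t echo_through_self(int fd, const char *msg, char *out, size_t out_size)
{
    if (send(fd, msg, strlen(msg), 0) < 0)
        return -1;
    return recv(fd, out, out_size, 0);
}
```

While such a socket is open, netstat should show a single ESTABLISHED entry whose local and foreign addresses are identical, just like the one in the Background section.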

Solution

Once the connection succeeds, we can get the port number that was dynamically assigned to the client and compare it with the destination port. If they match, we disconnect the current connection and continue to retry.


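    struct sockaddr_in cAddr;
    socklen_t cAddrLen = sizeof(cAddr);

    // Get the local address and port dynamically assigned to the client socket.
    int e = getsockname(fd, (struct sockaddr *)&cAddr, &cAddrLen);

    // sAddr (assumed here) is the sockaddr_in that was passed to connect().
    // Both sides are in network byte order, so they can be compared directly.
    if (e != 0 ||
        (cAddr.sin_port == sAddr.sin_port &&
         cAddr.sin_addr.s_addr == sAddr.sin_addr.s_addr))
    {
        // Lookup failed or self-connect detected: disconnect, then retry.
        close(fd);
    }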
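The same check can also be written without keeping the destination address around, by comparing what getsockname() and getpeername() report for the connected socket. The following is a minimal sketch for IPv4 sockets only; the function name `is_self_connected` is made up for this post.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Return 1 if the connected IPv4 TCP socket `fd` has identical local and
 * peer addresses (a self-connect), 0 if they differ, -1 on error. */
int is_self_connected(int fd)
{
    struct sockaddr_in local, peer;
    socklen_t local_len = sizeof local;
    socklen_t peer_len = sizeof peer;

    if (getsockname(fd, (struct sockaddr *)&local, &local_len) != 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &peer_len) != 0)
        return -1;

    /* Both values are in network byte order, so compare them directly. */
    return local.sin_port == peer.sin_port &&
           local.sin_addr.s_addr == peer.sin_addr.s_addr;
}
```

In a reconnect loop this would be called right after connect() succeeds. If it returns 1, close the socket and retry as if the connect had failed.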

Conclusion

This is a very rare scenario, but it is entirely possible. Any application that has infinite reconnect logic, and does not exchange authentication information between client and server before treating the connection as final, should take care of this.
In the case of PostgreSQL, the client expects an authentication request from the server once the connection is established, so the self-connect issue will not happen.
But if a third-party tool works on top of PostgreSQL and has its own connection mechanism, its authors should check whether this issue applies to them.

Please share your comments and feedback, or any other ideas you have for addressing this issue.