I ran into this strange behavior on a project and would like to share it. The symptom was that the client received an unexpected junk response that the server had never sent.
Background
I was working on a system in which the client application keeps reconnecting to the server until a connection succeeds. At one point we observed that the client had connected successfully even though the server had not yet come up. When the server was later brought up, it failed to start with the error "Address already in use". That hinted that the client had connected to some other application using the same address as the server.
We then ran netstat and got the following result:
netstat -nA inet | fgrep :32246
tcp 0 0 127.0.0.1:32246 127.0.0.1:32246 ESTABLISHED
This shows that the client had connected to itself: the source and destination IP:port are identical. No other application was listening on port 32246. 127.0.0.1:32246 was the address the client was trying to connect to.
Cause Analysis
In our case the client kept trying to reconnect to the server, and every attempt failed because the server was still down. Each time the client reconnects, the operating system assigns it a new source port from the ephemeral port range (defined in the file /proc/sys/net/ipv4/ip_local_port_range), on this system in increasing order. So at some point it may pick the same source port as the destination port, provided the destination port also lies in the ephemeral range.
Below is the ephemeral port range on the machine where the issue was observed:
cat /proc/sys/net/ipv4/ip_local_port_range
9000 65500
The server port in my setup was 32246, which falls inside the ephemeral range. So it was possible for the source port to end up the same as the destination port.
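As a rough way to check whether a given server port is exposed to this, the range file can be parsed and compared against the port. The following is only a sketch: the function names are made up for illustration, and the procfs path is Linux-specific.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Parses text such as "9000 65500" (the format of ip_local_port_range)
// and reports whether `port` lies inside the inclusive range.
// Returns false if the text cannot be parsed.
bool port_in_ephemeral_range(const std::string& range_text, int port) {
    std::istringstream in(range_text);
    int low = 0, high = 0;
    if (!(in >> low >> high)) {
        return false;
    }
    return port >= low && port <= high;
}

// Reads the live range from procfs (Linux only) and checks `port`.
// Returns false if the file is missing or unreadable.
bool port_in_local_ephemeral_range(int port) {
    std::ifstream f("/proc/sys/net/ipv4/ip_local_port_range");
    std::string line;
    if (!std::getline(f, line)) {
        return false;
    }
    return port_in_ephemeral_range(line, port);
}
```

With the range shown above, port_in_ephemeral_range("9000 65500", 32246) is true, so 32246 is a port the client itself could be handed as a source port.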
Reason
Now you may wonder how the client can connect at all when no server is listening on that port, even if it has chosen the server's port as its own source port. The answer is a TCP feature called simultaneous open (simultaneous connection initiation), documented in RFC 793.
It allows two clients to connect to each other without either one entering the listening state. Connection establishment in this case differs from the usual 3-way handshake. Both clients perform an active OPEN: each sends a SYN and enters SYN-SENT, receives the other's SYN and moves to SYN-RECEIVED, replies with a SYN-ACK, and reaches ESTABLISHED when the peer's SYN-ACK arrives.
This is triggered only when two clients try to reach each other at the same time. In our case only one client is trying to connect to the server, but because the source and destination address and port are the same, the SYN it sends is delivered back to the same socket. It is treated as if both ends were connecting to each other, and so a simultaneous open takes place.
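The effect can also be provoked on purpose instead of waiting for the kernel to hand out the matching port. The sketch below binds a socket to a loopback port and then connects it to that same address. On Linux this is expected to succeed without any listener, but treat that as an assumption to verify on your own system. Both function names are hypothetical helpers written for this illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstring>

// Creates a TCP socket, binds it to a kernel-chosen loopback port, and then
// connects it to that very address. Returns the descriptor, or -1 on failure.
int make_self_connected_socket() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port

    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0 ||
        connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Sends msg on fd and reads from the same fd; returns true if the bytes
// that come back are exactly the bytes that were sent.
bool echoes_to_itself(int fd, const char* msg) {
    const size_t n = std::strlen(msg);
    char buf[64];
    if (n == 0 || n > sizeof(buf)) return false;
    if (send(fd, msg, n, 0) != static_cast<ssize_t>(n)) return false;

    size_t got = 0;
    while (got < n) {
        ssize_t r = recv(fd, buf + got, n - got, 0);
        if (r <= 0) return false;
        got += static_cast<size_t>(r);
    }
    return std::memcmp(buf, msg, n) == 0;
}
```

A socket connected this way reads back whatever it writes, which is how a client ends up receiving "junk" that no server ever sent.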
Experiment
Run the following shell loop:
while true
do
    telnet 127.0.0.1 32775
done
Even though no telnet server is running on port 32775, at some point a connection attempt will succeed (provided 32775 lies in the machine's ephemeral port range):
telnet: connect to address 127.0.0.1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
hello
hello
Solution
Once the connection succeeds, we can read the port number that was dynamically assigned to the client and compare it with the destination port. If they match, we close the connection and continue retrying.
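#include <arpa/inet.h>   // ntohs
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // getsockname
#include <unistd.h>      // close

struct sockaddr_in cAddr;
socklen_t cAddrLen = sizeof(cAddr);
// Get the local address and port dynamically assigned to the client socket.
int e = getsockname(fd, (sockaddr *)&cAddr, &cAddrLen);
// serverPort is an assumed variable name: the destination port that was
// passed to connect(), in host byte order.
if (e != 0 || ntohs(cAddr.sin_port) == serverPort)
{
    // Lookup failed or the socket connected to itself: drop it and retry.
    close(fd);
}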
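When the server is not on the loopback address, comparing ports alone could in principle reject a legitimate connection whose source port happens to equal the remote port. A slightly stricter variant, sketched here under the assumption of IPv4 sockets, compares the full local endpoint from getsockname() with the full remote endpoint from getpeername(). The names same_endpoint and is_self_connected are illustrative, not part of any library.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// True when both endpoints carry the same IPv4 address and port.
bool same_endpoint(const sockaddr_in& a, const sockaddr_in& b) {
    return a.sin_addr.s_addr == b.sin_addr.s_addr && a.sin_port == b.sin_port;
}

// Returns 1 if fd is connected to itself, 0 if it is connected to a
// different endpoint, and -1 if either address lookup fails.
int is_self_connected(int fd) {
    sockaddr_in local{}, peer{};
    socklen_t local_len = sizeof(local), peer_len = sizeof(peer);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&local), &local_len) != 0 ||
        getpeername(fd, reinterpret_cast<sockaddr*>(&peer), &peer_len) != 0) {
        return -1;
    }
    return same_endpoint(local, peer) ? 1 : 0;
}
```

In a reconnect loop, any non-zero result would be treated as a failed attempt: close the descriptor and retry.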
Conclusion
Although this scenario is very rare, it is entirely possible. Any application that reconnects in an endless loop, and that does not exchange authentication information between client and server before treating the connection as final, should guard against it.
In the case of PostgreSQL, the client expects an authentication request from the server once the connection is established, so the self-connect issue will not occur.
However, any third-party tool that works on top of PostgreSQL with its own connection mechanism should check whether this issue applies to it.
Please share your comments or feedback, or any other ideas for addressing this issue.