fix a bug in rdma pre-connector

Zihang Yao requested to merge yaozh/wukong:fix-rdma-bug into master

Problem

When the system starts, wukong tries to bind a socket (the RDMA pre-connector socket) to a fixed port (19344). If you exit wukong and restart it immediately, the bind fails with an error saying the address is already in use.

Phenomenon

wukong prints the following error and aborts:

ERROR on binding,Address already in use
wukong: /home/yzh/wukong-mainstream/rdma_lib/pre_connector.hpp:51: static int rdmaio::PreConnector::get_listen_socket(const string&, int): Assertion `false' failed.
[meepo3:42650] *** Process received signal ***   
[meepo3:42650] Signal: Aborted (6)
[meepo3:42650] Signal code:  (-6) 
[meepo3:42650] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0) [0x7f43a99068a0] 
[meepo3:42650] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7f43a9541f47]   
[meepo3:42650] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7f43a95438b1]  
[meepo3:42650] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3042a) [0x7f43a953342a]    
[meepo3:42650] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x304a2) [0x7f43a95334a2] 
[meepo3:42650] [ 5] ../build/wukong(_ZN6rdmaio8RdmaCtrl11recv_threadEPv+0x590) [0x5609452e79c0]
[meepo3:42650] [ 6] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f43a98fb6db]
[meepo3:42650] [ 7] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f43a9624a3f]
[meepo3:42650] *** End of error message ***

Cause

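After the process exits, TCP connections on that port can linger in the TIME_WAIT state. Until they expire, bind() on the same port fails with "Address already in use" unless SO_REUSEADDR is set on the new socket.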

I've run into that same issue as well. It's because you're closing your connection to the socket, but not the socket itself. The socket can enter a TIME_WAIT state (to ensure all data has been transmitted, TCP guarantees delivery if possible) and take up to 4 minutes to release.

(quoted from Stack Overflow)

Solution

Set SO_REUSEADDR on the listening socket after creating it and before binding it to the address:

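// Create the listening socket.
sockfd = socket(AF_INET, SOCK_STREAM, 0);
// Let bind() reuse a local address that still has connections in TIME_WAIT.
int option = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &option, sizeof(option));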
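
For context, here is a minimal, self-contained sketch of where the option sits in the full create/bind/listen sequence. The helper name `make_listen_socket`, the backlog of 16, and the error handling (returning -1 instead of asserting) are assumptions made for illustration; this is not the actual body of `PreConnector::get_listen_socket`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper (illustration only): creates a TCP listening socket on
// `port` with SO_REUSEADDR enabled. Returns the socket fd, or -1 on failure.
int make_listen_socket(int port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    // Must be set before bind(); setting it afterwards has no effect on
    // the bind that already failed or succeeded.
    int option = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &option, sizeof(option)) < 0) {
        perror("setsockopt(SO_REUSEADDR)");
        close(fd);
        return -1;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(static_cast<uint16_t>(port));

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }

    if (listen(fd, 16) < 0) {  // backlog of 16 is an arbitrary choice
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling `make_listen_socket(19344)` would mirror the port used in this merge request. Note that SO_REUSEADDR only helps with lingering TIME_WAIT connections; if another live process is still listening on the port, bind() still fails.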