Hung sockets, Huh?

  January 04, 2021
  gdb hack


So… you have a process hung on a socket operation that you never wanted it to perform?

As always, here is the story.

I was trying to debug a hanging Python command. pstack showed it was stuck connecting somewhere.

$ pstack 24372
#0  0x00007f3ea2964c10 in __poll_nocancel () from /lib64/libc.so.6
#1  0x00007f3e98717b3c in internal_select_ex.isra.0 () from /usr/lib64/python2.7/lib-dynload/_socketmodule.so
#2  0x00007f3e987183c4 in internal_connect () from /usr/lib64/python2.7/lib-dynload/_socketmodule.so
#3  0x00007f3e9871aff8 in sock_connect () from /usr/lib64/python2.7/lib-dynload/_socketmodule.so
#4  0x00007f3ea364981a in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#5  0x00007f3ea364b64d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#6  0x00007f3ea35d4f88 in function_call () from /lib64/libpython2.7.so.1.0
#7  0x00007f3ea35b0073 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#8  0x00007f3e98b3f8e1 in partial_call () from /usr/lib64/python2.7/lib-dynload/_functoolsmodule.so
#9  0x00007f3ea35b0073 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#10 0x00007f3ea3644846 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007f3ea364b64d in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0

Where was it connecting to?

$ lsof -p 24372
...
python  24372 root    4w   REG              253,0    343936 201366877 /var/log/rhsm/rhsm.log
python  24372 root    5u  IPv4          222394982       0t0       TCP xyz.abc.mno:46640->subscription.rhsm.redhat.com:https (SYN_SENT)
...

So it was the Red Hat subscription site. Since the box had no internet access, the process was hanging on TCP connection establishment: the SYN had been sent but never answered (SYN_SENT state).
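For code you control, a connect timeout keeps an unanswered SYN from blocking for long. The following is a minimal Python 3 sketch, not what the hung Python 2.7 process was running; the helper name `try_connect` and the 5-second default are assumptions made for illustration.

```python
import socket


def try_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection is established within the timeout.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection applies the timeout to connect() itself, so an
        # unanswered SYN raises socket.timeout instead of blocking.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and DNS failures
        # (socket.gaierror) are all subclasses of OSError.
        return False
```

On a box without internet access, a call such as `try_connect("subscription.rhsm.redhat.com", 443)` should come back after roughly the timeout (assuming name resolution itself does not stall) rather than sitting in SYN_SENT.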

I never wanted this Python process to talk to Red Hat, but it did anyway. The culprit was the yum configuration, which had a subscription plugin enabled.

So I disabled the plugin (its settings are in /etc/yum/pluginconf.d/subscription-manager.conf), but I didn’t want to restart the process, since that would have wasted more time. Was there a way to interrupt the stuck thread/task instead?

kill -13 (SIGPIPE) and kill -2 (SIGINT) didn’t help, so I went for a gdb approach. Since I already knew the fd of the socket (5, per the lsof output above), it was only a matter of calling the close system call on it.
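If lsof is not at hand, the socket descriptors of a process can also be read from /proc on Linux. This is a rough sketch under that assumption; `socket_fds` is a hypothetical helper name. It reports only the socket inode, not the remote address that lsof prints, and reading another user's process needs root.

```python
import os


def socket_fds(pid):
    """Return {fd: link_target} for the socket descriptors of `pid`.

    Hypothetical helper, Linux only: relies on /proc/<pid>/fd.
    """
    fd_dir = "/proc/{}/fd".format(pid)
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # Sockets show up as links of the form "socket:[<inode>]".
        if target.startswith("socket:["):
            result[int(name)] = target
    return result
```

For the process in this story, `socket_fds(24372)` would be expected to include fd 5.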

$ gdb -p 24372
...
Missing separate debuginfos, use: debuginfo-install python-2.7.5-86.el7.x86_64
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f57a239b740 (LWP 24372) "python" 0x00007f57a11c9c10 in __poll_nocancel () from /lib64/libc.so.6
(gdb) t 1
[Switching to thread 1 (Thread 0x7f57a239b740 (LWP 24372))]
#0  0x00007f57a11c9c10 in __poll_nocancel () from /lib64/libc.so.6
(gdb) call close(5)
$1 = 0
(gdb) quit
A debugging session is active.
...

That was it! The socket was closed and the process continued (it moved on to trying other alternatives).
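The same gdb steps can probably be run without an interactive session by using gdb's batch mode. This is a sketch only: the helper names are made up for illustration, and the `(int)` cast is an assumption added because newer gdb versions refuse to call a function without debug info unless its return type is given.

```python
import subprocess


def build_gdb_close_command(pid, fd):
    """Build a gdb command line that calls close(fd) inside process `pid`.

    Hypothetical helper; it mirrors the interactive session above.
    """
    return [
        "gdb", "-p", str(pid),
        "-batch",  # run the -ex commands, then exit without prompting
        "-ex", "call (int)close({})".format(int(fd)),
    ]


def close_fd_in_process(pid, fd):
    """Run the gdb command (needs gdb installed and ptrace permission)."""
    return subprocess.run(
        build_gdb_close_command(pid, fd),
        capture_output=True,
        text=True,
        check=False,
    )
```

For the case above that would be `close_fd_in_process(24372, 5)`. Closing a descriptor behind a program's back is a last resort; it only worked out here because the Python code treated the failed connect as an error and moved on.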