qemu/io
Bin Meng 23f77f05f2 io/channel-watch: Fix socket watch on Windows
Random failures were observed when running qtests on Windows, due to
a "Broken pipe" detected by qmp_fd_receive(). What happens is that
the qtest executable sends testing data over a socket to the QEMU
under test but no response is received. The errno of the recv()
call from the qtest executable indicates ETIMEDOUT, because the qmp
chardev's tcp_chr_read() is never called to receive the testing data,
hence no response is sent to the other side.

tcp_chr_read() is registered as the callback of the socket watch
GSource. glib does not call the callback because the source check
fails to indicate that the source is ready. Two socket watch sources
are created in update_ioc_handlers() to monitor the same socket
event object from the char-socket backend. During the source check
phase, qio_channel_socket_source_check() calls WSAEnumNetworkEvents()
to discover occurrences of network events for the indicated socket,
clear the internal network event records, and reset the event object.
Testing shows that if we don't reset the event object, by not passing
the event handle to WSAEnumNetworkEvents(), the symptom goes away and
the qtests run reliably.

It seems we don't need to call WSAEnumNetworkEvents() at all, as we
don't parse the WSANETWORKEVENTS result returned by this API.
Instead, we use select() to poll the socket status. Fix this
instability by dropping the WSAEnumNetworkEvents() call.
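
For illustration only, polling a socket's status with select() and a
zero timeout can be sketched as below. This is a POSIX sketch that uses
socketpair() so it can run outside Windows; it is not the code from
channel-watch.c, and the names fd_is_readable() and poll_demo() are
hypothetical helpers introduced here.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): report whether fd can be
 * read right now. Returns 1 if readable, 0 if not, -1 on select() error.
 */
static int fd_is_readable(int fd)
{
    fd_set rfds;
    struct timeval tv0 = { 0, 0 };  /* zero timeout: poll, never block */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (select(fd + 1, &rfds, NULL, NULL, &tv0) < 0) {
        return -1;
    }
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/*
 * Hypothetical demo: poll one end of a socket pair before and after the
 * peer writes a byte. Stores both poll results; returns 0 on success.
 */
static int poll_demo(int *before, int *after)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    *before = fd_is_readable(sv[0]);    /* nothing has been sent yet */

    if (write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    *after = fd_is_readable(sv[0]);     /* one byte is now pending */

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Because the readiness answer comes from select() on the socket itself,
a check written this way does not depend on the network event records
that WSAEnumNetworkEvents() would report and clear.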

Some side notes:

During testing, I removed the following code in update_ioc_handlers():

  remove_hup_source(s);
  s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
  g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
                        chr, NULL);
  g_source_attach(s->hup_source, chr->gcontext);

and that change also makes the symptom go away.

If I instead move the above code to the beginning, before the call to
io_add_watch_poll(), the symptom also goes away.

It seems that having two sources watch the same socket event object
is the key factor behind the instability. The order in which the
source watches are added also seems to play a role, but I can't
explain why. Hopefully a Windows and glib expert can explain this
behavior.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2022-10-26 13:32:08 +01:00
..
channel-buffer.c QIOChannel: Add flags on io_writev and introduce io_flush callback 2022-05-16 13:56:24 +01:00
channel-command.c io/command: implement support for win32 2022-10-12 19:22:01 +04:00
channel-file.c QIOChannel: Add flags on io_writev and introduce io_flush callback 2022-05-16 13:56:24 +01:00
channel-null.c io: add a QIOChannelNull equivalent to /dev/null 2022-06-22 18:11:21 +01:00
channel-socket.c QIOChannelSocket: Add support for MSG_ZEROCOPY + IPV6 2022-08-05 16:18:15 +01:00
channel-tls.c QIOChannel: Add flags on io_writev and introduce io_flush callback 2022-05-16 13:56:24 +01:00
channel-util.c
channel-watch.c io/channel-watch: Fix socket watch on Windows 2022-10-26 13:32:08 +01:00
channel-websock.c io/channel-websock: Replace strlen(const_str) by sizeof(const_str) - 1 2022-09-22 16:38:28 +01:00
channel.c QIOChannel: Add flags on io_writev and introduce io_flush callback 2022-05-16 13:56:24 +01:00
dns-resolver.c build-sys: add HAVE_IPPROTO_MPTCP 2021-09-30 15:30:25 +02:00
meson.build io: add a QIOChannelNull equivalent to /dev/null 2022-06-22 18:11:21 +01:00
net-listener.c io/net-listener: Call the notifier during finalize 2021-06-08 19:36:17 +01:00
task.c
trace-events io: add a QIOChannelNull equivalent to /dev/null 2022-06-22 18:11:21 +01:00
trace.h