We have observed that GDB would randomly trip the following
assertion failure when debugging on Windows. When allowing
the program to run until the inferior exits, we occasionally see:
(gdb) cont
Continuing.
[Thread 48192.0xd100 exited with code 1]
[Thread 48192.0x10ad8 exited with code 1]
[Thread 48192.0x36e28 exited with code 0]
[Thread 48192.0x52be4 exited with code 0]
[Thread 48192.0x5aa40 exited with code 0]
../../src/gdb/thread.c:453: internal-error: void delete_thread_1(thread_inf
o*, bool): Assertion `thr != nullptr' failed.
Running the same scenario with some additional traces enabled...
(gdb) set verbose
(gdb) set debugevents
... allows us to understand what the issue is. To understand, we need
to first look at the events received when starting the program, and
in particular which threads got created how. First, we get a
CREATE_PROCESS_DEBUG_EVENT for tid=0x442a8:
gdb: kernel event for pid=317536 tid=0x442a8 code=CREATE_PROCESS_DEBUG_EVENT)
Shortly after, we get some CREATE_THREAD_DEBUG_EVENT events,
one of them being for tid=0x4010c:
gdb: kernel event for pid=317536 tid=0x4010c code=CREATE_THREAD_DEBUG_EVENT)
Fast forward a bit of debugging, and we do a "cont" as above,
at which point the programs reaches the end, and the system reports
"exit" events. The first interesting one is the following:
gdb: kernel event for pid=317536 tid=0x442a8 code=EXIT_THREAD_DEBUG_EVENT)
This is reporting a thread-exit event for a thread whose tid
is the TID of what we call the "main thread". That's the thread
that was created when we received the CREATE_PROCESS_DEBUG_EVENT
notification, and whose TID is actually stored in a global variable
named main_thread_id. This is not something we expected, as
the assumption we made was that the main thread would exit last,
and we would be notified of it via an EXIT_PROCESS_DEBUG_EVENT.
But apparently, this is not always true, at least on Windows Server
2012 and 2016 where this issue has been observed happening randomly.
The consequence of the above notification is that we call
windows_delete_thread for that thread, which removes it from
our list of known threads.
And a little bit later, then we then get the EXIT_PROCESS_DEBUG_EVENT,
and we can see that the associated tid is not the main_thread_id,
but rather the tid of one of the threads that was created during
the lifetime of the program, in this case tid=0x4010c:
gdb: kernel event for pid=317536 tid=0x4010c code=EXIT_PROCESS_DEBUG_EVENT)
And the debug trace printed right after shows why we're crashing:
[Deleting Thread 317536.0x442a8]
We are trying to delete the thread whose tid=0x442a8, which is
the main_thread_id! As we have already deleted that thread before,
the search for it returns a nullptr, which then trips the assertion
check in delete_thread_1.
This commit fixes this issue. It ignores the open question of
what to do with the main_thread_id global, particularly after
that thread has been removed from our list of threads. This will
be dealt with as a separate patch, to allow cherry-picking
this patch into a release branch.
For now, we fix the code so as to avoid this crash.
gdb/ChangeLog:
* windows-nat.c (get_windows_debug_event) <EXIT_PROCESS_DEBUG_EVENT>:
Use current_event.dwThreadId instead of main_thread_id.
+2019-04-30 Joel Brobecker <brobecker@adacore.com>
+
+ * windows-nat.c (get_windows_debug_event) <EXIT_PROCESS_DEBUG_EVENT>:
+ Use current_event.dwThreadId instead of main_thread_id.
+
2019-04-30 Tom Tromey <tromey@adacore.com>
* ada-lang.c (ada_lookup_simple_minsyms): New function.
else if (saw_create == 1)
{
windows_delete_thread (ptid_t (current_event.dwProcessId, 0,
- main_thread_id),
+ current_event.dwThreadId),
0, true /* main_thread_p */);
ourstatus->kind = TARGET_WAITKIND_EXITED;
ourstatus->value.integer = current_event.u.ExitProcess.dwExitCode;
- thread_id = main_thread_id;
+ thread_id = current_event.dwThreadId;
}
break;