Bug 1099274 - Crash reporter fails to submit report for e10s crashed tab (test case)
First off, this isn't very easy to reproduce, but I've done it, on this little Surface thing.

The problem seems to be that we're not getting a dumpID set on the nsIPropertyBag2 subject passed to the TabCrashReporter observer for ipc:content-shutdown.
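
To make the symptom concrete, here's a minimal sketch of the check that ends up failing. This is not the actual TabCrashReporter code: `PropertyBag` is a plain std::map standing in for the nsIPropertyBag2 subject, and `HasSubmittableDump` is a made-up name for illustration only.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the nsIPropertyBag2 subject that arrives
// with the ipc:content-shutdown notification.
using PropertyBag = std::map<std::string, std::string>;

// Returns true only if the subject carries a non-empty dumpID.
// Without one, there is no report for the crash reporter to submit.
bool HasSubmittableDump(const PropertyBag& aSubject) {
  auto it = aSubject.find("dumpID");
  return it != aSubject.end() && !it->second.empty();
}
```

In the failing case the subject looks like the first shape: the notification arrives, but there's no dumpID on it, so nothing gets submitted.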

Now why is that? Who sends that notification?

DXR points to ContentParent.cpp: http://hg.mozilla.org/mozilla-central/file/6ce1b906c690/dom/ipc/ContentParent.cpp#l1779

What I need, really, is a debug build that I can breakpoint on this Surface thing. I've never tried to remotely connect to a debug build on Windows before. I guess there's a first time for everything.

Let me read up and see how I can do that with Visual Studio. jimm said this was possible, and I'd actually be really surprised if it isn't - remote debugging isn't exactly revolutionary.

Gahhhhhh - Visual Studio 2010 Express does not get remote debugging. Rats. :/

I wonder if windbg could be used here instead.

Ok, installing Standalone Debugging Tools for Windows, a la these instructions: http://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx

Ok, I have a debug build on this Surface, and windbg installed and running. This thing feels awful. I feel like I'm banging rocks together.

How the hell do I set a breakpoint in this thing?

Ah, Alt-F9 opens up some kind of breakpoint interface… mozilla::dom::ContentParent::ActorDestroy is where I want to set my breakpoint. Let's see if that works.

Huh, so, I can set breakpoints, but I can't hit them. windbg complains about not being able to resolve the symbols. So I guess I need to somehow load the symbols for this build into windbg.

How the hell do I do that?

*sigh*. I couldn't find symbols for the build I got.

Feelin' like this (courtesy of dolske): [image]

Ok, I've found a build, got some symbols, and now I'm hooking up windbg. Fingers crossed I can actually debug my stuff now.

Aaaaand I need the source as well. Damn it.

Downloading… waiting...

Unzipping… time to decompress - 55 minutes. WTF. Oh, ok, now 28 minutes. I forgot how Windows measures time.

AH GEEZ, 1 hour 50 minutes now. What the hell.

And now 10 minutes. What a rollercoaster.

GUHHHHHH - finally unzipped, and I still can't hit breakpoints. windbg complains about "Unable to resolve unqualified symbol in Bp expression mozilla::dom::TabParent::TabParent"… maybe I'm doing this wrong. :/

I think I'm going to have to get some more serious hardware here.

OK! I've got a debug environment on this little Surface, and we're breakpointing. Finally.

So it looks like crashReporter->ChildDumpID() is returning the empty string… why is that?

Because we're straight up not creating a minidump! That's the problem here.

Apparently, we're reaching this "we're screwed" stuff?

It looks like the ProcessHandle in PContentParent is getting overwritten with null from ContentParent::KillHard...

Also looks like Toplevel->TakeMinidump is failing because nsExceptionHandler's GetEnabled is returning false (because gExceptionHandler is nullptr… wtf?)

Why is gExceptionHandler nullptr?

OH FOR PETE'S SAKE. The crash reporter is disabled in debug builds - no wonder this doesn't work.

I need to enable it with MOZ_CRASHREPORTER in debug builds.
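
The chain of failure here is short enough to model. What follows is a toy sketch, not Gecko code: the names `gExceptionHandler`, `GetEnabled` and `TakeMinidump` are borrowed from the notes above, but the bodies, the `ExceptionHandler` struct, the fake dump ID and `SetCrashReporterEnabled` are all invented for illustration.

```cpp
#include <string>

// Toy stand-in for the real exception handler object.
struct ExceptionHandler {};

// Stays nullptr when the crash reporter is never set up,
// which is what happens in a debug build by default.
static ExceptionHandler* gExceptionHandler = nullptr;

// The reporter counts as enabled only if a handler was installed.
bool GetEnabled() { return gExceptionHandler != nullptr; }

// On success, writes a (fake) dump ID into aDumpId.
// On failure, leaves aDumpId empty, so the caller sees the empty string.
bool TakeMinidump(std::string& aDumpId) {
  aDumpId.clear();
  if (!GetEnabled()) {
    return false;
  }
  aDumpId = "fake-dump-id";
  return true;
}

// Hypothetical helper so the sketch can flip between both states.
void SetCrashReporterEnabled(bool aEnabled) {
  static ExceptionHandler sHandler;
  gExceptionHandler = aEnabled ? &sHandler : nullptr;
}
```

With the reporter disabled, `TakeMinidump` bails out before doing anything, which matches the empty string coming back from ChildDumpID().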

Ok, even with the crash reporter enabled, we still can't seem to find the mOtherProcess.

In fact… yeah, the plugin-process is *flat out gone* by the time we start debugging. Even if I attach a debugger to it, it looks like there are certain types of crashes that make it shut down *immediately*.

We're calling KillHard because we get a ProcessingError...

xul.dll!mozilla::dom::ContentParent::KillHard() Line 2984     C++
xul.dll!mozilla::dom::ContentParent::ProcessingError(mozilla::ipc::HasResultCodes::Result what) Line 1605     C++
xul.dll!mozilla::dom::PContentParent::OnProcessingError(mozilla::ipc::HasResultCodes::Result aCode) Line 6129     C++
xul.dll!mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result code, const IPC::Message & aMsg, const char * channelName) Line 1475     C++
xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(const IPC::Message & aMsg) Line 1107     C++
xul.dll!mozilla::ipc::MessageChannel::DispatchMessageW(const IPC::Message & aMsg) Line 1047     C++
xul.dll!mozilla::ipc::MessageChannel::OnMaybeDequeueOne() Line 1034     C++
xul.dll!DispatchToMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void)>(mozilla::ipc::MessageChannel * obj, bool (void) * method, const Tuple0 & arg) Line 384     C++
xul.dll!RunnableMethod<mozilla::ipc::MessageChannel,bool (__thiscall mozilla::ipc::MessageChannel::*)(void),Tuple0>::Run() Line 307     C++
xul.dll!mozilla::ipc::MessageChannel::RefCountedTask::Run() Line 403     C++
xul.dll!mozilla::ipc::MessageChannel::DequeueTask::Run() Line 420     C++
xul.dll!MessageLoop::RunTask(Task * task) Line 362     C++
xul.dll!MessageLoop::DeferOrRunPendingTask(const MessageLoop::PendingTask & pending_task) Line 372     C++
xul.dll!MessageLoop::DoWork() Line 447     C++
xul.dll!mozilla::ipc::DoWorkRunnable::Run() Line 234     C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 830     C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 265     C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 99     C++
xul.dll!MessageLoop::RunInternal() Line 234     C++
xul.dll!MessageLoop::RunHandler() Line 227     C++
xul.dll!MessageLoop::Run() Line 201     C++
xul.dll!nsBaseAppShell::Run() Line 166     C++
xul.dll!nsAppShell::Run() Line 178     C++
xul.dll!nsAppStartup::Run() Line 281     C++
xul.dll!XREMain::XRE_mainRun() Line 4150     C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4226     C++
xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4446     C++
firefox.exe!do_main(int argc, char * * argv, nsIFile * xreDirectory) Line 287     C++
firefox.exe!NS_internal_main(int argc, char * * argv) Line 656     C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 113     C++
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]

Looks like a message named PDocAccessible::Msg_ShowEvent is arriving with something that the parent can't process. Because the parent can't process it, it flips out and calls KillHard on the child. When we call KillHard, we clear out our notion of the process ID, which prevents us from getting the crash report. KillHard, if I had to guess, also likely kills the child before it has a chance to dump its crash report.

If I get rid of the KillHard, then we never "crash".

So that's what's going on - total infanticide - the parent is receiving a message from the child that it can't process, so it just kills the child, so there's no crash report.
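
Here's the same story as a toy model, assuming the parent can only take a minidump while it still holds a handle to the child. Everything here is illustrative: `FakeContentParent`, `HandleProcessingError`, the integer handle and the fake dump ID are made up. The "dump first" ordering is only there for contrast and is not a claim about how the real code should be fixed.

```cpp
#include <string>

// Toy model of the parent's view of a content process.
class FakeContentParent {
 public:
  // A non-zero handle means the parent still knows about the child.
  explicit FakeContentParent(int aHandle) : mProcessHandle(aHandle) {}

  // Models KillHard as described above: the parent forgets the handle.
  void KillHard() { mProcessHandle = 0; }

  // A dump can only be taken while the handle is still around.
  bool TakeMinidump() {
    if (mProcessHandle == 0) {
      return false;
    }
    mDumpId = "fake-dump-id";
    return true;
  }

  // What the shutdown notification would carry; empty if no dump was taken.
  const std::string& ChildDumpID() const { return mDumpId; }

 private:
  int mProcessHandle;
  std::string mDumpId;
};

// Reacts to a message the parent can't process. aDumpFirst picks whether
// the dump is attempted before or after the child is killed.
std::string HandleProcessingError(FakeContentParent& aParent, bool aDumpFirst) {
  if (aDumpFirst) {
    aParent.TakeMinidump();
  }
  aParent.KillHard();
  if (!aDumpFirst) {
    aParent.TakeMinidump();  // Too late: the handle is already gone.
  }
  return aParent.ChildDumpID();
}
```

Kill-then-dump comes back with an empty ID, which is the behaviour seen in this bug.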

Ah, and billm has identified this as a dupe of bug 1068349.

Aaaand closed.