Bug 1134252 - [e10s] PBrowserChild::SendGetRenderFrameInfo fails with 'message was deserialized, but contained an illegal value'
Okie dokie, here we go.


So something is going wrong in the messaging between PBrowserChild and PBrowserParent, involving the SendGetRenderFrameInfo message.

'(msgtype=0x200002,name=PBrowser::Msg_PRenderFrameConstructor) Value error: message was deserialized, but contained an illegal value'

Actually, it's the Msg_PRenderFrameConstructor message that's failing, not SendGetRenderFrameInfo. Huh.

The child sends this message, and the parent flips out and kills the child.

Ok, in TabChild::ProvideWindowCommon, we've got this:

PRenderFrameChild* renderFrame = newChild->SendPRenderFrameConstructor();
newChild->SendGetRenderFrameInfo(renderFrame,
                                 &textureFactoryIdentifier,
                                 &layersId);

That SendPRenderFrameConstructor call is, I think, async, but the next line is sync.

The first message fails in the parent, so the parent kills the child, which is stuck waiting for the reply to the second.
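To make that failure sequence concrete, here is a toy model in C++. It is not Gecko code: `MsgKind`, `ChildOutcome`, `ProcessQueue`, and `ChildSends` are hypothetical names. It assumes, as above, that the constructor is async, that the info query is sync, and that the parent handles the child's messages in the order they were sent. Once the constructor is rejected, the child is killed and the sync reply never arrives.

```cpp
#include <cassert>
#include <deque>

// Hypothetical message kinds mirroring the two calls in ProvideWindowCommon.
enum class MsgKind {
  RenderFrameConstructor,  // async: the child does not wait for it
  GetRenderFrameInfo       // sync: the child blocks until the reply arrives
};

// What the (blocked) child ends up observing.
struct ChildOutcome {
  bool killed = false;    // the parent tore the child down
  bool gotReply = false;  // the sync reply to GetRenderFrameInfo arrived
};

// Toy parent: handles the child's messages strictly in the order sent.
// allocSucceeds models whether AllocPRenderFrameParent returns an actor.
ChildOutcome ProcessQueue(std::deque<MsgKind> queue, bool allocSucceeds) {
  ChildOutcome outcome;
  bool haveActor = false;
  while (!queue.empty()) {
    MsgKind msg = queue.front();
    queue.pop_front();
    if (msg == MsgKind::RenderFrameConstructor) {
      if (!allocSucceeds) {
        // A rejected constructor is fatal: the child is killed while it is
        // still blocked on the sync message queued behind the constructor.
        outcome.killed = true;
        return outcome;
      }
      haveActor = true;
    } else if (msg == MsgKind::GetRenderFrameInfo && haveActor) {
      outcome.gotReply = true;
    }
  }
  return outcome;
}

// The child fires the async constructor, then immediately the sync query.
std::deque<MsgKind> ChildSends() {
  return {MsgKind::RenderFrameConstructor, MsgKind::GetRenderFrameInfo};
}
```

With `allocSucceeds == false`, the model reports `killed == true` and `gotReply == false`, which is the "child stuck waiting, then killed" picture described above.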

So something is going wrong when that actor gets constructed.

Hypothesis

TabParent::AllocPRenderFrameParent is returning nullptr, which causes the parent to kill the child.

Test: Alter ContentParent::AllocPScreenManagerParent (a different actor's Alloc function) so that it returns nullptr, and see whether we get a similar message.

Result: WE DO. I think it's highly likely that this is the problem here...
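Why a null return would surface as this particular error can be sketched with a simplified, hypothetical model of a constructor handler. It assumes the IPDL-generated handler treats a null actor from its Alloc function as a value error. That assumption is consistent with the log text above, but this is not the generated code, and `Result`, `RenderFrameParentStub`, `Describe`, and `DispatchConstructor` are made-up names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-ins for IPC dispatch results.
enum class Result { MsgProcessed, MsgValueError };

struct RenderFrameParentStub {};

// Human-readable text for each result; the value-error text matches the log.
std::string Describe(Result r) {
  switch (r) {
    case Result::MsgProcessed:
      return "processed";
    case Result::MsgValueError:
      return "Value error: message was deserialized, but contained an illegal value";
  }
  return "unknown";
}

// Simplified constructor dispatch: ask the Alloc hook for an actor and treat
// nullptr as an illegal value rather than as a recoverable failure.
Result DispatchConstructor(const std::function<RenderFrameParentStub*()>& alloc) {
  std::unique_ptr<RenderFrameParentStub> actor(alloc());
  if (!actor) {
    return Result::MsgValueError;  // in the real system, the child gets killed
  }
  // A real channel would register the actor and keep it alive; this model
  // only cares about the result, so the actor is freed on return.
  return Result::MsgProcessed;
}
```

Plugging in an Alloc hook that returns `nullptr` yields the value-error text for any actor type, which mirrors what the AllocPScreenManagerParent experiment showed.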

Hypothesis

There is a window of time between TabParent::RecvCreateWindow and TabParent::AllocPRenderFrameParent where either:

  1. The window has already been destroyed by the time TabParent::AllocPRenderFrameParent runs.
  2. The frame element can't be resolved to a frame loader.

I think (1) is probably our best bet. Suppose we're opening a new popup window: the parent runs TabParent::RecvCreateWindow, the window starts being constructed, all is well, and the result is sent back down to the child, which immediately responds by sending a sync message up to the parent. But what if, in the meantime, the user closed the window? Is that possible?
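The race in (1), and the lookup failure in (2), can be modeled as a small sketch. Everything here is hypothetical and heavily simplified: `FrameLoader`, `FrameElement`, `RenderFrameParent`, and `TabParentModel` are stand-ins, not the Gecko classes, and the assumption is that closing the window drops the frame element before the constructor message is handled.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, heavily simplified stand-ins for the objects involved.
struct FrameLoader {};

struct FrameElement {
  std::shared_ptr<FrameLoader> frameLoader;  // null if it can't be resolved
};

struct RenderFrameParent {};

struct TabParentModel {
  std::shared_ptr<FrameElement> frameElement;
  bool destroyed = false;

  // Models RecvCreateWindow: the new window and its frame loader are set up.
  void RecvCreateWindow() {
    frameElement = std::make_shared<FrameElement>();
    frameElement->frameLoader = std::make_shared<FrameLoader>();
  }

  // Models the user closing the window before the constructor arrives.
  void Destroy() {
    destroyed = true;
    frameElement.reset();
  }

  // Models AllocPRenderFrameParent returning nullptr in either case.
  std::unique_ptr<RenderFrameParent> AllocPRenderFrameParent() {
    if (destroyed || !frameElement) {
      return nullptr;  // case (1): the window is already gone
    }
    if (!frameElement->frameLoader) {
      return nullptr;  // case (2): no frame loader for the element
    }
    return std::make_unique<RenderFrameParent>();
  }
};
```

Calling `Destroy()` between `RecvCreateWindow()` and `AllocPRenderFrameParent()` is exactly the window of time described above; in the model, that ordering is enough to produce the nullptr.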

Let's see if we can trigger it! I'll open windows in the content process, and then close them in the parent as soon as they open.

Hrm. No luck reproducing just yet. :/

So I need to think about the lifetime of RenderFrameParent, and about what happens if allocating the RFP in the parent fails.

Here's what I'm thinking...

Instead of returning nullptr in AllocPRenderFrameParent, always return the RenderFrameParent, even if success was false. If success was false, the child will realize that it can't get a layers ID, and will then delete the RenderFrameParent/Child pair. And then… well, we're in kind of a weird state.

And how do I test that this works correctly? Well… I can make RenderFrameParent creation always fail. This will probably be interesting.
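Here is a minimal sketch of the proposed fix together with the forced-failure test, under stated assumptions. It is a model, not the actual patch. `RenderFrameParentModel`, `ParentModel`, `ChildProvideWindow`, and the `forceFailure` hook are hypothetical, and treating a layers ID of 0 as "no layers ID" is an assumption made for this model only.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical model of the proposed fix; not Gecko code.
struct RenderFrameParentModel {
  bool success = false;
  std::uint64_t layersId = 0;  // 0 means "no layers ID" in this model
};

struct ParentModel {
  bool frameLoaderAlive = true;
  bool forceFailure = false;  // hypothetical test hook: always fail setup

  // Always hand back an actor, even when setup failed, so the constructor
  // never sees nullptr and never produces a value error.
  std::unique_ptr<RenderFrameParentModel> AllocPRenderFrameParent() const {
    auto rfp = std::make_unique<RenderFrameParentModel>();
    rfp->success = frameLoaderAlive && !forceFailure;
    if (rfp->success) {
      rfp->layersId = 42;  // arbitrary nonzero ID for the model
    }
    return rfp;
  }

  // The sync query is answered from the actor's state.
  std::uint64_t RecvGetRenderFrameInfo(const RenderFrameParentModel& rfp) const {
    return rfp.layersId;
  }
};

// Child side of the window setup under the fix: returns true if the new
// window got a usable layers ID, false if the pair had to be torn down.
bool ChildProvideWindow(const ParentModel& parent) {
  std::unique_ptr<RenderFrameParentModel> pair = parent.AllocPRenderFrameParent();
  std::uint64_t layersId = parent.RecvGetRenderFrameInfo(*pair);
  if (layersId == 0) {
    pair.reset();  // stands in for deleting the RenderFrameParent/Child pair
    return false;
  }
  return true;
}
```

Setting `forceFailure = true` plays the role of "always fail to create a RenderFrameParent". The child then sees no layers ID and deletes the pair instead of being killed. The model says nothing about what the window looks like afterwards, which is the weird state mentioned above.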

Let's see what happens!

billm likes it, and he's assuaged my fears: if mFrameLoader doesn't exist, we are screwwwwwed, and the case where that probably happens is when the tab has already gone away. The frame loader can't be missing while the tab is still being set up; that can only happen during teardown.

So this is probably occurring when a tab is opened and closed extremely quickly.

So let's land this puppy!