Bug 1047603 - [e10s] Non-remote tabs in e10s windows do not handle target="_blank" or window.open links properly.
(originally [e10s] Contents does not load when I click a link in sidebar panel)

STR:
  1. Bookmark a page that has links on it that use target="_blank", or window.open.
  2. Edit that bookmark, and choose to open it in the sidebar
  3. Open the bookmark in the sidebar
  4. Click on any of those links
The links don't work properly… new windows don't open (I get NS_ERROR_FAILURE from window.open), and new tabs don't load any content.

This seems like maybe it's not an M2…

I've asked blassey if we should take this off the M2 list. He said I we should renom and triage on Thursday.

It turns out that we're bad at opening links from non-remote tabs in e10s windows. My guess is that we're trying to pass a TabOpener for the non-remote tab, and that's going to break when we try to get the process for that TabOpener… that's the theory, anyhow.

Let me make sure…

windowCreator2->CreateChromeWindow2

^-- that's failing.

Specifically, because we're failing this assertion:

NS_ENSURE_STATE(xulWin->mPrimaryContentShell);

aOpeningTab is, correctly, nullptr, so my "accidentally passing some tab opener" theory is bunk.

So who normally sets mPrimaryContentShell?

Ah hah, so here's what's happening:

For popups:

The popup is trying to open, but it doesn't have a tab opener, so we get ready to open it not in a remote browser. However, the updateBrowserRemoteness stuff gets called in browser.js on delayed startup, which determines that yes, based on the URI, this thing we're opening should be in a remote tab… so it sets remote on it, which conflicts with the UseRemoteTabs assumptions built into the new window.

Waaaait - that shouldn't be the case. The new window should have useRemoteTabs set to false… so gMultiProcessBrowser should be false. Right?

Ah, interesting… it's not. gMultiProcessBrowser is being set to true on the popup that's opening even though the opening browser is not remote. Is that because the _root_ docshell of the opener says that we're loading remote tabs?

I think that's the case. We look at the XULParent (so that's the e10s enabled browser.xul window), and dig into its load context to see whether or not to use remote tabs. And we say "yes".

So there are a few things to fix:

1) Sidebar browser in e10s windows should be remote.
2) Find if there are any about: pages that use target="_blank" or window.open… if so, we need to handle the case where we open those tabs within the e10s window. If not, we might need some plan if such things ever get added before those about: pages get remotable.

This is not M2 so I'm not going to lean on it just yet.

And whooaaaaaa this is suddenly a high priority again. This is blocking bug 1063680 which is now a BFD.

Why is this blocking? Well, because we have quite a few blacklisted URI's that don't get remote browsers with e10s (most about: pages, for example, and chrome: URLs, at least for now).

I seem to recall having an interesting discussion with smaug about this, and I seem to recall writing a tiny little patch about it too… where is that stuff?

Ah, yes, I still have a commit on my tree about it. Let's see what I did… oh, ok, I just made it so that remoteTabs is only set to true on the load context of a window if we have browser.tabs.remote.autostart set to true - and, if there is a parent context, only if the parent context has UseRemoteTabs and an opening tab.

So even with the patch, what happens when I try to open a tab from a non-remote tab in an e10s window is that a new remote tab opens, and then a new _window_ opens with the link. Time to figure out why that is.

So by the time we even get to my patch in nsAppShellService::JustCreateTopWindow, we've already decided to open a new window. Why?

nsWindowWatcher::OpenWindowInternal must decide that it's the right thing to do. Where does it determine that?

Ah, ok, so here's the problem:

In nsBrowserAccess, we find out that we're opening a new tab, and handle that in openURI. We find out that we want to open the link in a new tab, which is great. We load up a new tab, which happens to be about:blank (remote), and then attempt to return the contentWindow of that browser - which is 100% null. So that fails out, and nsWindowWatcher responds by attempting to open a new window instead, which is does successfully.

Perhaps we can make it so that openURI only ever opens non-remote tabs. Who even calls openURI? That's nsContentTreeOwner::ProvideWindow…

In this comment , Mossop is concerned that if about:preferences opens a new window, all tabs in that window will be non-remote.

I suppose the constraint is only important if the new window needs a handle on the opener. If it's a clean-break, then we probably can go ahead and be remote.

Let's check to see what the current patch does and see if we can modify it to do what I just described.

I'll also comment in the bug to that effect.

Hm. So, back to basics - it seems like any time you open a window from some other window, either via _blank or window.open, the resulting window always has a handle back to the opener with window.opener. Interesting.

Hrm. From billm:

(In reply to Bill McCloskey (:billm) from comment #11)
> Comment on attachment 8485883
> Make it possible to load new windows from non-remote browsers within an e10s
> window. r=?
>
> Review of attachment 8485883:
> -----------------------------------------------------------------
>
> ::: browser/base/content/browser.js
> @@ +4138,5 @@
> >                                        referrerURI: referrer,
> >                                        fromExternal: aIsExternal,
> >                                        inBackground: loadInBackground});
> >      let browser = win.gBrowser.getBrowserForTab(tab);
> > +    win.gBrowser.updateBrowserRemoteness(browser, false);
>
> This doesn't seem right to me. We can get here from openURIInFrame, which is
> called whenever the child process asks to open a new tab:
>
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#440
>
> We definitely don't want to open all new tabs non-remotely.
>
> I haven't looked at this code for a while though. Maybe I'm missing
> something?


Ah, ok, so I put that line in the wrong function. I should have put it in openURI, where it's implied that the browser coming in is non-remote (should put in an assertion to make sure aOpener is not a CPOW).

I talked to billm about this. He's cool with the idea of putting this code in the openURI function instead, and maybe asserting that aOpener is not a CPOW.

Ok… I have a patch, but I think I should probably write tests for this.

Mossop gave feedback+. Feels good.

Patch with tests are now up.

r+ on everything, but it looks like my test is failing on at least Linux:

63 ERROR TEST-UNEXPECTED-TIMEOUT | chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug1047603.js | application timed out after 330 seconds with no output

64 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug1047603.js | application terminated with exit code 6

PROCESS-CRASH | chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug1047603.js | application crashed [@ linux-gate.so + 0x424]


:(

I've got a bc1 pass on Windows 8 opt…  and Windows XP… I should kick off a build on Linux. If it's Linux only, maybe I'll just OHHHH - I know why. I think… if we try to open an e10s window on Linux without OMTC, all sortsa shit breaks - so we display a prompt, which blocks, and of course this test won't work. AH HAH. Ok. Let me test to confirm.

Uh huh, that's exactly what's happening. Landed fix on a closed tree thanks to dholbert.

Fingers crossed!

https://treeherder.mozilla.org/ui/#/jobs?repo=fx-team&revision=1e0e069b5cc7

SHIT. This caused bug 1067164. What the hell. Backed out.

Ok, so this solution didn't take into account the possibility that other callers of nsIBrowserDOMWindow::OpenURI actually pass a URI and expect the browser to go to that URI! So what we did was tell a new tab to open at a URI, and then immediately set it to be non-remote. This kills the browser content loading.

Alternatives:
  1. Make openURI only set ensureNonRemote to true if a URI hasn't been passed in.
  2. Make use of aContext - only set ensureNonRemote to true is aContext is not external.
Or, how about we use aOpener and aContext?

Like… if aOpener && aContext is OPEN_NEW (so, internal), then ensure that the tab we open is straight-up not remote. Otherwise, leave it alone.

I think I can make that work - and then I should write a test to make sure I don't cause bug 1067164 again. Or, wait, does this test already do that?: http://hg.mozilla.org/mozilla-central/file/426497473505/browser/base/content/test/general/browser_bug562649.js

Let's try it. Yes - this test fails in e10s mode with my patch, once I get rid of the isBusy thing.

Now, let's try my solution to see if it fixes the test, and if it does… well, now we've got a stew goin'.

Ok, new patches up. Let's see what folks think.

Try build: https://tbpl.mozilla.org/?tree=Try&rev=e4c1f5922dd5

AH.

So, bittersweet story: I've been barking up the wrong tree and trying to solve a problem that doesn't need to be solved. That doesn't happen too, too often - but it's kinda got that bittersweetness where I go "Nice, I don't have to spend time solving this anymore", but also "Damn it, I spent all that time trying to solve this.".

Here's the story: Mossop is working on a separate bug ( bug 999239 ) that makes it so that transitions in and out of remote browsers don't make us lose history. While he's doing that, his patch also adds assurances so that every URL that is browsed to (regardless of _how_ it is browsed to), gets properly shunted to the right process.

Up until now, we've been doing our detection for when to use remote / non-remote browsers mostly in tabbrowser.xml - we see that the user is requesting a URL for one of our browsers at a particular URL, we check against our blacklist, and if it's blacklisted, we flip the browser to be non-remote.

Mossop's patch goes deeper into DocShell, and makes it so that we check the URL before the load begins at a really deep level - regardless of how the page started to be loaded. That's great, because it means that if we're at about:crashes (the canonical example for this bug), and we click on a link to take us to crash stats - we will transition from a non-remote to a remote browser!

This is great news.

So we're still busted if we (for some reason) try to open a tab from an about: page (and why aren't we using openUILinkIn there?), and opening popups / new tabs from the bookmarks sidebar browser. So Mossop has downgraded the severity / priority.

(MONTHS PASS)

And we're back! This has been biting us in the butt because, surprise, there are a ton of places that open links in the front-end.

Check out a few of them.

So it's time to bang this out, once and for all. LET'S DO THIS.


Our test case will be bug 1060346 - The Learn More link in about:privatebrowsing in a private window opens a blank tab in e10s.

STR:

1) With e10s enabled, open a private browsing window
2) Click on the Learn More link in the page that the first private browsing window shows you

Ba boom. We're in business - I can pos-a-lootly reproduce this.

So, some thoughts here:

If we have a non-remote browser in an e10s window… I think it's a pretty safe assumption that it's an "unusual" browser, like the sidebar browser, or a non-remote iframe in some chrome, or one of our blacklisted about: pages. In that case, I think it's likely OK that the opened tab be remote, and have no notion of the opener.

So perhaps the rule should be: if we're an e10s window, and a non-remote browser attempts to open a new tab, make the new tab's browser be remote by default.

That's the theory, but I believe the problem is that the consumer of nsBrowserAccess's openURI is being called by something that expects a nsIDOMWindow** back (nsContentTreeOwner::ProvideWindow).

Perhaps we can modify the caller of ProvideWindow (nsWindowWatcher)… have it use OpenURIInFrame instead? The problem with that is that OpenURIInFrame returns an nsIFrameLoaderOwner, and not an nsIDOMWindow.

Hrm. So this old blog post has some notes about how we go from clicking on a link in a non-remote browser to nsContentTreeOwner::ProvideWindow. Maybe that can give me some guidance.

Guh… it looks like the code that hand link-clicking over to nsWindowWatcher is expecting a nsIDOMWindow back…

nsILinkHandler::OnLinkClickSync expects an nsIDocShell to be returned, which is no good if we're opening a remote browser because we can't get at the nsIDocShell in the remote process. Damn it!

So let me look at what my old solution was, way back in the day before this bug was deprioritized...

Ah, heh - this is interesting. I apply the patches, and the new tab loads properly (though shows a perma spinner), and it's remote!

It looks like Mossop's work in bug 999239 is actually hijacking my attempt to force this browser to be non-remote. And so we go from remote, to non-remote, back to remote, and it works again.

That sounds crazy inefficient. It's time for me to figure out how Mossop's patch works in more detail.

Wow - it really looks like Mossop's patch is what we want to use here. Looks like it solves the exact problem we're hitting here, but now I have to find out why it's not actually solving it.

Oh! This is great! The reason why Mossop's solution wasn't working for all of these cases was because the tree owner in the case of some chrome documents having links is nsChromeTreeOwner. Mossop's patch do_GetInterface's the tree owner for nsIWebBrowserChrome3, which nsContentTreeOwner implements. I just need nsChromeTreeOwner to forward the GetInterface of nsIWebBrowserChrome3 over to its XULWindow, and then add some GetInterface glue in nsXULWindow to get to the nsContentTreeOwner.

So now I've got this puzzle - when I click on a link in about:privatebrowsing in a private window, I end up responding in a docshell that has a nsChromeTreeOwner as its nsIDocShellTreeOwner. When I click on a link in a about:crashes in a private window, I end up getting a docshell that has a nsContentTreeOwner as its nsIDocShellTreeOwner. Why is that?

What is the structure of this invisible tree?

It looks like nsXULWindow owns a nsChromeTreeOwner, and two nsContentTreeOwners. One is just called mContentTreeOwner, and one is called mPrimaryContentTreeOwner.

What is the difference between mContentTreeOwner and mPrimaryContentTreeOwner?

In nsXULWindow::ContentShellAdded we have an argument to the method bool aPrimary, and depending on what that value is, we assign mPrimaryContentTreeOwner as the owner of the new nsIDocShellTreeItem (if primary is true) or mContentTreeOwner as the owner of the new nsIDocShellTreeItem (if primary is false).

Who calls nsXULWindow::ContentShellAdded, and what makes them pass in aPrimary as true or false?

It looks like an nsIDocShellTreeItem can be primary or not primary. If they're primary, their owner is mPrimaryContentTreeOwner, and if not primary, it's mContentTreeOwner.

Ok, here's what I think is going on. The relationship seems to be about whether or not the current document being displayed is primary - and by primary, I mean can be accessed from the global window as window.content (like, it's got the content-primary value for type set on the browser).

https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Attribute/browser.type
https://developer.mozilla.org/en-US/docs/Web/API/window.content

Experiment:

If I make the links in about:crashes do target="_blank", do they get the same nsIDocShellTreeOwner as when I click on the Learn More link in about:privatebrowsing?

So I've noticed that I re-enter nsDocShell::InternalLoad because we're opening what seems to be a new window when I click on a link in a non-remote tab in a e10s browser window that targets _blank. That seems really weird. Is that normal behaviour?

That is not normal behaviour - it is true that we hit nsDocShell::InternalLoad twice in a row before the first time can exit because the new tab is probably trying to load something. My theory is about:blank, but I could be wrong.

The difference is that with a non-e10s window, we don't attempt to open a new window. There's no trace of such acts on the stack.

Stack when in a non-e10s window (good):

#0     0x0000000105494f53 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9387
#1     0x0000000105493960 in nsDocShell::LoadURI(nsIURI*, nsIDocShellLoadInfo*, unsigned int, bool) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:1632
#2     0x0000000105498c0e in _ZThn424_N10nsDocShell7LoadURIEP6nsIURIP19nsIDocShellLoadInfojb at /Users/mikeconley/Projects/mozilla-central/obj-debug/docshell/base/Unified_cpp_docshell_base0.cpp:1635
#3     0x0000000102ed396b in nsFrameLoader::ReallyStartLoadingInternal() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:461
#4     0x0000000102ed2da9 in nsFrameLoader::ReallyStartLoading() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:311
#5     0x0000000102e5258b in nsDocument::MaybeInitializeFinalizeFrameLoaders() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsDocument.cpp:7305
#6     0x0000000102e52357 in nsDocument::EndUpdate(unsigned int) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsDocument.cpp:4899
#7     0x0000000104899a95 in mozilla::dom::XULDocument::EndUpdate(unsigned int) at /Users/mikeconley/Projects/mozilla-central/dom/xul/XULDocument.cpp:3243
#8     0x0000000102c4640c in mozAutoDocUpdate::~mozAutoDocUpdate() at /Users/mikeconley/Projects/mozilla-central/dom/base/mozAutoDocUpdate.h:38
#9     0x0000000102c3d2a5 in mozAutoDocUpdate::~mozAutoDocUpdate() at /Users/mikeconley/Projects/mozilla-central/dom/base/mozAutoDocUpdate.h:36
#10     0x0000000102effe9b in nsINode::ReplaceOrInsertBefore(bool, nsINode*, nsINode*, mozilla::ErrorResult&) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsINode.cpp:2216
#11     0x0000000102d6a184 in nsINode::InsertBefore(nsINode&, nsINode*, mozilla::ErrorResult&) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsINode.h:1600
#12     0x0000000102d6a1c2 in nsINode::AppendChild(nsINode&, mozilla::ErrorResult&) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsINode.h:1604
#13     0x000000010374eae7 in mozilla::dom::NodeBinding::appendChild(JSContext*, JS::Handle<JSObject*>, nsINode*, JSJitMethodCallArgs const&) at /Users/mikeconley/Projects/mozilla-central/obj-debug/dom/bindings/./NodeBinding.cpp:598
#14     0x0000000103c4cd90 in mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) at /Users/mikeconley/Projects/mozilla-central/dom/bindings/BindingUtils.cpp:2484
#15     0x000000010716d714 in js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) at /Users/mikeconley/Projects/mozilla-central/js/src/jscntxtinlines.h:229
#16     0x00000001070ec5e9 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) at /Users/mikeconley/Projects/mozilla-central/js/src/vm/Interpreter.cpp:502
#17     0x000000010710cf67 in Interpret(JSContext*, js::RunState&) at /Users/mikeconley/Projects/mozilla-central/js/src/vm/Interpreter.cpp:2560
#18     0x000000010710076a in js::RunScript(JSContext*, js::RunState&) at /Users/mikeconley/Projects/mozilla-central/js/src/vm/Interpreter.cpp:452
#19     0x00000001070ec709 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) at /Users/mikeconley/Projects/mozilla-central/js/src/vm/Interpreter.cpp:521
#20     0x00000001070d53f2 in js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) at /Users/mikeconley/Projects/mozilla-central/js/src/vm/Interpreter.cpp:558
#21     0x0000000106ea584f in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) at /Users/mikeconley/Projects/mozilla-central/js/src/jsapi.cpp:4547
#22     0x00000001024866c8 in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) at /Users/mikeconley/Projects/mozilla-central/js/xpconnect/src/XPCWrappedJSClass.cpp:1205
#23     0x00000001024807d9 in nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) at /Users/mikeconley/Projects/mozilla-central/js/xpconnect/src/XPCWrappedJS.cpp:532
#24     0x000000010172cf39 in PrepareAndDispatch at /Users/mikeconley/Projects/mozilla-central/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_darwin.cpp:122
#25     0x000000010172b99b in SharedStub ()
#26     0x000000010556f19a in nsContentTreeOwner::ProvideWindow(nsIDOMWindow*, unsigned int, bool, bool, bool, nsIURI*, nsAString_internal const&, nsACString_internal const&, bool*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsContentTreeOwner.cpp:922
#27     0x000000010556f310 in _ZThn32_N18nsContentTreeOwner13ProvideWindowEP12nsIDOMWindowjbbbP6nsIURIRK18nsAString_internalRK19nsACString_internalPbPS1_ at /Users/mikeconley/Projects/mozilla-central/obj-debug/xpfe/appshell/Unified_cpp_xpfe_appshell0.cpp:925
#28     0x00000001054f82dd in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsITabParent*, nsIArray*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/embedding/components/windowwatcher/nsWindowWatcher.cpp:631
#29     0x00000001054fa4cc in nsWindowWatcher::OpenWindow2(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsITabParent*, nsISupports*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/embedding/components/windowwatcher/nsWindowWatcher.cpp:420
#30     0x00000001054fa5ce in _ZThn8_N15nsWindowWatcher11OpenWindow2EP12nsIDOMWindowPKcS3_S3_bbbP12nsITabParentP11nsISupportsPS1_ at /Users/mikeconley/Projects/mozilla-central/obj-debug/embedding/components/windowwatcher/Unified_cpp_windowwatcher0.cpp:421
#31     0x0000000102c87b08 in nsGlobalWindow::OpenInternal(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, bool, bool, bool, bool, bool, nsIArray*, nsISupports*, nsIPrincipal*, JSContext*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:11824
#32     0x0000000102c884f2 in nsGlobalWindow::OpenNoNavigate(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7690
#33     0x0000000102c88547 in _ZThn32_N14nsGlobalWindow14OpenNoNavigateERK18nsAString_internalS2_S2_PP12nsIDOMWindow at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7694
#34     0x0000000105495ff7 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9597


Stack when in an e10s window (bad):

#0     0x0000000105494f53 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9387
#1     0x0000000105493960 in nsDocShell::LoadURI(nsIURI*, nsIDocShellLoadInfo*, unsigned int, bool) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:1632
#2     0x00000001054a9d4d in nsDocShell::LoadURIWithBase(char16_t const*, unsigned int, nsIURI*, nsIInputStream*, nsIInputStream*, nsIURI*) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:4796
#3     0x00000001054a93c1 in nsDocShell::LoadURI(char16_t const*, unsigned int, nsIURI*, nsIInputStream*, nsIInputStream*) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:4669
#4     0x00000001054a9e4d in _ZThn432_N10nsDocShell7LoadURIEPKDsjP6nsIURIP14nsIInputStreamS5_ at /Users/mikeconley/Projects/mozilla-central/obj-debug/docshell/base/Unified_cpp_docshell_base0.cpp:4671
#5     0x00000001055612b6 in nsWebShellWindow::Initialize(nsIXULWindow*, nsIXULWindow*, nsIURI*, int, int, bool, nsITabParent*, nsWidgetInitData&) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsWebShellWindow.cpp:231
#6     0x000000010555df2e in nsAppShellService::JustCreateTopWindow(nsIXULWindow*, nsIURI*, unsigned int, int, int, bool, nsITabParent*, nsWebShellWindow**) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsAppShellService.cpp:602
#7     0x000000010555e80b in nsAppShellService::CreateTopLevelWindow(nsIXULWindow*, nsIURI*, unsigned int, int, int, nsITabParent*, nsIXULWindow**) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsAppShellService.cpp:193
#8     0x000000010557f130 in nsXULWindow::CreateNewContentWindow(int, nsITabParent*, nsIXULWindow**) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsXULWindow.cpp:1793
#9     0x000000010557eb6a in nsXULWindow::CreateNewWindow(int, nsITabParent*, nsIXULWindow**) at /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsXULWindow.cpp:1731
#10     0x000000010557ebbd in _ZThn16_N11nsXULWindow15CreateNewWindowEiP12nsITabParentPP12nsIXULWindow at /Users/mikeconley/Projects/mozilla-central/obj-debug/xpfe/appshell/Unified_cpp_xpfe_appshell0.cpp:1732
#11     0x000000010589d10e in nsAppStartup::CreateChromeWindow2(nsIWebBrowserChrome*, unsigned int, unsigned int, nsIURI*, nsITabParent*, bool*, nsIWebBrowserChrome**) at /Users/mikeconley/Projects/mozilla-central/toolkit/components/startup/nsAppStartup.cpp:658
#12     0x000000010589d44b in _ZThn8_N12nsAppStartup19CreateChromeWindow2EP19nsIWebBrowserChromejjP6nsIURIP12nsITabParentPbPS1_ at /Users/mikeconley/Projects/mozilla-central/obj-debug/toolkit/components/startup/Unified_cpp_components_startup0.cpp:688
#13     0x00000001054f89ab in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsITabParent*, nsIArray*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/embedding/components/windowwatcher/nsWindowWatcher.cpp:732
#14     0x00000001054fa4cc in nsWindowWatcher::OpenWindow2(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsITabParent*, nsISupports*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/embedding/components/windowwatcher/nsWindowWatcher.cpp:420
#15     0x00000001054fa5ce in _ZThn8_N15nsWindowWatcher11OpenWindow2EP12nsIDOMWindowPKcS3_S3_bbbP12nsITabParentP11nsISupportsPS1_ at /Users/mikeconley/Projects/mozilla-central/obj-debug/embedding/components/windowwatcher/Unified_cpp_windowwatcher0.cpp:421
#16     0x0000000102c87b08 in nsGlobalWindow::OpenInternal(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, bool, bool, bool, bool, bool, nsIArray*, nsISupports*, nsIPrincipal*, JSContext*, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:11824
#17     0x0000000102c884f2 in nsGlobalWindow::OpenNoNavigate(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsIDOMWindow**) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7690
#18     0x0000000102c88547 in _ZThn32_N14nsGlobalWindow14OpenNoNavigateERK18nsAString_internalS2_S2_PP12nsIDOMWindow at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7694
#19     0x0000000105495ff7 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9597

I … think? Something in ProvideWindow is breaking down...

Ok, now we're on to something. Something goes on in the bad case in nsWindowWatcher::OpenWindowInternal that is different in the bad case. I need to understand it.

I think the problem is here:

bool newWindowShouldBeModal = false ;
bool parentIsModal = false ;
if (!newDocShellItem) {
windowIsNew =
true ;
isNewToplevelWindow =
true ;


The call to CreateChromeWindow2 happens inside this block. I don't think we get there in the good case because we are successful in creating a newDocShellItem.

Let's test that hypothesis.

It is true that newDocShellItem is not nullptr at that point for the good case. So my hypothesis has been validated. Excellent! Now we need to find out where newDocShellItem gets defined for the good case, and find out why it doesn't get defined for the bad case.

This bit is what defines newDocShellItem for the good case:

rv = provider->ProvideWindow(aParent, chromeFlags, aCalledFromJS,
sizeSpec.PositionSpecified(),
sizeSpec.SizeSpecified(),
uriToLoad, name, features, &windowIsNew,
getter_AddRefs(newWindow));

if (NS_SUCCEEDED(rv)) {
GetWindowTreeItem(newWindow, getter_AddRefs(newDocShellItem));


It looks like ProvideWindow is returning nullptr for both the new window, and we're getting nullptr for GetWindowTreeItem. Excellent! I'll be those are related. Let's figure out why ProvideWindow is returning nullptr for newWindow.

ProvideWindow is actually implemented in nsContentTreeOwner.

Fuuuuu… so now I know why it's not working. I should have known. nsContentTreeOwner::ProvideWindow is calling OpenURI on the nsBrowserAccess assigned in browser.js, and that's not returning an nsIDOMWindow** properly.

Wow, what a tour. I've gone right around this whole thing and ended up where I started, but I at least have a better understanding of it.

So here's the story - we don't have enough time to hit Mossop's hooks. The DocShell hasn't even been created by the time we decide to open a new tab. The hook gets hit afterwards, once the new tab attempts to start loading some URI.

My solution is to add a forceNotRemote attribute to loadOneTab/addTab that will cause the new tab to open so that the browser is initially not remote. Then, when the non-remote browser starts to load some URI, Mossop's hook will kick in and, if appropriate, switch the browser to be remote.

I'm making an assumption here - I'm assuming that since nsIBrowserDOMWindow::OpenURI is supposed to return an nsIDOMWindow that any caller to it is going to want this kind of behaviour, so I'm making it so that openURI passes an argument to _openURIInNewTab that sets forceNotRemote to true.

Ok, so that's written up, and I actually have an old test that I can retry here.

The old test assumed that anything new that was opened from a non-remote tab would also be non-remote. This isn't true anymore - since the test opens up about:home, which is not blacklisted, the new tab can be remote. I've adjusted the test so that the new tab is confirmed loaded and is remote.

There is a problem, however, with new windows. New windows, which are opened when browser.link.open_newwindow is 2 for target="_blank" links, aren't showing up. Something's going wrong. I need to find out what.

Ah, ok - so the first problem was that (like early in this note), nsXULWindow was failure an NS_ENSURE_STATE check looking for mPrimaryContentWindow on a XULWindow because it didn't think we were opening a gMultiProcessBrowser window. I can fix that by making sure we check the UseRemoteTabs method on the LoadContext of the new window.

Now I've got a new problem… the page loads, but it's loading the document in the chrome window docshell, so the whole browser shows the webpage! No browser chrome - I mean we've got web content all over the place:



That's the whole browser window. That's some bad news bears right there.

The problem.. ohh boy. It looks like the problem is here, in OpenWindowInternal (again), right after we open the new window:

if (newChrome) {
/* It might be a chrome nsXULWindow, in which case it won't have
an nsIDOMWindow (primary content shell). But in that case, it'll
be able to hand over an nsIDocShellTreeItem directly. */

nsCOMPtr<nsIDOMWindow> newWindow(do_GetInterface(newChrome));
if (newWindow)
GetWindowTreeItem(newWindow, getter_AddRefs(newDocShellItem));
if (!newDocShellItem)
newDocShellItem = do_GetInterface(newChrome);
if (!newDocShellItem)
rv = NS_ERROR_FAILURE;
}

So newChrome is defined, and we GetInterface an nsIDOMWindow from newChrome. The problem is that newChrome doesn't GetInterface us to an nsIDOMWindow properly, and so we end up getting the DocShell for the entire chrome window and setting that as the newDocShellItem. Oh snap!

The reason that we can't get an nsIDOMWindow from newChrome is because at this point when the window is spawning, we've destroyed the mPrimaryContentShell because the onLoad handler in browser.js has set the remoteness of the lone browser to be  true. This destroys the content shell. The GetInterface defined on nsXULWindow is depending on a mPrimaryContentShell existing in order to get that nsIDOMWindow.

So the problem, it seems, is that we've set the remoteness of the new window browser too early for the nsWindowWatcher to respond correctly.

Straight-up not setting the remoteness of that first browser in onLoad does not solve the problem. It moves it instead:

[Parent 31672] WARNING: NS_ENSURE_TRUE(xulWin->mDocShell->GetOpenedRemote()) failed: file /Users/mikeconley/Projects/mozilla-central/xpfe/appshell/nsXULWindow.cpp, line 1812

Garrrrrrrrrrr. The problem here is that window.open from the content process is counting on creating that first remote browser in a new window to be synchronous. That's why we set remoteness in the onLoad handler (which runs in a nested event loop when the XUL window is opened).

Basically, I seem to need to differentiate between a window that has been opened from the content process (which needs remoteness to be set synchronously), or from the parent process, where remoteness will be set when the DocShell starts to load some URI.

I could (ab)use the "remote" argument passed to windows - meaning that if "remote" is passed, that means that the first browser in the window is forced to be remote while loading. Otherwise, the first browser will start non-remote (but might get updated once it starts to load some URI).

I think I need to talk to smaug. Let me try to formulate this into a good question for him and ask him tomorrow.

Context: It looks like the simplest solution for bug 1047603 is to make it so that when we open a new tab, that we force it to initialize as a non-remote browser. The DocShell hook that Mossop added in bug 999239 will notice it browsing to a non-blacklisted URL, and will then flip the remoteness of the browser.

The problem is when we have browser.link.open_newwindow set to 2. In that setting, we open a new window instead of a new tab. browser.js, in the onLoad method of gBrowserInit (which runs when a new window is opened) notices if we've been set to load remote tabs in a window from the load context. Then, it finds the first browser and updates the remoteness of it to true. That way, when we click on a window.open link, we're able to return the PBrowser of the first created remote window back to the caller.When we have browser.link.open_newwindow set to 2, in order to make use of Mossop's patch, the initial browser needs to be non-remote. If we don't, then when nsWindowWatcher::OpenWindowInternal returns from opening the window and attempts to GetInterface the nsIDOMWindow of the newly created window, it doesn't find one, and instead gives back the root chrome DocShell to the caller of OpenWindowInternal, which causes webcontent to display in the root window. Youch!

NEEDINFO TO SMAUG:

So this last patch seems to fix things for the case where we're opening a target="_blank" or window.open link from a non-remote tab in an e10s window.

It, however, is pretty horribly broken if we attempt to do this with browser.link.open_newwindow set to 2 (so that we open links in new windows instead of new tabs).

Here's what happens in that case:


The content gets loaded into the chrome docshell. :/

The code that does it is in here in nsWindowWatcher::OpenWindowInternal:

if (newChrome) {
/* It might be a chrome nsXULWindow, in which case it won't have
an nsIDOMWindow (primary content shell). But in that case, it'll
be able to hand over an nsIDocShellTreeItem directly. */
nsCOMPtr<nsIDOMWindow> newWindow(do_GetInterface(newChrome));
if (newWindow)
GetWindowTreeItem(newWindow, getter_AddRefs(newDocShellItem));
if (!newDocShellItem)
newDocShellItem = do_GetInterface(newChrome);
if (!newDocShellItem)
rv = NS_ERROR_FAILURE;
}

The problem is that we're not getting newWindow from newChrome - GetInterface is failing.

The reason that GetInterface is failing is that the initial browser is being forced to be non-remote during the loading of the window, right here:


Forcing non-remoteness removes the DocShell, which clears out the mPrimaryContentShell from the nsXULWindow, and that mPrimaryContentShell is what we look to in order to GetInterface an nsIDOMWindow from newChrome (which is an nsWebShellWindow).

And the reason we do that is because the code I added in bug 989501 wants to have new windows opened from remote tabs to have a remote browser in it right away so that we can pass the PBrowser back to the caller synchronously.


So until I hear back from smaug, the question is:

How do I reconcile the need to make the initial browser remote in some cases, and not remote in others? I guess the first step is to identify the cases in which I need one, and what cases I need the other. Then I need to check my assumptions - is it true that I must have things separated this way?

Let me write down my current reckoning, and then check:

Need the initial browser to be remote
  • Anything that goes through TabParent::AnswerCreateWindow in an e10s window
    • Why? This is because the caller is expecting us to pass back the PBrowser for the newly opened window synchronously.

Need the initial browser to be not remote
  • Anything that goes through nsWindowWatcher::OpenWindowInternal in an e10s window in the parent process (so a non-remote tab in an e10s window)
    • Why? This is because the OpenWindowInternal code is expecting to get an nsIDOMWindow back that it can direct to a particular URI.

There's a symmetry here. Both processes want an nsIDOMWindow back synchronously.

Is it as simple as having gBrowserInit somehow being smarter, and knowing whether or not the initial browser should be remote or not?

Test: Have gBrowserInit not set remote to true on the initial browser, and see if target="_blank" will work for new windows.

No - this doesn't work. Not clear why yet...

Ah, we fail an assertion: NS_ENSURE_STATE(xulWin->mDocShell->GetOpenedRemote());

Yes, if I bypass that assertion if aOpeningTab, then this works.

So, right - one solution is to make gBrowserInit somehow know that it needs to initialize that first browser as remote or not.

Ideas to solve this:
  1. Add a new chromeFlag to a window, like, init-as-remote or something along those lines. gBrowserInit reads that and uses it to determine if the first browser should be remote or not. TabParent::AnswerCreateWindow sets that chrome flag.
  2. Add a new method to nsIXULBrowserWindow to force initial remoteness and have TabParent call this.

Oooohhh - I actually kind of like solution #2. nsIXULBrowserWindow straight-up designed to be called from Gecko to interact with the browser UI.

So let's think this through...

Suppose I add a method to nsIXULBrowserWindow.idl called forceInitialBrowserRemoteness. TabParent::AnswerCreateWindow calls this after OpenWindow2, to force remoteness, which stashes the nsITabParent of the now remote browser into the docshell...

That way, by default we don't set the initial browser to be remote - we let Mossop's patch do the magic to determine if a URL is supposed to be in a remote browser or not.

I like this - at least a little bit. Let's try it.

So there are a few problems - the first is that nsXULWindow::CreateNewContentWindow is expecting GetOpenedRemote to be set on the root docshell of the new window, which is before TabParent::AnswerCreateWindow has a chance to call forceInitialBrowserRemoteness.

There's also the fact that getting to forceInitialBrowserRemoteness is a pain in the ass - I have to do a static_cast it seems, and that's always a bad smell.

nsIBrowserDOMWindow might be easier to get to. So that's a thing to keep in mind.

Ok…. ok, I've been able to simplify getting to the nsIXULBrowserWindow without casting, so that's good. And I've got things working...

OMG - this seems to work. Have I solved it?? HAVE I SOLVED IT??

I need to run these tests:

browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js
browser/base/content/test/general/browser_bug562649.js FAILS
dom/tests/browser/browser_test_new_window_from_content.js FAILS

Hm - the test for browser_bug562649.js and browser_test_new_window_from_content.js fails with --e10s… strange.

It looks like browser_test_new_window_from_content.js wasn't ever really tested with --e10s. What the hell.

Get browser_test_new_window_from_content.js working

Let's give this an hour or two and see where I get.

Ok, fixed! It turns out QI'ing a nsIDocShellTreeOwner to an nsIXULWindow only sometimes works - using do_GetInterface is much better. It does the right thing.

Also had to make it so that we could handle setting up the tab modal prompts asynchronously.

Next is browser_bug562649.js.

This is what I get when I enable and run the test with --e10s:

27 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | window is busy loading a page -
Stack trace:
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug562649.js:test:8
chrome://mochikit/content/browser-test.js:Tester_execTest:696
chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:593
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:750
null:null:0
28 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | userTypedValue matches test URI - Got null, expected data:text/plain,bug562649
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug562649.js:test:9
chrome://mochikit/content/browser-test.js:Tester_execTest:696
chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:593
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:750
null:null:0
29 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI - Got , expected data:text/plain,bug562649
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug562649.js:test:10
chrome://mochikit/content/browser-test.js:Tester_execTest:696
chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:593
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:750
null:null:0
30 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | userTypedValue matches test URI after switching tabs - Got null, expected data:text/plain,bug562649
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug562649.js:test:14
chrome://mochikit/content/browser-test.js:Tester_execTest:696
chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:593
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:750
null:null:0
31 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug562649.js | location bar value matches test URI after switching tabs - Got , expected data:text/plain,bug562649
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug562649.js:test:15
chrome://mochikit/content/browser-test.js:Tester_execTest:696
chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest</<:593
chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:750
null:null:0

The isBusy thing was always a problem according to the comment that disabled the test, so I'll comment that part out right now and come back to it.

The userTypedValue stuff is more interesting.. it's being properly set in "addTab", but by the time the test checks it, it's null. So something's gone and cleared it out. I wonder what?

There are 6 places that can happen. Putting breakpoints at each point now.

I *think* it's happening in LoadInOtherProcess, and is probably because the new tab that we're creating is… wait, is it being forced to be non-remote at first? Is that what's happening?

Ah, yes. Ok. So this test manually calls browserDOMWindow::OpenURI. I think I am being too aggressive with making that new tab not remote. I can key it off opener instead.

And that fixes things! Let's run the tests again!

browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js (no e10s)
browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js (e10s)
browser/base/content/test/general/browser_bug562649.js (no e10s)
browser/base/content/test/general/browser_bug562649.js (e10s)
dom/tests/browser/browser_test_new_window_from_content.js  (no e10s)
dom/tests/browser/browser_test_new_window_from_content.js  (e10s)

\o/

All passing! We fucking did it!

Test with variations on browser.link.open_newwindow
Get browser_test_new_window_from_content.js working? (Or maybe split that out into a separate bug…)
Get browser_bug562649.js working? (Or maybe split that out into a separate bug…)
Get rid of conflicting stuff in my patches
Try build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=427c7874fe7b
Post review request - using MozReview because I have multiple patches: https://reviewboard.mozilla.org/r/2461/

GRAHHHH - BULLSHIT. I have some failing tests on my try push. :/

Strangely, a lot of them center on Linux, so I'm building on my VM right now. While I wait for that, there are some other failures I can look at.

Let me put together a list:

bc1
  • TEST-UNEXPECTED-TIMEOUT | browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js
    • This appears to affect Linux opt/debug, both 32 and 64 bit. It's even causing a crash.

Ah hah - ok, this one is easy. We seem to display an alert when we try to open remote tabs from a session that doesn't have e10s enabled by default on Linux. I think I can safely disable this test for non-e10s.

bc1 (e10s)
  • TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug491431.js | Found an unexpected tab at the end of test run: about:blank
    • Affects Linux and OS X and Windows 7 on opt builds
    • This seems like legitimate breakage
  • TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug609700.js | Test timed out
    • Affects Linux and OS X and Windows 7 on opt builds
    • This seems like legitimate breakage
  • TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug623155.js | A promise chain failed to handle a rejection: - at chrome://global/content/bindings/browser.xml:398 - TypeError: this.docShell is null
    • Affects Linux and OS X and Windows 7 on opt builds
    • This appears to be running fine with --e10s… perhaps this is broken just because the previous two tests broke.

I've got some devtools failures, but they look like unrelated bug 1110110 .

Ok, so I've been able to eliminate a bunch of test failures due to our crappy alert message about OMTC (I'm curious to know how e10s on Linux is working at all if OMTC is disabled!)… actually, it looks as if what's going on is that OMTC is force-enabled if browser.tabs.remote.autostart.1 is set… or something.

Whatever. I'm more interested in this legitimate breakage I'm seeing.

browser/base/content/test/general/browser_bug491431.js

What does this test do? Luckily, it's a short one. Looks like it:

  1. Opens a dataURL in a new tab A
  2. Removes the tab A
  3. When the tab A has been closed, add a new tab B at the same URL
  4. Move that new tab B to a new window
  5. When the original tab B closes, make sure a detail was passed to the TabClose event saying that it was a move to a new window
  6. Close the new window

This is in the e10s run, which isn't in the non-e10s run:

JavaScript error: chrome://global/content/bindings/browser.xml, line 1104: NS_ERROR_NOT_IMPLEMENTED:

So that's interesting. What is that? That's the swapFrameloaders call. Ah hah...

Yes, ok - dragging a tab out used to kind of work, and with my patch, it very much doesn't. Let me confirm that. I did this confirm this.

So this is a problem because when gBrowser.replaceTabWithWindow(tabB); is called in the test, we open a new browser window, and that browser window is passed tabB as an argument, and then we run the following code in _delayedStartup:

else if (uriToLoad instanceof XULElement) {
// swap the given tab with the default about:blank tab and then close
// the original tab in the other window.

// Stop the about:blank load
gBrowser.stop();
// make sure it has a docshell
gBrowser.docShell;

gBrowser.swapBrowsersAndCloseOther(gBrowser.selectedTab, uriToLoad);

The problem is that the new window that opens attempts to swap the frameloader of its initial browser with the frameloader of a remote browser, but the initial browser in the new window is not remote. We cannot currently swap frameloaders between remote and non-remote browsers. We therefore throw NS_ERROR_NOT_IMPLEMENTED, and bob's your uncle.

The solution is to make that newly opened browser window have a remote browser that we can swap with right away. The question is where's the best place to do that?

I found a solution by putting a check for a remote browser in the XULElement handler for arguments in delayedStartup that forces the initial browser to be remote if we're tearing out a remote browser to this new window.

Tests pass!

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7fddf255a2a4

Ahh, damn it - another test failure. :( But this is why we do try builds, so :)

My guess is I didn't see this one before because my fix for the last test failures opened it. This one is:

browser/base/content/test/general/browser_bug880101.js:

29 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug880101.js | Received onLocationChange for correct URL - Got about:blank, expected http://example.com/browser/browser/base/content/test/general/dummy_page.html
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug880101.js:test/listener.onLocationChange:12
chrome://browser/content/tabbrowser.xml:_callProgressListeners/<:527
self-hosted:forEach:173
chrome://browser/content/tabbrowser.xml:_callProgressListeners:524
chrome://browser/content/tabbrowser.xml:mTabProgressListener/<._callProgressListeners:587
chrome://browser/content/tabbrowser.xml:mTabProgressListener/<.onLocationChange:808
resource://gre/modules/RemoteWebProgress.jsm:RemoteWebProgressManager.prototype._callProgressListeners:126
resource://gre/modules/RemoteWebProgress.jsm:RemoteWebProgressManager.prototype.receiveMessage:187
null:null:0
30 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug880101.js | leaked until shutdown [nsGlobalWindow #17 inner chrome://browser/content/browser.xul about:blank] - expected PASS
31 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug880101.js | leaked until shutdown [nsGlobalWindow #16 outer  about:blank] - expected PASS

So onLocationChange is getting about:blank when we should be getting the dummy_page.html. That's probably because as soon as the window opens, we browse it to about:blank as we transition it to a remote browser.

Grarrrrrr...

So, those leaks I'm seeing at the bottom are, at least, not caused by me it seems. Those leaks exist on tip as well.

I wonder if this is just a matter of not waiting for Mossop's hook to update the remoteness of the initial tab, but doing it in delayedStartup instead if gMultiProcessBrowser is true. Let's try that.

Yes, that seems to fix this. \o/

Now… all over again, let's run the tests.

LKSDJFGKL:JSDGKLDSJG - NO, this breaks browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js. :( In fact, we regress opening new window links with this change. So that's no good.

Maybe a better plan is to make it so that updating the remoteness of a browser does not cause location change events to fire.

AW SHIT

I've made a critical error. I haven't been basing any of this stuff on billm's work in bug 567058 . He even warned me about it, and I agreed to base my work off if it! What happened?? :(((

I might have to re-conceive parts of my solution here.

Let's apply billm's patches and try to figure out what I can salvage. *sigh*.

OK, applied. Now let's see what his patches change, and see how that re-orients my solutions.

billm's patch reorients things - before, AnswerCreateWindow would make a new TabParent and send the PBrowser down to the child to deal with. Now, the child constructs a TabChild and sends UP a PBrowser for the parent process to know about.

So there's some cleverness to create that PBrowser in the child, and stash it for later use.

Ah, interesting - so it's a bit racy, but basically we say - whenever we open a new window from the child, we create a new TabChild, send the PBrowser up to the parent, get the parent to store the TabParent, and the next time some browser goes remote, we use the stashed TabParent. Seems kinda race-y, but OK. Ah, he adds an assert for assuredness.

So I think my patch to use forceInitialBrowserRemote is still going to work. It's going to need to be re-based, but I think the procedure will work. Instead of getting the nsITabParent from being stashed in the docshell, we have forceInitialBrowserRemote return it to us.

Y'know, we might be OK here.

I'm going to start re-basing these commits one by one and see what happens.

Part 1 rebased fine. On to part 2 - which will likely break...

HOLY SHIT it merged all by itself. That's incredible. Wow. hg, you're awesome.

Building now. Let's see if this broke any new stuff. Wow. I expected that to go a lot worse.

OK, wow - stuff is working. That's incredible. I can't believe it.

Let's run the tests again.

browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js (no e10s)
browser/base/content/test/general/browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js (e10s)
browser/base/content/test/general/browser_bug562649.js (no e10s)
browser/base/content/test/general/browser_bug562649.js (e10s)
dom/tests/browser/browser_test_new_window_from_content.js  (no e10s)
dom/tests/browser/browser_test_new_window_from_content.js  (e10s) CRASH
browser/base/content/test/general/browser_bug880101.js (no e10s)
browser/base/content/test/general/browser_bug880101.js (e10s) FAILED
browser/base/content/test/general/browser_bug491431.js (no e10s)
browser/base/content/test/general/browser_bug491431.js (e10s)
browser/base/content/test/general/browser_bug609700.js (no e10s)
browser/base/content/test/general/browser_bug609700.js (e10s)

dom/tests/browser/browser_test_new_window_from_content.js  --e10s crashes!

I wonder if this is from billm's patch, or if it's something I did. Ah - ok, it crashes with just billm's patch.

So I'll file a follow-up bug to fix browser_testOpenNewRemoteTabsFromNonRemoteBrowsers.js for e10s, and I won't run that test anymore.

browser/base/content/test/general/browser_bug880101.js (e10s) failure

This is the same one I was dealing with at the end of day yesterday. It looks as if the tab progress listener for the tab that the test is checking is firing onLocationChange with about:blank when it's not supposed to. There seems to be provisions for that in mTabProgressListener with an mBlank property being true by default so that the initial load of about:blank is ignored.

Bad run:

2 INFO TEST-START | browser/base/content/test/general/browser_bug880101.js
State change called - flags: 65552 -- status: 2152398850 -- mBlank: true
State change called - flags: 983041 -- status: 0 -- mBlank: true
State change called - location: about:blank -- flags: 0 -- mBlank: true
State change called - flags: 786448 -- status: 0 -- mBlank: true
this.mBlank set to false
UPDATING REMOTENESS NOW
DONE UPDATING REMOTENESS

State change called - flags: 983041 -- status: 0 -- mBlank: true
State change called - flags: 786448 -- status: 2152398850 -- mBlank: true
this.mBlank set to false
State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - flags: 786448 -- status: 2152398850 -- mBlank: false
State change called - location: about:blank -- flags: 0 -- mBlank: false

5 INFO TEST-PASS | browser/base/content/test/general/browser_bug880101.js | Received onLocationChange from top frame
6 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_bug880101.js | Received onLocationChange for correct URL - Got about:blank, expected http://example.com/browser/browser/base/content/test/general/dummy_page.html
Stack trace:
chrome://mochikit/content/browser-test.js:test_is:851
chrome://mochitests/content/browser/browser/base/content/test/general/browser_bug880101.js:test/listener.onLocationChange:12
chrome://browser/content/tabbrowser.xml:_callProgressListeners/<:527
self-hosted:forEach:173
chrome://browser/content/tabbrowser.xml:_callProgressListeners:524
chrome://browser/content/tabbrowser.xml:mTabProgressListener/<._callProgressListeners:587
chrome://browser/content/tabbrowser.xml:mTabProgressListener/<.onLocationChange:813
resource://gre/modules/RemoteWebProgress.jsm:RemoteWebProgressManager.prototype._callProgressListeners:126
resource://gre/modules/RemoteWebProgress.jsm:RemoteWebProgressManager.prototype.receiveMessage:187
null:null:0
7 INFO MEMORY STAT vsize after test: 3292430336
8 INFO MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
9 INFO MEMORY STAT residentFast after test: 242262016
10 INFO MEMORY STAT heapAllocated after test: 109422776
11 INFO TEST-OK | browser/base/content/test/general/browser_bug880101.js | took 407ms


Good run (which forces remoteness on initial browser to be remote):

4 INFO TEST-START | browser/base/content/test/general/browser_bug880101.js
State change called - flags: 65552 -- status: 2152398850 -- mBlank: true
State change called - flags: 983041 -- status: 0 -- mBlank: true
State change called - location: about:blank -- flags: 0 -- mBlank: true
State change called - flags: 786448 -- status: 0 -- mBlank: true
this.mBlank set to false
UPDATING REMOTENESS NOW
DONE UPDATING REMOTENESS

State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - flags: 786448 -- status: 2152398850 -- mBlank: false
State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - location: http://example.com/browser/browser/base/content/test/general/dummy_page.html -- flags: 0 -- mBlank: false

7 INFO TEST-PASS | browser/base/content/test/general/browser_bug880101.js | Received onLocationChange from top frame
8 INFO TEST-PASS | browser/base/content/test/general/browser_bug880101.js | Received onLocationChange for correct URL
9 INFO MEMORY STAT vsize after test: 3277836288
10 INFO MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
11 INFO MEMORY STAT residentFast after test: 243974144
12 INFO MEMORY STAT heapAllocated after test: 109739784
13 INFO TEST-OK | browser/base/content/test/general/browser_bug880101.js | took 410ms


So, extraction - here's the broken case:

State change called - flags: 983041 -- status: 0 -- mBlank: true
State change called - flags: 786448 -- status: 2152398850 -- mBlank: true
this.mBlank set to false
State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - flags: 786448 -- status: 2152398850 -- mBlank: false
State change called - location: about:blank -- flags: 0 -- mBlank: false

and the good case:

State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - flags: 786448 -- status: 2152398850 -- mBlank: false
State change called - flags: 983041 -- status: 0 -- mBlank: false
State change called - location: http://example.com/browser/browser/base/content/test/general/dummy_page.html -- flags: 0 -- mBlank: false

It'd be good if I had a sense of what these flags and statuses were.

const unsigned long STATE_START          = 0x00000001;
const unsigned long STATE_REDIRECTING    = 0x00000002;
const unsigned long STATE_TRANSFERRING   = 0x00000004;
const unsigned long STATE_NEGOTIATING    = 0x00000008;
const unsigned long STATE_STOP           = 0x00000010;

const unsigned long STATE_IS_REQUEST     = 0x00010000;
const unsigned long STATE_IS_DOCUMENT    = 0x00020000;
const unsigned long STATE_IS_NETWORK     = 0x00040000;
const unsigned long STATE_IS_WINDOW      = 0x00080000;

const unsigned long STATE_IS_INSECURE     = 0x00000004;
const unsigned long STATE_IS_BROKEN       = 0x00000001;
const unsigned long STATE_IS_SECURE       = 0x00000002;

const unsigned long STATE_BLOCKED_MIXED_ACTIVE_CONTENT  = 0x00000010;
const unsigned long STATE_LOADED_MIXED_ACTIVE_CONTENT   = 0x00000020;

const unsigned long STATE_BLOCKED_MIXED_DISPLAY_CONTENT = 0x00000100;
const unsigned long STATE_LOADED_MIXED_DISPLAY_CONTENT  = 0x00000200;

const unsigned long STATE_BLOCKED_TRACKING_CONTENT = 0x00001000;
const unsigned long STATE_LOADED_TRACKING_CONTENT  = 0x00002000;

const unsigned long STATE_SECURE_HIGH     = 0x00040000;
const unsigned long STATE_SECURE_MED      = 0x00010000;
const unsigned long STATE_SECURE_LOW      = 0x00020000;

const unsigned long STATE_IDENTITY_EV_TOPLEVEL    = 0x00100000;

const unsigned long STATE_USES_SSL_3                = 0x01000000;
const unsigned long STATE_USES_WEAK_CRYPTO          = 0x02000000;

983041 is:
11110000000000000001

786448 is:
11000000000000010000

2152398850 = NS_BINDING_ABORTED
The async request failed because it was aborted by some user action. (probably switching to remote)

The bad case seems to have an extra NS_BINDING_ABORTED being thrown. Remember in that case, I think Mossop's code is running, so we might be aborting the connection to the URI in such a way that…guh, I don't know. :(


COMMON STATE CHANGES

State change called - flags: 65552 -- status: 2152398850 -- mBlank: true
#22     0x0000000106f617ff in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) at /Users/mikeconley/Projects/mozilla-central/js/src/jsapi.cpp:4568
#23     0x000000010248a678 in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) at /Users/mikeconley/Projects/mozilla-central/js/xpconnect/src/XPCWrappedJSClass.cpp:1205
#24     0x0000000102484789 in nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) at /Users/mikeconley/Projects/mozilla-central/js/xpconnect/src/XPCWrappedJS.cpp:532
#25     0x000000010172d829 in PrepareAndDispatch at /Users/mikeconley/Projects/mozilla-central/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_darwin.cpp:122
#26     0x000000010172c28b in SharedStub ()
#27     0x0000000105980540 in nsBrowserStatusFilter::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp:172
#28     0x0000000105980695 in _ZThn8_N21nsBrowserStatusFilter13OnStateChangeEP14nsIWebProgressP10nsIRequestj12tag_nsresult at /Users/mikeconley/Projects/mozilla-central/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp:177
#29     0x000000010271fd4b in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:1269
#30     0x000000010271f3ff in nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:1232
#31     0x000000010271f4f9 in nsDocLoader::doStopURLLoad(nsIRequest*, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:807
#32     0x000000010271f22e in nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:613
#33     0x000000010271f64d in _ZThn8_N11nsDocLoader13OnStopRequestEP10nsIRequestP11nsISupports12tag_nsresult at /Users/mikeconley/Projects/mozilla-central/obj-debug/uriloader/base/Unified_cpp_uriloader_base0.cpp:628
#34     0x000000010185de24 in nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/netwerk/base/src/nsLoadGroup.cpp:685
#35     0x000000010185c490 in nsLoadGroup::Cancel(tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/netwerk/base/src/nsLoadGroup.cpp:290
#36     0x000000010271de03 in nsDocLoader::Stop() at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:262
#37     0x00000001055af855 in nsDocShell::Stop() at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.h:188
#38     0x000000010555d549 in nsDocShell::Stop(unsigned int) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:5430
#39     0x00000001055723f0 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:10109
#40     0x000000010556dae0 in nsDocShell::LoadURI(nsIURI*, nsIDocShellLoadInfo*, unsigned int, bool) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:1634
#41     0x0000000105572d8e in _ZThn424_N10nsDocShell7LoadURIEP6nsIURIP19nsIDocShellLoadInfojb at /Users/mikeconley/Projects/mozilla-central/obj-debug/docshell/base/Unified_cpp_docshell_base0.cpp:1637
#42     0x0000000102edebfb in nsFrameLoader::ReallyStartLoadingInternal() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:461
#43     0x0000000102ede039 in nsFrameLoader::ReallyStartLoading() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:311
#44     0x0000000102e5d46b in nsDocument::MaybeInitializeFinalizeFrameLoaders() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsDocument.cpp:7311
#45     0x000000010496b39c in mozilla::dom::XULDocument::DoneWalking() at /Users/mikeconley/Projects/mozilla-central/dom/xul/XULDocument.cpp:3094
#46     0x000000010496ba0f in mozilla::dom::XULDocument::StyleSheetLoaded(mozilla::CSSStyleSheet*, bool, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/dom/xul/XULDocument.cpp:3172
#47     0x000000010496ba74 in _ZThn1904_N7mozilla3dom11XULDocument16StyleSheetLoadedEPNS_13CSSStyleSheetEb12tag_nsresult at /Users/mikeconley/Projects/mozilla-central/obj-debug/dom/xul/Unified_cpp_dom_xul0.cpp:3177
#48     0x0000000104cc597c in mozilla::css::Loader::SheetComplete(mozilla::css::SheetLoadData*, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:1790
#49     0x0000000104cc0f7d in mozilla::css::Loader::HandleLoadEvent(mozilla::css::SheetLoadData*) at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:2424
#50     0x0000000104cc0eb1 in mozilla::css::SheetLoadData::Run() at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:431

State change called - flags: 983041 -- status: 0 -- mBlank: true
#27     0x0000000105980540 in nsBrowserStatusFilter::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp:172
#28     0x0000000105980695 in _ZThn8_N21nsBrowserStatusFilter13OnStateChangeEP14nsIWebProgressP10nsIRequestj12tag_nsresult at /Users/mikeconley/Projects/mozilla-central/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp:177
#29     0x000000010271fd4b in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:1269
#30     0x000000010271f3ff in nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:1232
#31     0x000000010271ec08 in nsDocLoader::doStartDocumentLoad() at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:767
#32     0x000000010271ea0b in nsDocLoader::OnStartRequest(nsIRequest*, nsISupports*) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsDocLoader.cpp:470
#33     0x000000010271ed57 in _ZThn8_N11nsDocLoader14OnStartRequestEP10nsIRequestP11nsISupports at /Users/mikeconley/Projects/mozilla-central/obj-debug/uriloader/base/Unified_cpp_uriloader_base0.cpp:482
#34     0x000000010185d47c in nsLoadGroup::AddRequest(nsIRequest*, nsISupports*) at /Users/mikeconley/Projects/mozilla-central/netwerk/base/src/nsLoadGroup.cpp:563
#35     0x000000010183b9b8 in nsBaseChannel::AsyncOpen(nsIStreamListener*, nsISupports*) at /Users/mikeconley/Projects/mozilla-central/netwerk/base/src/nsBaseChannel.cpp:649
#36     0x000000010183ba27 in _ZThn80_N13nsBaseChannel9AsyncOpenEP17nsIStreamListenerP11nsISupports at /Users/mikeconley/Projects/mozilla-central/obj-debug/netwerk/base/src/Unified_cpp_netwerk_base_src1.cpp:654
#37     0x0000000102725cbe in nsURILoader::OpenURI(nsIChannel*, unsigned int, nsIInterfaceRequestor*) at /Users/mikeconley/Projects/mozilla-central/uriloader/base/nsURILoader.cpp:834
#38     0x00000001055738cc in nsDocShell::DoChannelLoad(nsIChannel*, nsIURILoader*, bool) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:10787
#39     0x000000010559d850 in nsDocShell::DoURILoad(nsIURI*, nsIURI*, bool, unsigned int, nsISupports*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, bool, nsIDocShell**, nsIRequest**, bool, bool, bool, nsAString_internal const&, nsIURI*, unsigned int) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:10616
#40     0x0000000105572b55 in nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:10186
#41     0x000000010556dae0 in nsDocShell::LoadURI(nsIURI*, nsIDocShellLoadInfo*, unsigned int, bool) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:1634
#42     0x0000000105572d8e in _ZThn424_N10nsDocShell7LoadURIEP6nsIURIP19nsIDocShellLoadInfojb at /Users/mikeconley/Projects/mozilla-central/obj-debug/docshell/base/Unified_cpp_docshell_base0.cpp:1637
#43     0x0000000102edebfb in nsFrameLoader::ReallyStartLoadingInternal() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:461
#44     0x0000000102ede039 in nsFrameLoader::ReallyStartLoading() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsFrameLoader.cpp:311
#45     0x0000000102e5d46b in nsDocument::MaybeInitializeFinalizeFrameLoaders() at /Users/mikeconley/Projects/mozilla-central/dom/base/nsDocument.cpp:7311
#46     0x000000010496b39c in mozilla::dom::XULDocument::DoneWalking() at /Users/mikeconley/Projects/mozilla-central/dom/xul/XULDocument.cpp:3094
#47     0x000000010496ba0f in mozilla::dom::XULDocument::StyleSheetLoaded(mozilla::CSSStyleSheet*, bool, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/dom/xul/XULDocument.cpp:3172
#48     0x000000010496ba74 in _ZThn1904_N7mozilla3dom11XULDocument16StyleSheetLoadedEPNS_13CSSStyleSheetEb12tag_nsresult at /Users/mikeconley/Projects/mozilla-central/obj-debug/dom/xul/Unified_cpp_dom_xul0.cpp:3177
#49     0x0000000104cc597c in mozilla::css::Loader::SheetComplete(mozilla::css::SheetLoadData*, tag_nsresult) at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:1790
#50     0x0000000104cc0f7d in mozilla::css::Loader::HandleLoadEvent(mozilla::css::SheetLoadData*) at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:2424
#51     0x0000000104cc0eb1 in mozilla::css::SheetLoadData::Run() at /Users/mikeconley/Projects/mozilla-central/layout/style/Loader.cpp:431

State change called - flags: 786448 -- status: 0 -- mBlank: true
0 anonymous(aWebProgress = [xpconnect wrapped nsIWebProgress @ 0x11a4651c0 (native @ 0x11ae06460)], aRequest = [xpconnect wrapped nsIRequest @ 0x11a4411c0 (native @ 0x11a017ba0)], aStateFlags = 786448, aStatus = 2152398850) ["chrome://browser/content/tabbrowser.xml":634]
this = [object Object]
1 anonymous(methodName = "onStateChange", args = [object Object],[object Object],786448,2152398850, [object Object], 786448, 2152398850) ["resource://gre/modules/RemoteWebProgress.jsm":126]
this = [object Object]
2 anonymous(aMessage = [object Object]) ["resource://gre/modules/RemoteWebProgress.jsm":167]
this = [object Object]

We just happen to get another onLocationChange with my patch for some reason. It's being sent up as a message from the content process.

Is the content process sending more onLocationChange in the bad case, or am I just responding to it more in the bad case?

It's sending it more in the child process. Here's why:

0 onLocationChange(aWebProgress = [xpconnect wrapped (nsISupports, nsIDocShell, nsIInterfaceRequestor, nsIWebNavigation, nsILoadContext, nsIWebProgress) @ 0x11a28a040 (native @ 0x118f87008)], aRequest = null, aLocationURI = [xpconnect wrapped nsIURI @ 0x11a2a3be0 (native @ 0x11a26f470)], aFlags = 0) ["chrome://global/content/browser-child.js":146]
this = [object Object]
1 anonymous(epoch = 1, tabData = [object Object], reloadCallback = () => {
// Inform SessionStore.jsm about the reload. It will send
// restoreTabContent in response.
sendAsyncMessage("SessionStore:reloadPendingTab", {epoch: data.epoch});
}) ["resource:///modules/sessionstore/ContentRestore.jsm":139]
this = [object Object]
2 anonymous( = [object Object]) ["chrome://browser/content/content-sessionStore.js":129]
this = [object Object]

AHHHHH - Ok, I have a theory.

Mossop's patch for bug 999239 makes it so that when you transition from remote/non-remote or vice versa, the history gets updated in the browser that you transition to. We're doing that when we transition from about:blank to the target URI in the test, and when the history gets updated in the newly remote browser, we hear the onLocationChange for the first document, which is about:blank.

I THINK. Let me see if I can pick this apart.

Jesus, this is so complicated. :/

This is what causes restoreTab to be called:

restoreTab called with stack: restoreTab@resource:///modules/sessionstore/SessionStore.jsm:2527:56
ssi_setTabState@resource:///modules/sessionstore/SessionStore.jsm:1608:5
ss_restoreTabAndLoad@resource:///modules/sessionstore/SessionStore.jsm:209:1
LoadInOtherProcess@chrome://browser/content/browser.js:10311:5
_loadURIWithFlags@chrome://browser/content/browser.js:10280:7
loadURIWithFlags@chrome://browser/content/tabbrowser.xml:5479:13
loadURIWithFlags@chrome://browser/content/tabbrowser.xml:2910:20
openLinkIn@chrome://browser/content/utilityOverlay.js:324:5
loadURI@chrome://browser/content/browser.js:11505:5
gBrowserInit._delayedStartup@chrome://browser/content/browser.js:10615:1

Window opens… we want to open test URL… about:blank gets loaded.

Can we make it so that if we're calling restoreTab or _restoreTabAndLoad or whatever, that if the tab only had about:blank in it, that we don't send yet another onLocationChange when we "restore" that tab as remote.

Huh. This appears to be a problem in general when changing remoteness - we _always_ fire an about:blank onLocationChange. I exposed this bug, but I don't think it's my bug. I think I'm just going to disable the test and see what Mossop thinks about this problem.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=567120fe01d9

Almost there...

Bug 1079617 - Intermittent browser_test_new_window_from_content.js | This test exceeded the timeout threshold. It should be rewritten or split up.

It looks like at least one of my add_tasks runs too long before completing. I need to fix that.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1eb5fceacee2

Huh. So now we're crashing in browser_test_new_window_from_content.js. *sigh*. Time to punt.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7fa311659a36

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e3eee363e50

This is maddening. Now I have a new test failing - one I didn't have back with try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=567120fe01d9 . I'm at a loss to explain it. :(

As usual, the first step is to try to reproduce it locally.

17:30:19     INFO -  1182 INFO TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_394759_behavior.js | There were 3 popup windows to repoen - Got 0, expected 3
17:30:19     INFO -  Stack trace:
17:30:19     INFO -  chrome://mochikit/content/browser-test.js:test_is:851
17:30:19     INFO -  chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_394759_behavior.js:openWindowRec:25
17:30:19     INFO -  chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_394759_behavior.js:test/openWindowRec/onTestURLLoaded/<:45
17:30:19     INFO -  chrome://mochikit/content/browser-test.js:testScope/test_executeSoon/<.run:881
17:30:19     INFO -  null:null:0
17:30:19     INFO -  1183 INFO TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_394759_behavior.js | There were 0 normal windows to repoen - Got 3, expected 0
17:30:19     INFO -  Stack trace:
17:30:19     INFO -  chrome://mochikit/content/browser-test.js:test_is:851
17:30:19     INFO -  chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_394759_behavior.js:openWindowRec:27
17:30:19     INFO -  chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_394759_behavior.js:test/openWindowRec/onTestURLLoaded/<:45
17:30:19     INFO -  chrome://mochikit/content/browser-test.js:testScope/test_executeSoon/<.run:881

Yes, I can reproduce.

In both cases the ss.getClosedWindowData() string just contains "[]", which is wrong.

When I run without --e10s, ss.getClosedWindowData() looks more like this:

[{"tabs":[{"entries":[{"url":"http://example.com/?window=0","title":"mochitest index /","ID":33,"docshellID":48,"docIdentifier":33,"persist":true}],"lastAccessed":1422240163515,"hidden":false,"attributes":{},"image":null,"index":1}],"selected":1,"_closedTabs":[],"width":1229,"height":771,"screenX":63,"screenY":44,"sizemode":"normal","title":"mochitest index /","closedAt":1422240163516},{"tabs":[{"entries":[{"url":"http://example.com/?window=1","title":"mochitest index /","ID":30,"docshellID":44,"docIdentifier":30,"persist":true}],"lastAccessed":1422240162981,"hidden":false,"attributes":{},"image":null,"index":1}],"selected":1,"_closedTabs":[],"width":1229,"height":771,"screenX":63,"screenY":44,"sizemode":"normal","title":"mochitest index /","closedAt":1422240162982},{"tabs":[{"entries":[{"url":"http://example.com/?window=2","title":"mochitest index /","ID":27,"docshellID":40,"docIdentifier":27,"persist":true}],"lastAccessed":1422240162332,"hidden":false,"attributes":{},"image":null,"index":1}],"selected":1,"_closedTabs":[],"width":1229,"height":771,"screenX":63,"screenY":44,"sizemode":"normal","title":"mochitest index /","closedAt":1422240162332}]

(at least in the "expected 3 normal windows" test).

Looks like they're never getting added to _closedWindows. Adding happens in onClosed at:

this._closedWindows.unshift(winData);

And we're not adding them because hasSaveableTabs is false, since I guess no tabs in the newly created windows fit the description of "worthy of saving".

Huh. And this is because when we call RedirectLoad for a new window, the history only contains an about:blank entry. What the hell?

So apparently, this is fine - according to Mossop, when the history is blasted down to the newly-remote browser, it's supposed to do the load, and then replace that about:blank entry with the actual history entry.

12:02 (Mossop) mconley: It should send down about:blank as the current history, plus the new URI to load (which will replace the about:blank history entry because it isn't marked to persist). Once the new URI has loaded it should just have the single history entry
12:03 (mconley) that latter half doesn't appear to be happening. Where in SessionStore does that occur?
12:05 (Mossop) mconley: Here is where we tell the child to start loading the new content: http://mxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/SessionStore.jsm#2675
12:05 (Mossop) loadArguments hold the new URI to load
12:07 (mconley) Mossop: cool. Do you know where the bit is where we swap out that about:blank entry?
12:08 JosiahOne has joined (Instantbird@ moz-mho344.biz.rr.com )
12:08 (Mossop) That's deep in docshell somewhere...
12:08 (mconley) ah, those magic words
12:09 joy has joined (Adium@ moz-m8it2e.hpl.hp.com )
12:09 (Mossop) mconley: Basically here is where the magic happens: http://mxr.mozilla.org/mozilla-central/source/docshell/shistory/src/nsSHistory.cpp#419
12:10 (mconley) OK, cool
12:10 (mconley) I can work with this
12:10 (Mossop) mconley: History entries can be marked to either persist or not, by default all persist except about:blank and about:newtab
12:10 (mconley) Mossop: thanks for the pointers, I might have a few more questions soon
12:10 (Mossop) bug 1077738 added storing the persist flag in sessionstore
12:10 (firebot) https://bugzil.la/1077738 — FIXED, dtownsend@mozilla.com — Session restored about:newtab pages are retained in session history
12:11 (Mossop) It's possible you might be hitting an edge case where that isn't working right
12:11 (mconley) possibly, yes

Hypothesis:

In a newly created window, when the content process loads the URI and sends out the load event, the content process DocShell has not yet added the loaded URI to the session history.

I will test this by running the test again, but this time, with a breakpoint on nsDocShell::AddToSessionHistory in the content process.

15:07 (___mconley) Mossop: it looks as if the docshell in the content process is successfully replacing the about:blank history entry, and sending it up to the parent process, which is stashing it in the TabState cache thing
15:07 (___mconley) Mossop: but it looks like when it comes time to collect the window state data on window close, instead of using the tabData that we sent up, we use __SS_data, which is attached to the browser
15:07 (___mconley) Mossop: which is out of date
15:08 (Mossop) ___mconley: Does that test need to call TabState.flush before checking things or something?
15:09 (___mconley) Mossop: let me check
15:10 (Mossop) Yeah it waits for the page to load, that doesn't necessarily mean the updated tab state has made it to the parent though so maybe it needs to wait for one of the session store events or force an update before proceeding
15:11 (___mconley) Mossop: well whaddya know - test passes. \o/
15:11 (Mossop) Excellent
15:11 (___mconley) Mossop: ok, so that's an acceptable solution? Force a TabState flush for the selected browser in the test before closing the window?

15:13 (Mossop) mconley: If instead of calling flush you just delay for say 500ms, does it still pass? That would verify that the state will get passed up automatically and not just because you force it
15:13 (Mossop) Not valid for a test but maybe a good manual sanity check
15:14 (mconley) testing
15:15 (mconley) Mossop: yeah, that was the dealio - waiting before closing the window also caused things to pass. \o./

AWESOME. IT WORKS. THANK FUCKING GOD.

Time for another try push.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0c5af8ead50

GAHHHHHH - still fails in some ways. FULL OF RAGE

At least it seems to be limited to Linux though. Baby steps.

Maybe wait for the SS_TabRestored event instead?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b865d2ac734a

GREEN. ALL GREEN. CRIPES. ~\o/~

Ok, pushed up for review.

Have review feedback! Also, the test was just wholesale disabled for Linux, due to bug 1126311. With that patch, and a fix for Browser:LoadURI firing before _delayedStartup on Linux, repushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=88fc7f949252

Awwwwwww yeah - full green.

I've altered my last fix there so that if Browser:LoadURI comes in before _delayedStartup, that we queue up an observer for delayed startup in that window, and cause redirection to happen immediately after. Noice.

Another try push for good measure

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ef629ecac84

ONLY ONE REVIEW TO GO. WOOOOOOOOO

Rebase and yet another one more try push for good measure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=06b84f9f9925

MOTHERFUCKER. I was backed out because of Mn-e10s failures. :((((

Failure logs: https://treeherder.mozilla.org/logviewer.html#?job_id=1891839&repo=fx-team
Treeherder.

Failing tests:

TEST-UNEXPECTED-ERROR | test_accessibility.py TestAccessibility.test_click_raises_element_not_accessible | TimeoutException: TimeoutException: Connection timed out
TEST-UNEXPECTED-ERROR | test_with_using_context.py TestSetContext.test_exception_raised_while_in_with_block_is_propagated | TimeoutException: TimeoutException: Connection timed out

Those errors seemed to show up for the patch that makes it so that the initial browser is non-remote. I think I need to train the Marionette server to check whether or not we're an e10s window, and only pass the frameloader back when the TabRemotenessChanged event has fired.

This is all kinds of bullshit. :(

The whole "switch initial browser" to remote thing seems to be burning me wrt automation and tests. I wonder if we're firing the "load" on a browser twice - once when it attempts to load a document and gets redirected to load in another process, and again when it loads the page in the other process. Is that true?



TEST-UNEXPECTED-FAIL | browser/components/sessionstore/test/browser_423132.js | A promise chain failed to handle a rejection: - at chrome://mochitests/content/browser/browser/components/sessionstore/test/browser_423132.js:61 - TypeError: cookie2 is undefined

GOOD
2 INFO TEST-START | browser/components/sessionstore/test/browser_423132.js
FIRING ss-test:loadEvent!
FIRING ss-test:loadEvent!
Adding ss-testLoadEvent message listener
FIRING ss-test:loadEvent!
FIRING ss-test:loadEvent!
Saw ss-test:loadEvent!
7 INFO TEST-PASS | browser/components/sessionstore/test/browser_423132.js | expected one cookie
8 INFO TEST-PASS | browser/components/sessionstore/test/browser_423132.js | cookie name successfully restored
9 INFO TEST-PASS | browser/components/sessionstore/test/browser_423132.js | cookie value successfully restored
10 INFO TEST-PASS | browser/components/sessionstore/test/browser_423132.js | cookie path successfully restored


BAD
2 INFO TEST-START | browser/components/sessionstore/test/browser_423132.js
FIRING ss-test:loadEvent!
FIRING ss-test:loadEvent!
Adding ss-testLoadEvent message listener
FIRING ss-test:loadEvent!
FIRING ss-test:loadEvent!
Saw ss-test:loadEvent!
7 INFO TEST-PASS | browser/components/sessionstore/test/browser_423132.js | expected one cookie

The problem actually appears to be that the cookie hasn't been set properly in the state by the time we try to gather it. What the hell?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2b2fc1ba6e9

The good news is that this solves the Marionette bug. The bad news is that while I fixed the failure in browser_423132.js, I caused a bunch of other tests to break. FUCK.

So, maybe don't modify promiseBrowserLoaded. Just special-case browser_423132.js like a chump.

After promiseBrowserLoaded, then add event listener for SSTabRestored with promiseTabRestored.

Try build

https://treeherder.mozilla.org/#/jobs?repo=tryrevision=b11b51878ac4

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2b2fc1ba6e9

OK, got the reviews I needed. Let's push to try again for good measure...

Here we go

LANDED AND MERGED. Thank god.

TODO
Address smaug's review feedback
File a bug to alter the alert dialog that tells Linux users that they can't open e10s windows because of OMTC being disabled. Make it so that there is no alert dialog if we have browser.remote.tabs.autostart.1 enabled. Filed bug 1126311
File a bug about about:blank onLocationChange being fired for every transition to remote browsers, and reactivate browser_bug880101.js - filed bug 1126316
Why is privateWindow commented out in the test?
Address Mossop's review comments