Bug 1069075 - Intermittent browser_CTP_data_urls.js | Test 3a, Waited too long for plugin to activate | Test 3c, Plugin should be activated
Ok, I can reproduce this locally using a debug Ubuntu build and running with --run-until-failure. I love that feature.
The failure always seems to be at the same place: "Test 3a" says that
it's waiting too long for a plugin to activate, and "Test 3c" checks for
an activated plugin.
I suspect that 3a is really the root cause here, and 3c is noise as a
result of 3a's failure. Let's see if we can figure out why 3a is going
wrong.
What is "Test 3a" doing? It seems to be part of a multi-part test - Test 3a, 3b, and 3c.
What's important is knowing what this test does, and what occurs
leading up to this test. The key to the failure will be in there
somewhere.
Actually, if we can narrow that down immediately, that'd be good. Let's
see if I can eliminate Test 1 and Test 2 and see if Test 3 still fails.
Ok, yes, I am able to run just Test 3a, 3b and 3c. I was able to get
the test harness to target 3a first, and I got 3c to shortcut to
finishing. So it's just these three.
Now, if I run with --run-until-failure, can I get it to fail? If so,
then the orange is likely caused by something within Test 3a. If not,
then something in the steps preceeding Test 3a is likely at fault, or at
least contributing to the problem. Both pieces of information are
useful.
AH HAH. I am
not
able to reproduce the problem with just Test 3a, 3b and 3c. So
something in either Test 1 or Test 2 is contributing. Let's see if we
can narrow that down. I will re-enable the Test 2 bits.
Huh… no failures with both Test 2 and 3 running. I'll admit I'm
surprised - I expected it to be 2, since I would have expected any
flim-flammery from 1 to have affected 2 and not 3. Interesting.
So my current hypothesis is that Test 1, in conjunction with or in spite of Test 2, is causing Test 3 to fail intermittently.
So I guess I'd better understand what the hell Test 1 and Test 2 are doing before I continue. *sigh*.
Test 1
plugin_data_url.html is loaded. This document has two links, which lead
to data URIs that show plugins (the first link shows just the test
plugin, the second link shows both the test plugin and the second test
plugin). The document also has a plugin (which I'll call X) displaying
on the page with the links.
We first make sure there's a click-to-play notification (the
doorhanger) being shown. We also test that plugin X is not activated.
We then click on data-link-1, which is the data URI showing just the single plug-in, which I'll call Y.
That's the end of Test 1a.
Test 1b starts when the data URI finishes loading and the plugin
binding has been attached. We make sure there's a click-to-play
doorhanger. We make sure that Y is not activated. Then we open the
click-to-play doorhanger, and click the primary button which "allows"
the plugin to play. We then wait for the plugin to activate.
Once the plugin is activated, we move on to Test 1c.
Test 1c clears all plugin permissions we've set up, and then re-loads plugin_data_url.html.
And that's the end of Test 1.
Test 2
plugin_data_url.html is loaded. We make sure there is a click-to-play doorhanger being displayed.
We make sure that the plugin X is not yet activated. And then we click
on data-link-2, which loads up the data URI with two plugins - which
I'll call Y and Z.
That's the end of Test 2a.
Test 2b starts with that data URI being loaded. We make sure that the
click-to-play doorhanger is still there, and we also open it up. We make
sure that both plugins are being listed in the doorhanger.
Then, we inspect those listings of plugins, and find the controls for
Y. We tell it to allow Y for now, and click on the primary button of the
doorhanger to make it so. We then wait for Y to be activated.
That's the end of Test 2b.
Test 2c starts … by also making sure that the plugin Y is activated.
Strange… and then it clears all plugin permissions, and reloads
plugin_data_url.html.
And that's it for Test 2.
Test 3
plugin_data_url.html is loaded. We make sure there's a click-to-play notification being shown.
We also make sure plugin X is not activated.
We then reshow the doorhanger, and click on the primary button ("allow") to make X active.
We then wait for X to go active.
And this is where the test fails - X never goes active.
Ok, now I understand what this test is doing. Now… now why is it failing?
Oh, interesting - now it looks like I can reproduce this with just Test
2 running before Test 3. I was doing a build in the background on my
host machine, so I'm guessing a heavily loaded CPU helped reveal that.
That's good.
Let's add some logging to PluginContent.jsm to see if we're even calling playPlugin on the thing.
Ah hah - it looks like in Test 3a, the button we're clicking in the
PopupNotification has label "OK", so we're sending the wrong message to
the content process.
Y'know, I think I actually investigated this briefly as part of bug
899347, but I think I dropped it. Let me read my notes there and extract
the important bits.
"Ah, interesting data from those logs. It looks like in the failing
case, which is rare, the button that we're pressing in the popup is "Ok"
and not "Allow now". Under what circumstances is the popup saying "OK"?
Er… interesting. None of those buttons should be "OK"… and the request
is sends down is "Block". Wtf?
Ah… that's button-accept from dialog.properties. "
So I suspect the panel is not fully initialized when we get it.
Nope, I was wrong - but I think I've narrowed it down.
The problem is that the framescript has a cache of all plugins its seen
for a page, and it only flushes it on signals from the parent when
notifications are removed. That's too late, because a notification might
be removed long after the next page has loaded, and so we'd be using
the old cache. That's why in some cases, we're sending up information on
multiple plugins when the page we're on has only ever seen one.
I'm going to modify PluginContent to flush the cache on pageshow
(pagehide is too late, since by that point, the page being loaded might
have already sent plugin events).
This seems to fix it!
HOWEVER. This seems to break browser_CTP_notificationBar, and I think
I've basically just regressed hidden plugin support, because the code
attempts to get the information out of pluginData… and it's been
cleared.
OK, I think I understand! Here's the story.
This all has to do with the lifetime of a cache inside a PluginContent
instance. That cache is called pluginData, and we use this cache to
remember what plugins are on a particular page, and map it to
information we have about that plugin.
We want that cache to clear when we move from page to page, because one
page might have completely different plugins from another.
We're currently clearing that cache when we see that the click-to-play
PopupNotification is removed. We send a message asynchronously to the
child, asking PluginContent to clear its cache.
But if we're fast enough, we have a pocket of time where we're viewing
some page, and our pluginData is out of sync, because we haven't yet
heard from the parent to clear our cache!
The natural inclination is to have the PluginContent object listen for
pagehide or pageshow, and clear the cache at that point - but both of
those points are too late. pageshow is fired _after_ the DOM is bound to
the tree, and after plugins report that they instantiate, so we might
send messages up to the parent about plugins with bad cache data just
before we clear it out.
Same goes for pagehide - it fires too late. We might have browsed to a
new page that shows some plugins, and sent messages to the parent with
bad cache data before we get cleared out.
The solution, I think, is to listen for unload events for a page, and
clear out the cache then. I also want to remove the stuff that clears
the cache when the notification is removed - with this new code, that
old stuff won't be necessary.
Ok, coming up with a patch.
Patch posted!
19 INFO TEST-PASS |
chrome://mochitests/content/browser/browser/base/content/test/plugins/browser_CTP_data_urls.js
| Test 2c, Plugin should be activated
Loading page plugin_data_url.html for next test3a!
For application/x-test found plugin
libnptest.so
PluginContent saw event type: PluginInstantiated
_showClickToPlayNotification knows about 1 plugins.
Sending request to show CTP notification from child: 7 from stack:
PluginContent.prototype._showClickToPlayNotification@resource:///modules/PluginContent.jsm:715:97
PluginContent.prototype.handleEvent@resource:///modules/PluginContent.jsm:424:7
Saw page hide!
Clearing plugin data cache! All is well!
++DOMWINDOW == 30 (0xa4d04da0) [pid = 30381] [serial = 30] [outer = 0xa4d02ac0]
PopupNotifications saw a location change
There are 1 notifications for the changing browser
After filtering, there are 0 notifications for the changing browser
Received CTP notification request 7 in parent for
http://127.0.0.1:8888
Showing a ClickToPlay notification
Creating an empty map for pluginData
PluginData has 0 items, and we received information on 2 plugins.
PluginContent saw event type: PluginClickToPlay
_showClickToPlayNotification knows about 1 plugins.
Sending request to show CTP notification from child: 8 from stack:
PluginContent.prototype._showClickToPlayNotification@resource:///modules/PluginContent.jsm:715:97
PluginContent.prototype.handleEvent@resource:///modules/PluginContent.jsm:424:7
pluginProblem_XBL_Constructor@chrome://mozapps/content/plugins/pluginProblem.xml:78:9
PluginContent.prototype.updateNotificationUI@resource:///modules/PluginContent.jsm:772:1
PluginContent.prototype.receiveMessage/<@resource:///modules/PluginContent.jsm:60:26
setTimeout_timer@resource://gre/modules/Timer.jsm:30:5
20 INFO Console message: [JavaScript Error: "The character encoding of
the HTML document was not declared. The document will render with
garbled text in some browser configurations if the document
contains characters from outside the US-ASCII range. The character
encoding of the page must be declared in the document or in the transfer
protocol." {file: "
http://127.0.0.1:8888/browser/browse
r/base/content/test/plugins/plugin_data_url.html" line: 0}]
Received CTP notification request 8 in parent for
http://127.0.0.1:8888
Showing a ClickToPlay notification
Re-using pre-existing click-to-play notifications pluginData!!
PluginData has 2 items, and we received information on 1 plugins.
There should have been a single click-to-play notification above
Nabbing click-to-play notification
21 INFO TEST-PASS |
chrome://mochitests/content/browser/browser/base/content/test/plugins/browser_CTP_data_urls.js
| Test 3a, Should have a click-to-play noti
for (let plugin of cwu.plugins) {… is where we suddenly get to the pluginProblem.xml XBL constructor…
What happened was this:
Test 2c runs, or whatever, where we've got the two plugins on screen, and we click to activate the first one.
We activate it, and wait for the plugin to marked activated. Meanwhile,
PluginContent.jsm sends a message back to browser-plugins.jsm updating
it about the plugins on the screen, because instantiated plugins require
that, I guess. But sometimes, we notice the activation, and move on
straight away - we load the next page, and THEN we see the update about
CTP and stuff and use stale plugin information.
So we wait for the CTP update after clicking on the first plugin, and all is well.
Q: Why does the principal check work?
A: Uh… weirdly, it seems as if the principal that gets sent up from that data_uri matches the page that it came from.
Example:
PluginContent is at
data:text/html,<embed%20id="test1"%20style="width:%20200px;%20height:%20200px"%20type="application/x-test"/><embed%20id="test2"%20style="width:%20200px;%20height:%20200px"%20type="application/x-second-test"/>
and our principal is:
http://127.0.0.1:8888
I think that's OK -
according to nsIPrincipal.idl
: "...
which allows the load of a data: URI (which inherits the principal of
its loader)"
Ok, so this is working as suspected. Basically, the Principal is
insufficient for things like blobs and data. What I need to do is send
the actual location up along with the principal, and do a comparison
against that as well. That should fix everything.
Try push with both patches:
https://tbpl.mozilla.org/?tree=Try&rev=92c141444ab2
gfritzsche wants to know:
10:06
(gfritzsche)
mconley: ok, now i remember - we cover up a race there for navigation
between different urls - but what if i just reload the same page?
10:08
(mconley)
gfritzsche: are you referring to the first patch or the second?
10:08
(gfritzsche)
mconley: the one titled "Send location along with CTP messages."
So georg is worried about the case where we start a reload while a message is still in transit.
That's a good point. How do we deal with that?
Ok, this is a pre-existing thing that we're going to have to solve at
some point, but for the sake of the treeherders, we're going to land
these changes, and then I'll file a follow-up bug to try to find a
solution for this entire class of problem.
And landed!