Bug 1259770 - Session restore regressed 400%-500% under e10s with changes in bug 1245891
non-e10s: [{"subtests": [{"replicates": [1046.0, 837.0,
876.0, 840.0, 867.0, 870.0, 841.0, 871.0, 860.0, 854.0], "name":
"sessionrestore", "value": 860.0}]
e10s: [{"subtests": [{"replicates": [1231.0, 918.0, 912.0, 859.0,
819.0, 801.0, 859.0, 778.0, 838.0, 822.0], "name": "sessionrestore",
"value": 838.0}]
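As a quick sanity check on the two result blobs above: the reported "value" in each is consistent with discarding the first replicate and taking the median of the rest. That rule is inferred from these numbers only, not taken from the Talos source, and `summarize` is a hypothetical helper name.

```python
from statistics import median

# Replicates copied from the two result blobs above.
NON_E10S = [1046.0, 837.0, 876.0, 840.0, 867.0, 870.0, 841.0, 871.0, 860.0, 854.0]
E10S = [1231.0, 918.0, 912.0, 859.0, 819.0, 801.0, 859.0, 778.0, 838.0, 822.0]


def summarize(replicates):
    """Assumed rule: skip the first (warm-up) replicate, take the median of the rest."""
    return median(replicates[1:])


print(summarize(NON_E10S))  # 860.0, matches the reported non-e10s value
print(summarize(E10S))      # 838.0, matches the reported e10s value
```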
Okay, I think I know what's going on.
The work in bug 1245891 takes advantage of a module that got added called StartupPerformance.jsm.
Briefly, StartupPerformance.jsm works by noticing when session restore
init has begun and recording that time. It also starts a 10 second
timer, and then starts listening for SSTabRestored events to be fired.
Every time one of the SSTabRestored events comes in, we record the
current time, and the 10 second timer is reset.
Once the timer finally goes off (so we haven't seen an SSTabRestored
event in 10 seconds), the SSTabRestored event listener detaches itself,
and the observer notification that the sessionrestore talos test is
listening for is fired off. The talos test then takes the delta between
sessionrestore init and the last SSTabRestored event time.
So that's how the sessionrestore test is supposed to work.
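The timer logic described above can be modelled roughly like this. It is a simplified Python sketch, not the actual StartupPerformance.jsm code: the function names are made up, timestamps are plain millisecond numbers, and it assumes the 10 second timer starts at session restore init.

```python
QUIET_PERIOD_MS = 10_000  # the 10 second timer described above


def last_restore_timestamp(init_ms, tab_restored_ms):
    """Return the time of the last SSTabRestored event seen before the
    quiet-period timer fires, or None if it fires without any event."""
    deadline = init_ms + QUIET_PERIOD_MS  # assumption: timer starts at init
    last = None
    for t in sorted(tab_restored_ms):
        if t > deadline:
            break  # timer already fired, so the listener has detached
        last = t
        deadline = t + QUIET_PERIOD_MS  # every event resets the timer
    return last


def reported_delta(init_ms, tab_restored_ms):
    """Delta between session restore init and the last counted event."""
    last = last_restore_timestamp(init_ms, tab_restored_ms)
    return None if last is None else last - init_ms
```

For example, with init at 0 and events at 300, 800 and 12000, the event at 12000 arrives more than 10 seconds after the previous one, so it is ignored and the reported delta is 800.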
There's a small problem though, and I think it's causing us to report
different results in the e10s case (even though e10s is not performing
worse).
The problem is that the talos test loads an index.html file along with
the session it's restoring from. That index.html file is not part of the
sessionrestore set, and is passed in the cmdline args to Firefox in
order to do the reporting of the test results back to talos. It's also
the selected tab.
The final puzzle piece is that the initial browser in a new browser
window is non-remote by default. In order for it to go anywhere, some
DocShell / SessionStore stuff occurs where any pre-existing state is
scooped out of the browser, the remoteness is flipped, and then the
scooped out state is sent back down to the new browser / DocShell.
That's important, because that scoop/remoteness-flip/sending of state
results in an SSTabRestored event being fired, since we're using the
same tab restoration mechanism to do the sending of state.
So how this all ties together is that the browser starts up, it loads
up the session, and in the non-e10s case, because a non-session tab is
selected and loaded (the index.html file), an SSTabRestored event never
fires (since index.html wasn’t in the old session, remember - it was
loaded as part of the cmdline). When StartupPerformance never gets any
SSTabRestored events, the delta that gets computed is the delta between
session restore init and when restoration starts.
For e10s, it’s the same up until the index.html load. In order to load
that into the initial tab, we do the remoteness flip, which fires an
SSTabRestored event, which StartupPerformance notices.
That’s where the discrepancy is.
TL;DR:
non-e10s is measuring the delta between session restore init and when tab restoration starts.
e10s is measuring the delta between session restore init, and when the
index.html of the talos test content page has finished being restored.
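To make the discrepancy concrete, here is a toy Python model of which timestamp each configuration ends up measuring to. The function name and the timestamps in the usage note are invented for illustration; only the branching reflects the explanation above.

```python
def measurement_endpoint(e10s, restore_start_ms, index_html_restored_ms):
    """Timestamp the sessionrestore test effectively measures up to."""
    # The selected tab is index.html, which is not part of the session,
    # so no session tab fires SSTabRestored in either configuration.
    tab_restored_events = []
    if e10s:
        # The initial browser is non-remote, so loading index.html needs a
        # remoteness flip, which reuses tab restoration and fires SSTabRestored.
        tab_restored_events.append(index_html_restored_ms)
    if tab_restored_events:
        return max(tab_restored_events)
    # No SSTabRestored seen: the delta ends where tab restoration starts.
    return restore_start_ms
```

With made-up values of 850 for the start of restoration and 4000 for the index.html restore, the non-e10s endpoint is 850 and the e10s endpoint is 4000, even though nothing about the session restore itself got slower.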
To prove my case, here's a try push where I forced the initial browser in the window to be remote (this might also contribute to tpaint wins, so it was something I was already pursuing):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba7b791b1ac8095ffd090b295909c2663c6c2cbe
Try to ignore the high scores from the Linux64 spot instances - I'm not sure how to hide those, and unfortunately, Perfherder mixes the AWS-instance results with the normal Linux64 talos machine results, which kinda throws off the numbers (see bug 1260926).
In this case, we appear to at least match non-e10s. On some platforms e10s beats non-e10s significantly.
So, to sum up, I think e10s is not regressing sessionstore performance when we're restoring tabs on demand. In fact, I think it performs better. We just need to make sure the test starts comparing things properly.
Potential solutions:
- Make initial browser remote in the e10s case (useful for tpaint too)
- Make initial browser remote by default, force non-remote in non-e10s case
- Make StartupPerformance.jsm ignore SSTabRestored for remoteness flips
  - Hey, this was easy! I did this one in bug 1261657, which I just opened a note for.
- Add test index.html to sessionstore.json as the selected tab. Will this help? Or will it just use the initial browser for that one too?
- Make test open index.html after the test has finished recording. Probably pretty cheap!
  - Would need to make the window have about:home selected in order to ensure that it loads.
  - Will that work? Will that ensure that we don’t SSTabRestore?
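The "ignore SSTabRestored for remoteness flips" option boils down to filtering events before picking the last timestamp. The sketch below is Python and purely illustrative: it is not the patch from bug 1261657, and `TabRestored`, `from_remoteness_flip` and `last_counted_restore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TabRestored:
    time_ms: int
    from_remoteness_flip: bool  # hypothetical flag marking flip-induced events


def last_counted_restore(events):
    """Last SSTabRestored time, skipping events caused by remoteness flips."""
    counted = [e.time_ms for e in events if not e.from_remoteness_flip]
    return max(counted) if counted else None
```

With this filter, an e10s run whose only SSTabRestored event comes from the index.html remoteness flip has no counted events, which puts it on the same footing as the non-e10s run.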
WOOOO THIS IS FIXED MOTHERFUCKER