Bug 1128050 - [e10s] Save page as... doesn't always load from cache
Let's trace where Save Page As operates...
So this looks like it's in the same vein as
bug 1025146
...
function
saveDocument(aDocument, aSkipPrompt)
{
if ( ! aDocument)
throw "Must have a document when calling saveDocument" ;
// We want to use cached data because the document is currently visible.
var ifreq =
aDocument.defaultView
.QueryInterface(Components.interfaces.nsIInterfaceRequestor);
var contentDisposition = null ;
try {
contentDisposition =
ifreq.getInterface(Components.interfaces.nsIDOMWindowUtils)
.getDocumentMetadata( "content-disposition" );
} catch (ex) {
// Failure to get a content-disposition is ok
}
var cacheKey = null ;
try {
cacheKey =
ifreq.getInterface(Components.interfaces.nsIWebNavigation)
.QueryInterface(Components.interfaces.nsIWebPageDescriptor);
} catch (ex) {
// We might not find it in the cache. Oh, well.
}
if (cacheKey && Components.utils.isCrossProcessWrapper(cacheKey)) {
// Don't use a cache key from another process. See bug 1128050.
cacheKey = null ;
}
internalSave(aDocument.location.href, aDocument, null , contentDisposition,
aDocument.contentType, false , null , null ,
aDocument.referrer ? makeURI(aDocument.referrer) : null ,
aDocument, aSkipPrompt, cacheKey);
{
if ( ! aDocument)
throw "Must have a document when calling saveDocument" ;
// We want to use cached data because the document is currently visible.
var ifreq =
aDocument.defaultView
.QueryInterface(Components.interfaces.nsIInterfaceRequestor);
var contentDisposition = null ;
try {
contentDisposition =
ifreq.getInterface(Components.interfaces.nsIDOMWindowUtils)
.getDocumentMetadata( "content-disposition" );
} catch (ex) {
// Failure to get a content-disposition is ok
}
var cacheKey = null ;
try {
cacheKey =
ifreq.getInterface(Components.interfaces.nsIWebNavigation)
.QueryInterface(Components.interfaces.nsIWebPageDescriptor);
} catch (ex) {
// We might not find it in the cache. Oh, well.
}
if (cacheKey && Components.utils.isCrossProcessWrapper(cacheKey)) {
// Don't use a cache key from another process. See bug 1128050.
cacheKey = null ;
}
internalSave(aDocument.location.href, aDocument, null , contentDisposition,
aDocument.contentType, false , null , null ,
aDocument.referrer ? makeURI(aDocument.referrer) : null ,
aDocument, aSkipPrompt, cacheKey);
}
That last bit - we eliminate the cacheKey if the cacheKey came from the
content process. Welp, boom, there's our problem right there. And
smacks of
bug 1025146.
So… how does network caching work with e10s? Does it even work? I
don't think I can wait for mayhemer to come online. I have to
figure this out myself.
4:03 PM
<
jduell
> mconley: pong
4:04 PM
<
mconley
> jduell: hiya! was wondering if you knew the answer to (or knew someone who might know the answer to) this question:
https://bugzilla.mozilla.org/show_bug.cgi?id=1025146#c14
4:04 PM
<
mconley
> jduell: I haven't seen mayhemer online, unfortunately. :/
4:05 PM
<
jduell
> :michal is the other likely person to be able to answer.
4:05 PM
<
jduell
> I'm guessing he's asleep now despite being logged into IRC.
4:05 PM
<
jduell
> mconley: I'll needinfo him too
4:06 PM
<
mconley
> jduell: unfortunately, michal referred me to mayhemer. :/
4:06 PM
<
jduell
> meh
4:08 PM
<
jduell
> mconley: ok, let's see if we can figure this out
4:08 PM
<
jduell
> so...
4:08 PM
<
jduell
> nsICachingChannel has too many things in the interface to export it fully to the child
4:09 PM
<
jduell
> instead we split out a subset (nsICacheInfoChannel) of the fields and implement only that on the child
"So bug 1058251 was the original bug for making "Save Page As" work,
and what happened was that a band-aid fix was landed to get basic
functionality, and bug 1101100 was spun out to do the more hardcore
permanent fix.
The bandaid fix allows us to save pages in the e10s case, but we still
hit the network to do it in at least some cases. Bug 1101100 is the
long-term fix for this, as it will move all of the document walking into
the content process, and just blast up information to the parent to
write to disk.
Bug 1101100 is currently M8, so it's clear that full Save As
functionality isn't a priority. But this bug is M6 meaning that we want
the bandaid fix to be able to work with at least some caching.
So I guess I need to document which cases we hit the network on in the
e10s case when saving pages, and try to come up with some workarounds
until bug 1101100 is all set."
"I forgot to mention - one thing that we might want to do is to grab
and serialize the cache key from the nsISHEntry of the loaded document
from the content process, and re-attempt to use it in the parent. That
might be a potential workaround.
That would require bug 1156493."
Cases to test:
Save a basic page
Not exactly the same… why?
Save a page I got to via POST
Completely busted with e10s.
Re-requests with non-e10s.
Save a page with assets
e10s: Images are linked to
non-e10s: Images are downloaded, document is updated
Save a frame
Save Link As...
Save an image
Seems to work...
Save Page As when the page is an image.
Seems to work...
My report:
"Ok, I did (what I think to be) a reasonably thorough examination of
the Save Page As, and comparison between e10s and non-e10s.
My findings:
# Basic page (no image, script or style assets, just text with some links):
Saving a basic page works mostly the same. In e10s, we don't resolve
relative URLs in anchors to absolute URLs, whereas we do with non-e10s.
# A page reached via a POST request
This is busted with e10s - we throw a
NS_ERROR_XPC_CANT_PASS_CPOW_TO_NATIVE when calling
nsIWebBrowserPersist.savePrivacyAwareURI.
# Saving a page with image, script, style assets
We just save the document with e10s. In non-e10s, we save a folder full
of assets (css files, script files, images, subframes, and assets for
those subframes).
As with the basic page, the e10s case doesn't resolve relative URLs to absolutes.
# Save a frame.
Same behaviour as when saving a page with assets.
# Save Link As
This seems to work the same between e10s and non-e10s. That's not too
surprising, because I think the e10s case is using the Save Link As
mechanism as part of its "bandaid" fix.
# Save image as
This seems to work the same between e10s and non-e10s, and seems to
properly request assets from the cache. The Browser Toolbox did not
record any network activity when saving images in either case.
# Save Page As when the page is an image
Same as above."
and:
"It seems to me that all of the above will be fixed with bug 1101100 fixed.
Probably the most serious part of this is the total failure to save the
page if we reached it by way of a POST request. This is just flat-out
broken.
I assume if the rest of the "bandaid" fix was acceptable, and the
proper fix is milestoned at M8, then we're OK to keep the above
behaviour with the exception of the POST case.
So, until I hear otherwise, I will be attempting to make the Save Page
As case work in the POST case - though it might have the same
deficiencies as the rest of the Save Page As behaviour in e10s (no
assets, no absolute URLs)."
Plan:
Extract the nsISHEntry as the "cache key" instead of the
pageDescriptor. It looks like nsWebBrowserPersist::SaveURIInternal
will accept that:
nsCOMPtr<nsISupports> cacheKey;
if (aCacheKey)
{
// Test if the cache key is actually a web page descriptor (docshell)
// or session history entry.
nsCOMPtr<nsISHEntry> shEntry = do_QueryInterface(aCacheKey);
if (!shEntry)
{
nsCOMPtr<nsIWebPageDescriptor> webPageDescriptor =
do_QueryInterface(aCacheKey);
if (webPageDescriptor)
{
nsCOMPtr<nsISupports> currentDescriptor;
webPageDescriptor->GetCurrentDescriptor(getter_AddRefs(currentDescriptor));
shEntry = do_QueryInterface(currentDescriptor);
}
}
if (shEntry)
{
shEntry->GetCacheKey(getter_AddRefs(cacheKey));
}
else
{
// Assume a plain cache key
cacheKey = aCacheKey;
}
}
And then run it through a utility function to be added to BrowserUtils - make SHEntry from CPOW.
In that function, extract the postData from the CPOW, and set it on the newly created nsISHEntry.
Then, in contentAreaUtils, instead of getting the postData from the document, get it from the nsISHEntry.
Profit?
Ok, let's lay it out:
Gah… this might not work. I can't find a nice way of doing this without passing a CPOW to native code, which is not allowed. :(
Ok, new plan. Alter the saveDocument function signature to take the
browser (and optional outerWindowID) as well. Then, if the document is a
CPOW, send a message down to get a serialized version of the nsISHEntry
sent back up, which we deserialize, reconstruct, and pass (along with
the CPOW document) to internalSave.
It doesn't look like the nsISHEntry serialization / deserialization
that SessionHistory offers will serialize or deserialize the postData.
Maybe I should add that.
Copy serializeEntry and deserializeEntry to BrowserUtils. Probably will
have to move anything they call too, like serializeOwner.
Have contentAreaUtils send a message down to the child if the document
is a CPOW, requesting a serialized nsISHEntry. Adds a listener for a
response from the child, which takes the serialized nsISHEntry,
deserializes it, and then passes that along to internalSave
asynchronously.
function getPostDataStream(aStringData, aKeyword, aEncKeyword, aType) {
var dataStream = Cc["@mozilla.org/io/string-input-stream;1"].
createInstance(Ci.nsIStringInputStream);
aStringData = aStringData.replace(/%s/g, aEncKeyword).replace(/%S/g, aKeyword);
dataStream.data = aStringData;
var mimeStream = Cc["@mozilla.org/network/mime-input-stream;1"].
createInstance(Ci.nsIMIMEInputStream);
mimeStream.addHeader("Content-Type", aType);
mimeStream.addContentLength = true;
mimeStream.setData(dataStream);
return mimeStream.QueryInterface(Ci.nsIInputStream);
}
Oh, snap! Here's the start of the implementation for nsIMIMEInputStream:
using namespace mozilla::ipc;
class nsMIMEInputStream : public nsIMIMEInputStream,
public nsISeekableStream,
public nsIIPCSerializableInputStream
{
nsIIPCSerializableInputStream…. that's... that's exactly what I need.
So maybe I can just send this over the wire in the JSON group? Maybe?
Testing...
Whoa. Seems to work. \o/
But wait… when I examine the postData in the parent process… it's just an Object. So this is actually not working. :(
bent says I might be able to just do it through a String buffer.
Ahhh - and now Mossop has pointed me to
bug 1129957
which has everything I need! \o/ ASKING WORKS.
FUCK. I can't use nsIScriptableInputStream because I can't read the stream off of the main thread, because it's blocking… bah.
5:55 PM
<
michal
> mconley: you should be able to use the stream copier in JS
Hm...
I thought I had this working. But billm has given me a test case that appears to still be busted.
Go there. Put in a string and file. Then attempt to save the page. 500. :(
Ohhh - no, this works. I just built wrong. \o/
But for some reason, I can only save once!
return new Promise((resolve, reject) => {
let postData = aSHEntry.postData;
//if (!postData || !(postData instanceof Ci.nsICloneableInputStream)) {
return resolve(entry);
//}
// This function is called from the content process, which means
// that the postData nsIInputStream can't be read off of the main
// thread. We have to do some gymnastics to extract the data as
// a string to send up to the parent.
//
// This is all kind of depressing, since the data was in the parent
// to begin with. This will all get torn out when bug 1128050 is fixed.
let ss = Cc["@mozilla.org/storagestream;1"]
.createInstance(Ci.nsIStorageStream);
ss.init(1024, postData.available(), null);
let outStream = ss.getOutputStream(0);
let streamClone = postData.clone();
NetUtil.asyncCopy(streamClone, outStream, (result) => {
if (!Components.isSuccessCode(result)) {
return reject(result);
}
let inStream = ss.newInputStream(0);
let str = NetUtil.readInputStreamToString(inStream, inStream.available());
entry.postData = str;
return resolve(entry);
});
});
return new Promise((resolve, reject) => {
let postData = aSHEntry.postData;
if (!postData || !(postData instanceof Ci.nsISeekableStream)) {
return resolve(entry);
}
// This function is called from the content process, which means
// that the postData nsIInputStream can't be read off of the main
// thread. We have to do some gymnastics to extract the data as
// a string to send up to the parent.
//
// This is all kind of depressing, since the data was in the parent
// to begin with. This will all get torn out when bug 1128050 is fixed.
let ss = Cc["@mozilla.org/storagestream;1"]
.createInstance(Ci.nsIStorageStream);
ss.init(1024, postData.available(), null);
let outStream = ss.getOutputStream(0);
let copier = Cc["@mozilla.org/network/async-stream-copier;1"]
.createInstance(Ci.nsIAsyncStreamCopier2);
// Don't close the source so we can rewind it when we're done.
copier.init(postData, outStream, null, 0, false, true);
copier.asyncCopy({
onStartRequest(aRequest, aContext) {},
onStopRequest(aRequest, aContext, aStatusCode) {
if (!Components.isSuccessCode(aStatusCode)) {
return reject(aStatusCode);
}
let inStream = ss.newInputStream(0);
let str = NetUtil.readInputStreamToString(inStream, inStream.available());
entry.postData = str;
postData.seek(Ci.nsISeekableStream.NS_SEEK_SET, 0);
return resolve(entry);
},
}, null);
});
billm's idea of just grabbing the cache key from the CPOW, converting
it to an integer in the parent, and passing that to nsIWebBrowserPersist
is way better - way simpler too.
Follow-ups:
File a bug for the POST page after cache cleared problem -
bug 1163642