Bug 1128050 - [e10s] Save page as... doesn't always load from cache

Let's trace where Save Page As operates...

So this looks like it's in the same vein as bug 1025146 ...

function saveDocument(aDocument, aSkipPrompt)
{
if ( ! aDocument)
throw "Must have a document when calling saveDocument" ;

// We want to use cached data because the document is currently visible.
var ifreq =
aDocument.defaultView
.QueryInterface(Components.interfaces.nsIInterfaceRequestor);

var contentDisposition = null ;
try {
contentDisposition =
ifreq.getInterface(Components.interfaces.nsIDOMWindowUtils)
.getDocumentMetadata( "content-disposition" );
} catch (ex) {
// Failure to get a content-disposition is ok
}

var cacheKey = null ;
try {
cacheKey =
ifreq.getInterface(Components.interfaces.nsIWebNavigation)
.QueryInterface(Components.interfaces.nsIWebPageDescriptor);
} catch (ex) {
// We might not find it in the cache. Oh, well.
}

if (cacheKey && Components.utils.isCrossProcessWrapper(cacheKey)) {
// Don't use a cache key from another process. See bug 1128050.
cacheKey = null ;
}

internalSave(aDocument.location.href, aDocument, null , contentDisposition,
aDocument.contentType, false , null , null ,
aDocument.referrer ? makeURI(aDocument.referrer) : null ,
aDocument, aSkipPrompt, cacheKey);

}

That last bit - we eliminate the cacheKey if the cacheKey came from the content process. Welp, boom, there's our problem right there. And smacks of bug 1025146.

So… how does network caching work with e10s? Does it even work? I don't think I can wait for mayhemer to come online. I have to figure this out myself.

4:03 PM < jduell > mconley: pong

4:04 PM < mconley > jduell: hiya! was wondering if you knew the answer to (or knew someone who might know the answer to) this question: https://bugzilla.mozilla.org/show_bug.cgi?id=1025146#c14

4:04 PM < mconley > jduell: I haven't seen mayhemer online, unfortunately. :/

4:05 PM < jduell > :michal is the other likely person to be able to answer.

4:05 PM < jduell > I'm guessing he's asleep now despite being logged into IRC.

4:05 PM < jduell > mconley: I'll needinfo him too

4:06 PM < mconley > jduell: unfortunately, michal referred me to mayhemer. :/

4:06 PM < jduell > meh

4:08 PM < jduell > mconley: ok, let's see if we can figure this out

4:08 PM < jduell > so...

4:08 PM < jduell > nsICachingChannel has too many things in the interface to export it fully to the child

4:09 PM < jduell > instead we split out a subset (nsICacheInfoChannel) of the fields and implement only that on the child

My comments in the bug:

"So bug 1058251 was the original bug for making "Save Page As" work, and what happened was that a band-aid fix was landed to get basic functionality, and bug 1101100 was spun out to do the more hardcore permanent fix.

The bandaid fix allows us to save pages in the e10s case, but we still hit the network to do it in at least some cases. Bug 1101100 is the long-term fix for this, as it will move all of the document walking into the content process, and just blast up information to the parent to write to disk.

Bug 1101100 is currently M8, so it's clear that full Save As functionality isn't a priority. But this bug is M6 meaning that we want the bandaid fix to be able to work with at least some caching.

So I guess I need to document which cases we hit the network on in the e10s case when saving pages, and try to come up with some workarounds until bug 1101100 is all set."

"I forgot to mention - one thing that we might want to do is to grab and serialize the cache key from the nsISHEntry of the loaded document from the content process, and re-attempt to use it in the parent. That might be a potential workaround.

That would require bug 1156493."

Cases to test:

Save a basic page

Not exactly the same… why?

Save a page I got to via POST

Completely busted with e10s.

Re-requests with non-e10s.

Save a page with assets

e10s: Images are linked to

non-e10s: Images are downloaded, document is updated

Save a frame

Save Link As...

Save an image

Seems to work...

Save Page As when the page is an image.

Seems to work...

My report:

"Ok, I did (what I think to be) a reasonably thorough examination of the Save Page As, and comparison between e10s and non-e10s.

My findings:

# Basic page (no image, script or style assets, just text with some links):

Saving a basic page works mostly the same. In e10s, we don't resolve relative URLs in anchors to absolute URLs, whereas we do with non-e10s.

# A page reached via a POST request

This is busted with e10s - we throw a NS_ERROR_XPC_CANT_PASS_CPOW_TO_NATIVE when calling nsIWebBrowserPersist.savePrivacyAwareURI.

# Saving a page with image, script, style assets

We just save the document with e10s. In non-e10s, we save a folder full of assets (css files, script files, images, subframes, and assets for those subframes).

As with the basic page, the e10s case doesn't resolve relative URLs to absolutes.

# Save a frame.

Same behaviour as when saving a page with assets.

# Save Link As

This seems to work the same between e10s and non-e10s. That's not too surprising, because I think the e10s case is using the Save Link As mechanism as part of its "bandaid" fix.

# Save image as

This seems to work the same between e10s and non-e10s, and seems to properly request assets from the cache. The Browser Toolbox did not record any network activity when saving images in either case.

# Save Page As when the page is an image

Same as above."

and:

"It seems to me that all of the above will be fixed with bug 1101100 fixed.

Probably the most serious part of this is the total failure to save the page if we reached it by way of a POST request. This is just flat-out broken.

I assume if the rest of the "bandaid" fix was acceptable, and the proper fix is milestoned at M8, then we're OK to keep the above behaviour with the exception of the POST case.

So, until I hear otherwise, I will be attempting to make the Save Page As case work in the POST case - though it might have the same deficiencies as the rest of the Save Page As behaviour in e10s (no assets, no absolute URLs)."

Plan:

Extract the nsISHEntry as the "cache key" instead of the pageDescriptor. It looks like nsWebBrowserPersist::SaveURIInternal will accept that:

   nsCOMPtr<nsISupports> cacheKey;
    if (aCacheKey)
    {
        // Test if the cache key is actually a web page descriptor (docshell)
        // or session history entry.
        nsCOMPtr<nsISHEntry> shEntry = do_QueryInterface(aCacheKey);
        if (!shEntry)
        {
            nsCOMPtr<nsIWebPageDescriptor> webPageDescriptor =
                do_QueryInterface(aCacheKey);
            if (webPageDescriptor)
            {
                nsCOMPtr<nsISupports> currentDescriptor;
                webPageDescriptor->GetCurrentDescriptor(getter_AddRefs(currentDescriptor));
                shEntry = do_QueryInterface(currentDescriptor);
            }
        }

        if (shEntry)
        {
            shEntry->GetCacheKey(getter_AddRefs(cacheKey));
        }
        else
        {
            // Assume a plain cache key
            cacheKey = aCacheKey;
        }
    }

And then run it through a utility function to be added to BrowserUtils - make SHEntry from CPOW.

In that function, extract the postData from the CPOW, and set it on the newly created nsISHEntry.

Then, in contentAreaUtils, instead of getting the postData from the document, get it from the nsISHEntry.

Profit?

Ok, let's lay it out:

~~Add makeSHEntryFromCPOW, with postData copying powers.~~

Gah… this might not work. I can't find a nice way of doing this without passing a CPOW to native code, which is not allowed. :(

~~Have contentAreaUtils call that function with the nsISHEntry gotten from the document if the document is a CPOW.~~

~~Have contentAreaUtils get the postData from the nsISHEntry to pass to the nsIWebBrowserPersist~~

Ok, new plan. Alter the saveDocument function signature to take the browser (and optional outerWindowID) as well. Then, if the document is a CPOW, send a message down to get a serialized version of the nsISHEntry sent back up, which we deserialize, reconstruct, and pass (along with the CPOW document) to internalSave.

It doesn't look like the nsISHEntry serialization / deserialization that SessionHistory offers will serialize or deserialize the postData. Maybe I should add that.

Copy serializeEntry and deserializeEntry to BrowserUtils. Probably will have to move anything they call too, like serializeOwner.

Have contentAreaUtils send a message down to the child if the document is a CPOW, requesting a serialized nsISHEntry. Adds a listener for a response from the child, which takes the serialized nsISHEntry, deserializes it, and then passes that along to internalSave asynchronously.

~~Add postData support to serializeSHEntry and deserializeSHEntry~~

~~There is example code being used in browser.js that I can work off of.~~

~~To convert string data back into an nsIInputStream:~~

function getPostDataStream(aStringData, aKeyword, aEncKeyword, aType) {
  var dataStream = Cc["@mozilla.org/io/string-input-stream;1"].
                   createInstance(Ci.nsIStringInputStream);
  aStringData = aStringData.replace(/%s/g, aEncKeyword).replace(/%S/g, aKeyword);
  dataStream.data = aStringData;

  var mimeStream = Cc["@mozilla.org/network/mime-input-stream;1"].
                   createInstance(Ci.nsIMIMEInputStream);
  mimeStream.addHeader("Content-Type", aType);
  mimeStream.addContentLength = true;
  mimeStream.setData(dataStream);
  return mimeStream.QueryInterface(Ci.nsIInputStream);
}

Oh, snap! Here's the start of the implementation for nsIMIMEInputStream:

using namespace mozilla::ipc;

class nsMIMEInputStream : public nsIMIMEInputStream,
                          public nsISeekableStream,
                          public nsIIPCSerializableInputStream
{

nsIIPCSerializableInputStream…. that's... that's exactly what I need.

So maybe I can just send this over the wire in the JSON group? Maybe?

Testing...

Whoa. Seems to work. \o/

But wait… when I examine the postData in the parent process… it's just an Object. So this is actually not working. :(

bent says I might be able to just do it through a String buffer.

Ahhh - and now Mossop has pointed me to bug 1129957 which has everything I need! \o/ ASKING WORKS.

FUCK. I can't use nsIScriptableInputStream because I can't read the stream off of the main thread, because it's blocking… bah.

5:55 PM < michal > mconley: you should be able to use the stream copier in JS

Hm...

I thought I had this working. But billm has given me a test case that appears to still be busted.

http://www.cs.tut.fi/~jkorpela/forms/file.html

Go there. Put in a string and file. Then attempt to save the page. 500. :(

Ohhh - no, this works. I just built wrong. \o/

But for some reason, I can only save once!

~~Make it so I can save more than once! Rewind the stream?~~ I can't do that on the main thread. GAHHHH

return new Promise((resolve, reject) => {

let postData = aSHEntry.postData;

//if (!postData || !(postData instanceof Ci.nsICloneableInputStream)) {

return resolve(entry);

//}

// This function is called from the content process, which means

// that the postData nsIInputStream can't be read off of the main

// thread. We have to do some gymnastics to extract the data as

// a string to send up to the parent.

// This is all kind of depressing, since the data was in the parent

// to begin with. This will all get torn out when bug 1128050 is fixed.

let ss = Cc["@mozilla.org/storagestream;1"]

.createInstance(Ci.nsIStorageStream);

ss.init(1024, postData.available(), null);

let outStream = ss.getOutputStream(0);

let streamClone = postData.clone();

NetUtil.asyncCopy(streamClone, outStream, (result) => {

if (!Components.isSuccessCode(result)) {

return reject(result);

}

let inStream = ss.newInputStream(0);

let str = NetUtil.readInputStreamToString(inStream, inStream.available());

entry.postData = str;

return resolve(entry);

});

return new Promise((resolve, reject) => {

let postData = aSHEntry.postData;

if (!postData || !(postData instanceof Ci.nsISeekableStream)) {

return resolve(entry);

}

// This function is called from the content process, which means

// that the postData nsIInputStream can't be read off of the main

// thread. We have to do some gymnastics to extract the data as

// a string to send up to the parent.

// This is all kind of depressing, since the data was in the parent

// to begin with. This will all get torn out when bug 1128050 is fixed.

let ss = Cc["@mozilla.org/storagestream;1"]

.createInstance(Ci.nsIStorageStream);

ss.init(1024, postData.available(), null);

let outStream = ss.getOutputStream(0);

let copier = Cc["@mozilla.org/network/async-stream-copier;1"]

.createInstance(Ci.nsIAsyncStreamCopier2);

// Don't close the source so we can rewind it when we're done.

copier.init(postData, outStream, null, 0, false, true);

copier.asyncCopy({

onStartRequest(aRequest, aContext) {},

onStopRequest(aRequest, aContext, aStatusCode) {

if (!Components.isSuccessCode(aStatusCode)) {

return reject(aStatusCode);

}

let inStream = ss.newInputStream(0);

let str = NetUtil.readInputStreamToString(inStream, inStream.available());

entry.postData = str;

postData.seek(Ci.nsISeekableStream.NS_SEEK_SET, 0);

return resolve(entry);

}, null);

});

billm's idea of just grabbing the cache key from the CPOW, converting it to an integer in the parent, and passing that to nsIWebBrowserPersist is way better - way simpler too.

Follow-ups:

File a bug for the POST page after cache cleared problem - bug 1163642