OODT Product Service - Serving Large Products

Serving Large Products

In the last tutorial, we created a query handler and "installed" it in a product server. We could query it for products (mathematical constants) using the XMLQuery's postfix boolean stacks. The handler would return results by embedding them in the returned XMLQuery. Now we'll return larger products that live outside of the XMLQuery.

What's Large?

There's a giant clam at Pismo Beach, a giant ball of twine in Kansas, and for those who drive SUVs, a giant gas pump. For the OODT framework, large is similarly hard to define.

One of the original architects of the OODT framework thought that putting a product's result in with the query meant that you'd never lose the connection between a product and the query that generated it. I'm not sure I see the value in that, but regardless, it posed a practical challenge: an XMLQuery object in memory with one or two large results in it will exhaust the Java virtual machine's available memory.

It's even worse when the XMLQuery is expressed as a textual XML document. In this case, a binary product must be encoded in a text format (we use Base64), making the XMLQuery in XML format even larger than as a Java object. Moreover, those XML documents must be parsed at some time to reconstitute them as Java objects. We use a DOM-based parser, which holds the entire document in memory. Naturally, things tend to explode at this rate.
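
To get a feel for the text-encoding overhead alone, here is a minimal sketch that measures how much a binary payload grows under Base64. It uses the JDK's java.util.Base64 (Java 8 and later), which is an assumption made for illustration; the framework's own encoder isn't shown in this tutorial. The class name Base64Growth is hypothetical.

```java
import java.util.Base64;

// Hypothetical demo class; not part of the OODT framework.
class Base64Growth {
  // Returns the number of characters needed to Base64-encode rawSize bytes.
  static int encodedSize(int rawSize) {
    byte[] raw = new byte[rawSize];  // contents don't affect the encoded length
    return Base64.getEncoder().encodeToString(raw).length();
  }

  public static void main(String[] args) {
    int raw = 3000000;
    System.out.println(raw + " bytes -> " + encodedSize(raw)
      + " Base64 characters");
  }
}
```

Every three bytes of product become four characters of text, and that's before a DOM parser builds its in-memory tree around them.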

There is a way out of the quagmire, though. Instead of writing a QueryHandler, write a LargeProductQueryHandler. A QueryHandler puts Result objects into the XMLQuery, each of which holds the entire product. A LargeProductQueryHandler puts LargeResult objects into the XMLQuery, each of which holds only a reference to the product.

Large Handlers and Large Results

The OODT framework provides an extension to the QueryHandler interface called jpl.eda.product.LargeProductQueryHandler. This interface adds two methods that you must implement:

retrieveChunk. This method returns a byte array representing a chunk of the product. The OODT framework calls this method repeatedly to gather chunks of the product for the product client. It takes a product ID (a string) that identifies which product is being retrieved. It also takes a byte offset into the product data and the size of the chunk to return. You return the matching chunk.
close. This method is called by the OODT framework to tell the query handler it's done getting a product. It takes a product ID that tells which product is no longer being retrieved. You use this method to perform any cleanup necessary.

Because it extends the QueryHandler interface, you still have to implement the query method. However, as a LargeProductQueryHandler, you can add LargeResult objects to the XMLQuery passed in. Each LargeResult identifies the product ID (a string) that the OODT framework will later use when it calls retrieveChunk and close.

For example, suppose you're serving large images by generating them from various other data sources:

The query method would examine the user's query, consult the various data sources, and generate the image, storing it in a temporary file. It would also assign a string product ID to this file, use that product ID in a LargeResult object, add the LargeResult to the XMLQuery, and return the modified XMLQuery.
Shortly afterward, the OODT framework will repeatedly call the retrieveChunk method. This method would check the product ID passed in and locate the corresponding temporary file generated earlier by the query method. It would index into the file by the offset requested by the framework, read the number of bytes requested by the framework, package that up into a byte array, and return it. Eventually, the OODT framework will have read the entire product this way.
Lastly, the OODT framework will call the close method. This method would check the product ID and locate and delete the temporary file.
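
The three steps above can be sketched without any OODT classes at all. In the following minimal sketch, the TempFileProducts class, its store method, and the "product-N" ID scheme are all hypothetical stand-ins for what a real handler's query, retrieveChunk, and close methods would do; the roundTrip method plays the part of the framework.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a handler's bookkeeping; not an OODT class.
class TempFileProducts {
  private final Map<String, Path> files = new ConcurrentHashMap<>();
  private int nextID = 0;

  // "query" step: save the generated product in a temporary file and
  // return the product ID that would go into the LargeResult.
  synchronized String store(byte[] product) throws IOException {
    Path tmp = Files.createTempFile("product", ".tmp");
    Files.write(tmp, product);
    String id = "product-" + (nextID++);
    files.put(id, tmp);
    return id;
  }

  // "retrieveChunk" step: read up to length bytes starting at offset.
  byte[] retrieveChunk(String id, long offset, int length)
    throws IOException {
    Path tmp = files.get(id);
    if (tmp == null) throw new IOException("Unknown product ID " + id);
    try (RandomAccessFile in = new RandomAccessFile(tmp.toFile(), "r")) {
      long remaining = Math.max(0, in.length() - offset);
      byte[] buf = new byte[(int) Math.min(length, remaining)];
      in.seek(offset);
      in.readFully(buf);
      return buf;
    }
  }

  // "close" step: forget the product ID and delete its temporary file.
  void close(String id) throws IOException {
    Path tmp = files.remove(id);
    if (tmp != null) Files.deleteIfExists(tmp);
  }

  // Plays the framework's role: query once, fetch chunks, then close.
  static byte[] roundTrip(byte[] product, int chunkSize)
    throws IOException {
    TempFileProducts handler = new TempFileProducts();
    String id = handler.store(product);
    byte[] copy = new byte[product.length];
    try {
      for (int off = 0; off < copy.length; off += chunkSize) {
        byte[] chunk = handler.retrieveChunk(id, off, chunkSize);
        System.arraycopy(chunk, 0, copy, off, chunk.length);
      }
    } finally {
      handler.close(id);
    }
    return copy;
  }

  public static void main(String[] args) throws IOException {
    byte[] product = "pretend this is a generated image".getBytes();
    System.out.println(new String(roundTrip(product, 8)));
  }
}
```

Note that the product ID is the only thing tying the three calls together, which is why the handler has to keep its own map from IDs to temporary files.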

To put this into practice, let's create a LargeProductQueryHandler that serves files out of the product server's filesystem.

Writing the Handler

We'll develop a FileHandler that will serve files out of the product server's filesystem. Providing filesystem access through the OODT framework in this way is probably not a very good idea (after all, product clients could request copies of sensitive files), but for a demonstration it'll do.

Because files can be quite large, we'll use a LargeProductQueryHandler. It will serve queries of the form

file = path

where path is the full path of the file the user wants. The handler will add LargeResult objects to the XMLQuery, and the product ID will simply be the path of the requested file. The retrieveChunk method will open the file with the given product ID (which is just the path to the file) and return a block of data out of it. The close method won't need to do anything, since we're not creating temporary files or making network connections; there's nothing to clean up.

Getting the Path

First, let's create a utility method that takes the XMLQuery and returns a java.io.File that matches the requested file. Because the query takes the form

file = path

there should be three QueryElements on the "where" stack:

The zeroth (topmost) has role = elemName and value = file.
The first (middle) has role = LITERAL and value = the path of the file the user wants.
The last (bottom) has role = RELOP and value = EQ.

We'll reject any other query by returning null from this method. Further, if the file named by the path doesn't exist, or if it's not a file (for example, it's a directory or a socket), we'll return null.

Here's the start of our FileHandler.java:

import java.io.File;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;
public class FileHandler
  implements LargeProductQueryHandler {
  private static File getFile(XMLQuery q) {
    List stack = q.getWhereElementSet();
    if (stack.size() != 3) return null;
    QueryElement e = (QueryElement) stack.get(0);
    if (!"elemName".equals(e.getRole())
      || !"file".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(2);
    if (!"RELOP".equals(e.getRole())
      || !"EQ".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(1);
    if (!"LITERAL".equals(e.getRole()))
      return null;
    File file = new File(e.getValue());
    if (!file.isFile()) return null;
    return file;
  }
}

Checking the MIME Type

Recall that the user can say what MIME types of products are acceptable by specifying the preference list in the XMLQuery. This lets a product server that serves, say, video clips, convert them to video/mpeg (MPEG-2), video/mpeg4-generic (MPEG-4), video/quicktime (Apple Quicktime), or some other format, in order to better serve its clients.

Since our product server just serves files of any format, we won't really bother with the list of acceptable MIME types. After all, the /etc/passwd file could be a JPEG image on some systems. (Yes, we could go through the extra step of determining the MIME type of a file by looking at its extension or its contents, but this is an OODT tutorial, not a something-else-tutorial!)

However, we will honor the user's wishes by labeling the result's MIME type based on what the user specifies in the acceptable MIME type list. So, if the product client says that image/jpeg is acceptable and the file is /etc/passwd, we'll call /etc/passwd a JPEG image. However, we won't try to read the client's mind: if the user wants image/*, then we'll just say it's a binary file, application/octet-stream.

Here's the code:

import java.util.Iterator;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  private static String getMimeType(XMLQuery q) {
    for (Iterator i = q.getMimeAccept().iterator();
      i.hasNext();) {
      String t = (String) i.next();
      if (t.indexOf('*') == -1) return t;
    }
    return "application/octet-stream";
  }
}
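
To see the rule in action without a running product server, here is a small standalone sketch. It mirrors getMimeType but takes a plain list of strings instead of an XMLQuery; the MimeChoice class and its choose method are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo; same selection rule as getMimeType(), but it
// takes a plain list so that it runs on its own.
class MimeChoice {
  static String choose(List<String> acceptable) {
    for (String t : acceptable) {
      if (t.indexOf('*') == -1) return t;  // first concrete type wins
    }
    return "application/octet-stream";     // only wildcards, or nothing
  }

  public static void main(String[] args) {
    // Prints image/jpeg: the wildcard entry is skipped.
    System.out.println(choose(Arrays.asList("image/*", "image/jpeg")));
    // Prints application/octet-stream: no concrete type was offered.
    System.out.println(choose(Arrays.asList("image/*", "*/*")));
  }
}
```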

Inserting the Result

Once we've got the file that the user wants and the MIME type to call it, all we have to do is insert the LargeResult. Remember that it's the LargeResult that tells the OODT framework what the product ID is for later retrieveChunk and close calls. The product ID is passed as the first argument to the LargeResult constructor.

We'll write a utility method to insert the LargeResult:

import java.io.IOException;
import java.util.Collections;
import jpl.eda.xmlquery.LargeResult;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  private static void insert(File file, String type,
    XMLQuery q) throws IOException {
    String id = file.getCanonicalPath();
    long size = file.length();
    LargeResult lr = new LargeResult(id, type,
      /*profileID*/null, /*resourceID*/null,
      /*headers*/Collections.EMPTY_LIST, size);
    q.getResults().add(lr);
  }
}

Handling the Query

With our three utility methods in hand, writing the required query method is a piece of cake. Here it is:

import jpl.eda.product.ProductException;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public XMLQuery query(XMLQuery q)
    throws ProductException {
    try {
      File file = getFile(q);
      if (file == null) return q;
      String type = getMimeType(q);
      insert(file, type, q);
      return q;
    } catch (IOException ex) {
      throw new ProductException(ex);
    }
  }
}

The query method as defined by the QueryHandler interface (and inherited by the LargeProductQueryHandler interface) is allowed to throw only one kind of checked exception: ProductException. So, in case the insert method throws an IOException, we transform it into a ProductException.

Now there are just two more required methods to implement, retrieveChunk and close.

Blowing Chunks

The OODT framework repeatedly calls the handler's retrieveChunk method to get chunks of the product, eventually getting the entire product (unless the product client decides to abort the transfer). For our file handler, retrieveChunk just has to

Make sure the file specified by the product ID still exists (after all, it could be deleted at any time, even before the first retrieveChunk got called).
Open the file.
Skip into the file by the requested offset.
Read the requested number of bytes out of the file.
Return those bytes.
Close the file.

We'll write a quick little skip method to skip into a file's input stream:

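private static void skip(long offset,
  InputStream in) throws IOException {
  while (offset > 0) {
    long skipped = in.skip(offset);
    // skip() can return 0; stop rather than loop forever
    if (skipped <= 0) throw new IOException(
      "Can't skip to requested offset");
    offset -= skipped;
  }
}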

And here's another little utility method to read a specified number of bytes out of a file's input stream:

private static byte[] read(int length,
  InputStream in) throws IOException {
  byte[] buf = new byte[length];
  int numRead;
  int index = 0;
  int toRead = length;
  while (toRead > 0) {
    numRead = in.read(buf, index, toRead);
    // read() returns -1 at end of file; don't keep looping
    if (numRead < 0) throw new IOException(
      "Unexpected end of file");
    index += numRead;
    toRead -= numRead;
  }
  return buf;
}

(By now, you're probably wondering why we just didn't use java.io.RandomAccessFile; I'm wondering that too!)
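
For the curious, here is roughly what the RandomAccessFile alternative could look like. This is only a sketch, not the code the rest of this tutorial uses; the RandomAccessChunk class and its readChunk method are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical alternative to the skip()/read() pair.
class RandomAccessChunk {
  static byte[] readChunk(File f, long offset, int length)
    throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
      raf.seek(offset);        // jump straight to the requested offset
      byte[] buf = new byte[length];
      raf.readFully(buf);      // throws EOFException if the file is too short
      return buf;
    }
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("chunk", ".txt");
    f.deleteOnExit();
    Files.write(f.toPath(), "0123456789".getBytes("US-ASCII"));
    // Four bytes starting at offset 3: prints 3456
    System.out.println(new String(readChunk(f, 3, 4), "US-ASCII"));
  }
}
```

Both seek and readFully are standard RandomAccessFile methods, so the two helper loops collapse into two calls.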

Finally, we can implement the required retrieveChunk method:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public byte[] retrieveChunk(String id, long offset,
    int length) throws ProductException {
    BufferedInputStream in = null;
    try {
      File f = new File(id);
      if (!f.isFile()) throw new ProductException(id
        + " isn't a file (anymore?)");
      in = new BufferedInputStream(new FileInputStream(f));
      skip(offset, in);
      byte[] buf = read(length, in);
      return buf;
    } catch (IOException ex) {
      throw new ProductException(ex);
    } finally {
      if (in != null) try {
        in.close();
      } catch (IOException ignore) {}
    }
  }
}

Closing Up

Because the OODT framework has no idea what data sources a LargeProductQueryHandler will eventually consult, what temporary files it may need to clean up, what network sockets it might need to shut down, and so forth, it needs some way to indicate to a query handler that it's done calling retrieveChunk for a certain product ID. The close method does this.

In our example, close doesn't need to do anything, but we are obligated to implement it:

...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public void close(String id) {}
}

Complete Source Code

Here's the complete source file, FileHandler.java:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.product.ProductException;
import jpl.eda.xmlquery.LargeResult;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;

public class FileHandler
  implements LargeProductQueryHandler {
  private static File getFile(XMLQuery q) {
    List stack = q.getWhereElementSet();
    if (stack.size() != 3) return null;
    QueryElement e = (QueryElement) stack.get(0);
    if (!"elemName".equals(e.getRole())
      || !"file".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(2);
    if (!"RELOP".equals(e.getRole())
      || !"EQ".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(1);
    if (!"LITERAL".equals(e.getRole()))
      return null;
    File file = new File(e.getValue());
    if (!file.isFile()) return null;
    return file;
  }
  private static String getMimeType(XMLQuery q) {
    for (Iterator i = q.getMimeAccept().iterator();
      i.hasNext();) {
      String t = (String) i.next();
      if (t.indexOf('*') == -1) return t;
    }
    return "application/octet-stream";
  }
  private static void insert(File file, String type,
    XMLQuery q) throws IOException {
    String id = file.getCanonicalPath();
    long size = file.length();
    LargeResult lr = new LargeResult(id, type,
      /*profileID*/null, /*resourceID*/null,
      /*headers*/Collections.EMPTY_LIST, size);
    q.getResults().add(lr);
  }
  public XMLQuery query(XMLQuery q)
    throws ProductException {
    try {
      File file = getFile(q);
      if (file == null) return q;
      String type = getMimeType(q);
      insert(file, type, q);
      return q;
    } catch (IOException ex) {
      throw new ProductException(ex);
    }
  }
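  private static void skip(long offset,
    InputStream in) throws IOException {
    while (offset > 0) {
      long skipped = in.skip(offset);
      // skip() can return 0; stop rather than loop forever
      if (skipped <= 0) throw new IOException(
        "Can't skip to requested offset");
      offset -= skipped;
    }
  }
  private static byte[] read(int length,
    InputStream in) throws IOException {
    byte[] buf = new byte[length];
    int numRead;
    int index = 0;
    int toRead = length;
    while (toRead > 0) {
      numRead = in.read(buf, index, toRead);
      // read() returns -1 at end of file; don't keep looping
      if (numRead < 0) throw new IOException(
        "Unexpected end of file");
      index += numRead;
      toRead -= numRead;
    }
    return buf;
  }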
  public byte[] retrieveChunk(String id, long offset,
    int length) throws ProductException {
    BufferedInputStream in = null;
    try {
      File f = new File(id);
      if (!f.isFile()) throw new ProductException(id
        + " isn't a file (anymore?)");
      in = new BufferedInputStream(new FileInputStream(f));
      skip(offset, in);
      byte[] buf = read(length, in);
      return buf;
    } catch (IOException ex) {
      throw new ProductException(ex);
    } finally {
      if (in != null) try {
        in.close();
      } catch (IOException ignore) {}
    }
  }
  public void close(String id) {}
}

Compiling the Code

We'll compile this code using the J2SDK command-line tools, but if you're more comfortable with some kind of Integrated Development Environment (IDE), adjust as necessary.

Let's go back again to the $PS_HOME directory we made earlier; create the file $PS_HOME/src/FileHandler.java with the contents shown above. Then, compile and update the jar file as follows:

% javac -extdirs lib \
  -d classes src/FileHandler.java
% ls -l classes
total 8
-rw-r--r--  1 kelly  kelly  2524 25 Feb 15:46 ConstantHandler.class
-rw-r--r--  1 kelly  kelly  3163 26 Feb 16:15 FileHandler.class
% jar -uf lib/my-handlers.jar \
  -C classes FileHandler.class
% jar -tf lib/my-handlers.jar
META-INF/
META-INF/MANIFEST.MF
ConstantHandler.class
FileHandler.class

We've now got a jar with the ConstantHandler from the last tutorial and our new FileHandler.

Specifying and Running the New Query Handler

The $PS_HOME/bin/ps script already has a system property specifying the ConstantHandler, so we just need to add the FileHandler to that list.

First, stop the product server by hitting CTRL+C (or your interrupt key) in the window in which it's currently running. Then, modify the $PS_HOME/bin/ps script to read as follows:

#!/bin/sh
exec java -Djava.ext.dirs=$PS_HOME/lib \
    -Dhandlers=ConstantHandler,FileHandler \
    jpl.eda.ExecServer \
    jpl.eda.product.rmi.ProductServiceImpl \
    urn:eda:rmi:MyProductService

Then start the server by running $PS_HOME/bin/ps. If all goes well, the product server will be ready to answer queries again, this time passing each incoming XMLQuery to two different query handlers.

Edit the $PS_HOME/bin/pc script once more to make sure the -out and not the -xml command-line argument is being used. Let's try querying for a file:

% $PS_HOME/bin/pc "file = /etc/passwd"
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
...

If you like, you can change the -out to -xml again and examine the XML version. This time, the product data isn't in the XMLQuery object.

What's the Difference?

On the client side, the interface to get product results in LargeResults versus regular Results is identical. The client calls getInputStream to get a binary stream to read the product data.

There is a speed penalty for large results. What Result.getInputStream returns is an input stream to product data already contained in the XMLQuery. It's a stream to a buffer already in the client's address space, so it's nice and fast.

LargeResult overrides the getInputStream method to instead return an input stream that repeatedly makes calls back to the product server's retrieveChunk method. Since the product is not already in the local address space of the client, getting large products is a bit slower. To compensate, the input stream actually starts a background thread to start retrieving chunks of the product ahead of the product client, up to a certain point (we don't want to run out of memory again).
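
To make the idea concrete, here is a simplified sketch of an input stream that pulls its data chunk by chunk. It is not the framework's actual stream class: ChunkedInputStream and its ChunkSource interface are hypothetical, the "remote" call is just an in-memory array copy, it reads one byte at a time for brevity, and it leaves out the background read-ahead thread entirely.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch; fetches a new chunk only when the current one runs out.
class ChunkedInputStream extends InputStream {
  // Stand-in for the call back to the product server's retrieveChunk.
  interface ChunkSource {
    byte[] retrieveChunk(long offset, int length) throws IOException;
  }

  private final ChunkSource source;
  private final long size;       // total product size
  private final int chunkSize;   // how much to ask for per call
  private long fetched = 0;      // bytes obtained from the source so far
  private byte[] chunk = new byte[0];
  private int pos = 0;           // read position within the current chunk

  ChunkedInputStream(ChunkSource source, long size, int chunkSize) {
    this.source = source;
    this.size = size;
    this.chunkSize = chunkSize;
  }

  @Override
  public int read() throws IOException {
    if (pos == chunk.length) {
      if (fetched >= size) return -1;   // whole product delivered
      int want = (int) Math.min(chunkSize, size - fetched);
      chunk = source.retrieveChunk(fetched, want);
      if (chunk.length == 0)
        throw new IOException("Empty chunk at offset " + fetched);
      fetched += chunk.length;
      pos = 0;
    }
    return chunk[pos++] & 0xff;
  }

  // Reads an in-memory "product" back through the chunked stream.
  static byte[] copyThroughStream(byte[] product, int chunkSize)
    throws IOException {
    ChunkSource src = (offset, length) -> Arrays.copyOfRange(
      product, (int) offset, (int) offset + length);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in =
      new ChunkedInputStream(src, product.length, chunkSize)) {
      int b;
      while ((b = in.read()) != -1) out.write(b);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] product = "streamed in small chunks".getBytes("US-ASCII");
    System.out.println(
      new String(copyThroughStream(product, 5), "US-ASCII"));
  }
}
```

Each time the current chunk is used up, the caller waits for another round trip to the source, which is the delay that reading ahead in the background is meant to hide.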

On the server side, the difference is in programming complexity. Creating a LargeProductQueryHandler requires implementing three methods instead of just one. You may have to clean up temporary files, close network ports, or do other cleanup. You may even have to guard against clients that present specially-crafted product IDs that try to circumvent access controls to products.
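
As an illustration of that last point, a handler that uses file paths as product IDs could refuse any ID that resolves outside a directory it is willing to serve. The following is a minimal sketch under that assumption; the ProductIDGuard class and the notion of a single allowed root directory are hypothetical and not part of the FileHandler developed above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical guard; the allowed root directory is an assumption.
class ProductIDGuard {
  static boolean isAllowed(File root, String productID)
    throws IOException {
    // Canonical paths have ".." segments and symbolic links resolved.
    File canonicalRoot = root.getCanonicalFile();
    File requested = new File(productID).getCanonicalFile();
    // Path.startsWith compares whole path elements, so a sibling
    // directory whose name merely begins with the root's name fails.
    return requested.toPath().startsWith(canonicalRoot.toPath());
  }

  public static void main(String[] args) throws IOException {
    File root = Files.createTempDirectory("products").toFile();
    root.deleteOnExit();
    String good = new File(root, "image.png").getPath();
    String bad = new File(root, "../etc/passwd").getPath();
    System.out.println(good + " allowed? " + isAllowed(root, good));
    System.out.println(bad + " allowed? " + isAllowed(root, bad));
  }
}
```

A check like this would belong in retrieveChunk as well as in query, since a client can present any product ID it likes to either one.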

LargeResults are more general and will work for any size product, from zero bytes on up. And you can even mix and match: a LargeProductQueryHandler can add regular Results to an XMLQuery as well as LargeResults. You might program some logic that returns regular Results for products under a certain size threshold and LargeResults for anything bigger.
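
Such logic can be as small as a single comparison. In this sketch, the ResultPolicy class and the 64 KiB cut-off are arbitrary assumptions chosen for illustration; a sensible threshold depends on how much memory your clients and server can spare.

```java
// Hypothetical policy; the cut-off value is an arbitrary assumption.
class ResultPolicy {
  static final long THRESHOLD = 64 * 1024;  // bytes

  // True if a product of this size would be embedded as a regular
  // Result; false if it would be referenced through a LargeResult.
  static boolean embedInQuery(long size) {
    return size <= THRESHOLD;
  }

  public static void main(String[] args) {
    long[] sizes = { 0L, 2048L, 65536L, 65537L, 5000000000L };
    for (long size : sizes) {
      System.out.println(size + " bytes -> "
        + (embedInQuery(size) ? "Result" : "LargeResult"));
    }
  }
}
```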

Conclusion

In this tutorial, we implemented a LargeProductQueryHandler that served large products. In this case, large could mean anything from zero bytes (empty products) up to gargantuan numbers of bytes. This handler queried for files in the product server's filesystem, which is a bit insecure, so you might want to terminate the product server as soon as possible. We also learned the advantages and disadvantages of regular product results versus large product results, and that LargeProductQueryHandlers can use LargeResult objects in addition to regular Result objects.

If you've also completed the Your First Product Service tutorial and the Developing a Query Handler tutorial, you are now a master of the OODT Product Service. Congratulations!