|
In the
last tutorial
, we created a query
handler and "installed" it in a product server. We could
query it for products (mathematical constants) using the
XMLQuery's postfix boolean stacks. The handler would return
results by embedding them in the returned XMLQuery. Now we'll
return larger products that live outside of the XMLQuery.
There's a
giant
clam
at Pismo Beach, a
giant
ball of twine
in Kansas, and for those who drive SUVs, a
giant
gas pump
. For the OODT framework, large is similarly hard to define.
One of the original architects of the OODT framework thought
that putting a product's result in with the query meant that
you'd never lose the link between a product and the query
that generated it. I'm not sure I see the value in that, but
regardless, it posed a practical challenge: an
XMLQuery
object in memory with one or two large
results in it will exhaust the Java virtual machine's available
memory.
It's even worse when the XMLQuery is expressed as a
textual XML document. In this case, a binary product must be
encoded in a text format (we use
Base64
),
making the XMLQuery in XML format even larger than as a Java
object. Moreover, those XML documents must be parsed at some
time to reconstitute them as Java objects. We use a DOM-based
parser, which holds the entire document in memory. Naturally,
things tend to explode at this rate.
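To get a feel for the text-encoding overhead, here is a minimal sketch using the standard java.util.Base64 class. That is not necessarily the encoder the OODT framework itself uses, and the class name Base64Overhead is made up for this illustration. Base64 maps every 3 bytes to 4 characters, so the encoded product is about a third larger before the XML parser ever sees it.

```java
import java.util.Base64;

// Illustration only: measures how much Base64 inflates a binary payload.
class Base64Overhead {
    // Returns the number of characters needed to Base64-encode
    // a payload of rawBytes bytes.
    static int encodedLength(int rawBytes) {
        byte[] product = new byte[rawBytes];
        return Base64.getEncoder().encodeToString(product).length();
    }

    public static void main(String[] args) {
        int raw = 3 * 1024 * 1024; // a 3 MiB "product"
        int encoded = encodedLength(raw);
        System.out.println(raw + " bytes -> " + encoded + " characters");
    }
}
```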
There is a way out of the quagmire, though. Instead of
writing a
QueryHandler
, write a
LargeProductQueryHandler
. A
QueryHandler
puts
Result
objects
into the
XMLQuery
, each of which holds the entire product.
A
LargeProductQueryHandler
puts
LargeResult
objects, each of which holds only
a reference to
the product
.
The OODT framework provides an extension to the
QueryHandler
interface called
jpl.eda.product.LargeProductQueryHandler
. This
interface adds two methods that you must implement:
-
retrieveChunk
. This method returns a byte
array representing a chunk of the product. The OODT
framework calls this method repeatedly to gather chunks of
the product for the product client. It takes a
product
ID
(a string) that identifies which product is being
retrieved. It also takes a byte offset into the product
data and a size of the byte chunk to return. You return the
matching chunk.
-
close
. This method is called by the OODT
framework to tell the query handler it's done getting a
product. It takes a
product ID
that tells which
product is no longer being retrieved. You use this method
to perform any cleanup necessary.
Because it extends the
QueryHandler
interface,
you still have to implement the
query
method.
However, as a
LargeProductQueryHandler
, you can
add
LargeResult
objects to the
XMLQuery
passed in.
LargeResult
s
identify the
product ID
(string) that the OODT
framework will later use when it calls
retrieveChunk
and
close
.
For example, suppose you're serving large images by
generating them from various other data sources:
-
The
query
method would examine the user's
query, consult the various data sources, and generate the
image, storing it in a temporary file. It would also assign
a string
product ID
to this file, use that product
ID in a
LargeResult
object, add the
LargeResult
to the
XMLQuery
, and
return the modified
XMLQuery
.
-
Shortly afterward, the OODT framework will repeatedly call
the
retrieveChunk
method. This method would
check the
product ID
passed in and locate the
corresponding temporary file generated earlier by the
query
method. It would index into the file by
the offset requested by the framework, read the number of
bytes requested by the framework, package that up into a
byte array, and return it. Eventually, the OODT framework
will have read the entire product this way.
-
Lastly, the OODT framework will call the
close
method. This method would check the
product ID
and locate and delete the temporary
file.
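The three steps above can be sketched without the OODT classes at all. The class below is a hypothetical stand-in, not the real LargeProductQueryHandler interface: its query method takes the already-generated image bytes instead of an XMLQuery, and it returns the product ID directly instead of wrapping it in a LargeResult. It assumes Java 7 or later for try-with-resources.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the three-step lifecycle; not the OODT API.
class TempFileLifecycle {
    // Maps each product ID handed out by query() to its temporary file.
    private final Map<String, Path> products = new ConcurrentHashMap<>();

    // Step 1: park the generated product in a temp file, return its ID.
    String query(byte[] generatedImage) throws IOException {
        Path tmp = Files.createTempFile("product", ".bin");
        Files.write(tmp, generatedImage);
        String id = UUID.randomUUID().toString();
        products.put(id, tmp);
        return id;
    }

    // Step 2: return up to 'length' bytes starting at 'offset'.
    byte[] retrieveChunk(String id, long offset, int length)
            throws IOException {
        Path tmp = products.get(id);
        if (tmp == null) throw new IOException("Unknown product ID: " + id);
        try (RandomAccessFile f = new RandomAccessFile(tmp.toFile(), "r")) {
            long remaining = Math.max(0, f.length() - offset);
            byte[] chunk = new byte[(int) Math.min(length, remaining)];
            f.seek(offset);
            f.readFully(chunk);
            return chunk;
        }
    }

    // Step 3: forget the ID and delete the temp file.
    void close(String id) throws IOException {
        Path tmp = products.remove(id);
        if (tmp != null) Files.deleteIfExists(tmp);
    }
}
```

Using a generated ID rather than the temporary file's path keeps the server's filesystem layout out of the client's hands, which is one possible design choice among several.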
To put this into practice, let's create a
LargeProductQueryHandler
that serves files out of
the product server's filesystem.
We'll develop a
FileHandler
that will serve
files out of the product server's filesystem. Providing
filesystem access through the OODT framework in this way is
probably not a very good idea (after all, product clients
could request copies of sensitive files), but for a
demonstration it'll do.
Because files can be quite large, we'll use a
LargeProductQueryHandler
. It will serve queries
of the form
file =
path
where
path
is the full path of the file the user
wants. The handler will add
LargeResult
s to the
XMLQuery, and the
product ID
will simply be the
path
of the requested file. The
retrieveChunk
method will open the file with the
given product ID (which is just the path to the file) and
return a block of data out of it. The
close
method won't need to do anything, since we're not creating
temporary files or making network connections or anything;
there's just nothing to clean up.
First, let's create a utility method that takes the
XMLQuery
and returns a
java.io.File
that matches the requested file. Because the query takes the form
file =
path
there should be three
QueryElement
s on the "where" stack:
-
The zeroth (topmost) has role =
elemName
and value =
file
.
-
The first (middle) has role =
LITERAL
and
value = the
path
of the file the user wants.
-
The last (bottom) has role =
RELOP
and
value =
EQ
.
We'll reject any other query by returning
null
from this method. Further, if the file named by the
path
doesn't exist, or if it's not a file (for
example, it's a directory or a socket), we'll return
null
.
Here's the start of our
FileHandler.java
:
import java.io.File;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;
public class FileHandler
  implements LargeProductQueryHandler {
  private static File getFile(XMLQuery q) {
    // Expect exactly three elements on the "where" stack
    List stack = q.getWhereElementSet();
    if (stack.size() != 3) return null;
    // Topmost element must name the "file" element
    QueryElement e = (QueryElement) stack.get(0);
    if (!"elemName".equals(e.getRole())
      || !"file".equals(e.getValue()))
      return null;
    // Bottom element must be the equality operator
    e = (QueryElement) stack.get(2);
    if (!"RELOP".equals(e.getRole())
      || !"EQ".equals(e.getValue()))
      return null;
    // Middle element carries the requested path
    e = (QueryElement) stack.get(1);
    if (!"LITERAL".equals(e.getRole()))
      return null;
    // Reject directories, sockets, and nonexistent paths
    File file = new File(e.getValue());
    if (!file.isFile()) return null;
    return file;
  }
}
Recall that the user can say what MIME types of products
are acceptable by specifying the preference list in the
XMLQuery. This lets a product server that serves, say,
video clips, convert them to
video/mpeg
(MPEG-2),
video/mpeg4-generic
(MPEG-4),
video/quicktime
(Apple QuickTime), or some
other format, in order to better serve its clients.
Since our product server just serves
files of any
format
, we won't really bother with the list of
acceptable MIME types. After all, the
/etc/passwd
file
could
be a JPEG
image on some systems. (Yes, we could go through the
extra step of determining the MIME type of a file by
looking at its extension or its contents, but this is an
OODT tutorial, not a something-else-tutorial!)
However, we will honor the user's wishes by labeling the
result's MIME type based on what the user specifies in the
acceptable MIME type list. So, if the product client says
that
image/jpeg
is acceptable and the file is
/etc/passwd
, we'll call
/etc/passwd
a JPEG image. However, we won't
try to read the client's mind: if the user wants
image/*
, then we'll just say it's a binary
file,
application/octet-stream
.
Here's the code:
import java.util.Iterator;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  private static String getMimeType(XMLQuery q) {
    for (Iterator i = q.getMimeAccept().iterator();
      i.hasNext();) {
      String t = (String) i.next();
      // The first type without a wildcard wins
      if (t.indexOf('*') == -1) return t;
    }
    // Only wildcards (or nothing) were acceptable
    return "application/octet-stream";
  }
}
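To see the selection rule in action without a running product server, here is a stand-alone sketch of the same logic. It takes a plain list of strings in place of the XMLQuery's acceptable MIME type list; the class name MimeTypeChoice and the method name choose are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone copy of the selection rule, using a plain list in place
// of the XMLQuery's list of acceptable MIME types.
class MimeTypeChoice {
    static String choose(List<String> acceptable) {
        for (String t : acceptable) {
            if (t.indexOf('*') == -1) return t; // first concrete type wins
        }
        return "application/octet-stream";      // only wildcards, or nothing
    }

    public static void main(String[] args) {
        // A concrete type is present, so it is used as the label.
        System.out.println(choose(Arrays.asList("image/*", "image/jpeg")));
        // Only wildcards are present, so we fall back to a binary file.
        System.out.println(choose(Arrays.asList("image/*", "*/*")));
    }
}
```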
Once we've got the file that the user wants and the MIME
type to call it, all we have to do is insert the
LargeResult
. Remember that it's the
LargeResult
that tells the OODT framework what
the
product ID
is for later
retrieveChunk
and
close
calls.
The
product ID
is passed as the first argument to
the
LargeResult
constructor.
We'll write a utility method to insert the
LargeResult
:
import java.io.IOException;
import java.util.Collections;
import jpl.eda.xmlquery.LargeResult;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  private static void insert(File file, String type,
    XMLQuery q) throws IOException {
    // The canonical path doubles as the product ID
    String id = file.getCanonicalPath();
    long size = file.length();
    LargeResult lr = new LargeResult(id, type,
      /*profileID*/null, /*resourceID*/null,
      /*headers*/Collections.EMPTY_LIST, size);
    q.getResults().add(lr);
  }
}
With our three utility methods in hand, writing the
required
query
method is a piece of cake. Here
it is:
import jpl.eda.product.ProductException;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public XMLQuery query(XMLQuery q)
    throws ProductException {
    try {
      File file = getFile(q);
      if (file == null) return q;
      String type = getMimeType(q);
      insert(file, type, q);
      return q;
    } catch (IOException ex) {
      throw new ProductException(ex);
    }
  }
}
The
query
method as defined by the
QueryHandler
interface (and extended into the
LargeProductQueryHandler
interface) is allowed
to throw only one kind of checked exception:
ProductException
. So, in case the
insert
method throws an
IOException
, we transform it into a
ProductException
.
Now there are just two more required methods to implement,
retrieveChunk
and
close
.
The OODT framework repeatedly calls the handler's
retrieveChunk
method to get chunks of the
product, eventually getting the entire product (unless the
product client decides to abort the transfer). For our file
handler, retrieveChunk just has to:
-
Make sure the file specified by the
product ID
still exists (after all, it could be deleted at any time,
even before the first
retrieveChunk
got
called).
-
Open the file.
-
Skip into the file by the requested offset.
-
Read the requested number of bytes out of the file.
-
Return those bytes.
-
Close the file.
We'll write a quick little
skip
method to skip
into a file's input stream:
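private static void skip(long offset,
  InputStream in) throws IOException {
  while (offset > 0) {
    long n = in.skip(offset);
    // No progress (typically end of file): fail, don't loop forever
    if (n <= 0) throw new IOException("Can't skip to offset");
    offset -= n;
  }
}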
And here's another little utility method to read a
specified number of bytes out of a file's input stream:
private static byte[] read(int length,
  InputStream in) throws IOException {
  byte[] buf = new byte[length];
  int numRead;
  int index = 0;
  int toRead = length;
  while (toRead > 0) {
    numRead = in.read(buf, index, toRead);
    // read() returns -1 at end of file; don't treat it as a count
    if (numRead == -1)
      throw new IOException("Unexpected end of file");
    index += numRead;
    toRead -= numRead;
  }
  return buf;
}
(By now, you're probably wondering why we didn't just use
java.io.RandomAccessFile
; I'm wondering that
too!)
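For the curious, here is roughly what the RandomAccessFile alternative could look like. This is a sketch, not part of the tutorial's FileHandler: the class name RandomAccessChunks and the method name readChunk are made up, and it assumes Java 7 or later for try-with-resources. The seek call replaces our skip loop, and readFully replaces our read loop.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: chunk reading with RandomAccessFile instead of skip() and read().
class RandomAccessChunks {
    static byte[] readChunk(File file, long offset, int length)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);            // jump straight to the offset
            byte[] buf = new byte[length];
            raf.readFully(buf);          // EOFException if the file is too short
            return buf;
        }
    }
}
```

Because readFully throws java.io.EOFException (a kind of IOException) when the file ends early, a caller could convert that into a ProductException the same way retrieveChunk does for other I/O failures.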
Finally, we can implement the required
retrieveChunk
method:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public byte[] retrieveChunk(String id, long offset,
    int length) throws ProductException {
    BufferedInputStream in = null;
    try {
      // The product ID is the file's path
      File f = new File(id);
      if (!f.isFile()) throw new ProductException(id
        + " isn't a file (anymore?)");
      in = new BufferedInputStream(new FileInputStream(f));
      skip(offset, in);
      byte[] buf = read(length, in);
      return buf;
    } catch (IOException ex) {
      throw new ProductException(ex);
    } finally {
      // Always release the file, even on failure
      if (in != null) try {
        in.close();
      } catch (IOException ignore) {}
    }
  }
}
Because the OODT framework has no idea what data sources a
LargeProductQueryHandler
will eventually
consult, what temporary files it may need to clean up, what
network sockets it might need to shut down, and so forth, it
needs some way to indicate to a query handler that it's
done calling
retrieveChunk
for a certain
product ID
. The
close
method does this.
In our example,
close
doesn't need to do
anything, but we are obligated to implement it:
...
public class FileHandler
  implements LargeProductQueryHandler {
  ...
  public void close(String id) {}
}
Here's the complete source file,
FileHandler.java
:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import jpl.eda.product.LargeProductQueryHandler;
import jpl.eda.product.ProductException;
import jpl.eda.xmlquery.LargeResult;
import jpl.eda.xmlquery.QueryElement;
import jpl.eda.xmlquery.XMLQuery;
public class FileHandler
  implements LargeProductQueryHandler {
  private static File getFile(XMLQuery q) {
    List stack = q.getWhereElementSet();
    if (stack.size() != 3) return null;
    QueryElement e = (QueryElement) stack.get(0);
    if (!"elemName".equals(e.getRole())
      || !"file".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(2);
    if (!"RELOP".equals(e.getRole())
      || !"EQ".equals(e.getValue()))
      return null;
    e = (QueryElement) stack.get(1);
    if (!"LITERAL".equals(e.getRole()))
      return null;
    File file = new File(e.getValue());
    if (!file.isFile()) return null;
    return file;
  }
  private static String getMimeType(XMLQuery q) {
    for (Iterator i = q.getMimeAccept().iterator();
      i.hasNext();) {
      String t = (String) i.next();
      if (t.indexOf('*') == -1) return t;
    }
    return "application/octet-stream";
  }
  private static void insert(File file, String type,
    XMLQuery q) throws IOException {
    String id = file.getCanonicalPath();
    long size = file.length();
    LargeResult lr = new LargeResult(id, type,
      /*profileID*/null, /*resourceID*/null,
      /*headers*/Collections.EMPTY_LIST, size);
    q.getResults().add(lr);
  }
  public XMLQuery query(XMLQuery q)
    throws ProductException {
    try {
      File file = getFile(q);
      if (file == null) return q;
      String type = getMimeType(q);
      insert(file, type, q);
      return q;
    } catch (IOException ex) {
      throw new ProductException(ex);
    }
  }
  private static void skip(long offset,
    InputStream in) throws IOException {
    while (offset > 0) {
      long n = in.skip(offset);
      // No progress (typically end of file): fail, don't loop forever
      if (n <= 0) throw new IOException("Can't skip to offset");
      offset -= n;
    }
  }
  private static byte[] read(int length,
    InputStream in) throws IOException {
    byte[] buf = new byte[length];
    int numRead;
    int index = 0;
    int toRead = length;
    while (toRead > 0) {
      numRead = in.read(buf, index, toRead);
      // read() returns -1 at end of file; don't treat it as a count
      if (numRead == -1)
        throw new IOException("Unexpected end of file");
      index += numRead;
      toRead -= numRead;
    }
    return buf;
  }
  public byte[] retrieveChunk(String id, long offset,
    int length) throws ProductException {
    BufferedInputStream in = null;
    try {
      File f = new File(id);
      if (!f.isFile()) throw new ProductException(id
        + " isn't a file (anymore?)");
      in = new BufferedInputStream(new FileInputStream(f));
      skip(offset, in);
      byte[] buf = read(length, in);
      return buf;
    } catch (IOException ex) {
      throw new ProductException(ex);
    } finally {
      if (in != null) try {
        in.close();
      } catch (IOException ignore) {}
    }
  }
  public void close(String id) {}
}
We'll compile this code using the J2SDK command-line tools,
but if you're more comfortable with some kind of Integrated
Development Environment (IDE), adjust as necessary.
Let's go back again to the
$PS_HOME
directory we
made earlier; create the file
$PS_HOME/src/FileHandler.java
with the contents
shown above. Then, compile and update the jar file as follows:
% javac -extdirs lib \
-d classes src/FileHandler.java
% ls -l classes
total 8
-rw-r--r-- 1 kelly kelly 2524 25 Feb 15:46 ConstantHandler.class
-rw-r--r-- 1 kelly kelly 3163 26 Feb 16:15 FileHandler.class
% jar -uf lib/my-handlers.jar \
-C classes FileHandler.class
% jar -tf lib/my-handlers.jar
META-INF/
META-INF/MANIFEST.MF
ConstantHandler.class
FileHandler.class
We've now got a jar with the
ConstantHandler
from the
last tutorial
and our new
FileHandler
.
The
$PS_HOME/bin/ps
script already has a system
property specifying the
ConstantHandler
, so we
just need to add the
FileHandler
to that list.
First, stop the product server by hitting CTRL+C (or your
interrupt key) in the window in which it's currently running.
Then, modify the
$PS_HOME/bin/ps
script to read
as follows:
#!/bin/sh
exec java -Djava.ext.dirs=$PS_HOME/lib \
-Dhandlers=ConstantHandler,FileHandler \
jpl.eda.ExecServer \
jpl.eda.product.rmi.ProductServiceImpl \
urn:eda:rmi:MyProductService
Then start the server by running
$PS_HOME/bin/ps
. If all goes well, the product
server will be ready to answer queries again, this time
passing each incoming
XMLQuery
to
two
different query handlers.
Edit the
$PS_HOME/bin/pc
script once more to
make sure the
-out
and not the
-xml
command-line argument is being used. Let's try querying for a
file:
% $PS_HOME/bin/pc "file = /etc/passwd"
nobody:*:-2:-2:Unprivileged User:/:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
...
If you like, you can change the
-out
to
-xml
again and examine the XML version. This
time, the product data isn't in the XMLQuery object.
On the client side, the interface to get product results in
LargeResult
s versus regular
Result
s
is identical. The client calls
getInputStream
to
get a binary stream to read the product data.
There is a speed penalty for large results. What
Result.getInputStream
returns is an input stream
to product data already contained in the XMLQuery. It's a
stream to a buffer already in the client's address space, so
it's nice and fast.
LargeResult
overrides the
getInputStream
method to instead return an input
stream that repeatedly makes calls back to the product
server's
retrieveChunk
method. Since the product
is
not
already in the local address space of the
client, getting large products is a bit slower. To
compensate, the input stream actually starts a background
thread to start retrieving chunks of the product ahead of the
product client, up to a certain point (we don't want to run
out of memory again).
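To make the chunk-pulling idea concrete, here is a simplified sketch of such a stream. It is not the framework's actual LargeResult stream: ChunkedInputStream and its nested ChunkSource interface are hypothetical names, ChunkSource stands in for the remote calls to the product server, and the background read-ahead thread is left out to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a stream that pulls a product chunk by chunk.
class ChunkedInputStream extends InputStream {
    // Stand-in for the remote calls to the product server.
    interface ChunkSource {
        byte[] retrieveChunk(String id, long offset, int length)
            throws IOException;
        void close(String id) throws IOException;
    }

    private final ChunkSource source;
    private final String id;
    private final long size;       // total product size in bytes
    private final int chunkSize;   // bytes to request per call, must be > 0
    private long fetched = 0;      // bytes received from the source so far
    private byte[] chunk = new byte[0];
    private int pos = 0;           // read position within the current chunk

    ChunkedInputStream(ChunkSource source, String id, long size,
            int chunkSize) {
        this.source = source;
        this.id = id;
        this.size = size;
        this.chunkSize = chunkSize;
    }

    @Override
    public int read() throws IOException {
        if (pos == chunk.length) {
            if (fetched >= size) return -1;   // whole product consumed
            int want = (int) Math.min(chunkSize, size - fetched);
            chunk = source.retrieveChunk(id, fetched, want);
            if (chunk.length == 0)
                throw new IOException("Empty chunk for " + id);
            fetched += chunk.length;
            pos = 0;
        }
        return chunk[pos++] & 0xff;           // unsigned byte value
    }

    @Override
    public void close() throws IOException {
        source.close(id);   // tell the server we're done with this product
    }
}
```

Every exhausted chunk costs a round trip to the server, which is where the speed penalty comes from; reading ahead in a background thread hides some of that latency.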
On the server side, the difference is in programming
complexity. Creating a
LargeProductQueryHandler
requires implementing three methods instead of just one. You
may have to clean up temporary files, close network ports, or
do other cleanup. You may even have to guard against clients
that present specially crafted product IDs that try to
circumvent access controls to products.
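As one possible guard, a handler whose product IDs are file paths could refuse any ID that resolves outside a directory it is willing to serve. The sketch below is an assumption about how you might do that, not something the OODT framework provides; ProductIdGuard, resolve, and the root parameter are all invented names.

```java
import java.io.File;
import java.io.IOException;

// Sketch: accept a product ID only if it resolves to a path under 'root'.
class ProductIdGuard {
    static File resolve(File root, String productId) throws IOException {
        // Canonical paths have ".." segments and symbolic links resolved.
        File canonicalRoot = root.getCanonicalFile();
        File candidate = new File(productId).getCanonicalFile();
        // Path.startsWith compares whole path elements, so
        // /data/products-old is not mistaken for a child of /data/products.
        if (!candidate.toPath().startsWith(canonicalRoot.toPath())) {
            throw new IOException(
                "Product ID outside served directory: " + productId);
        }
        return candidate;
    }
}
```

A retrieveChunk implementation could call a check like this before opening the file, and turn the IOException into a ProductException.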
LargeResult
s are more general, and will work for
any size product, from zero bytes on up. And you can even mix
and match: a
LargeProductQueryHandler
can add
regular
Result
s to an XMLQuery as well as
LargeResult
s. You might program some logic that
returns regular
Result
s for products under a certain size threshold, and
LargeResult
s for anything bigger.
In this tutorial, we implemented a
LargeProductQueryHandler
that served large
products. In this case, large could mean zero bytes (empty
products) up to gargantuan numbers of bytes. This handler
queried for files in the product server's filesystem, which is
a bit insecure, so you might want to terminate the product
server as soon as possible. We also learned what the
advantages and disadvantages are of regular product
results versus large product results, and that
LargeProductQueryHandler
s can use
LargeResult
objects in addition to regular
Result
objects.
If you've also completed the
Your First
Product Service
tutorial and the
Developing a Query Handler
tutorial, you
are now a master of the OODT Product Service.
Congratulations!
|