/* Conjugate gradient solver and Titanium/Java/PETSc introduction. Assumes a background in C and basic familiarity with OOP. Definitely not a substitute for a good intro book on Java, such as _Core Java_ or _Java in a Nutshell_ or even _Learn Java in 21 Days_. Also not a substitute for the Titanium reference manual and the official tutorial, available from the Ti webpage. */ /* This program reads a matrix in PETSc format from a file and solves Ax=b for a randomly generated b, using the conjugate gradient method. */ /* Here I need to say "import" to tell the compiler that I will be using some of the Java I/O classes, which are in the "java.io" package. The star does what you'd expect; it imports all the classes in that package. */ import java.io.*; /* Same thing, except to tell the compiler about the PETSc classes. Generally you need to import anything that's not in the java.lang package. */ import petsc.*; /* Just FYI, the "package" concept I referred to is a group of classes; the closest concept to it in C programming is a library. But libraries are different because they are not a feature of the language; if I have a "printf" function and some library has that also, only one of the the printf functions will make it into the executable. There's no way to specify in my code which library I want when I call printf(). The Java language lets you have classes with the same name in different packages, and it lets you specify which one you want by attaching the package name. All of the PETSc classes live in the "petsc" package. I will declare the code in this file as belonging to the petsc package, and then I can access any of the classes within it without having to specify a package name. */ /* This is the class declaration for the program. Something like this is always here, in every Java or Ti program that you write. In these languages, executable code cannot live by itself outside a class. Since I want to write some executable code, I need to make up a class to put it in. More about the syntax: "public" means that the class is visible outside of itself and its package. This isn't really important because there's no other code that needs to access the ConjGrad class, but it's a good habit to put it in because you're supposed to be thinking about extensibility when you write any object-oriented code. If I wanted to make ConjGrad a subclass of some other class, I would say "extends" and then the name of that class. */ public class SeqConjGrad { /* These are class variable declarations. "public" means that the variable can be accessed outside of the class. Again, this isn't really important for this code. It's just a habit I have.... "final" means that the variable can never be changed; it is a constant. This is supposed to help the compiler perform some optimizations, but I have my doubts. The types "double" and "int" should be familiar to you from C. ints are 32 bits long, and doubles are 64. String is the name of a class that implements strings. I don't need to import it before using its name because it's one of the Java standard utility classes. So what does "static" mean? It means that the variable or function qualified by the "static" is a member of the class, not the objects created (instantiated) by that class. When you perform a simple class declaration, like so: class foo { int i; } you are declaring that any objects of type "foo" will contain one integer, called "i". There can be many objects, and each object can have a different value for i. Unless I create an object with "new foo", i has no meaning. But what a lot of C++ books gloss over is that the class itself can have data and code. The class actually exists in your program independent of any objects of its type; when I say "new foo" I am telling the runtime code to use the class as a factory to create an object of type foo. If I change the class declaration to read: class foo { static int i; } I am now saying that i is a class variable. It is compiled right into the executable and exists when my program starts. I don't need to create any objects for i to have meaning or be accessible. But there is only one i in my program, because there is only one class foo, even though there can be many objects of type foo. I can also declare a method (function) to be static: class foo { static int i; // class variable int j; // object variable static func() { i = 5; } func2() { i = 5; j = 6; } } I call func() by saying "foo.func()". I call func2() by creating an object and calling func2() on it: foo f = new foo(); f.func2(); func() is a member function of the class, not of any objects created from it. This means that it can only access class variables. It would be an error for it to access j, because func() does not belong to any object, whereas there is a different j for each object of type foo created by the program. The compiler would not know which j func() would be referring to. func2() can access both i and j, because the code belongs to the object that I called it on ("f" in the example above), and the runtime system knows which j func2() is referring to---it is "f"'s j. Similarly, it knows which i func2() is referring to, because there can be only one i in the program. The advantage of having a static method like func() is that it is there when my program starts. I do not need to create an object to execute that code like I did with func2(). So I guess what's important to know is that you use objects and object variables when you want many independent sets of data, all with their own values. Object methods are where you put the code that manipulates the data for each object. Vec and Mat are coded like this because you want many vectors and matrices in your program, and code that works with each of these individually. You use class variables and methods when you only want to have one instance of your code or data, and you want it to be there when your program starts up. Notice that the function main() is static. It must be static, because there are no objects existing when the program starts, and you would not be able to execute it otherwise. Similarly, the variables epsilon, thisProc, numProcs, and filename are static because I only want one of those; I am not expecting to create multiple ConjGrad solver objects and have multiple independent values for these variables. The method parseArgs is the same way; it deals with the static class data for ConjGrad and does not need to be attached to an object. Read the comment above "main" if you've digested this and want to know about parallelism. */ public static final double epsilon = 1.0e-6; public static int thisProc; public static int numProcs; public static String filename = null; public static void parseArgs(String[] args) { for (int i = 0; i < args.length; i++) { if (args[i].equals("-f")) { filename = args[i+1]; i++; } } } /* So what about parallelism? Maybe I do want to create ConjGrad objects so I can have one on each processor. Well, if I were executing parallel code with Java's threads (which Titanium does not currently support), I would have to do exactly this. I would have to put the body of the CG solver into a non-static method, and have main create a bunch of CG objects and run them in parallel using that method as a starting point. Well, that's not how Titanium does things. The parallelism is *implicit*, like in the MPI model. The different threads or processes that will execute your code and hold your data are spawned by the operating system when your program loads; your program does not do it. So main() is fully parallel from the get-go; it is as if you were running a separate copy of your program on each processor. Remember when I said that there is only one instance of each static variable because there is only one instance of each class? I lied. There is one *per program*, and since the Titanium model is as if you were running one copy of your program on each processor, there turns out to be one instance of each static variable per processor. But you can't directly access static data on other processors, which is why I didn't mention parallelism before. Your static code and methods will not be affected by the copies of them on the other processors, but they can have a different value on each processor. You (and Andy Begel) might argue that it is more in keeping with the meaning of "static" to just have one, period. But this is hard to accomplish on distributed systems, so there's one per processor, even on shared-memory machines. Hmmm...what about the "single" keyword below? If you remember MPI, you know that there are MPI calls that every processor has to hit, because the processors can only communicate when they hit MPI calls. MPI_Broadcast() is a good example; all the processes have to agree to get together at some point in the program to exchange data. It is an error if one (or more) processes skip over the broadcast point with an if or other conditional construct. The gcc compiler and the MPI library do not figure out if your program is doing this when you compile; you will only find out at runtime *if* the broadcast is actually performed. It's tough (impossible, actually) for the compiler to determine if there are these sorts of errors in every possible program. The illustrious Titanium developers (primarily David Gay, I think) realized that it is possible to have these errors detected by the compiler if the user agrees to restrict his program in ways that make this analysis much easier. This is what "single" is for---to help the compiler figure out if you're doing something wrong. So the way this works is that any point in your program which all processes must hit is a "single point". Any methods which contain these single points, or any methods which call those methods, become single in themselves---all processes must now hit those method calls. You used to have to show the compiler that you understood this by explicitly declaring those methods to be "single", but I don't think you need to do this any more. But what you do have to do is tell the compiler that any variables involved in conditionals that could break the single points are single. Example: class foo { public static single void main(String args[]) { int i; ....do some stuff... if (i == 5) { barrier(); } } } This is an incorrect program; all processes must hit the barrier, and it is not clear to the compiler if i will be 5 on all processes. But you can help it out by declaring that i is single. That means you are promising the compiler that i will have the same value on each process all the time. The compiler doesn't make sure of this; *you* do. It will help you by telling you that you are breaking the rules: single int i; int j; i = j; // j could be different on each process, breaking the singleness of // i so for the above example you would have to declare j as single, or find some other way to write your code. Single is sort of like a contagious disease; anything declared single will force anything it touches to be single. You can fake out the single analysis if you're clever (mainly by casting things to single), and it's more conservative than it needs to be, but it does help you think about what you're doing when you perform communication. */ /* So do you communicate between processes? You do this two ways: explicitly with broadcasts, or semi-implicitly with global pointers. A broadcast is what you'd expect. You say: i = broadcast x from 0; which stores into every processor's i the value of x on processor 0. You can also do all-processor exchanges using arrays, but I'm not going to cover that here. Global pointers are another communication method: foo f; if (Ti.thisProc() == 0) { f = new foo(); } f = broadcast f from 0; if (Ti.thisProc() == 1) { f.i = 5; } What's happening here is that process 0 is creating an f object, and then broadcasting a pointer to this object to everyone else. There's no explicit pointer syntax in Titanium; every object reference is always a pointer. Then process 1 is writing the value 5 into the f object's i field. Since f is a pointer to an object that lives on process 0, this involves communication. Careless use of global pointers can lead to severe performance degradation on distributed systems. Every pointer is global by default, but it also points to things on the local process by default, so there's no communication normally done. Global pointers are the basic communication design pattern for Titanium code; you create data local to one process, and broadcast a pointer to that data to the others. Then they write and read that data, and the communication is done implicitly. */ /* From here on it's mostly just a bunch of math. */ public static single void main(String[] args) { /* Ti is a class; these are class methods that tell a process what its process number is, and how many processes there are. The process numbers go from 0 to N-1, just as you'd expect. I'm storing these into variables only because I want to save typing five characters.... */ thisProc = Ti.thisProc(); numProcs = Ti.numProcs(); /* Parse the command-line arguments to get the filename. Let all the processors do this even though it's redundant. The alternative is to let processor 0 parse the arguments, and then have it broadcast the results to the other processors. */ parseArgs(args); if (filename == null) { /* Only processor 0 will print out this error message. Titanium and Java have synchronized I/O, so it is not an error for multiple processors to print to stdout. But it is darned confusing. */ if (thisProc == 0) { /* System.out.print() and System.out.println() are how you write to stdout in Java and Titanium. There's no printf(). But you can save yourself some carpal tunnel wear by writing code like this: System.out.println("The result is: " + i + "."); The integer i gets automatically converted to a string and concatenated with the other two strings. */ System.out.println("Switches:"); System.out.println("-f filename"); System.out.println(" The input matrix, in PETSc binary format."); } System.exit(1); } /* Create an object to represent PETSc, which also initializes the PETSc library. There's only going to be one PETSc per process, so this might have been done with static methods; this is a design issue. */ Petsc local petsc = new Petsc(args, null, null, null); /* Create a viewer for output. PETSc performs I/O with "Viewer" objects, which the user must create. Viewers can be stdout/stdin, files, or even the X display. Comm.self is a constant (a final static variable in class Comm) that represents the MPI_SELF communicator, which informs PETSc and MPI that this is sequential code. */ Viewer local viewer = ViewerASCII.out(Comm.self); /* Create a viewer for the matrix file on disk. */ ViewerBinary local matinput = new ViewerBinary(Comm.self, filename, ViewerBinary.BINARY_RDONLY, null); /* Create a new matrix from the one on disk. */ /* Note the keyword "local". I mentioned earlier that every object reference in Titanium is global by default. So the runtime system doesn't know if it should perform communication when it accesses the object data or not. Checking to see what to do takes up time. You can make your programs more efficient by declaring object references you know to be local as "local". All PETSc objects in our interface are local, because that's how PETSc does things. Every process sees its own piece of a distributed data structure, not anyone else's. */ /* If the user gave a block size on the command line, we ask for the matrix in MATSEQBAIJ form: sequential, blocked, AIJ format. If the user did not give a block size on the command line, we request MATSEQAIJ form: sequential, non-blocked, AIJ format. MATSEQBAIJ and MATSEQAIJ are constants in interface MatType that tell PETSc how we want the matrix data stored. */ int format = Options.hasName(null, "-matload_block_size", null) ? MatType.MATSEQBAIJ : MatType.MATSEQAIJ; Mat local A = new Mat(matinput, format, null); /* The act of creating the matrix has also filled in two of its members, which tell you what its size is. */ int m = A.m; int n = A.n; /* Create the solution vector. */ Vec local x = new Vec(m, null); /* populate it with random values in the range [0, 1) */ PetscRandom local random = new PetscRandom(Comm.self, null); x.setRandom(random, null); random.destroy(null); /* Print out the solution vector. */ if (thisProc == 0) { System.out.println("Actual solution vector:"); } x.view(viewer, null); /* Create b, the RHS of the equation, by multiplying A by x. Don't forget the local qualifier.... */ Vec local b = new Vec(m, null); A.mult(x, b, null); /* Set the threshold value to be epsilon times the dot product of b with itself. */ double threshold = epsilon * b.dot(b, null); /* Forget we know what the solution vector is and create a new solution guess vector, x_p. */ Vec local x_p = new Vec(m, null); /* Set x_p to all 1's, as an initial guess. */ x_p.set(1.0, null); /* Create a vector r, the residual. */ Vec local r = new Vec(m, null); /* r = A * x_p */ A.mult(x_p, r, null); /* r = -r */ r.scale(-1.0, null); /* r = b + r */ r.axpy(1.0, b, null); /* Create a search direction vector, p. */ Vec local p = new Vec(m, null); /* p = r */ r.copy(p, null); /* Destroy b. */ /* We can't actually do this, because Java and Titanium don't allow you to destroy individual objects; memory is reclaimed through automatic garbage collection. And it is not reclaimed at all on distributed systems (it's a hard CS problem), which is kind of an annoyance you have to deal with by allocating big chunks of RAM and doing your own management. */ /* Create temporary vectors v and new_r. */ Vec local v = new Vec(m, null); Vec local new_r = new Vec(m, null); /* Declare s here because it's used outside the loop. */ double s; /* Main solver loop. */ do { /* Zero out v, because mult is an additive operation. */ v.set(0, null); /* v = A * p */ A.mult(p, v, null); /* s = (r' * r) */ s = r.dot(r, null); /* a = s / (p' * v) */ double a = s / p.dot(v, null); /* x_p = x_p + (a * p) */ x_p.axpy(a, p, null); /* new_r = r - a * v */ r.copy(new_r, null); new_r.axpy(-1.0 * a, v, null); /* g = (new_r' * new_r) / s */ double g = new_r.dot(new_r, null) / s; /* p = new_r + g*p */ p.aypx(g, new_r, null); /* r = new_r */ new_r.copy(r, null); System.out.println("Error: " + s); } while (s >= threshold); /* Print out the result, x_p. */ if (thisProc == 0) { System.out.println("Calculated solution:"); } x_p.view(viewer, null); petsc.finalize(null); } }