/* Conjugate gradient solver and Titanium/Java/PETSc
   introduction. Assumes a background in C and basic familiarity with
   OOP. Definitely not a substitute for a good intro book on Java,
   such as _Core Java_ or _Java in a Nutshell_ or even 
   _Learn Java in 21 Days_. Also not a substitute for the Titanium reference
   manual and the official tutorial, available from the Ti webpage. */

/* This program reads a matrix in PETSc format from a file and solves Ax=b for
   a randomly generated b, using the conjugate gradient method. */


/* 
   Here I need to say "import" to tell the compiler that I will be using some
   of the Java I/O classes, which are in the "java.io" package. The star does
   what you'd expect; it imports all the classes in that package.
*/
import java.io.*;

/*
   Same thing, except to tell the compiler about the PETSc classes. Generally
   you need to import anything that's not in the java.lang package.
*/
import petsc.*;

/* 
   Just FYI, the "package" concept I referred to is a group of
   classes; the closest concept to it in C programming is a
   library. But libraries are different because they are not a feature
   of the language; if I have a "printf" function and some library has
   that also, only one of the the printf functions will make it into
   the executable. There's no way to specify in my code which library
   I want when I call printf(). The Java language lets you have
   classes with the same name in different packages, and it lets you
   specify which one you want by attaching the package name. All of
   the PETSc classes live in the "petsc" package. I will declare the
   code in this file as belonging to the petsc package, and then I can
   access any of the classes within it without having to specify a
   package name.
*/


/*
   This is the class declaration for the program. Something like this is
   always here, in every Java or Ti program that you write. In these languages,
   executable code cannot live by itself outside a class. Since I want to write
   some executable code, I need to make up a class to put it in.

   More about the syntax:
   "public" means that the class is visible outside of itself and its package.
   This isn't really important because there's no other code that needs to 
   access the ConjGrad class, but it's a good habit to put it in because 
   you're supposed to be thinking about extensibility when you write any 
   object-oriented code.

   If I wanted to make ConjGrad a subclass of some other class, I would say
   "extends" and then the name of that class.
*/
public class SeqConjGrad {

  /* These are class variable declarations. 

     "public" means that the variable can be accessed outside of the
     class. Again, this isn't really important for this code. It's just
     a habit I have....

     "final" means that the variable can never be changed; it is a constant.
     This is supposed to help the compiler perform some optimizations, but
     I have my doubts.

     The types "double" and "int" should be familiar to you from C. ints are
     32 bits long, and doubles are 64. 

     String is the name of a class that implements strings. I don't need to
     import it before using its name because it's one of the Java standard
     utility classes.

     So what does "static" mean? 
     It means that the variable or function qualified by the "static" is
     a member of the class, not the objects created (instantiated) by that
     class. When you perform a simple class declaration, like so:

     class foo {
       int i;
     }

     you are declaring that any objects of type "foo" will contain one
     integer, called "i". There can be many objects, and each object
     can have a different value for i. Unless I create an object with
     "new foo", i has no meaning. But what a lot of C++ books gloss
     over is that the class itself can have data and code. The class
     actually exists in your program independent of any objects of its
     type; when I say "new foo" I am telling the runtime code to use
     the class as a factory to create an object of type foo. If I
     change the class declaration to read:

     class foo {
       static int i;
     }

     I am now saying that i is a class variable. It is compiled right into
     the executable and exists when my program starts. I don't need to create
     any objects for i to have meaning or be accessible. But there is only
     one i in my program, because there is only one class foo, even though 
     there can be many objects of type foo.

     I can also declare a method (function) to be static:

     class foo {
       static int i;   // class variable
       int j;          // object variable
       static func() {
           i = 5;
       }
       func2() {
           i = 5;
           j = 6;
       }
     }

     I call func() by saying "foo.func()". I call func2() by creating an object
     and calling func2() on it:

     foo f = new foo();
     f.func2();

     func() is a member function of the class, not of any objects
     created from it. This means that it can only access class
     variables. It would be an error for it to access j, because
     func() does not belong to any object, whereas there is a
     different j for each object of type foo created by the
     program. The compiler would not know which j func() would be
     referring to.  func2() can access both i and j, because the code
     belongs to the object that I called it on ("f" in the example
     above), and the runtime system knows which j func2() is referring
     to---it is "f"'s j. Similarly, it knows which i func2() is referring
     to, because there can be only one i in the program. The advantage of
     having a static method like func() is that it is there when my program
     starts. I do not need to create an object to execute that code like I
     did with func2().

     So I guess what's important to know is that you use objects and
     object variables when you want many independent sets of data, all
     with their own values. Object methods are where you put the code
     that manipulates the data for each object. Vec and Mat are coded
     like this because you want many vectors and matrices in your
     program, and code that works with each of these individually. You
     use class variables and methods when you only want to have one
     instance of your code or data, and you want it to be there when your
     program starts up. Notice that the function main() is static. It must
     be static, because there are no objects existing when the program starts,
     and you would not be able to execute it otherwise. Similarly, the 
     variables epsilon, thisProc, numProcs, and filename are static because
     I only want one of those; I am not expecting to create multiple ConjGrad 
     solver objects and have multiple independent values for these variables.
     The method parseArgs is the same way; it deals with the static class
     data for ConjGrad and does not need to be attached to an object. Read
     the comment above "main" if you've digested this and want to know about
     parallelism.
*/
     
  public static final double epsilon = 1.0e-6;
  public static int thisProc;
  public static int numProcs;
  public static String filename = null;

  public static void parseArgs(String[] args) {
    for (int i = 0; i < args.length; i++) {
      if (args[i].equals("-f")) {
	filename = args[i+1];
	i++;
      }
    }
  }

/* 
   So what about parallelism? Maybe I do want to create ConjGrad objects so
   I can have one on each processor. Well, if I were executing parallel code
   with Java's threads (which Titanium does not currently support), I would
   have to do exactly this. I would have to put the body of the CG solver into
   a non-static method, and have main create a bunch of CG objects and run
   them in parallel using that method as a starting point. Well, that's not
   how Titanium does things. The parallelism is *implicit*, like in the MPI
   model. The different threads or processes that will execute your code and
   hold your data are spawned by the operating system when your program loads;
   your program does not do it. So main() is fully parallel from the get-go;
   it is as if you were running a separate copy of your program on each 
   processor.

   Remember when I said that there is only one instance of each static
   variable because there is only one instance of each class? I
   lied. There is one *per program*, and since the Titanium model is
   as if you were running one copy of your program on each processor, there
   turns out to be one instance of each static variable per processor.
   But you can't directly access static data on other processors, which
   is why I didn't mention parallelism before. Your static code and methods
   will not be affected by the copies of them on the other processors, but 
   they can have a different value on each processor. You (and Andy Begel)
   might argue that it is more in keeping with the meaning of "static" to just
   have one, period. But this is hard to accomplish on distributed systems,
   so there's one per processor, even on shared-memory machines.

   Hmmm...what about the "single" keyword below? If you remember MPI,
   you know that there are MPI calls that every processor has to hit,
   because the processors can only communicate when they hit MPI
   calls.  MPI_Broadcast() is a good example; all the processes have
   to agree to get together at some point in the program to exchange
   data. It is an error if one (or more) processes skip over the
   broadcast point with an if or other conditional construct. The gcc
   compiler and the MPI library do not figure out if your program is
   doing this when you compile; you will only find out at runtime *if*
   the broadcast is actually performed. It's tough (impossible,
   actually) for the compiler to determine if there are these sorts of
   errors in every possible program.  The illustrious Titanium
   developers (primarily David Gay, I think) realized that it is
   possible to have these errors detected by the compiler if the user
   agrees to restrict his program in ways that make this analysis much
   easier. This is what "single" is for---to help the compiler figure
   out if you're doing something wrong. So the way this works is that
   any point in your program which all processes must hit is a "single
   point". Any methods which contain these single points, or any
   methods which call those methods, become single in themselves---all
   processes must now hit those method calls. You used to have to show
   the compiler that you understood this by explicitly declaring those
   methods to be "single", but I don't think you need to do this any
   more. But what you do have to do is tell the compiler that any variables
   involved in conditionals that could break the single points are single.
   Example:

   class foo {
      public static single void main(String args[]) {
        int i;
        ....do some stuff...
	if (i == 5) {
          barrier();
	}
      }
   }

   This is an incorrect program; all processes must hit the barrier, and it
   is not clear to the compiler if i will be 5 on all processes. But you can
   help it out by declaring that i is single. That means you are promising the
   compiler that i will have the same value on each process all the time.
   The compiler doesn't make sure of this; *you* do. It will help you by
   telling you that you are breaking the rules:

   single int i;
   int j;

   i = j;   // j could be different on each process, breaking the singleness of
            // i

   so for the above example you would have to declare j as single, or find 
   some other way to write your code. Single is sort of like a contagious 
   disease; anything declared single will force anything it touches to be 
   single. You can fake out the single analysis if you're clever (mainly by
   casting things to single), and it's more conservative than it needs to 
   be, but it does help you think about what you're doing when you perform
   communication.
*/

/*
   So do you communicate between processes?

   You do this two ways: explicitly with broadcasts, or semi-implicitly with
   global pointers.

   A broadcast is what you'd expect. You say:

   i = broadcast x from 0;

   which stores into every processor's i the value of x on processor 0. You
   can also do all-processor exchanges using arrays, but I'm not going to
   cover that here.

   Global pointers are another communication method:

   foo f;
   if (Ti.thisProc() == 0) {
      f = new foo();
   }
   f = broadcast f from 0;
   if (Ti.thisProc() == 1) {
      f.i = 5;
   }

   What's happening here is that process 0 is creating an f object,
   and then broadcasting a pointer to this object to everyone
   else. There's no explicit pointer syntax in Titanium; every object
   reference is always a pointer.  Then process 1 is writing the value
   5 into the f object's i field. Since f is a pointer to an object
   that lives on process 0, this involves communication. Careless use
   of global pointers can lead to severe performance degradation on
   distributed systems. Every pointer is global by default, but it also
   points to things on the local process by default, so there's no 
   communication normally done. Global pointers are the basic communication
   design pattern for Titanium code; you create data local to one process,
   and broadcast a pointer to that data to the others. Then they write and
   read that data, and the communication is done implicitly.
*/

/*
   From here on it's mostly just a bunch of math. 
*/
  public static single void main(String[] args) {
    /* 
       Ti is a class; these are class methods that tell a process what its
       process number is, and how many processes there are. The process
       numbers go from 0 to N-1, just as you'd expect.
       I'm storing these into variables only because I want to save typing
       five characters....
    */
    thisProc = Ti.thisProc();
    numProcs = Ti.numProcs();
   
    /* Parse the command-line arguments to get the filename. Let all the
       processors do this even though it's redundant. The alternative is to
       let processor 0 parse the arguments, and then have it broadcast the
       results to the other processors. */
    parseArgs(args);

    if (filename == null) {
      /* 
	 Only processor 0 will print out this error message. Titanium and
	 Java have synchronized I/O, so it is not an error for multiple 
	 processors to print to stdout. But it is darned confusing.
      */
      if (thisProc == 0) {
	/* 
	   System.out.print() and System.out.println() are how you write to
	   stdout in Java and Titanium. There's no printf(). But you can 
	   save yourself some carpal tunnel wear by writing code like this:
	   
	   System.out.println("The result is: " + i + ".");

	   The integer i gets automatically converted to a string and 
	   concatenated with the other two strings.
        */

	System.out.println("Switches:");
	System.out.println("-f filename");
	System.out.println("    The input matrix, in PETSc binary format.");
      }
      System.exit(1);
    }
      
    /* 
       Create an object to represent PETSc, which also initializes the PETSc
       library. There's only going to be one PETSc per process, so this 
       might have been done with static methods; this is a design issue.
    */
    Petsc local petsc = new Petsc(args, null, null, null);

    /* 
       Create a viewer for output. PETSc performs I/O with "Viewer" objects,
       which the user must create. Viewers can be stdout/stdin, files, or
       even the X display. Comm.self is a constant (a final static variable 
       in class Comm) that represents the MPI_SELF communicator, which 
       informs PETSc and MPI that this is sequential code.
     */
    Viewer local viewer = ViewerASCII.out(Comm.self);

    /* 
       Create a viewer for the matrix file on disk.
    */
    ViewerBinary local matinput = new ViewerBinary(Comm.self, filename, 
						   ViewerBinary.BINARY_RDONLY,
						   null);

    /* Create a new matrix from the one on disk. */
    /* 
       Note the keyword "local". I mentioned earlier that every object
       reference in Titanium is global by default. So the runtime system
       doesn't know if it should perform communication when it accesses the
       object data or not. Checking to see what to do takes up time. You can
       make your programs more efficient by declaring object references you
       know to be local as "local". All PETSc objects in our interface are
       local, because that's how PETSc does things. Every process sees its
       own piece of a distributed data structure, not anyone else's.
    */
    /* 
       If the user gave a block size on the command line, we ask for
       the matrix in MATSEQBAIJ form: sequential, blocked, AIJ
       format. If the user did not give a block size on the command
       line, we request MATSEQAIJ form: sequential, non-blocked, AIJ
       format. MATSEQBAIJ and MATSEQAIJ are constants in interface
       MatType that tell PETSc how we want the matrix data stored.
    */

    int format = Options.hasName(null, "-matload_block_size", null)
	? MatType.MATSEQBAIJ
	: MatType.MATSEQAIJ;
    Mat local A = new Mat(matinput, format, null);

    /*
       The act of creating the matrix has also filled in two of its members,
       which tell you what its size is.
    */
    int m = A.m;
    int n = A.n;

    /* Create the solution vector. */
    Vec local x = new Vec(m, null);

    /* populate it with random values in the range [0, 1) */
    PetscRandom local random = new PetscRandom(Comm.self, null);
    x.setRandom(random, null);
    random.destroy(null);

    /* Print out the solution vector. */
    if (thisProc == 0) {
      System.out.println("Actual solution vector:");
    }
    x.view(viewer, null);

    /* 
       Create b, the RHS of the equation, by multiplying A by x. Don't forget
       the local qualifier....
     */
    Vec local b = new Vec(m, null);
    A.mult(x, b, null);

    /* Set the threshold value to be epsilon times the dot product of b with
       itself. */
    double threshold = epsilon * b.dot(b, null);

    /* Forget we know what the solution vector is and create a new solution 
       guess vector, x_p. */
    Vec local x_p = new Vec(m, null);

    /* Set x_p to all 1's, as an initial guess. */
    x_p.set(1.0, null);

    /* Create a vector r, the residual. */
    Vec local r = new Vec(m, null);

    /* r = A * x_p */
    A.mult(x_p, r, null);

    /* r = -r */
    r.scale(-1.0, null);

    /* r = b + r */
    r.axpy(1.0, b, null);
    
    /* Create a search direction vector, p. */
    Vec local p = new Vec(m, null);

    /* p = r */
    r.copy(p, null);

    /* Destroy b. */
    /* 
       We can't actually do this, because Java and Titanium don't allow you
       to destroy individual objects; memory is reclaimed through automatic
       garbage collection. And it is not reclaimed at all on distributed
       systems (it's a hard CS problem), which is kind of an annoyance you 
       have to deal with by allocating big chunks of RAM and doing your own 
       management.
     */
    
    /* Create temporary vectors v and new_r. */
    Vec local v = new Vec(m, null);
    Vec local new_r = new Vec(m, null);

    /* Declare s here because it's used outside the loop. */
    double s;

    /* Main solver loop. */
    do {
      /* Zero out v, because mult is an additive operation. */
      v.set(0, null);

      /* v = A * p */
      A.mult(p, v, null);
      
      /* s = (r' * r) */
      s = r.dot(r, null);

      /* a = s / (p' * v) */
      double a = s / p.dot(v, null);

      /* x_p = x_p + (a * p) */
      x_p.axpy(a, p, null);

      /* new_r = r - a * v */
      r.copy(new_r, null);
      new_r.axpy(-1.0 * a, v, null);

      /* g = (new_r' * new_r) / s */
      double g = new_r.dot(new_r, null) / s;

      /* p = new_r + g*p */
      p.aypx(g, new_r, null);

      /* r = new_r */
      new_r.copy(r, null);

      System.out.println("Error: " + s);
    } while (s >= threshold);

    /* Print out the result, x_p. */
    if (thisProc == 0) {
      System.out.println("Calculated solution:");
    }
    x_p.view(viewer, null);

    petsc.finalize(null);
  }
}