CS201 Summaries for Russell Knight
Student # 102664567

-----------------------------------------

Title: 201: Move over HAL... Autonomous Spacecraft in the New Millennium, Steve Chien, JPL & CIT
Date: Thursday, October 4, 2001
Time: 4:15 p.m. - 6 p.m.

Move over HAL... Autonomous Spacecraft in the New Millennium
Steve Chien
Jet Propulsion Laboratory
California Institute of Technology

Summary

The presentation started off with a movie clip from UPN 13 about the movie AI. It had Russell Knight (me!) from JPL talking about AI in space, interspersed with random robot and AI clips from movies of the past. Steve Chien then went on to describe two missions on which the AI group at JPL intends to fly AI software: 3 Corner Sat and Techsat 21.

3 Corner Sat (3CS) is an Earth-orbiting mission to observe clouds. It consists of 3 satellites that tumble (have no guidance control) while in orbit. The goal is to time the taking of photographs from each spacecraft to achieve stereoscopy of the clouds. The mission is being developed jointly by Arizona State University (ASU), the University of Colorado, Boulder (CU), and New Mexico State University (NMSU). To help reduce the overhead of storing bad pictures, on-board algorithms will determine whether or not a picture is in fact of a cloud.

The primary contribution of JPL to the 3CS mission is CASPER, the automated planner/scheduler. CASPER will fly aboard 3CS and handle reasoning about states and resources. For example, CASPER can reason about downlink capability to ensure that we do not oversubscribe memory before the next downlink opportunity. CASPER is a real-time planner based on the ASPEN planner; ASPEN uses an iterative-repair, heuristic local search to find satisficing plans. Spacecraft Command Language (SCL) will also be used. SCL allows the formulation of control rules and scripts for use on spacecraft. For example, one could formulate a rule that states "when the battery is at 10% capacity, turn off the cameras."

The Techsat 21 mission (also called the Autonomous Science-Craft mission, or ASC) is being conducted by the US Air Force. The mission is similar to 3CS in that it flies a constellation of spacecraft, but these spacecraft are far more sophisticated: they have the ability to change relative configuration and have full guidance control. The primary instrument is a synthetic aperture radar. Again, CASPER is used for the on-board mission planning and SCL is used for low-level control, but the science algorithms are far more sophisticated. These include object-recognition, change-detection, and discovery algorithms, which are being developed in the Machine Learning Systems group at JPL. The constellation-configuration software is being developed by Princeton Satellite Systems. The overarching idea is that this AI technology allows these systems to be far more reactive than ever before, enabling new space exploration capabilities.
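The talk described ASPEN's iterative-repair, heuristic local search only at a high level. As a point of reference, the following is a minimal sketch of the iterative-repair idea under an invented memory/downlink model; the activity representation, constants, and repair move are illustrative assumptions, not CASPER's or ASPEN's actual design.

    # Toy sketch of iterative-repair planning (illustrative; not JPL code).
    # A plan is a list of (start_time, memory_used) image-taking activities.
    # A conflict exists when the activities scheduled before the next downlink
    # oversubscribe on-board memory; a repair moves one offender past the downlink.
    import random

    CAPACITY = 100         # on-board memory units (assumed)
    DOWNLINK_TIME = 50     # memory is freed at this time (assumed)

    def conflicts(plan):
        """Return the pre-downlink activities if they oversubscribe memory."""
        before = [act for act in plan if act[0] < DOWNLINK_TIME]
        used = sum(mem for _, mem in before)
        return before if used > CAPACITY else []

    def repair(plan):
        """Iterative repair: while conflicts remain, move one conflicting activity later."""
        plan = list(plan)
        while True:
            bad = conflicts(plan)
            if not bad:
                return plan                                  # satisficing plan found
            victim = random.choice(bad)                      # heuristic choice of move
            plan.remove(victim)
            plan.append((DOWNLINK_TIME + 1, victim[1]))      # reschedule after downlink

    if __name__ == "__main__":
        initial = [(5, 40), (10, 40), (20, 40), (60, 40)]    # oversubscribed before downlink
        print(repair(initial))

The real planner reasons over many state and resource constraints at once and repairs against a continuously updated spacecraft state; this sketch only shows the repeat-until-conflict-free shape of the search.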
-----------------------------------------

Title: 201: Challenges and Opportunities in Programmable Logic, M. Sarrafzadeh & J. Cong, UCLA Computer Science
Date: Thursday, November 15, 2001
Time: 4 p.m. - 6 p.m.

Majid Sarrafzadeh & Jason Cong
UCLA Computer Science

Summary

This talk focused mainly on changes in the hardware (e.g., Programmable Logic Arrays). It was predicted that this will be a serious focus of research for the next 10 years. It was noted that parameterization is not the same thing as reconfiguration: parameterization has a finite number of states, whereas (it was claimed without proof) reconfiguration has an infinite number of possible states.

Currently, the trend is toward shorter development times. Programmable Logic Devices (PLDs) are eating into the ASIC design space, and PLDs are also becoming more affordable. Many semiconductor companies are working toward programmable cores for their VLSI designs (e.g., Actel, etc.). There has been fast growth in PLDs, and there are now up to 50 million gates per chip. This capability has led to more demand, and PLD vendors are accruing more revenue. But a number of problems remain unsolved. One driver is that the pace of change is faster now, which gives PLDs several advantages: faster turnaround from design to delivery, and delivered parts that are flexible and reprogrammable and thus can accommodate changing standards. All of this lowers the cost of design errors. Another aspect of the industry that makes PLDs attractive is the skyrocketing mask costs for ASICs, along with the fact that one does not have to wait for wafer allotments. This all leads to the conclusion that PLDs are increasing in dominance in the industry.

SEDA was discussed. It consists of an image-capture system connected via Ethernet to a Wildstar FPGA board. A claim was made that existing FPGAs are too fine-grained; what is really called for are coarser functions like fast Fourier transforms, multipliers, etc. A "strategically programmable" system was discussed that generates a device whose performance is optimized for a certain algorithm or program. Every pair of operations in an operation graph was analyzed to find opportunities to contract the operations into hardware-based macro operators. Note: the operators discussed were additions, etc.; there is a big difference between recognizing a contraction of a subtract and multiply versus recognizing the implementation of a Fourier transform. Applications for this technique included image restoration by neighbor averaging, shape coding by modeling outlines as polygons, and motion estimation for compression of video.

Jason Cong then continued the talk. His work focuses on synthesis and architecture exploration for field-programmable technologies; his lab is the VLSI CAD lab. He mentioned that Business Week magazine had named chameleon chips as the number one item that will change our lives. The research he directs includes novel algorithms and architecture exploration/optimization. This entails the support of new PLD architectures and the development of higher-performance architectures. He reports being able to use his group's techniques to solve previously unsolved Xilinx problems. He described many of the details of the approach (such as using memory for logic blocks). The final conclusion was that the PLD field is expanding and has lots of challenges.
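The summary above notes that every pair of operations in the operation graph is analyzed for contraction into hardware-based macro operators, but no algorithm was given. The sketch below illustrates one way such a pairwise scan could look, assuming a simple dataflow-graph representation and a made-up macro table (a multiply feeding an add becomes a multiply-accumulate); it is not the actual "strategically programmable" tool.

    # Toy sketch of contracting operation pairs into macro operators (assumptions only).
    # The dataflow graph is a dict: node -> (operation, list of input nodes).
    MACROS = {("mul", "add"): "mac",     # multiply feeding an add -> multiply-accumulate
              ("mul", "sub"): "msub"}    # multiply feeding a subtract (assumed macro)

    def find_contractions(graph):
        """Return (producer, consumer, macro) triples where a known macro applies."""
        found = []
        for consumer, (c_op, inputs) in graph.items():
            for producer in inputs:
                p_op = graph[producer][0]
                macro = MACROS.get((p_op, c_op))
                if macro:
                    found.append((producer, consumer, macro))
        return found

    if __name__ == "__main__":
        # y = a*b + c: the mul/add pair should contract into a MAC macro operator.
        g = {"a": ("in", []), "b": ("in", []), "c": ("in", []),
             "t": ("mul", ["a", "b"]),
             "y": ("add", ["t", "c"])}
        print(find_contractions(g))      # [('t', 'y', 'mac')]

As the note in the summary points out, recognizing something like a whole Fourier transform would require matching much larger subgraphs than the single producer/consumer pairs handled here.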
-----------------------------------------

Title: 201: Scalable Processor Architectures, Glenn Reinman, UCLA Computer Science
Date: Thursday, November 29, 2001
Time: 4 p.m. - 5:45 p.m.

Glenn Reinman
UCLA Computer Science Department

Summary

There are three aspects of design: 1) instruction set architecture, 2) organization, and 3) hardware. This work draws from many fields, including compiler design, CAD/VLSI, and device electronics. His particular focus is on organization and hardware, specifically super-scalar, out-of-order execution processors, e.g., the Pentium 4.

To measure the performance of one of these systems, we must consider the instructions per cycle, the performance on applications, and the compiler to be used. It is also very difficult to quantify complexity. For example, reducing the number of register file ports saves energy and reduces access time, but the bypass hardware is parallel and a switching network might be needed. We also need to think about the memory gap: the difference in the rate of increase in capability of memory versus that of processors. Processor capability increases at 60% per year, whereas memory capability increases at a lowly 7% per year. This leads to an interconnect-scaling bottleneck. We must reduce structure sizes to accommodate the clock speed, leading to clock scaling, deeper pipelines, etc. Concerns for the design of these processors are power, area, reliability, security, and manufacturing cost; related to power and area are packaging and cooling.

He then discussed pipelining. The various hazards for pipelining include control hazards (for which they use branch prediction), data dependence hazards (for which they use bypassing, register renaming, speculative execution, out-of-order execution, value and address prediction, and memory disambiguation), and structural hazards (for which they use speculative execution, etc.). He covered some details concerning out-of-order execution, including updating the speculative state with write-backs and updating the non-speculative state with commits. Reservation stations hold instructions until they are ready to execute; instructions enter in order but leave out of order. Reorder buffers, on the other hand, track instructions that arrive and leave in order.

Pipelining bottlenecks were covered next. These included branch mispredictions (the number one bottleneck), memory bottlenecks (such as cache misses, instruction latency, scheduling, and limited cache ports), processor issue width, register file size (the number of registers is always a limit on how far we can look ahead), the number of reservation stations (how many instructions are there to choose from?), and true data dependencies. Front-end stalls occur when an issue buffer fills, and branch prediction stalls when the instruction cache ports are exhausted. One solution is a bigger issue buffer; his solution is to split the front end into separate branch-prediction and instruction-fetch blocks. This hides latency and tolerates multilevel branch prediction, cache misses, and insufficient ports. It also lets us glimpse the future fetch stream to guide instruction or data prefetch and pre-scheduling. A trace cache holds snapshots of the decoded fetch stream. Is this scalable? Not really, so we use it only for the most important instructions, e.g., the most frequently executed instructions, the most frequently taken branches, etc. One technique is predicated execution. The idea is that if you can't predict a branch, get rid of it! But this causes considerable code bloat and impacts the ISA greatly, so we only predicate hard-to-predict branches.

Caching is a technique for hiding the latency of memory, but cache misses are a problem. Prefetching hides the latency of the first-level cache. One way of implementing this is with stream buffers, but they have no notion of accuracy; more intelligent schemes might be possible. Speculative techniques include value and address prediction. But how do we deal with mispredictions? Pipeline recovery is one way. This can be via squashing (flushing the pipeline) or re-executing (squashing only dependent operations). A technique for throttling speculation was described. A key issue is an analysis of the critical path, which includes dependency chain length, cache behavior, and instruction type. Once the critical path is known, we also need to know how to break it and then how to determine the new critical path.

Another issue is register file pressure. The register file can be scaled, but the problem is the bypass hardware. Not bypassing causes waits, and full bypassing is not scalable, thus he proposes an operand file to hold the operands and to use bypass hardware. The goal is to have just enough register state to capture the live registers. This leaves little on the critical timing path for execution and uses a non-critical timing path for recovery. It removes the requirement for register renaming and yet leaves us with precise exceptions. Other approaches to scalability include helper engines, simultaneous multithreading, and multiclustered computing. Recent trends force architects to consider more than just instructions per cycle when evaluating performance, and these same trends will force consumers to consider more than just cycle time. Techniques to improve scalability include minimizing the logic on the critical path, multithreaded execution, coordinated functional unit clusters, multilevel prediction hierarchies, and early predictor access.
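The summary names branch prediction as the answer to control hazards and branch mispredictions as the number one bottleneck, but does not describe any particular predictor. As a small point of reference, here is a classic two-bit saturating-counter predictor sketch (not the scheme from the talk); the table size, indexing by PC modulo the table size, and the example branch address are assumptions for illustration.

    # Minimal two-bit saturating-counter branch predictor (illustrative only).
    # Each branch address indexes a small table of counters in 0..3;
    # 0-1 predict not-taken, 2-3 predict taken.
    TABLE_SIZE = 1024                        # arbitrary assumption
    counters = [2] * TABLE_SIZE              # start weakly taken

    def predict(pc):
        """Return True if the branch at pc is predicted taken."""
        return counters[pc % TABLE_SIZE] >= 2

    def update(pc, taken):
        """Nudge the counter toward the actual outcome, saturating at 0 and 3."""
        i = pc % TABLE_SIZE
        if taken:
            counters[i] = min(3, counters[i] + 1)
        else:
            counters[i] = max(0, counters[i] - 1)

    if __name__ == "__main__":
        # A loop branch taken 9 times and then not taken: count mispredictions.
        outcomes = [True] * 9 + [False]
        misses = 0
        for outcome in outcomes:
            if predict(0x400) != outcome:
                misses += 1
            update(0x400, outcome)
        print("mispredictions:", misses)     # 1 (only the final not-taken exit)

The two-bit hysteresis is what keeps a single loop exit from flipping the prediction, which is why such counters remain the usual baseline against which fancier multilevel predictors are compared.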
-----------------------------------------

Title: An Architecture for Virtual Internets, Joe Touch, USC/ISI
Date: Thursday, December 6, 2001
Time: 4 p.m. - 5:45 p.m.

Joe Touch
Director, Postel Center for Experimental Networking
USC/ISI
http://www.isi.edu/touch

Abstract:

Virtual networks (VNs) provide an abstraction for network infrastructure to simplify distributed (peer) applications, extend private infrastructure over public networks, and support the development and incremental deployment of new protocols (M-Bone, 6-Bone, A-Bone). VNs typically require custom protocols with operating system support, e.g., GRE, PPTP, or L2TP, or application support (peer nets, A-Bone). A Virtual Internet (VI) can be created from a combination of virtual interfaces and IP encapsulation, together with the judicious configuration of internal routes, and can support existing applications without new protocols or operating system support. This VI architecture augments the Internet with host and router multihoming, and supports customized topologies and multi-level virtualization (layering). The VI uses two layers of IP encapsulation to support dynamic routing together with IPsec, and to allow smaller base networks to emulate larger virtual ones by revisiting components. A VI can thus create a 100-node ring by visiting each of 5 routers 20 times. The architecture has been implemented as a system for dynamic overlay deployment and management called the 'X-Bone,' intended for rapid configuration of experiment testbeds and demonstration systems. Because it also supports layered (recursive) overlays, this VN architecture is currently being developed to deploy parallel overlays for fault tolerance, in a system called the 'DynaBone.'
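To make the "revisiting" claim concrete (a 100-node virtual ring built from 5 routers visited 20 times each), here is a toy layout sketch; the round-robin assignment and router names are assumptions for illustration and are not the X-Bone's actual deployment logic.

    # Toy sketch of revisiting components: a 100-node virtual ring over 5 routers.
    ROUTERS = ["r1", "r2", "r3", "r4", "r5"]
    RING_SIZE = 100

    # Round-robin assignment: virtual node i runs on router i mod 5, so every
    # consecutive hop of the ring crosses between two different physical routers.
    ring = [ROUTERS[i % len(ROUTERS)] for i in range(RING_SIZE)]

    visits = {r: ring.count(r) for r in ROUTERS}
    print(visits)                                  # each router hosts 20 virtual nodes
    print(ring[0], "->", ring[1], "->", ring[2])   # first few virtual hops

    # In the VI architecture, each virtual node would get its own virtual interface
    # and IP-in-IP tunnel, keeping the overlay's routes separate from the base network.

Each physical router thus appears 20 times in the virtual topology, which is how a small base network emulates a much larger virtual one.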