CS201 Summaries for Russell Knight
Student # 102664567

-----------------------------------------

Title: 201: Move over HAL... Autonomous Spacecraft in the New Millennium, Steve Chien, JPL & CIT
Date: Thursday, October 4, 2001
Time: 4:15 p.m. - 6 p.m.

Move over HAL... Autonomous Spacecraft in the New Millennium
Steve Chien
Jet Propulsion Laboratory
California Institute of Technology

Summary

The presentation started off with a movie clip from UPN 13 about the movie AI. It had Russell Knight (me!) from JPL talking about AI in space, interspersed with random robot and AI clips from movies of the past. Steve Chien then went on to describe two missions on which the AI group at JPL intends to fly AI software: 3 Corner Sat and Techsat 21.

3 Corner Sat (3CS) is an Earth-orbiting mission to observe clouds. It consists of 3 satellites that tumble (have no guidance control) while in orbit. The goal is to time the taking of photographs from each spacecraft to achieve stereoscopy of the clouds. The mission is being developed jointly by Arizona State University (ASU), the University of Colorado, Boulder (CU), and New Mexico State University (NMSU). To help reduce the overhead of storing bad pictures, on-board algorithms will determine whether or not a picture is in fact of a cloud.

The primary contribution of JPL to the 3CS mission is CASPER, the automated planner/scheduler. CASPER will fly aboard 3CS and handle reasoning about states and resources. For example, CASPER can reason about downlink capability to ensure that we do not oversubscribe memory before the next downlink opportunity. CASPER is a real-time planner based on the ASPEN planner; ASPEN uses an iterative-repair, heuristic local search to find satisficing plans. Spacecraft Command Language (SCL) will also be used. SCL allows the formulation of control rules and scripts for use on spacecraft. For example, one could formulate a rule that states "when the battery is at 10% capacity, turn off the cameras."

The Techsat 21 mission (also called the Autonomous Science-Craft mission, or ASC) is being conducted by the US Air Force. The mission is similar to 3CS in that it flies a constellation of spacecraft, but these spacecraft are far more sophisticated: they have the ability to change relative configuration and have full guidance control. The primary instrument is a synthetic aperture radar. Again, CASPER is used for the on-board mission planning and SCL is used for low-level control, but the science algorithms are far more sophisticated. These include object-recognition, change-detection, and discovery algorithms, which are being developed in the Machine Learning Systems group at JPL. The constellation-configuration software is being developed by Princeton Satellite Systems. The overarching idea is that this AI technology allows these systems to be far more reactive than ever before, enabling new space exploration capabilities.
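The talk described ASPEN's iterative-repair, heuristic local search only at a high level. As a point of reference, the following is a minimal sketch of the iterative-repair idea under an invented memory/downlink model; the activity representation, constants, and repair move are illustrative assumptions, not CASPER's or ASPEN's actual design.

    # Toy sketch of iterative-repair planning (illustrative; not JPL code).
    # A plan is a list of (start_time, memory_used) image-taking activities.
    # A conflict exists when the activities scheduled before the next downlink
    # oversubscribe on-board memory; a repair moves one offender past the downlink.
    import random

    CAPACITY = 100         # on-board memory units (assumed)
    DOWNLINK_TIME = 50     # memory is freed at this time (assumed)

    def conflicts(plan):
        """Return the pre-downlink activities if they oversubscribe memory."""
        before = [act for act in plan if act[0] < DOWNLINK_TIME]
        used = sum(mem for _, mem in before)
        return before if used > CAPACITY else []

    def repair(plan):
        """Iterative repair: while conflicts remain, move one conflicting activity later."""
        plan = list(plan)
        while True:
            bad = conflicts(plan)
            if not bad:
                return plan                                  # satisficing plan found
            victim = random.choice(bad)                      # heuristic choice of move
            plan.remove(victim)
            plan.append((DOWNLINK_TIME + 1, victim[1]))      # reschedule after downlink

    if __name__ == "__main__":
        initial = [(5, 40), (10, 40), (20, 40), (60, 40)]    # oversubscribed before downlink
        print(repair(initial))

The real planner reasons over many state and resource constraints at once and repairs against a continuously updated spacecraft state; this sketch only shows the repeat-until-conflict-free shape of the search.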
-----------------------------------------

Title: 201: Challenges and Opportunities in Programmable Logic, M. Sarrafzadeh & J. Cong, UCLA Computer Science
Date: Thursday, November 15, 2001
Time: 4 p.m. - 6 p.m.

Majid Sarrafzadeh & Jason Cong
UCLA Computer Science

Summary

This talk focused mainly on changes in the hardware (e.g., Programmable Logic Arrays). It was predicted that this will be a serious focus of research for the next 10 years. It was noted that parameterization is not the same thing as reconfiguration: parameterization has a finite number of states, whereas (it was claimed without proof) reconfiguration has an infinite number of possible states.

Currently, the trend is toward shorter development times. Programmable Logic Devices (PLDs) are eating into the ASIC design space, and PLDs are also becoming more affordable. Many semiconductor companies are working toward programmable cores for their VLSI designs (e.g., Actel, etc.). There has been fast growth in PLDs, and there are now up to 50 million gates per chip. This capability has led to more demand, and PLD vendors are accruing more revenue. But a number of problems remain unsolved. One driver is that the pace of change is faster now, which gives PLDs several advantages: faster turnaround from design to delivery, and delivered parts that are flexible and reprogrammable and thus can accommodate changing standards. All of this lowers the cost of design errors. Another aspect of the industry that makes PLDs attractive is the skyrocketing mask costs for ASICs, along with the fact that one does not have to wait for wafer allotments. This all leads to the conclusion that PLDs are increasing in dominance in the industry.

SEDA was discussed. It consists of an image-capture system connected via Ethernet to a Wildstar FPGA board. A claim was made that existing FPGAs are too fine-grained; what is really called for are coarser functions like fast Fourier transforms, multipliers, etc. A "strategically programmable" system was discussed that generates a device whose performance is optimized for a certain algorithm or program. Every pair of operations in an operation graph was analyzed to find opportunities to contract the operations into hardware-based macro operators. Note: the operators discussed were additions, etc.; there is a big difference between recognizing a contraction of a subtract and multiply versus recognizing the implementation of a Fourier transform. Applications for this technique included image restoration by neighbor averaging, shape coding by modeling outlines as polygons, and motion estimation for compression of video.

Jason Cong then continued the talk. His work focuses on synthesis and architecture exploration for field-programmable technologies; his lab is the VLSI CAD lab. He mentioned that Business Week magazine had named chameleon chips as the number one item that will change our lives. The research he directs includes novel algorithms and architecture exploration/optimization. This entails the support of new PLD architectures and the development of higher-performance architectures. He reports being able to use his group's techniques to solve previously unsolved Xilinx problems. He described many of the details of the approach (such as using memory for logic blocks). The final conclusion was that the PLD field is expanding and has lots of challenges.
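The summary above notes that every pair of operations in the operation graph is analyzed for contraction into hardware-based macro operators, but no algorithm was given. The sketch below illustrates one way such a pairwise scan could look, assuming a simple dataflow-graph representation and a made-up macro table (a multiply feeding an add becomes a multiply-accumulate); it is not the actual "strategically programmable" tool.

    # Toy sketch of contracting operation pairs into macro operators (assumptions only).
    # The dataflow graph is a dict: node -> (operation, list of input nodes).
    MACROS = {("mul", "add"): "mac",     # multiply feeding an add -> multiply-accumulate
              ("mul", "sub"): "msub"}    # multiply feeding a subtract (assumed macro)

    def find_contractions(graph):
        """Return (producer, consumer, macro) triples where a known macro applies."""
        found = []
        for consumer, (c_op, inputs) in graph.items():
            for producer in inputs:
                p_op = graph[producer][0]
                macro = MACROS.get((p_op, c_op))
                if macro:
                    found.append((producer, consumer, macro))
        return found

    if __name__ == "__main__":
        # y = a*b + c: the mul/add pair should contract into a MAC macro operator.
        g = {"a": ("in", []), "b": ("in", []), "c": ("in", []),
             "t": ("mul", ["a", "b"]),
             "y": ("add", ["t", "c"])}
        print(find_contractions(g))      # [('t', 'y', 'mac')]

As the note in the summary points out, recognizing something like a whole Fourier transform would require matching much larger subgraphs than the single producer/consumer pairs handled here.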
-----------------------------------------

Title: 201: Scalable Processor Architectures, Glenn Reinman, UCLA Computer Science
Date: Thursday, November 29, 2001
Time: 4 p.m. - 5:45 p.m.

Glenn Reinman
UCLA Computer Science Department

Summary

There are three aspects of design: 1) instruction set architecture, 2) organization, and 3) hardware. This work draws from many fields, including compiler design, CAD/VLSI, and device electronics. His particular focus is on organization and hardware, specifically super-scalar, out-of-order execution processors, e.g., the Pentium 4.

To measure the performance of one of these systems, we must consider the instructions per cycle, the performance on applications, and the compiler to be used. It is also very difficult to quantify complexity. For example, reducing the number of register file ports saves energy and reduces access time, but the bypass hardware is parallel and a switching network might be needed. We also need to think about the memory gap: the difference in the rate of increase in capability of memory versus that of processors. Processor capability increases at 60% per year, whereas memory capability increases at a lowly 7% per year. This leads to an interconnect-scaling bottleneck. We must reduce structure sizes to accommodate the clock speed, leading to clock scaling, deeper pipelines, etc. Concerns for the design of these processors are power, area, reliability, security, and manufacturing cost; related to power and area are packaging and cooling.

He then discussed pipelining. The various hazards for pipelining include control hazards (for which they use branch prediction), data dependence hazards (for which they use bypassing, register renaming, speculative execution, out-of-order execution, value and address prediction, and memory disambiguation), and structural hazards (for which they use speculative execution, etc.). He covered some details concerning out-of-order execution, including updating the speculative state with write-backs and updating the non-speculative state with commits. Reservation stations hold instructions until they are ready to execute; instructions enter in order but leave out of order. Reorder buffers, on the other hand, track instructions that arrive and leave in order.

Pipelining bottlenecks were covered next. These included branch mispredictions (the number one bottleneck), memory bottlenecks (such as cache misses, instruction latency, scheduling, and limited cache ports), processor issue width, register file size (the number of registers is always a limit on how far we can look ahead), the number of reservation stations (how many instructions are there to choose from?), and true data dependencies. Front-end stalls occur when an issue buffer fills, and branch prediction stalls when the instruction cache ports are exhausted. One solution is a bigger issue buffer; his solution is to split the front end into separate branch-prediction and instruction-fetch blocks. This hides latency and tolerates multilevel branch prediction, cache misses, and insufficient ports. It also lets us glimpse the future fetch stream to guide instruction or data prefetch and pre-scheduling. A trace cache holds snapshots of the decoded fetch stream. Is this scalable? Not really, so we use it only for the most important instructions, e.g., the most frequently executed instructions, the most frequently taken branches, etc. One technique is predicated execution. The idea is that if you can't predict a branch, get rid of it! But this causes considerable code bloat and impacts the ISA greatly, so we only predicate hard-to-predict branches.

Caching is a technique for hiding the latency of memory, but cache misses are a problem. Prefetching hides the latency of the first-level cache. One way of implementing this is with stream buffers, but they have no notion of accuracy; more intelligent schemes might be possible. Speculative techniques include value and address prediction. But how do we deal with mispredictions? Pipeline recovery is one way. This can be via squashing (flushing the pipeline) or re-executing (squashing only dependent operations). A technique for throttling speculation was described. A key issue is an analysis of the critical path, which includes dependency chain length, cache behavior, and instruction type. Once the critical path is known, we also need to know how to break it and then how to determine the new critical path.

Another issue is register file pressure. The register file can be scaled, but the problem is the bypass hardware. Not bypassing causes waits, and full bypassing is not scalable, thus he proposes an operand file to hold the operands and to use bypass hardware. The goal is to have just enough register state to capture the live registers. This leaves little on the critical timing path for execution and uses a non-critical timing path for recovery. It removes the requirement for register renaming and yet leaves us with precise exceptions. Other approaches to scalability include helper engines, simultaneous multithreading, and multiclustered computing. Recent trends force architects to consider more than just instructions per cycle when evaluating performance, and these same trends will force consumers to consider more than just cycle time. Techniques to improve scalability include minimizing the logic on the critical path, multithreaded execution, coordinated functional unit clusters, multilevel prediction hierarchies, and early predictor access.
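The summary names branch prediction as the answer to control hazards and branch mispredictions as the number one bottleneck, but does not describe any particular predictor. As a small point of reference, here is a classic two-bit saturating-counter predictor sketch (not the scheme from the talk); the table size, indexing by PC modulo the table size, and the example branch address are assumptions for illustration.

    # Minimal two-bit saturating-counter branch predictor (illustrative only).
    # Each branch address indexes a small table of counters in 0..3;
    # 0-1 predict not-taken, 2-3 predict taken.
    TABLE_SIZE = 1024                        # arbitrary assumption
    counters = [2] * TABLE_SIZE              # start weakly taken

    def predict(pc):
        """Return True if the branch at pc is predicted taken."""
        return counters[pc % TABLE_SIZE] >= 2

    def update(pc, taken):
        """Nudge the counter toward the actual outcome, saturating at 0 and 3."""
        i = pc % TABLE_SIZE
        if taken:
            counters[i] = min(3, counters[i] + 1)
        else:
            counters[i] = max(0, counters[i] - 1)

    if __name__ == "__main__":
        # A loop branch taken 9 times and then not taken: count mispredictions.
        outcomes = [True] * 9 + [False]
        misses = 0
        for outcome in outcomes:
            if predict(0x400) != outcome:
                misses += 1
            update(0x400, outcome)
        print("mispredictions:", misses)     # 1 (only the final not-taken exit)

The two-bit hysteresis is what keeps a single loop exit from flipping the prediction, which is why such counters remain the usual baseline against which fancier multilevel predictors are compared.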
-----------------------------------------

Title: An Architecture for Virtual Internets, Joe Touch, USC/ISI
Date: Thursday, December 6, 2001
Time: 4 p.m. - 5:45 p.m.

Joe Touch
Director, Postel Center for Experimental Networking
USC/ISI
http://www.isi.edu/touch

Abstract:

Virtual networks (VNs) provide an abstraction for network infrastructure to simplify distributed (peer) applications, extend private infrastructure over public networks, and support the development and incremental deployment of new protocols (M-Bone, 6-Bone, A-Bone). VNs typically require custom protocols with operating system support, e.g., GRE, PPTP, or L2TP, or application support (peer nets, A-Bone). A Virtual Internet (VI) can be created from a combination of virtual interfaces and IP encapsulation, together with the judicious configuration of internal routes, and can support existing applications without new protocols or operating system support. This VI architecture augments the Internet with host and router multihoming, and supports customized topologies and multi-level virtualization (layering). The VI uses two layers of IP encapsulation to support dynamic routing together with IPsec, and to allow smaller base networks to emulate larger virtual ones by revisiting components. A VI can thus create a 100-node ring by visiting each of 5 routers 20 times. The architecture has been implemented as a system for dynamic overlay deployment and management called the 'X-Bone,' intended for rapid configuration of experiment testbeds and demonstration systems. Because it also supports layered (recursive) overlays, this VN architecture is currently being developed to deploy parallel overlays for fault tolerance, in a system called the 'DynaBone.'
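To make the "revisiting" claim concrete (a 100-node virtual ring built from 5 routers visited 20 times each), here is a toy layout sketch; the round-robin assignment and router names are assumptions for illustration and are not the X-Bone's actual deployment logic.

    # Toy sketch of revisiting components: a 100-node virtual ring over 5 routers.
    ROUTERS = ["r1", "r2", "r3", "r4", "r5"]
    RING_SIZE = 100

    # Round-robin assignment: virtual node i runs on router i mod 5, so every
    # consecutive hop of the ring crosses between two different physical routers.
    ring = [ROUTERS[i % len(ROUTERS)] for i in range(RING_SIZE)]

    visits = {r: ring.count(r) for r in ROUTERS}
    print(visits)                                  # each router hosts 20 virtual nodes
    print(ring[0], "->", ring[1], "->", ring[2])   # first few virtual hops

    # In the VI architecture, each virtual node would get its own virtual interface
    # and IP-in-IP tunnel, keeping the overlay's routes separate from the base network.

Each physical router thus appears 20 times in the virtual topology, which is how a small base network emulates a much larger virtual one.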