MPI at LLNL: Glossary

Glossary

1X	An InfiniBand interface width. 1X defines an interface with two differential pairs, one transmit, one receive. Provides 2.5 Gbit/s full-duplex connections.
4X	An InfiniBand interface width. 4X defines an interface with eight differential pairs (four per direction), four transmit, four receive. Provides 10 Gbit/s full-duplex connections.
4X DDR	Double data rate InfiniBand interface. 4X defines an interface with eight differential pairs (four per direction). Provides 5.0 Gbit/s per differential pair, for 20 Gbit/s full-duplex connections.
12X	An InfiniBand interface width. 12X defines an interface with 24 differential pairs (12 per direction), 12 transmit, 12 receive. Provides 30 Gbit/s full-duplex connections.
12X DDR	Double data rate InfiniBand interface. 12X defines an interface with 24 differential pairs (12 per direction). Provides 5.0 Gbit/s per differential pair, for 60 Gbit/s full-duplex connections.
AMPI	Adaptive MPI. An MPI implementation developed at the University of Illinois that can exploit virtual processor techniques.
API	Application programmer's interface. Syntax and semantics for invoking services from within an executing application. All APIs shall be available to both Fortran and C programs, although implementation issues (such as whether the Fortran routines are simply wrappers for calling C routines) are up to the supplier.
ARMCI	Aggregate remote memory copy interface. A one-sided communication library that provides an extensive set of RMA operations. See http://www.emsl.pnl.gov/docs/parsoft/armci/
BLAB	Aggregate bidirectional link bandwidth. BLAB is defined as the minimum of the aggregate memory bandwidth, aggregate bus bandwidth, or the sum of bidirectional link peak user payload data bandwidth.
blocking operation	An operation that does not complete until the operation either succeeds or fails. For example, a blocking receive will not return until a message is received or until the channel is closed and no further messages can be received.
BlueGene/L BG/L	A jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the U.S. Department of Energy ASC Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame at price/performance and power consumption/performance targets unobtainable with conventional architectures. See http://www.research.ibm.com/bluegene/
broadcast operation	A communication operation in which one processor sends (or broadcasts) a message to all other processors.
buffer	A portion of storage used to hold input or output data temporarily.
Charm++	A parallel C++ library developed at the University of Illinois.
Clos	A network topology named after its inventor Charles Clos.
cluster	A set of SMPs connected via a scalable network technology. The network shall support high bandwidth, low latency message passing. It shall also support remote memory referencing.
collective communication	A communication operation that involves more than two processes or tasks. Broadcasts, reductions, and the MPI_Allreduce subroutine are all examples of collective communication operations. All tasks in a communicator must participate.
communicator	An MPI object that describes the communication context and an associated group of processes.
critical path	The serial chain of dependencies that most limits forward progress.
DDR	Double data rate
device driver	Software that operates the host channel adapter devices on a node.
DHCP	Dynamic host configuration protocol. DHCP enables individual computers on an IP network to extract their configurations from a server (the 'DHCP server') or servers, in particular, servers that have no exact information about the individual computers until they request the information. DHCP is often used to reduce the work necessary to administer a large network (e.g., managing IP addresses).
DPCL	Dynamic Probe Class Library
fairness	A policy in which tasks, threads, or processes must be allowed eventual access to a resource for which they are competing. For example, if multiple threads are simultaneously seeking a lock, no set of circumstances can cause any thread to wait indefinitely for access to the lock.
FT	Fault tolerance or fault tolerant
GiB	gibibyte. Gibibyte is a billion base 2 bytes. This is typically used in terms of random access memory and is 2³⁰ (or 1,073,741,824) bytes.
GB	gigabyte. Gigabyte is a billion base 10 bytes. This is typically used in every context except for random access memory size and is 10⁹ (or 1,000,000,000) bytes.
GPL	General Public License. A legal software license arrangement developed by GNU to promote open software. The licenses for most software are designed to prevent users from sharing or changing it. By contrast, the GNU General Public License is intended to guarantee the freedom to share and change free software to ensure the software is free for all its users. The GPL is designed to make sure that anyone can distribute copies of free software (and charge for this service if they wish); that they receive source code or can get it if they want; that they can change the software or use pieces of it in new free programs; and that they know they can do these things. The GPL forbids anyone to deny others these rights or to ask them to surrender the rights. These restrictions translate to certain responsibilities for those who distribute copies of the software or modify it.
GUI	Graphical user interface. A type of computer interface consisting of a visual metaphor of a real-world scene, often of a desktop. Within that scene are icons, which represent actual objects, that the user can access and manipulate with a pointing device.
HCA	Host channel adapter. IBA expansion card that interfaces the IBA interconnect to the cluster node I/O subsystem.
HEC	High-end computing
hot spot	A memory location or synchronization resource for which multiple processors compete excessively. This competition can cause a disproportionately large performance degradation when one processor that seeks the resource blocks, preventing many other processors from having it, thereby forcing them to become idle.
HPC ULPs	High performance computing upper layer protocols. HPC ULPs include MPI, IPoIB, SDP, and Sandia Portals.
HT	HyperTransport is an I/O link. With clock speeds of up to 1.4 GHz and DDR signaling, HyperTransport technology provides an effective throughput of 2.8 gigatransfers per second per pin-pair on a 32-bit link. This results in a maximum aggregate throughput of 22.4 gigabytes per second, per link. (See http://www.hypertransport.org/tech/index.cfm)
IBA	InfiniBand architecture
IBTA	InfiniBand Trade Association (See http://www.infinibandta.org/ibta/)
InfiniBand access layer	Includes the user-mode components for management services, SM query, connection management, and work request processing, and the kernel mode components for InfiniBand PnP, management services, resource management, connection management, work request processing, and user-level proxy agent.
IPoIB	Internet protocol over InfiniBand. IP specifies the format of packets (also called datagrams) and the addressing scheme.
iSCSI	Internet SCSI (Small Computer System Interface). An IP-based storage networking standard for linking data storage facilities, developed by the Internet Engineering Task Force (IETF). By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances.
iSER	iSCSI extensions for RDMA
kDAPL	Kernel Direct Access Programming Library defines a single set of kernel-level APIs for all RDMA-capable transports. The kDAPL mission is to define a transport-independent and platform standard set of APIs that exploits RDMA capabilities, such as those present in IB, VI, and iWARP.
kernel modules	Changes to the Linux kernel needed to support the rest of the OpenIB software stack.
LAB	Aggregate link bandwidth. LAB is defined as the minimum of the aggregate memory bandwidth, aggregate bus bandwidth, or the sum of unidirectional link peak user payload data bandwidth.
LAPI	Low-level application programming interface. An active-message-type API for optimal communication through the IBM SP switch. Provides reliable, unordered communication between all processes in the MPI world.
latency	The time interval between the instant at which an instruction control unit initiates a call for data transmission, and the instant at which the actual transfer of data (or receipt of data at the remote end) begins. Latency is related to the hardware characteristics of the system and to the different layers of software that are involved in initiating the task of packing and transmitting the data.
LVDS	Low voltage differential signaling. An electrical spec (EIA-644) used by InfiniBand. LVDS is designed with an output voltage swing of 350 mV at better than 400 Mbps into a 100 ohm load, across a distance of about 10 meters.
MB	megabyte. Megabyte is a million base 10 bytes. This is typically used in every context except for random access memory size and is 10⁶ (or 1,000,000) bytes.
MiB	mebibyte. Mebibyte is a million base 2 bytes. This is typically used in terms of random access memory and is 2²⁰ (or 1,048,576) bytes.
MPI	Message passing interface. An industry standard, message-passing protocol that typically uses a two-sided send-receive model to transfer messages between processes.
MPI-2	Extensions to the MPI standard.
MPI I/O	An MPI extension allowing for the manipulation of files on different file systems.
MR	Mandatory requirement. Mandatory requirements are items that are essential to the University and reflect the minimum qualifications an offeror must meet in order to have their proposal evaluated further for selection.
MTBF	Mean time between failures. A measurement of the expected reliability of the system or component. The MTBF figure can be developed as the result of intensive testing, based on actual product experience, or predicted by analyzing known factors.
NIC	Network interface card. An expansion board you insert into a computer so the computer can be connected to a network. Most NICs are designed for a particular type of network, protocol, and media, although some can serve multiple networks.
nonblocking operation	An operation, such as sending or receiving a message, that returns immediately whether or not the operation was completed. For example, a nonblocking receive will not wait until a message is sent, but a blocking receive will wait. A nonblocking receive will return a status value that indicates whether or not a message was received.
open source	Software products provided under an open source license(s) found at http://www.opensource.org.
parallelism	The degree to which parts of a program may be concurrently executed.
PCI Express	A dual-simplex, point-to-point serial differential low-voltage peripheral component interconnect. Previously known as 3GIO and Arapahoe. PCI Express allows a bandwidth up to 500 MB/s duplex for each lane, 8 GB/s for sixteen lanes (x16).
PCI-X	A follow-on initiative to PCI (peripheral component interconnect). PCI-X allows a bandwidth up to 1 GB/s for 64 bit bus running at 133 MHz. [Note that we distinguish PCI-X and PCI Express.]
PERUSE	MPI performance examination and revealing unexposed state extension specification; the specified API.
PMPI	Profiling interface for MPI specified by the MPI standard.
Portals (Sandia Portals)	Low-level API providing reliable and ordered communication for Lustre. (See http://sourceforge.net/projects/sandiaportals.)
POSIX	Portable Operating System Interface. A set of IEEE standards designed to provide application portability between UNIX variants. IEEE 1003.1 defines a UNIX-like operating system interface, IEEE 1003.2 defines the shell and utilities and IEEE 1003.4 defines real-time extensions.
QDR	Quad data rate
RC	Reliable connection
RDMA	Remote direct memory access. RDMA capability allows processes executing on one node of a cluster to be able to "directly" access (execute reads or writes against) the memory of processes within the same user job executing on a different node of the cluster.
reduction operation	An operation, usually mathematical, that reduces a collection of data by one or more dimensions. For example, the arithmetic SUM operation is a reduction operation which reduces an array to a scalar value. Other reduction operations include MAXVAL and MINVAL.
RHEL	Red Hat Enterprise Linux
RMA	Remote memory access. A user-level communication protocol that provides ability for a task to access memory of another task by the use of put/get operations.
RTS	Run-time system
Scalability	Tested on 4,096 node physical fabrics and scaling properties simulated up to 16,384 nodes.
SDP	Sockets direct protocol. SDP is an IBA-specific protocol defined by the Software Working Group (SWG) of the IBA. The SDP specification maintains traditional sockets SOCK_STREAM semantics as commonly implemented over TCP/IP, as well as support for byte-streaming over a message passing protocol, including kernel bypass data transfers and zero-copy data transfers.
SDSM	Software-based distributed shared memory
SMP	Shared memory multiprocessor. A set of CPUs sharing random access memory within the same memory address space. The CPUs are connected via a high speed, low latency mechanism to the set of hierarchical memory components. The memory hierarchy consists of at least processor registers, cache and memory. The cache shall also be hierarchical. If there are multiple caches, they shall be kept coherent automatically by the hardware. The main memory may be a non-uniform memory access (NUMA) architecture. The access mechanism to every memory element shall be the same from every processor. More specifically, all memory operations are done with load/store instructions issued by the CPU to move data to/from registers from/to the memory. A single SMP may be partitioned into one or more nodes.
SOW	Statement of work
SPMD	Single program multiple data
synchronization	The action of forcing certain points in the execution sequences of two or more asynchronous procedures to coincide in time.
Test harness and modules	Software to automatically test the functionality, performance, reliability, and robustness of the components of the OpenIB software stack.
TF	Teraflop. A measure of the peak computing power of a machine in 10¹² floating point operations per second.
thread	A single, separately dispatchable, unit of execution. There may be one or more threads in a process, and each thread is executed by the operating system concurrently.
TLP	Thread level parallelism
UD	Unreliable datagram
uDAPL	User Direct Access Programming Library defines a single set of user-level APIs for all RDMA-capable transports. The uDAPL mission is to define a transport-independent and platform standard set of APIs that exploits RDMA capabilities, such as those present in IB, VI, and RDDP WG of IETF.
ULP	Upper layer protocols. APIs for applications to perform IB communications operations. For instance, MPI-2, IPoIB, SDP, and Sandia Portals.
UPC	Unified Parallel C. An extension of the C programming language designed for high-performance computing on large-scale parallel machines. The language provides a uniform programming model for both shared and distributed memory hardware. The programmer is presented with a single shared, partitioned address space, where variables may be directly read and written by any processor, but each variable is physically associated with a single processor. UPC uses a SPMD model of computation in which the amount of parallelism is fixed at program startup time, typically with a single thread of execution per processor.
URDMA	Unacknowledged, unreliable RDMA capability.
UTR	University technical representative
VAPI	InfiniBand verbs applications programming interface
VP	Virtual processor. Used in the context of assigning multiple "virtual" processors to each physical processor.
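
The InfiniBand interface width entries (1X, 4X, 12X, and the DDR variants) all follow the same arithmetic: the number of lanes per direction multiplied by the per-lane signaling rate, with one transmit pair and one receive pair per lane. The following is a minimal Python sketch of that arithmetic, not part of any InfiniBand software. The names `LANES`, `link_rate_gbits`, and `differential_pairs` are hypothetical and chosen only for illustration; the per-lane rates are the 2.5 and 5.0 Gbit/s figures given in the entries.

```python
# Per-lane signaling rates in Gbit/s, as stated in the glossary entries.
SDR_GBITS_PER_LANE = 2.5  # single data rate (1X entry)
DDR_GBITS_PER_LANE = 5.0  # double data rate (4X DDR and 12X DDR entries)

# Lanes per direction for each interface width (hypothetical lookup table).
LANES = {"1X": 1, "4X": 4, "12X": 12}


def link_rate_gbits(width: str, ddr: bool = False) -> float:
    """Return the signaling rate of a link in Gbit/s for one direction."""
    per_lane = DDR_GBITS_PER_LANE if ddr else SDR_GBITS_PER_LANE
    return LANES[width] * per_lane


def differential_pairs(width: str) -> int:
    """Each lane needs one transmit pair and one receive pair."""
    return 2 * LANES[width]


if __name__ == "__main__":
    for width in LANES:
        print(
            width,
            differential_pairs(width), "pairs,",
            link_rate_gbits(width), "Gbit/s SDR,",
            link_rate_gbits(width, ddr=True), "Gbit/s DDR",
        )
```

For example, `link_rate_gbits("12X", ddr=True)` multiplies 12 lanes by 5.0 Gbit/s, which matches the 60 Gbit/s figure in the 12X DDR entry.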

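The difference between the blocking operation and nonblocking operation entries can be sketched with Python's standard `queue` module standing in for a message channel. This is an illustration of the semantics only, under the assumption that a thread-safe queue is an acceptable stand-in; it is not MPI, and the function names `blocking_receive` and `nonblocking_receive` are hypothetical.

```python
import queue
import threading


def nonblocking_receive(channel: queue.Queue):
    """Return immediately with (received, message); never waits."""
    try:
        return True, channel.get_nowait()
    except queue.Empty:
        return False, None


def blocking_receive(channel: queue.Queue):
    """Do not return until a message is available on the channel."""
    return channel.get()


if __name__ == "__main__":
    channel = queue.Queue()

    # Nothing has been sent yet, so the nonblocking receive reports that
    # no message was received instead of waiting.
    print(nonblocking_receive(channel))

    # A sender delivers a message shortly afterwards; the blocking
    # receive waits for it and only then returns.
    threading.Timer(0.1, channel.put, args=("hello",)).start()
    print(blocking_receive(channel))
```

The nonblocking variant returns a status flag alongside the message, mirroring the glossary's note that a nonblocking receive returns a status value indicating whether a message was received.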
Last modified September 7, 2006
UCRL-MI-126792