Use of the User Level Internet Protocol Stack on the Intel Paragon Supercomputer

Richard Prohaska
GigaNet Incorporated
July 16, 1996

Introduction

User level protocol processing is used to improve the Internet protocol performance on the Intel Paragon supercomputer. This paper describes how to use the user level protocols.

Implementation

The standard implementation of the sockets API and the Internet protocol stack resides in the UNIX server, one of the processes that exists in every OSF node of the Paragon. Only OSF nodes that contain network interface hardware actually execute the stack. All other nodes interact with these UNIX servers using protocols that implement single system image semantics. Applications do not have to know the locations of the network interfaces in order to use them. Unfortunately, the protocols that implement single system image semantics severely limit the network performance of the machine.

The software path that data follows between an application address space and the network interface hardware is very complex in the standard implementation and is the performance bottleneck. GigaNet decided to simplify this path by moving the sockets API and the Internet protocol stack into the application address space. All protocol operations are contained within the application address space itself. The user level protocol implementation, coupled with a lightweight Paragon mesh protocol, resulted in network performance ten times faster than the standard implementation.

The Internet protocols run in a multithreaded environment supported by the pthreads library. An application linked with the new sockets library is changed from a simple sequential program into a multithreaded program.

Omissions

Single system image semantics. The user level sockets library binds all of the sockets to a particular node in the Paragon that contains a network interface.
The application must connect the library to the appropriate node.

ATM network only. The user level sockets library only works for ATM network interfaces.

IP options. The IP datagram filtering software does not parse IP datagrams that contain IP options. Datagrams with options are not forwarded to the correct application.

IP fragments. The IP datagram filtering software does not reassemble IP fragments. All IP datagrams must be no larger than the ATM network MTU.

Route table synchronization with the UNIX server. The IP route table is contained in the UNIX server and is updated by the routed process running on the Paragon. Each application that uses the user level sockets library contains its own route table. The user level route tables are not kept coherent with the UNIX server route table.

ARP table synchronization with the UNIX server. The situation is the same as for the route table.

ICMP protocol. ICMP datagrams are not directed from the UNIX server to the appropriate socket library.

TCP connection cleanup. The state of a TCP connection may live longer than any file descriptors that have referenced the connection. For example, a TCP connection continues to exist after an application has closed the last file descriptor that references it, until the three way disconnect handshake with the remote TCP endpoint has completed. In the standard implementation, TCP connections are stored in the UNIX server, which typically outlives all other processes. If the TCP protocol is moved into an application, then the application may exit before all of its TCP connections have been disconnected.

Select system call. The select system call does not work for user level sockets.

Fork system call. User level sockets are not inherited by child processes.

UDP transmit buffers. GigaNet has implemented a zero copy UDP transmitter that does not provide identical semantics to the UNIX implementation.
After a send socket operation has returned, the UDP datagram may be queued for transmission but not yet transmitted.

Socket API changes

    extern int socket(int domain, int type, int protocol);

The socket function creates a socket in the UNIX server.

    extern int usocket(int domain, int type, int protocol);

The usocket function creates a socket in the user level socket library. All arguments are identical to those of the socket function.

    extern int netinit(void);

The netinit function initializes the user level socket library and protocol stack.

Global Variables

    extern int net_server_node;

The net_server_node variable contains the node number of the Paragon node that contains the network interface. The default value is 0.

    extern int sb_max;

The sb_max variable controls the maximum amount of socket buffering. The default value is 64K bytes (K=1024). Applications that need extended TCP windows can increase this variable.

    extern int nmbclusters;

The nmbclusters variable controls the size of the internal socket buffer pool. The default value is 16. The recommended value is 256.

    extern int atm_io_trace;

The atm_io_trace variable enables a trace of all ATM input and output operations. The default value is 0 (disable trace).

    extern int atm_io_if_cksum;

The atm_io_if_cksum variable enables ATM hardware level checksumming. The default value is 1 (enable hardware checksumming).

Initialization

The main function should initialize the socket library variables specified above and then call the netinit function.

Compilation

    -D_SOCKADDR_LEN -D_NOREENTRANT

See the documentation on the pthreads library.

Libraries

libsockets.a is a new library that contains the socket API and the Internet protocol stack.

libatm.a is a new library that contains the ATM AAL5 API.

libpthreads.a is a library that contains the POSIX compatible threads package.

libc_r.a is the multi-threaded standard C library.

Conclusion

The TTCP network performance program has been ported to use the user level sockets library.
The source code to the modified TTCP program is available.
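
Example

The start-up sequence described under Initialization can be sketched as follows. This is a minimal illustration, not code from the library or from the TTCP port. The helper open_user_tcp_socket is hypothetical. The definitions of netinit, usocket and the global variables are stand-ins that only let the sketch compile away from the Paragon, and their return values are assumptions, since this note does not document them.

```c
#include <assert.h>
#include <sys/socket.h>         /* AF_INET, SOCK_STREAM */

/*
 * Stand-ins for symbols that libsockets.a provides on the Paragon.
 * They exist only so that this sketch compiles on its own; remove them
 * when linking against the real library. Return values are ASSUMED.
 */
int net_server_node = 0;        /* documented default */
int sb_max = 64 * 1024;         /* documented default: 64K bytes */
int nmbclusters = 16;           /* documented default */

static int stack_ready = 0;     /* stand-in state, not part of the library */

int netinit(void)
{
    stack_ready = 1;
    return 0;                   /* ASSUMED: 0 means success */
}

int usocket(int domain, int type, int protocol)
{
    (void)domain;
    (void)type;
    (void)protocol;
    return stack_ready ? 3 : -1;    /* ASSUMED: descriptor, or -1 on failure */
}

/*
 * Hypothetical helper, not part of the library: set the library
 * variables first, then call netinit, then create a user level TCP
 * socket. sb_max is left at its default. Returns the socket
 * descriptor, or -1 on failure.
 */
int open_user_tcp_socket(int interface_node)
{
    assert(interface_node >= 0);

    net_server_node = interface_node;   /* node that holds the ATM interface */
    nmbclusters = 256;                  /* recommended value */

    if (netinit() < 0)
        return -1;

    return usocket(AF_INET, SOCK_STREAM, 0);
}
```

An application's main function would call such a helper once, before any other socket operation, passing the number of the node that contains the network interface. On the Paragon the stand-in definitions would be dropped, and the program would be compiled with the flags listed under Compilation and linked with the libraries listed under Libraries.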