
SAN Fabric Technology Evaluations and Results

Storage area networks (SANs), by providing a high-performance network fabric oriented toward storage device transfer protocols, allow direct physical data transfers between hosts and storage devices. Currently, most SANs are implemented using Fibre Channel (FC) protocol-based fabrics. Emerging alternative SAN protocols, such as iSCSI (Internet Small Computer System Interface), FCIP (Fibre Channel over IP), and SRP (SCSI RDMA [Remote Direct Memory Access] Protocol), are enabling the use of alternative fabric technologies, such as Gigabit Ethernet and InfiniBand, as SAN fabrics.

One of the major GUPFS goals in FY 2003 was to evaluate the performance characteristics of SAN fabric technologies and the interoperability of these technologies in a hybrid, multiple-fabric environment. The ability of the file systems to operate in such a mixed environment is very important to the ultimate success of the GUPFS project.

A picture of the current GUPFS fabric configuration is shown in Figure 1.

Figure 1. The GUPFS fabric configuration.

1         2 Gb Fibre Channel Switches

The 2 Gb/s Fibre Channel (FC) technology has matured and become the standard product offered by most FC storage and switch vendors. Many FC vendors have introduced new switch products with higher port counts. The increase in port connectivity allows more clients and storage devices to be connected to the same switch, which means fewer cascaded switches are needed in a deployment and substantially simplifies fabric management. It is important to evaluate the improved FC technology to determine how it may affect the fabric topology and the selection of the file systems.

SAN scalability is another important area of investigation when looking at the deployment of new switches in an existing fabric. SAN scalability measures how a fabric design can grow without requiring a substantial re-layout of existing fabric topology. An effective SAN architecture needs to be able to accommodate additional servers, switches, and storage with minimal impact to the existing SAN operation.

The initial GUPFS testbed had a Brocade SilkWorm 2800, a 16-port 1 Gb/s switch. In early FY 2003, we added two switches to the testbed:

·       A Brocade SilkWorm 3800 16-port 2 Gb/s Fibre Channel switch

·       A Qlogic SANbox2-16 16-port 2 Gb/s Fibre Channel switch

Both switches are capable of sustaining 32 Gb/s (full duplex) nonblocking switch throughput. During FY 2003, we tested the new 2 Gb/s switches to determine the impact of 2 Gb/s FC on I/O performance in the SAN fabric. The tests were run in three configurations: Direct Attach, SAN fabric with the Brocade 3800 switch, and SAN fabric with the Qlogic SANbox2-16 switch. In the Direct Attach configuration, an EMC CX 600 2 Gb/s FC port was directly connected to a Linux host. The results obtained from the Direct Attach configuration were used as the baseline for comparison. In the SAN fabric setups, the EMC FC port and the Linux host were connected to a switch. The two charts in Figure 2 show the PIORAW performance for in-cache (IC) reads and writes in the three test configurations.
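
To make the measurement concrete, the following minimal Python sketch illustrates the kind of test PIORAW performs: timed sequential reads of a raw device at a fixed I/O size, reported in MB/s. It is not the PIORAW benchmark itself, the device path is hypothetical, and a truer raw-device measurement would also bypass the Linux page cache (e.g., via O_DIRECT), which is omitted here for simplicity.

    import os
    import time

    def sequential_read_mb_per_s(device="/dev/sdb", io_size=64 * 1024,
                                 total=256 * 1024 * 1024):
        """Read `total` bytes from `device` in `io_size` chunks; return MB/s."""
        fd = os.open(device, os.O_RDONLY)
        try:
            start = time.time()
            remaining = total
            while remaining > 0:
                chunk = os.read(fd, min(io_size, remaining))
                if not chunk:                 # reached end of device early
                    break
                remaining -= len(chunk)
            elapsed = time.time() - start
        finally:
            os.close(fd)
        return (total - remaining) / (1024.0 * 1024.0) / elapsed

    if __name__ == "__main__":
        # Sweep of I/O sizes, in the spirit of the curves in Figure 2.
        for size_kb in (4, 16, 64, 256, 1024):
            rate = sequential_read_mb_per_s(io_size=size_kb * 1024)
            print(f"{size_kb:5d} KB I/O size: {rate:7.1f} MB/s")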

Figure 2. Fibre Channel switch performance.

The results indicate that both the Brocade SilkWorm 3800 switch and the Qlogic SANbox2-16 switch were capable of sustaining a 200 MB/s transfer rate. For in-cache writes, there was no noticeable performance difference between the direct-attach setup and the switched configurations. However, for I/O sizes larger than 16 KB, read performance through the Brocade SilkWorm 3800 was about 3% to 4% slower than in the direct-attach configuration or with the SANbox2-16 switch. From a performance point of view, these results indicate that the Qlogic SANbox2-16 is a better switch than the Brocade SilkWorm 3800.

We also tried to put all three switches into a single fabric to investigate how switches from different vendors might work together in a hybrid environment; however, our attempt was a total failure. The switches could not interoperate with one another. The original Brocade 2800 that we obtained in FY 2002 had only a base license and did not have any optional licenses such as zoning support. FC zoning provides port isolation, allowing hosts to access only storage that is in the same zone. When we put the SilkWorm 2800 switch together with the SilkWorm 3800 switch with zoning defined, the SilkWorm 2800 switch failed to function properly. We also could not link the Brocade SilkWorm 3800 switch to the Qlogic SANbox2-16 switch. The SilkWorm 3800 would not interoperate with a non-Brocade switch unless we changed its interoperability mode; however, according to Brocade Support, doing so would void the support contract and leave the SilkWorm 3800 unsupported by Brocade. Our experience indicates that switch interoperability is still a major issue and a long way from becoming a reality.

The interoperability issue also existed between the Cisco SN5428 iSCSI storage router and the Brocade switches, because the embedded FC switch inside the Cisco SN5428 is actually a Qlogic switch. For the same reason, there was no interoperability issue between the Cisco SN5428 and the Qlogic SANbox2-16 switch.

Inter-Switch Link (ISL) trunking is another area in which we encountered interoperability problems during our evaluation. ISL trunking allows load balancing across multiple ISL links, which improves switch-to-switch performance and reduces traffic congestion. Many FC switch vendors provide ISL trunking that combines up to four paths into one logical inter-switch path of up to 8 Gb/s. Unlike Ethernet or IP trunking, which is commonly available on IP network switches, most FC vendors support ISL trunking only between their own switches, and ISL trunking is often not supported when switches come from different vendors. The lack of ISL trunking support for heterogeneous switches may become an issue for the GUPFS project if a hybrid fabric is required.

2         iSCSI Evaluation

Additional SAN fabric technologies are beginning to appear. Using Ethernet as a SAN fabric is becoming possible because of the iSCSI standard for carrying SCSI storage traffic over IP networks. This is very attractive, as it allows lower-cost SAN connectivity than can be achieved with Fibre Channel, although with lower performance. It will allow large numbers of inexpensive systems to be connected to the SAN and use the cluster file system through commodity components.

In FY 2003, we evaluated the iSCSI technology in the following areas:

·       Software iSCSI performance over traditional Gigabit Ethernet (GigE) interfaces

·       Hardware iSCSI performance over Intel iSCSI HBA (Intel PRO/1000 T IP HBA)

·       Interoperability among IP routers, the storage router, and FC switches

Figure 3 shows the difference between the software iSCSI performance and the native FC performance, with the FC performance as the baseline. The FC performance was obtained using a Qlogic QLA2300 HBA. Both tests used the same storage on the Yotta Yotta GSX 2400 NetStorager accessed through its 2 Gb/s FC ports.

The results indicate that the software iSCSI performance was only about 60% to 80% of the baseline Gigabit Ethernet performance over the GigE interface. The main reason for this degradation was probably the overhead of the TCP/IP software stack, on top of which the iSCSI protocol is implemented; using software iSCSI for storage access appears to incur the performance penalty usually associated with TCP/IP traffic. However, since TCP/IP is available across almost every fabric and interconnect, iSCSI can be a nearly universal mechanism for accessing storage with shared file systems. In addition, because software iSCSI does not require additional expensive interfaces, it is a cost-effective way to access storage with shared file systems from inexpensive client systems that can tolerate its lower performance.

We also measured the differences in CPU utilization between doing transfers over FC and transfers using software iSCSI over a regular GigE interface in order to study the CPU overhead associated with different storage transfer protocols.

Figure 3. FC and iSCSI performance comparison.

Figure 4. CPU utilization comparison.

Figure 4 shows a comparison of the CPU overhead of a single-thread read when done with software iSCSI over a regular GigE interface and when done using the native FC interface. The CPU measurements were reported by the Linux vmstat command. The figure indicates that, for single-thread sequential reads, software iSCSI used more CPU cycles while achieving lower throughput, whereas FC used less CPU (about 1/8 of the iSCSI usage for larger I/O sizes and about 1/2 for smaller I/O sizes) while achieving higher throughput.
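
As a rough illustration of how these numbers can be collected, the sketch below samples vmstat while a transfer is running and averages the system-CPU ("sy") column. It assumes the standard Linux vmstat column layout and is not necessarily the exact procedure used to produce Figure 4.

    import subprocess

    def average_sys_cpu(samples=10, interval=1):
        """Run vmstat and return the mean of its 'sy' (system CPU) column."""
        out = subprocess.run(["vmstat", str(interval), str(samples + 1)],
                             capture_output=True, text=True, check=True).stdout
        lines = out.splitlines()
        header = lines[1].split()                   # second line names the columns
        sy = header.index("sy")
        rows = [line.split() for line in lines[3:]]  # skip the since-boot sample
        values = [int(r[sy]) for r in rows if len(r) >= len(header)]
        return sum(values) / float(len(values)) if values else 0.0

    if __name__ == "__main__":
        # Start the I/O workload separately, then sample CPU while it runs.
        print(f"average %sys during the run: {average_sys_cpu():.1f}")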

We are interested in studying the capabilities, performance, and overhead of hardware-assisted iSCSI solutions for use in the deployed GUPFS solution. Because iSCSI HBAs appeared to offer the potential for greater performance and lower CPU overhead by offloading both the iSCSI protocol processing and the Ethernet processing, we chose to study these devices rather than TCP offload engine (TOE) cards.

To study the benefit of iSCSI HBAs, we evaluated the Intel iSCSI HBA (Intel PRO/1000 T IP HBA). Figure 5 compares the performance of software iSCSI using a standard GigE interface with that of the hardware-assisted Intel iSCSI HBA. The tests were run with the Cisco SN5428 iSCSI storage router bridging the Gigabit Ethernet fabric to a 2 Gb/s FC port on the Chaparral controller. The figure shows that the performance of software iSCSI (using Cisco's iSCSI software driver and a regular GigE interface) was actually higher than that of Intel's iSCSI HBA. This seems to contradict the general expectation that a hardware-assisted iSCSI HBA card would achieve higher performance than a software iSCSI implementation going through the regular TCP/IP stack.

Figure 5. Comparison between Software iSCSI and iSCSI HBA.

Figure 6. Normalized CPU overhead.

To gain further insight into the differences between these two technologies, we measured the CPU overhead associated with each. We did see lower CPU utilization when the iSCSI HBA was used. However, because of the lower I/O performance achieved by the iSCSI HBA, it was difficult to make a fair comparison without normalizing the CPU utilization. Figure 6 shows the normalized CPU utilization comparison between the two technologies.

The normalized CPU utilization is defined as the %sys reported by the vmstat command divided by the throughput. It is used to compare the CPU overhead of transferring the same amount of data per second. As the figure indicates, although software iSCSI achieved better throughput, it also used more CPU cycles to transfer the same amount of data. However, it was a surprise to us that the saving in CPU utilization with the iSCSI HBA was only about 50% relative to software iSCSI. This seems to indicate that, even with TCP/IP processing offloaded to the iSCSI HBA, the CPU overhead of iSCSI is still significantly higher than that of a Fibre Channel HBA. In fairness, we must note that both the iSCSI HBA hardware and its driver software were very new and clearly not yet optimized; this was also true of the purely software iSCSI implementation tested. Nonetheless, we were disappointed in the iSCSI HBA's performance and CPU overhead.
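
Expressed as code, the normalization is simply %sys divided by throughput, i.e., the CPU cost per MB/s moved. The values below are placeholders for illustration only, not the measurements behind Figure 6.

    def normalized_cpu(sys_percent, throughput_mb_s):
        """CPU cost per unit of throughput: vmstat %sys divided by MB/s (lower is better)."""
        return sys_percent / throughput_mb_s

    # Placeholder numbers: a path that moves data faster but burns more CPU
    # can still have the higher (worse) normalized utilization.
    print(f"software iSCSI: {normalized_cpu(40.0, 80.0):.2f} %sys per MB/s")
    print(f"iSCSI HBA:      {normalized_cpu(15.0, 60.0):.2f} %sys per MB/s")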

Another consideration when evaluating the utility of iSCSI HBAs and their role in a deployed GUPFS solution is their cost. Although these early iSCSI HBAs are priced somewhat lower (25% to 50%) than FC HBAs, they are still much more expensive than regular Gigabit Ethernet interfaces, which cost less than 10% of an FC HBA. Given their limited performance, substantial system overhead, and high cost, our conclusion is that iSCSI HBAs are not cost effective and that software iSCSI solutions using standard Ethernet interfaces should be used instead, unless there are compelling CPU overhead considerations for specific systems.

Another important consideration for deploying iSCSI technology is the availability of iSCSI target devices (storage devices with iSCSI protocol interfaces). Currently, very few storage vendors support or plan to support native iSCSI target interfaces on their storage systems. An alternative is to use iSCSI routers or bridges that provide protocol conversion between the iSCSI protocol and the FC protocol; the Cisco SN5428 iSCSI router and Cisco MDS 9509 switch are two examples of such devices. However, none of these technologies currently supports a large number of iSCSI ports for shared access to the same storage pool. The low iSCSI port count per storage system or iSCSI router/bridge will limit the maximum aggregate bandwidth of an IP-based SAN. In order to achieve the high aggregate bandwidth envisioned by the GUPFS project, a much more complicated fabric infrastructure may be needed, with several tiers of switches and routers and with the iSCSI routers/bridges on the edge of the fabric fanning out for host connectivity.

Despite some performance limitations and the limited availability of iSCSI storage devices and fabric bridges, iSCSI shows real promise for enabling systems to cost-effectively access a block-based shared file system without requiring expensive fabric interfaces to be added to those systems.

3         InfiniBand Switches

In addition to using Ethernet as a SAN fabric, the InfiniBand interconnect shows promise as a transport for storage traffic. InfiniBand offers performance (high bandwidth and low latency) beyond that of either Ethernet or Fibre Channel, with even higher bandwidths planned. With the possibility of commodity-level pricing and the likelihood that some future NERSC systems will use InfiniBand as an interconnect, this is a technology that needs to be studied, even in its very early stages. As with Ethernet fabrics, an important part of the InfiniBand technology that needs to be examined is fabric bridges between InfiniBand and Fibre Channel and Ethernet SANs. Another important area of exploration is storage transfer protocols (e.g., SRP) and methodologies. We investigated both of these.

In 2003, we evaluated the InfinIO 7000 InfiniBand switch from InfiniCon Systems and the Topspin 90 switch from Topspin. Both 1x and 4x InfiniBand hardware were tested. As newer 4x InfiniBand results become available, we will present them in place of the older 1x results. Figure 7 shows the InfinIO 7000 and Topspin 90 switches in the GUPFS testbed environment.

Figure 7. The InfinIO 7000 shared I/O switch and the Topspin 90 switch.

3.1                  InfiniCon InfinIO 7000 InfiniBand Switch

The InfinIO 7000 shared I/O system is a multi-protocol networking system for shared I/O and InfiniBand switching. Server nodes attach to the InfinIO 7000 switch via a high-speed, 10 Gb/s (4x) InfiniBand connection to access a pool of virtual I/O resources, including Fibre Channel SANs, Ethernet SANs or network-attached storage (NAS), and native InfiniBand fabrics. This shared I/O architecture eliminates the need for separate individual Ethernet NICs, FC HBAs, and cabling on the server nodes. The resulting savings in capital and operational infrastructure costs could be significant.

Each InfinIO chassis supports dual 4x InfiniBand switch modules and up to eight I/O “personality modules.” The I/O personality module can be a three-port 1 Gb Virtual Ethernet Exchange (VEx) card, a two-port 2 Gb Virtual Fibre Channel Exchange (VFx) card, or a six-port InfiniBand Expansion (IBx) card. Chassis slots can be populated with any mix of personality modules, which can be hot-swapped to accommodate configuration changes.

Figure 8 shows what a single 4x HCA was able to achieve with a single Linux host, using two InfinIO 7000 Fibre Channel (VFx) line cards. Each VFx card has two 2 Gb/s FC ports, so with two VFx line cards the aggregate FC bandwidth is 8 Gb/s. A single 4x HCA can sustain a 10 Gb/s data transfer rate, so theoretically a single 4x HCA should be able to saturate the 8 Gb/s aggregate FC bandwidth of the two VFx cards. However, the best performance we were able to achieve was about 470 MB/s. The underlying storage was the Yotta Yotta GSX 2400 storage system with four 2 Gb/s FC ports, which has been demonstrated to sustain more than 750 MB/s for in-cache reads and writes over those four ports, so the storage was not likely the bottleneck. In this test configuration, the Linux host appeared to be the bottleneck, although without more investigation it is not clear whether the limitation was the CPU, the PCI-X bus, the benchmark software, the driver software, the VFx module, or something else.
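
The gap can be summarized with a quick back-of-the-envelope check, using the 200 MB/s per-port figure this report uses for 2 Gb/s FC; the snippet below is illustrative only.

    FC_PORT_MB_S = 200                  # usable rate of one 2 Gb/s FC port
    vfx_cards, ports_per_card = 2, 2    # configuration described above

    theoretical = vfx_cards * ports_per_card * FC_PORT_MB_S   # 800 MB/s behind the HCA
    measured = 470                                            # best observed rate, MB/s

    print(f"theoretical FC aggregate: {theoretical} MB/s")
    print(f"measured through one 4x HCA: {measured} MB/s "
          f"({100.0 * measured / theoretical:.0f}% of theoretical)")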

Figure 8. Aggregate Infin7000 FC performance with a single 4xHCA.

Nonetheless, these SRP results are very encouraging. They show that a single 4x HCA can sustain twice the I/O performance of a single 2 Gb/s FC HBA, and the cards cost about the same. From a price-performance point of view, this suggests that InfiniBand with the SRP protocol may be more cost-effective than an FC solution. However, InfiniBand technology is still immature, and both InfiniBand and SRP are still evolving. Full adoption of InfiniBand for GUPFS may be somewhat risky at present, but it is likely that InfiniBand will factor into the deployed GUPFS solution in the future.

3.2                  Topspin TS90 InfiniBand Switch

The Topspin 90 contains 12 InfiniBand ports that are used to create a single 10 Gb/s fabric for inter-process communications, storage, and networking. The Topspin 90 switch can be expanded from the 12-port base configuration by inserting an optional Ethernet Gateway-Router Module or a Fibre Channel Gateway Module into the expansion slot. The Fibre Channel Gateway Module supports two 2 Gb/s Fibre Channel (FC) ports, while the Ethernet Gateway-Router Module supports four 1 Gb/s GigE ports.

Topspin also manufactures InfiniBand 4xHCA adapters that support a full 10 Gb/s of peak bandwidth. With the 4xHCA adapter, virtual NICs or HBAs can be created in every server for networking or storage access.

Figure 9 shows a comparison between the native 2 Gb/s FC HBA performance and the SRP performance of a single 4xHCA and the Topspin 90 switch with a Fibre Channel Gateway Module. Both tests used the same FC-connected storage device. The results indicate that a Linux host with the SRP driver was able to achieve performance similar to that of a 2 Gb/s FC HBA. These results match the SRP performance we saw earlier with the InfiniCon 4xHCA (see Section 3.1, above). It is encouraging that more InfiniBand vendors are supporting InfiniBand to FC bridging. (Voltaire is another InfiniBand vendor that supports this capability.) The fact that more than one vendor supports InfiniBand to FC bridging indicates that there is a perceived market for it, and the competition may make the technology more robust and improve its performance.

Figure 9. The SRP performance with the Topspin 90 switch.

4         Fabric Performance Comparison

A single-process read performance comparison was made using several fabric technologies: Fibre Channel (FC), SRP over InfiniBand (SRP), iSCSI over GigE (ISCSI_GE), and iSCSI over IPoIB (ISCSI_IB). 

The four configurations were set up as follows:

·       FC: obtained using a Qlogic QLA2300 HBA; this provides the baseline.

·       SRP: obtained using a 1x HCA from InfiniCon, with an InfiniCon InfinIO 7000 switch providing InfiniBand to FC bridging.

·       ISCSI_GE: obtained using the software iSCSI driver over a GigE network interface card, with a Cisco SN5428 iSCSI router providing iSCSI to FC bridging.

·       ISCSI_IB: obtained using the software iSCSI driver and the IPoIB (IP over InfiniBand) protocol over a 1x HCA from InfiniCon, with an InfiniCon InfinIO 7000 switch providing InfiniBand to Ethernet (IP) bridging and a Cisco SN5428 iSCSI router providing iSCSI Ethernet (IP) to FC bridging.

Figure 10 shows the results of single-thread reads of different I/O sizes using the different fabric technologies. The best performance was achieved by the 2 Gb/s FC interface, followed by the SRP protocol over InfiniBand. Since the iSCSI traffic passed through a single Gigabit Ethernet interface, the best iSCSI performance would be less than 100 MB/s. With the additional TCP/IP software stack overhead of IPoIB on top of the immature InfiniBand drivers, iSCSI over IPoIB delivered the lowest performance for single-thread reads. It is clear from these results that FC is the leader and delivers the best performance for storage access. However, when InfiniBand is used as the cluster interconnect, SRP may be the preferred mechanism for storage access, as it eliminates the cost of additional FC HBAs for each host and still delivers good I/O performance.

Figure 10. Storage fabric performance.

Figure 11. CPU overhead of storage fabric.

Figure 11 shows the CPU overhead of the different protocols for single-thread reads. FC, which delivered the best performance, also generated the least CPU overhead. The iSCSI protocol allows standard SCSI packets to be encapsulated in IP packets and transported over standard Ethernet infrastructure, allowing SANs to be deployed on any network supporting IP. This option is very attractive, as it allows lower-cost SAN connectivity than can be achieved with Fibre Channel, although with lower performance. It will allow large numbers of inexpensive systems to be connected to the SAN and use the shared file system through commodity-priced components. While attractive from a hardware cost perspective, this option does incur a performance impact on each host because of the increased traffic through the host's IP stack (Figure 11).

Note that the SRP results and the iSCSI over IB results shown here were obtained using the InfiniCon 1x HCA, which was the only InfiniBand HCA available when the tests were conducted. Improved results may be possible with the newer 4x HCA, but due to lack of resources the tests were not repeated. However, separate SRP tests with the 4x HCA did seem to demonstrate better I/O performance, as shown in Section 3.1.

5         Summary

The results of our 2 Gb/s Fibre Channel SAN fabric testing show that, individually, the 2 Gb/s FC switches perform well and are able to achieve the full 200 MB/s per-port performance provided by 2 Gb/s FC. The two 2 Gb/s Fibre Channel switches evaluated performed nearly identically, except for the minor 3% to 4% lower read performance of the Brocade 3800 compared with the Qlogic SANbox2-16. The performance of storage transfers through 2 Gb/s FC fabrics based on each of these switches was nearly identical to that of the same 2 Gb/s FC storage devices when directly connected to the hosts. Overall, the performance of 2 Gb/s FC fabrics was good.

While the performance of 2 Gb/s FC fabrics was good, we did find interoperability problems between different vendors' switches. Two areas of interoperability deficiency were identified. One problem was that Brocade FC switches were unable to coexist with other vendors' switches without resorting to unacceptable configuration and support options. We recommend the use of FC switches from vendors other than Brocade for GUPFS deployment. A second problem area was the use of trunked Inter-Switch Links (ISLs) to achieve higher inter-switch bandwidth, which will be needed for larger fabrics. Vendors that provide FC ISL trunking normally support it only between their own switches. This may become a problem in large heterogeneous fabrics.

Standardization efforts targeting improved Fibre Channel interoperability are underway in the storage industry. We need to continue monitoring and testing FC fabric interoperability as GUPFS moves toward deployment.

Our evaluation of iSCSI has led us to believe that it is a viable technology that allows block storage transfers across any fabric or system interconnect that supports IP protocol traffic. We tested both hardware and software iSCSI implementations. The results lead us to believe that software iSCSI works well and is inexpensive; however, it delivers lower performance than FC, and this comes at the cost of substantially higher system overhead. Our tests of hardware iSCSI HBA solutions lead us to conclude that, at this time, software iSCSI solutions perform better and that iSCSI HBAs are not cost effective given their lower performance and still-substantial system overhead.

Our iSCSI evaluations demonstrated that multiple fabrics can be bridged together for block storage transfers. We successfully bridged iSCSI traffic from Gigabit Ethernet–attached hosts to FC storage devices, and bridged iSCSI traffic from InfiniBand-attached hosts to a Gigabit Ethernet fabric and then to FC storage devices. This illustrates the flexibility of iSCSI and shows how lower-cost fabrics, such as Gigabit Ethernet, can be used to cost-effectively access storage for block-based shared file systems running on low-cost systems. In order to prepare for a multiple-fabric GUPFS deployment, we need to continue tracking iSCSI developments, especially additional iSCSI fabric bridges and iSCSI-attached storage devices, and conduct evaluations of these as they become available.

The evaluations that we conducted of InfiniBand fabrics from two different vendors showed that InfiniBand is a successful and effective interconnect for SAN storage traffic. We were successful at accessing FC-attached storage through the InfiniBand fabric using both the native IB SRP storage transfer protocol and iSCSI with IP over IB. This was accomplished through the use of FC and Gigabit Ethernet gateway modules in the IB switches, which allowed all three fabrics to be bridged together into a single fabric. The IB SRP protocol showed very good performance that easily matched and in some configurations exceeded that of the 2 Gb/s Fibre Channel.

InfiniBand is very interesting in that it allows both high-performance message passing and storage transfers to be conducted using a single interconnect and host adapter. We expect to see InfiniBand in future NERSC systems. Therefore, in order to determine how to best integrate IB into a GUPFS deployment, we need to continue tracking its development and evaluating its SAN performance as new switches, fabric bridge gateways, and higher speed fabrics appear.
