Several different server hardware configurations have been
evaluated. The picture below shows two of the packaged evaluation
units from ASA and Polywell computers. Also shown is a Fiber Channel
disk array which was connected via optical link to a CDF Level3
trigger node for evaluation.
ASA Computers Inc.
2354 Calle Del Mundo
Santa Clara CA 95054
Telephone: (408) 654-2901 ext. 205
           (408) 654-2900 (ask for Sean)
           (800) REAL-PCS (1-800-732-5727)
Fax: (408) 654-2910
E-mail: sean@asacomputers.com
URL: http://www.asacomputers.com
-- Intel STL2 Dual P3 ATX MB
-- Pentium III 1000MHz-133FSB 256Kcache
-- 1GB SDRAM DIMM 32x72 PC-133 ECC Reg
-- 4U Rackmount Chassis
-- 400W PS2 / Mini Redundant ATX power supply
-- Two 3Ware 7850 8-port Ultra-IDE RAID-5 Controllers
-- -- 16 WD 100GB Ultra-100 IDE 7200RPM Hard Drives
-- 2.4.2 kernel
The two pictures below show the ASA server:
Jerry Tighe
Polywell Computers, Inc.
1461 San Mateo Ave
So. San Francisco, CA 94080
E-mail: jerrytighe@polywell.com
URL: http://www.polywell.com
Phone: 1-800-999-1278 Ext. 128
       650-583-7222 Ext. 128
Fax: 650-583-1974
-- Poly 865DU3 Dual P3-370 ATX MB w/2xU160
-- Pentium III 1000MHz-133FSB 256Kcache
-- 1GB SDRAM DIMM 32x72 PC-133 ECC Reg
-- 20-Bay Cube Server Chassis
-- 2x400W Redundant ATX Power Supply
-- 3Ware 6200 2-port Ultra-IDE RAID Controller
-- -- 2 WD 40GB Ultra-100 7200rpm IDE HD (mirrored system)
-- 3Ware 7850 8-port Ultra-IDE RAID-5 Controller
-- -- 16 WD 100GB Ultra-100 IDE 7200RPM Hard Drives (Raid)
-- 2.4.3 kernel
The two pictures below show the Polywell server.
-- Tyan Thunder LE S2510 Dual P3 1 GHz
-- 256 MB RAM
-- QLogic ISP2200 FibreChannel-to-SCSI PCI controller (64 bit PCI)
-- Chaparral K7413 FiberChannel 1.3TB disk array (eight 170GB
SCSI disks, hardware RAID controller)
-- Linux with 2.4.16 kernel
-- Supermicro P3TDLE dual P3 866 MHz
-- 1024 MB PC-133 ECC SDRAM
-- Escalade 7800 storage switch
-- 8 WD 100 GB 7200 rpm drives
-- Raid 5 configuration
-- ReiserFS
-- FNAL RH 7.1
Phuoc Vu - Account Manager
Rackable Systems
721 Charcot Avenue
San Jose, CA 95131
E-mail: pvu@rackable.com
URL: http://www.rackable.com
Phone: 408-321-0290 Ext. 328
       408-835-6673 (mobile)
3U-13 Hswap IDE storage server:
-- 3U CM HS Chassis, 300W Redundant Power Supply
-- Intel SDS2 MB with 512 MB 133 ECC memory
-- 13 160GB Maxtor IDE drives (5400 RPM, 12ms avg seek time)
-- Two 3ware Escalade 7850 IDE Raid controllers
-- -- RAID 5 - 6 drives
-- -- RAID 5 - 7 drives (6 RAID, 1 for OS/boot)
-- Loaded with RH 7.2 and 3ware 7.4 firmware (beta)
See also:
-- Advantages of Rackable servers and storage integrated solutions
-- Rackable powers the Human Genome Project
The four pictures below show the Rackable server:
-- Arena Indy 2400
-- 19'' Rackmount chassis
-- Ultra 160 SCSI-IDE
-- 12 160GB Maxtor 5400rpm hard drives
For the purposes of the CDF CAF, we are interested in evaluating how
fast the server hardware can deliver a secondary data set to the
worker nodes in the farmlet. This has two main pieces - local read
bandwidth (i.e., how fast can it retrieve data from disk into memory?)
and network bandwidth (i.e., how fast can it send data from memory to
remote memory over the network connection via some protocol like
NFS?). Write bandwidth is a comparatively minor concern (though it
should not be completely ignored, as we'll have to load the servers
with new secondary data) and is therefore not systematically benchmarked.
This section deals with local disk read performance.
The primary tool used in the local and network benchmarking
presented here is a Perl script (disk_bench.pl)
which forks and manages a user-requested number of parallel "dd"
read processes to simulate simultaneous client access to server
data. Several reasons for writing a benchmarking script rather than
simply using an existing multi-threaded I/O benchmark tool
(e.g. tiobench) are:
-- knowledge of exactly what the test is doing w/o having to look
at source code
-- ease of customization to match specific tests (e.g. benchmark
multiple controllers at once)
The read benchmarking algorithm used by disk_bench.pl is as
follows. For N simultaneous read threads to be benchmarked, N dd
processes are forked and timed. The only subtlety is that while
all the dd threads start at the same time, they do not all
end at the same time. The consequence is that the last k dd jobs
to finish will not be "competing" with N-k other dd threads for
their entire lifetime, and the throughput will therefore be
overestimated. To circumvent this problem, we fork not just N dd
processes but N dd sequences, each of "several" dd's, with only
the first dd timed to calculate the aggregate throughput. In
practice, the "several" dd's in a sequence corresponds to enough
dd jobs to ensure that the last timed dd finishes before the final
dd in any sequence finishes (this is checked in the script and the
user warned of throughput overestimation if this occurs). As soon
as all of the timed dd processes finish, all other dd's are
killed. Memory caching effects also need to be addressed, as they
can cause significant overestimates of disk bandwidth. This is
taken care of by first reading a large file from disk, where
"large" is typically 1-2 times the size of the system RAM. Another
alternative, not implemented, would be to simply remount the
filesystem before forking the dd sequences.
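As an illustration of this scheme, here is a minimal Perl sketch of the fork/time/kill logic described above. This is not the real disk_bench.pl: the file naming, the fixed sequence length, and the aggregate-throughput formula (total timed data divided by the slowest timed dd's elapsed time) are assumptions made for the example.

#!/usr/bin/perl -w
# Minimal sketch of the disk_bench.pl timing logic (illustrative only).
use strict;
use IO::Handle;
use Time::HiRes qw(time);

my $N    = shift || 4;   # number of simultaneous read threads
my $nseq = 5;            # dd's per sequence: enough that the last timed
                         # dd finishes before any full sequence finishes
my $mb   = 512;          # benchmarking file size in MB

pipe(my $rd, my $wr) or die "pipe: $!";
my @kids;
for my $i (1 .. $N) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                          # child: one dd sequence
        close $rd;
        $wr->autoflush(1);
        my $file = "bench$i.dat";             # hypothetical file layout
        my $t0 = time;
        system("dd if=$file of=/dev/null bs=1M 2>/dev/null");
        my $dt = time - $t0;                  # only the FIRST dd is timed
        print $wr "$dt\n";
        system("dd if=$file of=/dev/null bs=1M 2>/dev/null")
            for 2 .. $nseq;                   # untimed dd's keep the disk busy
        exit 0;
    }
    push @kids, $pid;
}
close $wr;
chomp(my @dt = map { scalar <$rd> } 1 .. $N); # wait for the N timed results
kill 'TERM', @kids;                           # stop the remaining sequences
waitpid $_, 0 for @kids;                      # (an in-flight dd just finishes)
my ($slowest) = sort { $b <=> $a } @dt;
printf "aggregate throughput ~ %.0f MB/s\n", $N * $mb / $slowest;

The real script additionally supports reading the benchmark files from multiple directories, which is used below to exercise both i/o controllers at once.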
An important feature built into disk_bench.pl is the ability to
read benchmarking files from multiple directories. An example of
the utility of this feature is when data is in two different
directories that represent two different i/o controllers. In this
case, the bandwidth for data randomly accessed from these two
directories would be underestimated by a benchmark of the two
controllers separately. Recall that the ASA and Polywell servers
have two Raid 5 volumes attached to different controllers. This
feature is no longer needed when we stripe across the two
controllers in a Raid50 configuration (to be discussed later).
The results of running disk_bench.pl to measure local disk
read performance are presented in this section. Benchmarking file
sizes are 512 MB and the read block size is 1 MB unless otherwise noted.
For the Fiber Channel disk array attached to a Level 3 node,
comparing different Raid configurations:
#threads   Raid0   Raid3   Raid5   (aggregate throughput in MB/s)
--------   -----   -----   -----
    1        42      39      39
    2        48      47      47
    4        59      57      59
    8        59      53      59
   16        60      57      60
   32        59      55      58
   60        50      46      48
  100        44      40      44
All three of the above Raid configurations give similar read
throughput, as one would basically expect (all are striped data
arrays). Had we benchmarked writes, we would expect slower write
performance for Raid 3 and Raid 5 compared to Raid 0 because of the
parity overhead.
Any file server at the O(1 TB) scale should use a journaling
filesystem. An unjournaled filesystem (like ext2) requires
manual filesystem checking and cannot be restored as quickly and
easily in the event of data corruption as a filesystem which logs
filesystem modifications. Simple practicalities also limit the use of
unjournaled filesystems - fsck can take several hours to check a 1
TB filesystem!
There are currently four options for journaling filesystems:
1) ext3 - simple extension of ext2 for journaling. In fact, an
ext2 fs can be converted in place to an ext3 jfs without much
trouble (see the example after this list), which makes it
attractive for existing systems. Supported in 2.4.15 and later kernels.
2) ReiserFS - supposedly better performance than other jfs on
small files. Supported in 2.4.16 kernel.
3) XFS - SGI's jfs. Needed to patch 2.4.16 kernel to get this to
work.
4) JFS - IBM's jfs. Needed to patch 2.4.16 kernel to get this to
work.
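The ext2-to-ext3 conversion mentioned in option 1 amounts to adding a journal with tune2fs and remounting the volume as ext3; the device name and mount point below are placeholders:
tune2fs -j /dev/sda1
mount -t ext3 /dev/sda1 /data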
I've repeated the exact same benchmark previously described, but
using each of these journaling filesystems. The results are shown
below (all for the Raid 3 configuration, same FC + Level 3 node setup):
#threads   ext2   ext3   ReiserFS   XFS   JFS   (agg tput in MB/s)
--------   ----   ----   --------   ---   ---
    1       39     38       37       42    43
    2       47     46       50       48    49
    4       57     57       57       59    58
    8       53     52       52       52    56
   16       57     57       56       58    58
   32       55     55       54       55    56
   60       46     44       42       49    44
  100       40     39       34       43    39
We can see that there is also little variability in read
performance across the various journaling filesystems and ext2 in
these very limited tests. This is perhaps not all that
surprising, as we would naively expect write performance to be
more significantly impacted by the journal updating overhead.
We ran disk_bench.pl on the ASA and Polywell servers to benchmark local disk read performance. In this case, benchmarking was done separately for "one controller" (one directory) and "two controllers" (two directories). For the two-controller test with more than one dd thread (the one-thread case is the same for one and two controllers), the timed dd threads operate on benchmarking files from alternating directories (controllers). This is the case of perfectly balanced controller load on the server, while the "one controller" benchmark represents maximally unequal controller load. Therefore, the "one controller" and "two controller" benchmarks should be considered the worst-case and best-case file server throughput, respectively, when simply using two hardware Raid 5 arrays. The results are shown below.
            ASA server         Polywell server
#threads   1 cont   2 cont    1 cont   2 cont    (agg tput in MB/s)
--------   ------   ------    ------   ------
    1        83       83        86       90
    2        57      117        64      128
    4        54      107        62      119
    8        47      108        43      109
   16        48       97        38       82
   32        47       94        36       72
   60        46       89        32       69
  100        40       82        26       66
In this case, we stripe the data (and parity) across the two Raid 5 arrays. This latter striping is done in software, and setting up Raid50 was really quite simple (a sketch of the configuration appears after the table below). The advantage of Raid50 over two separate Raid 5 arrays is quite substantial, as shown below (Polywell server, 2.4.16 kernel, 512k chunk size, bs = dd block size):
#threads   1M bs   64k bs   (agg tput in MB/s)
--------   -----   ------
    1       127     182
    2       147     190
    4       152     165
    8       141     146
   16       132     129
   32       127     120
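For reference, the software stripe over the two hardware Raid 5 volumes can be described to the 2.4-era raidtools in /etc/raidtab roughly as follows (the /dev/sd* device names for the two 3ware volumes are assumptions), after which mkraid /dev/md0 builds the array:

raiddev /dev/md0
    raid-level            0
    nr-raid-disks         2
    persistent-superblock 1
    chunk-size            512
    device                /dev/sda1
    raid-disk             0
    device                /dev/sdb1
    raid-disk             1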
The CPU and memory utilization for Raid50 are higher than for the previous tests of simple hardware Raid5, which I suppose is expected from the additional software raid overhead. In fact, the dd threads started swapping in and out at 60 simultaneous threads on our 1GB RAM system, which caused incorrect results (this is why they are not shown).
Ultra ATA employs Cyclic Redundancy Check (CRC) data
verification for host bus transfers. This checks that data is
transferred without error between the drive and the host
controller, a CRC being performed for each burst of data by both
the host and the drive. At the end of the burst, the host sends
this information to the drive for comparison with the drive's
CRC. Although the data integrity is CRC checked, the commands
(e.g. read, write) and command parameters (e.g. sector/cylinder
addressing) are not CRC checked. This means that corruption can
still occur if data is written to or read from the wrong location
on the disk or if incorrect commands are communicated to the
drive (the latter is probably less likely). With the very large
number of disks to contend with in a 100-server system (~1600
disks), data integrity issues become a concern.
In order to check this, I wrote a simple Perl script
(md5_checks.pl) which simply reads a file repeatedly from disk,
computing the md5sum each time (md5sum is a different checksum
from CRC, considered more robust in the sense that it is less
likely that different files will have the same checksum) and
comparing it to the known checksum of the file.
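The idea is captured by the following minimal sketch (illustrative, not the real md5_checks.pl; the file name, iteration count, and output format are assumptions):

#!/usr/bin/perl -w
# Sketch of the md5_checks.pl idea: re-read a file and compare its md5sum
# against a reference value on every pass (illustrative only).
use strict;
use Digest::MD5;

my ($file, $iters) = (shift || 'random.dat', shift || 12000);
# NOTE: $file should be larger than system RAM so each pass actually
# hits the disks rather than the page cache.
my $ref = md5_of($file);              # first read defines the reference
for my $i (1 .. $iters) {
    my $sum = md5_of($file);
    die "checksum MISMATCH on iteration $i\n" if $sum ne $ref;
    print "iteration $i ok\n";
}

sub md5_of {
    my ($f) = @_;
    open my $fh, '<', $f or die "open $f: $!";
    binmode $fh;
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $sum;
}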
This script was run overnight simultaneously on both of the ASA's
Raid 5 arrays without any failed checksums. The script
completed 12000 iterations of reading a random-byte file, which
was created with
dd if=/dev/urandom of=./random.dat bs=1M count=2048
This probed data integrity at the level of 2e13 bits without seeing any bit errors. An excerpt from a recent email from Frank Wuerthwein:
Maxtor claims about their drives that they have less than 1 non-recoverable data error per 10E14bits read.
This suggests that we would have to run our test for a few weeks to probe the validity of Maxtor's claims with our ASA server (or get more servers).
Steve Heaton - Account Manager
SysKonnect, Inc.
1922 Zanker Rd.
San Jose, CA 95112
408-437-3840 Direct
408-437-3866 Fax
1-800-752-3334 Toll Free
E-mail: sheaton@syskonnect.com
URL: http://www.syskonnect.com
To measure the bare performance of the Gigabit Ethernet card, we
used a very short C program which works at the socket level. The
server program allocates a fixed-size block of memory to be
used as a buffer, opens a socket, and waits until the client program
connects. Once a connection is established, the server sends the
contents of the buffer to the client repeatedly and disconnects
after a fixed number of repetitions. On the other side of the
connection, the client just connects to the server and receives
the data into the same buffer, overwriting it each time (without
looking into the contents of the data), until the connection is
closed by the server. This transfer was timed to get a measure of
the network throughput.
The only overhead besides the copy between the kernel buffer
and the NIC buffer is the copy between the kernel buffer
and the user-space buffer, but this step would exist in any usual
application anyway.
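The original benchmark was a short C program; the following Perl sketch reproduces the same send/receive loop for illustration (the port, buffer size, and repetition count are arbitrary choices, and partial writes are ignored for brevity):

#!/usr/bin/perl -w
# Memory-to-memory network throughput sketch: run as
#   net_bench.pl server        (on one host)
#   net_bench.pl <server-host> (on the other)
use strict;
use IO::Socket::INET;
use Time::HiRes qw(time);

my ($port, $bufsize, $reps) = (5001, 100_000, 10_000);
my $buf = 'x' x $bufsize;

if (@ARGV and $ARGV[0] eq 'server') {
    my $srv = IO::Socket::INET->new(LocalPort => $port, Listen => 1,
                                    Reuse => 1) or die "listen: $!";
    my $c = $srv->accept or die "accept: $!";
    syswrite($c, $buf) for 1 .. $reps;   # same buffer, sent repeatedly
    close $c;                            # server disconnects when done
} else {
    my $c = IO::Socket::INET->new(PeerAddr => shift || 'localhost',
                                  PeerPort => $port) or die "connect: $!";
    my ($bytes, $t0) = (0, time);
    # receive into the same buffer, overwriting it, never inspecting it
    while ((my $n = sysread($c, $buf, $bufsize)) > 0) { $bytes += $n }
    printf "%.1f MB/s\n", $bytes / 1e6 / (time - $t0);
}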
The table below shows the memory-to-memory results for different
IP and TCP packet sizes. "in" and "cs" refer to the number of
interrupts and context switches per second on the client processor
(via "vmstat 5"), respectively. Interrupts are used
by device drivers to get the processor's attention to perform some
set of tasks. Context switches represent switching between user
and system level processing, which affects both CPU latency and
load. Therefore, in a very loose sense, these provide a relative
measure of how efficiently the client processes the incoming
packets (higher numbers mean that the processor is working harder,
particularly for context switching).
IP packet size (bytes)   TCP packet size (bytes)    in     cs    MB/s
----------------------   -----------------------   ----   ----   ----
         1000                       10              34k    34k      7
         1000                      100              41k    41k     70
         1000                     1000              58k     85     67
         1000                    10000              58k    100     67
         1000                   100000              56k    200     67
         9000                       10              32k    32k      8
         9000                      100              19k    13k     71
         9000                     1000              20k    15k    118
         9000                    10000              21k    15k    118
         9000                   100000              21k    15k    118
Notice that for reasonably sized TCP packets, we are only able to get near the full bandwidth of the gigabit link (118 MB/s, or 940 Mbits/s) with so-called "Jumbo Frames" (9000-byte IP packets). The downside is that Jumbo Frames are not currently part of the IP standard, so they are not guaranteed to work with all networking hardware.
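On Linux, jumbo frames are enabled by raising the MTU on both ends of the link, e.g. (the interface name here is a placeholder):
ifconfig eth1 mtu 9000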
To investigate what might be the maximum speed of an NFS server, we used a program similar to the memory-to-memory case. The difference is that the server program directly opens a given local file, reads it into the user-space buffer, and sends the contents of this buffer to the client over TCP. The client side is unchanged from the memory-to-memory case. The results are shown below (ASA server):
IP packet size (bytes)   TCP packet size (bytes)    in     cs    MB/s
----------------------   -----------------------   ----   ----   ----
         1500                      100              38k    42k     30
         1500                     1000              42k     5k     65
         1500                    10000              39k     4k     46
         1500                   100000              33k     3k     44
         9000                      100              12k    11k     43
         9000                     1000              15k    14k     63
         9000                    10000              13k    10k     62
         9000                   100000              10k     7k     58
It should be noted that when these tests were done, we were only getting approximately 70 MB/s for local disk reads, rather than the 83 MB/s previously shown (see "Raid 5 - One/Two Controller Results"), for several reasons not worth mentioning. The point is that it is not clear whether the disk-to-memory throughput is limited by disk access or network bandwidth. This test should be repeated with faster disk access to investigate the limits of remote file transfer with this method.
In the current CAF model, the secondary data volume on a given
server is NFS exported to each of the worker nodes in the
farmlet. Therefore, we are ultimately interested in determining
the aggregate throughput for multiple clients reading data on the
server over NFS. The procedure for benchmarking this is identical
to what was done for local I/O (i.e. use of disk_bench.pl),
except that now the test is performed on the client, using a
benchmarking directory which is an NFS-mounted disk physically
residing on the server.
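For reference, this setup amounts to exporting the data volume on the server and mounting it on each worker; the paths and host names below are placeholders:
/etc/exports on the server:
/raid1 worker*(rw)
on each client:
mount -t nfs server:/raid1 /mnt/raid1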
The results for vanilla NFS (2.4.3 kernel, no mount options,
standard 1500-byte frames) with the ASA server exporting
benchmarking data (two controllers) to a single dual-Athlon
client are shown below:
                 Client               Server
#threads   %CPU    in     cs    %CPU    in     cs    MB/s
--------   ----   ----   ----   ----   ----   ----   ----
    1       34    10k    12k     50    10k    20k     34
    2       24     5k     9k     31     6k    10k     42
    4       27     6k    15k     31     8k    12k     35
    8       32     6k    32k     30     9k    14k     27
   16       25     5k    40k     16     6k     9k     23
   32       31     4k    55k     10     2k     5k     11
   60       67     3k    72k      8     3k     4k      7
  100       98     1k    63k      5     2k     3k     --
Notice that as the number of client read threads increases, the CPU load on the client becomes very large and the server CPU load consequently decreases. One should keep this in mind when trying to interpret these throughput scaling results, because in the actual farmlet configuration each client will only have to process incoming packets for a few (say, three or fewer) read threads. Therefore, we might expect the aggregate throughput scaling in a switched environment to be better than the results above if we are truly client CPU-limited. Of course, the relative server load will increase in this case, so we really need a more realistic network configuration to test the throughput scaling behavior over NFS.
We found in our CAF prototype tests that simultaneous clients (here, truly multiple clients connected to the server through a Cisco 2950 switch - Fast Ethernet client connections, Gigabit server connection) accessing data on the server would max out the throughput at ~45 MB/s with 100% server CPU utilization. It was noticed that the gigabit card was issuing ~1 CPU interrupt per IP packet sent along the PCI bus. The SysKonnect 9843 supports a feature called "dynamic interrupt moderation" (DIM) which seeks to reduce the server CPU load by grouping interrupts together so that one interrupt can handle several data packets. The open source kernel driver module in the 2.4.18 kernel (sk98lin) supports this feature; however, the module needs to be rebuilt after uncommenting some of the source code (diff of source). There is one tunable parameter, the number of interrupts issued by the SysKonnect card per second. You get fewer interrupts per packet at the expense of increased packet latency (i.e. packet transfer into/out of the card is delayed because the card accumulates packets before it transfers them).
It was found through studies of network throughput (over NFS) on both the ASA and Rackable servers that enabling DIM increases aggregate dd read throughput to simultaneous clients by 5-10%, and decreases CPU utilization and interrupt rate as expected. The increase in throughput was fairly insensitive to the number of interrupts per second specified in the device driver over a range of 15000 to 1000 int/sec; however, throughput decreased if this was made too small (200 int/sec was tested) due to increased packet latency.
There are several ways one can boost NFS performance. A good
start for learning about NFS-related network settings is the
Linux NFS-HOWTO. This section discusses some of the attempts we
made at boosting the NFS read performance measured by the
disk_bench.pl benchmarking script. The test setup was as
above - direct Gigabit connection (SysKonnect) between the ASA
server and a dual-Athlon client.
The first thing we tried was 9000-byte jumbo frames with NFS. The
results are shown below (same vanilla 2.4.3 kernel NFS, except
with jumbo frames):
#threads   1 cont   2 cont
--------   ------   ------
    1        47       44
    2        28       54
    4        19       45
    8        17       37
   16        11       30
   32         8       14
   60         7       11
  100         -       10
Notice that we see a ~30-35% increase in throughput over the standard-frame results for 512 MB file reads over NFS.
At the time of writing, the 2.4 kernel series has client-side NFS
over TCP but only UDP server-side NFS. The UDP protocol has the
advantage of placing no load on the server when the connection
is not active. However, the disadvantage of UDP is that if a packet is
lost during a large read/write, all packets are retransmitted, rather
than just the lost packet as in TCP. Experimental patches to the
2.4.17 kernel provide server-side NFS-over-TCP functionality.
An important feature of NFS version 3 (available in 2.4 kernels)
is support for large NFS transfer buffers (up to 32k). The current
default size in 2.4 kernels is 8k. The same set of patches which
provide NFS over TCP for 2.4.17 also increases the maximum NFS
buffer size to 32k.
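On the client, the larger buffers are requested at mount time via the rsize/wsize mount options, for example (the server name and paths are placeholders):
mount -t nfs -o rsize=32768,wsize=32768 fileserver:/raid /mnt/data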
Another handle on network throughput is the socket input queue
size (also sometimes referred to as the "window size") used by the
kernel to store requests while it processes them. The default in
2.2 and 2.4 kernels is 64k. This queue size can be changed through
the /proc facility by
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/rmem_max
which changes the default and maximum kernel input queue sizes to 256k. The results of trying out the above changes (with the patched 2.4.17 kernel) on NFS performance are shown below:
NFS buf (kB)   IP packet (bytes)   Protocol   Socket window (bytes)   MB/s
------------   -----------------   --------   ---------------------   ----
      8               1500           UDP             default            34
     32               1500           UDP             default            47
      8               9000           UDP             default            47
      8               9000           TCP             default            35
     32               9000           TCP             default            56
     32               9000           TCP              256k              56
     32               9000           TCP               1M               55
     32               9000           TCP               4M               56
Here are the full read-thread scaling results from two controllers for the patched 2.4.17 kernel, 32k NFS buffers, jumbo frames, and UDP:
#threads   MB/s
--------   ----
    1       55
    2       72
    4       63
    8       69
   16       52
   32       33
   60       26
  100       19
In summary, there are configuration handles which can be used to increase NFS performance. We openly admit that we have taken a sort of "phenomenological" approach to these NFS performance studies. Some additional care and intelligence should be put into the optimization process on the final network configuration and system usage to ensure that we have found a robust maximum. Our intent was simply to demonstrate handles for NFS performance enhancement, which I believe has been achieved.
Rootd is a file server program which provides access to root files
over a network. It can be run by the superuser (official TCP port
number 1094) or privately (with a port number larger than 2048). To
benchmark only the rootd connection, we wanted to avoid the
overhead of analyzing the root file structure. Therefore we used
the ReadBuffer() member function of the TNetFile class, which reads
the file contents serially while ignoring all the structure.
For the rootd case, we found that reading a big chunk of data at once
improves the performance because of the inherent limits of the
script interpreter. The maximum read throughput we were able to get
reading a root file over rootd was 51 MB/s for local I/O and 36 MB/s
over gigabit ethernet, basically consistent with the statement on
the Root web page that rootd performance is comparable to NFS.
As previously indicated, maintaining adequate cooling of internal
components is a serious consideration in the evaluation of server
units. This is particularly important for this unit due to its
increased component density compared to the 4U ASA server and the
Polywell server. As with the ASA and Polywell servers, we found the
Rackable server adequate in this regard. Ultimately, we are
interested in the long-term stability of the unit's accurate
serving of data files to clients - this is discussed in the Data
Integrity part of this section.
As with the other servers, we used an IR gun to measure the
temperature profile of the disks after a long period of heavy
activity, as follows. Two separate infinite loops of the disk
benchmarking utility bonnie were run overnight - one on
each Raid 5 array (3ware controller). The tests were stopped, the
machine shut down, and the disks pulled out and scanned with the IR
gun (mostly over the rotation point of each disk) to measure the
temperature profile (all this done as quickly as possible). The
disks varied in temperature from 82 to 85 deg F, similar to the
ASA and Polywell and reasonably well below the point at which one
might become concerned (roughly 90 F as a general rule of
thumb). Of course, long-term data stability tests in a realistic
operating environment are the ultimate test of system cooling.
Here we test the data throughput and integrity for local disk reads, as was done for the ASA and Polywell servers.
The benchmarking script disk_bench.pl was run on the server
to test the local scaling of simultaneous read requests. The tests
were run "as is" in terms of system configuration (i.e. we tested
the system as configured by the vendor). Some characteristics of
the vendor configuration:
-- RedHat 7.2 (2.4.7-10smp kernel)
-- Two separate ext3 volumes (751GB and 902GB) used for
benchmarking, one for each 3ware controller
The disk_bench.pl results are shown below, with ASA and
Polywell server results previously shown included for reference:
           Rackable server     ASA server       Polywell server
#threads   1 cont   2 cont   1 cont   2 cont   1 cont   2 cont   (MB/s)
--------   ------   ------   ------   ------   ------   ------
    1        60       66       83       83       86       90
    2        47       97       57      117       64      128
    4        30       66       54      107       62      119
    8        25       47       47      108       43      109
   16        25       42       48       97       38       82
   32        25       40       47       94       36       72
   60        23       35       46       89       32       69
  100        21       32       40       82       26       66
There are differences in both hardware and OS/software which make
comparison of the Rackable unit with the ASA/Polywell servers
difficult to interpret. These differences can be noted from the
descriptions provided on this web page. They include:
-- MB (SDS2 vs STL2)
-- Kernel (2.4.7 vs 2.4.2/3)
-- Benchmarked filesystem (ext3 vs ext2)
-- IDE disks + 3ware driver (160GB+beta vs 100GB+std)
This last item (disks + driver) is most likely to account for the
bulk of the performance difference. The WD 100GB drives are 7200
RPM with 2 ms average seek time, whereas the 160GB drives are 5400
RPM with 12 ms average seek time. One might expect the difference
in RPMs to dominate for a single thread, while seek time and driver
issues become increasingly important as the number of simultaneous
reads increases. In fact, the Rackable server read throughput is
~28% worse than the ASA server for one read thread, which is
consistent with the 25% lower rotation speed of the 160GB Maxtor
disks. The throughput deficit for the Rackable server becomes more
substantial as the number of threads increases, reaching a factor
of 2-3 at the highest thread counts. The source of this increased
discrepancy is not clear, but may be an issue with the beta driver
or increased disk seeking.
In order to help understand the differences between the Rackable and ASA server performance, Rackable Systems sent us 13 Western Digital 100GB 7200 RPM drives (the same model as in the ASA server). These drives replaced the Maxtor 160GB drives in the Rackable server for testing, and the following results were obtained:
                   Rackable server
           160GB Maxtor          100GB WD               ASA server
#threads   1cont   2cont   1cont(6)   1cont(7)   2cont   1cont   2cont
--------   -----   -----   --------   --------   -----   -----   -----
    1        60      66       58         45      58(6)     83      83
    2        47      97       37         32       88       57     117
    4        30      66       50         49       68       54     107
    8        25      47       39         40       84       47     108
   16        25      42       30         29       77       48      97
   32        25      40       28         28       59       47      94
   60        23      35       26         28       54       46      89
  100        21      32       25         27       52       40      82
where the number in parentheses after "1cont" refers to the number of disks attached to the controller (recall that the Rackable server has 13 disks split across two 3ware controllers). The most evident difference between the 160GB and 100GB drive results is the better scaling with the 100GB drives. The Rackable server throughput for our read tests is still below the ASA server performance. A dump of the Raid array summary from the 3ware 3dm utility is shown for the ASA server and the Rackable server.
Here we investigate the integrity of the data read locally using the md5_checks.pl script (see the previous discussion on this page). With the Rackable server we have our most extensive test yet of data integrity - 15800 iterations of reading a 1 GB file from the IDE RAID arrays without any bit errors (nearly one week of running on both arrays). This corresponds to ~3e14 bits read without error, and tests the quality and stability of the unit's ability to serve data (locally).
The benchmarking script disk_bench.pl was run on fcdfcode1 to test the local scaling of simultaneous read requests. The results are shown below:
#threads   (MB/s)
--------   ------
    1        47
    2        48
    4        56
    8        56
   16        61
   32        60
   60        58
  100        56
Here we investigate the integrity of the data read locally using the md5_checks.pl script (see previous discussions on this page). With the disk array attached to fcdfcode1, we have successfully read an 8GB file more than 1100 times with no data bit errors. This corresponds to a bit error rate (BER) of < 1/8e13.