From ktpedre at sandia.gov Tue Feb 5 15:29:25 2008
From: ktpedre at sandia.gov (Kevin Pedretti)
Date: Tue Feb 5 15:27:50 2008
Subject: [Mantevo-users] Re: hpccg scaling on barcelona
In-Reply-To: <49701FE4-6756-4670-9BD1-0EC1B7BD6E0C@sandia.gov>
References: <1198114935.19448.124.camel@hawkeye.sandia.gov>
	<49701FE4-6756-4670-9BD1-0EC1B7BD6E0C@sandia.gov>
Message-ID: <1202250565.27561.404.camel@hawkeye.sandia.gov>

The attached spreadsheet contains updated Barcelona HPCCG scaling
results. The falloff at 8 cores for the 100x100x100 case that I
originally observed goes away when using memory affinity, as Doug also
found.

By default, Linux tries to allocate *free* memory on the local socket
and falls back to a different socket if that fails. Memory used by the
Linux file cache is not considered free memory. The 'numactl --hardware'
command can be used to find out how much memory is available on each
socket (node = socket on AMD platforms):

[ktpedre@barcelona HPCCG-0.3]$ numactl --hardware
available: 2 nodes (0-1)
node 0 size: 4095 MB
node 0 free: 1564 MB
node 1 size: 4096 MB
node 1 free: 3867 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

It's easy to reproduce the poor HPCCG scaling by using 'dd' to
artificially inflate the file cache and then running HPCCG without
memory affinity.

[ktpedre@barcelona HPCCG-0.3]$ dd if=/dev/zero of=disk3.img count=1024 bs=1024k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 7.48887 s, 143 MB/s
[ktpedre@barcelona HPCCG-0.3]$ numactl --hardware
available: 2 nodes (0-1)
node 0 size: 4095 MB
node 0 free: 578 MB
node 1 size: 4096 MB
node 1 free: 3781 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Kevin

On Thu, 2007-12-20 at 17:53 -0700, Douglas Doerfler wrote:
> Kevin,
>
> I've been able to verify your hypothesis that the hpccg results for
> the 100^3 problem are due to memory affinity issues. Brian showed me
> how to control processor and memory affinity using numactl and
> OpenMPI.
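[For archive readers: the kind of per-rank affinity wrapper Doug refers to
can be sketched as below. This is a hypothetical reconstruction, not the
actual /opt/ompi-numactl.sh script, which was never posted to the list. It
assumes Open MPI exports the node-local rank in OMPI_COMM_WORLD_LOCAL_RANK
(the exact variable name varies across Open MPI versions) and a dual-socket
node with four cores per socket, as on barcelona.]

```shell
# Hypothetical sketch of a numactl wrapper for Open MPI ranks.
# Assumption: OMPI_COMM_WORLD_LOCAL_RANK holds this rank's index on the
# node; the binary name below is also an assumption.
numa_wrap() {
    rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
    node=$((rank / 4))    # ranks 0-3 -> socket 0, ranks 4-7 -> socket 1
    # Print the command instead of exec'ing it, so this sketch is a dry
    # run; a real wrapper would 'exec' the numactl line.
    echo "numactl --cpunodebind=$node --membind=$node $*"
}

# Example: what the wrapper would run for node-local rank 5:
OMPI_COMM_WORLD_LOCAL_RANK=5
numa_wrap ./test_HPCCG 100 100 100
# -> numactl --cpunodebind=1 --membind=1 ./test_HPCCG 100 100 100
```

[A real wrapper would be launched as something like
`mpirun -np 8 ./numa-wrap.sh ./test_HPCCG 100 100 100`; with `--membind`
in effect, each rank's memory stays on its own socket regardless of how
much of the other socket's memory the file cache is holding.]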
> I think you'll find this very useful. I put a copy of the script I
> used at /opt/ompi-numactl.sh on barcelona (look for usage in the
> script). Once I was able to control affinity, here are my results
> (these are for 75^3; I wanted to be consistent with my other platform
> tests for comparison purposes):
>
> #cores   Total(MF)   DDOT(MF)   WAXPBY(MF)   SPARSEMV(MF)
>      1       450.7      324.8        284.7         497.3
>      2       900.7      741.2        644.6         961.6
>      4      1557.5     1587.6       1051.8        1649.4
>      8      1986.1     2252.6       1267.9        2114.9
>
> Speedup:
>
> #cores   Total   DDOT   WAXPBY   SPARSEMV
>      1     1.0    1.0      1.0        1.0
>      2     2.0    2.3      2.3        1.9
>      4     3.5    4.9      3.7        3.3
>      8     4.4    6.9      4.5        4.3
>
> I'm also getting somewhat better absolute performance than what you
> show below. I'm compiling with pgi mpiCC -fastsse.
>
> Doug
>
> On Dec 19, 2007, at 6:42 PM, Kevin Pedretti wrote:
>
> > Here are some preliminary scaling numbers for HPCCG on the new
> > dual-socket quad-core AMD Barcelona system. "Same socket" means all
> > ranks ran on the same quad-core; "diff sockets" means the ranks
> > were split evenly across the two sockets. Each socket has its own
> > memory controller, so more sockets = more bandwidth.
> >
> > 75x75x75
> > #cores   same socket   diff sockets
> >      1           398            N/A
> >      2           767            865
> >      4           997           1492
> >      8           N/A           1953
> >
> > 100x100x100
> > #cores   same socket   diff sockets
> >      1           486            N/A
> >      2           838            946
> >      4          1023           1632
> >      8           N/A           1226
> >
> > The 100x100x100 8-core results are odd. The node has 8 GB of
> > memory and the test only uses ~4 GB, so there should be no
> > swapping, and there doesn't appear to be any disk I/O while the
> > job is running. I think this is a funny Linux memory allocation
> > issue where the ranks running on socket 0 end up using memory on
> > socket 1 because Linux has a bunch of memory on socket 0 tied up
> > with the file system cache.
> > I'm sure there's a way to tell Linux to shrink its caches, but I
> > don't know what it is. "numactl --hardware" while the node is idle
> > supports this theory:
> >
> > [ktpedre@barcelona HPCCG-0.3]$ numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 size: 4095 MB
> > node 0 free: 826 MB
> > node 1 size: 4096 MB
> > node 1 free: 3141 MB
> > node distances:
> > node   0   1
> >   0:  10  20
> >   1:  20  10
> >
> > Kevin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: results.xls
Type: application/vnd.ms-excel
Size: 25088 bytes
Desc: not available
Url : http://software.sandia.gov/pipermail/mantevo-users/attachments/20080205/03b36e65/results-0001.xls
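[A postscript for archive readers: the cache-shrinking knob Kevin says he
couldn't find does exist on 2.6.16 and later kernels, namely
/proc/sys/vm/drop_caches. The sketch below is illustrative (the function
name and messages are invented for this note); writing 1 drops clean
page-cache pages, and a sync beforehand writes dirty pages back so that
more of the cache is droppable.]

```shell
# Illustrative sketch: shrink the Linux page cache before a timing run so
# that first-touch allocations can land on the local socket.
# /proc/sys/vm/drop_caches is real (kernels >= 2.6.16, root-writable);
# the function name and output strings are invented for this sketch.
drop_page_cache() {
    if [ -w /proc/sys/vm/drop_caches ]; then
        sync                                # write dirty pages back first
        echo 1 > /proc/sys/vm/drop_caches   # 1 = page cache; 3 also drops
                                            # dentries and inodes
        echo "dropped"
    else
        echo "skipped (needs root on a 2.6.16+ kernel)"
    fi
}

drop_page_cache
```

[After a successful drop, `numactl --hardware` should report `free` values
close to `size` on both nodes, so the 8-core 100x100x100 run no longer
spills onto the remote socket even without explicit memory binding.]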