General Information
The hardware acceleration server (HSS) will run on any Nvidia G8X-based board. This includes boards in the GeForce 8800 line, the Quadro FX line, and the new Tesla hardware.
The maximum simulation size is limited by the host system's DRAM size, not the card's memory size.
Performance for large simulations depends primarily on how much memory the device has. Peak performance occurs when the entire simulation domain fits on the board. Simulations that don’t fit in device memory are partitioned and time-sliced onto the board.
We expect performance to scale well to multiple cards. As a result, to get large simulations to run fast, we recommend a single large-memory machine with multiple boards rather than multiple machines each with a single board.
If you can wait, we recommend the Tesla hardware, which has more onboard memory (1.5GB per board vs. 768MB for the GeForce 8800 and Quadro boards). The Tesla compute card (C870) and deskside system (D870) are available now; the Tesla 1U compute server (S870) is still in testing and will be available in December 2007.
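As a rough illustration of the partitioning described above, the smallest possible number of time slices is the domain size divided by the board’s memory, rounded up. This is a sketch under assumptions: the helper name `min_partitions` is hypothetical, and it ignores any per-partition overhead (boundary data, working buffers) HSS may need, so the real partition count could be higher.

```python
import math


def min_partitions(domain_mb: float, device_mb: float) -> int:
    """Lower bound on the number of time-sliced partitions.

    Hypothetical helper: assumes each partition may use all of the
    board's memory, with no extra overhead per partition.
    """
    if domain_mb <= 0 or device_mb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(domain_mb / device_mb)


# 768 MB: GeForce 8800 GTX/Ultra; 1536 MB: Tesla (1.5GB per board)
for domain_mb in (663.5, 2654.2):
    for device_mb in (768, 1536):
        print(f"{domain_mb:7.1f} MB on a {device_mb:4d} MB board -> "
              f"{min_partitions(domain_mb, device_mb)} partition(s)")
```

For the 2654.2 MB test case under What We’ve Tested, this gives 4 partitions on a 768MB board, which matches the observed split, and 2 on a 1.5GB Tesla board.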
Power Requirements
Each graphics card requires on the order of 175W of peak power, so you will want a large power supply: a 600W supply for a single-card system, or an 800W supply for a two-card system. The cards plug into PCI Express x16 slots, so ideally you will want a motherboard with multiple PCIe x16 slots.
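The two recommendations above are consistent with a simple rule of thumb: roughly 425W for the rest of the system plus 175W per card, rounded up to the next 100W. This rule is our own reconstruction from the 600W and 800W figures, not a vendor specification; the 425W base load and the function name `recommended_psu_watts` are assumptions, so check your motherboard and card vendors’ requirements before buying.

```python
import math

CARD_PEAK_W = 175    # peak draw per graphics card, from the text above
BASE_SYSTEM_W = 425  # assumed draw for CPU, disks, fans, etc. (hypothetical)


def recommended_psu_watts(num_cards: int) -> int:
    """Hypothetical sizing rule: base load plus per-card peak, rounded up to 100W."""
    if num_cards < 0:
        raise ValueError("num_cards must be non-negative")
    total = BASE_SYSTEM_W + CARD_PEAK_W * num_cards
    return math.ceil(total / 100) * 100


for n in (1, 2):
    print(f"{n} card(s): {recommended_psu_watts(n)}W supply")
```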
What’s Available Now
GeForce 8800 GTX and Ultra boards
Although the code will run on any of the 8x00-based boards, we recommend either the 8800 GTX or the 8800 Ultra, which both have 768MB and are significantly faster than the 8700 series and lower. The 8800 Ultra boards are the same as the GTX boards but are clocked about 10% faster and cost about $170 more than the GTX.
8800 GTX boards: http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=2000380000+106791921+1067924921&name=GeForce+8800GTX
8800 Ultra boards: http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=2000380000+106791921+1067928976&name=GeForce+8800Ultra
Quadro FX Boards
HSS will run on the Quadro FX boards that support DirectX 10 or later (http://www.nvidia.com/object/IO_11761.html). As with the GeForce cards, you’ll want the FX4600 or the FX5600 (not available yet) to get at least 768MB of device memory.
We haven’t tested these boards yet, but we expect them to perform comparably to the equivalent GeForce models.
Tesla C870
The C870 is a card that plugs into a free PCI Express x16 slot and draws about 200W at peak load. If you have an existing host computer, make sure it has a free PCI Express x16 slot for each card and that its power supply can deliver 200W for each card. The power supply can likely handle one card but is unlikely to handle two, unless the system is specifically rated for it (for example, some systems are advertised as "dual SLI card capable").
Not only must the power supply be sufficiently large, but the cooling system (e.g. case fans) must be powerful enough to keep the system at a reasonable temperature.
If you are concerned about the power supply or cooling capabilities of your current system, consider the D870.
Tesla D870
The D870 is essentially two C870 cards in a box that sits outside the host computer (the C870 cards go inside the computer).
D870 vs 2x C870:
The host system in both cases is a regular PC and should be a fast machine with plenty of memory, e.g. a 2.5GHz processor and 16GB of memory. For the D870, the host system must have one free PCI Express x16 slot; for 2x C870, it must have two. Each GPU requires 200W of power, and with 2x C870 it is unlikely that the host’s existing power supply and cooling system can handle the extra load. The D870 draws only 10W from the host system. So the main advantage of the D870 over 2x C870 is its low power requirement on the host system.
The main disadvantage of the D870 is price: it costs 3 to 4 times as much as two C870 cards. However, that cost is small compared to the cost of HSS.
Note: an HSS license is required for each GPU, so a D870 requires two HSS licenses.
What We’ve Tested
eVGA 8800 GTX
- http://www.newegg.com/Product/Product.aspx?Item=N82E16814130072: costs approximately $530, runs at 575MHz, and has 768MB of GDDR3 RAM.
- The AltPSM_Contacts sample with pitch set to 0.6 produces a 480x480x80 domain (64 domain cells + 16 PML cells in Z), which equates to a 663.5 MB domain. This domain fits entirely on the device and runs at 5.25 seconds/cycle, 13.8X faster than the 72.4 seconds/cycle it takes on the host system’s AMD Opteron 280.
- With the pitch set to 1.2, the domain is 960x960x80 cells, which equates to a 2654.2 MB domain. Here, the domain is partitioned into 4 parts that are swapped onto the single card. Although the domain is only 4x bigger, execution is 8x slower, at about 42.17 seconds/cycle, because of the swapping overhead. This is still about 6.9X faster than the 289.7 seconds/cycle of the host system.
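The figures in these measurements can be cross-checked with a little arithmetic. Both reported domain sizes correspond to 36 bytes per cell (taking 1 MB as 10^6 bytes); that per-cell figure is inferred from the numbers above and is an assumption, not a documented property of HSS. The sketch below recomputes the domain sizes, the slowdown caused by partitioning, and the speedups over the host.

```python
BYTES_PER_CELL = 36  # inferred: 663.5 MB / (480*480*80 cells); not documented


def domain_mb(nx: int, ny: int, nz: int) -> float:
    """Domain size in MB (10**6 bytes), assuming BYTES_PER_CELL per cell."""
    return nx * ny * nz * BYTES_PER_CELL / 1e6


small = domain_mb(480, 480, 64 + 16)  # 64 domain cells + 16 PML cells in Z
large = domain_mb(960, 960, 80)

# Seconds per cycle, as reported for the eVGA 8800 GTX and the Opteron 280 host
gpu_small, host_small = 5.25, 72.4
gpu_large, host_large = 42.17, 289.7

print(f"small domain: {small:.1f} MB, large domain: {large:.1f} MB")
print(f"size ratio: {large / small:.1f}x, GPU time ratio: {gpu_large / gpu_small:.1f}x")
print(f"speedup, fits on board: {host_small / gpu_small:.1f}x")
print(f"speedup, 4 partitions:  {host_large / gpu_large:.1f}x")
```

Running it shows a 4.0x size ratio against an 8.0x GPU time ratio, which is the swapping overhead described above, and speedups of 13.8x and 6.9x.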
2x C870 inside dual Opteron 280 box with 16GB DDR 400 memory
- About the same speed as the eVGA 8800 GTX, but with more memory (1.5GB per card), so larger simulations can run entirely in card memory.