Saturday, 10 April 2010

Advantages

AMD demonstrated a system [14] with two dual-core AMD Opteron processors and two Radeon R600 GPU cores, running Microsoft Windows XP Professional, that reached 1 teraflop (TFLOP) on a universal multiply-add (MADD) calculation. By comparison, an Intel Core 2 Quad processor at 3.0 GHz can achieve up to 48 GFLOPS.
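
To put the two peak figures side by side, the ratio can be worked out directly. The short Python sketch below uses only the numbers quoted above; the constant and function names are illustrative, and figures of this kind are theoretical peaks rather than measured application performance.

```python
# Peak figures quoted in the text (theoretical maxima, not measured results).
GPU_SYSTEM_FLOPS = 1e12   # 1 TFLOP: two Opterons plus two R600 GPU cores
CPU_FLOPS = 48e9          # 48 GFLOPS: Intel Core 2 Quad at 3.0 GHz


def speedup(accelerated: float, baseline: float) -> float:
    """Return how many times larger the accelerated peak is than the baseline."""
    return accelerated / baseline


ratio = speedup(GPU_SYSTEM_FLOPS, CPU_FLOPS)
print(f"Peak ratio: {ratio:.1f}x")  # prints "Peak ratio: 20.8x"
```

On these numbers the demonstrated system has roughly 21 times the peak throughput of the quad-core CPU.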

In recent demonstrations of Kaspersky SafeStream anti-virus scanning optimized for AMD stream processors, a system with two AMD stream processors and dual Opteron processors reached a bandwidth of 6.2 Gbit/s (775 MB/s), 21 times faster than other dual-processor systems. The stream processor system also showed only 1-2% CPU utilization, which indicates that most of the work was offloaded from the CPU to the stream processors [15].
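
The bandwidth figure can be checked with a little unit arithmetic. The sketch below converts 6.2 Gbit/s to bytes per second and derives the throughput that the "21 times" claim implies for the comparison systems. The helper names are illustrative, and the baseline is only what the quoted numbers imply, not a figure reported in the source.

```python
# Figures quoted in the text.
THROUGHPUT_GBIT_S = 6.2   # scanning bandwidth of the stream-processor system
SPEEDUP = 21              # "21 times faster" than the comparison systems


def gbit_to_mbyte(gbit_per_s: float) -> float:
    """Convert Gbit/s (10**9 bits) to MB/s (10**6 bytes)."""
    return gbit_per_s * 1e9 / 8 / 1e6


def gbit_to_mibyte(gbit_per_s: float) -> float:
    """Convert Gbit/s (10**9 bits) to MiB/s (2**20 bytes)."""
    return gbit_per_s * 1e9 / 8 / 2**20


mb_s = gbit_to_mbyte(THROUGHPUT_GBIT_S)
mib_s = gbit_to_mibyte(THROUGHPUT_GBIT_S)
baseline_mb_s = mb_s / SPEEDUP  # implied throughput of the comparison systems

print(f"{mb_s:.0f} MB/s, {mib_s:.0f} MiB/s, baseline about {baseline_mb_s:.0f} MB/s")
```

The value 775 corresponds to decimal megabytes per second; in binary mebibytes the same rate is about 739 MiB/s.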

Limitations

* Compared with traditional floating-point accelerators such as ClearSpeed's CSX600 math processor, which supports 64-bit floating point (FP64) and is used in today's supercomputers, current and older GPUs from ATI (and NVIDIA) are 32-bit processors with only single-precision capability.[16]
o The second generation of stream processors (the AMD FireStream 9170) can handle double-precision data, although not with the full 64-bit double-precision capability of supercomputers [17]. This is a result of the FP32 filtering support required by the DirectX 10.1 API. However, double-precision operations (frequently used in supercomputer benchmarks) can in theory achieve only half the performance of single-precision operations, and the actual figures may be lower, because the GPU does not implement full double-precision units.
* Recursive functions are not supported in Brook+ because all function calls are inlined at compile time. With CAL, function calls (recursive or otherwise) are supported to a depth of 32 levels.[18]
* Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported at this time.
* There are various deviations from the IEEE 754 standard: denormal numbers and signaling NaNs are not supported, the rounding mode cannot be changed, and division and square root are slightly less precise than full single precision.
* Functions cannot take a variable number of arguments. Recursive functions are restricted in a similar way, as noted above.
* GPUs convert floating-point numbers to integers differently from x86 CPUs; the conversion is not fully IEEE 754 compliant.
* "Global synchronization" on the GPU is not very efficient, which forces the kernel to be divided into parts with the synchronization done on the CPU. Given the variable number of multiprocessors and other factors, there may not be a perfect solution to this problem.
* The bus bandwidth and latency between the CPU and the GPU may become a bottleneck. This may be alleviated in the future by interconnects with higher bandwidth.
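
The precision limitations in the list above can be illustrated on an ordinary CPU. The following Python sketch is not code for the FireStream hardware, Brook+, or CAL; it only models single-precision rounding with the standard `struct` module. The helper names `to_f32` and `flush_denormal_f32` are hypothetical. Flushing denormals to zero is assumed here as one common behavior of hardware without denormal support, since the text says only that denormals are not supported.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_denormal_f32(x: float) -> float:
    """Model hardware without denormal support (assumed flush-to-zero behavior)."""
    flt_min = 2.0 ** -126  # smallest normal single-precision magnitude
    y = to_f32(x)
    return 0.0 if y != 0.0 and abs(y) < flt_min else y


# 1. Single precision has a 24-bit significand, so 2**24 + 1 is not representable.
big = float(2 ** 24)
print(to_f32(big + 1.0) == big)   # True: the +1 is lost in single precision
print(big + 1.0 == big)           # False: double precision keeps it

# 2. Accumulating 0.1 drifts much further in single than in double precision.
acc32, acc64 = 0.0, 0.0
for _ in range(100_000):
    acc32 = to_f32(acc32 + to_f32(0.1))  # every intermediate rounded to float32
    acc64 += 0.1
err32 = abs(acc32 - 10_000.0)
err64 = abs(acc64 - 10_000.0)
print(err32, err64)

# 3. A denormal survives IEEE 754 rounding but vanishes under flush-to-zero.
tiny = 1e-40
print(to_f32(tiny) > 0.0, flush_denormal_f32(tiny) == 0.0)
```

The accumulation case shows why double-precision support matters for long-running numerical work: the single-precision error grows large enough to be visible in the result, while the double-precision error stays negligible.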
