May 20th 2005 10:37 pm PT
(Part 2 of 4)
DETAILED ANALYSIS OF PERFORMANCE SPECIFICATIONS
The Xbox 360 processor was designed to give game developers the power that they actually need, in an easy-to-use form. The Cell processor has impressive streaming floating-point power, but that power is of limited use for games.
The majority of game code is a mixture of integer, floating-point, and vector math, with lots of branches and random memory accesses. This code is best handled by a general purpose CPU with a cache, branch predictor, and vector unit.
The Cell’s seven DSPs (what Sony calls SPEs) have no cache, no direct access to memory, no branch predictor, and a different instruction set from the PS3’s main CPU. They are not designed for or efficient at general purpose computing. DSPs are not appropriate for game programming.
Xbox 360 has three general purpose CPU cores. The Cell processor has only one.
Xbox 360 has vector processing power on each CPU core. Each Xbox 360 core has 128 vector registers per hardware thread and a dot product instruction, and the cores share a 1-MB L2 cache. The Cell processor’s vector processing power is mostly on the seven DSPs.
Dot products are critical to games because they are used in 3D math to calculate vector lengths, projections, transformations, and more. The Xbox 360 CPU has a dot product instruction, whereas other CPUs such as Cell must emulate a dot product using multiple instructions.
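To make the point about dot products concrete, here is a minimal sketch in Python of how a single dot product routine underlies the lengths, projections, and transformations mentioned above. The function names (dot, length, project, transform) are made up for illustration and do not come from any console SDK; the sketch shows the math only and says nothing about how either CPU executes it.

```python
import math

def dot(a, b):
    """Dot product: multiply matching components and sum the results."""
    return sum(x * y for x, y in zip(a, b))

def length(v):
    """Vector length is the square root of a vector dotted with itself."""
    return math.sqrt(dot(v, v))

def project(a, b):
    """Projection of a onto b, built from two dot products."""
    scale = dot(a, b) / dot(b, b)
    return tuple(scale * x for x in b)

def transform(matrix, v):
    """Matrix-vector transform: one dot product per matrix row."""
    return tuple(dot(row, v) for row in matrix)

if __name__ == "__main__":
    print(length((3.0, 4.0, 0.0)))
    print(project((2.0, 3.0, 0.0), (1.0, 0.0, 0.0)))
```

Every helper here reduces to calls to dot, which is why a one-instruction dot product matters when this kind of math runs many times per frame.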
Cell’s streaming floating-point work is done on its seven DSP processors. Since geometry processing is moved to the GPU, the need for streaming floating-point work and other DSP style programming in games has dropped dramatically.
Just like with the PS2’s Emotion Engine, with its missing L2 cache, the Cell is designed for a type of game programming that accounts for a minor percentage of processing time.
Sony’s CPU is ideal for an environment where 12.5% of the work is general-purpose computing and 87.5% of the work is DSP calculations. That sort of mix makes sense for video playback or networked waveform analysis, but not for games. In fact, when analyzing real games one finds almost the opposite distribution of general-purpose computing and DSP calculation requirements. A relatively small percentage of instructions are actually floating point. Of those instructions that are floating point, very few involve processing continuous streams of numbers. Instead they are used in tasks like AI and path-finding, which require random access to memory and frequent branches, both of which the DSPs are ill-suited to.
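The 12.5% and 87.5% figures follow from the core counts given earlier in this post: one general purpose core and seven DSPs, eight cores in all. A quick sketch of that arithmetic, with constant names chosen here purely for illustration:

```python
# Core counts as stated in this post: one general purpose CPU core
# plus seven DSPs (SPEs) on the Cell processor.
GENERAL_PURPOSE_CORES = 1
DSP_CORES = 7

def share_percent(count, total):
    """Return count as a percentage of total."""
    return count / total * 100

total_cores = GENERAL_PURPOSE_CORES + DSP_CORES
general_share = share_percent(GENERAL_PURPOSE_CORES, total_cores)
dsp_share = share_percent(DSP_CORES, total_cores)

if __name__ == "__main__":
    print(f"general purpose: {general_share}%")
    print(f"DSP: {dsp_share}%")
```

Note that this counts cores only. It treats every core as an equal share of the chip, which is a simplification and not a measurement of actual throughput.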
Based on measurements of running next generation games, only ~10-30% of the instructions executed are floating point. The remainder of the instructions are load, store, integer, branch, etc. Even fewer of the instructions executed are streaming floating point, probably ~5-10%. Cell is optimized for streaming floating-point, with 87.5% of its cores good for streaming floating-point and nothing else.
I have closed comments on this series of posts, except part 4, in order to keep the discussion in one area. You can comment here.