Nvidia Unveils Ultra-Efficient Kepler GPU

With performance per watt as its new rallying cry, Nvidia puts a fresh spin on its eternal rivalry with AMD for desktop—and especially laptop—graphics supremacy.

March 22, 2012

It's customary for Nvidia and AMD to take turns leapfrogging each other's frame-rate scores with cost-no-object graphics cards for hardcore gamers. Sure enough, Nvidia boasts that its new flagship graphics processing unit, the GeForce GTX 680, blazes through the latest titles faster than AMD's top-of-the-line Radeon HD 7970.

Nvidia also claims that the card delivers four times the DirectX 11 tessellation performance; that its 256-bit link to 6Gbps GDDR5 is the fastest memory interface ever; and that its GPU Boost feature, akin to Intel processors' Turbo Boost, automatically ramps up clock speed from the standard 1GHz to a higher figure for games that have headroom beneath the GPU's power ceiling (even before overclockers get their hands on the card).
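For a rough sense of what that memory interface implies, peak theoretical bandwidth can be estimated from the bus width and the per-pin data rate quoted above. The sketch below is back-of-the-envelope arithmetic rather than an Nvidia figure; the function name is illustrative, and it assumes every pin of the bus sustains the full 6Gbps.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical bandwidth in gigabytes per second.

    Assumes each of the bus's data pins carries data_rate_gbps gigabits
    per second; dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# 256-bit bus at 6Gbps per pin, the figures quoted in the article
print(f"{memory_bandwidth_gb_per_s(256, 6.0):.0f} GB/s")
```

Real-world throughput will be lower than this ceiling, since it ignores protocol overhead and access patterns.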

But when it comes to the new architecture, codenamed "Kepler," beneath the GeForce GTX 680, Nvidia says lightweight laptops and ultrabooks are as important targets as high-end desktops. Indeed, the company claims, the GTX 680 (or GK104, to use its in-house chip name) is not only the fastest but the most efficient GPU ever built, with twice the performance per watt of its predecessor. And the big show at Nvidia's press preview day March 8 in San Francisco wasn't some world premiere from the annual Game Developers Conference across town, but a demo of Epic Games' Samaritan shown at GDC a year earlier. Back then it was running on an SLI rig with three cards drawing 732 watts of power and generating 2,500 BTUs of heat per hour and 51 dBA of noise. This year it required a single GTX 680 drawing 195 watts and generating 660 BTUs per hour and 46 dBA.
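The heat figures track the power draw closely, since nearly all the electricity a graphics card consumes ends up as heat. A quick check, assuming the BTU numbers are per-hour rates and using the standard conversion of about 3.412 BTU per hour per watt (the function and constant names are illustrative):

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert electrical power draw to heat output, assuming all power becomes heat."""
    return watts * BTU_PER_HOUR_PER_WATT


# Power figures quoted for the two Samaritan demo setups
for label, watts in [("three-card SLI rig", 732), ("single GTX 680", 195)]:
    print(f"{label}: {watts} W is about {watts_to_btu_per_hour(watts):.0f} BTU/h")

print(f"power reduction: {732 / 195:.2f}x")
```

The results land close to the rounded 2,500 and 660 figures quoted at the demo.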

On the desktop, Nvidia says, the GeForce GTX 680 is so efficient that it requires just two 6-pin power connectors instead of the one 6- and one 8-pin connector of today's top-end cards. On the laptop, Kepler's doubled efficiency applies to several new mobile GPUs—the GeForce GT 640M and 650M and gaming-laptop GeForce GTX 660M—as well.

The new architecture is making its briefcase debut in the GeForce GT 640M found in Acer's Aspire Timeline Ultra M3-581TG (price TBA), a 4.4-pound slimline with a 15.6-inch screen, onboard optical drive, Intel Core i7-2637M processor, and 256GB solid-state drive. At 20mm thick, the Timeline Ultra meets Intel's ultrabook specifications—and at 30-plus frames per second, it delivers playable gaming on its 1,366-by-768 display with titles that stymie other ultrabooks' integrated graphics. Nvidia anticipates a crop of design wins for laptops using Intel's forthcoming "Ivy Bridge" processors and aiming to combine performance, portability, and battery life.

Under the Hood
The building block or streaming multiprocessor (SM) of the GeForce GTX 580's "Fermi" architecture combined control logic with 32 processing cores (CUDA cores, in Nvidia's terminology). By contrast, Kepler's design revolves around what's dubbed an SMX, with control logic plus 192 cores, more than enough to offset the power-saving move of running the cores at 1x graphics clock instead of 2x. The GeForce GTX 680 features eight SMX blocks (1,536 cores) running at 1,006MHz; 128 texture units; eight geometry units; four raster units; and 32 ROP units. Each SMX's four warp schedulers have been redesigned with a focus on power efficiency, with a simple software pre-decode replacing complex hardware dependency checks. A new design for the processor execution core also focused on performance per watt, eliminating the shader clock (introduced in the "Tesla" architecture as an area optimization, less of an issue for the 28-nanometer-process Kepler).
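The trade-off between more cores and a slower core clock can be checked with simple arithmetic. The sketch below is an estimate, not a benchmark. It assumes each core retires one fused multiply-add (two floating-point operations) per clock. The GTX 580's 16 SMs and 772MHz graphics clock come from Nvidia's published specifications rather than this article, and the function name is illustrative.

```python
def peak_gflops(sm_count: int, cores_per_sm: int, core_clock_mhz: float) -> float:
    """Theoretical single-precision peak, assuming 2 FLOPs per core per clock."""
    total_cores = sm_count * cores_per_sm
    return total_cores * core_clock_mhz * 2 / 1000


# Fermi (GTX 580): 16 SMs x 32 cores, cores clocked at 2x the 772MHz graphics clock
# (SM count and clock are assumptions taken from published specs, not the article)
fermi = peak_gflops(sm_count=16, cores_per_sm=32, core_clock_mhz=2 * 772)

# Kepler (GTX 680): 8 SMXs x 192 cores, cores clocked at 1x the 1,006MHz graphics clock
kepler = peak_gflops(sm_count=8, cores_per_sm=192, core_clock_mhz=1006)

print(f"GTX 680 cores: {8 * 192}")
print(f"GTX 580 peak: {fermi:.0f} GFLOPS")
print(f"GTX 680 peak: {kepler:.0f} GFLOPS ({kepler / fermi:.2f}x)")
```

Under these assumptions, tripling the core count more than makes up for the lower core clock, which is the point Nvidia is making about the SMX design.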

Kepler also introduces some image-quality improvements. Bindless textures (currently supported only in OpenGL, not DirectX) mean that the shader can reference a texture directly in memory, choosing from over a million unique textures for a scene rather than being restricted to just 128 slots in a binding table. Adaptive VSync technology dynamically turns vertical sync on and off to display frames at a more regular cadence, reducing stuttering while preventing tearing. TXAA is a new film-style antialiasing technique (an alternative to Nvidia's own FXAA, itself a higher-performance alternative to familiar multisample antialiasing or MSAA) that uses a high-quality resolve filter to smooth jagged lines and edges; Nvidia claims it offers the smoothness of 8x MSAA with the performance hit of 2x MSAA.
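The article does not spell out the rule Adaptive VSync uses to switch. A minimal sketch, assuming the driver keeps VSync on whenever the game renders at least as fast as the display refreshes and turns it off otherwise, and assuming a simplified double-buffered model for conventional VSync (all function names here are hypothetical):

```python
import math


def vsync_fps(render_fps: float, refresh_hz: float = 60.0) -> float:
    """Displayed rate with conventional double-buffered VSync.

    Each frame waits for the next refresh boundary, so the displayed rate
    snaps down to refresh_hz divided by a whole number.
    """
    return refresh_hz / math.ceil(refresh_hz / render_fps)


def adaptive_vsync_fps(render_fps: float, refresh_hz: float = 60.0) -> float:
    """Displayed rate under the assumed adaptive rule."""
    if render_fps >= refresh_hz:
        return refresh_hz  # VSync on: capped at the refresh rate, no tearing
    return render_fps      # VSync off: no snap down to a lower step


for fps in (75, 50, 35):
    print(f"render {fps} fps: vsync {vsync_fps(fps):.0f}, adaptive {adaptive_vsync_fps(fps):.0f}")
```

In this model, a game rendering at 50 frames per second drops to 30 with conventional VSync on a 60Hz display, which is the kind of sudden step that shows up as stutter.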

The GTX 680 also supports single-GPU 3D Vision Surround, which extends Nvidia's 3D Vision stereoscopic viewing across three monitors, as well as 4K resolution (3,840 by 2,160 at 60Hz). The company's reference-design card features two DVI ports, one HDMI port, and one DisplayPort 1.2 interface for four simultaneous displays.

On the mobile side, the portable Keplers support Nvidia's battery-saving Optimus technology that automatically switches the GPU on and off—flipping from discrete to integrated graphics, for example, when the user is doing office productivity work, then firing up the GPU for visually demanding applications such as games. The GeForce GT 640M and 650M are positioned as performance parts for mid- to high-end laptops, while the GeForce GTX 660M is supposed to enable a new class of thinner and lighter gaming laptops.

Indeed, you could even say that Kepler—though clearly designed to excel in desktop performance—marks a sea change in PC graphics, from bigger-louder-hotter, power-at-any-price designs to saner, quieter, and cooler products. Performance per watt is the new performance.