I thought I would give this subject its own topic in the forums instead of continuing the discussion in the article's comments section. The following quote is from Winter in the comments thread on the front page:
How is it that every other game company has no problems making the NVidia hardware work right? It sounds like Carmack had no problems getting relatively equal performance out of the ATI and NVIdia cards
Well, the short answer is that every game up until Half-Life 2 has been either OpenGL or, at most, DX8 compliant. The same is true of Carmack and Doom 3: he is using OpenGL. Since OpenGL 2.0 has not been officially released yet, Carmack is doing all of his shader coding in assembly. The whole point of DX9 is to give programmers a unified, higher-level API for writing shader code. Beyond that, DirectX 9 allows more instructions in a shader program and specifies higher color precision.
Of course, Valve did mention that they coded an Nvidia-specific path for the game. According to Valve, this mixed-mode path uses 16-bit precision wherever possible without sacrificing image quality, but they also say this specialized codepath took five times as long to develop as the general DX9 codepath.
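To get a feel for why dropping to 16 bits can often go unnoticed, here is a rough Python sketch that rounds values in [0, 1] to the mantissa width of each shader format and compares the worst rounding error with one step of an 8-bit-per-channel color buffer (1/255). The bit counts are my assumptions, not something Valve stated: 10 fraction bits for FP16, 16 for ATI's FP24 (the s16e7 layout commonly reported for the R300), and 23 for FP32. The helper `round_to_bits` is hypothetical and ignores exponent range, denormals, and Nvidia's fixed-point mode, so treat it as an illustration rather than a model of either card.

```python
import math

# Fraction (mantissa) bits per format. Assumed values: FP16 and FP32 follow IEEE 754;
# FP24 uses the s16e7 layout commonly reported for ATI's R300.
FORMATS = {"fp16": 10, "fp24": 16, "fp32": 23}

# One step of an 8-bit-per-channel color buffer.
COLOR_STEP = 1.0 / 255.0


def round_to_bits(x: float, frac_bits: int) -> float:
    """Round x to the nearest value with `frac_bits` fraction bits (ties to even).

    Only mantissa rounding is modeled; exponent range limits and denormals are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (frac_bits + 1)  # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def worst_error(frac_bits: int, samples: int = 10_000) -> float:
    """Largest absolute rounding error over evenly spaced values in [0, 1]."""
    return max(
        abs(round_to_bits(i / samples, frac_bits) - i / samples)
        for i in range(samples + 1)
    )


if __name__ == "__main__":
    for name, bits in FORMATS.items():
        err = worst_error(bits)
        print(f"{name}: worst error {err:.2e}, "
              f"{err / COLOR_STEP:.4f} of one 8-bit color step")
```

For color-range values, FP16 rounding stays well under one 8-bit color step, which fits Valve's claim that much of the work can run at 16 bits. The catch is that someone has to decide, shader by shader, where that holds, which is consistent with the extra development time Valve reports.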
Let's not forget that this isn't the first sign of poor DX9 performance from Nvidia's latest $400 mega-card. There was, of course, the whole 3DMark 2003 fiasco, as well as John Carmack's plan file, which said the following:
The NV30 runs the ARB2 path MUCH slower than the NV30 path.
Half the speed at the moment. This is unfortunate, because when you do an
exact, apples-to-apples comparison using exactly the same API, the R300 looks
twice as fast, but when you use the vendor-specific paths, the NV30 wins.
The reason for this is that ATI does everything at high precision all the
time, while Nvidia internally supports three different precisions with
different performances. To make it even more complicated, the exact
precision that ATI uses is in between the floating point precisions offered by
Nvidia, so when Nvidia runs fragment programs, they are at a higher precision
than ATI's, which is some justification for the slower speed. Nvidia assures
me that there is a lot of room for improving the fragment program performance
with improved driver compiler technology.
This, of course, explains why the mixed-mode path in HL2 makes the Nvidia card perform better. Later in his plan file, Carmack says this:
For developers doing forward looking work, there is a different tradeoff --
the NV30 runs fragment programs much slower, but it has a huge maximum
instruction count. I have bumped into program limits on the R300 already.
This also indicates a general ATI lead in shader program execution. With three different sources (Valve, 3DMark 2003, and Carmack) telling us that ATI's cards will run general shader code faster than Nvidia's latest, I suspect Nvidia may be the party at fault here, not Valve.
Further, let's follow the logic. Would it really make sense for Valve to alienate the huge installed base of Nvidia card owners if they didn't think this was a major issue? No, it wouldn't. The thinking here is that if users are going to upgrade to a new DX9 part for HL2, Valve wants to let them know which card is going to be the best buy.