It's that time of year again. A time I always dread, when familiar hardware and disk images are replaced with the frightening introduction of a completely new beast. New motherboards, new CPUs, new hard drives, and new and scary drivers. I never know what's going to happen. Earlier this year we butted heads with Skulltrail and eventually lost, going back to what we were using before. Hopefully this time around will be a bit different.
In any case, while I'm in the middle of the changeover, I figured I would write a little something about our graphics test beds, what we look for in one, and how we set them up. It's always controversial and debated in many of our articles, so maybe it will make for some good discussion (or flame wars) here.
First off, in doing graphics tests for the purpose of comparing graphics hardware, we always use the highest end desktop system we can build. By using the fastest processors and memory, we eliminate bottlenecks in the rest of the system and reveal the maximum potential of any given video card. Looking at relative performance in this light gives us better and more reliable information on which card is capable of higher performance. Adding in artificial performance limiters like lower end CPUs and RAM compresses our data and makes it more difficult to see which graphics solution is more desirable.
Even if the CPU in my home system is something low end, I'm still going to want to install the best graphics card I can afford: when choosing between brands and manufacturers, that means the performance leader at my target price in the games I like. There are a lot of reasons for this, but a couple stand out to me. With higher graphics performance I should see less choppiness and higher minimum frame rates even if my CPU limits the average frame rate. I should also have more headroom for higher visual quality settings, and the higher performance part (even when CPU limited today) should be more capable of playing near-term future games that are more graphics limited than CPU limited, even on a lower end CPU.
This is absolutely not to say that CPU and RAM aren't important considerations. There is definitely a place for tests that look at the performance of games on certain combinations of CPU and GPU hardware. But that is not something for a graphics hardware review.
Currently, what we do with independent CPU and GPU testing allows people to see where the limits would cross. Imagine I test a bunch of CPUs on the absolute highest end graphics card and see a range between 40 and 60 frames per second for MadeUpTestGame. Then, imagine I take a bunch of GPUs and test them on the absolute highest end CPU and see a performance range between 20 and 60 frames per second with the same MadeUpTestGame benchmark. If I know what CPU and GPU I have, I can tell what frame rate I should expect as my absolute maximum potential performance: the lower of two scores, the one for my CPU tested with a high end GPU and the one for my GPU tested with a high end CPU.
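As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and the sample numbers are made up for the example (just like MadeUpTestGame), and it assumes the slower component simply caps the frame rate, which real games only approximate.

```python
def expected_max_fps(cpu_limited_fps: float, gpu_limited_fps: float) -> float:
    """Upper bound on frame rate for a given CPU + GPU pairing.

    cpu_limited_fps: score of my CPU when tested with the fastest GPU.
    gpu_limited_fps: score of my GPU when tested with the fastest CPU.
    Assumption: whichever component is slower caps the result.
    """
    return min(cpu_limited_fps, gpu_limited_fps)


# Hypothetical MadeUpTestGame numbers, not real benchmark results.
print(expected_max_fps(45.0, 30.0))  # GPU-bound pairing
print(expected_max_fps(45.0, 60.0))  # CPU-bound pairing
```

Treat the result as a ceiling rather than a prediction: the real pairing can land below it, but it shouldn't land above it.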
Now, I might be able to get more accurate information if I actually tested every combination of CPU and GPU, but that's a little out of the scope of a simple GPU launch article. If I only test with a lower end CPU, I will see a lot of the performance numbers get compressed and I will have a harder time extracting information that is useful for comparison purposes. If I test with a high-end CPU, someone with a lower end CPU can find performance information for that CPU and decide if the graphics cards will be overkill or will be a good fit. But that's a different issue than assessing the relative performance of graphics hardware.
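To put a number on what "compressed" means here, the following Python sketch clips a set of GPU scores at a CPU-imposed cap and compares the spread before and after. All of the scores and the 40 fps cap are invented for illustration, and the hard cap is a simplifying assumption about how a slow CPU behaves.

```python
def spread(scores):
    """Ratio of fastest to slowest score; 1.0 means the cards look identical."""
    return max(scores) / min(scores)


def cap_by_cpu(gpu_scores, cpu_cap_fps):
    """Assumption: a slower CPU clips every card's score at cpu_cap_fps."""
    return [min(score, cpu_cap_fps) for score in gpu_scores]


# Hypothetical GPU results measured on a top-end CPU (made-up numbers).
gpu_scores = [20.0, 35.0, 50.0, 60.0]
capped = cap_by_cpu(gpu_scores, cpu_cap_fps=40.0)

print(capped)              # the two fastest cards now tie at the cap
print(spread(gpu_scores))  # 3.0
print(spread(capped))      # 2.0
```

With the cap in place, the two fastest cards are indistinguishable, which is exactly the information a graphics comparison is supposed to surface.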
So there's that. But what about building the test bed?
Switching hardware and software platforms can often lead to dealing with a lot of new problems. With the old hardware I've been testing on, I know what to expect: which problems are system issues and which are probably product issues. Even if my system isn't as reliable as I would like it to be, knowing what the issues are really helps in dealing with testing issues. So the first problem I run into is that I don't know what can and will go wrong. This makes troubleshooting take a bit longer than it should, but it's got to be done eventually.
Choosing components is simple: find the fastest thing we've got and shove it in a system. In this current case, that means I'm changing over to an as-yet unreleased motherboard and CPU, which makes the potential for problems even larger. The RAM and drive we will be using for graphics going forward are things we've already tested though: high performance OCZ DDR3 and an Intel SSD. Yes, the limited size of the Intel SSD will make it tough to get a lot of games on there, but the increase in boot speed and responsiveness of the system goes a long way to making testing easier and better, and it should also minimize the impact of random hits to the disk while benchmarking.
As for setting up the system, after we install the 64-bit version of Vista (I really wish there were some other platform on which to game), we set about disabling all sorts of things to get the computer to a state that will allow for consistent testing. Turning features off isn't really so much about gaining performance as it is about ensuring consistency. With the number of things happening in the background with Vista, we see more fluctuations in benchmark performance from run to run. To get a fair comparison without having to run everything 10 times and average performance, we perform the following steps.
First off, we turn off and disable the sidebar. Next, we open the Security Center and disable automatic updating and Security Center alerts. Then we disable User Account Control.
After a quick reboot (and disabling the welcome screen), we head to advanced system settings and disable system protection (System Restore) and remote assistance. While there, we adjust performance settings (in the Advanced tab) to best performance and we set the virtual memory page file to a fixed size (custom size with initial == maximum) of 1.5x the amount of RAM in the system (though this time, with the limited size of the SSD and the vast amount of RAM in the system, our page file is set to RAM + 512MB).
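For anyone who wants to check how consistent their own setup is, here is a small Python sketch that summarizes repeated runs of one benchmark as a mean and a percentage variation. The helper name and both sets of frame rates are invented for the example; they are not measurements from our test bed.

```python
from statistics import mean, stdev


def run_to_run_variation(fps_runs):
    """Return (mean, coefficient of variation in percent) for repeated runs."""
    avg = mean(fps_runs)
    cv_percent = 100.0 * stdev(fps_runs) / avg
    return avg, cv_percent


# Made-up results from repeating one benchmark; not measured data.
noisy = [41.6, 42.8, 40.9, 43.5, 41.2]
quiet = [42.0, 42.1, 41.9, 42.0, 42.0]

for label, runs in (("background tasks on", noisy), ("background tasks off", quiet)):
    avg, cv = run_to_run_variation(runs)
    print(f"{label}: mean {avg:.1f} fps, variation {cv:.2f}%")
```

The lower the variation, the fewer repeat runs are needed before two cards can be compared with any confidence.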
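The page file sizing rule is simple enough to write down. This Python sketch just encodes the two cases described above; the function name, the `small_disk` flag, and the 4096MB example are assumptions for illustration, since the actual amount of RAM in the new system isn't stated here.

```python
def page_file_mb(ram_mb: int, small_disk: bool = False) -> int:
    """Fixed page file size in MB (initial == maximum).

    Default rule from the text: 1.5x installed RAM.
    small_disk=True uses the RAM + 512MB exception described for the SSD.
    """
    if small_disk:
        return ram_mb + 512
    return ram_mb * 3 // 2


# 4096MB of RAM is a hypothetical figure, not the test bed's configuration.
print(page_file_mb(4096))                   # 6144
print(page_file_mb(4096, small_disk=True))  # 4608
```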
Once done with that, we reboot and begin disabling the search indexing (by deselecting the folders that are indexed) and the screen saver, moving on from there to power settings. We select High Performance mode and further adjust these to not turn off the hard drive for 40 minutes and to turn off the display after 2 hours. I also like my start menu power button to turn the computer off rather than make it sleep, but that's personal preference.
At this point, any service packs are installed, then chipset drivers, then graphics drivers, then any other system drivers that are needed. After the billion reboots there and removing any backup files left from the service pack install (if we aren't using a slipstreamed disc), we get back to the process at hand: un-Vistaing Vista.
In no particular order, moving files to the recycle bin on delete is disabled, scheduled defragmentation is disabled, the desktop resolution is set to the max, and folder options are changed to show all hidden files. We even prevent the notification area from hiding unused icons and disable the start menu highlighting of new programs. Then it's on to a couple services we disable as well. SuperFetch and ReadyBoost are both disabled, SuperFetch because app launch times don't matter and we use multiple runs to get tests loaded into memory, and ReadyBoost because we are using an SSD and don't need it.
We used to also disable audio, but there are some games that don't run without audio support. Enabling and disabling audio is more trouble than it's worth. In games that have the ability to disable sound during testing, we do so, but if there is no option we do nothing.
Our desktop features shortcuts to batch files that delete the contents of the prefetch directory and run ProcessIdleTasks. However, with an SSD it isn't really necessary or desirable to run ProcessIdleTasks, because one of the idle tasks is defrag (which you don't want to run on an SSD anyway).
So that's about it as far as system setup goes. Well, after installing games and all that good stuff anyway. Right now we are also looking at updating our game suite. On the short list are: Far Cry 2, Crysis Warhead, Fallout 3, S.T.A.L.K.E.R. Clear Sky, Call of Duty World at War, and Brothers in Arms Hell's Highway. While I'm not sure if we will actually be able to incorporate all these games into our next round of graphics card testing, the first games we drop will be ones that are superseded by these new ones: Fallout 3 will replace Oblivion and Crysis Warhead will replace Crysis.
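For the curious, here is a rough Python equivalent of what such helpers might do. This is a sketch and not our actual batch files: the prefetch path assumes a default Vista install, the `rundll32.exe advapi32.dll,ProcessIdleTasks` invocation is the commonly documented way to trigger idle tasks, and the function names are made up for the example.

```python
import subprocess
import sys
from pathlib import Path

# Default Vista prefetch location; an assumption about a standard install.
PREFETCH_DIR = Path(r"C:\Windows\Prefetch")

# Commonly documented way to trigger idle tasks; not necessarily the exact
# command used in the batch files described above.
IDLE_TASKS_CMD = ["rundll32.exe", "advapi32.dll,ProcessIdleTasks"]


def clear_directory(directory: Path) -> int:
    """Delete every regular file directly inside directory; return the count."""
    removed = 0
    for entry in directory.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed


def prepare_for_benchmark(run_idle_tasks: bool = False) -> None:
    """Clear prefetch data and optionally process idle tasks (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("This sketch only makes sense on Windows.")
    clear_directory(PREFETCH_DIR)  # needs administrator rights
    if run_idle_tasks:  # leave False on an SSD, per the text
        subprocess.run(IDLE_TASKS_CMD, check=True)
```

Keeping the idle-task step behind a flag that defaults to off mirrors the SSD caveat: the prefetch cleanup still happens, but nothing kicks off a defrag.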
I'd love to be able to test 20 games for every graphics hardware review, but it's just not possible to do that kind of testing under normal circumstances. We will do our best to evaluate games and pick the ones that make the most sense going forward.
Oh, and I can't wait until I can talk more about what is actually in this new graphics test bed. It's pretty freaking sweet :-)
33 Comments
Stolf2012 - Thursday, October 30, 2008 - link
I agree with chizow, I would love to see you guys follow through with your Vista sp1 vs xp sp3. A couple comparative screen shots showing the difference between dx9 and 10 would be great to include, for users to decide whether the performance hit with dx10 is worth it.
Looking forward to it,
Bill.
ebayne - Thursday, October 30, 2008 - link
AoC is a pig. It brings systems to their tiny little knees. I think it would be a worthwhile addition to the test suite..its the equivalent to Crysis '05.
aguilpa1 - Thursday, October 30, 2008 - link
Is there anywhere we could download the recorded game scripts to your new games so we could test our own systems against your testbed for comparison?
Zar0n - Thursday, October 30, 2008 - link
The problem is not the performance but consistency.
Take this article for ex:
http://www.behardware.com/articles/731-1/ssd-produ...
Several sites are reporting that the Intel SSD differs in performance depending on the type of data, and also takes some time to get stable performance after some types of write operations.
So use a fast hard drive like a WD raptor or a more "reliable" SSD like a samsung with SLC Nand.
lyeoh - Wednesday, October 29, 2008 - link
(sorry for dupes if any, my post doesn't seem to be showing up)
I personally regard 3D Mark as a meaningless test and a waste of time except to overclockers who mainly use their computers to run 3D Mark, superpi, etc.
The time it takes to run a 3D Mark test might as well be used to run a benchmark of a real application/game.
Something like a flight simulator benchmark would definitely be more meaningful than 3D Mark. I believe there have only been a very few flight sim games released in the past few years, that could be a plus or minus depending on how you view it. I personally don't care :).
In fact 2D performance tests might be more useful to me - some 3D cards don't have as good 2D performance as others.
What I find annoying with some other benchmark sites is they only test resolutions like 2560 x 1600. Yes that's useful to test the really high end, but many people are still using 1280x1024 and 1680x1050. That's one of the reasons why I prefer Anandtech :).
Buying a bigger display is a lot of money- the display costs more and you need to spend tons more on graphic cards just to drive that display at a decent frame rate.
Regarding minimum frame rates - if frame rate graphs are not possible, posting minimum and maximum frame rates would be good (averaged over X seconds minimum).
Then there's SLI. I've heard that for some SLI stuff, the interframe delay going from card #1 to card #2 could be different from card #2 to card #1. Say the average frame rate is 60 fps. So on average there's 16ms between frames. However in theory card #2 could be producing a frame 2 milliseconds after card #1, and then nothing happens for 30ms. So the actual perceived display is not quite as smooth as the numbers might have you believe - it might appear closer to 30fps, or "jittery".
Last but not least, if possible try to measure _latency_ as well. e.g. measure the time it takes for mouse button down and/or key down to the action being displayed on the screen. A video card or video driver that produces higher frame rates but adds a lag of 50 milliseconds will be bad for most games where frames per second count. Testing latency should make for an interesting article. If you find that in general the latency is insignificant - say lower than 10ms, you can leave it out of the standard benchmarks and only do latency comparison tests for things like some fancy new tech wireless mouse.
blw37 - Wednesday, October 29, 2008 - link
The problem I have with testing low and mid-end cards on a high end system is it provides no information about the tradeoffs that you are actually making on the type of system that uses such cards. Say I have an E6550 and 8600GT at the moment and have a limited budget for upgrading. I want to know if upgrading the GPU or CPU provides the best payoffs. There is no point testing a GTX280 or 4870X2 on a low to mid-end system, but equally there is no point testing a 4350 with a QX9700.
KeithP - Tuesday, October 28, 2008 - link
I understand why a high end system is being used to test video cards, but as someone looking for information on what to buy in the mid range to low end class I find it extremely frustrating.
AT really needs two test beds, a lower spec system and the high end system. Seeing relative differences in the high spec system is useless because it still doesn't tell me if the frame rates I will be seeing will be playable.
I would think something in the 2-2.2 GHz range with 2GB of RAM and Win XP would be good.
-KeithP
crimson117 - Tuesday, October 28, 2008 - link
He explained that already...
He can't possibly try every CPU with the new GPU, so he asks the question "When it's not limited by CPU, how well can this new graphics card perform?"
It'd be pretty boring to see games top out at 1024x768 with nVidia 280 SLI just because a slow CPU couldn't keep up.
cool - Tuesday, October 28, 2008 - link
"un-Vistaing Vista"?
Just install XP32/64 and you can save yourself all the trouble. Who plays/cares about DX10 games anyway?
JimmiG - Tuesday, October 28, 2008 - link
All that talk about un-Vistaing Vista only applies if you want to run benchmarks and want consistent results. I find that most of the background programs and services in Vista are actually useful for me on a day to day basis. If you're really worried about getting 41.6 FPS one time and 42.8 FPS the next, if you were to play through the exact same level in the same game in exactly the same way twice, then by all means disable all the auto tuning features so the OS slows down over time and disable indexing to make file searches take 10 minutes instead of 2 seconds and turn off firewall, defender, updates etc. so your system is overrun by viruses and malware the minute you go online.
Back when my machine featured a K6-2 CPU, 6GB of harddrive space and 128MB of RAM, I felt I had to do everything in my power to make it run faster. Now I've got terabytes of storage space and gigabytes of RAM - I no longer feel the need to strip my gaming system to the bare minimum just to save 15MB of RAM, free up 200MB of harddrive space and in the end gain 0.5 FPS, losing in the process much of the functionality that sets a modern PC apart from a Win98 machine.
So yes, I can understand why Vista would be frustrating if you run a hardware site and want to run benchmarks.. but those same features make it easier, smoother and faster to use for regular PC users on a daily basis. Instead of using the computing resources to produce even higher FPS numbers and 3DMarks, you're using them to make the PC experience more enjoyable. Maybe Microsoft is working on Windows 7 - Benchmark Edition for those who don't want the background services and auto tuning :)