Benchmark outcome 4

Over the last two years there have been plenty of headlines about manufacturers cheating on benchmarks. The accusation is that, by detecting that a benchmarking app is running and ramping up to full performance, manufacturers are somehow gaming the system.

However, unless a manufacturer actually causes the benchmark app to output a result that isn’t what was genuinely measured, is it really cheating?

Are there rules somewhere that manufacturers are actually breaching, or have they simply thought outside the box to achieve the highest possible score on the benchmark?

What’s happening and why do people think they’re cheating?

A benchmarking program runs tests against the hardware and, importantly for this discussion, against a standard. In the case of mobile phones, that standard is essentially the theoretical maximum capability of the hardware under test: the throughput of the CPU and the GPU as they process instructions, perform calculations and render images.
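
To make that concrete, here's a minimal sketch (in Kotlin) of what a CPU benchmark boils down to: run a fixed workload, time it, and convert the elapsed time into a score. The workload and scoring formula here are invented for illustration and are nothing like the varied sub-tests real suites run.

```kotlin
// Minimal sketch of a CPU benchmark: time a fixed workload and report a
// score. Illustrative only; real suites run many varied sub-tests.
fun main() {
    val iterations = 50_000_000
    var checksum = 0L

    val start = System.nanoTime()
    for (i in 1..iterations) {
        // Simple integer work that depends on previous results, so the
        // compiler can't trivially eliminate the loop.
        checksum += (i.toLong() * 31) xor (checksum shr 3)
    }
    val elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000.0

    // Higher score = more operations completed per second.
    println("Checksum: $checksum") // printed so the work isn't optimised away
    println("Score: %.0f ops/sec".format(iterations / elapsedSeconds))
}
```

Run it twice on the same hardware and the score should be roughly stable; run it on faster silicon and it climbs. That relative comparison is all a benchmark really offers.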

What an increasing number (probably now the majority) of manufacturers are doing is, through their software, recognising that benchmarking software is running, treating that as a "need" for improved performance, and maxing out all of their performance hardware ready for the test.
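
As a purely hypothetical sketch of that mechanism, vendor software could match the foreground app against a list of known benchmark package names and lift its usual performance caps. The package names and the setPerformanceMode() hook below are assumptions for illustration, not any real vendor's code or API.

```kotlin
// Hypothetical illustration of benchmark detection; not real vendor code.
// Assumed package names; real lists and hooks live deep in vendor firmware.
val knownBenchmarks = setOf(
    "com.antutu.ABenchMark",
    "com.primatelabs.geekbench6",
    "com.futuremark.dmandroid.application"
)

fun onForegroundAppChanged(packageName: String) {
    // Benchmark spotted: remove the usual clock caps. Anything else: restore them.
    setPerformanceMode(maxClocks = packageName in knownBenchmarks)
}

// Stand-in for a vendor power-management hook (an assumption for this sketch).
fun setPerformanceMode(maxClocks: Boolean) {
    println(if (maxClocks) "Caps lifted: full clocks" else "Normal limits applied")
}

fun main() {
    onForegroundAppChanged("com.antutu.ABenchMark") // boost engaged
    onForegroundAppChanged("com.example.browser")   // back to normal limits
}
```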

The simple explanation is that many recognise this behaviour as cheating because the performance you see during benchmarks isn't reflective of the usual performance, or "user experience", of daily use.

If the hardware ran at its full capability all the time, you'd have deplorable battery life and probably sustain some fairly significant burns from regular use of your device. This is why devices are throttled back to a level where the user experience isn't affected, but heat is reduced and battery is saved.
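
To picture that trade-off, throttling can be thought of as stepping the clock down as the device heats up. The temperature thresholds and clock speeds below are invented for illustration; real thermal engines are far more granular and consider battery, skin temperature and more.

```kotlin
// Toy thermal throttle: choose a CPU clock for a given SoC temperature.
// All thresholds and clocks are invented for illustration.
fun targetClockMhz(socTempC: Double): Int = when {
    socTempC < 40.0 -> 2800 // cool: full boost clock available
    socTempC < 45.0 -> 2400 // warming: shave the peak
    socTempC < 50.0 -> 1900 // hot: protect battery and skin temperature
    else            -> 1400 // very hot: heavy throttle
}

fun main() {
    for (temp in listOf(35.0, 42.0, 47.0, 53.0)) {
        println("SoC at $temp°C -> ${targetClockMhz(temp)} MHz")
    }
}
```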

Asking manufacturers not to run their hardware flat out during benchmarking would be like asking a car manufacturer to use only gears 1–3 on a test track, then asking them for the fastest possible lap time. They'll give you the best result possible under the circumstances, but it won't be indicative of what the car can actually do without artificial restriction.

Benchmarks have been irrelevant for years

This is far from new behaviour; in fact, it's been going on for years. There was discussion around the same topic back in 2013, when Samsung were first caught out, shortly after which we wrote Benchmarks are meaningless when manufacturers game the system.

There have been multiple other articles from Ausdroid and other publications since then. Generally speaking, the theme is that increasing performance for benchmarks is misleading at best, definitely deceptive, and probably qualifies as cheating.

Huawei has spoken out directly about the issue, essentially saying "everyone else is doing it, it's not cheating and we'd lose out if we didn't". The Register has a great write-up of this statement from the manufacturer.

Mobile phones have come a very long way from the old candy bar phones, through the first generation of smartphones, to the flagships we see today. The mobile phone in your pocket has more computing power than the entire network that sent Apollo 13 into space and (somewhat more importantly) got its crew home safely. The performance personal mobile devices are now capable of is more than a person can realistically use at any given moment in normal operation.

Many readers will also have noticed that the vast majority of tech sites have long since stopped relying on benchmarks, since they're not reflective of user experience. Some still run them, but they rarely feature more than a fleeting mention in reviews. There is still some minor value in them: they can show whether the same CPU in two different phones delivers the same throughput, but that's about the limit.

It’s not cheating, it’s benchmarking!

With benchmarks growing ever less relevant to reviews, benchmarking has steadily become more about measuring potential performance peaks than "expected performance" or "normal user experience", which these days is perfectly acceptable for the vast majority of users, even on mid-range devices. So why are people still calling it cheating?

Honestly, I don't see why people are so upset. If I were benchmarking my PC and software was throttling its performance, I'd be annoyed, because benchmarking is about getting the maximum score possible.

You can't (legally, at least) use the full performance of your car on a public road; the manufacturer will do it at a closed, controlled test track so that they, and you, know its potential. Power output and often the resulting 0–100 km/h time will be listed, and those are measured in perfect performance conditions.

So why the uproar?

What the manufacturers are doing is removing the artificial (software-based) limitations they place on their GPU and CPU, allowing them to run at full capacity for the period the test is run, much the same way many of them increase performance when you're running a game on your phone. It's demand and supply: you (or your software) need more performance, and the phone supplies it.
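
That demand-and-supply behaviour can be pictured as a toy frequency governor: supply more clock speed when measured load is high, and drop back when the device is idle. The tuning below is invented for illustration, loosely inspired by how CPU frequency governors behave rather than any real implementation.

```kotlin
// Toy "demand and supply" frequency governor. Illustrative numbers only.
class ToyGovernor(private val minMhz: Int = 600, private val maxMhz: Int = 2800) {
    var currentMhz = minMhz
        private set

    fun onLoadSample(loadPercent: Int) {
        currentMhz = when {
            loadPercent > 80 -> maxMhz                // game or benchmark: supply everything
            loadPercent > 40 -> (minMhz + maxMhz) / 2 // moderate use
            else             -> minMhz                // idle: save battery
        }
    }
}

fun main() {
    val governor = ToyGovernor()
    for (load in listOf(5, 60, 95, 10)) {
        governor.onLoadSample(load)
        println("load $load% -> ${governor.currentMhz} MHz")
    }
}
```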

When you're benchmarking electronic hardware, whether phones, tablets, speakers or otherwise, you want to see its peak capability, whether you'll ever use it or not. Let the manufacturers optimise performance for these apps; it sets a line in the sand for everyone else to aspire to. So please stop calling it cheating: it's a way for manufacturers to show you the potential peak performance of their new devices and grab some bragging rights.

Benchmarks haven't been relevant to the mobile market for years, with most phones running relatively smoothly on a day-to-day basis, so why the big deal over benchmark optimisations?

    5 Comments
    Gordon

    I think you're forgetting what the purpose of benchmarks is… they are an attempt to measure *relative* performance given the same set of tasks. When a manufacturer "games" the benchmark to produce a result which is not indicative of the experience a user can expect, then they are effectively *lying* to us. Which is NOT okay, no matter how many manufacturers do it. Whenever any manufacturer is caught doing this, they should be named and shamed – zero tolerance. Do many manufacturers do this? Probably. Does that mean benchmark results shouldn't be trusted? Sadly, yes. Does this mean we just accept… Read more »

    Dennis Bareis

    Rubbish, it's cheating if it doesn't reflect normal usage; otherwise the benchmark result is useless. If they made all apps perform quickly by default, and showed you what to do for apps that heat up the device, then that wouldn't be cheating.

    Max Luong

    I think the perfect solution would be to try to run real-world tests alongside battery life results. Performance is fine, but it’s only half of the equation.

    Daniel Narbett

    Yeah, but if you can ONLY get that performance when the benchmark software is running and NOT have it available at times of peak/high-need actual use, then it's definitely cheating. Otherwise it's like saying Volkswagen were fine with their emissions defeat devices because, after all, the emissions DID in fact drop during compliance tests.

    Oldmike

    I look at benchmarks, but take them with a grain of salt. I'd bet if you took the same model and specced devices from 100 owners and ran the same benchmark, you would get different scores on nearly every one. I am happy to be proven wrong, but that is my belief. In the past I have watched a benchmark done on a particular device, the same model and type as my own, then run the very same benchmark and got a different score. The conclusion I came to is these are not 100… Read more »