How Much Performance Do You Really Need? Part 2

In my last post, How Much Performance Do You Really Need? Part 1, we looked at some of the technologies affecting CPU performance. Now we'll look at matching the right features to an application. So what's best for yours?

The short answer that nobody likes? It depends.

If your application isn’t coded to take advantage of multiple CPU cores, paying extra for a quad-core CPU versus a dual-core doesn’t make any sense. If, however, your application can utilize more than two cores, a quad-core with lower clock speed might make more sense than a dual-core at a higher clock speed.

This is a very common question from digital surveillance customers, who think they need a quad-core Ivy Bridge CPU to process multiple video streams. The problem is that there are simply too many variables within each customer's application. If the cameras compress the image data before transmitting it, and the system is simply recording to an HDD without displaying the data, a Celeron-class CPU is most likely perfectly capable.

If, however, the cameras transmit raw data, or the data is compressed but needs to be decompressed and displayed while being recorded, or if other software, such as edge detection or license plate reading, is running at the same time, then an i5 or i7 will probably be necessary. The frame rate and resolution of the cameras also play a large role in determining CPU requirements.

Example 1: Machine Vision/Digital Surveillance

A customer wants to use a NUVO-1300af as part of a manufacturing control system.

One option is to use four Basler acA2500-14gm cameras connected via PoE. These are 5 MP cameras without compression; at 14 fps, each one transmits about 70 MB/s. Testing from our friends at Neousys shows CPU utilization of around 35% on an i5-520M.
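As a sanity check on that figure, here's a back-of-the-envelope bandwidth calculation. The 2592 x 1944 resolution and 8-bit mono pixel format are assumptions based on the camera's published specs:

```python
def raw_bandwidth_mb_s(width, height, bytes_per_pixel, fps):
    """Uncompressed video bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6

# Assumed acA2500-14gm format: 2592 x 1944 mono (1 byte/pixel) at 14 fps
per_camera = raw_bandwidth_mb_s(2592, 1944, 1, 14)
print(f"{per_camera:.1f} MB/s per camera")          # roughly the 70 MB/s cited above
print(f"{per_camera * 4:.1f} MB/s for four cameras")
```

Four such cameras push close to 300 MB/s of raw pixel data into the system, which is why even "just" capturing uncompressed streams consumes meaningful CPU.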

If, instead of the Basler cameras, two HD cameras with compression were used for digital surveillance recording, the data stream would be much lower, and so would the CPU usage. But if those streams need to be displayed rather than simply stored, the same i5-520M would run at 25%-30%. That's quite significant, considering it's half the number of cameras.

Add to this the codec and software variables, and you can see why this isn't such a black-and-white situation. What about more advanced image processing with edge detection or optical character recognition? Will 65% CPU headroom be enough to run proprietary software to detect the presence of a vehicle in a tow zone, or to read license plates and reference a database while a police officer drives around? Hard to say. Our Technical Sales team can help you get a reasonable idea of what you need, but in the end it just needs to be tested. The beauty of a system based on a socketed CPU is that it's relatively easy to upgrade in the future if needed.

CPU versus GPU

Software coded to take advantage of general-purpose GPU (GPGPU) or heterogeneous computing may run just fine on a T56N, or even a T40E, if the GPU requirements are moderate and the CPU requirements are low. This is often true for multimedia and some image processing applications.

Example 2: Digital Signage

We have several digital signage customers whose applications usually feature a scrolling news marquee, a rotating static image, and a video section.

Because most of the software was written several years ago, before HTML5 and Flash supported offloading to the GPU, these applications are CPU-intensive; an Atom-based system is out of the question, and these customers are forced to use more expensive Celeron or Core 2 Duo-based systems for their application.

An alternative approach to paying more per unit for higher CPU performance would be to optimize their software for one of the new AMD APUs. This might require them to rewrite their software with OpenCL and update to a modern browser that supports HTML5 and CSS 3.0 and can offload Flash and multimedia playback to the GPU. I don't know what the up-front cost of this optimization would really be, but since it could save $100-$400 per system versus a Celeron/Core 2 Duo system, I think it's an important option to evaluate. Depending on the number of units deployed, I would think the potential per-unit savings would justify the cost of some CS interns for the summer to explore the amount of work involved.
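To make that trade-off concrete, here's a minimal break-even sketch. The $20,000 engineering cost is a made-up placeholder; the $100-$400 per-unit savings range comes from the comparison above:

```python
import math

def break_even_units(one_time_cost, savings_per_unit):
    """Number of units at which a one-time optimization pays for itself."""
    return math.ceil(one_time_cost / savings_per_unit)

# Hypothetical $20,000 porting effort vs. $100-$400/unit hardware savings
for savings in (100, 250, 400):
    units = break_even_units(20_000, savings)
    print(f"${savings}/unit savings -> break-even at {units} units")
```

At the optimistic end of the savings range, a deployment of only a few dozen units would already cover the porting cost; the math changes quickly with fleet size, which is why it's worth running for your own numbers.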

But, you’ve still got to test it to know for sure.

Let us know in the comments section below how you estimate performance requirements, what unexpected variables have popped up, and what you've done to optimize the entire ecosystem to reduce performance needs and cost.