Posted on 13 September 2016
Did you know that the processing power of the Samsung Galaxy S6 is greater than that of a desktop PC or games console that was still being made right up until 2013?
It's no secret that over the past 50 years, microprocessor technology has improved in leaps and bounds - but did you ever wonder exactly how different early processors were to those that we take for granted today?
Take an everyday item that we all use today, such as a mobile phone; to be precise, a Samsung Galaxy S6. The processing power of an S6 is approximately equivalent to that of five Sony PlayStation 2 consoles.
Figure 1 image taken from http://pages.experts-exchange.com/processing-power-compared/
It is quite unbelievable to think that something as small as a mobile phone could be 5 times more powerful than a top games console.
Development is moving at such a speed that few of us could have imagined making the above comparison after such a short space of time.
This is possible because continuous enhancements in semiconductor manufacturing processes enable chip manufacturers to shrink transistors and squeeze more of them into every square inch than ever before.
Each shrink brings the ability to add more transistors per die, and/or to bring the power consumption of a processor down further and further.
Gordon Moore, a co-founder of the well-known processor and technology company Intel, predicted that the number of transistors that can be packed into a square inch would double at a regular interval. This prediction, originally stated as a doubling every year and later revised to every two years (often quoted as every 18 months), has proved accurate for several decades and has been used to help guide long-term planning and set targets for research and development within the semiconductor industry.
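To get a feel for what a fixed doubling period means in practice, the prediction can be sketched as a simple exponential. This is only an illustration: the function name and the starting figure of one million transistors per square inch are hypothetical, chosen for round numbers rather than taken from any real process.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count forward, assuming it doubles every
    `doubling_period_years` years (the two-year form of the prediction)."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting density, used purely for illustration.
start = 1_000_000

for years in (2, 10, 20):
    count = projected_transistors(start, years)
    print(f"After {years:2d} years: {count:,.0f} transistors per square inch")
```

Ten doublings over twenty years multiply the starting figure by 1,024, which is why even a modest-sounding doubling period compounds so dramatically.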
This increase in transistor density also brings important advances in power consumption and performance per watt. As well as making CPUs more powerful, it is just as important to make them more power efficient, especially when portability and battery power are essential to the product, as is the case with tablets and mobile phones.
For example, the PlayStation 2 used up to 46 Watts, whereas the Galaxy S6, although around 5 times faster, uses approximately 6.8 Watts of power.
| | Power Consumption (Watts) | Performance Per Watt (GFLOPS/Watt) |
| Samsung Galaxy S6 | 6.8 | |
| Sony PlayStation 2 | 46 | |
That makes the Galaxy S6 a staggering 39 times more power efficient than a PS2.
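As a rough cross-check, relative efficiency can be estimated as the speed-up multiplied by the ratio of the two power figures. The sketch below uses only the rounded numbers quoted in the text (about 5 times the performance, 46 Watts versus roughly 6.8 Watts); the function name is illustrative. With these rounded inputs the estimate lands in the low to mid thirties, the same ballpark as the 39 times figure, which presumably reflects less heavily rounded benchmark numbers.

```python
def relative_efficiency(speedup, old_watts, new_watts):
    """Performance-per-watt of the new device relative to the old one.

    speedup   -- how many times faster the new device is
    old_watts -- power draw of the old device
    new_watts -- power draw of the new device
    """
    return speedup * (old_watts / new_watts)


# Rounded figures from the article: ~5x faster, 46 W (PS2) vs ~6.8 W (S6).
ratio = relative_efficiency(speedup=5.0, old_watts=46.0, new_watts=6.8)
print(f"Estimated efficiency gain: {ratio:.1f}x")
```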
In the server market, performance and efficiency are equally important.
The challenges of scientific applications demand the ultimate level of performance available, to analyse and simulate scenarios with greater and greater accuracy as technology allows. Simultaneously and conversely, there is a limit to how much electricity can be expended in the operation and cooling of the systems that such activities require.
It is essential for both industry and academia to be both environmentally conscious and economical with the purchase and running costs of the computing which they undertake.
In this market, the Intel Xeon processor is generally considered the go-to device for general computing of all kinds, and has been for some time. It has evolved to become the backbone of IT in fields like datacentre, HPC and even embedded computing; so let's compare two different generations of the same product to see how they differ.
| | Base Frequency | Turbo Frequency (max) |
| Xeon E5-2699 v4 | 2.2 GHz | 3.6 GHz |
| Xeon E5-2690 v1 | 2.9 GHz | 3.8 GHz |
In the above table, we can see the two clock frequencies, base and turbo. These refer to the number of compute cycles per second which the processor can run through. The base is the typical operating frequency of the processor and is the one which the processor is generally advertised as having.
The turbo frequency is a mode that the CPU can use when it needs to deliver maximum performance to demanding tasks; but this can only be used for certain periods of time and when the CPU temperature allows for it. If the temperature rises too high during this mode, then the CPU will start to slow itself down until it returns to a lower temperature, at which point turbo mode can be used again. In most cases, turbo will work well with a small number of active cores, but as the number of active cores increases turbo potential diminishes. As a result, turbo works best for applications with fewer threads.
The above comparison shows that the Xeon E5-2690 v1 has both a faster base clock and a faster turbo frequency. However, this does not mean that it is the more powerful CPU, as there are factors to take into consideration other than raw cycles per second.
The processor core count and the number of instructions per clock are very important and also need to be taken into consideration as we will explain.
The more cores and threads a CPU has, the more separate calculations or threads can be run simultaneously, which typically results in better performance. Assuming your application can use multiple threads with 100% efficiency, a 2 core 1GHz processor is, in principle, as good as a single core 2GHz processor.
2 Cores x 1GHz = 2GHz or 1 Core x 2GHz = 2GHz - Simple!
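The sum above relies on the 100% efficiency assumption. A minimal sketch of what happens when that assumption is relaxed is given below, using Amdahl's law, a standard model that is not part of the original comparison and is brought in here purely for illustration. The function names and the 90% parallel fraction are hypothetical.

```python
def aggregate_ghz(cores, clock_ghz):
    """Idealised combined clock rate, assuming perfect use of every core."""
    return cores * clock_ghz


def amdahl_speedup(cores, parallel_fraction):
    """Speed-up over one core when only `parallel_fraction` of the work
    (0.0 to 1.0) can be spread across cores (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# The idealised case from the text: both configurations total 2 GHz.
print(aggregate_ghz(2, 1.0), aggregate_ghz(1, 2.0))

# With perfectly parallel code, 2 cores give exactly 2x.
print(f"{amdahl_speedup(2, 1.0):.2f}x")

# If only 90% of the work is parallel (hypothetical figure), the gain shrinks.
print(f"{amdahl_speedup(2, 0.9):.2f}x on 2 cores")
print(f"{amdahl_speedup(22, 0.9):.2f}x on 22 cores")
```

The point of the sketch is that core count only translates directly into performance for software that scales well, which is why the right processor depends on the application.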
Processor threads are data paths to each CPU core. Using multiple threads per core allows multiple jobs to be run on a single core at the same time by optimising utilisation. If one job is waiting for memory access, for example, the second job uses the CPU cycles which would otherwise be wasted.
Additional processor features and instructions are important too. For example, the later v4 series processor includes AVX2 (Advanced Vector Extensions 2). AVX instructions improve an application's performance by processing large chunks of values at the same time instead of processing the values individually.
The result is that, when running AVX2 optimised code, each core of the v4 processor can handle double the number of instructions per cycle / clock of its predecessor.
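One way to see where the doubling comes from is to count floating point operations per clock directly. The sketch below assumes double precision (64-bit) values, 256-bit vector registers on both generations, and the commonly cited execution-unit layouts: one add unit plus one multiply unit on the v1 generation, and two fused multiply-add (FMA) units, each counting as two operations per value, on the v4 generation. None of these microarchitectural details are stated in the article, so treat them as assumptions; the function name is illustrative.

```python
def flops_per_cycle(vector_bits, element_bits, units, ops_per_instruction):
    """Peak floating point operations per clock for one core.

    vector_bits         -- width of a vector register
    element_bits        -- width of each value (64 for double precision)
    units               -- vector execution units that can issue each cycle
    ops_per_instruction -- operations per value per instruction
                           (1 for add or multiply, 2 for fused multiply-add)
    """
    lanes = vector_bits // element_bits  # values processed side by side
    return lanes * units * ops_per_instruction


# Assumed v1 generation: 256-bit AVX, one add unit + one multiply unit.
v1_ops = flops_per_cycle(256, 64, units=2, ops_per_instruction=1)

# Assumed v4 generation: 256-bit AVX2 with two FMA units.
v4_ops = flops_per_cycle(256, 64, units=2, ops_per_instruction=2)

print(v1_ops, v4_ops)
```

Under these assumptions the two results are 8 and 16 operations per clock, a factor of two between the generations.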
To get a numerical representation of the theoretical maximum performance of a processor, we can use these numbers in a simple calculation. Expressed in GigaFLOPs, or "billions of floating point operations per second", the result gives us an absolute best case performance rating for the processor. Of course it is extremely unlikely we would ever achieve this in practice, but it helps to make comparisons between models.
The calculation is as follows:
Cycles per second (GHz) x instructions per clock x cores = Performance in GFLOPs
Intel Xeon E5-2699 v4
2.2GHz x 22 Cores x 16 Instructions per clock = 774.4 GFLOPs
774.4 GFLOPs / 145W TDP = 5.3 GFLOPs per Watt
Intel Xeon E5-2690 v1
2.9 GHz x 8 Cores x 8 Instructions per clock = 185.6 GFLOPs
185.6 GFLOPs / 135W TDP = 1.37 GFLOPs per Watt
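The same calculation can be written as a short script, which makes it easy to repeat for other models. This is a minimal sketch: the function and dictionary key names are illustrative, and the figures are the ones used in the worked examples above.

```python
def peak_gflops(clock_ghz, cores, ops_per_clock):
    """Theoretical peak: cycles per second (GHz) x instructions per clock x cores."""
    return clock_ghz * cores * ops_per_clock


def gflops_per_watt(gflops, tdp_watts):
    """Theoretical peak performance divided by the processor's TDP."""
    return gflops / tdp_watts


# Figures taken from the worked examples above.
processors = {
    "Intel Xeon E5-2699 v4": {"clock_ghz": 2.2, "cores": 22, "ops_per_clock": 16, "tdp_watts": 145},
    "Intel Xeon E5-2690 v1": {"clock_ghz": 2.9, "cores": 8, "ops_per_clock": 8, "tdp_watts": 135},
}

for name, spec in processors.items():
    peak = peak_gflops(spec["clock_ghz"], spec["cores"], spec["ops_per_clock"])
    efficiency = gflops_per_watt(peak, spec["tdp_watts"])
    print(f"{name}: {peak:.1f} GFLOPs, {efficiency:.2f} GFLOPs per Watt")
```

Because TDP is a design power rating rather than a measured draw, the per Watt figures are best read as a guide for comparing models, not as a prediction of real-world consumption.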
Interestingly, you'll notice that the v4 processor is around 4 times more powerful and almost 4 times more efficient than the v1 - in fact it's slightly more power efficient than the Samsung Galaxy S6 processor featured earlier in the article.
In the real world however, things are driven by application, so at Boston we cater for a wide range of applications with different combinations of cores, threads and power consumption.
Certain cases, such as high frequency trading, require the highest possible clock frequency, with less concern for the number of cores. This is because they need to make trading decisions in milliseconds to get ahead of the competition, so nanoseconds really do count.
Boston's range of HFT optimised SKUs not only offers the highest frequency processors available from Intel today, but also adds a little extra through enterprise grade overclocking.
Equally, for HPC and scale out applications where more cores can make a difference, we have an optimised set of platforms in our Quattro family. Utilising our Boston labs facility, we've already done the hard work and found the combinations that deliver the best ratio of performance per dollar per watt.
If you are interested in finding out more about how Boston can help you find the right processor for your application, contact our friendly sales team today.