Mali graphics accelerators: characteristics and comparison of models.
Mali GPUs are the intellectual property of ARM Limited and are the graphics part of the ARM microprocessor architecture, which is used in SoC chips better known as the Snapdragon, Helio, MT, Exynos and Kirin mobile processors.
ARM Limited develops the processor architecture; MediaTek, Qualcomm, HiSilicon and Samsung buy a license together with the corresponding documentation. They then make their own adjustments: they choose the class, generation, number of cores and frequency of the CPU, and set the final characteristics of the Mali GPU, namely the number of computational units and the operating frequency. Qualcomm products occupy a special place: its mobile processors use graphics accelerators of its own design, Adreno.
To make things easier to understand: the main building blocks of a GPU are its computational units, which are designated in processor specifications as MP (Mali T830 MP2, for example). The number of these “smart” bricks, together with the frequency, determines the performance of the video accelerator. The Mali characteristics table lists GFLOPS figures (billions of floating-point operations per second): the first value is the peak performance of one unit at the minimum frequency, and the second value is the theoretical potential when all units operate at the maximum frequency.
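The relationship between unit count, frequency and the GFLOPS figures can be sketched as a simple product. This is a minimal illustration rather than ARM's own method: the function name `peak_gflops` is hypothetical, and the value of 34 floating-point operations per cycle per unit is an assumption, inferred by dividing the single-unit Mali-T880 figure in the table (22.1 GFLOPS) by its minimum frequency (650 MHz).

```python
def peak_gflops(units, freq_mhz, flops_per_cycle):
    """Theoretical peak in GFLOPS: units x frequency (GHz) x FLOPs per cycle per unit."""
    return units * (freq_mhz / 1000.0) * flops_per_cycle


# Assumption: 34 FLOPs per cycle per unit, inferred from the table (22.1 / 0.650).
T880_FLOPS_PER_CYCLE = 34

low = peak_gflops(1, 650, T880_FLOPS_PER_CYCLE)     # one unit, minimum frequency
high = peak_gflops(16, 1000, T880_FLOPS_PER_CYCLE)  # all 16 units, maximum frequency
print(f"Mali-T880: from {low:.1f} to {high:.1f} GFLOPS")
```

With these inputs the sketch reproduces both ends of the Mali-T880 row. The same factor also appears to fit the Mali-G72 and Mali-G71 rows, while other models (Mali-T720 or Mali-G52, for example) imply a different per-cycle value.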
Frequency, die size, heat dissipation and energy efficiency depend largely on the manufacturing process (nm). Two values in this column indicate that the chips are or were produced on different processes, at the discretion of the manufacturer.
In theory, everything is extremely simple: the newer the architecture, the more units and the higher the frequency, the higher the performance. In practice, we will never see the maximum number of Mali computing units in smartphones; that is reserved for mini-PCs and laptops, which have at least a small cooling system. To date, the maximum recorded for Mali-G72 is MP18 (18 units) in the Samsung Exynos 9810 mobile processor; for comparison, Mali-G71 reached MP20 in the Samsung Exynos 8895. A new HiSilicon Kirin 980 chipset with new-generation Mali-G76 MP10 graphics is planned, and its performance promises to be higher than that of the Adreno 630 graphics used in the Snapdragon 845.
To complete the picture, I recommend the performance rating of mobile processors, where you can compare the theoretical capabilities of Mali declared by ARM with the characteristics and performance of processors actually manufactured.
Professional advice on how to choose the right smartphone will help you organize your thoughts and fill in the missing pieces of the puzzle.
Mali video processor characteristics:
Model | nm | Compute units | Frequency | GFLOPS | OpenGL ES | DirectX | Vulkan |
Mali-G76 | 7 | from 4 to 20 | 750 MHz | — | 3.2 | 12 | 1.1 |
Mali-G72 | 10 16 | from 1 to 32 | 546-850 MHz | from 18.6 to 924.8 | 3.2 | 12 | 1.1 |
Mali-G71 | 10 16 | from 1 to 32 | 546-1037 MHz | from 18.6 to 1128 | 3.2 | 11 | 1.1 |
Mali-G52 | 16 | from 1 to 4 | 850 MHz | from 86.7 to 346.8 | 3.2 | 11 | 1.1 |
Mali-G51 | 10 28 | 1 uni-pixel 3 dual-pixel | 650 MHz | — | 3.2 | 11 | 1.1 |
Mali-G31 | 28 | from 1 to 2 | 650 MHz | — | 3.2 | 11 | 1.0 |
Mali-T880 | 28 | from 1 to 16 | 650-1000 MHz | from 22.1 to 544 | 3.2 | 11 | 1.0 |
Mali-T860 | 28 | from 1 to 16 | 350-700 MHz | from 11.9 to 380.8 | 3.2 | 11 | 1.0 |
Mali-T830 | 28 | from 1 to 4 | 600-950 MHz | from 20.4 to 129.2 | 3.2 | 11 | 1.0 |
Mali-T820 | 28 | from 1 to 4 | 600 MHz | from 20.4 to 81.6 | 3.2 | 11 | 1.0 |
Mali-T760 | 28 | from 1 to 16 | 600-772 MHz | from 20.4 to 420 | 3.2 | 11 | 1.0 |
Mali-T720 | 28 | from 1 to 8 | 400-700 MHz | from 6.8 to 95.2 | 3.2 | 11 | 1.0 |
Mali-T628 | 28 32 | from 1 to 8 | 533-695 MHz | from 17 to 177.9 | 3.1 | 11 | — |
Mali-T624 | 28 32 | from 1 to 4 | 533-600 MHz | from 17 to 76.8 | 3.1 | 11 | — |
Mali-T622 | 28 32 | from 1 to 2 | 533 MHz | from 17 to 34.1 | 3.1 | 11 | — |
Mali-T604 | 28 32 | from 1 to 4 | 533 MHz | from 17 to 68.2 | 3.1 | 11 | — |
Mali-470 | 28 40 | from 1 to 4 | 250-650 MHz | — | 2.0 | — | — |
Mali-450 | 28 40 | from 1 to 8 | 300-750 MHz | from 4.5 to 71.7 | 2.0 | — | — |
Mali-400 | 28 40 | from 1 to 4 | 200-600 MHz | from 1.8 to 19.2 | 2.0 | — | — |
Mali-300 | 28 40 | 1 | 500 MHz | 5 | 2.0 | — | — |
The Mali graphics accelerator is one of the integrated modules of the SoC chip. The manufacturers of mobile processors, HiSilicon (Kirin), Samsung (Exynos) and MediaTek (Helio, MT), determine how many computing units to “install” and at what frequency they can operate. Smartphone manufacturers, in turn, make their own adjustments to keep the die from overheating. The declared frequency is not always the actual operating frequency: comparing Mali GPUs within one and the same processor model shows a significant discrepancy in test results. Whichever generation we test, the outdated Mali-400, the outgoing Mali-T880 or the new Mali-G, there is always variation. When faced with the choice of Mali vs Adreno, compare the performance of specific smartphones using the rating linked above the table.
ARM Cortex-A76 and Mali-G76: with an eye on Windows and laptops
Just before the start of the current Computex exhibition, the British processor developer ARM introduced new CPU cores. Compared to their predecessors, the Cortex-A76 cores will provide a performance increase of up to 35%, but other details are more interesting. The point is that the intended field of use for the new cores is Windows. However, we will probably see them in smartphones too.
Compared to the Cortex-A75, which is the basis for the Qualcomm Kryo 385 (Snapdragon 845) cores, ARM emphasizes four main advantages.
First, the branch prediction and instruction fetch blocks are now separated, which should reduce latency under high loads. The front end of the pipeline is capable of processing four to eight instructions per clock cycle, and there have also been changes to the instruction caching system. Second, the instruction decoding stage can process four instructions per clock cycle, more than in previous generations, and the number of micro-operations processed per clock cycle has increased to eight. Third, ARM specifies higher integer and vector throughput, which should have a positive impact on machine learning workloads. Finally, the cache hierarchy has been optimized, which, together with the fourth-generation prefetch unit, should significantly increase performance.
These numerous changes should result in a 90% increase in integer performance over the Cortex-A73 and a 150% increase in floating-point performance. Overall, as ARM points out, the increase compared to the cores released two years ago is 80%, and compared to the Cortex-A75 it is 35%, although ARM uses different clock speeds here: 2.45 GHz for Cortex-A73, 2.8 GHz for Cortex-A75 and 3.0 GHz for Cortex-A76.
"Laptop performance" based on AArch64 SPECint2006 should be significantly higher than in previous generations. Single-threaded performance is 110% higher than that of the Cortex-A73, and for a 5 W big.LITTLE cluster the performance will be 90% higher. In this comparison, however, clock frequencies are no longer given.
Compared to last year's Cortex-A75, ARM cites a 35% overall performance boost, four times faster machine learning performance and 40% higher efficiency. The efficiency gain is associated with the transition to a new manufacturing process. Cortex-A76 will be produced on TSMC's optimized 7 nm FinFET process, but ARM also mentions TSMC's 16 nm FFC process. SoC developers who are willing to settle for the 16 nm process can get started now. Work on 7 nm, by contrast, cannot begin before the fourth quarter.
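Because the headline gains above were quoted at different clock speeds (2.45, 2.8 and 3.0 GHz), part of each figure comes from frequency alone. A rough way to separate the two is to rescale each gain to a common clock. The sketch below assumes that performance scales linearly with clock speed, which is a simplification and not something ARM states; the helper name `per_clock_gain` is hypothetical.

```python
def per_clock_gain(total_gain, old_ghz, new_ghz):
    """Gain left after removing the clock-speed difference (linear-scaling assumption)."""
    speedup = 1.0 + total_gain
    return speedup * (old_ghz / new_ghz) - 1.0


A76_GHZ = 3.0  # clock used for Cortex-A76 in ARM's comparison

vs_a75 = per_clock_gain(0.35, 2.8, A76_GHZ)   # Cortex-A75 measured at 2.8 GHz
vs_a73 = per_clock_gain(0.80, 2.45, A76_GHZ)  # Cortex-A73 measured at 2.45 GHz
print(f"vs Cortex-A75: {vs_a75:.0%} per clock")
print(f"vs Cortex-A73: {vs_a73:.0%} per clock")
```

Under that assumption, about 26% of the 35% gain over Cortex-A75 and about 47% of the 80% gain over Cortex-A73 would remain at equal clocks.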
Within a DynamIQ cluster, ARM sees the Cortex-A55 cores introduced last year as the best partner for the new Cortex-A76. They will serve as the efficiency cores.
ARM also introduced the Mali-G76 graphics core. The main change compared to the previous flagship, Mali-G72, is the doubling of the ALU units, which is associated with doubling the SIMD width. We still get the Bifrost architecture introduced in 2016, with three execution engines per core. Maximum performance, however, will increase by only 25%, which is due to a decrease in the possible number of cores. While the Mali-G72 allowed configurations with 32 cores (MP32), the limit is now 20 (MP20). In practice this is unlikely to play a role, since manufacturers did not approach the old limit: even the Samsung Exynos 9810 SoC used in the Galaxy S9+ has only 18 cores. In addition to the performance gains, ARM mentions 30% higher efficiency and 170% higher machine learning performance.
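The 25% figure follows from simple arithmetic if one assumes equal clock speeds and that doubling the ALUs exactly doubles per-core throughput. Both are simplifications for illustration, not statements from ARM, and the function name `max_config_ratio` is hypothetical.

```python
def max_config_ratio(per_core_factor, new_max_cores, old_max_cores):
    """Peak throughput of the largest new configuration relative to the largest old one."""
    return per_core_factor * new_max_cores / old_max_cores


# Mali-G76 MP20 versus Mali-G72 MP32, with per-core throughput doubled.
ratio = max_config_ratio(2.0, 20, 32)
print(f"Maximum configuration: {ratio - 1.0:.0%} faster")
```

Twice the work per core across 20 of the former 32 cores gives a ratio of 1.25, which matches the 25% increase quoted for the largest configuration.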
It is still unknown when the first SoCs based on Cortex-A76 or Mali-G76 will be released. Most likely they will be the Huawei Kirin 980 and the new high-end Samsung chip, which will be presented in early January 2020. It is quite possible that the successor to the Snapdragon 845 will also use the new CPU cores as its basis. ARM's orientation toward Windows, or Windows on ARM, raises many more questions. The platform cannot yet be called successful. However, the fault here lies not with ARM but with Microsoft and the first three OEMs: ASUS, HP and Lenovo.