MMX and SSE Optimization - MMX

xiaoxiao2021-04-11  301

MMX and SSE are Intel technologies based on SIMD (single instruction, multiple data). As the name suggests, a single SIMD instruction operates on several data items at once. Although 64-bit systems have been launched, most of us still use 32-bit systems. Suppose you want to add two 128-bit blocks of data: with ordinary 32-bit instructions this clearly takes four add instructions, the 64-bit MMX instructions need only two, and the more powerful SSE handles 128 bits at a time, so a single instruction completes the operation. Using MMX and SSE optimization can therefore greatly improve program performance.
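The instruction counts above can be sketched in C with compiler intrinsics. This is only an illustration under a few assumptions: the function names `add128_scalar`, `add128_mmx`, and `add128_sse2` are hypothetical, the data is treated as four 32-bit integer lanes, and the 128-bit integer add uses SSE2 (`emmintrin.h`), since the original SSE instructions only cover floating-point lanes.

```c
#include <stdint.h>
#include <string.h>
#include <mmintrin.h>   /* MMX intrinsics */
#include <emmintrin.h>  /* SSE2 intrinsics (128-bit integer operations) */

/* Plain 32-bit code: four separate additions cover 128 bits of data. */
void add128_scalar(const int32_t x[4], const int32_t y[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = x[i] + y[i];
}

/* MMX: two 64-bit additions, each handling two 32-bit lanes (PADDD). */
void add128_mmx(const int32_t x[4], const int32_t y[4], int32_t out[4])
{
    __m64 xv, yv, sum;
    for (int i = 0; i < 4; i += 2) {
        memcpy(&xv, x + i, sizeof xv);      /* load 64 bits of x */
        memcpy(&yv, y + i, sizeof yv);      /* load 64 bits of y */
        sum = _mm_add_pi32(xv, yv);         /* one packed add, two lanes */
        memcpy(out + i, &sum, sizeof sum);  /* store 64 bits of the result */
    }
    _mm_empty();                            /* EMMS: release the x87 registers */
}

/* SSE2: a single 128-bit addition handles all four 32-bit lanes. */
void add128_sse2(const int32_t x[4], const int32_t y[4], int32_t out[4])
{
    __m128i xv = _mm_loadu_si128((const __m128i *)x);
    __m128i yv = _mm_loadu_si128((const __m128i *)y);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(xv, yv));
}
```

All three functions produce the same result; they differ only in how many add operations are issued for the same 128 bits of data.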

MMX uses the low 64 bits of the processor's 80-bit floating-point registers as its own registers, eight in total, MM0 through MM7. Because these registers are "borrowed" from the floating-point unit, you must execute the EMMS instruction to clear the MMX state each time you finish using MMX instructions and before running floating-point code again. MMX is mainly for integer operations: one 64-bit MMX register holds eight 8-bit integers or four 16-bit integers at the same time, so a single instruction performs eight 8-bit operations or four 16-bit operations. Note that MMX cannot operate on 32-bit values directly in every case (there is no packed 32-bit multiply, for example); such values have to be split into two 16-bit halves first. MMX also has a very useful feature, saturating arithmetic. Take the unsigned 8-bit addition 128 + 129: the true sum, 257, clearly exceeds the 8-bit maximum of 255, but a saturating add returns the maximum value 255 instead of wrapping around. Saturation keeps the result within the range of the corresponding data type.
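To make the difference between wrapping and saturating addition concrete, here is a small scalar sketch in portable C. It only models what happens in one lane; the helper names `add_wrap_u8`, `add_sat_u8`, and `add_sat_i16` are hypothetical and are not MMX intrinsics. The unsigned 8-bit case corresponds to the per-lane behaviour of PADDUSB, the signed 16-bit case to PADDSW.

```c
#include <stdint.h>

/* Ordinary unsigned 8-bit addition: the result wraps around modulo 256. */
uint8_t add_wrap_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x + y);
}

/* Unsigned saturating 8-bit addition: results above 255 are clamped to 255. */
uint8_t add_sat_u8(uint8_t x, uint8_t y)
{
    unsigned sum = (unsigned)x + y;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Signed saturating 16-bit addition: results are clamped to [-32768, 32767]. */
int16_t add_sat_i16(int16_t x, int16_t y)
{
    int32_t sum = (int32_t)x + y;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

With these helpers, `add_wrap_u8(128, 129)` wraps to 1, while `add_sat_u8(128, 129)` stays at the 8-bit maximum of 255.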

The following example uses MMX instructions directly in inline assembly under VC 6 SP6, where the MMX registers can also be inspected:

    __int16 a[] = {1, 2, 3, 4};
    __int16 b[] = {5, 6, 7, 8};
    __asm {
        movq   mm0, a      // load the four 16-bit numbers of array a into MM0 at once
        movq   mm1, b      // load the four 16-bit numbers of array b into MM1
        paddsw mm0, mm1    // signed saturating add of the four 16-bit lanes; the result is in MM0
        movq   a, mm0      // store the result from MM0 back into array a
        emms               // don't forget to wipe clean
    }

You can also skip inline assembly and use the intrinsics supported by the SDK and the Intel compiler:

    __m64 a = _mm_set_pi16(1, 2, 3, 4);   // write four 16-bit integers into a
    __m64 b = _mm_set_pi16(5, 6, 7, 8);   // the last argument goes to the lowest position, i.e. 8 ends up in the lowest lane
    a = _m_paddsw(a, b);                  // saturating 16-bit addition
    _m_empty();                           // don't forget to clean up afterwards, hehe
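As a self-contained variant of the intrinsic approach, the sketch below wraps the saturating add in a function that works on ordinary arrays. The function name `mmx_adds4` is hypothetical, and the code assumes an x86 compiler that ships `mmintrin.h`. It uses `_mm_adds_pi16`, the newer name for the same PADDSW operation as `_m_paddsw`, and passes the array elements to `_mm_set_pi16` in reverse because that intrinsic takes the highest lane first.

```c
#include <stdint.h>
#include <string.h>
#include <mmintrin.h>   /* MMX intrinsics */

/* Adds two groups of four 16-bit integers with signed saturation.
   Element 0 of each array is treated as the lowest lane. */
void mmx_adds4(const int16_t x[4], const int16_t y[4], int16_t out[4])
{
    /* _mm_set_pi16 takes the highest lane first, so reverse the order. */
    __m64 xv  = _mm_set_pi16(x[3], x[2], x[1], x[0]);
    __m64 yv  = _mm_set_pi16(y[3], y[2], y[1], y[0]);
    __m64 sum = _mm_adds_pi16(xv, yv);   /* PADDSW: four saturating adds */

    memcpy(out, &sum, sizeof sum);       /* lowest lane lands in out[0] */
    _mm_empty();                         /* EMMS before any floating-point code */
}
```

Calling it with `{1, 2, 3, 32767}` and `{5, 6, 7, 8}` shows both points at once: the first three lanes are added normally, and the last lane stays at 32767 instead of overflowing.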

The above is only a basic introduction. MMX has 57 instructions in total, including basic arithmetic, comparison, conversion, logical, shift, and data transfer instructions. They are not listed one by one here; for details, please refer to Intel's official website. Keep in mind, however, that MMX is for integer operations and should not be used for floating-point work; floating-point operations are handled by the more powerful SSE instructions. As for SSE, that is a story for the next installment.

