IEEE floating point representation

zhaozj2021-02-08 384

At the beginning of the month I went back to work after New Year's Day. Now the Spring Festival has finally arrived, but I find that I will be spending my 12-day holiday alone: my friends have all gone back to their hometowns, and I am thoroughly bored. The forum has been quiet lately, and even the rate of flood posts has dropped (nobody dares to flood for fear of being banned). I suppose this is because of the Spring Festival: many friends who can only visit the forum from work have gone home for family reunions. A homeless soul like me, besides wishing everyone else happiness, can only cheer on the brothers and sisters who are still holding the fort on the C/C++ object-oriented board. May your salaries rise, your positions climb, your health be good, and your families be happy!

Recently I have seen a lot of discussion about floating-point variables in C++, so for those who still have doubts on this question, I will explain how Intel processors handle floating-point numbers. To keep the explanation simple, I only use the float type as an example. In storage structure and algorithm, double and float work the same way; the difference is that float is 32 bits and double is 64 bits (with a wider exponent and mantissa), so double can store higher precision. One more thing: the article and the program are only compatible within a certain range, so to follow this article completely you should understand conversion between binary, decimal, and hexadecimal, understand how data is stored in memory, and be able to compile a simple console program with VC.NET. OK, let's start.

Everyone knows that all data is stored as a sequence of binary digits (1 or 0). Each 1 or 0 is called a bit, and on an x86 CPU one byte is 8 bits. For example, if a 16-bit (2-byte) short int variable holds the value 1156, its binary form is 00000100 10000100. Because the Intel CPU architecture is little-endian (if that term is unfamiliar, see any text on computer organization), the bytes are stored in reverse order: 10000100 00000100. That is how the fixed-point number 1156 is laid out in memory.
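The byte order described above can be checked with a small sketch. The helper names little_endian_bytes, memory_bytes, and check_1156 are my own and do not come from the original article. The first helper computes the low-byte-first order arithmetically, so it gives the same answer on any machine; the second shows what the current machine really stores.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (name chosen for this sketch): the two bytes of a
// 16-bit value in little-endian order, computed arithmetically.
std::array<std::uint8_t, 2> little_endian_bytes(std::uint16_t value)
{
    return {{ static_cast<std::uint8_t>(value & 0xFF),    // low byte, stored first
              static_cast<std::uint8_t>(value >> 8) }};   // high byte, stored second
}

// The bytes as they actually sit in this machine's memory.
std::array<std::uint8_t, 2> memory_bytes(std::uint16_t value)
{
    std::array<std::uint8_t, 2> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Self-check for the example in the text: 1156 is 0x0484.
void check_1156()
{
    const auto bytes = little_endian_bytes(1156);
    assert(bytes[0] == 0x84);   // 10000100
    assert(bytes[1] == 0x04);   // 00000100
}
```

On an x86 machine, memory_bytes(1156) is expected to return the same two bytes in the same order as little_endian_bytes(1156).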

So how is a floating-point number stored? All C/C++ compilers I know of follow the floating-point representation defined by the IEEE (Institute of Electrical and Electronics Engineers). The format is a form of scientific notation with a sign (positive or negative), an exponent, and a mantissa. The base is fixed at 2, so a floating-point number is expressed as the mantissa multiplied by 2 raised to the exponent, together with the sign. Let's look at the specific layout of float:

A float has 32 bits in total, numbered 31 down to 0 from the highest bit of the 4 bytes to the lowest. Bit 31 is the sign bit: 1 means negative and 0 means positive. Bits 30-23, 8 bits in total, are the exponent. Bits 22-0, 23 bits in total, are the mantissa. Taking 8 bits per group gives 4 groups, called A, B, C, and D from high to low. Each group is one byte, and in memory the bytes are stored in reverse order, namely: DCBA

We will not worry about the reversed storage for now, because it would only confuse readers. I will describe everything in natural order first and flip the bytes at the very end.
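The field layout can be sketched in code as follows. This is only an illustration under stated assumptions: the names FloatFields, split_float, and rebuild_float are hypothetical, float is assumed to be a 32-bit IEEE type, and rebuild_float only handles ordinary (normal) numbers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit IEEE float");

// Hypothetical names, chosen for this sketch only.
struct FloatFields {
    std::uint32_t sign;      // bit 31: 1 = negative, 0 = positive
    std::uint32_t exponent;  // bits 30..23, stored with 127 added
    std::uint32_t mantissa;  // bits 22..0, the leading 1 is not stored
};

// Split a float into its three fields.
FloatFields split_float(float f)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);   // copy the 4 bytes into an integer
    FloatFields fields;
    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}

// Rebuild the value from the fields. Only valid for normal numbers
// (exponent field 1..254); zero, denormals, infinity and NaN are not handled.
float rebuild_float(const FloatFields& fields)
{
    assert(fields.exponent >= 1 && fields.exponent <= 254);
    const float significand =
        1.0f + std::ldexp(static_cast<float>(fields.mantissa), -23);
    const float magnitude =
        std::ldexp(significand, static_cast<int>(fields.exponent) - 127);
    return fields.sign ? -magnitude : magnitude;
}
```

For instance, split_float(6.5f) gives sign 0, exponent field 129 (that is, 2 + 127), and mantissa 0x500000, because 6.5 is 1.625 * (2 ^ 2).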

Let us now convert the float value 123456.0f into hexadecimal according to the IEEE floating-point representation. For a floating-point number with no fractional part, convert the integer part directly to binary:

1 11100010 01000000

which can also be written as:

11110001001000000.0

Then move the binary point to the left until only one digit, the leading 1, remains in front of it:

1.1110001001000000

The point moved 16 places. Moving the point one place to the left is the same as adding 1 to the exponent in scientific notation, so the original number equals:

1.1110001001000000 * (2 ^ 16)

Now the mantissa and the exponent can be read off. The highest bit is always 1, just as you would never say you bought 0016 eggs when you bought 16 (and please don't bring home rotten ones~). So do we need to keep this 1? No! We drop it, and the mantissa becomes:

1110001001000000

Finally, pad it with zeros on the right up to 23 bits:

11100010010000000000000

(Typing all these zeros nearly drove me mad~)

Next comes the exponent. It has 8 bits in total, which can represent an unsigned integer from 0 to 255 or a signed integer from -128 to 127. Because the exponent can be negative, 127 is added to it so that it is always stored as an unsigned integer. Here 16 plus 127 is 143, which in binary is:

10001111

123456.0f is positive, so the sign bit is 0. Putting the pieces together in the format described earlier:

0 10001111 11100010010000000000000

Regrouped into bytes:

01000111 11110001 00100000 00000000

Converted to hexadecimal this is 47 F1 20 00, and after reversing the byte order it becomes 00 20 F1 47. Now try converting 54321.0f to binary yourself for practice!
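The hand conversion above can be cross-checked in code. The following is a minimal sketch, not the author's program: encode_small_integer and bits_of are hypothetical helper names, and the sketch assumes a 32-bit IEEE float and a positive integer below 2^24, so that no rounding is needed. It follows the same steps: find the exponent, drop the leading 1, pad the mantissa to 23 bits, and add 127 to the exponent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit IEEE float");

// Hypothetical helper: builds the single-precision bit pattern of a positive
// integer below 2^24 by hand, following the steps in the text.
std::uint32_t encode_small_integer(std::uint32_t n)
{
    assert(n > 0 && n < (1u << 24));   // at most 24 significant bits, no rounding

    int exponent = 0;                  // position of the highest set bit
    while ((n >> (exponent + 1)) != 0) {
        ++exponent;
    }

    std::uint32_t fraction = n - (1u << exponent);   // drop the leading 1
    fraction <<= (23 - exponent);                    // pad with zeros to 23 bits
    const std::uint32_t biased = static_cast<std::uint32_t>(exponent + 127);
    return (biased << 23) | fraction;                // sign bit stays 0
}

// The bit pattern the compiler actually stores, for comparison.
std::uint32_t bits_of(float f)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For the input 123456, whose binary form is 11110001001000000, both encode_small_integer(123456) and bits_of(123456.0f) should give 0x47F12000, that is, the bytes 47 F1 20 00 before reversal.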

With that foundation, let's take an example with a fractional part to see where the precision problem comes from. We will convert the float value 123.456f into hexadecimal according to the IEEE floating-point representation. For this kind of number, the integer part and the fractional part must be handled separately. The integer part converts directly: 1111011. The fractional part is more troublesome and not easy to explain, so let's start from something familiar. Take the decimal pure fraction 0.57826: 5 is the tenths digit, with place value 1/10; 7 is the hundredths digit, with place value 1/100; 8 is the thousandths digit, with place value 1/1000; and so on. The place values are 1/(10^1), 1/(10^2), 1/(10^3), ... Now suppose the digit sequence is {S1, S2, S3, ..., Sn}, here 5, 7, 8, 2, 6. Then the pure fraction N can be written as:

N = S1 * (1 / (10 ^ 1)) + S2 * (1 / (10 ^ 2)) + S3 * (1 / (10 ^ 3)) + ... + Sn * (1 / (10 ^ n))

Generalizing this formula to a pure fraction in base b:

N = S1 * (1 / (b ^ 1)) + S2 * (1 / (b ^ 2)) + S3 * (1 / (b ^ 3)) + ... + Sn * (1 / (b ^ n))

How did I turn into a math teacher? Never mind. For the sake of programming enthusiasts everywhere, a sip of water and on we go! Now a binary pure fraction such as 0.100101011 should be easier to understand. Its place values are 1/(2^1), 1/(2^2), 1/(2^3), 1/(2^4), ..., that is, 0.5, 0.25, 0.125, 0.0625, ... Multiply each place value by the corresponding 1 or 0 in the S sequence and add up the terms. You now know enough to go back to the decimal pure fraction 0.456: how is it written in binary? Try it yourself now, and preferably do not look at the answer first; it will do you good.
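The place-value sum for a binary pure fraction can be written as a short function. This is a sketch for illustration; binary_fraction_value is a hypothetical name, and the digits after the point are passed as a string of '0' and '1' characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: value of a pure binary fraction given as the digits
// after the point, e.g. "101" means 0.101 (binary) = 1/2 + 0/4 + 1/8.
double binary_fraction_value(const std::string& digits)
{
    double value = 0.0;
    double weight = 0.5;                 // place value of the first digit: 1 / (2 ^ 1)
    for (char c : digits) {
        assert(c == '0' || c == '1');
        if (c == '1') {
            value += weight;             // add S_i * (1 / (2 ^ i))
        }
        weight /= 2.0;                   // move to the next place value
    }
    return value;
}
```

For the fraction 0.100101011 used in the text, binary_fraction_value("100101011") is 1/2 + 1/16 + 1/64 + 1/256 + 1/512 = 0.583984375.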

I suspect you cannot wait to see the answer, because you have discovered that it never comes out even! Take a look (for convenience, each bit is identified by its power of 2): bit 1, 0.456 is smaller than the place value 0.5, so the bit is 0; bit 2, 0.456 is greater than the place value 0.25, so the bit is 1, and subtracting 0.25 from 0.456 leaves 0.206; bit 3, 0.206 is greater than the place value 0.125, so the bit is 1, and 0.206 minus 0.125 leaves 0.081; bit 4, 0.081 is greater than 0.0625, so the bit is 1, and 0.081 minus 0.0625 leaves 0.0185; bit 5, 0.0185 is less than 0.03125, so the bit is 0; ... And here is the problem: the remainder never reaches zero, even after the full length of the mantissa has been used up! This is the famous floating-point precision problem. I am not here to lecture on numerical computation and the various methods for improving calculation accuracy, though; that subject is so large that I would probably still be talking next year. Here I only want to make the floating-point representation clear.
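The bit-by-bit comparison above is easy to mechanize. Below is a minimal sketch, assuming the input is a double in the range 0 <= x < 1; fraction_bits is a hypothetical name. Note that the double 0.456 is itself only an approximation of the decimal 0.456, but the two agree for far more bits than are printed here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the first `count` binary digits of a fraction
// 0 <= x < 1, found by comparing the remainder against 1/2, 1/4, 1/8, ...
std::string fraction_bits(double x, int count)
{
    assert(x >= 0.0 && x < 1.0 && count >= 0);
    std::string bits;
    double weight = 0.5;
    for (int i = 0; i < count; ++i) {
        if (x >= weight) {
            bits += '1';     // the place value fits: emit 1 and subtract it
            x -= weight;
        } else {
            bits += '0';     // the place value does not fit: emit 0
        }
        weight /= 2.0;
    }
    return bits;
}
```

fraction_bits(0.456, 5) returns "01110", matching the five steps worked out by hand, and asking for more digits simply keeps producing bits without the remainder ever becoming zero within the 23 bits a float mantissa can hold.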

OK, let's continue. Where was I? Oh, right: the fraction never comes out even, so we just keep going until, together with the integer part, we have 24 bits (the last bit is rounded):

1111011.01110100101111001

A classmate asks: "Shouldn't it be 23 bits?" Me: "Don't forget that the leading 1 gets dropped, so of course we need one more!" Now move the binary point to the left, everyone counting together: "1, 2, 3, ..." That is 6 places in total, and 6 plus 127 is 133 (why does this sound like a primary school class? Ha ha~), which in binary is 10000101. The sign bit is ... never mind, the more I say the more confusing it gets, so just look:

0 10000101 11101101110100101111001

01000010 11110110 11101001 01111001

42 F6 E9 79

79 E9 F6 42

Next, let's see how to convert a pure fraction into hexadecimal. A pure fraction such as 0.0456 must first be normalized into the form 1.xxxx * (2 ^ n). For a pure fraction X, the exponent n is given by the following formula:

n = floor(log2(X))

For X = 0.0456, log2(X) is about -4.45, so n = -5. 0.0456 can therefore be written as 1.4592 times 2 to the power -5, that is, 1.4592 * (2 ^ -5). Once it is in this form, follow the same procedure as in the second example above:

1. Drop the leading 1; the remaining fraction 0.4592, as a 23-bit mantissa (last bit rounded), is 01110101100011100010001

2. -5 plus 127 is 122, which in binary is 01111010

3. The sign bit is 0

0 01111010 01110101100011100010001

00111101 00111010 11000111 00010001

3D 3A C7 11

Final result: 11 C7 3A 3D
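Both fractional examples can be verified with a few lines of code. This is a sketch under stated assumptions: normalized_exponent and bits_of are hypothetical helper names, float is assumed to be a 32-bit IEEE type, and the exponent is taken from the standard library function std::frexp rather than from a logarithm, which is a substitute technique and not the formula used in the article.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit IEEE float");

// Hypothetical helper: the exponent n such that x = m * 2^n with 1 <= m < 2.
// std::frexp returns a fraction in [0.5, 1), so its exponent is one too large.
int normalized_exponent(double x)
{
    assert(x > 0.0);
    int e = 0;
    (void)std::frexp(x, &e);   // x = f * 2^e with 0.5 <= f < 1
    return e - 1;
}

// The bit pattern the compiler actually stores for a float.
std::uint32_t bits_of(float f)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

normalized_exponent(123.456) gives 6 and normalized_exponent(0.0456) gives -5, while bits_of(123.456f) and bits_of(0.0456f) should be 0x42F6E979 and 0x3D3AC711, the same byte values derived by hand before reversal.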

Another thing to mention is that the hexadecimal corresponding to 0.0f is 00 00 00 00.

Finally, here is the source code of a function that decodes a floating-point number and prints its structure, for anyone who is interested:

#include <bitset>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>
using namespace std;

// Input: the 4 bytes of a float exactly as they are stored in memory.
// Zero, denormals, infinity and NaN are not treated specially.
void DecodeFloat(const unsigned char pByte[4])
{
    printf("Original (decimal): %d %d %d %d\n", (int)pByte[0], (int)pByte[1], (int)pByte[2], (int)pByte[3]);
    printf("Reversed (decimal): %d %d %d %d\n", (int)pByte[3], (int)pByte[2], (int)pByte[1], (int)pByte[0]);

    // Copy the bytes into a 32-bit integer (avoids the pointer cast).
    uint32_t raw = 0;
    memcpy(&raw, pByte, sizeof raw);
    bitset<32> bitAll(raw);

    // Print as "sign exponent mantissa".
    string strBinary = bitAll.to_string();
    strBinary.insert(9, " ");
    strBinary.insert(1, " ");
    cout << "Binary: " << strBinary << endl;
    cout << "Sign: " << (bitAll[31] ? "-" : "+") << endl;

    // Bits 30..23 hold the exponent with 127 added.
    long lExponent = 0;
    for (int i = 0; i < 8; i++)
    {
        lExponent |= ((long)bitAll.test(30 - i) << (7 - i));
    }
    lExponent -= 127;
    cout << "Exponent (decimal): " << lExponent << endl;

    // Bits 22..0 hold the mantissa; the leading 1 is implicit.
    float fMantissa = 1.0f;
    for (int i = 0; i < 23; i++)
    {
        if (bitAll.test(22 - i))
        {
            fMantissa += ldexp(1.0f, -(i + 1));
        }
    }
    cout << "Mantissa (decimal): " << fMantissa << endl;

    // value = sign * mantissa * 2^exponent; ldexp replaces the original
    // shift, which broke for an exponent of 0 and for large exponents.
    float fResult = ldexp(fMantissa, (int)lExponent);
    if (bitAll[31])
    {
        fResult = -fResult;
    }
    cout << "Result: " << fResult << endl;
}

I am exhausted. I found that this article, short as it is, was really hard to write. God, I am not a machine, so why do I see nothing but 1s and 0s? It seems I have turned into one of the characters in The Matrix... I hope nobody lets my hard work go to waste: please help bump this thread!

Completed by Creamdog on January 18, 2004

Another note:

Thank you very much for your support!

For a long time I have wanted to answer some of the questions raised about this article, but for lack of time and other reasons I kept putting it off. I am very sorry.

I have never read the IEEE specification for floating-point numbers in detail. The conversion methods described here come from my years of work experience and programming practice, together with test code and inspecting memory in the debugger. So errors in this article are inevitable. My original intention was to tell everyone that computers have a precision problem when representing floating-point numbers. The conversion method itself is not covered in depth: only the general conversion rule is explained, and special cases are barely touched on.

I am very grateful to three friends, Combative (strive upstream), Yaoxinyan (), and Sharkhuang (love and programs do not understand). They pointed out some of the mistakes in the article and taught me many details. This is a great help to me and to everyone who reads this article!

There are quite a few errors in the original, some of them serious, but the information I have now is not enough to correct them. I can only ask other readers to take some care when reading: get the general idea and do not dig too deeply into the details, so that it does not harm your later study of programming. Sorry!

When reposting, please credit the original address: https://www.9cbs.com/read-485.html
