Jump to content

C++ neophyte question


Guest 3ndal1s

Recommended Posts

Hello i'm a C++ newbie and i am starting my exploration of mangos sourcecode.

The first files i tried to read have been some /libmpq/ sources, huffman.cpp in particular.

I don't really understand a statement in that file, can someone explain it to me with some patience?

The line of interest lies within a function called libmpq_huff_get_8bits:

static unsigned long libmpq_huff_get_8bits(struct huffman_input_stream *is) {
   unsigned long one_byte;

   if (is->bits <= 8) {
       is->bit_buf |= *(unsigned short *)is->in_buf << is->bits;
       is->in_buf  += sizeof(unsigned short);
       is->bits    += 16;
   }

   one_byte      = (is->bit_buf & 0xFF);
   is->bit_buf >>= 8;
   is->bits     -= 8;

   return one_byte;
}

this function takes on a huffman_input_stream * pointer as a parameter. This struct is declared in huffman.h as it follows:

struct huffman_input_stream {
   unsigned char *in_buf;                /* 00 - Input data */
   unsigned long bit_buf;                /* 04 - Input bit buffer */
   unsigned int bits;                /* 08 - Number of bits remaining in 'byte' */
};

so now let's get to the point i'm missing:

is->bit_buf |= *(unsigned short *)is->in_buf << is->bits;

What does this line do exactly?

Anticipated thanks for the ones who will answer me and my apologies if i committed any mistakes in posting this question here.

Link to comment
Share on other sites

struct huffman_input_stream {
   unsigned char *in_buf; // this is a pointer to our data
   unsigned long bit_buf; // this buffer stores 16 bits of our input (in_buf) at a time
   unsigned int bits; // number of bits in bit_buf (if we have 8 or less we need to load more data into it)
};

static unsigned long libmpq_huff_get_8bits(struct huffman_input_stream *is) {
   unsigned long one_byte; // this is the byte we will return

   if (is->bits <= 8) { // if we have 8 or less bits in our buffer...
       is->bit_buf |= *(unsigned short *)is->in_buf << is->bits; // load the next 16 bits into bit_buf
       is->in_buf  += sizeof(unsigned short); // increment the input so that we are pointing to the next 16 bits to load (for the next time)
       is->bits    += 16; // add 16 to bits so we know that we have 16 more bits loaded
   }

   one_byte      = (is->bit_buf & 0xFF); // set one_byte to the first byte of our buffer
   is->bit_buf >>= 8; // push out the first byte of our buffer (since we've now used it)
   is->bits     -= 8; // remove a byte from our counter (since we just pushed it out)

   return one_byte;
}

Link to comment
Share on other sites

so now let's get to the point i'm missing:

is->bit_buf |= *(unsigned short *)is->in_buf << is->bits;

What does this line do exactly?

For an example, let's say that in_buf = 0x1000 0000. At 0x1000 0000, we have stored the data, the next 16 bits being "0110 0000 0011 0010". bit_buf = 0000 0000 0001 0110 and bits = 8 (since we have 8 bits still loaded in bit_buf).

Now, we want to load the next 16 bits into bit_buf. The first step is to get the next 16 bits from in_buf, which is accomplished by *(unsigned short *)is->in_buf (since in_buf is a pointer, we need to use * to get the actual data from it, and since we only want 16 bits we cast it to unsigned short.)

Loaded data: 0110 0000 0011 0010

The next step it to make sure we are only loading 16 bits and not losing any data. Right now there is 8 bits still loaded into bit_buf, so if we just set bit_buf to what we just loaded, we will lose those 8 bits. So what we do is shift our newly loaded data left by however many bits are still in bit_buf so we aren't overwriting anything but we aren't getting out of order.

Loaded data is now: 0011 0010 0000 0000

Next, we will use the OR operator to set our data without losing the old data. OR takes 2 binary numbers and combines them so that any bits that = 1 in either numbers = 1 in the new number, and any bits that = 0 in both numbers = 0 in the new number.

bit_buf: 0000 0000 0001 0110

New data: 0011 0010 0000 0000

After OR: 0011 0010 0001 0110

And there you have it. bit_buf is now loaded with the next 16 bits.

Link to comment
Share on other sites

Now time for my question.

   if (is->bits <= 8) {
       is->bit_buf |= *(unsigned short *)is->in_buf << is->bits;
       is->in_buf  += sizeof(unsigned short);
       is->bits    += 16;
   }

Wouldn't this part of the code actually lose data? Every time we run this, we are loading 16 bits of data from the input; however, not all of this will actually be put into bit_buf, most of the time only the first 8 bits will actually be loaded, but we are incrementing the input by 16 bits. Wouldn't those last 8 bits then just be lost since the input is moved forward past them?

And that's enough spam from me. :)

Link to comment
Share on other sites

@Patman128

Basically, the shifts only operate on int32 and uint32 types. This means the bits are only lost if you assign the result into a type that is too narrow to store it.

You can read the details in the C99 standard, sections 6.3.1.1 (for integer promotion) and 6.5.7 (for bitwise-shift semantics)

Thanks, but that's not exactly what I meant.

Let's say we had our input data:

1011 0101 0100 0101 0000 0101 0111 0001

When we begin, in_buf points to the beginning:

1011 0101 0100 0101 0000 0101 0111 0001

^ <- in_buf points to first bit

Now, we load our data. We load 16 bits shifted left by however many are still in the bit buffer. So if our bit buffer has 8 bits:

0000 0000 0110 1111

then we want to load:

0000 0101 0111 0001

shifted left 8 bits, or:

0111 0001 0000 0000

so that they will OR together without losing data. However, then the in_buf pointer is moved over 16 bits:

1011 0101 0100 0101 0000 0101 0111 0001

^-----<-----<----<---- move pointer forward 16 bits

So my question is: what happens to the 8 bites that were skipped and never loaded?

1011 0101 0100 0101 0000 0101 0111 0001

|---------| <- these bits were never loaded!

I apologize ahead of time if I'm being obtuse and you already explained it and I just didn't understand.

Link to comment
Share on other sites

In these snipets, I use little-endian hex - lshifts actually move numbers right, and vise versa.

Also, we read from the left side of the buffer.

8 bits (previous read) | 16 bits (current read, lshifted 8 bits) => bit_buf (32 or 64 bit variable, depending on which source you have):

pp 00 00 00 | (cc cc 00 00 << 8) => pp cc cc 00

Here is a more drawn-out example...

Initial contents:

in_buf    43 7F 20 89 4C 11 2D FF
bit_buf   00 00 00 00
bits      0

First call to get_8bit:

// after extract from in_buf
in_buf    20 89 4C 11 2D FF
bit_buf   43 7F 00 00
bits      16

// after extract from bit_buf
in_buf    20 89 4C 11 2D FF
bit_buf   7F 00 00 00
bits      8
one_byte  43

Second call to get_8bit:

// after extract from in_buf
in_buf    4C 11 2D FF
bit_buf   7F 20 89 00
bits      24

// after extract from bit_buf
in_buf    4C 11 2D FF
bit_buf   20 89 00 00
bits      16
one_byte  7F

Third call to get_8bit:

// don't need to extract from in_buf

// after extract from bit_buf
in_buf    4C 11 2D FF
bit_buf   89 00 00 00
bits      8
one_byte  20

Link to comment
Share on other sites

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue. Privacy Policy Terms of Use