#Help Implementing "VarInt" and "VarLong"

glacial rampart
#

Hey y'all! I'm trying to implement the "VarInt" and "VarLong" types for Minecraft protocol interoperability. The definitions can be found here: https://wiki.vg/Protocol#VarInt_and_VarLong

This is my current implementation of VarInt, but both VarInt and VarLong hang when trying to turn a negative number into bytes, and for some reason my breakpoints aren't working. Can anyone help debug?

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}
plush olive
#

Sorry if I appear to be a bit off topic
I haven't checked your code, but I want to recommend a crate
binrw
for things you are doing now
especially if you have a lot other binary data types to parse

obtuse roost
#

?eval

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits
 dbg!(byte, value, &result);
            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}

let a = VarInt::new(-1).as_bytes();
dbg!(a);
let a = VarInt::new(i32::MIN).as_bytes();
dbg!(a);
distant gale BOT
#
[src/main.rs:39:2] byte = 127
[src/main.rs:39:2] value = -1
[src/main.rs:39:2] &result = []
[src/main.rs:52:1] a = [
    127,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -16777216
[src/main.rs:39:2] &result = []
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -131072
[src/main.rs:39:2] &result = [
    128,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -1024
[src/main.rs:39:2] &result = [
    128,
    128,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -8
[src/main.rs:39:2] &result = [
    128,
    128,
    128,
]
[src/main.rs:39:2] byte = 120
[src/main.rs:39:2] value = -1
[src/main.rs:39:2] &result = [
    128,
    128,
    128,
    128,
]
[src/main.rs:54:1] a = [
    128,
    128,
    128,
    128,
    120,
]

()
obtuse roost
glacial rampart
#

Any negative number. I ran a loop from i16::MAX down to i16::MIN, printing for each number whether my code encoded and decoded it correctly. It got to zero, then my CPU usage spiked, my zram filled up, and either oomd killed the program or my kernel panicked

#

I did have code before that ran without hanging, but its encoding gave the wrong bytes compared to the reference in the article

#

The encoding is apparently similar to Protocol Buffers, and the numbers are signed (obv), but it doesn't use zigzag encoding. Additionally, VarInts can only be maximum 5 bytes (VarLongs maximum 10 bytes) while ProtoBufs will always be 10 bytes for negative numbers if using int32 (normal encoding, signed) over sint32 (ZigZag encoding, signed)
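
To make the size difference concrete, here's a rough sketch that only counts bytes. All the function names are made up for illustration, and the ZigZag formula is the one from the protobuf encoding docs as I understand them.

```rust
/// Number of bytes a base-128 varint needs for `v` (hypothetical helper).
fn varint_len(mut v: u64) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7; // drop the 7 bits that the current byte would carry
        n += 1;
    }
    n
}

/// ZigZag mapping used by protobuf's sint32: 0, -1, 1, -2 map to 0, 1, 2, 3.
fn zigzag32(n: i32) -> u32 {
    ((n << 1) ^ (n >> 31)) as u32
}

/// Minecraft-style VarInt: the i32's bits taken as a u32, no sign extension.
fn minecraft_varint_len(n: i32) -> usize {
    varint_len(n as u32 as u64)
}

/// protobuf int32: negative values are sign-extended to 64 bits first.
fn protobuf_int32_len(n: i32) -> usize {
    varint_len(n as i64 as u64)
}

/// protobuf sint32: ZigZag first, then a plain varint.
fn protobuf_sint32_len(n: i32) -> usize {
    varint_len(zigzag32(n) as u64)
}
```

With these, -1 comes out as 5 bytes for the Minecraft VarInt, 10 bytes for protobuf int32, and 1 byte for protobuf sint32.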

obtuse roost
#

?eval

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits
 //dbg!(byte, value, &result);
            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}

for i in i16::MIN..0 {
  let  _ = VarInt::new(i as _).as_bytes();
}
println!("done");
distant gale BOT
#
done
()
obtuse roost
glacial rampart
#

Oh wait! Figured it out, the encoding is still wrong for VarInt but the actual infinite looping only occurs in VarLong (this is totally my bad im sorry)

#
#[derive(Ord, PartialOrd, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct VarLong {
    value: i64,
}

impl VarLong {
    pub fn new(value: i64) -> Self {
        Self { value }
    }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i64 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i64) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(10); // Capacity for 64-bit integers
        let mut value = self.value;
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            // Check if this is the last byte to encode
            if value == 0 {
                if byte & 0x40 != 0 {
                    // If the sign bit is set and the value is negative, add the remaining bytes to make the value -1
                    for _ in 0..9 {
                        result.push(0xff);
                    }
                    result.push(0x01);
                } else {
                    // Otherwise, just push the byte
                    result.push(byte);
                }
                break;
            } else {
                // If there are more bytes to encode, set the most significant bit to continue the encoding
                result.push(byte | 0x80);
            }
        }
        result
    }
}
#

im 100% sure the problem is the as_bytes method but I'm not sure exactly what the problem is

#

all numbers from 0..=i64::MAX work but negative numbers crash my computer lmao

obtuse roost
#

You only handle value == 0 as a base case, which will never be reached when value is initially negative
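
A small sketch of why that base case is never hit: `>>` on a signed integer in Rust is an arithmetic shift, so a negative value settles at -1 instead of 0, while the same bits viewed as u64 do reach 0. The helper names and the `limit` parameter are made up for illustration.

```rust
/// Count how many 7-bit right shifts it takes for a signed value to reach 0,
/// giving up after `limit` shifts. Hypothetical helper for illustration.
fn shifts_until_zero_signed(mut value: i64, limit: u32) -> Option<u32> {
    for n in 0..=limit {
        if value == 0 {
            return Some(n);
        }
        value >>= 7; // arithmetic shift: the sign bit is copied in from the left
    }
    None // never reached 0 within the limit
}

/// Same count, but on the value's bits reinterpreted as u64.
fn shifts_until_zero_unsigned(value: i64, limit: u32) -> Option<u32> {
    let mut value = value as u64; // same bits, but `>>` is now a logical shift
    for n in 0..=limit {
        if value == 0 {
            return Some(n);
        }
        value >>= 7; // zeros are shifted in from the left
    }
    None
}
```

For -1, the signed version gives up with `None` no matter how large the limit is, while the unsigned version returns `Some(10)`, which matches the 10-byte maximum for a VarLong.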

#

Why does your VarLong as_bytes even need to be different from your VarInt as_bytes?

devout spindle
#

Or maybe it only allows up to 28 bits.

#

Let me check.

#

Yes, it allows 35 bits.

glacial rampart
orchid nimbus
devout spindle
#

Yeah, don't use i64 as it sign extends it.
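
Something like this is what I'd expect for the VarInt side, going through u32 so nothing gets sign-extended. Treat it as a sketch based on my reading of the wiki.vg page, not a checked implementation; `write_var_int` and `read_var_int` are names I made up.

```rust
/// Encode an i32 as a Minecraft-style VarInt (hypothetical helper name).
fn write_var_int(value: i32) -> Vec<u8> {
    // Reinterpret the bits as unsigned so `>>` shifts in zeros.
    let mut value = value as u32;
    let mut out = Vec::with_capacity(5);
    loop {
        if (value & !0x7f) == 0 {
            // Everything left fits in 7 bits: last byte, no continuation bit.
            out.push(value as u8);
            return out;
        }
        // Low 7 bits plus the continuation bit.
        out.push((value & 0x7f) as u8 | 0x80);
        value >>= 7;
    }
}

/// Decode a VarInt from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or None if the
/// input ends early or no terminating byte shows up within 5 bytes.
fn read_var_int(bytes: &[u8]) -> Option<(i32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= ((byte & 0x7f) as u32) << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((result as i32, i + 1));
        }
    }
    None
}
```

With this, -1 encodes to ff ff ff ff 0f and 25565 to dd c7 01, which are the bytes the wiki's sample table lists for those values as far as I remember it.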

glacial rampart
#
pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(10); // Capacity for 64-bit integers
        let mut value = self.value as u64; // LITERALLY JUST ADDING "as u64" oml
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            // Check if this is the last byte to encode
            if value == 0 {
                if byte & 0x40 != 0 {
                    // If the sign bit is set and the value is negative, add the remaining bytes to make the value -1
                    for _ in 0..9 {
                        result.push(0xff);
                    }
                    result.push(0x01);
                } else {
                    // Otherwise, just push the byte
                    result.push(byte);
                }
                break;
            } else {
                // If there are more bytes to encode, set the most significant bit to continue the encoding
                result.push(byte | 0x80);
            }
        }
        result
    }
glacial rampart
#

Actually nvm, it did not work for certain values, but I fixed and tested it. Now I have a question about optimization. Here's the normal version of the as_bytes method, and a slightly modified version that fills a stack-allocated array, then turns it into a Vec, truncates it, and returns it. Why is the second one faster if it still has to allocate heap memory for the vector?

pub fn as_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::with_capacity(10);
        let mut value = self.value as u64;

        loop {
            if (value & !0x7f) == 0 {
                bytes.push(value as u8);
                break;
            }
            bytes.push(((value & 0x7f) | 0x80) as u8);
            value >>= 7;
        }
        bytes
    }

    pub fn as_bytes_stack(&self) -> Vec<[u8; 10]> {
        let mut bytes = [0u8; 10];
        let mut value = self.value as u64;
        let mut index = 0;

        loop {
            if (value & !0x7f) == 0 {
                bytes[index] = value as u8;
                break;
            }
            bytes[index] = ((value & 0x7f) | 0x80) as u8;
            value >>= 7;
            index += 1;
        }
        let mut vec = vec!(bytes);
        vec.truncate(index + 1);
        vec
    }
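
One thing worth noting about the second version: `vec!(bytes)` builds a `Vec<[u8; 10]>` holding a single 10-byte array, so `truncate(index + 1)` does not trim any bytes and the two methods don't return the same thing. That could be part of the timing difference, though I haven't measured it. A sketch of a stack-buffer version that returns a `Vec<u8>` might look like this (the free function `var_long_bytes_stack` is a made-up name, taking the raw i64 instead of `&self`):

```rust
/// Encode an i64 as a VarLong using a stack buffer, then copy only the
/// used bytes into a Vec. Hypothetical free-function variant of as_bytes_stack.
fn var_long_bytes_stack(value: i64) -> Vec<u8> {
    let mut buf = [0u8; 10]; // a u64 needs at most 10 groups of 7 bits
    let mut value = value as u64; // unsigned, so `>>` shifts in zeros
    let mut len = 0;

    loop {
        if (value & !0x7f) == 0 {
            // Last byte: no continuation bit.
            buf[len] = value as u8;
            len += 1;
            break;
        }
        // Low 7 bits plus the continuation bit.
        buf[len] = ((value & 0x7f) | 0x80) as u8;
        value >>= 7;
        len += 1;
    }

    // Copy just the bytes that were written into a heap-allocated Vec.
    buf[..len].to_vec()
}
```

This still allocates once for the returned Vec, so I would not assume it beats the plain `Vec::with_capacity(10)` version without benchmarking both.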