Help Implementing "VarInt" and "VarLong" | Rust Programming Language Community | Page 1

glacial rampart Mar 9, 2024, 3:45 AM

#

Hey Yall! I'm trying to implement the "VarInt" and "VarLong" types for use with Minecraft protocol interoperability, where definitions can be found here: https://wiki.vg/Protocol#VarInt_and_VarLong

This is my current implementation of VarInt, but both VarInt and VarLong hang when trying to turn a negative number into bytes, and for some reason my breakpoints aren't working. Can anyone help debug?

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}


plush olive Mar 9, 2024, 4:28 AM

#

Sorry if this is a bit off topic. I haven't checked your code, but I want to recommend the binrw crate for what you're doing now, especially if you have a lot of other binary data types to parse.

obtuse roost Mar 9, 2024, 6:40 AM

#

?eval

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits
 dbg!(byte, value, &result);
            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}

let a = VarInt::new(-1).as_bytes();
dbg!(a);
let a = VarInt::new(i32::MIN).as_bytes();
dbg!(a);

distant galeBOT Mar 9, 2024, 6:40 AM

#

[src/main.rs:39:2] byte = 127
[src/main.rs:39:2] value = -1
[src/main.rs:39:2] &result = []
[src/main.rs:52:1] a = [
    127,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -16777216
[src/main.rs:39:2] &result = []
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -131072
[src/main.rs:39:2] &result = [
    128,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -1024
[src/main.rs:39:2] &result = [
    128,
    128,
]
[src/main.rs:39:2] byte = 0
[src/main.rs:39:2] value = -8
[src/main.rs:39:2] &result = [
    128,
    128,
    128,
]
[src/main.rs:39:2] byte = 120
[src/main.rs:39:2] value = -1
[src/main.rs:39:2] &result = [
    128,
    128,
    128,
    128,
]
[src/main.rs:54:1] a = [
    128,
    128,
    128,
    128,
    120,
]

()

obtuse roost Mar 9, 2024, 6:41 AM

#

glacial rampart Hey Yall! I'm trying to implement the "VarInt" and "VarLong" types for use with ...

The code you gave appears to work fine. What values did you observe it getting stuck on?

glacial rampart Mar 9, 2024, 12:54 PM

#

Any negative numbers. I ran a loop from i16::MAX..=i16::MIN, printing out every number as it passed along with whether my code properly encoded/decoded it. It hit zero before my CPU usage spiked and my zram filled up, and then either the program got killed by oomd or my kernel panicked.
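
One detail worth double-checking in a test loop like this: in Rust, a range whose start is greater than its end is empty, so `i16::MAX..=i16::MIN` as written would not iterate at all. A descending loop is usually written by reversing an ascending range. A minimal sketch (the helper names are made up for illustration):

```rust
/// Counts how many values the inclusive range `start..=end` yields.
fn range_len(start: i16, end: i16) -> usize {
    (start..=end).count()
}

/// Walks from i16::MAX down to i16::MIN by reversing an ascending range.
fn descending() -> impl Iterator<Item = i16> {
    (i16::MIN..=i16::MAX).rev()
}

fn main() {
    // A "backwards" range yields nothing.
    println!("MAX..=MIN yields {} items", range_len(i16::MAX, i16::MIN));

    // The reversed range starts at the top and counts down.
    let mut iter = descending();
    println!("first two values: {:?}, {:?}", iter.next(), iter.next());
}
```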

#

I did have working code before, but the encoding was giving wrong bytes compared to the reference in the article

#

The encoding is apparently similar to Protocol Buffers, and the numbers are signed (obv), but it doesn't use ZigZag encoding. Additionally, VarInts can be at most 5 bytes (VarLongs at most 10 bytes), while Protocol Buffers always uses 10 bytes for negative numbers with int32 (plain encoding, signed) as opposed to sint32 (ZigZag encoding, signed)
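
The size difference described here can be sketched in a few lines. This is only an illustration under the assumption that the wire format is plain 7-bits-per-byte, least significant group first, with no ZigZag step; the function names are made up and not taken from the wiki or any crate. The only difference between the 5-byte and 10-byte cases is whether the i32 is reinterpreted as u32 or sign-extended to 64 bits before encoding:

```rust
/// Emits `value` 7 bits at a time, least significant group first, setting the
/// high bit on every byte except the last one.
fn encode_u64(mut value: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    loop {
        let low = (value & 0x7f) as u8;
        value >>= 7; // logical shift on an unsigned value, so it reaches 0
        if value == 0 {
            out.push(low);
            return out;
        }
        out.push(low | 0x80);
    }
}

/// VarInt: reinterpret the i32 as u32 (no sign extension), so a negative
/// number takes exactly 5 bytes.
pub fn var_int_bytes(value: i32) -> Vec<u8> {
    encode_u64(u64::from(value as u32))
}

/// VarLong: the same scheme over the full 64-bit pattern.
pub fn var_long_bytes(value: i64) -> Vec<u8> {
    encode_u64(value as u64)
}

/// Protocol Buffers `int32` style, for comparison only: sign-extend to
/// 64 bits first, so a negative number takes 10 bytes.
pub fn protobuf_int32_bytes(value: i32) -> Vec<u8> {
    encode_u64(value as i64 as u64)
}

fn main() {
    println!("VarInt(-1)         = {:02x?}", var_int_bytes(-1));
    println!("VarLong(-1)        = {:02x?}", var_long_bytes(-1));
    println!("protobuf int32(-1) = {:02x?}", protobuf_int32_bytes(-1));
}
```

Sharing one unsigned helper also means VarInt and VarLong cannot drift apart in how they handle negative values.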

obtuse roost Mar 9, 2024, 1:32 PM

#

?eval

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash)]
pub struct VarInt {
    value: i32,
}

impl VarInt {
    pub fn new(value: i32) -> Self { Self { value } }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i32 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i32) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(5);
        let mut value = self.value as i64; // Convert to i64 to handle negative values correctly
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits
 //dbg!(byte, value, &result);
            if (value == 0 && (byte & 0x40) == 0) || (value == -1 && (byte & 0x40) != 0) {
                result.push(byte);
                break;
            } else {
                result.push(byte | 0x80); // Set the most significant bit to continue the encoding
            }
        }
        result
    }
}

for i in i16::MIN..0 {
  let  _ = VarInt::new(i as _).as_bytes();
}
println!("done");

distant galeBOT Mar 9, 2024, 1:32 PM

#

done
()

obtuse roost Mar 9, 2024, 1:33 PM

#

glacial rampart Any negative numbers. I ran a loop from i16::MAX..=i16::MIN, printing out every ...

The code you posted above does not cause an infinite loop for those negative values. Do you have other code that could be causing it? Have you tried just like printing stuff in your loops?

glacial rampart Mar 9, 2024, 1:39 PM

#

Oh wait! Figured it out, the encoding is still wrong for VarInt but the actual infinite looping only occurs in VarLong (this is totally my bad im sorry)

#

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash, Serialize, Deserialize)]
pub struct VarLong {
    value: i64,
}

impl VarLong {
    pub fn new(value: i64) -> Self {
        Self { value }
    }

    pub fn from_bytes(bytes: &[u8]) -> Self {
        let mut result: i64 = 0;
        let mut shift: u32 = 0;
        let input_iter = bytes.iter();

        for byte in input_iter {
            result |= ((byte & 0x7f) as i64) << shift;
            shift += 7;
            if (byte & 0x80) == 0 {
                if shift < 64 && (byte & 0x40) != 0 {
                    return Self {
                        value: result | (!0 << shift),
                    };
                }
                return Self { value: result };
            }
        }

        Self { value: result }
    }

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(10); // Capacity for 64-bit integers
        let mut value = self.value;
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            // Check if this is the last byte to encode
            if value == 0 {
                if byte & 0x40 != 0 {
                    // If the sign bit is set and the value is negative, add the remaining bytes to make the value -1
                    for _ in 0..9 {
                        result.push(0xff);
                    }
                    result.push(0x01);
                } else {
                    // Otherwise, just push the byte
                    result.push(byte);
                }
                break;
            } else {
                // If there are more bytes to encode, set the most significant bit to continue the encoding
                result.push(byte | 0x80);
            }
        }
        result
    }
}

#

im 100% sure the problem is the as_bytes method but I'm not sure exactly what the problem is

#

all numbers from 0..=i64::MAX work but negative numbers crash my computer lmao

obtuse roost Mar 9, 2024, 1:45 PM

#

You only handle value == 0 as a base case, which will never be reached when value is initially negative
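
To make that concrete: `>>` on a signed integer in Rust is an arithmetic shift, so a negative `i64` settles at -1 and never becomes 0, while the same bit pattern viewed as `u64` runs out of bits after a bounded number of shifts. Since the loop in `as_bytes` pushes a byte on every iteration, a loop that never terminates would also keep growing the `Vec`, which seems consistent with the memory exhaustion described earlier. A small sketch with hypothetical helper names:

```rust
/// Shifts a signed value right by 7 until it is 0, giving up after `limit`
/// shifts. Returns None if 0 was never reached.
fn shifts_until_zero_signed(mut value: i64, limit: u32) -> Option<u32> {
    let mut shifts = 0;
    while value != 0 {
        if shifts == limit {
            return None;
        }
        value >>= 7; // arithmetic shift: the sign bit is copied in from the left
        shifts += 1;
    }
    Some(shifts)
}

/// Same loop on an unsigned value; it always terminates within 10 shifts.
fn shifts_until_zero_unsigned(mut value: u64) -> u32 {
    let mut shifts = 0;
    while value != 0 {
        value >>= 7; // logical shift: zeros are shifted in from the left
        shifts += 1;
    }
    shifts
}

fn main() {
    println!("-1i64 >> 7 = {}", -1i64 >> 7);
    println!("signed -1:   {:?}", shifts_until_zero_signed(-1, 1_000));
    println!("unsigned -1: {}", shifts_until_zero_unsigned(-1i64 as u64));
}
```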

#

Why does your VarLong as_bytes even need to be different from your VarInt as_bytes?

devout spindle Mar 9, 2024, 2:20 PM

#

glacial rampart Hey Yall! I'm trying to implement the "VarInt" and "VarLong" types for use with ...

The code they use in that wiki is incorrect, as it doesn't stop a VarInt from being 33 to 35 bits (position is a multiple of 7, so it will continue to allow more bits until it gets 35 bits). Do you want it to match the behavior in the wiki or to work correctly?

#

Or maybe it only allows up to 28 bits.

#

Let me check.

#

Yes, it allows 35 bits.
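
If the goal is to reject encodings that carry more than 32 bits, one option is to check the fifth byte explicitly, since only its low 4 bits fit into an i32. The following is a sketch under that assumption, not the wiki's code; the error type and function name are made up for illustration:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum VarIntError {
    /// The input ended while the continuation bit was still set.
    Incomplete,
    /// The fifth byte still had its continuation bit set.
    TooLong,
    /// The fifth byte carried bits that do not fit into 32 bits.
    Overflow,
}

/// Decodes one VarInt from the front of `bytes`, returning the value and the
/// number of bytes consumed.
pub fn decode_var_int(bytes: &[u8]) -> Result<(i32, usize), VarIntError> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().take(5).enumerate() {
        let payload = u32::from(byte & 0x7f);
        // Bytes 0..=3 contribute 28 bits, so byte 4 may only add 4 more.
        if i == 4 && payload > 0x0f {
            return Err(VarIntError::Overflow);
        }
        result |= payload << (7 * i);
        if byte & 0x80 == 0 {
            // Reinterpret the 32-bit pattern as signed.
            return Ok((result as i32, i + 1));
        }
    }
    if bytes.len() >= 5 {
        Err(VarIntError::TooLong)
    } else {
        Err(VarIntError::Incomplete)
    }
}

fn main() {
    println!("{:?}", decode_var_int(&[0xff, 0xff, 0xff, 0xff, 0x0f]));
    println!("{:?}", decode_var_int(&[0xff, 0xff, 0xff, 0xff, 0x1f]));
}
```

Returning the consumed length alongside the value makes it easier to decode a VarInt from the front of a larger packet buffer.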

glacial rampart Mar 9, 2024, 2:49 PM

#

obtuse roost Why does your VarLong as_bytes even need to be different from your VarInt as_by...

It doesn't. The problem is that (for some reason) one of the implementations just doesn't give the proper values for negative numbers (they should be the max number of bytes according to the wiki), and the other one just gets oomd'd out of existence

orchid nimbus Mar 9, 2024, 3:03 PM

#

glacial rampart all numbers from `0..=i64::MAX` work but negative numbers crash my computer lmao

you can do value as u64 before doing the conversion, this worked for me iirc

devout spindle Mar 9, 2024, 3:05 PM

#

Yeah, don't use i64 as it sign extends it.

glacial rampart Mar 9, 2024, 3:14 PM

#

orchid nimbus you can do `value as u64` before doing the conversion, this worked for me iirc

oh! it works now! thank you!

#

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut result = Vec::with_capacity(10); // Capacity for 64-bit integers
        let mut value = self.value as u64; // LITERALLY JUST ADDING "as u64" oml
        let mut byte: u8;

        loop {
            byte = (value & 0x7f) as u8; // Extract the 7 least significant bits
            value >>= 7; // Shift right by 7 bits

            // Check if this is the last byte to encode
            if value == 0 {
                if byte & 0x40 != 0 {
                    // If the sign bit is set and the value is negative, add the remaining bytes to make the value -1
                    for _ in 0..9 {
                        result.push(0xff);
                    }
                    result.push(0x01);
                } else {
                    // Otherwise, just push the byte
                    result.push(byte);
                }
                break;
            } else {
                // If there are more bytes to encode, set the most significant bit to continue the encoding
                result.push(byte | 0x80);
            }
        }
        result
    }

glacial rampart Mar 10, 2024, 1:30 AM

#

Actually nvm, it did not work for certain values, but I fixed and tested it. Now I have a question about optimization: here's the normal version of the as_bytes method, and a slightly modified version that fills a stack-allocated array which is then turned into a Vec, truncated, and returned. Why is the second one faster if it still has to allocate the heap memory for the vector?

    pub fn as_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::with_capacity(10);
        let mut value = self.value as u64;

        loop {
            if (value & !0x7f) == 0 {
                bytes.push(value as u8);
                break;
            }
            bytes.push(((value & 0x7f) | 0x80) as u8);
            value >>= 7;
        }
        bytes
    }

    pub fn as_bytes_stack(&self) -> Vec<[u8; 10]> {
        let mut bytes = [0u8; 10];
        let mut value = self.value as u64;
        let mut index = 0;

        loop {
            if (value & !0x7f) == 0 {
                bytes[index] = value as u8;
                break;
            }
            bytes[index] = ((value & 0x7f) | 0x80) as u8;
            value >>= 7;
            index += 1;
        }
        let mut vec = vec!(bytes);
        vec.truncate(index + 1);
        vec
    }
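
Note that `as_bytes_stack` as posted returns `Vec<[u8; 10]>`: `vec!(bytes)` builds a vector holding a single 10-byte array, and `truncate(index + 1)` then leaves that one element in place. The two methods therefore do not return the same thing, so timing them against each other may be misleading. A sketch of a variant that fills the stack array and then returns the same `Vec<u8>` as `as_bytes` (written as a free function with a made-up name so it stands alone):

```rust
/// Encodes `value` into a stack buffer first, then copies only the bytes that
/// were actually written into a heap-allocated Vec.
pub fn var_long_bytes_stack(value: i64) -> Vec<u8> {
    let mut bytes = [0u8; 10]; // 10 bytes is the maximum for a 64-bit value
    let mut value = value as u64; // unsigned, so the shift below is logical
    let mut index = 0;

    loop {
        if (value & !0x7f) == 0 {
            // Last group: no continuation bit.
            bytes[index] = value as u8;
            break;
        }
        bytes[index] = ((value & 0x7f) | 0x80) as u8;
        value >>= 7;
        index += 1;
    }

    // Copy the used prefix (index + 1 bytes) into a Vec<u8>.
    bytes[..=index].to_vec()
}

fn main() {
    println!("{:02x?}", var_long_bytes_stack(-1));
    println!("{:02x?}", var_long_bytes_stack(i64::MAX));
}
```

Whether this version is actually faster than pushing into a `Vec::with_capacity(10)` would need to be measured; both still perform one heap allocation.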
