# Creating my own data format - how does serde work?

jade yoke

Hey there!
For learning purposes, I wanted to make my own data format. It's called huan. What is it? Slightly opinionated yaml. For example:

```yaml
name: "John"
age: 33
adult: true
address:
    house: "Abyss"
    postal: 33333
```

I've already built a tokenizer/lexer.
The token type looks like this:

```rs
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TokenType<'a> {
    Identifier(&'a str),
    IdentifierSpace,
    Str(&'a str),
    Int(i64),
    NewLine,
    WhiteSpace(usize),
    Boolean(bool),
}
```
I get a `Vec<TokenType>` from the tokenizer. I tested it, and it works.
I wanted to plug it into serde, but I have no idea how.
```rs
pub struct Deserializer<'de> {
    pub input: &'de [TokenType<'de>],
    cursor: usize,
}

impl<'de> Deserializer<'de> {
    pub fn new(input: &'de [TokenType<'de>]) -> Self {
        Self { input, cursor: 0 }
    }

    pub fn peek(&self) -> Result<&'de TokenType<'de>> {
        self.input.get(self.cursor).ok_or(Error::Eof)
    }

    pub fn advance(&mut self) -> Result<&'de TokenType<'de>> {
        let token = self.peek()?;
        self.cursor += 1;
        Ok(token)
    }
}

impl<'de> de::Deserializer<'de> for &mut Deserializer<'de> {
    type Error = Error;

This is what I currently have, but I have no idea how to continue. My biggest question: how does it parse/visit identifiers? I don't get it. I can't get a grip on the documentation; every time I read it, I end up with more questions. A little guidance would be highly appreciated.

Thanks in advance

keen stirrup

First of all, let's fix your lifetimes. `'de` is the lifetime of the serialized data, most likely either `&'de [u8]` or `&'de str`. There are no `TokenType`s in there; the tokens live in a separate buffer that only borrows from that data. You could add another lifetime for the slice of tokens:

```rust
pub struct Deserializer<'a, 'de> {
    pub input: &'a [TokenType<'de>],
    cursor: usize,
}
```
but it's more flexible to make it generic over the input, so you can use anything that implements `AsRef<[TokenType]>`, like a `Vec` or a borrowed slice:
```rust
pub struct Deserializer<T> {
    pub input: T,
    cursor: usize,
}

impl<'de, T: AsRef<[TokenType<'de>]>> Deserializer<T> {
    pub fn new(input: T) -> Self {
        Self { input, cursor: 0 }
    }

    pub fn peek(&self) -> Result<TokenType<'de>> {
        self.input
            .as_ref()
            .get(self.cursor)
            .ok_or(Error::Eof)
            .copied()
    }

    pub fn advance(&mut self) -> Result<TokenType<'de>> {
        let token = self.peek()?;
        self.cursor += 1;
        Ok(token)
    }
}
```
And that means your `Deserializer` impl looks like this. Implementing it for `&mut Deserializer<T>` rather than `Deserializer<T>` lets nested values, like the entries of a map, reuse the same deserializer and cursor later:
```rust
impl<'de, T: AsRef<[TokenType<'de>]>> serde::Deserializer<'de> for &mut Deserializer<T> {
    // ...
}
```
And then, since your format is self-describing, you can forward all the other methods to `deserialize_any`:
```rust
use serde::de::Visitor;

impl<'de, T: AsRef<[TokenType<'de>]>> serde::Deserializer<'de> for &mut Deserializer<T> {
    type Error = Error;

    fn deserialize_any<V: Visitor<'de>>(self, visitor: V) -> Result<V::Value> {
        loop {
            match self.advance()? {
                TokenType::Boolean(b) => return visitor.visit_bool(b),
                // TODO: skip layout tokens and handle the other variants
                _ => todo!(),
            }
        }
    }

    serde::forward_to_deserialize_any! {
        bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
        bytes byte_buf option unit unit_struct newtype_struct seq tuple
        tuple_struct map struct enum identifier ignored_any
    }
}
```
jade yoke

Is it just `visit_str`?