# How to split a string and return indices with split contents


safe pumice
#

Is there a convenient way to split a string and get the starting index of each piece?
Using split only gives me the content, and using match_indices with a char pattern returns each matched char individually. I need the substrings in this case.
e.g. "foo bar \nbaz" -> [("foo", 0), ("bar", 4), ("\nbaz", 8)]

buoyant turtle
#

You can use the enumerate iterator adapter

#

?play

```rust
fn main() {
    dbg!("foo bar \nbaz".split(" ").enumerate().collect::<Vec<_>>());
}
```
plain venture BOT
#
```
[src/main.rs:2] "foo bar \nbaz".split(" ").enumerate().collect::<Vec<_>>() = [
    (
        0,
        "foo",
    ),
    (
        1,
        "bar",
    ),
    (
        2,
        "\nbaz",
    ),
]
```
safe pumice
buoyant turtle
#

Ohh yeah, sorry I misread your sample output

buoyant turtle
#

?play

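```rust
// Yields (byte offset, piece) for each piece produced by `split`.
fn split_with_index<'a>(string: &'a str, delimiter: &'a str) -> impl Iterator<Item=(usize, &'a str)> {
    string
        .split(delimiter)
        // `offset` is the running byte position of the next piece.
        .scan(0, |offset, item| {
            let to_return = Some((*offset, item));
            // Skip past this piece and the delimiter that followed it.
            *offset += item.len() + delimiter.len();
            to_return
        })
}

fn main() {
    dbg!(split_with_index("foo bar \nbaz", " ").collect::<Vec<_>>());
}
```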
plain venture BOT
#
```
[src/main.rs:12] split_with_index("foo bar \nbaz", " ").collect::<Vec<_>>() = [
    (
        0,
        "foo",
    ),
    (
        4,
        "bar",
    ),
    (
        8,
        "\nbaz",
    ),
]
```
buoyant turtle
#

I wanted to give the delimiter a shorter lifetime than the string, but the compiler complains about the return type capturing a lifetime behind the scenes (cos the delimiter is passed to split), and I didn't have enough time to work it out. It should still be good as an example.
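
One possible way around the capture complaint, as a rough sketch rather than a worked-out answer: return a named iterator type with two separate lifetimes instead of `impl Iterator`. The `SplitWithIndex` struct and its field names are made up for this example. The iterator itself still can't outlive the delimiter, but the pieces it yields only borrow from the string.

```rust
use std::str::Split;

/// Hypothetical iterator type: `'s` is the lifetime of the text being split,
/// `'d` the (possibly shorter) lifetime of the delimiter.
struct SplitWithIndex<'s, 'd> {
    parts: Split<'s, &'d str>,
    delimiter_len: usize,
    offset: usize,
}

impl<'s, 'd> Iterator for SplitWithIndex<'s, 'd> {
    // Items borrow only from the text, not from the delimiter.
    type Item = (usize, &'s str);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.parts.next()?;
        let start = self.offset;
        // Advance past this piece and the delimiter that followed it (in bytes).
        self.offset += item.len() + self.delimiter_len;
        Some((start, item))
    }
}

fn split_with_index<'s, 'd>(string: &'s str, delimiter: &'d str) -> SplitWithIndex<'s, 'd> {
    SplitWithIndex {
        parts: string.split(delimiter),
        delimiter_len: delimiter.len(),
        offset: 0,
    }
}

fn main() {
    let text = "foo bar \nbaz";
    let parts: Vec<(usize, &str)>;
    {
        // The delimiter lives only inside this block.
        let delimiter = String::from(" ");
        parts = split_with_index(text, &delimiter).collect();
    }
    // The collected pieces are still usable after the delimiter is gone.
    assert_eq!(parts, vec![(0, "foo"), (4, "bar"), (8, "\nbaz")]);
}
```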

#

Also there's a limitation: split supports anything that implements Pattern, while this function only works with &str as a delimiter. It seems a Pattern wouldn't know how long its match is though, so dunno how you'd do it for a generic Pattern delimiter

Or indeed how you even write a generic function around split since Pattern is nightly only
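
A possible workaround for the generic case, sketched under the assumption that every piece `split` yields is a subslice of the original string: skip tracking delimiter lengths and recover each offset from the pointers instead. `offset_in` is a hypothetical helper name, not something from std, and this way the caller picks the pattern so nothing has to be generic over Pattern.

```rust
/// Byte offset of `part` inside `parent`.
/// Assumes `part` is a subslice of `parent`; for an unrelated string the
/// result is meaningless (and the subtraction may overflow).
fn offset_in(parent: &str, part: &str) -> usize {
    part.as_ptr() as usize - parent.as_ptr() as usize
}

fn main() {
    let text = "foo bar \nbaz";

    // Same result as the &str-delimiter version.
    let by_str: Vec<(usize, &str)> = text
        .split(" ")
        .map(|part| (offset_in(text, part), part))
        .collect();
    assert_eq!(by_str, vec![(0, "foo"), (4, "bar"), (8, "\nbaz")]);

    // Works with a function pattern too; note the empty piece between
    // the second space and the newline.
    let by_ws: Vec<(usize, &str)> = text
        .split(char::is_whitespace)
        .map(|part| (offset_in(text, part), part))
        .collect();
    assert_eq!(by_ws, vec![(0, "foo"), (4, "bar"), (8, ""), (9, "baz")]);
}
```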

safe pumice