Parsing HTML | Rust Programming Language Community | Page 1

late needle May 22, 2023, 3:03 PM

#

You want .text() instead of .value().name()

#

.name() returns the element's tag name, which is why you got h2 as the answer

gilded shard May 22, 2023, 9:15 PM

#

Thanks, it really helped me understand what I was doing, but the output it gave me was really weird:

Text {
    inner: Traverse {
        root: NodeRef {
            id: NodeId(
                5,
            ),
            tree: Tree { Document => { Element(<html>) => { Element(<head>), Element(<body>) => { Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine May 2023") }, Text("\n    ") }, Text("\n    "), Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine July 2022") }, Text("\n    ") }, Text("\n    "), Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine January 2023") }, Text("\n    ") }, Text("\n    ") } } } },
            node: Node {
                parent: Some(
                    NodeId(
                        4,
                    ),
                ),
                prev_sibling: None,
                next_sibling: Some(
                    NodeId(
                        10,
                    ),
                ),
                children: Some(
                    (
                        NodeId(
                            6,
                        ),
                        NodeId(
                            9,
                        ),
                    ),
                ),
                value: Element(<h2 class="h3 product-title">),
            },
        },
        edge: None,
    },
}

So I used .html() on the element, and the output I got was:

"<h2 class=\"h3 product-title\">\n        <a href=\"https://example.com\">Magazine May 2023</a>\n    </h2>"

I guess it's better if I try to work with this new output; I just need to find a function to work with it

late needle May 22, 2023, 9:17 PM

#

you need to iterate over what .text() returns

#

Returns an iterator over descendent text nodes.

#

this is because the text in the html isn't stored as one contiguous string, so it's more efficient to give you an iterator than to concatenate everything up front without knowing what you'll then do with that string

gilded shard May 22, 2023, 9:24 PM

#

Ohhh, so how could I work with this iterator?

late needle May 22, 2023, 9:25 PM

#

the iterator yields &str items, one per text node
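
A minimal sketch of one way to consume an iterator of &str pieces like this. To keep it runnable without any crate, the pieces are hard-coded from the text nodes visible in the debug output above; the helper name collect_text is made up for illustration and is not part of scraper.

```rust
// Hypothetical helper (not part of any crate): joins the `&str` pieces that a
// text iterator yields and trims the surrounding whitespace.
fn collect_text<'a>(parts: impl Iterator<Item = &'a str>) -> String {
    // `String` implements `FromIterator<&str>`, so `collect` concatenates.
    let joined: String = parts.collect();
    joined.trim().to_string()
}

fn main() {
    // Stand-in for the iterator: the three text nodes under the first <h2>,
    // copied from the debug output above.
    let parts = vec!["\n        ", "Magazine May 2023", "\n    "];
    let title = collect_text(parts.into_iter());
    println!("{title}");
}
```

With scraper itself, the same helper should accept the iterator directly, e.g. collect_text(element.text()), assuming element is an ElementRef as in the earlier messages. If you only want the first non-blank piece, something like .map(str::trim).find(|s| !s.is_empty()) on the iterator avoids building a String at all.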
