#Parsing HTML

late needle
#

You want .text() instead of .value().name()

#

.name() is the name of the element, which is why you got h2 as the answer

gilded shard
#

Thanks, that really helped me understand what I was doing, but the output it gave me was really weird:

Text {
    inner: Traverse {
        root: NodeRef {
            id: NodeId(
                5,
            ),
            tree: Tree { Document => { Element(<html>) => { Element(<head>), Element(<body>) => { Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine May 2023") }, Text("\n    ") }, Text("\n    "), Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine July 2022") }, Text("\n    ") }, Text("\n    "), Element(<h2 class="h3 product-title">) => { Text("\n        "), Element(<a href="https://example.com">) => { Text("Magazine January 2023") }, Text("\n    ") }, Text("\n    ") } } } },
            node: Node {
                parent: Some(
                    NodeId(
                        4,
                    ),
                ),
                prev_sibling: None,
                next_sibling: Some(
                    NodeId(
                        10,
                    ),
                ),
                children: Some(
                    (
                        NodeId(
                            6,
                        ),
                        NodeId(
                            9,
                        ),
                    ),
                ),
                value: Element(<h2 class="h3 product-title">),
            },
        },
        edge: None,
    },
}

So I called .html() on the element, and the output I got was:

"<h2 class=\"h3 product-title\">\n        <a href=\"https://example.com\">Magazine May 2023</a>\n    </h2>"

I guess it's better if I try to work with this new output; I just need to find a function to work with it

late needle
#

you need to iterate over the .text()

#

Returns an iterator over descendent text nodes.

#

this is because the text in the html isn't contiguous (it's split across separate text nodes), so it's more efficient to give you an iterator than to concatenate everything into a string without knowing what you'll then do with it

gilded shard
#

Ohhh, so how could I work with this iterator?

late needle
#

the iterator returns &strs
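
For anyone reading later, here is a minimal sketch of working with an iterator of `&str` like that. It uses only the standard library: the `pieces` array is a stand-in for what `.text()` would yield for the first `<h2>` in the debug output above, and `join_text` is a hypothetical helper name, not part of any crate. The whitespace handling (trim each piece, skip empty ones, join with a space) is one reasonable choice, not the only one.

```rust
/// Hypothetical helper: joins the pieces yielded by an iterator of `&str`,
/// trimming each piece and dropping the ones that are only whitespace.
fn join_text<'a>(pieces: impl Iterator<Item = &'a str>) -> String {
    pieces
        .map(str::trim)                    // strip the "\n        " padding
        .filter(|piece| !piece.is_empty()) // skip whitespace-only text nodes
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Stand-in for `element.text()`: the three text nodes under the first <h2>.
    let pieces = ["\n        ", "Magazine May 2023", "\n    "];
    let title = join_text(pieces.iter().copied());
    println!("{title}");
}
```

With the real element, the same helper should accept `element.text()` directly, since that iterator yields `&str` items. If the surrounding whitespace doesn't matter, `element.text().collect::<String>()` is the shortest way to get a single string.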