#Parsing HTML
Thanks, it really helped me understand what I was doing, but the output it gave me was really weird:
Text {
inner: Traverse {
root: NodeRef {
id: NodeId(
5,
),
tree: Tree { Document => { Element(<html>) => { Element(<head>), Element(<body>) => { Element(<h2 class="h3 product-title">) => { Text("\n "), Element(<a href="https://example.com">) => { Text("Magazine May 2023") }, Text("\n ") }, Text("\n "), Element(<h2 class="h3 product-title">) => { Text("\n "), Element(<a href="https://example.com">) => { Text("Magazine July 2022") }, Text("\n ") }, Text("\n "), Element(<h2 class="h3 product-title">) => { Text("\n "), Element(<a href="https://example.com">) => { Text("Magazine January 2023") }, Text("\n ") }, Text("\n ") } } } },
node: Node {
parent: Some(
NodeId(
4,
),
),
prev_sibling: None,
next_sibling: Some(
NodeId(
10,
),
),
children: Some(
(
NodeId(
6,
),
NodeId(
9,
),
),
),
value: Element(<h2 class="h3 product-title">),
},
},
edge: None,
},
}
So, I used .html() on the element and the output I got was:
"<h2 class=\"h3 product-title\">\n <a href=\"https://example.com\">Magazine May 2023</a>\n </h2>"
I guess it's better if I try to work with this new output, I just need to find a function to work with it
you need to iterate over the .text()
Returns an iterator over descendent text nodes.
this is because the text in the html isn't contiguous (it's split across separate text nodes), so it's more efficient to give you an iterator than to concatenate it all up front without knowing what you might then do with that string
Ohhh, so how could I work with this iterator?
the iterator returns &strs
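A minimal sketch of what you can do with an iterator of &str like that. To keep it runnable without any crate, a plain array stands in for element.text(), using the text nodes of the first <h2> from the debug output above. The helper names title_from_text and non_empty_parts are made up for this example and are not part of any library.

```rust
/// Concatenate every text piece into one String, then trim the
/// surrounding whitespace. (Hypothetical helper, not a library function.)
fn title_from_text<'a>(parts: impl Iterator<Item = &'a str>) -> String {
    // String implements FromIterator<&str>, so collect() joins the pieces.
    let joined: String = parts.collect();
    joined.trim().to_string()
}

/// Keep the pieces separate, dropping the ones that are only whitespace.
/// (Hypothetical helper, not a library function.)
fn non_empty_parts<'a>(parts: impl Iterator<Item = &'a str>) -> Vec<&'a str> {
    parts.map(str::trim).filter(|s| !s.is_empty()).collect()
}

fn main() {
    // Stand-in for `element.text()`: the text nodes under the first <h2>
    // as they appear in the debug output above.
    let nodes = ["\n ", "Magazine May 2023", "\n "];

    let title = title_from_text(nodes.iter().copied());
    println!("{title:?}");

    let parts = non_empty_parts(nodes.iter().copied());
    println!("{parts:?}");
}
```

With the real element, passing element.text() in place of nodes.iter().copied() should work the same way, since both yield &str. Collecting into a String is the simplest option when you only want the title. Keeping the pieces separate is handy when an element contains several distinct bits of text.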