Soupy: Bringing BeautifulSoup to Rust

woeful linden

Example

```html
<!DOCTYPE html>
<html lang="en">

    <head>
        <meta charset="UTF-8"/>
        <title>Hello!</title>
    </head>

    <body>
        <h1>Hello World!</h1>
        <p>This is a simple paragraph.</p>

        <div class="parent">
            <div class="child">
                <div id="item">
                    <p>Nested item</p>
                    <a href="https://example.com">Example Link</a>
                </div>
            </div>
        </div>
    </body>

    <img class="self-closing"/>

    <!-- Simple comment -->

</html>
```
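
```rust
use soupy::prelude::*;

// Embed example.html (the file above) into the binary at compile time.
const HTML: &str = include_str!("example.html");

fn main() {
    let soup = Soup::new(HTML).expect("Failed to parse HTML");

    println!("Soup {:?}", soup);

    // Every <a> element that has an href attribute.
    for node in soup.tag("a").attr_name("href") {
        println!("Href {:?}", node.get("href"));
    }

    // Every <p> element, in document order.
    for node in soup.tag("p") {
        println!("Paragraph {:?}", node);
    }

    // The first element whose id attribute is "item", if any.
    if let Some(item) = soup.attr("id", "item").first() {
        println!("Found item {:?}", item);
    }
}
```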

Output

```
Soup Soup { ... }
Href Some("https://example.com")
Paragraph Element { name: "p", attrs: {}, children: [Text("This is a simple paragraph.")] }
Paragraph Element { name: "p", attrs: {}, children: [Text("Nested item")] }
Found item Element { name: "div", attrs: {"id": "item"}, children: [Element { name: "p", attrs: {}, children: [Text("Nested item")] }, Element { name: "a", attrs: {"href": "https://example.com"}, children: [Text("Example Link")] }] }
```
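The output lists the two paragraphs in document order: the top-level one first, then the one nested inside `div#item`. A minimal sketch of a pre-order (depth-first) traversal that yields that order is below. It uses a hypothetical toy `Node` type and helper functions (`find_tag`, `own_text`, `paragraph_texts`) for illustration only; it is not soupy's actual internals.

```rust
// Hypothetical toy element tree for illustration; NOT soupy's internal types.
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

fn el(name: &str, children: Vec<Node>) -> Node {
    Node::Element { name: name.to_string(), children }
}

fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Pushes every element named `tag` onto `out`. Each element is checked
/// before its children (pre-order, depth-first), so matches come out in
/// document order.
fn find_tag<'a>(node: &'a Node, tag: &str, out: &mut Vec<&'a Node>) {
    if let Node::Element { name, children } = node {
        if name == tag {
            out.push(node);
        }
        for child in children {
            find_tag(child, tag, out);
        }
    }
}

/// Concatenates an element's direct text children.
fn own_text(node: &Node) -> String {
    match node {
        Node::Element { children, .. } => children
            .iter()
            .filter_map(|c| match c {
                Node::Text(t) => Some(t.as_str()),
                Node::Element { .. } => None,
            })
            .collect(),
        Node::Text(t) => t.clone(),
    }
}

/// The <body> of example.html, with attributes omitted for brevity.
fn example_body() -> Node {
    el("body", vec![
        el("h1", vec![text("Hello World!")]),
        el("p", vec![text("This is a simple paragraph.")]),
        el("div", vec![
            el("div", vec![
                el("div", vec![
                    el("p", vec![text("Nested item")]),
                    el("a", vec![text("Example Link")]),
                ]),
            ]),
        ]),
    ])
}

/// Text of every <p>, in the order a pre-order traversal finds them.
fn paragraph_texts(root: &Node) -> Vec<String> {
    let mut found = Vec::new();
    find_tag(root, "p", &mut found);
    found.into_iter().map(own_text).collect()
}

fn main() {
    for t in paragraph_texts(&example_body()) {
        println!("Paragraph {t}");
    }
}
```

Because a parent is visited before any of its descendants, the shallow `<p>` is always reported before the one three `<div>`s deep, matching the order in the output above.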

harsh dove

Could we use unit structs as identifiers instead of &str?
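A minimal sketch of what the unit-struct suggestion could look like, assuming a trait-based design: each tag is a zero-sized unit struct implementing a small trait, and `&str` implements the same trait so string names keep working. `TagName`, `A`, `P`, `Element`, and `by_tag` are hypothetical names for illustration, not part of soupy's API.

```rust
/// Hypothetical trait: anything that can name an HTML tag.
trait TagName {
    fn name(&self) -> &str;
}

// Plain string names keep working...
impl TagName for &str {
    fn name(&self) -> &str {
        self
    }
}

// ...and zero-sized unit structs act as compile-time-checked identifiers.
struct A;
struct P;

impl TagName for A {
    fn name(&self) -> &str {
        "a"
    }
}

impl TagName for P {
    fn name(&self) -> &str {
        "p"
    }
}

/// Hypothetical minimal element: only its tag name matters here.
struct Element {
    name: String,
}

/// Hypothetical query: every element whose tag name matches `tag`.
fn by_tag(elements: &[Element], tag: impl TagName) -> Vec<&Element> {
    elements.iter().filter(|e| e.name == tag.name()).collect()
}

fn main() {
    let elements: Vec<Element> = ["p", "a", "p"]
        .iter()
        .map(|n| Element { name: n.to_string() })
        .collect();

    // Both call styles go through the same trait.
    println!("<p> via unit struct: {}", by_tag(&elements, P).len());
    println!("<a> via &str: {}", by_tag(&elements, "a").len());
    println!("<a> via unit struct: {}", by_tag(&elements, A).len());
}
```

With this shape, a misspelled identifier such as `Pp` fails to compile, while a misspelled string such as `"pp"` silently matches nothing. The cost is one struct per supported tag, which a macro could generate.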

woeful linden
orchid hatch

Hello. Are you looking for contributors on this?

woeful linden

Suggestions are welcome, though. I can't think of anything else I'd really like to add.

woeful linden

Soupy v0.8.0 has been released

  • HTML querying (standards mode or lenient)
  • XML querying
  • Or query your own user-defined format via the Parser trait
  • Querying nested items is now possible via the QueryItem::query() method
  • Regex filters now work as expected
  • Improved testing and documentation
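The nested-querying item can be illustrated on a toy tree: first locate an element, then run a second query scoped to that element's subtree. The sketch below uses a hypothetical `Node` type with hypothetical `attr`, `find_all`, and `query` methods; it is not soupy's `QueryItem::query()` implementation, whose exact signature isn't shown in this thread.

```rust
// Hypothetical toy tree; NOT soupy's types or API.
struct Node {
    name: String,
    attrs: Vec<(String, String)>,
    text: String,
    children: Vec<Node>,
}

fn el(name: &str, attrs: &[(&str, &str)], text: &str, children: Vec<Node>) -> Node {
    Node {
        name: name.to_string(),
        attrs: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        text: text.to_string(),
        children,
    }
}

impl Node {
    /// Value of attribute `key`, if present.
    fn attr(&self, key: &str) -> Option<&str> {
        self.attrs.iter().find(|(k, _)| k == key).map(|(_, v)| v.as_str())
    }

    /// Pre-order search of this node and its descendants.
    fn find_all<'a>(&'a self, pred: &dyn Fn(&Node) -> bool, out: &mut Vec<&'a Node>) {
        if pred(self) {
            out.push(self);
        }
        for child in &self.children {
            child.find_all(pred, out);
        }
    }

    /// Query scoped to this node's subtree; call it on any found node to nest.
    fn query(&self, pred: &dyn Fn(&Node) -> bool) -> Vec<&Node> {
        let mut out = Vec::new();
        self.find_all(pred, &mut out);
        out
    }
}

/// The <body> of example.html (text stored directly on each element).
fn example_body() -> Node {
    el("body", &[], "", vec![
        el("h1", &[], "Hello World!", vec![]),
        el("p", &[], "This is a simple paragraph.", vec![]),
        el("div", &[("class", "parent")], "", vec![
            el("div", &[("class", "child")], "", vec![
                el("div", &[("id", "item")], "", vec![
                    el("p", &[], "Nested item", vec![]),
                    el("a", &[("href", "https://example.com")], "Example Link", vec![]),
                ]),
            ]),
        ]),
    ])
}

fn main() {
    let body = example_body();
    let is_p = |n: &Node| n.name == "p";

    // Document-wide query: both paragraphs.
    let all_p = body.query(&is_p);
    // First find div#item, then query only inside it.
    let item = body.query(&|n: &Node| n.attr("id") == Some("item"))[0];
    let nested_p = item.query(&is_p);

    println!("{} paragraphs in the document", all_p.len());
    println!("{} paragraph(s) inside #item: {:?}", nested_p.len(), nested_p[0].text);
}
```

The same predicate returns two matches from `body` but only one from `item`, because the second query never looks outside the subtree it was called on.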