this post was submitted on 02 Sep 2024
Rust
I would try another HTML5 parser. HTML5 is somewhat of a unification of HTML and XHTML, so getting into the syntax specifics between the two with an XML parser is probably going to be an uphill battle. That said, I'm curious what the first line is; the document could just be malformed entirely.
That's the first line:
I thought it was HTML because everything on the web is HTML. But from the first line I figured out it was XHTML, which should be parsed with an XML parser. What I did not know is that Transitional is a mix that can't be parsed with anything.
Hmm, doctype declarations are sort of like the markup equivalent of headers. Usually parsers read them to know what flavor to expect and then go parse the rest of the page separately. You shouldn't have to do this, but if you chop off that first line and run it through a standard HTML parser it might work fine.
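For what it's worth, here is a rough sketch of the "chop off the first line" idea using only the standard library. `strip_doctype` is a made-up helper name, and the doctype in `main` is just the stock XHTML 1.0 Transitional one used as a stand-in, not necessarily what your file has. It also assumes the declaration has no internal subset (no `[...]` block), since it simply cuts at the first `>`.

```rust
/// Hypothetical helper: returns `input` without a leading `<!DOCTYPE ...>`
/// declaration. If no doctype is found, the input is returned unchanged.
/// Assumption: the declaration contains no internal subset, so the first
/// `>` is taken as its end.
fn strip_doctype(input: &str) -> &str {
    // Ignore a UTF-8 byte-order mark and any leading whitespace.
    let trimmed = input.trim_start_matches('\u{feff}').trim_start();

    // The doctype keyword is case-insensitive in HTML.
    let is_doctype = trimmed
        .get(..9)
        .map_or(false, |head| head.eq_ignore_ascii_case("<!doctype"));
    if !is_doctype {
        return input;
    }

    match trimmed.find('>') {
        // Drop everything up to and including the closing `>`.
        Some(end) => trimmed[end + 1..].trim_start(),
        // Unterminated declaration: leave the input alone.
        None => input,
    }
}

fn main() {
    // Stand-in document for illustration only.
    let page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \
                \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n\
                <html><body><p>hi</p></body></html>";
    let body = strip_doctype(page);
    println!("{body}");
}
```

The returned slice can then be handed to whichever HTML5 parser you are trying (html5ever or the scraper crate, for example). Cutting at the declaration rather than at the first newline avoids losing markup when the doctype and `<html>` share a line.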
That's the first thing I tried, and it still fails somewhere deep in the HTML, where I probably shouldn't just skip a line.