Parsing HTML and then pretty printing it #18

rlaferla · 2016-05-12T22:15:38Z

I'm trying to parse some html text and then pretty print the entire document. I couldn't tell what was the best way to traverse the hierarchy of nodes/elements and wasn't sure how to get the inner html content of a tag. I'm posting here because I think this could improve the documentation for the API.

    let html = "**** put some html text here. ****"

    let doc = try HTMLDocument(string: html, encoding: NSUTF8StringEncoding)

    if let root = doc.root {
        let str = self.dumpElement(root)
        print(str)
    }

    func dumpElement(element:XMLElement) -> String {
        var str = ""

        str = "<\(element.tag!.uppercaseString)"
        for attr in element.attributes {
            str += " \(attr.0)=\(attr.1)"
        }
        str += ">"
        let nodes = element.childNodes(ofTypes: [.Text])
        for node in nodes {
            str += node.stringValue
        }

        for el in element.children {
            str += self.dumpElement(el)
        }
        str += "</\(element.tag!.uppercaseString)>"
        return str
    }

Is this correct?

cezheng · 2016-05-13T08:16:28Z

What do you mean exactly by pretty print?

The code you shared actually does the following things:

Makes each element's tag name uppercase
Ignores all child nodes that are neither text nodes nor elements (which includes CDATA, comments, etc.)
Dumps all text nodes first and the child elements after them, regardless of their original order

So I don't really think it makes sense.
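
To make the three points above concrete, here is a minimal sketch of an order-preserving dump. It deliberately does not use Fuzi: the `Node` enum, `escape` and `prettyPrint` are hypothetical names made up for illustration, standing in for whatever tree the parser hands you. The idea is simply to walk all children in document order, keep tag names as they are, and handle comments and CDATA explicitly.

```swift
import Foundation

// Hypothetical stand-in for a parsed document tree (NOT Fuzi's API).
enum Node {
    case element(tag: String, attributes: [(name: String, value: String)], children: [Node])
    case text(String)
    case comment(String)
    case cdata(String)
}

// Escapes the characters that would otherwise break the markup.
func escape(_ value: String, inAttribute: Bool = false) -> String {
    var result = value
        .replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
    if inAttribute {
        result = result.replacingOccurrences(of: "\"", with: "&quot;")
    }
    return result
}

// Serializes a node with two spaces of indentation per nesting level.
// Children are visited in their original order, whatever their type.
func prettyPrint(_ node: Node, depth: Int = 0) -> String {
    let indent = String(repeating: "  ", count: depth)
    switch node {
    case .text(let content):
        // Whitespace-only text is dropped; other text is trimmed and re-indented.
        let trimmed = content.trimmingCharacters(in: .whitespacesAndNewlines)
        return trimmed.isEmpty ? "" : "\(indent)\(escape(trimmed))\n"
    case .comment(let content):
        return "\(indent)<!--\(content)-->\n"
    case .cdata(let content):
        return "\(indent)<![CDATA[\(content)]]>\n"
    case .element(let tag, let attributes, let children):
        var result = "\(indent)<\(tag)"  // tag name is kept as-is, not uppercased
        for attribute in attributes {
            result += " \(attribute.name)=\"\(escape(attribute.value, inAttribute: true))\""
        }
        if children.isEmpty {
            return result + "></\(tag)>\n"
        }
        result += ">\n"
        for child in children {
            result += prettyPrint(child, depth: depth + 1)
        }
        return result + "\(indent)</\(tag)>\n"
    }
}

// Example: text, element, comment and text again, printed in that order.
let tree = Node.element(tag: "p", attributes: [(name: "class", value: "intro")], children: [
    .text("Hello "),
    .element(tag: "b", attributes: [], children: [.text("world")]),
    .comment(" note "),
    .text("!")
])
print(prettyPrint(tree), terminator: "")
```

Note that trimming and re-indenting text changes whitespace, which matters for inline elements and for tags like pre, so this is only suitable for eyeballing structure while debugging.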

cezheng · 2016-05-13T08:48:42Z

I guess you only want to pretty print some XML while debugging? Try the xmllint command.

hvtor · 2016-09-05T11:59:48Z

How might you create the html String? I'm using a separate class that inherits from NSObject to parse down a URL.

    func httpGet(request: NSURLRequest!, callback: @escaping (String, String?) -> Void) {
        var session = URLSession.shared
        var task = session.dataTask(with: request as URLRequest){
            (data, response, error) -> Void in
            if error != nil {
                callback("error", error?.localizedDescription)
            } else {
                var result = String(data: data!, encoding:
                    String.Encoding(rawValue: String.Encoding.ascii.rawValue))!
                callback(result as String, nil)
            }
        }
        task.resume()
    }

In my ViewController I'm trying this:

    let html = data
    do {
        let doc = try HTMLDocument(string: html, encoding: String.Encoding.utf8)
    } catch {
    }

But I get an error for the 'html' variable: 'use of unresolved identifier.'

How do I set html to the string that comes back from the initial URLRequest set up in my data service?

cezheng · 2016-09-05T12:43:24Z

@hvtor you don't need to create the string if you have NSData. The README states that you can create a document from a String, an NSData (Data in Swift 3), or a [CChar] instance. Having an NSData instance is actually simpler, since you don't have to specify the encoding.

let doc = try HTMLDocument(data: data)
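
In case it helps, here is a rough sketch of how the data service could hand the raw Data to the caller instead of a decoded String. This is only one way to do it and assumes Swift 5 or later for `Result`; `FetchError` and `makeResult` are hypothetical names introduced for the example, and the Fuzi call is left in a comment so the snippet stands on its own.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives in this module on Linux
#endif

// Hypothetical error type for a response that carries no body.
enum FetchError: Error {
    case emptyResponse
}

// Hypothetical helper: folds the completion handler's optionals into one Result.
func makeResult(data: Data?, error: Error?) -> Result<Data, Error> {
    if let error = error {
        return .failure(error)
    }
    guard let data = data, !data.isEmpty else {
        return .failure(FetchError.emptyResponse)
    }
    return .success(data)
}

// Passes the untouched response bytes to the callback, so no encoding
// has to be guessed before the parser sees them.
func httpGet(request: URLRequest, callback: @escaping @Sendable (Result<Data, Error>) -> Void) {
    let task = URLSession.shared.dataTask(with: request) { data, _, error in
        callback(makeResult(data: data, error: error))
    }
    task.resume()
}

// Usage in the view controller (needs `import Fuzi`):
//
// httpGet(request: request) { result in
//     guard case .success(let data) = result else { return }
//     do {
//         let doc = try HTMLDocument(data: data)
//         // work with doc here
//     } catch {
//         print(error)
//     }
// }
```

The 'unresolved identifier' error goes away with this shape because the Data only exists inside the callback, which is also where the document gets created.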

hvtor · 2016-09-05T13:12:16Z

@cezheng Yes. Thank you. :) BTW, great documentation. Just a bit 😴 I guess.

cezheng · 2016-09-05T13:25:40Z

@hvtor haha, it's true. Any suggestions on improving it?

hvtor · 2016-09-05T13:59:01Z

No, I meant I am 😴. It's great documentation. An example webpage would be great too, showing how the elements can be mapped over. I'm not too familiar with

I'm trying to parse an IMDb list and it's a series of anchor tags. It's not clear to me how you select for specific tags.

Walking Dead Stranger Things

goes to read the docs again

Parent. And then the child tags.
