Open
Description
When using as-hiccup to convert parsed HTML documents that contain a <script> tag in the <head>, the <body> element remains as an unconverted JSoup Element object instead of being converted to a hiccup vector.
A comment in core.clj suggests this is a known issue that predates 0.7.1.
With 0.7.1, however, all of the HTML is still converted to hiccup.
hickory/src/clj/hickory/core.clj
Lines 78 to 92 in d94c069
(as-hiccup
  ([this] (trampoline as-hiccup this (hzip/hiccup-zip this)))
  ([this loc]
   ;; There is an issue with the hiccup format, which is that it
   ;; can't quite cover all the pieces of HTML, so anything it
   ;; doesn't cover is thrown into a string containing the raw
   ;; HTML. This presents a problem because it is then never the case
   ;; that a string in a hiccup form should be html-escaped (except
   ;; in an attribute value) when rendering; it should already have
   ;; any escaping. Since the HTML parser quite properly un-escapes
   ;; HTML where it should, we have to go back and un-un-escape it
   ;; wherever text would have been un-escaped. We do this by
   ;; html-escaping the parsed contents of text nodes, and not
   ;; html-escaping comments, data-nodes, and the contents of
   ;; unescapable nodes.
These examples use hickory >0.7.1:
(-> "<html><body><h1>Test</h1></body></html>"
    hickory/parse
    hickory/as-hiccup)
=> ([:html {} [:head {}] [:body {} [:h1 {} "Test"]]])

(-> (str "<!DOCTYPE html><html><head><script src=\"/test.js\"></script></head>"
         "<body><h1>Test</h1></body></html>")
    hickory/parse
    hickory/as-hiccup)
=>
("<!DOCTYPE html>"
 [:html
  {}
  [:head {} [:script {:src "/test.js"}]]
  #object [org.jsoup.nodes.Element 0x1647da75 "<body>\n <h1>Test</h1>\n</body>"]])
Environment:
- hickory version: >0.7.1
Workaround:
Downgrade to hickory 0.7.1
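For anyone who wants to check whether their own documents are affected, here is a minimal sketch of a detector. It is not part of hickory: `unconverted-nodes` is a hypothetical helper name, and it assumes that a fully converted hiccup tree contains only sequences, maps, strings, and keywords. Anything else it finds (such as the JSoup Element shown above) is reported as unconverted.

```clojure
;; Hypothetical helper, not part of hickory.
;; Assumption: valid hiccup output contains only sequential collections,
;; attribute maps, strings, and keywords.
(defn unconverted-nodes
  "Walks the result of as-hiccup and returns every value that is not
  plain hiccup data."
  [hiccup]
  (->> (tree-seq sequential? seq hiccup)   ; descend into lists and vectors only
       (remove #(or (sequential? %)        ; the forms themselves
                    (map? %)               ; attribute maps
                    (string? %)            ; text, doctype, raw HTML strings
                    (keyword? %)))))       ; tag names

(comment
  ;; Usage, assuming hickory is on the classpath:
  (require '[hickory.core :as hickory])

  (-> (str "<!DOCTYPE html><html><head><script src=\"/test.js\"></script></head>"
           "<body><h1>Test</h1></body></html>")
      hickory/parse
      hickory/as-hiccup
      unconverted-nodes))
```

An empty result means the whole document was converted. A non-empty result on the second example above would indicate the behaviour described in this issue.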