
as-hiccup leaves <body> as JSoup Element when <script> is present in <head> #98

@lusiakpurnama

Description

When using as-hiccup to convert parsed HTML documents that contain a <script> tag in the <head>, the <body> element remains as an unconverted JSoup Element object instead of being converted to a hiccup vector.

A comment in core.clj (quoted below) describes a known limitation of the hiccup format, and that limitation predates 0.7.1.
However, with 0.7.1 the entire document, including <body>, is still converted to hiccup.

(as-hiccup
([this] (trampoline as-hiccup this (hzip/hiccup-zip this)))
([this loc]
;; There is an issue with the hiccup format, which is that it
;; can't quite cover all the pieces of HTML, so anything it
;; doesn't cover is thrown into a string containing the raw
;; HTML. This presents a problem because it is then never the case
;; that a string in a hiccup form should be html-escaped (except
;; in an attribute value) when rendering; it should already have
;; any escaping. Since the HTML parser quite properly un-escapes
;; HTML where it should, we have to go back and un-un-escape it
;; wherever text would have been un-escaped. We do this by
;; html-escaping the parsed contents of text nodes, and not
;; html-escaping comments, data-nodes, and the contents of
;; unescapable nodes.

These examples use a hickory version later than 0.7.1:

(-> "<html><body><h1>Test</h1></body></html>"
    hickory/parse
    hickory/as-hiccup)
=> ([:html {} [:head {}] [:body {} [:h1 {} "Test"]]])

(-> (str "<!DOCTYPE html><html><head><script src=\"/test.js\"></script></head>"
         "<body><h1>Test</h1></body></html>")
    hickory/parse
    hickory/as-hiccup)
=>
("<!DOCTYPE html>"
 [:html
  {}
  [:head {} [:script {:src "/test.js"}]]
  #object [org.jsoup.nodes.Element 0x1647da75 "<body>\n <h1>Test</h1>\n</body>"]])
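
To check for this condition programmatically (for example in a regression test), a small helper can walk the hiccup result and collect any JSoup nodes that were left unconverted. This is only a sketch: `unconverted-nodes` and the `repro.unconverted` namespace are hypothetical names, not part of hickory, and it assumes hickory and JSoup are on the classpath.

```clojure
(ns repro.unconverted
  (:require [hickory.core :as hickory])
  (:import (org.jsoup.nodes Node)))

(defn unconverted-nodes
  "Returns every JSoup Node still embedded in a hiccup form.
  Hypothetical helper, not part of hickory."
  [hiccup]
  ;; Vectors and seqs are walked as branches; attribute maps, strings,
  ;; and JSoup objects are treated as leaves.
  (->> (tree-seq sequential? seq hiccup)
       (filter #(instance? Node %))))

(def html
  (str "<!DOCTYPE html><html><head><script src=\"/test.js\"></script></head>"
       "<body><h1>Test</h1></body></html>"))

;; Expected to be empty when the whole document is converted,
;; and to contain the <body> Element on an affected version.
(unconverted-nodes (-> html hickory/parse hickory/as-hiccup))
```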

Environment:

  • hickory version: >0.7.1

Workaround:
Downgrade to hickory 0.7.1
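
For reference, pinning the older release in deps.edn might look like the fragment below. The coordinates are an assumption: I believe 0.7.1 was published on Clojars under the original `hickory/hickory` group rather than `org.clj-commons/hickory`, so please verify this against Clojars before relying on it.

```clojure
;; deps.edn (sketch; coordinates assumed, see note above)
{:deps {hickory/hickory {:mvn/version "0.7.1"}}}
```

If the newer artifact is also pulled in transitively, it may need an exclusion so that both versions do not end up on the classpath.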
