[ovs-dev] dist-docs: Fix bugs in text to HTML conversion.

Message ID: 20190510220243.11945-1-blp@ovn.org
State: Accepted
Commit: cfc06fb13d9c81bc138219ec2b1486090fa76c2f

Commit Message

Ben Pfaff May 10, 2019, 10:02 p.m. UTC
This fixes two bugs.  First, & has a special meaning in the replacement
text for a sed "s" command, so this escapes it.  Second, this code
misprocessed bold or underlined &<>: >^H> would become &gt;^H&gt; which
would display as &gt&gt; in most browsers.

Finally, this improves the HTML output so that bold ABC becomes <b>ABC</b>
instead of <b>A</b><b>B</b><b>C</b>.

Reported-by: Nicolas Bouliane <nbouliane@digitalocean.com>
Reported-at: https://twitter.com/nicboul/status/1126959264772259842
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
 build-aux/dist-docs | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)
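
Both bugs can be reproduced outside the build script. The following is a
minimal sketch, not code from the patch: the function names are made up for
illustration, and the backspace byte is generated with printf rather than
typed literally.

```shell
#!/bin/sh
BS=$(printf '\b')   # backspace, as used by man/groff for overstriking

# Bug 1: an unescaped & in the replacement stands for the matched text.
unescaped() { sed 's/</&lt;/g'; }
escaped()   { sed 's/</\&lt;/g'; }

printf 'a < b\n' | unescaped   # prints: a <lt; b
printf 'a < b\n' | escaped     # prints: a &lt; b

# Bug 2: escaping before bold detection breaks the X^HX pair, because the
# characters on either side of the backspace are now ";" and "&".
escape_then_bold() {
    sed "
s,>,\\&gt;,g
s,\\(.\\)${BS}\\1,<b>\\1</b>,g
"
}

# Make the surviving backspace visible as ^H.
printf '>\b>\n' | escape_then_bold | sed "s,${BS},^H,g"   # prints: &gt;^H&gt;
```

The last command shows that no <b> tag is produced for a bold ">" once the
escaping has already run.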

Comments

Ben Pfaff June 10, 2019, 12:16 a.m. UTC | #1
On Fri, May 10, 2019 at 03:02:43PM -0700, Ben Pfaff wrote:
> This fixes two bugs.  First, & has a special meaning in the replacement
> text for a sed "s" command, so this escapes it.  Second, this code
> misprocessed bold or underlined &<>: >^H> would become &gt;^H&gt; which
> would display as &gt&gt; in most browsers.
> 
> Finally, this improves the HTML output so that bold ABC becomes <b>ABC</b>
> instead of <b>A</b><b>B</b><b>C</b>.
> 
> Reported-by: Nicolas Bouliane <nbouliane@digitalocean.com>
> Reported-at: https://twitter.com/nicboul/status/1126959264772259842
> Signed-off-by: Ben Pfaff <blp@ovn.org>

This still needs a review from someone.

Patch

diff --git a/build-aux/dist-docs b/build-aux/dist-docs
index 9f6ca7b2cbfc..f6b88ca2d04b 100755
--- a/build-aux/dist-docs
+++ b/build-aux/dist-docs
@@ -69,11 +69,29 @@  EOF
     GROFF_NO_SGR=1 man -l -Tutf8 $manpage | sed 's/.^H//g' > $manpage.txt
      (echo '<html><head><meta charset="UTF-8"></head><body><pre>'
       GROFF_NO_SGR=1 man -l -Tutf8 $manpage | sed '
-s/&/&amp;/g
-s/</&lt;/g
-s/>/&gt;/g
-s,\(.\)^H\1,<b>\1</b>,g
-s,_^H\(.\),<u>\1</u>,g'
+# Change bold and underline via backspacing into bracketing with control
+# characters.  We cannot directly translate them to HTML because <> need
+# to be escaped later.  (We cannot escape <> first because bold or
+# underlined escaped characters would be mis-processed.)
+s,\(.\)^H\1,\1,g
+s,_^H\(.\),\1,g
+
+# Drop redundant font changes, to keep from having every character have
+# a separate tag pair.
+s,,,g
+s,,,g
+
+# Escape special characters.
+s,&,\&amp;,g
+s,<,\&lt;,g
+s,>,\&gt;,g
+
+# Translate control characters to HTML.
+s,,<b>,g
+s,,</b>,g
+s,,<u>,g
+s,,</u>,g
+'
       echo '</pre></body></html>'
      ) > $manpage.html
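
The four stages described in the patch comments (mark, merge, escape,
translate) can be tried in isolation. This is a hedged sketch under stated
assumptions: the function name overstrike_to_html and the marker bytes \001
through \004 are chosen here for illustration and are not taken from the
patch; any four bytes that cannot occur in man output would do.

```shell
#!/bin/sh
# Assumed marker bytes (illustrative only, not the ones from the patch).
BS=$(printf '\b')      # backspace, as emitted by man/groff overstriking
B1=$(printf '\001')    # start of bold
B0=$(printf '\002')    # end of bold
U1=$(printf '\003')    # start of underline
U0=$(printf '\004')    # end of underline

overstrike_to_html() {
    # 1. X^HX -> bold marker pair, _^HX -> underline marker pair.
    # 2. Drop an end marker directly followed by the matching start marker,
    #    so adjacent characters share one tag pair.
    # 3. Escape &, < and > (note the \& in the replacement).
    # 4. Turn the markers into HTML tags.
    sed "
s,\\(.\\)${BS}\\1,${B1}\\1${B0},g
s,_${BS}\\(.\\),${U1}\\1${U0},g
s,${B0}${B1},,g
s,${U0}${U1},,g
s,&,\\&amp;,g
s,<,\\&lt;,g
s,>,\\&gt;,g
s,${B1},<b>,g
s,${B0},</b>,g
s,${U1},<u>,g
s,${U0},</u>,g
"
}

printf 'A\bAB\bBC\bC\n' | overstrike_to_html    # prints: <b>ABC</b>
printf '>\b>\n' | overstrike_to_html            # prints: <b>&gt;</b>
printf '_\bx_\by & z\n' | overstrike_to_html    # prints: <u>xy</u> &amp; z
```

Because the markers are single bytes that are neither &, < nor >, the
escaping step in the middle cannot disturb them, which is why the order
mark, merge, escape, translate works where escaping first did not.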