Add a space after every closing tag so that tokens that
would be rendered separately in HTML are also extracted
as separate tokens in the text.
Example:
`<p>First</p><p>Second</p>` --> `First Second`
NOT `FirstSecond`
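
The behaviour described above can be sketched in Python with the standard library's `html.parser`. This is only an illustration, not the code from this commit: the names `SpacedTextExtractor` and `extract_text` are hypothetical, and collapsing whitespace at the end is an assumption made to keep the output tidy.

```python
from html.parser import HTMLParser


class SpacedTextExtractor(HTMLParser):
    """Hypothetical extractor: collects text and adds a space after each closing tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Text nodes are kept as they appear in the document.
        self.parts.append(data)

    def handle_endtag(self, tag):
        # The space keeps text from adjacent elements from fusing into one token.
        self.parts.append(" ")

    def text(self):
        # Assumption: collapse whitespace runs so the added spaces never pile up.
        return " ".join("".join(self.parts).split())


def extract_text(html):
    parser = SpacedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(extract_text("<p>First</p><p>Second</p>"))
```

Because the space follows every closing tag, inline markup such as `<b>Fir</b>st` would also be split into two tokens in this sketch; that is the cost of not tracking which elements are block-level.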
singlefile.html contains many large strings in the form of `data:`
URLs, which would otherwise be stored needlessly in full-text indices.
Large chunks of JavaScript shouldn't be indexed either, as they pollute
the results of searches about JS functions, etc.
This commit takes a blanket approach: it parses singlefile.html as it
is read and outputs only text and selected textual attributes (such as
`alt`) for indexing.
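
A minimal Python sketch of this kind of filtering follows. It is not the implementation in this commit: the class and function names, the choice to skip `script` and `style` elements, and the restriction to the `alt` attribute are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumptions for this sketch, not taken from the commit:
SKIPPED_TAGS = {"script", "style"}  # element content that is never indexed
TEXT_ATTRS = {"alt"}                # attributes whose values count as text


class IndexableTextExtractor(HTMLParser):
    """Hypothetical filter that keeps only text and selected textual attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        for name, value in attrs:
            # Attributes such as src (where data: URLs live) are dropped.
            if name in TEXT_ATTRS and value:
                self.parts.append(value + " ")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        # A space after every closing tag keeps neighbouring tokens apart.
        self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def indexable_text(html):
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


page = (
    '<p>Hello</p><img src="data:image/png;base64,AAAA" alt="A cat">'
    "<script>function f(){}</script><p>World</p>"
)
print(indexable_text(page))
```

In this sketch the `data:` URL never reaches the output because `src` is not in the attribute allowlist, and the JavaScript is dropped because text inside a skipped element is ignored.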