ArchiveBox

mirror of https://github.com/ArchiveBox/ArchiveBox.git synced 2025-05-24 19:54:25 -04:00

Ross Williams 310b4d1242 Add htmltotext extractor: saves HTML text nodes and selected element attributes in `htmltotext.txt` for each Snapshot. Primarily intended to be used for search indexing.	2023-10-23 21:42:32 -04:00
backends	bail out on sonic indexing after 5 errors	2021-04-10 05:18:03 -04:00
__init__.py	refactor: Remove setup_django from search	2020-12-11 16:43:48 -05:00
utils.py	Add htmltotext extractor	2023-10-23 21:42:32 -04:00