ArchiveBox/archivebox/index
Ross Williams 310b4d1242 Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
__init__.py add proper support for URL_WHITELIST instead of using negation regexes 2021-07-06 23:42:00 -04:00
csv.py split up utils into separate files 2019-04-30 23:13:04 -04:00
html.py Add htmltotext extractor 2023-10-23 21:42:32 -04:00
json.py fix extra arg 2021-04-13 02:21:51 -04:00
schema.py Add htmltotext extractor 2023-10-23 21:42:32 -04:00
sql.py rename TAG_SEPARATORS to TAG_SEPARATOR_PATTERN 2022-01-06 14:14:41 +00:00