Nick Sweeting
b3107ab830
move final legacy config to plugins and fix archivebox config cmd and add search opt
2024-10-21 02:56:00 -07:00
Nick Sweeting
18474f452b
move config moved out of legacy files and better version output
2024-09-30 23:52:00 -07:00
Nick Sweeting
363a499289
move util.py into misc folder
2024-09-30 17:25:15 -07:00
Nick Sweeting
3e5b6ddeae
move config into dedicated global app
2024-09-30 15:59:05 -07:00
Nick Sweeting
bb65b2dbec
move almost all config into new archivebox.CONSTANTS
2024-09-25 05:10:09 -07:00
Nick Sweeting
ee5bec6a10
flip link_archive exception throw order so real exception is easier to read at the bottom
2024-09-25 00:39:49 -07:00
Nick Sweeting
c9c163efed
begin migrating search backends to new plugin system
2024-09-24 02:13:01 -07:00
Nick Sweeting
52386d9c16
run all blocking commands in background threads and show nice UI messages as confirmation
2024-09-06 02:54:22 -07:00
Nick Sweeting
cbf2a8fdc3
rename datetime fields to _at, massively improve ABID generation safety and determinism
2024-09-04 23:42:36 -07:00
Nick Sweeting
d0fefc0279
add chunk_size=500 to more iterator calls
2024-08-27 19:28:00 -07:00
Nick Sweeting
9b1659c72f
make created_by_id autoapply to any ArchiveResults created under Snapshot
2024-08-20 19:43:07 -07:00
Nick Sweeting
0420662174
switch everywhere to use Snapshot.pk and ArchiveResult.pk instead of id
2024-05-13 05:12:12 -07:00
Nick Sweeting
457c42bf84
load EXTRACTORS dynamically using importlib.import_module
2024-05-11 22:28:59 -07:00
Nick Sweeting
8b9bc3dec8
minor fixes
2024-02-22 04:50:22 -08:00
Nick Sweeting
6a4e568d1b
new archivebox update speed improvements
2024-02-22 04:50:22 -08:00
Nick Sweeting
f0033f75d0
config.py lint fixes
2023-11-14 02:07:35 -08:00
Nick Sweeting
a680724367
Merge branch 'dev' into search_index_extract_html_text
2023-10-27 23:09:28 -07:00
Ross Williams
310b4d1242
Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
Ross Williams
2076474252
Drop use of TypeAlias to maintain Python 3.9 compat
TypeAlias annotation was introduced in Python 3.10, and is not strictly
necessary. Drop use of it to maintain Python 3.9 compatibility.
2023-08-02 10:56:48 -04:00
Ross Williams
b44f7e68b1
Add URL-specific method allow/deny lists
Allows enabling only allow-listed extractors or disabling specific
deny-listed extractors for a regular expression matched against an added
site's URL.
2023-08-02 09:36:40 -04:00
Sascha Ißbrücker
7bf4f40da0
just use out_dir
2023-05-29 10:03:49 +02:00
Sascha Ißbrücker
40c122515a
fix: make oneshot command return successful exit code
2023-05-29 10:01:27 +02:00
Joseph Turian
07de4a79a1
Merge branch 'dev' into feature/kludge-984-UTF8-bug
2022-12-20 11:39:01 +01:00
Joseph Turian
081a12b079
Add ts
2022-09-12 21:32:47 +00:00
Joseph Turian
daef48e59b
flake8
2022-09-12 21:31:33 +00:00
Joseph Turian
983f485cc0
flake8
2022-09-12 21:29:43 +00:00
Joseph Turian
f5f7aff3b4
Added yt-dlp everywhere
2022-09-12 20:34:02 +00:00
Joseph Turian
2b58cce43f
Attempted to warn on #984 and #1014
2022-09-11 12:19:16 +02:00
papersnake
de8e22efb7
improve title extractor
2022-02-08 23:17:52 +08:00
Nick Sweeting
4715ace7dd
ignore BaseException lgtm errors
2021-05-31 20:59:05 -04:00
Nick Sweeting
62078a77f8
show run duration after each archived link in cli output
2021-04-10 07:52:01 -04:00
Nick Sweeting
a9986f1f05
add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support
2021-04-10 04:21:36 -04:00
Nick Sweeting
084cf7ff51
add more explanation about snapshot.save timestamp bump
2021-02-17 13:34:46 -05:00
Nick Sweeting
c95698e608
bump Snapshot.updated time after each extractor, change extractor order
2021-02-16 15:52:18 -05:00
Dan Arnfield
5420903102
Refactor should_save_extractor methods to accept overwrite parameter
2021-01-21 15:56:32 -06:00
Cristian
275ad22db7
refactor: Remove skip_index from archive related functions
2020-12-08 18:42:25 -05:00
Cristian
f6c73f9aeb
fix: Issue with oneshot command
2020-12-08 18:42:25 -05:00
JDC
7903db6dfb
Add ArchiveResult Manager and sorted indexable filter
2020-12-06 01:13:39 +02:00
JDC
b1f70b2197
Initial implementation
2020-12-06 01:12:45 +02:00
Cristian
33182fd53c
fix: Add missing assignation
2020-11-04 15:07:45 -05:00
Cristian
d064a3eeff
fix: Handle case when update tries to re-add a link that is not in the sql index
2020-11-04 15:02:54 -05:00
Cristian
f292cface2
fix: Add condition for oneshot when archiving links
2020-11-04 14:40:44 -05:00
Cristian
4484491fb7
feat: Create ArchiveResult after finishing an extractor process
2020-11-04 11:22:55 -05:00
Angel Rey
ce71747538
replaced os.path in init extractors
2020-10-02 15:46:39 -05:00
Cristian
7d3767b882
fix: oneshot command not running extractors
2020-09-24 12:56:16 -05:00
Angel Rey
852e3c9cff
Added headers extractor
2020-09-23 11:07:00 -05:00
ttimasdf
357b677363
fix: add mercury-parser to extractors list
2020-09-22 18:44:12 -05:00
Cristian
b18bbf8874
test: Fix tests post-rebase
2020-09-17 09:09:52 -05:00
Cristian
50f3f16203
lint: Remove unused import
2020-09-15 08:05:46 -05:00
Cristian
0a83392cbf
fix: Replace any typing with Union[Iterable[Link], QuerySet] in archive_links
2020-09-15 08:05:46 -05:00