Nick Sweeting
4a5d607296
move logging_util into archivebox.misc subfolder
2024-11-18 19:08:49 -08:00
Nick Sweeting
de2ab43f7f
switch .is_dir and .exists for os.access to avoid PermissionError on startup
2024-10-08 03:02:34 -07:00
Nick Sweeting
cf1ea8f80f
improve config loading of TMP_DIR, LIB_DIR, move to separate files
2024-10-07 23:45:11 -07:00
Nick Sweeting
b913e6f426
rename OUTPUT_DIR to DATA_DIR
2024-09-30 17:44:18 -07:00
Nick Sweeting
363a499289
move util.py into misc folder
2024-09-30 17:25:15 -07:00
Nick Sweeting
dfca4b13b2
move system.py into misc folder
2024-09-30 17:13:55 -07:00
Nick Sweeting
3e5b6ddeae
move config into dedicated global app
2024-09-30 15:59:05 -07:00
Nick Sweeting
6a6ae7468e
fix lint errors
2024-04-25 21:36:11 -07:00
Nick Sweeting
beb3932d80
replace uses of URL_REGEX with find_all_urls to handle markdown better
2024-04-24 17:45:45 -07:00
jim winstead
5478d13d52
Add generic_jsonl parser
...
Resolves #1369
2024-03-14 15:42:29 -07:00
Nick Sweeting
16d278fbdb
Merge pull request #1168 from mAAdhaTTah/add-readwise-reader
2023-09-03 21:24:49 -07:00
Ross Williams
c039ef05b3
Fix hyphen placement in util.URL_REGEX
...
Incorrect hyphen placement in `URL_REGEX` was allowing it to match more
characters than intended. In a regex character class, an unescaped literal
hyphen must appear as the first (or last) character in the class, or it will
be interpreted as the delimiter of a range of characters.
The issue fixed here caused the range of characters from `[$-_]` to
be treated as valid URL characters, instead of the intended set of three
characters `[-_$]`. The incorrect range interpretation inadvertently
included most ASCII punctuation, most importantly the angle brackets,
square brackets, and single quote that the expression uses
to mark the end of a match.
This causes the expression to match a URL that has a "hostname" portion
beginning with one of the intended "stop parsing" characters. For
example:
```
https://<b>www</b>.example.com/ # MATCHES but should not
https://[for example] # MATCHES but should not
scheme='https://' # MATCHES, including final quote, but should not
```
Some test cases have been added to the `URL_REGEX` assert in
archivebox.parsers to cover this possibility.
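The range behaviour described above can be checked in isolation. The following is a minimal sketch using two simplified character classes, not the real `URL_REGEX`; the names `BUGGY_CLASS`, `FIXED_CLASS`, and `STOP_CHARS` are made up for illustration.

```python
import re

# Simplified stand-ins for the relevant part of the pattern (assumption:
# the real URL_REGEX is larger; only the character class matters here).
BUGGY_CLASS = re.compile(r'[$-_]')   # '-' sits between '$' and '_': a range 0x24..0x5F
FIXED_CLASS = re.compile(r'[-_$]')   # leading '-' is literal: exactly '-', '_', '$'

# Characters the commit message says are meant to end a URL match.
STOP_CHARS = ['<', '>', '[', ']', "'"]

# Collect the stop characters that each class accepts.
buggy_hits = [c for c in STOP_CHARS if BUGGY_CLASS.fullmatch(c)]
fixed_hits = [c for c in STOP_CHARS if FIXED_CLASS.fullmatch(c)]

print("buggy class accepts:", buggy_hits)
print("fixed class accepts:", fixed_hits)
```

Every stop character falls inside the `$`..`_` code point range, so the buggy class accepts all of them, while the fixed class accepts none.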
2023-08-08 15:24:16 -04:00
mAAdhaTTah
181501fd36
Add Readwise Reader API parser
...
Implemented similar to the Pocket API.
2023-07-02 11:20:58 -04:00
SnZ
2db830c6a8
Method typo?
...
Fixes '[Errno 2] No such file or directory' error during add
2022-11-20 01:51:16 +01:00
Nick Sweeting
c5fc3e1e65
--ammend
2022-05-09 23:59:27 -07:00
Nick Sweeting
a6767671fb
append content of referenced files to imports
2022-05-09 21:21:39 -07:00
Nick Sweeting
f6d6a06c78
always show all totals in log output
2022-05-09 21:21:26 -07:00
Nick Sweeting
38e54b93fe
allow parsing to continue even when fetching URL contents fails
2022-05-09 19:56:24 -07:00
Nick Sweeting
a9986f1f05
add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support
2021-04-10 04:21:36 -04:00
Nick Sweeting
5d3a03b299
use stderr and hint in case of parser returning no urls instead of bare exception
2021-03-31 01:39:01 -04:00
Nick Sweeting
8ce93ff787
use KEY, NAME, and PARSER to define parsers instead of hardcoding in init
2021-03-31 01:05:49 -04:00
Nick Sweeting
36f0646501
Merge pull request #669 from FliegendeWurst/fix-issue-235
...
add command: --parser option (fixes #235)
2021-03-31 00:53:47 -04:00
FliegendeWurst
60bd9a902e
add command: --parser option
2021-03-28 10:09:11 +02:00
Nick Sweeting
5fb9ca389f
check more url parsing invariants on startup
2021-03-27 03:57:22 -04:00
mAAdhaTTah
ac7ad9e942
Add parser for Pocket API
...
Pass a url like `pocket://Username` to import that username's archived Pocket
library. Tokens need to be stored in ArchiveBox.conf with the following keys:
```
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
```
`POCKET_ACCESS_TOKENS` MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.
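The single-line requirement can be illustrated with a short sketch. It assumes the config file is read by an INI-style parser that behaves like Python's `configparser`; the `[EXAMPLE]` section name and the variable names are hypothetical and not taken from the project.

```python
import configparser
import json

def load(text: str) -> configparser.SectionProxy:
    """Parse INI text and return the (hypothetical) EXAMPLE section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case as written
    parser.read_string(text)
    return parser["EXAMPLE"]

# JSON value kept on one line: read back as a single option.
ok = load(
    "[EXAMPLE]\n"
    "POCKET_CONSUMER_KEY = key-from-custom-pocket-app\n"
    'POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}\n'
)
tokens = json.loads(ok["POCKET_ACCESS_TOKENS"])

# JSON value split across lines without indentation: the second line
# contains ':' and is therefore read as a separate key/value pair.
broken = load(
    "[EXAMPLE]\n"
    "POCKET_ACCESS_TOKENS = {\n"
    '"YourUsername": "pocket-token-for-app"}\n'
)

print(tokens)
print(dict(broken))
```

In the second case the `POCKET_ACCESS_TOKENS` value is just `{`, and the JSON entry ends up as its own option, so the tokens can no longer be decoded.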
2020-12-04 22:54:39 -05:00
Emmanuel Hainry
aebc83659d
Add parser for Wallabag Atom feeds
2020-10-18 11:20:07 +02:00
Angel Rey
2c62abb270
Replaced os.path in init parsers
2020-10-02 15:46:39 -05:00
apkallum
594d9e49ce
first attempt to migrate to Pathlib
2020-09-17 09:09:52 -05:00
Nick Sweeting
15efb2d5ed
new generic_html parser for extracting hrefs
2020-08-18 08:29:05 -04:00
Nick Sweeting
e3ac4c2405
htmldecode downloaded sources before parsing for links
2020-08-18 08:23:20 -04:00
Cristian
c073ea141d
feat: Initial oneshot command proposal
2020-07-29 11:19:06 -05:00
Cristian
6006b4f93b
refactor: Organize code to remove flake8 issues
2020-07-24 12:25:25 -05:00
Cristian
a5550b2105
fix: Rename logging folder to avoid naming conflicts (and circular import issues)
2020-07-22 11:02:13 -05:00
Cristian
f4d1b5121e
refactor: Move logging.py to main module to avoid circular import issues
2020-07-17 18:00:04 -05:00
Nick Sweeting
d3bfa98a91
fix depth flag and tweak logging
2020-07-13 11:26:34 -04:00
Nick Sweeting
cb67b09f9d
Merge branch 'master' into django
2020-06-25 21:30:29 -04:00
Nick Sweeting
204de37eb9
fix parsing errors for older archive index formats
2019-05-01 02:28:48 -04:00
Nick Sweeting
95007d9137
split up utils into separate files
2019-04-30 23:13:04 -04:00
Nick Sweeting
1b8abc0961
move everything out of legacy folder
2019-04-27 17:26:24 -04:00