mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2025-05-27 05:04:13 -04:00
Add script to remove entries from index
parent
3ef4fa2387
commit
75c062f33e
3 changed files with 60 additions and 1 deletions
|
@@ -388,9 +388,13 @@ Open an [issue](https://github.com/pirate/bookmark-archiver/issues) with a descr
|
|||
**Lots of broken links from the index:**
|
||||
|
||||
Not all sites can be effectively archived with each method, which is why it's best to use a combination of `wget`, PDFs, and screenshots.
|
||||
If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/bookmark-archiver/issues)
with some of the URLs that failed to be archived and I'll investigate.
|
||||
|
||||
**Removing unwanted links from the index:**
|
||||
|
||||
If you accidentally added lots of unwanted links to the index and they slow down your archiving, you can use the `bin/purge` script to remove them. It removes every link that matches any of the Python regexes you pass to it, e.g. `bin/purge -r 'amazon\.com' -r 'google\.com'`. It prompts before removing links from the index, but for extra safety you might want to back up `index.json` first (or put it under version control).
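To preview which links a set of patterns would hit before running the script, the matching step can be sketched roughly as below. This is only an illustration, not the actual `bin/purge` code: the `split_links` helper, the sample entries, and the assumption that each link is a dict with a `url` key are all hypothetical.

```python
import re


def split_links(links, patterns):
    """Split links into (kept, removed); a link is removed if its URL matches any pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    kept, removed = [], []
    for link in links:
        # re.search matches anywhere in the URL, not only at the start
        if any(regex.search(link["url"]) for regex in compiled):
            removed.append(link)
        else:
            kept.append(link)
    return kept, removed


# Hypothetical stand-in for the entries stored in index.json
links = [
    {"url": "https://www.amazon.com/dp/123"},
    {"url": "https://example.org/post"},
    {"url": "https://google.com/search?q=x"},
]

kept, removed = split_links(links, [r"amazon\.com", r"google\.com"])
for link in removed:
    print("would remove:", link["url"])
```

Escaping the dot (`amazon\.com`) matters: an unescaped `.` matches any character, so a looser pattern could remove more links than intended.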
|
||||
|
||||
### Hosting the Archive
|
||||
|
||||
If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL.