Update README.md

Nick Sweeting 2017-05-29 19:05:20 -05:00 committed by GitHub
parent 07db61bf4c
commit 21b28d392c

@ -37,20 +37,20 @@ google-chrome --version && which wget && which python3 && echo "[√] All depend
 **2. Run the archive script:**
-1. Download your export file e.g. `ril_export.html` from https://getpocket.com/export
-2. Clone the repo `git clone https://github.com/pirate/pocket-archive-stream`
+1. Get your HTML export file from [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Chrome Bookmarks](https://support.google.com/chrome/answer/96816?hl=en), [Firefox Bookmarks](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), or [Safari Bookmarks](http://i.imgur.com/AtcvUZA.png)
+2. Clone this repo `git clone https://github.com/pirate/pocket-archive-stream`
 3. `cd pocket-archive-stream/`
-4. `./archive.py ~/Downloads/ril_export.html [pocket|pinboard|bookmarks]`
-It produces a folder `pocket/` containing an `index.html`, and archived copies of all the sites,
-organized by timestamp. For each sites it saves:
+4. `./archive.py ~/Downloads/exported_file.html [pocket|pinboard|chrome]`
+It produces a folder `archive/` containing an `index.html`, and archived copies of all the sites,
+organized by starred timestamp. For each sites it saves:
 - wget of site, e.g. `en.wikipedia.org/wiki/Example.html` with .html appended if not present
 - `sreenshot.png` 1440x900 screenshot of site using headless chrome
 - `output.pdf` Printed PDF of site using headless chrome
 - `archive.org.txt` A link to the saved site on archive.org
-You can tweak parameters like screenshot size, file paths, timeouts, etc. in `archive.py`.
+You can tweak parameters like screenshot size, file paths, timeouts, dependencies, at the top of `archive.py`.
 You can also tweak the outputted html index in `index_template.html`. It just uses python
 format strings (not a proper templating engine like jinja2), which is why the CSS is double-bracketed `{{...}}`.
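The README's note about double-bracketed CSS comes down to how Python's `str.format` treats braces: `{{` and `}}` are emitted as literal `{` and `}`, while single-brace fields such as `{title}` are substituted. A minimal sketch of that behavior; the template text and the `render` helper are made up for illustration and are not taken from the repo's `index_template.html`:

```python
# Hypothetical template: CSS braces are doubled so str.format leaves them alone,
# while {title} and {count} are replacement fields.
TEMPLATE = """<style>
body {{ font-family: sans-serif; }}
</style>
<h1>{title}</h1>
<p>{count} links archived</p>
"""


def render(title: str, count: int) -> str:
    # "{{" -> "{" and "}}" -> "}"; single-brace fields are filled in.
    return TEMPLATE.format(title=title, count=count)


if __name__ == "__main__":
    print(render("Archive", 3))
```

Forgetting to double a CSS brace makes `str.format` try to parse the CSS as a replacement field, which typically raises `KeyError` or `ValueError`; that is the trade-off of using plain format strings instead of a templating engine like jinja2.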
@ -80,14 +80,14 @@ will run fast subsequent times because it only downloads new links that haven't
 ## Publishing Your Archive
 The archive is suitable for serving on your personal server, you can upload the
-archive to `/var/www/pocket` (or pinboard) and allow people to access your saved copies of sites.
+archive to `/var/www/archive` and allow people to access your saved copies of sites.
 Just stick this in your nginx config to properly serve the wget-archived sites:
 ```nginx
-location /pocket/ {
-    alias /var/www/pocket/;
+location /archive/ {
+    alias /var/www/archive/;
     index index.html;
     autoindex on;
     try_files $uri $uri/ $uri.html =404;
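The `try_files $uri $uri/ $uri.html =404;` line is what lets extensionless URLs reach pages that wget saved with `.html` appended. The function below is a rough, hypothetical model of that lookup order, written in Python for illustration only. It is not nginx, and it ignores details such as the redirect nginx issues for directory URIs that lack a trailing slash. The `resolve` name and the directory layout in the usage example are assumptions:

```python
import os
import tempfile
from typing import Optional


def resolve(root: str, uri: str) -> Optional[str]:
    """Rough model of `try_files $uri $uri/ $uri.html =404;` plus `index index.html;`.

    Returns the filesystem path that would be served (a directory path stands in
    for an `autoindex` listing), or None where nginx would answer 404.
    """
    path = os.path.join(root, uri.lstrip("/"))
    # $uri: an exact file match
    if os.path.isfile(path):
        return path
    # $uri/: a directory; serve its index.html, otherwise autoindex lists it
    if os.path.isdir(path):
        index = os.path.join(path, "index.html")
        return index if os.path.isfile(index) else path
    # $uri.html: the copy wget saved with .html appended
    if os.path.isfile(path + ".html"):
        return path + ".html"
    # =404
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: <timestamp>/<domain>/<path>.html
        wiki = os.path.join(root, "1493350273", "en.wikipedia.org", "wiki")
        os.makedirs(wiki)
        open(os.path.join(wiki, "Example.html"), "w").close()
        print(resolve(root, "/1493350273/en.wikipedia.org/wiki/Example"))
        print(resolve(root, "/1493350273/en.wikipedia.org/wiki/Missing"))
```

In this model the first lookup finds `Example.html` through the `$uri.html` fallback, and the second falls through every check to the `=404` case.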
@ -96,7 +96,7 @@ location /pocket/ {
 Make sure you're not running any content as CGI or PHP, you only want to serve static files!
-Urls look like: `https://sweeting.me/pocket/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem`
+Urls look like: `https://sweeting.me/archive/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem`
 ## Info