diff --git a/README.md b/README.md
index aeb07aa3..44dd7096 100644
--- a/README.md
+++ b/README.md
@@ -23,39 +23,28 @@ curl -sSL 'https://get.archivebox.io' | sh # (or see pip/brew/Docker instruct
Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a free central archive, but they require all archives to be public, and they can't save every type of content.
-*ArchiveBox is an open source tool that helps you archive web content on your own (or privately within an organization): save copies of browser bookmarks, preserve evidence for legal cases, backup photos from FB / Insta / Flickr, download your media from YT / Soundcloud / etc., snapshot research papers & academic citations, and more...*
+*ArchiveBox is an open source tool that helps organizations and individuals archive web content and retain control over their data: save copies of browser bookmarks, preserve evidence for legal cases, backup photos from FB / Insta / Flickr, download your media from YT / Soundcloud / etc., snapshot research papers & academic citations, and more...*
-> ➡️ *Use ArchiveBox as a [command-line package](#quickstart) and/or [self-hosted web app](#quickstart) on Linux, macOS, or in [Docker](#quickstart).*
+> ➡️ *Use ArchiveBox on [Linux](#quickstart)/[macOS](#quickstart)/[Windows](#quickstart)/[Docker](#quickstart) as a [CLI tool](#usage), [self-hosted Web App](https://github.com/ArchiveBox/ArchiveBox/wiki/Publishing-Your-Archive), [`pip` library](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#python-shell-usage), or [one-off command](#static-archive-exporting).*
mkdir ~/archivebox; cd ~/archivebox # create a dir somewhere for your archivebox data
-# Get ArchiveBox with Docker Compose (recommended):
+# Option A: Get ArchiveBox with Docker Compose (recommended):
curl -sSL 'https://docker-compose.archivebox.io' > docker-compose.yml # edit options in this file as-needed
docker compose run archivebox init --setup
# docker compose run archivebox add 'https://example.com'
@@ -86,14 +75,14 @@ docker compose run archivebox init --setup
# docker compose up
-# Or use it as a plain Docker container:
+# Option B: Or use it as a plain Docker container:
docker run -it -v $PWD:/data archivebox/archivebox init --setup
# docker run -it -v $PWD:/data archivebox/archivebox add 'https://example.com'
# docker run -it -v $PWD:/data archivebox/archivebox help
# docker run -it -v $PWD:/data -p 8000:8000 archivebox/archivebox
-# Or install it with your preferred pkg manager (see Quickstart below for apt, brew, and more)
+# Option C: Or install it with your preferred pkg manager (see Quickstart below for apt, brew, and more)
pip install archivebox
archivebox init --setup
# archivebox add 'https://example.com'
@@ -101,14 +90,14 @@ archivebox init --setup
# archivebox server 0.0.0.0:8000
-# Or use the optional auto setup script to install it
+# Option D: Or use the optional auto setup script to install it
curl -sSL 'https://get.archivebox.io' | sh
+
+http://localhost:8000 to see your server's Web UI ➡️
http://localhost:8000 to see your server's Web UI ➡️
-
-
docker-compose.yml file into a new empty directory (can be anywhere).
mkdir ~/archivebox && cd ~/archivebox
-curl -O 'https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/dev/docker-compose.yml'
+# Read and edit docker-compose.yml options as-needed after downloading
+curl -sSL 'https://docker-compose.archivebox.io' > docker-compose.yml
docker compose run archivebox init --setup
docker run -v $PWD:/data -p 8000:8000 archivebox/archivebox
# completely optional, CLI can always be used without running a server
# docker run -v $PWD:/data -it [subcommand] [--args]
+docker run -v $PWD:/data -it archivebox/archivebox help
pip3.
+pip3 (or pipx).
pip3 install archivebox
archivebox server 0.0.0.0:8000
# completely optional, CLI can always be used without running a server
# archivebox [subcommand] [--args]
+archivebox help
pip-archive
-apt (Ubuntu/Debian)
+apt (Ubuntu/Debian/etc.)
- Add the ArchiveBox repository to your sources.
@@ -286,6 +291,7 @@ archivebox init --setup # if any problems, install with pip instead
archivebox server 0.0.0.0:8000
# completely optional, CLI can always be used without running a server
# archivebox [subcommand] [--args]
+archivebox help
@@ -296,7 +302,7 @@ See the debian-a
-brew (macOS)
+brew (macOS only)
- Install Homebrew on your system (if not already installed).
@@ -314,6 +320,7 @@ archivebox init --setup # if any problems, install with pip instead
archivebox server 0.0.0.0:8000
# completely optional, CLI can always be used without running a server
# archivebox [subcommand] [--args]
+archivebox help
@@ -435,7 +442,7 @@ For more discussion on managed and paid hosting options see here:
-sqlite3 ./index.sqlite3 # run SQL queries on your index
-archivebox shell # explore the Python API in a REPL
-ls ./archive/*/index.html # or inspect snapshots on the filesystem
+archivebox shell # explore the Python library API in a REPL
+sqlite3 ./index.sqlite3 # run SQL queries directly on your index
+ls ./archive/*/index.html # or inspect snapshot data directly on the filesystem
@@ -525,12 +536,16 @@ docker run -v $PWD:/data -it archivebox/archivebox archivebox manage createsuper
docker run -v $PWD:/data -it -p 8000:8000 archivebox/archivebox
-http://localhost:8000 to see your server's Web UI ➡️
+
archivebox config --set PUBLIC_ADD_VIEW=True # allow guests to submit URLs
archivebox config --set PUBLIC_SNAPSHOTS=True # allow guests to see snapshot content
archivebox config --set PUBLIC_INDEX=True # allow guests to see list of all snapshots
+# or
+docker compose run archivebox config --set ...
# restart the server to apply any config changes
@@ -697,11 +712,14 @@ CURL_USER_AGENT="Mozilla/5.0 ..."
## Dependencies
-To achieve high-fidelity archives in as many situations as possible, ArchiveBox depends on a variety of 3rd-party tools that specialize in extracting different types of content.
+To achieve high-fidelity archives in as many situations as possible, ArchiveBox depends on a variety of 3rd-party libraries and tools that specialize in extracting different types of content.
+
+> Under the hood, ArchiveBox uses [Django](https://www.djangoproject.com/start/overview/) to power its [Web UI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#ui-usage) and [SQLite](https://www.sqlite.org/locrsf.html) + the filesystem to provide [fast & durable metadata storage](https://www.sqlite.org/locrsf.html) w/ [deterministic upgrades](https://stackoverflow.com/a/39976321/2156113). ArchiveBox bundles industry-standard tools like [Google Chrome](https://github.com/ArchiveBox/ArchiveBox/wiki/Chromium-Install), [`wget`, `yt-dlp`, `readability`, etc.](#dependencies) internally, and its operation can be [tuned, secured, and extended](https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration) as-needed for many different applications.
+
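
Because the index is a plain SQLite file and snapshots are ordinary folders on disk, both can be inspected without ArchiveBox itself. The following is a minimal sketch using only the Python standard library, not part of ArchiveBox's own API. The helper names `list_tables` and `count_snapshot_pages` are hypothetical. The sketch makes no assumption about which tables the index contains, and it relies only on the `index.sqlite3` and `archive/*/index.html` paths shown in the usage examples earlier in this diff.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(index_path):
    """Return the names of all tables in the SQLite file at index_path."""
    # Open the file read-only via a URI so inspection can never modify the index.
    uri = Path(index_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def count_snapshot_pages(data_dir):
    """Count files matching archive/*/index.html under data_dir."""
    # Mirrors the `ls ./archive/*/index.html` command from the usage section.
    return sum(1 for _ in Path(data_dir).glob("archive/*/index.html"))


if __name__ == "__main__":
    # Assumption: the current directory is an initialized ArchiveBox data folder.
    data_dir = Path(".")
    index = data_dir / "index.sqlite3"
    if index.exists():
        print("tables:", list_tables(index))
    print("snapshot pages:", count_snapshot_pages(data_dir))
```

Run it from inside the data directory (for example `~/archivebox`). Opening the database in read-only mode is a deliberate choice here, so that a running server cannot be affected by the inspection.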