docs: restructure documentation

This rewrite follows the principles of https://diataxis.fr/

Co-authored-by: Erik Michelson <github@erik.michelson.eu>
Signed-off-by: Philip Molares <philip.molares@udo.edu>
Signed-off-by: Erik Michelson <github@erik.michelson.eu>
Philip Molares 2023-07-02 22:31:04 +02:00 committed by Tilman Vatteroth
parent e0dd24ed29
commit e07cd62596
68 changed files with 1163 additions and 315 deletions

@ -0,0 +1,64 @@
# API Authentication
!!! info "Design Document"
This is a design document, explaining the design and vision for a HedgeDoc 2
feature. It is not a user guide and may or may not be fully implemented.
## Public API
All requests to the public API require authentication using a [bearer token][bearer-token].
This token can be generated using the profile page in the frontend
(which in turn uses the private API to generate the token).
### Token generation
When a new token is requested via the private API, the backend generates a 64-byte secret of
cryptographically secure random data and returns it as a base64url-encoded string,
along with an identifier. That string can then be used by clients as a bearer token.
A SHA-512 hash of the secret is stored in the database. To validate tokens, the backend computes
the hash of the provided secret and checks it against the stored hash for the provided identifier.
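
The flow described above can be sketched with Node's built-in `crypto` module. This is a minimal sketch under stated assumptions, not the HedgeDoc implementation: the names `StoredToken`, `createToken` and `validateToken`, the length of the identifier, the hex encoding of the hash, and the choice to hash the base64url string rather than the raw bytes are all illustrative.

```typescript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto';

// Hypothetical shape of what the backend persists for one token.
interface StoredToken {
  keyId: string; // identifier used to look up the hash
  hash: string; // hex-encoded SHA-512 hash of the secret
}

// Hash a secret with SHA-512 and return the digest as a hex string.
function hashSecret(secret: string): string {
  return createHash('sha512').update(secret).digest('hex');
}

// Create a new token. The secret is handed to the client once,
// only the hash is kept for storage.
function createToken(): { secret: string; stored: StoredToken } {
  // 64 bytes of cryptographically secure random data, base64url-encoded.
  const secret = randomBytes(64).toString('base64url');
  // Assumption: the identifier is random as well; its length is arbitrary here.
  const keyId = randomBytes(8).toString('base64url');
  return { secret, stored: { keyId, hash: hashSecret(secret) } };
}

// Recompute the hash of the presented secret and compare it with the stored one.
function validateToken(presentedSecret: string, stored: StoredToken): boolean {
  const presented = Buffer.from(hashSecret(presentedSecret), 'hex');
  const expected = Buffer.from(stored.hash, 'hex');
  // timingSafeEqual requires equal lengths, so check that first.
  return presented.length === expected.length && timingSafeEqual(presented, expected);
}
```

The comparison uses `timingSafeEqual` as a defensive habit; the text above does not require it.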
#### Choosing a hash function
Unfortunately, there does not seem to be any explicit documentation about our exact use-case.
Most docs describe classic password-saving scenarios and recommend bcrypt, scrypt or argon2.
These hash functions are deliberately slow to hinder brute-force or dictionary attacks, which could
expose the original, user-provided password that may have been reused across multiple services.
We have a very different scenario:
Our API tokens are 64 bytes of cryptographically strong pseudorandom data.
Brute-force or dictionary attacks are therefore virtually impossible, and tokens are not
reused across multiple services.
We therefore only need to guard against one scenario:
An attacker gains read-only access to the database. Saving only hashes in the database prevents the
attacker from authenticating themselves as a user. The hash function does not need to be slow,
as the randomness of the original token makes it infeasible to guess an input that matches a
stored hash. The function actually needs to
be reasonably fast, as the hash must be computed on every request to the public API.
SHA-512 (or alternatively SHA-3) fits this use case.
## Private API
The private API uses a session cookie to authenticate the user.
Sessions are handled using [passport.js](https://www.passportjs.org/).
The backend hands out a new session token after the user has successfully authenticated
using one of the supported authentication methods:
- Username & Password (`local`)
- LDAP
- SAML
- OAuth2
- GitLab
- GitHub
- Facebook
- Twitter
- Dropbox
- Google
The `SessionGuard`, which is added to each (appropriate) controller method of the private API,
checks if the provided session is still valid and provides the controller method
with the correct user.
[bearer-token]: https://datatracker.ietf.org/doc/html/rfc6750

@ -0,0 +1,101 @@
# Config
!!! info "Design Document"
This is a design document, explaining the design and vision for a HedgeDoc 2
feature. It is not a user guide and may or may not be fully implemented.
The configuration of HedgeDoc 2 is handled entirely by environment variables.
Most of these variables are prefixed with `HD_` (for HedgeDoc).
NestJS, the framework we use, reads the variables from the environment and from
the `.env` file in the root of the project.
## How the config code works
The config of HedgeDoc is split up into **nine** different modules:
`app.config.ts`
: General configuration of the app
`auth.config.ts`
: Which authentication providers are available and which options are set
`csp.config.ts`
: Configuration for [Content Security Policy][csp]
`customization.config.ts`
: Config to customize the instance and set instance specific links
`database.config.ts`
: Which database should be used
`external-services.config.ts`
: Which external services are activated and where they can be reached
`hsts.config.ts`
: Configuration for [HTTP Strict-Transport-Security][hsts]
`media.config.ts`
: Where media files are being stored
`note.config.ts`
: Configuration for notes
Each of those files (except `auth.config.ts` which is discussed later) consists of three parts:
1. An interface
2. A Joi schema
3. A default export
### Interface
The interface just describes which options the configuration has and how the rest of HedgeDoc can
use them. All enums that are used in here are put in their own files with the extension `.enum.ts`.
### Joi Schema
We use [Joi][joi] to validate each provided configuration. This ensures that the user's
configuration is sound and produces helpful error messages if it is not.
The most important part here is that each value ends with `.label()`. This names the
environment variable that corresponds to each config option. It's very important that each config
option is assigned the correct label to have meaningful error messages that benefit the user.
Everything else about how Joi works and how you should write schemas can
be read in [their documentation][joi-doc].
### A default export
The default exports are used by NestJS to provide the values to the rest of the application.
We mostly do four things here:
1. Populate the config interface with environment variables, creating the config object.
2. Validate the config object against the Joi schema.
3. Polish the error messages from Joi and present them to the user (if any occur).
4. Return the validated config object.
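
The three parts and the four steps can be sketched without any dependencies. This is only an illustration under stated assumptions: the option names, the variables `HD_EXAMPLE_PORT` and `HD_EXAMPLE_BASE_URL`, and the function names are invented, and a hand-written check stands in for the Joi schema so that the sketch stays self-contained.

```typescript
// Part 1: the interface the rest of the application relies on.
interface ExampleConfig {
  port: number;
  baseUrl: string;
}

// Part 2: a hand-written check standing in for the Joi schema.
// Each message names the environment variable, which is what `.label()` achieves.
function validateExampleConfig(candidate: { port: number; baseUrl: string | undefined }): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(candidate.port) || candidate.port < 1 || candidate.port > 65535) {
    errors.push('"HD_EXAMPLE_PORT" must be an integer between 1 and 65535');
  }
  if (!candidate.baseUrl) {
    errors.push('"HD_EXAMPLE_BASE_URL" is required');
  }
  return errors;
}

// Part 3: the function a default export would wrap.
function loadExampleConfig(env: Record<string, string | undefined>): ExampleConfig {
  // Step 1: populate the config object from environment variables.
  const candidate = {
    port: Number(env['HD_EXAMPLE_PORT'] ?? '3000'),
    baseUrl: env['HD_EXAMPLE_BASE_URL'],
  };
  // Step 2: validate the config object.
  const errors = validateExampleConfig(candidate);
  // Step 3: present readable error messages to the user, if any occurred.
  if (errors.length > 0) {
    throw new Error(`Invalid configuration:\n - ${errors.join('\n - ')}`);
  }
  // Step 4: return the validated config object.
  return { port: candidate.port, baseUrl: candidate.baseUrl as string };
}
```

Passing the environment as a parameter, instead of reading `process.env` directly, keeps the sketch easy to exercise in isolation.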
## How `auth.config.ts` works
Because it's possible to configure some authentication providers multiple times
(e.g. multiple LDAPs or GitLabs), we use user-defined environment variable names.
With the user-defined names it's not possible to put the correct labels in the schema
or build the config objects as we do in every other file.
Therefore, we have two big extra steps in the default export:
1. To populate the config object we have some code at the top of the default export to gather all
configured variables into arrays.
2. The error messages are piped into the util method `replaceAuthErrorsWithEnvironmentVariables`.
This replaces paths of the form `gitlab[0].providerName` in the error messages
with `HD_AUTH_GITLAB_<nameOfFirstGitlab>_PROVIDER_NAME`. For this, the util function receives
the error, the name of the config option (e.g. `'gitlab'`), the appropriate prefix
(e.g. `'HD_AUTH_GITLAB_'`), and an array of the user-defined names.
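
The replacement in step 2 can be illustrated with a small, hypothetical re-implementation. It is not the HedgeDoc source: the function names are invented, upper-casing the user-defined name is an assumption, and the config option name is assumed to contain no regular expression metacharacters.

```typescript
// Convert a camelCase option name to UPPER_SNAKE_CASE,
// for example providerName becomes PROVIDER_NAME.
function toUpperSnakeCase(optionName: string): string {
  return optionName.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toUpperCase();
}

// Hypothetical stand-in for the util method described above.
function replaceIndexedPathWithEnvName(
  message: string,
  configName: string, // e.g. 'gitlab'
  envPrefix: string, // e.g. 'HD_AUTH_GITLAB_'
  userDefinedNames: string[], // names in the order the providers were gathered
): string {
  // Matches paths such as gitlab[0].providerName and captures index and option.
  const pattern = new RegExp(`${configName}\\[(\\d+)\\]\\.(\\w+)`, 'g');
  return message.replace(pattern, (match: string, index: string, option: string) => {
    const providerName = userDefinedNames[Number(index)];
    // Leave the text untouched if the index does not map to a known name.
    if (providerName === undefined) {
      return match;
    }
    return `${envPrefix}${providerName.toUpperCase()}_${toUpperSnakeCase(option)}`;
  });
}
```

With the names `['work', 'public']`, a message about `gitlab[0].providerName` would then mention `HD_AUTH_GITLAB_WORK_PROVIDER_NAME`, which is the variable the user actually has to fix.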
## Mocks
Some config files also have a `.mock.ts` file which defines the configuration for the e2e tests.
Those files just contain the default export and return the mock config object.
[csp]: https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
[hsts]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Strict-Transport-Security
[joi]: https://joi.dev/
[joi-doc]: https://joi.dev/api

@ -0,0 +1,18 @@
# Events
!!! info "Design Document"
This is a design document, explaining the design and vision for a HedgeDoc 2
feature. It is not a user guide and may or may not be fully implemented.
In HedgeDoc 2, we use an event system based on [EventEmitter2][eventemitter2].
It's used to reduce circular dependencies between different services and inform these services
about changes.
HedgeDoc's system is basically [the system NestJS offers][nestjs/eventemitter].
The config for the `EventEmitterModule` is stored in `events.ts` and
exported as `eventModuleConfig`. In the same file enums for the event keys are defined.
Each of these events is expected to be sent with an additional value.
In the enum definition a comment should tell you what exactly this value should be.
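
The pattern can be sketched as follows. To keep the sketch self-contained, Node's built-in `EventEmitter` stands in for EventEmitter2, and the event keys and payloads are invented for illustration. In a NestJS application the listener would typically be registered with the `@OnEvent` decorator instead of calling `on` directly.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical event keys. Each comment documents the value sent with the event.
enum ExampleNoteEvent {
  /** Sent with the id (string) of the note whose permissions changed. */
  PERMISSION_CHANGE = 'note.permission_change',
  /** Sent with the id (string) of the note that was deleted. */
  DELETION = 'note.deletion',
}

const exampleEmitter = new EventEmitter();
const deletedNoteIds: string[] = [];

// One service subscribes to the event instead of being injected into the other service.
exampleEmitter.on(ExampleNoteEvent.DELETION, (noteId: string) => {
  deletedNoteIds.push(noteId);
});

// Another service announces the change without knowing who listens.
exampleEmitter.emit(ExampleNoteEvent.DELETION, 'b604x5885k9k01bq7tsmawvnp0');
```

Because neither side imports the other, the circular dependency between the two services disappears.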
[eventemitter2]: https://github.com/EventEmitter2/EventEmitter2
[nestjs/eventemitter]: https://docs.nestjs.com/techniques/events

@ -0,0 +1,39 @@
# Core concepts
Core concepts explain the internal structure of HedgeDoc by providing
background information and explanations. They are especially useful for contributing to HedgeDoc.
<!-- markdownlint-disable no-inline-html -->
<div class='topic-container'>
<a href='/concepts/notes/'>
<div class='topic'>
<span>📝</span>
<span>Notes</span>
</div>
</a>
<a href='/concepts/user-profiles/'>
<div class='topic'>
<span>🙎</span>
<span>User Profiles</span>
</div>
</a>
<a href='/concepts/config/'>
<div class='topic'>
<span>🛠️</span>
<span>Config</span>
</div>
</a>
<a href='/concepts/api-auth/'>
<div class='topic'>
<span>🤖️</span>
<span>API Auth</span>
</div>
</a>
<a href='/concepts/events/'>
<div class='topic'>
<span>🎩</span>
<span>Events</span>
</div>
</a>
</div>
<!-- markdownlint-enable no-inline-html -->

@ -0,0 +1,109 @@
# Notes
!!! info "Design Document"
This is a design document, explaining the design and vision for a HedgeDoc 2
feature. It is not a user guide and may or may not be fully implemented.
Each note in HedgeDoc 2 contains the following information:
- publicId (`b604x5885k9k01bq7tsmawvnp0`)
<!-- markdownlint-disable proper-names -->
- a list of aliases (`[hedgedoc-2, hedgedoc-next]`)
<!-- markdownlint-enable proper-names -->
- groupPermissions
- userPermissions
- viewCount (`0`)
- owner
- revisions
- authorColors
- historyEntries
- description (`All you never wanted to know about notes`)
- title (`Notes`)
- tags (`[features, cool, update]`)
- version
The `publicId` is the default way of identifying a note. It will be a randomly generated
128-bit value encoded with [base32-encode][base32-encode] using the Crockford variant and converted
to lowercase. This variant of base32 is used because it results in ids that only use one case of
alphanumeric characters, all of which are URL-safe. We convert the id to lowercase because we
want to minimize case confusion.
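
A minimal sketch of this id generation follows. The encoder is written by hand and stands in for the base32-encode package so that the example has no dependencies; the function names are illustrative.

```typescript
import { randomBytes } from 'node:crypto';

// Crockford's base32 alphabet in lowercase. It omits i, l, o and u.
const CROCKFORD_ALPHABET = '0123456789abcdefghjkmnpqrstvwxyz';

// Encode bytes as unpadded, lowercase Crockford base32.
function encodeCrockfordLowercase(bytes: Uint8Array): string {
  let buffer = 0; // holds the bits that have not been emitted yet
  let bitCount = 0; // number of valid bits in `buffer`
  let output = '';
  bytes.forEach((byte) => {
    buffer = (buffer << 8) | byte;
    bitCount += 8;
    // Emit one character for every complete group of 5 bits.
    while (bitCount >= 5) {
      output += CROCKFORD_ALPHABET.charAt((buffer >>> (bitCount - 5)) & 31);
      bitCount -= 5;
    }
    // Keep only the bits that are still pending.
    buffer &= (1 << bitCount) - 1;
  });
  // Pad the final partial group with zero bits on the right.
  if (bitCount > 0) {
    output += CROCKFORD_ALPHABET.charAt((buffer << (5 - bitCount)) & 31);
  }
  return output;
}

// 128 random bits become a 26-character id.
function generatePublicId(): string {
  return encodeCrockfordLowercase(randomBytes(16));
}
```

128 bits do not divide evenly into 5-bit groups, so the last of the 26 characters carries only 3 bits of randomness.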
`aliases` are the other way of identifying a note. There can be any number of them, and the owner
of the note is able to add or remove them. All aliases are just strings (mainly to accommodate
the old identifiers from HedgeDoc 1, [see below](#conversion-of-hedgedoc-1-notes)), but new aliases
added with HedgeDoc 2 will only allow characters matching this regex: `[a-z0-9\-_]`. This is done to
once again prevent case confusion. One of the aliases can be set as the primary alias, which will be
used as the identifier for the history entry.
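
A check for new aliases could look like the sketch below. It assumes that the character class applies to the whole alias and that an alias must not be empty, so the pattern is anchored and requires at least one character; the names are illustrative.

```typescript
// Assumption: every character of a new alias must match the class from the text.
const NEW_ALIAS_PATTERN = /^[a-z0-9\-_]+$/;

// Returns true if the alias may be added in HedgeDoc 2.
function isValidNewAlias(alias: string): boolean {
  return NEW_ALIAS_PATTERN.test(alias);
}
```

An imported HedgeDoc 1 identifier such as `Q_Iz5T_lQWGYxne0sbMtwg` would fail this check, which is why existing aliases are stored as plain strings and only newly added ones are validated.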
`groupPermissions` and `userPermissions` each hold a list of the appropriate permissions.
Each permission holds a reference to a note and a user/group and specifies what the user/group
is allowed to do.
Permissions are additive: a user who only has the right to read a note via one group,
but has the right to write via a different group or directly for their user, is able to write in the note.
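
The additive rule amounts to taking the strongest permission that applies to the user. The sketch below assumes hypothetical types and exactly three ordered levels; the actual data model may differ.

```typescript
// Hypothetical permission levels, ordered from weakest to strongest.
enum AccessLevel {
  NONE = 0,
  READ = 1,
  WRITE = 2,
}

interface UserPermission {
  username: string;
  level: AccessLevel;
}

interface GroupPermission {
  groupName: string;
  level: AccessLevel;
}

// Return the strongest level granted to the user directly or via any group.
function effectiveAccess(
  username: string,
  groupsOfUser: string[],
  userPermissions: UserPermission[],
  groupPermissions: GroupPermission[],
): AccessLevel {
  let best: AccessLevel = AccessLevel.NONE;
  for (const permission of userPermissions) {
    if (permission.username === username && permission.level > best) {
      best = permission.level;
    }
  }
  for (const permission of groupPermissions) {
    const isMember = groupsOfUser.indexOf(permission.groupName) !== -1;
    if (isMember && permission.level > best) {
      best = permission.level;
    }
  }
  return best;
}
```

A user who is in a read-only group and in a group with write access therefore ends up with write access.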
The `viewCount` is a simple counter that holds how often the read-only view of the note in question
was requested.
`owner` is the user that created the note or later got ownership of the note. The current owner is
able to change the owner of the note to someone else. The owner of a note is the only person that
can perform the following actions:
- delete the note
- modify `aliases`
- remove all `revisions`
The `revisions` hold all revisions of the note. These are the changes to the note content and by
whom they were performed.
Each entry in `authorColors` specifies, for a combination of user and note, which color is used
to highlight that user.
The `historyEntries` hold the history entries this note is referenced in. They are mainly here
for the purpose of deleting the history entries on note deletion.
`description`, `tags` and `title` are all specified in the [frontmatter][frontmatter]
of the note. They are extracted and saved in the database so that the history page can show them and
search for tags without doing a full-text search or parsing the tags of
each note on every search.
While `description` and `tags` are only specified by the [frontmatter][frontmatter], the title is
- the content of the *title* field of the [frontmatter][frontmatter] of the note
- **OR** the content of the *title* field in the *opengraph* field of the [frontmatter][frontmatter]
of the note
- **OR** the first level 1 heading of the note
whichever of these is specified first, in the order listed.
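
The fallback order can be sketched as a small function. The `ParsedFrontmatter` shape, the function names, and the empty-string result when nothing is specified are assumptions; the heading detection is deliberately simple and does not account for code fences or setext headings.

```typescript
// Hypothetical shape of the parsed frontmatter; only the fields used here are listed.
interface ParsedFrontmatter {
  title?: string;
  opengraph?: { title?: string };
}

// Return the text of the first level 1 heading written as "# Heading", if any.
function firstLevelOneHeading(markdownBody: string): string | undefined {
  const match = /^#[ \t]+(.+?)[ \t]*$/m.exec(markdownBody);
  return match ? match[1] : undefined;
}

// Pick the first candidate that is specified, in the documented order.
function resolveTitle(frontmatter: ParsedFrontmatter, markdownBody: string): string {
  const candidates = [
    frontmatter.title,
    frontmatter.opengraph?.title,
    firstLevelOneHeading(markdownBody),
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== '') {
      return candidate.trim();
    }
  }
  // Assumption: a note without any of the three sources has an empty title.
  return '';
}
```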
All mentioned fields are extracted from the note content by the backend on save or update.
`version` specifies if a note is an old HedgeDoc 1 note, or a new HedgeDoc 2 note.
This is mainly used to redirect old notes from <https://md.example.org/noteid>
to <https://md.example.org/n/noteid>.
## Deleting Notes
- The owner of a note may delete it.
- By default, this also removes all revisions and all files that were uploaded to that note.
- The owner may choose to skip deleting associated uploads, leaving them without a note.
- The frontend should show a list of all uploads that will be affected
and provide a method of skipping deletion.
- The owner of a note may delete all revisions. This effectively purges the edit
history of a note.
## Conversion of HedgeDoc 1 notes
First, we define some terms used for HedgeDoc 1 notes:
- **noteId**: This refers to the auto-generated id for new notes.
(<https://demo.hedgedoc.org/Q_Iz5T_lQWGYxne0sbMtwg>)
- **shortId**: This refers to the auto-generated short id which is used for "published" notes and
slide presentation mode. (<https://demo.hedgedoc.org/s/61ZHI6HGE>)
- **alias**: This refers to user-defined URLs for notes on instances with Free-URL mode enabled.
(<https://md.kif.rocks/lowercase>)
The noteId, shortId and alias of each HedgeDoc 1 note are saved as HedgeDoc 2 aliases.
Each note gets a newly generated publicId.
[frontmatter]: https://jekyllrb.com/docs/front-matter/
[base32-encode]: https://www.npmjs.com/package/base32-encode

@ -0,0 +1,99 @@
# User Profiles and Authentication
!!! info "Design Document"
This is a design document, explaining the design and vision for a HedgeDoc 2
feature. It is not a user guide and may or may not be fully implemented.
Each user in HedgeDoc 2 has a profile
which contains the following information:
- username (`janedoe`)
- display name (`Jane Doe`)
- email address, optional (`janedoe@example.com`)
- profile picture, optional
- the date the user was created
- the date the profile was last updated
HedgeDoc 2 supports multiple authentication methods per user.
These are called *identities* and each identity is backed by an
auth provider (like OAuth, SAML, LDAP or internal auth).
One of a user's identities may be marked as *sync source*.
This identity is used to automatically update profile attributes like the
display name or profile picture on every login. If a sync source exists, the
user cannot manually edit their profile information.
If an external provider was used to create the account,
it is automatically used as sync source.
The administrator may globally set one or more auth providers as sync source,
e.g. to enforce that all profile information comes from the corporate
LDAP and is the same across multiple applications.
If global sync sources exist, new accounts can only be created using
these auth providers. The auth provider that was used to create the account
is automatically set as sync source and cannot be changed by the user.
This effectively pins the account to this provider.
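
The rules for account creation can be condensed into one decision function. This is a sketch of the rules as stated above, with hypothetical names and provider identifiers; it does not cover linking additional identities to an existing account.

```typescript
// Hypothetical result of deciding whether a new account may be created.
interface AccountCreationDecision {
  allowed: boolean;
  syncSource: string | null; // provider used as sync source, if any
  syncSourceLocked: boolean; // true if the user cannot change the sync source
}

function decideAccountCreation(
  providerId: string,
  isExternalProvider: boolean,
  globalSyncSources: string[],
): AccountCreationDecision {
  if (globalSyncSources.length > 0) {
    // With global sync sources, only these providers may create accounts,
    // and the account is pinned to the provider that created it.
    const allowed = globalSyncSources.indexOf(providerId) !== -1;
    return { allowed, syncSource: allowed ? providerId : null, syncSourceLocked: allowed };
  }
  // Without global sync sources, an external provider becomes the sync source
  // automatically, while internal auth has no sync source.
  return {
    allowed: true,
    syncSource: isExternalProvider ? providerId : null,
    syncSourceLocked: false,
  };
}
```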
## Example: Corporate LDAP
The administrator wants to allow users to log in via the corporate LDAP
and Google. Login must only be possible for users present in LDAP and
all users must be displayed as they are in the LDAP.
The admin therefore sets up two login providers:
- corporate LDAP, marked as global sync source
- Google OAuth login
If a new user tries to log in via Google, they will not be found in the
database. The frontend detects that a global sync source exists and
suggests logging in via LDAP first.
After a new user has created their account by logging in via LDAP, they can use
the 'add a new login method' feature in their profile page to link their
Google account and use it to log in afterwards.
## Example: Username Conflict
HedgeDoc is configured with two auth providers.
- A user logs in using auth provider A.
- The backend receives the profile information from provider A and notices that the username
in the profile already exists in the database, but no identity for this provider-username
combination exists.
- The backend creates a new user with another username to solve the username conflict.
- The frontend warns the user that the username provided by the auth provider is already taken
and that another username has been generated. It also offers to instead link the new auth provider
(in this case A) with the existing auth provider (in this case B).
- If the user chooses the latter option, the frontend sends a request to delete the newly created
user to the backend.
- The user can then log in with auth provider B and link provider A using the "link auth provider"
feature in the profile page.
### Handling of sync sources and username conflicts
#### Global sync sources
If at the time of logging in with auth provider A, *only* A is configured as a *global* sync source,
the backend cannot automatically create a user with another username.
This is because:
- Creating new accounts is only possible with a sync source auth provider.
- Setting an auth provider as sync source entails that the profile information the auth provider
provides must be saved into the local HedgeDoc database.
- As the username the auth provider provides already exists in the database, a new user cannot
be created with that username.
In this case, the frontend should show the user a notice that they should contact an admin
to resolve the issue.
!!! warning
Admins must ensure that usernames are unique across all auth providers set as a global sync
source. Otherwise, if for example a user `johndoe` exists in two LDAPs that are both configured
as sync sources, only the first `johndoe` to log in can use HedgeDoc.
#### Local sync sources
If auth provider A is configured as a sync source by the user, syncing is automatically disabled,
and a notice is shown. Re-enabling the sync source is not possible until the username conflict is
resolved, e.g. by changing the username in the auth provider.