From 648552a73643f9c2f79dce52584e3ce154c14a1b Mon Sep 17 00:00:00 2001
From: Tim Van Baak
Date: Mon, 19 Jun 2023 13:11:05 -0700
Subject: [PATCH] Expand README

---
 README.md | 58 +++++++++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/README.md b/README.md
index e3718f7..68fda11 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,12 @@
 # intake

-`intake` is an arbitrary feed aggregator.
+Intake is an arbitrary feed aggregator that generalizes the concept of a feed. Rather than being restricted to parsing items out of an RSS feed, Intake provides a middle layer of executing arbitrary programs that conform to a JSON-based specification. An Intake source can parse an RSS feed, but it can also scrape a website without a feed, provide additional logic to filter or annotate feed items, or integrate with an API.

-## Feed source interface
+A basic demonstration in a VM can be run with `nixos-shell` using the `#demo` flake attribute.

-The base `intake` directory is `$XDG_DATA_HOME/intake`. Each feed source's data is contained within a subdirectory of the base directory. The name of the feed source is the name of the subdirectory.
+## Feed source definitions
+
+The base Intake directory is `$XDG_DATA_HOME/intake`. Each feed source's data is contained within a subdirectory of the base directory. The name of the feed source is the name of the subdirectory.

 Feed source directories have the following structure:

@@ -25,7 +27,7 @@ intake

 `intake.json` has the following structure:

-```
+```json
 {
   "action": {
     "fetch": {
@@ -37,40 +39,46 @@ intake
       "args": "..."
     }
   },
-  "env": { ... }
+  "env": {
+    "...": "..."
+  }
 }
 ```

-Each key under `action` defines an action that can be taken for the source. The `fetch` action is required. `env` is optional. Each key under `env` will be set as an environment variable when executing actions.
+Each key under `action` defines an action that can be taken for the source. `action` must be present with a `fetch` action. `env` is optional.

-When an action is executed, intake executes the `exe` program for the action with the corresponding `args` as arguments. The process's environment is as follows:
+## Interface for source programs
+
+Intake interacts with sources by executing the actions defined in the source's `intake.json`. The `fetch` action is required and used to check for new feed items.
+
+When any action is executed, intake executes the `exe` program for the action with the corresponding `args` as arguments. The process's working directory is set to the source's folder, i.e. the folder containing `intake.json`. The process's environment is as follows:

 * intake's environment is inherited.
 * `STATE_PATH` is set to the absolute path of `state`.
 * Each key in `env` in `config.json` is passed with its value.

-Anything written to `stderr` by the process will be logged by intake.
+Anything written to `stderr` by the process will be captured and logged by Intake.

 The `fetch` action is used to fetch the current state of the feed source. It receives no input and should write feed items to `stdout` as JSON objects, each on one line.

 All other actions are taken in the context of a single item. These actions receive the item as a JSON object on the first line of `stdin`. The process should write the item back to `stdout` with any changes as a result of the action. An item must have a key under `action` with that action's name to support executing that action for that item. The value under that key may be any JSON structure used to manage the item-specific state.

-All encoding is done with UTF-8. If an item cannot be parsed or the exit code of the process is nonzero, intake will consider the action to be a failure. No items or other feed changes will happen as a result of a failed action, except for changes to `state` done by the action process.
+All encoding is done with UTF-8. If an item cannot be parsed or the exit code of the process is nonzero, Intake will consider the action to be a failure. No items or other feed changes will happen as a result of a failed action, except for changes to `state` done by the action process.

-## Item fields
+## Top-level item fields

-An item has the following top-level fields:
-
-* `id`: **Required**. A unique identifier within the scope of the feed source.
-* `created`: **Automatic**. The Unix timestamp at which the item was generated. This attribute is automatically populated.
-* `active`: **Automatic**. Whether the item is active. Inactive items are not displayed in channels.
-* `title`: The title of the item. If an item has no title, `is` is used as a fallback title.
-* `author`: An author name associated with the item.
-* `body`: Body text of the item as raw HTML. This will be displayed in the item without further processing.
-* `link`: A hyperlink associated with the item.
-* `time`: A time associated with the item, not necessarily when the item was created. Feeds sort by `time` when it is defined and fall back to `created`.
-* `tags`: A list of tags that describe the item. Tags help filter feeds that contain different kinds of content.
-* `tts`: The time-to-show of the item. An item with `tts` defined is hidden from channel feeds until the current time is past `created + tts`.
-* `ttl`: The time-to-live of the item. An item with `ttl` defined is not deleted by feed updates even if it is inactive if `created + ttl` is in the future.
-* `ttd`: The time-to-die of the item. An item with `ttd` defined is deleted by feed updates even if it is active if `created + ttd` is in the past.
-* `action`: An object with keys for all supported actions.
+| Field name | Specification | Description |
+| ---------- | ------------- | ----------- |
+| `id` | **Required** | A unique identifier within the scope of the feed source. |
+| `created` | **Automatic** | The Unix timestamp at which intake first processed the item. |
+| `active` | **Automatic** | Whether the item is active. Inactive items are not displayed in channels. |
+| `title` | Optional | The title of the item. If an item has no title, `id` is used as a fallback title.
+| `author` | Optional | An author name associated with the item. Displayed in the item footer.
+| `body` | Optional | Body text of the item as raw HTML. This will be displayed in the item without further processing! Consider your sources' threat models against injection attacks.
+| `link` | Optional | A hyperlink associated with the item.
+| `time` | Optional | A time associated with the item, not necessarily when the item was created. Feeds sort by `time` when it is defined and fall back to `created`. Displayed in the item footer.
+| `tags` | Optional | A list of tags that describe the item. Tags help filter feeds that contain different kinds of content.
+| `tts` | Optional | The time-to-show of the item. An item with `tts` defined is hidden from channel feeds until the current time is after `created + tts`.
+| `ttl` | Optional | The time-to-live of the item. An item with `ttl` defined is not deleted by feed updates as long as `created + ttl` is in the future, even if it is inactive.
+| `ttd` | Optional | The time-to-die of the item. An item with `ttd` defined is deleted by feed updates if `created + ttd` is in the past, even if it is active.
+| `action` | Optional | An object with keys for all supported actions. The schema of the values depends on the source.
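
As an illustration of the `fetch` contract and the item fields described in this patch, a source program could be as small as the following Python sketch. It is not part of the patch. The item contents and the helper names `build_items` and `write_items` are hypothetical. The sketch also assumes that `ttl` is measured in seconds, which the README implies by adding it to the Unix timestamp `created` but does not state outright.

```python
import json
import sys


def build_items():
    """Return example feed items; the contents are hypothetical placeholders."""
    return [
        # Only `id` is required; `created` and `active` are filled in by Intake.
        {"id": "example-1", "title": "First example item", "tags": ["demo"]},
        # No `title`, so `id` would be used as the fallback title.
        # `ttl` is assumed to be in seconds (it is added to the Unix timestamp `created`).
        {"id": "example-2", "body": "<p>Raw HTML body</p>", "ttl": 3600},
    ]


def write_items(items, out=sys.stdout):
    """Write each item as one JSON object per line, as the fetch action expects."""
    for item in items:
        # json.dumps without `indent` never emits newlines, so each item stays on one line.
        out.write(json.dumps(item) + "\n")


if __name__ == "__main__":
    write_items(build_items())
    # Diagnostics go to stderr, which Intake captures and logs.
    print("fetched 2 example items", file=sys.stderr)
```

A source's `intake.json` would then point the `exe` of its `fetch` action at an interpreter or wrapper that runs this script. Exiting with a nonzero status would mark the fetch as failed.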