Tim Van Baak 40464e9078 Fix missed reference to Python 3.8		2024-06-28 15:55:35 +00:00
demo	Logging improvements	2023-07-02 17:15:52 -07:00
intake	Fix serialization of action return value	2024-04-27 14:34:29 +00:00
template	Add a flake template	2023-06-04 18:16:34 -07:00
tests	Move demo to its own folder	2023-06-06 19:33:48 -07:00
.gitignore	Init Flask app	2023-05-29 18:19:33 -07:00
default.nix	Add flake.nix	2023-06-04 18:09:12 -07:00
flake.lock	23.05 -> 24.05	2024-06-28 15:46:08 +00:00
flake.nix	23.05 -> 24.05	2024-06-28 15:46:08 +00:00
LICENSE	Add gitignore and license	2023-05-29 10:52:56 -07:00
module.nix	Fix missed reference to Python 3.8	2024-06-28 15:55:35 +00:00
pyproject.toml	Unpin python from 3.8 to whatever nixpkgs defaults to	2024-06-28 04:11:03 +00:00
README.md	Update crontab entries on source update	2023-06-22 21:47:16 -07:00
shell.nix	Add flake.nix	2023-06-04 18:09:12 -07:00

README.md

intake

Intake is an arbitrary feed aggregator that generalizes the concept of a feed. Rather than being restricted to parsing items out of an RSS feed, Intake provides a middle layer of executing arbitrary programs that conform to a JSON-based specification. An Intake source can parse an RSS feed, but it can also scrape a website without a feed, provide additional logic to filter or annotate feed items, or integrate with an API.

A basic demonstration in a VM can be run with nixos-shell using the #demo flake attribute.

Feed source definitions

The base Intake directory is $XDG_DATA_HOME/intake. Each feed source's data is contained within a subdirectory of the base directory. The name of the feed source is the name of the subdirectory.

Feed source directories have the following structure:

intake
 |- <source name>
 |   |- intake.json
 |   |- state
 |   |- <item id>.item
 |   |- <item id>.item
 |   |- ...
 |- <source name>
 |   |  ...
 | ...

intake.json must be present; the other files are optional. Each .item file contains the data for one feed item. state is a file to which the feed source may write arbitrary data, e.g. JSON or binary data.
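As an illustration of the layout above, the following is a minimal sketch of how sources and items could be discovered on disk. The function names are hypothetical and not part of Intake, and the fallback to ~/.local/share when XDG_DATA_HOME is unset follows the XDG Base Directory convention; whether Intake itself applies that fallback is an assumption.

```python
import os
from pathlib import Path


def intake_data_dir(environ=None):
    """Return the base Intake directory, $XDG_DATA_HOME/intake.

    Assumption: fall back to ~/.local/share when XDG_DATA_HOME is unset,
    as the XDG Base Directory convention suggests.
    """
    environ = os.environ if environ is None else environ
    xdg = environ.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "intake"


def list_sources(base):
    """Return names of subdirectories that contain the required intake.json."""
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if (p / "intake.json").is_file())


def list_item_ids(source_dir):
    """Return the item ids in a source directory, taken from <item id>.item files."""
    return sorted(p.stem for p in source_dir.glob("*.item"))
```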

intake.json has the following structure:

{
  "action": {
    "fetch": {
      "exe": "<absolute path to program or name on intake's PATH>",
      "args": ["list", "of", "program", "arguments"]
    },
    "<action name>": {
      "exe": "...",
      "args": ["..."]
    }
  },
  "env": {
    "...": "..."
  },
  "cron": "* * * * *"
}

Each key under action defines an action that can be taken for the source. An action must contain exe and may contain args. A source must have a fetch action.

Each key under env defines an environment variable that will be set when actions are executed.

If cron is present, it must define a crontab schedule. Intake will automatically create crontab entries to update each source according to its cron schedule.
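The rules above for the action key can be checked mechanically. The following is a rough sketch of such a check over an already parsed intake.json; the function name and the wording of the messages are hypothetical, and it deliberately does not validate the cron expression.

```python
def validate_source_config(config):
    """Return a list of problems found in a parsed intake.json (empty if none).

    Only the rules stated in the README are checked: every action must
    contain "exe", and a "fetch" action must exist.
    """
    actions = config.get("action")
    if not isinstance(actions, dict):
        return ["'action' must be an object"]

    problems = []
    if "fetch" not in actions:
        problems.append("missing required 'fetch' action")
    for name, spec in actions.items():
        if not isinstance(spec, dict) or "exe" not in spec:
            problems.append(f"action '{name}' must contain 'exe'")
    return problems
```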

Interface for source programs

Intake interacts with sources by executing the actions defined in the source's intake.json. The fetch action is required and used to check for new feed items when intake update is executed.

To execute an action, intake executes the exe program for the action with the corresponding args (if present) as arguments. The process's working directory is set to the source's folder, i.e. the folder containing intake.json. The process's environment is as follows:

intake's environment is inherited.
STATE_PATH is set to the absolute path of state.
Each key in env in intake.json is passed with its value.

Anything written to stderr by the process will be captured and logged by Intake.
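The execution model described above can be approximated with the standard subprocess module. This is a sketch under stated assumptions, not Intake's actual implementation: the function names are hypothetical, and the order in which STATE_PATH and the env entries are applied (here, env entries last) is a guess.

```python
import os
import subprocess
from pathlib import Path


def build_action_env(source_dir, config, base_env=None):
    """Build the environment for an action process."""
    # Start from intake's own environment (inherited).
    env = dict(os.environ if base_env is None else base_env)
    # STATE_PATH is the absolute path of the source's state file.
    env["STATE_PATH"] = str((Path(source_dir) / "state").absolute())
    # Assumption: env entries from intake.json are applied last.
    env.update(config.get("env", {}))
    return env


def run_action(source_dir, config, action_name, stdin_text=""):
    """Run one action with the source folder as the working directory."""
    action = config["action"][action_name]
    command = [action["exe"], *action.get("args", [])]
    return subprocess.run(
        command,
        cwd=source_dir,
        env=build_action_env(source_dir, config),
        input=stdin_text,
        capture_output=True,  # stderr is captured so the caller can log it
        text=True,
        encoding="utf-8",
    )
```

The returned CompletedProcess exposes returncode, stdout, and stderr, which is enough for a caller to log stderr and decide whether the action failed.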

The fetch action is used to fetch the current state of the feed source. It receives no input and should write feed items to stdout as JSON objects, each on one line. All other actions are taken in the context of a single item. These actions receive the item as a JSON object on the first line of stdin. The process should write the item back to stdout with any changes as a result of the action.
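From the source program's side, the contract above amounts to writing one JSON object per line for fetch, and reading one item and writing it back for any other action. The sketch below shows both halves. The item fields, the count bookkeeping, and the function names are purely illustrative assumptions, and it assumes the per-action value is a JSON object.

```python
import json
import sys


def encode_items(items):
    """Serialize feed items as JSON objects, one per line, for a fetch action."""
    return "".join(json.dumps(item) + "\n" for item in items)


def apply_action(first_line, action_name):
    """Handle a non-fetch action: parse the item, change it, serialize it back.

    Illustrative only: this increments a counter stored in the item's
    per-action state, which is assumed to be a JSON object.
    """
    item = json.loads(first_line)
    state = item.setdefault("action", {}).setdefault(action_name, {})
    state["count"] = state.get("count", 0) + 1
    return json.dumps(item) + "\n"


def main(argv):
    """Hypothetical entry point: '<prog> fetch' or '<prog> action <name>'."""
    if len(argv) > 1 and argv[1] == "fetch":
        sys.stdout.write(encode_items([{"id": "example-1", "title": "Hello"}]))
        return 0
    if len(argv) > 2 and argv[1] == "action":
        # The item arrives as a JSON object on the first line of stdin.
        sys.stdout.write(apply_action(sys.stdin.readline(), argv[2]))
        return 0
    return 1
```

A real source program would call main(sys.argv) and exit with its return value, so that a nonzero exit code signals failure.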

To support an action, an item must have a key with that action's name under its action field. The value under that key may be any JSON structure and is used to manage item-specific state.

All input and output is treated as UTF-8. If an item cannot be parsed or the exit code of the process is nonzero, Intake will consider the action to be a failure. A failed action produces no item or other feed changes, except for any changes the action process itself made to state.
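The all-or-nothing failure rule can be sketched as a single parsing step that either returns every item or rejects the whole run. The function name is hypothetical, and skipping blank lines and rejecting items without an id are assumptions rather than documented behavior.

```python
import json


def parse_fetch_output(returncode, stdout_bytes):
    """Return the parsed items, or None if the action counts as a failure."""
    if returncode != 0:
        return None
    try:
        text = stdout_bytes.decode("utf-8")
        # One JSON object per line; blank lines are skipped (assumption).
        items = [json.loads(line) for line in text.split("\n") if line.strip()]
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Assumption: an item that is not an object or lacks "id" fails the run.
    if any(not isinstance(item, dict) or "id" not in item for item in items):
        return None
    return items
```

Returning None for the whole run, rather than dropping only the bad lines, mirrors the statement that a failed action causes no item changes at all.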

Top-level item fields

Field name	Specification	Description
`id`	Required	A unique identifier within the scope of the feed source.
`created`	Automatic	The Unix timestamp at which intake first processed the item.
`active`	Automatic	Whether the item is active. Inactive items are not displayed in channels.
`title`	Optional	The title of the item. If an item has no title, `id` is used as a fallback title.
`author`	Optional	An author name associated with the item. Displayed in the item footer.
`body`	Optional	Body text of the item as raw HTML. This will be displayed in the item without further processing! Consider your sources' threat models against injection attacks.
`link`	Optional	A hyperlink associated with the item.
`time`	Optional	A time associated with the item, not necessarily when the item was created. Feeds sort by `time` when it is defined and fall back to `created`. Displayed in the item footer.
`tags`	Optional	A list of tags that describe the item. Tags help filter feeds that contain different kinds of content.
`tts`	Optional	The time-to-show of the item. An item with `tts` defined is hidden from channel feeds until the current time is after `created + tts`.
`ttl`	Optional	The time-to-live of the item. An item with `ttl` defined is not deleted by feed updates as long as `created + ttl` is in the future, even if it is inactive.
`ttd`	Optional	The time-to-die of the item. An item with `ttd` defined is deleted by feed updates if `created + ttd` is in the past, even if it is active.
`action`	Optional	An object with keys for all supported actions. The schema of the values depends on the source.
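The timing fields in the table can be read as simple comparisons against the current time. The sketch below is one possible interpretation, not Intake's code: the function names are hypothetical, the offsets are assumed to be in seconds because created is a Unix timestamp, ttd is assumed to take precedence over ttl, and the final rule that an inactive item without ttl protection is deleted is inferred from the ttl description.

```python
def sort_key(item):
    """Feeds sort by `time` when defined and fall back to `created`."""
    return item.get("time", item["created"])


def is_hidden(item, now):
    """An item with `tts` is hidden until `now` is after `created + tts`."""
    tts = item.get("tts")
    return tts is not None and now <= item["created"] + tts


def should_delete(item, now):
    """Decide whether a feed update would delete the item (one interpretation)."""
    created = item["created"]
    ttl = item.get("ttl")
    ttd = item.get("ttd")
    # ttd: deleted once created + ttd is in the past, even if active.
    if ttd is not None and created + ttd < now:
        return True
    # ttl: kept while created + ttl is in the future, even if inactive.
    if ttl is not None and created + ttl > now:
        return False
    # Assumption: otherwise only inactive items are deleted.
    return not item.get("active", True)
```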