Tim Van Baak 40464e9078 Fix missed reference to Python 3.8 2024-06-28 15:55:35 +00:00
demo Logging improvements 2023-07-02 17:15:52 -07:00
intake Fix serialization of action return value 2024-04-27 14:34:29 +00:00
template Add a flake template 2023-06-04 18:16:34 -07:00
tests Move demo to its own folder 2023-06-06 19:33:48 -07:00
.gitignore Init Flask app 2023-05-29 18:19:33 -07:00
LICENSE Add gitignore and license 2023-05-29 10:52:56 -07:00
README.md Update crontab entries on source update 2023-06-22 21:47:16 -07:00
default.nix Add flake.nix 2023-06-04 18:09:12 -07:00
flake.lock 23.05 -> 24.05 2024-06-28 15:46:08 +00:00
flake.nix 23.05 -> 24.05 2024-06-28 15:46:08 +00:00
module.nix Fix missed reference to Python 3.8 2024-06-28 15:55:35 +00:00
pyproject.toml Unpin python from 3.8 to whatever nixpkgs defaults to 2024-06-28 04:11:03 +00:00
shell.nix Add flake.nix 2023-06-04 18:09:12 -07:00


intake

Intake is an arbitrary feed aggregator that generalizes the concept of a feed. Rather than being restricted to parsing items out of an RSS feed, Intake provides a middle layer of executing arbitrary programs that conform to a JSON-based specification. An Intake source can parse an RSS feed, but it can also scrape a website without a feed, provide additional logic to filter or annotate feed items, or integrate with an API.

A basic demonstration in a VM can be run with nixos-shell using the #demo flake attribute.

Feed source definitions

The base Intake directory is $XDG_DATA_HOME/intake. Each feed source's data is contained within a subdirectory of the base directory. The name of the feed source is the name of the subdirectory.

Feed source directories have the following structure:

intake
 |- <source name>
 |   |- intake.json
 |   |- state
 |   |- <item id>.item
 |   |- <item id>.item
 |   |- ...
 |- <source name>
 |   |  ...
 | ...

intake.json must be present; the other files are optional. Each .item file contains the data for one feed item. state is a file in which the feed source can store arbitrary data, e.g. JSON or binary data.
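
The layout above can be walked with a few lines of Python. This is only a sketch: base_dir and list_sources are hypothetical helper names, not part of Intake, and the fallback to ~/.local/share when XDG_DATA_HOME is unset follows the XDG Base Directory convention rather than anything stated here.

```python
import os
from pathlib import Path


def base_dir() -> Path:
    """Return $XDG_DATA_HOME/intake (hypothetical helper).

    Falls back to ~/.local/share when XDG_DATA_HOME is unset or empty,
    as the XDG Base Directory convention prescribes.
    """
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(data_home) / "intake"


def list_sources(base: Path) -> dict[str, list[str]]:
    """Map each source name to the sorted item ids found in its directory.

    A subdirectory is treated as a source only if it contains intake.json,
    since that is the one file that must be present.
    """
    sources: dict[str, list[str]] = {}
    if not base.is_dir():
        return sources
    for entry in sorted(base.iterdir()):
        if entry.is_dir() and (entry / "intake.json").is_file():
            # "<item id>.item" -> "<item id>"
            sources[entry.name] = sorted(p.stem for p in entry.glob("*.item"))
    return sources
```

Calling list_sources(base_dir()) returns an empty dict when the base directory does not exist yet.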

intake.json has the following structure:

{
  "action": {
    "fetch": {
      "exe": "<absolute path to program or name on intake's PATH>",
      "args": ["list", "of", "program", "arguments"]
    },
    "<action name>": {
      "exe": "...",
      "args": "..."
    }
  },
  "env": {
    "...": "..."
  },
  "cron": "* * * * *"
}

Each key under action defines an action that can be taken for the source. An action must contain exe and may contain args. A source must have a fetch action.
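
As a rough illustration, the rules for the action key could be checked like this. validate_config is a hypothetical name, not Intake's own validation, and the check that args is a list of strings is an assumption drawn from the fetch example above.

```python
import json


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the action rules hold."""
    actions = config.get("action")
    if not isinstance(actions, dict):
        return ["'action' must be an object"]
    problems = []
    if "fetch" not in actions:
        problems.append("missing required 'fetch' action")
    for name, spec in actions.items():
        # Every action must contain exe.
        if not isinstance(spec, dict) or not isinstance(spec.get("exe"), str):
            problems.append(f"action '{name}' must contain a string 'exe'")
            continue
        # args is optional; assumed here to be a list of strings when present.
        args = spec.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"action '{name}': 'args' must be a list of strings")
    return problems


# Example: a config whose only action is fetch, with no args.
example = json.loads('{"action": {"fetch": {"exe": "my-fetcher"}}}')
assert validate_config(example) == []
```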

Each key under env defines an environment variable that will be set when actions are executed.

If cron is present, it must define a crontab schedule. Intake will automatically create crontab entries to update each source according to its cron schedule.

Interface for source programs

Intake interacts with sources by executing the actions defined in the source's intake.json. The fetch action is required and used to check for new feed items when intake update is executed.

To execute an action, intake executes the exe program for the action with the corresponding args (if present) as arguments. The process's working directory is set to the source's folder, i.e. the folder containing intake.json. The process's environment is as follows:

  • intake's environment is inherited.
  • STATE_PATH is set to the absolute path of state.
  • Each key in env in intake.json is set as an environment variable with its value.

Anything written to stderr by the process will be captured and logged by Intake.
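
The execution rules above can be sketched with Python's subprocess module. build_env and run_action are hypothetical names and this is not Intake's actual implementation; in particular, letting env entries override inherited variables and STATE_PATH is an assumption, since the precedence is not specified here.

```python
import os
import subprocess
from pathlib import Path


def build_env(source_dir: Path, config: dict) -> dict[str, str]:
    """Assemble the process environment in the order listed above."""
    env = dict(os.environ)  # inherit the caller's environment
    env["STATE_PATH"] = str((source_dir / "state").resolve())  # absolute path
    env.update(config.get("env", {}))  # assumption: source env is applied last
    return env


def run_action(source_dir: Path, config: dict, name: str,
               stdin_text: str = "") -> subprocess.CompletedProcess:
    """Run one action with the source folder as its working directory."""
    spec = config["action"][name]
    return subprocess.run(
        [spec["exe"], *spec.get("args", [])],  # args is optional
        cwd=source_dir,
        env=build_env(source_dir, config),
        input=stdin_text,
        capture_output=True,  # stdout carries items; stderr can be logged
        text=True,
        encoding="utf-8",
    )
```

The returned object exposes returncode, stdout, and stderr, so a caller can log stderr and inspect the exit code before parsing stdout.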

The fetch action is used to fetch the current state of the feed source. It receives no input and should write feed items to stdout as JSON objects, one per line. All other actions are taken in the context of a single item. These actions receive the item as a JSON object on the first line of stdin. The process should write the item back to stdout, including any changes made by the action.
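
A fetch program can be very small. The sketch below hard-codes two made-up items (the ids, titles, and the fetch_items helper are illustrative only); a real source would read a feed, scrape a page, or call an API instead.

```python
import json
import sys


def fetch_items() -> list[dict]:
    """Return the current items of the source (hard-coded for illustration)."""
    return [
        {"id": "hello-1", "title": "Hello", "body": "<p>First item</p>"},
        {"id": "hello-2", "title": "World", "tags": ["example"]},
    ]


def main() -> None:
    for item in fetch_items():
        # json.dumps emits no raw newlines by default, so each item
        # occupies exactly one line of stdout.
        sys.stdout.write(json.dumps(item) + "\n")


if __name__ == "__main__":
    main()
```

Saved as an executable script, this could be referenced from the exe field of the fetch action in intake.json.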

For an action to be executable on an item, the item must have a key with that action's name under its action field. The value under that key may be any JSON structure and is used to manage item-specific state for the action.
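
As an illustration of a non-fetch action, the sketch below implements a made-up action named increment that keeps a counter as its item-specific state. The action name, the integer state, and the apply_action helper are all hypothetical.

```python
import json
import sys

ACTION_NAME = "increment"  # hypothetical action name


def apply_action(line: str) -> str:
    """Take the item as one JSON line and return the updated item as JSON."""
    item = json.loads(line)
    # The value under action.<name> is free-form; here it is a plain counter.
    count = item.get("action", {}).get(ACTION_NAME, 0)
    item.setdefault("action", {})[ACTION_NAME] = count + 1
    return json.dumps(item)


def main() -> None:
    # The item arrives on the first line of stdin and is written back to stdout.
    line = sys.stdin.readline()
    sys.stdout.write(apply_action(line) + "\n")
```

A standalone script would end with a call to main(). For the action to be offered on an item, the fetch program would need to emit that item with an increment key under action.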

All input and output is treated as UTF-8. If an item cannot be parsed or the exit code of the process is nonzero, Intake will consider the action to be a failure. A failed action causes no changes to items or the feed, except for any changes the action process itself made to state.
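
The failure rules can be sketched as an all-or-nothing parser for fetch output. ActionFailure and parse_fetch_output are hypothetical names; skipping blank lines and rejecting items without an id are assumptions layered on top of the rules stated above.

```python
import json


class ActionFailure(Exception):
    """Raised when an action's result must be discarded entirely."""


def parse_fetch_output(returncode: int, stdout: bytes) -> list[dict]:
    """Return all items, or raise ActionFailure without returning any."""
    if returncode != 0:
        raise ActionFailure(f"exit code {returncode}")
    try:
        text = stdout.decode("utf-8")  # all output is treated as UTF-8
        # Split on "\n" only; blank lines are skipped (an assumption).
        items = [json.loads(line) for line in text.split("\n") if line.strip()]
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ActionFailure(f"unparseable output: {exc}") from exc
    for item in items:
        if not isinstance(item, dict) or "id" not in item:
            raise ActionFailure("every item must be a JSON object with an 'id'")
    return items
```

Because the function either returns every item or raises, a caller cannot accidentally apply a partial update from a failed run.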

Top-level item fields

| Field name | Specification | Description |
| --- | --- | --- |
| id | Required | A unique identifier within the scope of the feed source. |
| created | Automatic | The Unix timestamp at which intake first processed the item. |
| active | Automatic | Whether the item is active. Inactive items are not displayed in channels. |
| title | Optional | The title of the item. If an item has no title, id is used as a fallback title. |
| author | Optional | An author name associated with the item. Displayed in the item footer. |
| body | Optional | Body text of the item as raw HTML. This will be displayed in the item without further processing! Consider your sources' threat models against injection attacks. |
| link | Optional | A hyperlink associated with the item. |
| time | Optional | A time associated with the item, not necessarily when the item was created. Feeds sort by time when it is defined and fall back to created. Displayed in the item footer. |
| tags | Optional | A list of tags that describe the item. Tags help filter feeds that contain different kinds of content. |
| tts | Optional | The time-to-show of the item. An item with tts defined is hidden from channel feeds until the current time is after created + tts. |
| ttl | Optional | The time-to-live of the item. An item with ttl defined is not deleted by feed updates as long as created + ttl is in the future, even if it is inactive. |
| ttd | Optional | The time-to-die of the item. An item with ttd defined is deleted by feed updates if created + ttd is in the past, even if it is active. |
| action | Optional | An object with keys for all supported actions. The schema of the values depends on the source. |
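
The interplay of created, active, tts, ttl, and ttd can be sketched as two predicates. This is one reading of the field descriptions, not Intake's actual logic: it assumes the three durations are in seconds, that an inactive item without ttl protection is deleted on update, and that ttd takes precedence over ttl. is_visible and should_delete are hypothetical names.

```python
def is_visible(item: dict, now: int) -> bool:
    """Whether the item should appear in a channel feed at time `now`."""
    if not item.get("active", True):
        return False  # inactive items are not displayed
    tts = item.get("tts")
    # Hidden until the current time is after created + tts.
    return tts is None or now > item["created"] + tts


def should_delete(item: dict, now: int) -> bool:
    """Whether a feed update at time `now` would delete the item."""
    created = item["created"]
    ttd = item.get("ttd")
    if ttd is not None and created + ttd < now:
        return True  # past time-to-die: deleted even if active
    if item.get("active", True):
        return False
    ttl = item.get("ttl")
    if ttl is not None and created + ttl > now:
        return False  # time-to-live still in the future: kept even if inactive
    return True  # assumption: unprotected inactive items are removed
```

For example, an item created at timestamp 100 with tts set to 50 would stay hidden through timestamp 150 and become visible afterwards.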