From 79dbea50c2cb3093d515438d1ebff056e78fd1ac Mon Sep 17 00:00:00 2001 From: Tim Van Baak Date: Fri, 24 Jan 2025 10:00:45 -0800 Subject: [PATCH] Import README content from Python, include todo list --- README.md | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 133 insertions(+) diff --git a/README.md b/README.md index e69de29..3687591 100644 --- a/README.md +++ b/README.md @@ -0,0 +1,133 @@ +# intake + +Intake is an arbitrary feed aggregator that generalizes the concept of a feed. +Rather than being restricted to parsing items out of an RSS feed, Intake provides a middle layer of executing arbitrary commands that conform to a JSON-based specification. +An Intake source can parse an RSS feed, but it can also scrape a website without a feed, provide additional logic to filter or annotate feed items, or integrate with an API. + +## Development + +Parity with existing Python version + +* [x] create sources +* [ ] rename sources +* fetch sources + * [x] create and delete items + * [x] update existing items + * [ ] support item TTL and TTD + * [ ] on_create triggers + * [ ] on_delete triggers + * [x] dry-run +* item actions + * [ ] create + * [ ] edit + * [ ] rename + * [ ] delete + * [ ] execute + * [ ] require items to declare action support + * [ ] state files + * [ ] source environment + * [ ] working directory set +* [ ] update web UI credentials +* [ ] automatic crontab integration +* [ ] feed supports item TTS +* [ ] data directory from envvars +* [ ] source-level tt{s,d,l} +* [ ] source batching +* channels + * [ ] create + * [ ] edit + * [ ] rename + * [ ] delete +* feeds + * [x] show items + * [x] deactivate items + * [ ] mass deactivate + * [ ] punt + * [ ] trigger actions + * [x] add ad-hoc items +* [ ] NixOS module +* [ ] NixOS module demo + +Additional features + +* [ ] metric reporting +* [ ] on action failure, create an error item with logs +* [ ] first-party password handling instead of basic auth and htpasswd +* [ ] items 
gracefully add new fields and `action` keys +* [ ] arbitrary date punt +* [ ] HTTP edit item +* [ ] sort crontab entries +* [ ] TUI feed view + +## Overview + +In Intake, a _source_ represents a single content feed of discrete _items_, such as a blog and its posts or a website and its pages. +Each source has associated _actions_, which are executable commands. +The `fetch` action checks the feed and returns the items in a JSON format. +Each item returned by a fetch is stored by Intake and appears in that source's feed. +When you have read an item, you can deactivate it, which hides it from your feed. +When a deactivated item is no longer returned by `fetch`, it is deleted. +This allows you to consume feed content at your own pace without missing anything. + +Intake stores all its data in a SQLite database. + +### Items + +Items are passed between Intake and sources as JSON objects. +Only the `id` field is required. +Any unspecified field is equivalent to the empty string, object, or 0, depending on the field's type. + +| Field name | Specification | Description | +| ---------- | ------------- | ----------- | +| `id` | **Required** | A unique identifier within the source. +| `source` | **Automatic** | The source that produced the item. +| `created` | **Automatic** | The Unix timestamp at which Intake first processed the item. +| `active` | **Automatic** | Whether the item is active and displayed in feeds. +| `title` | Optional | The title of the item. If an item has no title, `id` is used as a fallback title. +| `author` | Optional | An author name associated with the item. Displayed in the item footer. +| `body` | Optional | Body text of the item as raw HTML. This will be displayed in the item without further processing! Consider your sources' threat models against injection attacks. +| `link` | Optional | A hyperlink associated with the item. +| `time` | Optional | A Unix timestamp associated with the item, not necessarily when the item was created. 
Items sort by `time` when it is defined and fall back to `created`. Displayed in the item footer. + +Existing items are updated with new values when a fetch or action produces them, with some exceptions: + +* Automatic fields cannot be changed. +* If a field's previous value is non-empty and the new value is empty, the old value is kept. + +### Sources + +A source is identified by its name. A minimally functional source requires a `fetch` action that returns items. + +### Action API + +The Intake action API defines how programs should behave to be used with Intake sources. + +To execute an action, Intake executes the command specified by that action's `argv`. +The process's environment is as follows: + +* `intake`'s environment is inherited. +* `STATE_PATH` is set to the absolute path of a file containing the source's persistent state. + +When an action receives an item as input, that item's JSON representation is written to that action's `stdin`. +When an action outputs an item, it should write the item's JSON representation to `stdout` on one line. +All input and output is assumed to be UTF-8. +If an item cannot be parsed or the exit code of the process is nonzero, Intake will consider the action to be a failure. +No items will be created or updated as a result of the failed action. +Anything written to `stderr` by the action will be captured and logged by Intake. + +The `fetch` action receives no input and outputs multiple items. +This action is executed when a source is updated. +The `fetch` action is the core of an Intake source. + +All other actions take an item as input and should output the same item with any modifications made by the action. +Actions can only be executed for an item if that item has a key with the same name in its `action` field. +The value of that key may be any non-null JSON value used to pass state to the action. + +The special action `on_create` is always run when an item is first returned by a fetch. 
+The item does not need to declare support for `on_create`. +This action is not accessible through the web interface, so if you need to retry the action, you should create another action with the same command as `on_create`. +If an item's `on_create` fails, the item is still created, but without any changes made by the action. + +The special action `on_delete` is like `on_create`, except it runs right before an item is deleted. +It does not require explicit support and is not accessible in the web interface. +The output of `on_delete` is ignored; it is primarily for causing side effects like managing state.
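
As an illustration of the action API described above, here is a minimal sketch of a source script in Python. Everything named here is an assumption made for the example: the script name `example_source.py`, the `shout` action, the sample entries, and the choice to dispatch on a command-line argument. It also assumes `fetch` writes one item per line, which is how the one-line-per-item rule is read here. It is not taken from the Intake codebase.

```python
import json
import sys

# Hypothetical sample data standing in for a real feed or scraped page.
SAMPLE_ENTRIES = [
    {"id": "post-1", "title": "First post", "link": "https://example.com/1"},
    {"id": "post-2"},  # only `id` is required
]


def fetch_lines(entries):
    """Return one single-line JSON string per item, as a fetch might emit."""
    lines = []
    for entry in entries:
        item = dict(entry)
        # Declare support for the hypothetical "shout" action. The value may
        # be any non-null JSON value; `True` is an arbitrary choice here.
        item["action"] = {"shout": True}
        lines.append(json.dumps(item))
    return lines


def shout(raw_item):
    """Item action: read an item's JSON, upper-case its title, return JSON."""
    item = json.loads(raw_item)
    # Fall back to `id` when the item has no title.
    title = item.get("title") or item["id"]
    item["title"] = title.upper()
    return json.dumps(item)


def main(argv):
    if argv[1] == "fetch":
        # fetch takes no input and writes its items to stdout.
        for line in fetch_lines(SAMPLE_ENTRIES):
            print(line)
        return 0
    if argv[1] == "shout":
        # Other actions receive the item on stdin and write it back to stdout.
        print(shout(sys.stdin.read()))
        return 0
    # stderr is captured and logged; a nonzero exit marks the action failed.
    print(f"unknown action: {argv[1]}", file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Under these assumptions, the source's `fetch` action could use an `argv` such as `python3 example_source.py fetch`, and the `shout` action `python3 example_source.py shout`. Running `python3 example_source.py fetch` by hand is a quick way to inspect what the script would hand to Intake.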