intake
Intake is an arbitrary feed aggregator that generalizes the concept of a feed. Rather than being restricted to parsing items out of an RSS feed, Intake provides a middle layer of executing arbitrary commands that conform to a JSON-based specification. An Intake source can parse an RSS feed, but it can also scrape a website without a feed, provide additional logic to filter or annotate feed items, or integrate with an API.
A demo running in a NixOS VM is available via `make demo` or by using `nix run` on the `nixosConfigurations.demo.config.system.build.nixos-shell` flake attribute.
Overview
In Intake, a source represents a single content feed of discrete items, such as a blog and its posts or a website and its pages.
Each source has associated actions, which are executable commands.
The `fetch` action checks the feed and returns the items in a JSON format.
Each item returned by a fetch is stored by Intake and appears in that source's feed.
When you have read an item, you can deactivate it, which hides it from your feed.
When a deactivated item is no longer returned by `fetch`, it is deleted.
This allows you to consume feed content at your own pace without missing anything.
Intake stores all its data in a SQLite database.
This database is stored in `$INTAKE_DATA_DIR`, `$XDG_DATA_HOME/intake`, or `$HOME/.local/share/intake`, whichever is resolved first.
The data directory can also be specified on the command line via `--data-dir`/`-d` instead of the environment.
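The lookup order above can be sketched as a small function. This is only an illustration of the documented precedence, not Intake's actual code: the function name `resolve_data_dir` is hypothetical, and the assumption that an explicit `--data-dir` value wins over every environment variable is inferred from the text.

```python
from pathlib import Path


def resolve_data_dir(env, cli_data_dir=None):
    """Return the data directory following the documented precedence."""
    # Assumption: an explicit --data-dir / -d value overrides the environment.
    if cli_data_dir:
        return Path(cli_data_dir)
    if env.get("INTAKE_DATA_DIR"):
        return Path(env["INTAKE_DATA_DIR"])
    if env.get("XDG_DATA_HOME"):
        return Path(env["XDG_DATA_HOME"]) / "intake"
    if env.get("HOME"):
        return Path(env["HOME"]) / ".local" / "share" / "intake"
    raise RuntimeError("no data directory could be resolved")
```

Passing `os.environ` as `env` gives the directory that would be used for the current shell under these assumptions.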
Items
Items are passed between Intake and sources as JSON objects.
Only the `id` field is required.
Any unspecified field is equivalent to the empty string, empty object, or 0, depending on the field's type.
Field name | Specification | Description |
---|---|---|
`id` | Required | A unique identifier within the source. |
`source` | Automatic | The source that produced the item. |
`created` | Automatic | The Unix timestamp at which Intake first processed the item. |
`active` | Automatic | Whether the item is active and displayed in feeds. |
`title` | Optional | The title of the item. If an item has no title, `id` is used as a fallback title. |
`author` | Optional | An author name associated with the item. Displayed in the item footer. |
`body` | Optional | Body text of the item as raw HTML. This will be displayed in the item without further processing! Consider your sources' threat models against injection attacks. |
`link` | Optional | A hyperlink associated with the item. |
`time` | Optional | A Unix timestamp associated with the item, not necessarily when the item was created. Items sort by `time` when it is defined and fall back to `created`. Displayed in the item footer. |
`ttl` | Optional | The time-to-live of the item. An item with `ttl` defined is not deleted by feed updates as long as `created + ttl` is in the future, even if it is inactive. |
`ttd` | Optional | The time-to-die of the item. An item with `ttd` defined is deleted by feed updates if `created + ttd` is in the past, even if it is active. |
`tts` | Optional | The time-to-show of the item. An item with `tts` defined is hidden from feeds before the time `created + tts`. |
`action` | Optional | A JSON object with keys for all supported actions. No schema is imposed on the values. |
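To make the three lifetime fields concrete, here is a rough sketch of how they could interact during a feed update. The function names are hypothetical, and two points are assumptions rather than documented behavior: that `ttd` takes precedence when both `ttd` and `ttl` are set, and that a value of 0 means "not defined".

```python
def is_visible(item, now):
    """An item is shown if it is active and its time-to-show has passed."""
    if not item["active"]:
        return False
    tts = item.get("tts", 0)
    return not tts or now >= item["created"] + tts


def should_delete(item, now, returned_by_fetch):
    """Decide whether a feed update would delete the item."""
    ttl = item.get("ttl", 0)
    ttd = item.get("ttd", 0)
    # ttd: deleted once created + ttd is in the past, even if still active.
    if ttd and item["created"] + ttd < now:
        return True
    # ttl: kept while created + ttl is in the future, even if inactive.
    if ttl and item["created"] + ttl > now:
        return False
    # Default rule: deleted once inactive and no longer returned by fetch.
    return not item["active"] and not returned_by_fetch
```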
Existing items are updated with new values when a fetch or action produces them, with some exceptions:
- Automatic fields cannot be changed.
- Source-level settings for `ttl`, `ttd`, or `tts` override the item's values.
- Fields cannot be updated from a non-empty value to an empty value. If a field's previous value is non-empty and the new value is empty, the old value is kept.
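The update rules above could be expressed roughly as follows. This is a sketch under stated assumptions, not Intake's implementation: `merge_item`, `is_empty`, and the `source_settings` dictionary are hypothetical names, and "empty" is taken to mean a missing value, the empty string, an empty object, or 0.

```python
AUTOMATIC_FIELDS = {"source", "created", "active"}
LIFETIME_FIELDS = {"ttl", "ttd", "tts"}


def is_empty(value):
    # Missing, empty string, 0, and empty object all count as empty here.
    return value in ("", 0, None) or value == {}


def merge_item(old, new, source_settings=None):
    """Apply a newly produced item on top of the stored one."""
    source_settings = source_settings or {}
    merged = dict(old)
    for key, value in new.items():
        if key in AUTOMATIC_FIELDS:
            continue  # automatic fields cannot be changed
        if is_empty(value) and not is_empty(old.get(key)):
            continue  # never replace a non-empty value with an empty one
        merged[key] = value
    for key in LIFETIME_FIELDS:
        if key in source_settings:
            merged[key] = source_settings[key]  # source-level value wins
    return merged
```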
Sources
A source is identified by its name.
A minimally functional source requires a `fetch` action that returns items.
TTL, TTD, and TTS can be configured at the source level by setting the environment variables `INTAKE_TTL`, `INTAKE_TTD`, or `INTAKE_TTS` to an integer value.
These values override any `ttl`, `ttd`, or `tts` value returned by a fetch or action.
Automatic fetching can be configured by setting the `INTAKE_FETCH` environment variable to a fetch schedule.
A fetch schedule may be:
- `every <duration>`, where `<duration>` is a Go duration string
- `at HH:MM[,HH:MM[...]]`, where HH:MM is an hour and minute
- `on DOW[,DOW[...]] [at ...]`, where DOW is an abbreviated weekday
- `on M/D[,M/D[...]] [at ...]`, where M/D is a month and day
Examples:
`INTAKE_FETCH` | Schedule |
---|---|
`every 5m` | Every 5 minutes (00:00, 00:05, ...) |
`every 1d` | Once per day (at midnight) |
`every 7d` | Once per week (at midnight Sunday) |
`at 08:00` | Once per day at 08:00 |
`at 06:00,18:00` | Twice per day at 6am and 6pm |
`on Tue,Thu` | Twice a week, on Tue and Thu |
`on Mon,Fri at 12:00` | Twice a week, at noon on Monday and Friday |
`on 3/25` | Once a year on March 25 |
`on */7` | Each month on the 7th |
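The four schedule forms can be told apart with a small parser. The sketch below only splits a schedule string into its parts; it does not validate weekday names, interpret the duration, or compute fire times, and it is not Intake's own parser. The function names and the shape of the returned dictionary are hypothetical.

```python
def parse_times(text):
    """Parse 'HH:MM[,HH:MM...]' into a list of (hour, minute) tuples."""
    times = []
    for part in text.split(","):
        hour, minute = part.split(":")
        times.append((int(hour), int(minute)))
    return times


def parse_schedule(text):
    """Classify a fetch schedule string into one of the documented forms."""
    words = text.split()
    if len(words) == 2 and words[0] == "every":
        return {"kind": "every", "duration": words[1]}
    if len(words) == 2 and words[0] == "at":
        return {"kind": "at", "times": parse_times(words[1])}
    if len(words) in (2, 4) and words[0] == "on":
        days = words[1].split(",")
        # M/D entries contain a slash; weekday abbreviations do not.
        kind = "dates" if "/" in days[0] else "weekdays"
        times = []
        if len(words) == 4:
            if words[2] != "at":
                raise ValueError(f"expected 'at' in schedule: {text!r}")
            times = parse_times(words[3])
        return {"kind": kind, "days": days, "times": times}
    raise ValueError(f"unrecognized schedule: {text!r}")
```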
Action API
The Intake action API defines how programs should behave to be used with Intake sources.
To execute an action, Intake executes the command specified by that action's `argv`.
The process's environment is as follows:
- `intake`'s environment is inherited.
- Each environment variable defined in the source is set.
- `STATE_PATH` is set to the absolute path of a file that the source can use for persistent state. This file can be used for any data in any format. Changes to the state file are only saved if the action succeeds.
The process inherits `intake`'s working directory, which may differ between CLI invocations and the service daemon.
Consequently, actions should use the state file for persistence and temporary directories for ephemeral files, rather than depending on the current working directory.
When an action receives an item as input, that item's JSON representation is written to that action's `stdin`.
When an action outputs an item, it should write the item's JSON representation to `stdout` on one line.
All input and output is assumed to be UTF-8.
If an item cannot be parsed or the exit code of the process is nonzero, Intake will consider the action to be a failure.
No items will be created or updated as a result of the failed action.
Anything written to `stderr` by the action will be captured and logged by Intake.
The `fetch` action receives no input and outputs multiple items.
This action is executed when a source is updated.
The `fetch` action is the core of an Intake source.
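As a rough illustration of a fetch action written as a standalone program, the following Python sketch keeps a run counter in the state file and emits one item per run. The `runs` counter, the `run-N` ids, and the JSON state format are made up for this example; the only parts taken from the description above are `STATE_PATH`, one JSON item per line on `stdout`, and diagnostics on `stderr`.

```python
import json
import os
import sys


def fetch(state_path):
    """Return the items for this run and update the state file."""
    # The state file may be missing or empty on the first run.
    try:
        with open(state_path, encoding="utf-8") as f:
            state = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        state = {"runs": 0}

    state["runs"] += 1
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(state, f)

    run = state["runs"]
    return [{"id": f"run-{run}", "title": f"Fetch number {run}"}]


def main():
    items = fetch(os.environ["STATE_PATH"])
    print(f"emitting {len(items)} item(s)", file=sys.stderr)
    for item in items:
        # json.dumps writes each item on a single line by default.
        sys.stdout.write(json.dumps(item) + "\n")


# Only run when a caller has provided STATE_PATH.
if __name__ == "__main__" and "STATE_PATH" in os.environ:
    main()
```

Exiting with an unhandled exception gives a nonzero exit code, so a crash in `fetch` would be treated as a failed action and the state change would not be saved.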
All other actions take an item as input and should output the same item with any modifications made by the action.
Actions can only be executed for an item if that item has a key with the same name in its `action` field.
The value of that key may be any non-null JSON value used to pass state to the action.
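A non-fetch action reads one item and writes the same item back with its changes. The sketch below shows a hypothetical `increment` action that uses its value in the `action` field as a counter; the action name, the integer counter, and the title text are all invented for illustration.

```python
import json
import sys


def increment(item):
    """Return a copy of the item with its 'increment' counter bumped."""
    updated = dict(item)
    actions = dict(updated.get("action", {}))
    count = actions.get("increment", 0) + 1
    actions["increment"] = count  # state carried in the action value
    updated["action"] = actions
    updated["title"] = f"Clicked {count} time(s)"
    return updated


def main(stdin, stdout):
    # The item arrives as JSON on stdin and leaves as one line on stdout.
    item = json.loads(stdin.read())
    stdout.write(json.dumps(increment(item)) + "\n")
```

To use this as an action script, end the file with a call to `main(sys.stdin, sys.stdout)`.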
The special action `on_create` is always run when an item is first returned by a fetch.
The item does not need to declare support for `on_create`.
This action is not accessible through the web interface, so if you need to retry the action, you should create another action with the same command as `on_create`.
If an item's `on_create` fails, the item is still created, but without any changes made by the action.
Web interface
The `intake serve` command runs an HTTP server that gives access to the feed.
While the CLI can rely on normal filesystem access control to secure the database, this does not apply to HTTP.
Instead, the web interface can be locked behind a password set via `intake passwd`.
Development
Parity features
- source batching
- web source add
- first-party replacement for cron
- NixOS module
- NixOS vm demo
Future features
- CLI simplification?
- on_delete triggers
- manual item edits, CLI
- manual item edits, web
- metric reporting
- items gracefully add new fields and `action` keys
- arbitrary date punt
- TUI feed view
- Nix flake templates
- parsing a news feed
- following a webcomic
Useful snippets
`sh -c` is very useful for turning small shell pipelines into Intake sources, especially combined with `jq`.
This fetch action warns when the disk is getting full:
sh -c 'df -h --output=pcent /home/user | grep -oe '\''[0-9]*'\'' | jq -c '\''if . > 85 then { id: "warning", title: "Free space usage: \(.)%", ttd: 1 } else empty end'\'''
Note that the last parameter to `df` narrows the listing down to just the mount with the home directory on it, so there's only one percentage being reported.
The threshold is defined by the `if` in the `jq` expression.
By using the `empty` output when below the threshold, the source returns no items when usage is low.
This is necessary to allow the item's `ttd` to delete it.
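The same idea can be written as a small Python fetch script where `jq` is not available. This is a sketch, not an exact equivalent: `shutil.disk_usage` computes the percentage from used and total bytes, which can differ slightly from the figure `df` prints, and the function names are hypothetical.

```python
import json
import shutil


def used_percent(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total)


def warning_items(percent, threshold=85):
    """Return a single warning item above the threshold, otherwise nothing."""
    if percent > threshold:
        return [{"id": "warning", "title": f"Free space usage: {percent}%", "ttd": 1}]
    return []


def main(path):
    for item in warning_items(used_percent(path)):
        print(json.dumps(item))
```

Calling `main("/home/user")` at the end of the script mirrors the shell pipeline: one item when usage is above 85%, no output otherwise.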
If you want a source to act as a scheduled reminder, it is necessary for the fetch to return an item with a new id every time.
A short `/dev/random` pipeline can provide a random id to a `jq` expression:
sh -c "cat /dev/random | base32 | head -c8 | jq -cR '{id: ., title: \"Hello\"}'"
You can generalize this using `jq`'s `env` and set the message as a source environment variable:
# with MESSAGE defined on the source
sh -c "cat /dev/random | base32 | head -c8 | jq -cR '{id: ., title: env.MESSAGE}'"
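For comparison, a reminder source can also be sketched in Python. Here `secrets.token_hex(4)` stands in for the `/dev/random` pipeline and yields an 8-character hexadecimal id; the `reminder_item` name and the `Hello` fallback when `MESSAGE` is unset are choices made for this example.

```python
import json
import os
import secrets


def reminder_item(message):
    """Build a reminder item with a fresh random id on every call."""
    return {"id": secrets.token_hex(4), "title": message}


if __name__ == "__main__":
    # MESSAGE is expected to be defined on the source.
    print(json.dumps(reminder_item(os.environ.get("MESSAGE", "Hello"))))
```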