One theme I'll be returning to is a principle which I'm calling "Data Precedence" (assuming the specific goal doesn't already have a name). This specifically refers to the precedence of data over logic.
This is closely related to data-centrism and data as a product (1), and it also aligns closely with much of the software tools ideology built on top of what I think Fielding calls a uniform pipe-and-filter architecture, as well as with linked data/semantic web efforts.
The main focus of the principle is that one of the primary goals of logic is to expose model data for portable external use. This is in contrast to approaches such as plugins or a preference for libraries which try to keep most of the processing inside of a given process or platform.
As mentioned, this is particularly in line with established concepts such as shell pipelines, and may amount to merging that with data as a product, leading to the goal that intermediate data, and not only finished data, should be similarly exposed (albeit there should likely be different levels of controls and contracts depending on the firmness of the boundaries). This also seems a plausible long-term evolution of some current directions, given that we seemingly tend to develop tools which allow us to work with finer-grained pieces over time - certainly this could be viewed as isomorphic to a function-as-a-service data mesh.
In practice this translates to what has already been mentioned - a primary goal should be to enable interoperable access to data, which means adopting tools and practices that expose data rather than processing it directly. There are certainly potential drawbacks; one that springs to mind is the overhead of marshaling and unmarshaling the data. On modern systems this is less of a concern, and there are plenty of tools floating around that focus on optimizing encoding performance and enabling use across languages. Another concern may be security - by itself this is not a real concern, given that data security should be pervasive/deperimeterized with appropriate enforcement; if changing where or when data is exposed changes who can access it, that is a problem in its own right. A related and more real problem is that such transfer may introduce additional costs for encrypting and decrypting sensitive data, which could undermine the previous sentiment around low IPC costs - this is legitimate, and if necessary such work should be done within some form of trust boundary/enclave. While either of these could introduce some form of cost issue, the standard guidance of avoiding premature optimization applies.
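To make the idea of exposing intermediate data a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical stage that parses tab-separated journal lines; the record shape, the function names, and the choice of JSON Lines as the exchange format are all my assumptions for illustration and are not taken from any existing tooling on this site.

```python
import json
from typing import Iterable, Iterator, TextIO


def parse_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Turn 'date<TAB>title' lines into plain dictionaries (hypothetical record shape)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        date, _, title = line.partition("\t")
        yield {"date": date, "title": title}


def expose(records: Iterable[dict], out: TextIO) -> None:
    """Write each intermediate record as one JSON object per line (JSON Lines)."""
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
```

Calling expose(parse_entries(sys.stdin), sys.stdout) would turn this into a pipeline stage whose intermediate model can be picked up by any other tool, in any language, rather than staying locked inside one process.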
One tool which I've repeatedly ended up using over the years for various purposes is Pandoc. It also provides a natural way for me to export this site in conformance with the goal of Data Precedence (and generally enables that pursuit by providing a more semantic model for many document types).
I've used it in past incarnations with markdown, and most recently with LogSeq and org mode (in an orphaned project) where I have some existing logic that I'll pull in here.
I'll start with seeing what gaps may exist when just invoking Pandoc. I did a quick skim of the model from yesterday's journal:
pandoc 20250124.org -t native
and it seemed more promising than what was in place for LogSeq so I'll configure that for HTML output and see how it looks. At this point I should also look at splitting out the source and output files.
This has largely been moved to generate.mk.
Given yesterday's file has a citation it will again act as the test subject using the tangled contents of this file (moved to an appropriate place).
The result is overall promising - one piece I didn't carry over was how the references were styled since the default isn't particularly Web friendly. I'll pull in some Pandoc defaults to help with that.
I default to using the iso690-numeric-en.csl styling which I grabbed from - somewhere (to be referenced later) and which is a file that is included locally. Here I'll specify that I want newer HTML in case that's not the default, activate citeproc (which I'll then attempt to remove from the previous command), indicate the desired citation formatting, and activate citation linking.
to: html5
citeproc: true
metadata:
  citation-style: iso690-numeric-en.csl
  link-citations: true
My previous file also had some settings that I don't need yet, so I'll wait on those. I could presumably also add the bibliography file here but that seems to smell a bit given that it crosses out of the relatively hermetic inputs.
The citations are now formatted as desired - but they are also repeated. This may be due to the current redundant specifying of citeproc so I'll remove the one from the command (disabling the previous block).
One concern I'd have is the splitting of specifying citeproc and the bibliography file - if there's a hard dependency between them then it would be undesirable connascence if things broke whenever the scattered pieces weren't aligned. Running Pandoc manually without specifying the bibliography produces a warning about the missing citation, which seems like appropriate behavior.
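The split described above can be sketched as a small Python helper that assembles the invocation. This is purely illustrative: the function name and the file names in the usage note are hypothetical. The only Pandoc pieces relied on are the --defaults, --bibliography and --output options.

```python
from typing import List, Optional


def pandoc_command(source: str, output: str, defaults: str,
                   bibliography: Optional[str] = None) -> List[str]:
    """Assemble the argument list; citeproc itself is switched on in the defaults file."""
    command = ["pandoc", source, "--defaults", defaults, "--output", output]
    if bibliography is not None:
        # The bibliography stays on the command line because it lives outside
        # the otherwise self-contained inputs.
        command += ["--bibliography", bibliography]
    return command
```

Something like subprocess.run(pandoc_command("20250124.org", "20250124.html", "defaults.yaml", "refs.bib"), check=True) would then run it. Leaving out the bibliography argument corresponds to the manual run mentioned above, where Pandoc warns about the missing citation.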
Another concern is how internal links will work; the previous page didn't allow for testing that since it has no such links. The clearest candidate is the home page as it links to all of the journal pages so I'll try that one for testing. It's also about time for me to wind down this session so if it works I'll flip my site over to Pandoc and otherwise I'll come back to this later.
Unfortunately but not surprisingly at all, internal links do not work given that they use the .org extension. This can be fixed with a relatively trivial Pandoc filter - I'd prefer to just remove the extension and leave the resolution to content negotiation and Web server settings, but at a glance that does not seem to be supported on sourcehut pages (which is unfortunately in line with other static page hosts) so I'll likely start with replacing it with .html and chasing down the options for avoiding extensions later on. I know of the common pattern of having each page be its own directory to make use of Web servers' handling of index pages - that seems like a practical compromise where necessary but also a bit of the tail wagging the dog, which I'd avoid. Something like avoiding the extension in the output files may be a better option IMO so long as the servers and clients can work out the content type (I know IE used to have an issue ages ago, so while I'd hope that no modern browser has issues, the server may need to be configured properly).
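A filter of the kind mentioned above could look roughly like the following. This is a minimal sketch and not the filter actually used for this site: it is written as a JSON filter in Python, it assumes internal links arrive as relative targets ending in .org, and the function names are mine. It relies only on the documented shape of a Link element in Pandoc's JSON AST, whose content is a list of attributes, inline text, and a [url, title] target.

```python
import json
import sys
from typing import Any


def rewrite_target(url: str) -> str:
    """Swap a trailing .org for .html on relative links; leave everything else alone."""
    if "://" in url or url.startswith(("#", "mailto:")):
        return url  # external or same-page link: not ours to rewrite
    path, sep, fragment = url.partition("#")
    if path.endswith(".org"):
        path = path[: -len(".org")] + ".html"
    return path + sep + fragment


def walk(node: Any) -> None:
    """Visit the JSON AST in place, rewriting the target of every Link element."""
    if isinstance(node, dict):
        if node.get("t") == "Link":
            # Link content is [attributes, inline text, [url, title]]
            target = node["c"][2]
            target[0] = rewrite_target(target[0])
        for value in node.values():
            walk(value)
    elif isinstance(node, list):
        for item in node:
            walk(item)


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing piped in, nothing to do
    document = json.loads(raw)
    walk(document)
    json.dump(document, sys.stdout)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as org_links.py, this could be passed to Pandoc with the --filter option. Dropping the extension entirely later on would only mean changing rewrite_target.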