January 27, 2025

Pandoc Extension

Pandoc can be extended in a range of languages. An option such as Python is likely to be widely palatable given the language's popularity and the supporting libraries provided. I personally tend to follow one of the two more direct options, depending on context.

Lua Filters

Lua filters are what I use by default in a team setting. They have the enormous advantage that the interpreter is built into Pandoc, so there are no further dependencies. The potential disadvantages are that Lua itself doesn't include many batteries (so anything non-trivial tends to need local libraries) and that there's typically not much design-time safety, so many errors may not be detected until run time. The latter can be addressed by a test harness (and should be in larger projects), but in more limited use such testing may be unnecessary.

Haskell Filters

Haskell filters provide earlier feedback in a more powerful language. Haskell is also the primary language of Pandoc, so activities such as skimming code or Haddock documentation are more natural. Unfortunately it expects the environment to be set up properly, and Haskell is likely to be less approachable for many developers. I therefore use Haskell primarily for personal use.

Replacing Link Extension

For now my need is to replace .org with .html in link targets. This is likely simple enough that I could use something like bash, but the Pandoc filter should also be very simple and should be a bit more reliable.

Shebang

The file starts off with the shebang per the filter documentation (1).

#!/usr/bin/env runhaskell

OverloadedStrings

The OverloadedStrings pragma eases some of the coercion between Text and String.

{-# LANGUAGE OverloadedStrings #-}
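As a small illustration of what the pragma buys, the sketch below writes the same suffix check twice: once with a bare literal and once with an explicit conversion. The names viaLiteral and viaPack are hypothetical and exist only for this comparison.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text, isSuffixOf, pack)

-- With the pragma, the literal ".org" is read as Text directly.
viaLiteral :: Text -> Bool
viaLiteral href = ".org" `isSuffixOf` href

-- Without the pragma, the same check would need an explicit conversion
-- from String, as written here with pack.
viaPack :: Text -> Bool
viaPack href = pack ".org" `isSuffixOf` href

main :: IO ()
main = print (viaLiteral "notes.org", viaPack "notes.org")
```

Both checks should agree; the pragma only removes the need to write pack around each literal.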

Imports

Import Data.Text for some text manipulation functions.

import Data.Text (dropEnd, isSuffixOf)

Import Text.Pandoc.JSON to use the filter support function.

import Text.Pandoc.JSON

Entrypoint

The entrypoint for the script makes use of the Pandoc-provided toJSONFilter.

This defines main as an IO action that applies the higher-order function toJSONFilter to linkFixer.

main :: IO ()
main = toJSONFilter linkFixer

Filter Type Definition

Given that the focus is on links, the filter will deal with Inline elements.

linkFixer :: Inline -> Inline

A guard clause matches links whose targets end with .org and replaces that suffix with .html.

Links that fail the guard fall through to the subsequent pattern.

linkFixer (Link attr inline (href, title))
  | ".org" `isSuffixOf` href =
    Link attr inline (fixedHref, title)
      where fixedHref = (dropEnd 4 href) <> ".html"
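The suffix swap can also be written as a standalone function on Text, which makes it easy to try in isolation. The sketch below is a variant, not the filter's own code: orgToHtml is a hypothetical name, it uses stripSuffix from Data.Text instead of counting characters, and the "://" check is an added assumption to leave absolute URLs alone, since a target such as https://orgmode.org also ends in .org.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Swap a trailing ".org" for ".html"; any other target is returned unchanged.
-- Hypothetical helper for illustration only.
orgToHtml :: Text -> Text
orgToHtml href
  -- Assumption: anything containing "://" is an absolute URL and is left alone.
  | "://" `T.isInfixOf` href = href
  -- stripSuffix returns Nothing when the suffix is absent, so href is kept.
  | otherwise = maybe href (<> ".html") (T.stripSuffix ".org" href)

main :: IO ()
main = mapM_ (print . orgToHtml) ["notes.org", "notes.org#heading", "https://orgmode.org"]
```

Only the first sample is rewritten. A target with a fragment such as notes.org#heading does not end in .org, so both this sketch and the guard above leave it untouched.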

Pass Other Elements Through

Nothing to see here.

linkFixer e = e
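One way to check the filter without running Pandoc is to apply it to a small in-memory document. For a plain a -> a function, toJSONFilter decodes the JSON document and applies the function with walk from Text.Pandoc.Walk, so the sketch below calls walk directly. The sample and expected documents are made up for illustration, and the sketch assumes a pandoc-types version with Text-based link targets, as the filter itself does.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (dropEnd, isSuffixOf)
import Text.Pandoc.Definition (Block (Para), Inline (..), Pandoc (..), nullAttr, nullMeta)
import Text.Pandoc.Walk (walk)

-- Same definition as in the entry, repeated so the sketch stands alone.
linkFixer :: Inline -> Inline
linkFixer (Link attr inline (href, title))
  | ".org" `isSuffixOf` href = Link attr inline (dropEnd 4 href <> ".html", title)
linkFixer e = e

-- Hypothetical sample: one paragraph holding a local link and an external one.
sample :: Pandoc
sample = Pandoc nullMeta
  [ Para [ Link nullAttr [Str "yesterday"] ("2025-01-26.org", "")
         , Space
         , Link nullAttr [Str "filters"] ("https://pandoc.org/filters.html", "") ] ]

-- The same document with only the local link's target rewritten.
expected :: Pandoc
expected = Pandoc nullMeta
  [ Para [ Link nullAttr [Str "yesterday"] ("2025-01-26.html", "")
         , Space
         , Link nullAttr [Str "filters"] ("https://pandoc.org/filters.html", "") ] ]

main :: IO ()
main = print (walk linkFixer sample == expected)
```

The external link ends in .html already, so it falls through to the second equation unchanged.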

Site Authoring Flow

The content in this site is initially driven off of writing what amounts to a daily diary. I started to swap to this flow as part of using LogSeq (which for the time being I've shifted away from) and find it a good mechanism to both capture more information with less overhead and keep better track of what I've been doing (I've never been one to keep a journal…or pay much attention to things like time).

Certain sections of information will then be periodically distilled into pages. I'll likely try to keep the journals focused on activities and the pages more on results (with links or embeddings to relate the two processes). For this and just general cleanup - journal entries are subject to revision.

I'm capturing this information now since one of my next tasks will be integrating this filter into the work captured yesterday, which will be the first need to extract and clean up some of the longer-lived information.

The longer-term goal is to produce a graph of information, which will hopefully be accomplished through a combination of links and supporting tools.

Site File Structure

At some point in the past few days I was also thinking that all of the journal files in the same directory could optimistically get a bit bloated over time. Splitting them out by year would be perfectly sufficient, but I suppose for consistency I'll split them out down to months. This may also be helpful if I end up wanting to attach some assets (like images) to journal entries (which is fairly likely).
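A year/month split could be computed from an entry's date along the following lines. This is only a sketch: the journalPath name, the root directory, and the YYYY-MM-DD.org file naming are all assumptions for illustration, not the site's actual layout.

```haskell
import Data.Time.Calendar (Day, fromGregorian, toGregorian)
import System.FilePath ((</>), (<.>))
import Text.Printf (printf)

-- | Build root/YYYY/MM/YYYY-MM-DD.org for a journal entry.
-- Hypothetical layout; the directory and file naming are assumptions.
journalPath :: FilePath -> Day -> FilePath
journalPath root day =
  root </> printf "%04d" year </> printf "%02d" month </> show day <.> "org"
  where
    -- toGregorian yields (year, month, day of month); show on Day gives YYYY-MM-DD.
    (year, month, _) = toGregorian day

main :: IO ()
main = putStrLn (journalPath "journal" (fromGregorian 2025 1 27))
```

Keeping the full date in the file name means an entry stays identifiable even when moved out of its month directory, and assets for an entry could sit alongside it in the same directory.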

I'll work on the three items above, which don't seem like they should take much time but will block updates until they're done.

1. Pandoc - Pandoc filters. [online]. [Accessed 5 December 2024]. Available from: https://pandoc.org/filters.html