Home - Matt Whipple

Society

Education Isn't the Answer

When societal issues arise, a response that some people jump to is that education is the answer; most recently I've heard this in response to the Black Lives Matter/Police Violence issues the US is currently facing. This is a non-answer: it ultimately translates to inaction and therefore, by default, to the preservation of those systems and policies which underlie the perceived problems.

The argument is likely popular for a range of issues because it requires more than casual inspection to dismiss. Education is inherently a good thing, and is the solution for ignorance: positing that education is not the answer could feel as though it's arguing in favor of ignorance, although such an inference is ultimately based on a straw man fallacy.

A cynical reaction to the education argument is that it is a means of distancing oneself from the problem. Equating a given problem with ignorance, and believing that the solution is to enlighten those benighted souls who instigate it, is likely to carry with it the belief that one stands above the problem. A snarky summary would be that the subtext may be along the lines of "the problem wouldn't exist if more people were like me", but less presumptuously there is a likely message of "I'm not part of the problem".

The education argument is also often used for problems which are systemic, yet it shifts the onus onto the individual. While this may foster emergent improvements, producing a targeted outcome from a neutral starting point this way is inefficient and risky; further, as any current issue likely indicates contrary momentum, a naive proposition of education is more likely to resemble attempting to stop flooding by randomly throwing sandbags in the streets. An emphasis on the individual may be fostered by something along the lines of libertarian ideals, though it is important to temper such ideals with pragmatism and also to not fall into the common trap of cherry picking when such ideals are espoused. Any systemic failure must be the result either of an issue which emerged organically, which speaks to a deficiency in some of the fundamental libertarian ideas, or of some previous tampering with the system. Any proposed solution born from such ideology must be accompanied by the recognition of the associated limitations and corrections and cannot be rationally presented in a bubble. In any case this position primarily disregards the fact that the underlying system has either introduced a problem or has failed to protect itself from an externally introduced problem.

In general the idea of resolving problems through additional knowledge also implies that such problems are incidental rather than intentional; in any case where such issues are systemic this is likely to be naive. If a given idea has been reinforced, and particularly if it has been codified in any way, the prospect of ascribing its origin to an ignorance that could be rectified through education ignores or trivializes the forces that originally created the issue and may continue to benefit from and support its existence.

On a more fundamental level education is a means to distribute knowledge, and knowledge by itself does not resolve anything other than lack of knowledge. Problems are solved through the application of knowledge, and therefore education normally amounts to equipping people with tools such that they are better able to apply knowledge to resolve issues. Suggesting that education is a solution to any practical problem therefore leads to somewhat of a circular argument: education may help yield a solution to the problem and therefore cannot be that solution itself. This is another subtle aspect of why the "education is the solution" argument is tempting but incorrect: education may help produce solutions and is inherently beneficial, but in terms of mapping to reality it is equivalent to the statement "more research is needed" masquerading as a solution in and of itself.

Suggesting education is a solution is generally a non-solution. It is a pernicious idealized carrot which admits a problem but provides no clear action to disrupt the status quo. It amounts to an attempt to be transcendent on a moving train, and in the current racial context provides a glaring example of trying to project inert non-racist stances onto a society which clearly has inequality and which data seems to indicate has racist policies. If a system has proven itself to be lacking then it takes more than knowledge or wishful thinking to correct it. The education argument could have easily been made within our Jim Crow past (and that lens is also useful for other questions around racial policies). We have since come to accept that that system was broken, and when our current system is signalling that it too has issues we must assess that system itself rather than deflecting inspection onto individuals and any ignorance which a broken society seems to be empowering.

Mathematics

Pursuing Literacy

Having not had any significant amount of higher education my formal maths training never even made it to single variable calculus, but I tend to read a fair amount of material that vomits out mathematical notation. To me (and I suspect most people) such information is, at best, hard to read, which leads me to either skim over it without understanding, or stop reading and spend what is often a disruptive amount of time trying to process the expression(s) into submission. I'm currently attempting to smooth out the consumption of such content.

Mathematical notation is inherently dense, and its content most often relies on the reader to infer many of the connections. Digesting such content more easily therefore relies on familiarity with both the syntax and a catalog of common techniques (and their outputs). This familiarity will be built with the most obvious and proven approach: practice. To this end I'll be looking to regularly read such materials and to work through any included maths until they can be understood with less difficulty. Such materials are easily located within my primary field of software engineering, let alone extra areas of interest across science and mathematics itself.

The process of actually working through much of the material is likely to be valuable in itself, more so than just reading explanations. Any content here is therefore only likely to be valuable to myself or to anyone who happens to stumble upon it and is nudged past a problem on which they are stuck.

In particular it feels as though it can be very easy to fall into a trap of being able to provide correct mathematical solutions without a real understanding of why that solution applies. This feels particularly pernicious in that it could manifest as an underlying lack of understanding upon which formal approaches could continue to be applied: such as using pre-baked formulas without the prerequisite knowledge to produce them. This can lead to what is equivalent to an increased ability to leverage the packaged axioms within an ostensibly isomorphic formal system, but without a defensible means to map any productions to anything outside of that system (i.e. reality).

Concrete Mathematics

I'm currently picking my way through Concrete Mathematics1, so here I'll attempt to elucidate some of the ideas within that book which gave me pause.

Josephus Recurrence

The third of the problems in the first chapter of Concrete Mathematics is a version of the Josephus Problem2 in which every other element is eliminated. This is the first problem that expands significantly beyond a pattern of deducing recurrences and proving them through induction.

One of the first conclusions that seemed fuzzy to me while working through the Josephus problem is the establishment of the recurrence relation. That the input is effectively halved upon each lap around the circle is evident, and that yields what are effectively new numbers which are adjusted based on whether the input size was odd or even. The missing step in my case (which is relatively minor and may be due to having an initially off perspective) is how such new numbers correspond directly to a relation for the solution. As the distinction between the odd and even cases is insignificant for crossing this gap, the logic for one can be transplanted to the other, so the focus here will remain on the slightly simpler even case.

Given a trivial example with numbers (let's call them I for input):

\(1 2 3 4 5 6\)

The first lap would yield:

\(1 x 3 x 5 x\)

Where each survivor effectively acquires a new number for the new lap:

\(1 x 2 x 3 x\)

The numbers after that lap can therefore be defined as:

\(T_n = (I_n + 1)/2\)

which corresponds neatly to the halving. Now that the problem has been reduced, any product of that reduction needs to be mapped back to its number in the original sequence, and therefore the above equation needs to be flipped to reflect the relationship from \(T_n\) back to \(I_n\). Some combination of algebra and inspection of the values reveals that to be:

\(I_n = 2T_n - 1\)

As the previous transformation reflected the halving, this one produces a range of odd numbers and therefore omits the eliminated members. The process of solving the halved problem and then transforming that result back into the original sequence therefore provides the ultimate recurrence relation:

\(j(2n) = 2j(n) - 1\)

These are fairly evident dots to connect and they result in a conclusion which seemed easy to digest, but they left me uncertain when I stopped to make sure I truly understood how the numbers involved aligned with the fundamental relationship.
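To double check that the numbers line up, the \(n = 6\) example above can be traced both directly and through the recurrence (this worked check is mine rather than the book's):

\begin{align*}
j(3) &= 3 && \text{(survivor of the reduced circle } 1\ 2\ 3\text{)}\\
j(6) &= 2j(3) - 1 = 5 && \text{(via } I_n = 2T_n - 1 \text{ applied to } T = 3\text{)}
\end{align*}

Eliminating every other member of the original circle directly removes \(2, 4, 6, 3, 1\) in that order and also leaves \(5\), so the renumbering and the recurrence agree.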

Repertoire Method

One concept that left my head spinning the first time I flipped through CM (but felt far more approachable the second time around) was the repertoire method. I think one of the main stumbling blocks with the introduction of the repertoire method is the introduction of the new symbols which are somewhat canonically expressed in the equation:

\(f(n) = A(n)\alpha{} + B(n)\beta{} + C(n)\gamma{}\)

This equation is used as a generalization to shed light on the recurrence within the Josephus Problem covered earlier where the complete set of relations follows the pattern of:

\begin{equation} f(1) = \alpha{}\\ f(2n) = 2f(n) + \beta{}\\ f(2n+1) = 2f(n) + \gamma{}\\ \end{equation}

with \(\alpha{}=1\), \(\beta{}=-1\), and \(\gamma{}=1\). A key aspect that I think I missed the first time is that there are effectively two levels of indirection being introduced into the problem and the repertoire method is being used against that higher layer. The original problem is now somewhat buried and can be temporarily ignored while looking at the higher level of abstraction, lest it invite confusion. The replacement of the constants with \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) is straightforward, but then the addition of \(A\), \(B\), and \(C\) adds another level such that the focus moves to defining the relationship between \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) as \(n\) increases. The original more concrete problem is now somewhat irrelevant and the new problem becomes establishing how the shape of the equation changes and how that can be expressed in terms of \(A\), \(B\), and \(C\). It is this more abstracted relationship that the repertoire method is used to resolve: seeking to create equations that produce the desired values for \(f(n)\) as \(n\) increases.

The book provides a table conveying how that shape evolves as \(n\) increases:

| \(n\) | \(f(n)\)                 |
|-------+--------------------------|
| 1     | \(\alpha{}\)             |
| 2     | \(2\alpha{} + \beta{}\)  |
| 3     | \(2\alpha{} + \gamma{}\) |
| 4     | \(4\alpha{} + 3\beta{}\) |

This effectively captures a new recurrence for which \(A\), \(B\), and \(C\) can be solved to produce a closed form, or as used in the book to support/prove an existing solution. As mentioned, the additional layers of abstraction allow us to focus on how the shape of the equation changes as \(n\) increases. At this level of abstraction the role of the equation is secondary to establishing that internal relationship between the terms. The use of \(\alpha{}\), \(\beta{}\), and \(\gamma{}\) provides the flex points by which we can fit multiple functions into the established shape, and if each such function is provably correct across \(n\) then the relationship between terms is also correct. This allows us to home in on \(A\), \(B\), and \(C\), and by using substitutions and eliminations we can isolate a closed form for each, which produces the recurrence reflected in the table above. The trivial functions used, such as \(f(n) = n\) and \(f(n) = 1\), are obviously entirely different from the equation that we really care about, but that they can be solved using the established shape verifies that the assertions about the shape itself are sound (and this is likely tied in with the linearity of the recurrence, a connection with which I'm not particularly familiar at the moment).
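To make that concrete, here is my compressed retracing of how the trivial functions pin down \(A\), \(B\), and \(C\), writing \(n = 2^m + l\) with \(0 \le l < 2^m\):

\begin{align*}
f(n) = 1 &\Rightarrow (\alpha{}, \beta{}, \gamma{}) = (1, -1, -1) &&\Rightarrow A(n) - B(n) - C(n) = 1\\
f(n) = n &\Rightarrow (\alpha{}, \beta{}, \gamma{}) = (1, 0, 1) &&\Rightarrow A(n) + C(n) = n\\
(\alpha{}, \beta{}, \gamma{}) = (1, 0, 0) &\Rightarrow f(2n) = f(2n+1) = 2f(n) &&\Rightarrow A(n) = 2^m
\end{align*}

Solving these gives \(A(n) = 2^m\), \(B(n) = 2^m - 1 - l\), and \(C(n) = l\); plugging the Josephus constants \(\alpha{}=1\), \(\beta{}=-1\), \(\gamma{}=1\) back into \(f(n) = A(n)\alpha{} + B(n)\beta{} + C(n)\gamma{}\) then recovers the closed form \(2^m - (2^m - 1 - l) + l = 2l + 1\).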

Reading

ACM

I'm a proud member of the ACM3. Part of my daily routine is to chip away at comprehensively reading the material in the ACM Digital Library. Much of the material is fairly old at this point, but in addition to historical interest there are clear patterns that have carried through the history of computing and many challenges during early computing are echoed in problems encountered today. Indeed, many of the older articles circle concepts which remain relevant and which overall seem to indicate that computing has spread far more than it has advanced.

Here I'll track some of the publications which I'm currently working through.

Queue

Books

Cracking the Coding Interview

I've recently started reading a copy of Cracking the Coding Interview4. This popped onto my radar as I was cleaning out my inbox and found an old email from a Google recruiter who referenced it as a resource. Unfortunately, when collecting the information to write this I noticed that I was too quick to buy a copy off of Google Play: there are several books with similar titles and I got the book by "Harry", whereas the one referenced in the email is the one by Gayle Laakmann McDowell.

The purchased book is fairly unsatisfying and not something which has aged well.

The initial part is fairly typical interview advice with nothing particularly domain specific. Much of the advice feels a little too calculated for behavioral questions; as an interviewer, if a candidate provides overly crafted answers it is likely to trigger a BS alarm or leave me with an unclear feeling towards the candidate. The ability of someone to spin their experiences such that some checkboxes are ticked does not convey what they would be like to work with, nor do overly prepared answers elucidate their ability to reactively communicate. While there is some useful guidance, it is buried within practices which would gut authenticity, and it didn't seem likely to provide any more insight than the myriad of books in the more general category of getting a job (which would also include advice on getting in the door, which is absent from this book).

The book then continues to an explanation of why Java should be used for interviews. Many of the justifications for Java are emblematic of a time when Java was expected to dominate a wider range of platforms, most of which it no longer dominates and some of which are effectively dead (such as delivering Web front-end functionality through applets). There's a dissonance in that the selection of Java is presented as an incidental practical choice, but then the material proceeds to dig through specific corners across the range of the Java platform. The message therefore shifts from Java knowledge being a means to an end driven by concepts (and supplemented as needed by readily available reference material) to a primary goal.

The book then continues onto a randomly distributed (and highly redundant) list of questions about a variety of Java related topics. As previously mentioned and alluded to, many of these are focused on scattered corners of the platform and solutions which have since fallen out of favor. These are combined with more general questions and those that are more conceptual, though even these more fundamental questions span a range between pragmatism and dogmatism, with the previously mentioned age being a major factor (seemingly pre-Java 5, which at this point is missing at least two transformative versions). The biases may also speak to the buzz around Java at the time (and possibly the fact that the publisher appears to be part of Oracle, which is now the steward of Java), but it still aligns with a fetishizing of tools over engineering that is unlikely to be indicative of a culture which produces relatively optimal solutions.

The writing and editing of the book itself feels very rough around the edges, but is not noticeably worse than other technical books I've read by some publishers (Packt stands out in my mind though there are others).

The book as a whole has a very dated feel which leaves the initial impression that it is just dated: however there are references to technologies which are far newer than the rest of the content. There are questions seemingly targeting mostly abandoned enhancements from Java (1.)2 and others (sometimes adjacent) targeting Java 8 features. This implies that the majority of the content is outdated but that some ultimately insignificant updates have been made, yielding a spotty veneer of freshness on top of a rotting foundation. This could arguably be valuable if looking for a position that involves a significant amount of legacy code or legacy mindsets, but that is inconsistent with the purported purpose of the book.

Ultimately this book feels like a waste of time aside from being a survey of Java hype and history. The contents in no way match the title, as the focus is not on coding but on largely obsolete Java trivia. In the case of someone actually looking to acquire Java knowledge in preparation for an interview, a far better source that provides sound, practical knowledge would be something like Josh Bloch's Effective Java.

I should probably pick up a copy of the right version of Cracking the Coding Interview.

Projects

This Site

The content, design, and technology of this site are all expected to evolve fairly significantly over time.

Contents

This site is intended to capture knowledge primarily for myself. It will effectively evolve as a combination of a collection of annotated bookmarks and a sandbox within which I may practice writing and processing of the referenced sources.

Technology

This site (the latest in a long line of mine, built with varying technologies) is created using Org mode5 in Emacs. The site is hosted using GitLab pages6.

  • Publishing

    The site is published automatically using the CI/CD functionality provided by GitLab. This was relatively straightforward: I borrowed from the provided example7 and modified it a bit to match my tastes.

    • Use of Project .el File

      My initial hope was to adopt a simpler approach than the example configuration as the first iteration. This site started as a single org file exported to HTML, so the hope was that that could be done by invoking an existing function through emacs batch mode.

    • Integration with Make

      With the very simple initial alternative of easily calling a readily available emacs function off the table, pursuit of abject triviality is abandoned. As in virtually every other project I work on, I therefore introduce `make` to capture the resulting recipe. On a basic level this is a trivial indirection where the emacs batch command is captured in make rather than in the GitLab CI configuration file; there are, however, two further adjustments.

      The first of these tweaks is bridging the now more segmented build configuration. If the build is managed entirely within emacs then emacs has access to all of the relevant parameters; however, if make is performing the more general build definition then it should also provide parameters such as where artifacts should be built. The call to emacs is therefore enhanced to pass the relevant arguments through to be used in the ultimate project configuration (a rough sketch of this invocation appears below).

      Another slightly more interesting issue is that make introduces a new dependency which therefore must be present in the container within which GitLab calls emacs. For the smallest code footprint the desire would be to use an already published image which contains both (an up-to-date) emacs and make. After a bit of local experimentation the `dev` tagged `silex/emacs` images fit the bill. That particular image seems to be fairly large, but considering how it is being used that's not expected to be of practical consequence. Normally I'd also be likely to wire up a make target which recreates the containerized behavior (a host target which calls `docker run` to invoke another container target) but at the moment this doesn't feel interesting enough to warrant the effort given the relatively ubiquitous availability of make and emacs.
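      A rough sketch of the shape of the resulting recipe follows; the file name `publish.el`, the function name `site-publish`, and the output directory here are illustrative placeholders rather than this site's actual configuration:

        # Illustrative sketch: publish.el / site-publish are placeholder names.
        # Run from a make recipe; OUT_DIR is the make-provided parameter
        # telling the Org project where to write the exported HTML.
        OUT_DIR="${OUT_DIR:-public}"
        emacs --batch \
              --load publish.el \
              --eval "(site-publish \"${OUT_DIR}\")"

      The GitLab CI job then only needs to run `make` inside the `dev` tagged `silex/emacs` image and publish the resulting directory as the Pages artifact.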

  • Semantic Section IDs

    One of the unfortunate but understandable realizations when using exported org documents is that the generated ids have no semantic meaning. A naive hope would be that the ids were generated based on section headings or equivalent contents, or that a strategy to do so existed. The `CUSTOM_ID` property was noticed in passing while reading the Org manual but anything more automagic seemed tempting. Poking around from `ox-html.el` seems to indicate that ids are generated either from that property or from the relatively simple `org-export-get-reference` function.

    This lack of provided functionality is eminently reasonable: deriving a desirable id is relatively complex and fragile, since knowing how to transform variations on possible content into a desirable and acceptable convention which is also unique within some potentially uncertain context introduces a fair amount of complexity. Further, it is likely to produce suboptimal results in the event of collisions, text which is not suitable for such transformations (and constraining the text for suitability and immutability introduces a new limitation which could easily be avoided), or any content whose meaning is relative to its enclosing context.

    Ultimately therefore the simplicity and control that `CUSTOM_ID` provides more than offset the trivial toil of explicitly specifying obvious values (illustrated below).

    Any given heading will be qualified enough so that its meaning is clear but should not be bound to its current path. This should allow the content to be referenced in a way which is unambiguous without tying its identity to its current location (therefore allowing any navigable structure to evolve independently of the content and aligning with the concept that Cool URIs Don't Change (TODO: cite)).
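    As an illustration, the property is attached to a heading in the Org source roughly like so (the heading is real but the id value is just an example):

      # the id value below is just an example
      * Pursuing Literacy
        :PROPERTIES:
        :CUSTOM_ID: pursuing-literacy
        :END:

    which causes the exported HTML to use `#pursuing-literacy` as the anchor rather than a generated reference.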

  • Footnote Sources

    Sources are currently managed using Org mode's provided footnote functionality with named footnotes. This is a simple solution for now; the plan is to later integrate BibTeX as appropriate references are consumed.

  • Customizing the Domain

    Mapping my personal domain to point to the GitLab pages site was relatively straightforward using the instructions in the GitLab documentation. A minor bump was encountered due to the inconvenient and potentially incorrect information for the `TXT` verification record.

    The inconvenience was due to the GitLab UI providing copying of the complete record whereas the management UI for the zone file splits out the individual fields; this was fairly easily resolved by more selective copying and pasting.

    The more interesting issue, and the one that is potentially incorrect, was due to the host part of the record seemingly being a fully qualified domain name but not including a trailing `.`. When added to the record as-is it created a relative host, thereby appending a spurious domain and resulting in something like `verification.mattwhipple.com.mattwhipple.com` rather than the expected `verification.mattwhipple.com`. To resolve this I removed that part of the domain from the host field, though I think a trailing period would have also done the trick.
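    Expressed as zone file records for illustration (the record name follows the example above and the value is a placeholder, not the actual verification code):

      ; Relative name: the zone's own domain gets appended, yielding
      ; verification.mattwhipple.com.mattwhipple.com
      verification.mattwhipple.com     IN TXT "placeholder-verification-code"

      ; Fixed: either fully qualify the name with a trailing dot...
      verification.mattwhipple.com.    IN TXT "placeholder-verification-code"
      ; ...or specify only the host part relative to the zone
      verification                     IN TXT "placeholder-verification-code"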

    These potentially warrant updates to the GitLab documentation, but at the moment I'm unaware how much of it may be covered in existing documentation around DNS and also uncertain how much the minor hassles I encountered would apply to other users. The likelihood of my spending time to gain clarity around those concerns is pretty low, and therefore so is the corresponding likelihood that I'd submit an update or issue for the GitLab documentation.

Family Media Site

A project I've been dragging my feet on for years is a system to track media files for my family. I settled on a general design and did some proof-of-concept work at the outset, but have since not gotten around to actually building out a UI (which is kinda important for sharing something like media).

UI

  • Standard Functionality

    My main thought around Web development (which will be covered elsewhere later) is that the base level functionality provided by modern browsers is more than sufficient for most uses, as long as the developer is able to design systems decently well. In most healthy ecosystems weight usually builds on one side of a boundary which allows a commensurate reduction in weight on the other side: either a foundational system grows richer and the solutions built on top of that foundation grow smaller, or an underlying kernel grows smaller and more focused and more specialized work is offloaded to those solutions which are built on top of it. Currently Web development seems to be growing on both sides at once, where the languages and runtimes continue to grow more powerful but the systems built tend to remain at increasing levels of abstraction above that foundation. While this is likely beneficial for some of the more immersive uses of the Web it feels very heavy handed for basic sites and SPAs. My intent would therefore be to just use functionality readily available in a browser: however that does require the aforementioned design and dot connection in a space in which I honestly don't have much interest, which is largely the reason the UI was left alone for a while.

  • React

    The UI is ultimately going to be built using React. The main reason for this is that it is widely popular and I'm currently using it at work. More thoughts around React will likely be captured separately. The starting point will be the typescript template provided by Create React App8, and the UI will be developed using functional components and hooks with minimal additional dependencies.
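    Bootstrapping that starting point is a one-liner (the app name here is just a placeholder, not the project's actual name):

      # "family-media-ui" is a placeholder app name
      npx create-react-app family-media-ui --template typescript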

Hardware

Acer C738T N15Q8

After just under six years of use I decided to replace my Acer C720 Chromebook. I had an issue with a stuck key (which later worked itself out) and the power connector had become very loose, all of which felt like signs of the impending demise of a relatively old budget system. I decided to opt for a two-in-one as I'm also without a tablet at the moment and much of the planned use for the system would be more suitably done in tablet mode.

Swing For the Fences or Bunt

Upon cursory evaluation of options I realized that I would either want to spend a fair amount of money for a notably nice system, or try to minimize cost while satisfying immediate needs. As alluded to previously, the main purpose of this system is lightweight work, and concerns such as portability (and potential disposability) promise more value than power or bells and whistles. Additionally, the combination of the likelihood of my normally working from home post-pandemic and of wanting any significant investment to go towards a system that could be beefed up with GPUs to support deep learning both steer me more towards building a desktop system if I were to invest in a powerful personal machine. All of these factors led me down the cost minimization path.

Chromebook

Given the intended uses and budget I had no particular OS or vendor in mind. There are manufacturers that I'm drawn to, but for the most part such opinions are unsubstantiated and woefully dated. One driving force for me, however, is that if the stock OS ends up feeling in some way deficient I like to have the option to replace it with Linux, and I'm unlikely to spend a substantial amount of time to get that installation working. For me the line is that kernel customization is acceptable but needing to write, tweak, or track down driver code to smooth out hardware issues is not something I'm looking for at the moment. The loss of that time would also likely be a particular nuisance given the additional effort expected to configure much of the two-in-one behavior in a fresh Linux install. Given the options for cost and the inherent Linux hardware support, I was led to replace my old Chromebook with a new Chromebook.

ChromeOS

With my previous Chromebook, ChromeOS stayed on for a couple months before being replaced by a fresh (Gentoo) Linux install. I used what was then Crouton for a bit, which was awkward but passable, but ultimately I wanted to use Docker and the kernel upon which official ChromeOS was based at the time did not support that (and just going to a standard Linux distribution represented a far more visible path than building a custom ChromeOS image). There were several other pieces of software which I wanted to have locally (such as systemd) but I think Docker was the straw which led me to give up on ChromeOS. I figured the new device would follow a similar trajectory where I'd work with and around ChromeOS for a bit until I hit a blocker, at which point I'd wipe the system and install Linux. I'd heard and assumed the official and community support for more enhanced functionality had matured, and was looking forward to experimenting with the support for Android apps along with the discovered beta support for Linux apps. Aside from having to chase down some related settings to enable things in my managed GSuite account, everything was working smoothly and the pivot point was expected to once again be Docker.

For development Docker is often a compelling but practically avoidable convenience. Conversely, the additional isolation of namespaces can often complicate or introduce churn into local development. In terms of values such as reproducible verifiability that containers can also provide, such functionality can (and should) also be provided by less golden machines such as CI servers. Altogether, therefore, abandoning any working system for the sake of having Docker for development would not be done lightly. Though I had planned to delay pursuing Docker, I made it less than a week before having motivation to attempt to install it (as part of verifying the publishing pipeline for this site). To my delight Docker installed and was able to be started without issue, so I'm now left with no expected reasons to need to replace ChromeOS on this system. That ChromeOS may be becoming a viable off-the-shelf OS for more advanced uses is an exciting prospect; though it will be interesting to see how that may evolve in light of Project Fuchsia and any resulting tension that may cause with the relative dominance of Linux in the open source community (which includes my personal biases).

Software

AWS

The primary cloud provider for my professional work has been AWS. Here I'll try to capture some hopefully useful information gathered while working with AWS offerings.

AWS provides solid documentation for each of the services, but this may omit some information about practical usage. In particular AWS provides a catalog of building blocks which (seemingly more so than other cloud providers) may require or benefit from non-trivial combination; additionally, while many services have fairly obvious roles and relationships (such as S3 or EC2), there is also a raft of services which overlap such that their application to a given solution requires more in-depth differentiation.

AWS Database Migration Service (DMS)

AWS DMS9 provides what it says: database migration. This can be a highly convenient option to move between databases as long as either the source or destination is in AWS. DMS supports a variety of databases with a nice separation between the sync process itself and the endpoints which act as interfaces to the underlying databases. DMS provides change data capture (CDC) in addition to full database loads and can therefore support a prolonged or incremental migration effort.

A distinguishing factor of DMS (as indicated by its name and the first sentence of this section) is that it is oriented towards migration. Migration in this sense could be considered analogous to a bounded replication where after some trigger the replication is halted and likely the source of the migration is replaced with what was the destination. DMS is not designed for ongoing replication. The CDC behavior it provides could certainly serve such a purpose and be a compelling alternative to solutions like Maxwell's daemon, but DMS is not designed for that purpose and some associated use cases or optimizations may therefore be underserved.

Emacs

I used Emacs on and off for many years; after several attempts to use other editors and crawling back to Emacs I've finally settled on thoroughly investing in it and using it as my primary editor.

Emacs Grievances and Resolutions

I've had a series of concerns that have driven me away from emacs which I subsequently came to accept or embrace.

  • Emacs is Too Slow/Initial Adoption

    My initial exposure to emacs was in the late 1990s when it would for one reason or another accidentally open on Linux machines I was using. At that time it would generally elicit a groan as I waited for it to finish loading and then either stumbled through using it or closed it and switched to vim. PCs at that point were not slow through a long lens, but slow enough that emacs (at least without some optimizations) was sluggish enough to put me off. It also felt a bit too alien, which may have been aggravated by cosmetic similarities to simpler editors such as Notepad accompanied by significant behavioral differences; but that's just speculation about thoughts I may have had ~20 years ago.

    Several years after that I was in search of a new editor (which has been a periodic quest of mine). At that time I had started down the path of using Eclipse but was looking for something smaller and more flexible and figured that PCs had sped up enough by then that the previously perceived cost of emacs would be gone (there's obvious cost in Eclipse also but that is a more obvious price for what it is). I started actively learning emacs and was quickly drawn into the discoverable self-documentation that it provides and the REPL-y model of being able to dynamically execute code within and against the editor environment.

  • Too Basic

    Emacs quickly became my primary go-to editor and I started down the road of trying to use emacs for everything. Shortly thereafter I had jobs which involved a fair amount of Java coding, and so I ended up using Eclipse and IntelliJ IDEA to match the tools familiar to the rest of the team while I got my bearings in unfamiliar territory. Emacs gradually faded into the background while I used shinier alternatives for most of my work and only used emacs for simpler purposes. At this point I fell into the common trap of overreliance on my editor. At the time I somewhat attributed this to a faction of the Java culture which leaned heavily on tooling, but now I'd argue that it is an instance of a nuanced continuing tension which falls out of having more powerful tooling and which certainly does not have clear boundaries.

    Emacs is certainly capable of providing similar functionality but my attempts left me stumbling over package selection and lower level details which were all bundled up neatly in the newer IDE alternatives.

    This pattern was broken as I was preparing to change jobs and practicing for coding exercises. Much of the convenience that IDEs provide can be used to produce some parts of code far more easily, but also with less understanding. Switching back to emacs broke that spell and forced me to think far more intentionally about the code I was writing and about the value the tool was actually providing for me, so with emacs I stayed.

  • Avoiding the IDE Project Model

    An immediate practical advantage of a simpler editor is avoiding how IDEs track their own model of the projects on which you're working. This can often lead to obvious issues such as development processes being defined in terms of specific editors or even specific versions of those editors and as a result not being directly portable to any system not using that editor (including systems such as CI). A less obvious issue is that this also normally introduces a large amount of information and configuration about your project which is cached or indexed internally to the IDE. This alternative projection of your project data invites new issues when this IDE managed state falls out of step with other definitions or when drift is introduced as that internal state evolves.

    This often manifests as the IDE misbehaving until such state is flushed. Ideally the IDE will coordinate such state, though a manual refresh is often a backup option; I've occasionally had to wipe out a project entirely, however. While this is normally not a major issue to correct once recognized, that recognition can waste a fair amount of time. Normally the need for such refreshing is also tied to other changes and therefore isolating the problem down to the IDE rather than those changes may not be straightforward. A variation which is particularly likely to invite confusion is where the IDE's misbehavior is that it continues to behave as though everything works when the canonical project has been broken in some way.

    These pitfalls are likely to be addressed through improvements to the IDEs and through new habits/knowledge, but they impose additional overhead which doesn't produce value at runtime, and the reconciliation of reproducibility between two models is non-trivial.

    Most often much of the additional power of IDEs aims to allow developers to work with larger projects without getting lost amidst the project size: or from another angle, they allow projects to sprawl out and then superimpose organization onto that sprawl. The alternative is to structure projects such that they can be navigated and exercised without looking for a tool which may be sweeping the mess under the rug. While larger projects may benefit from some such features within or without an IDE, more intentional use of organization and standard build tooling confers a more inherent coherence to the project itself. A large swath of software currently being developed should also be of a small enough size (potentially something like one bounded context) that if the code is not grokkable without the support of tooling it's likely just a result of sloppiness.

  • Standard Elisp is Too Primitive And Scattered

    After using emacs for a while and starting to try to pay closer attention to available packages, a major gripe is that elisp feels too low level.

Exercism

Exercism is a nice language-focused coding exercise site. One of my coworkers introduced it to a book club we had started at Brightcove (where we were doing the Elixir track as a group). As I'm starting to conscientiously pick through some languages it's very useful for exercising (appropriately) the languages being used.

In using it there certainly seems to be a range of quality between tracks and exercises, though that should be expected for a community provided service.

I've added a profile for myself where I'll dig into particular languages. I'll start with C since I'm currently brushing up on some basic algorithms and C provides a reasonable kit for constructing batteries without including many of note (and remains the dominant system-level language).

GitLab

Why GitLab

I use GitLab as it is a git hosting service which follows an open core model. I generally try to make use of the facilities provided by git itself and more manually composed pieces rather than relying on functionality provided by any centralized service, so from that perspective the choice of hosting service becomes somewhat arbitrary. The main driver for adopting GitLab is therefore the aforementioned open core model and lending support to a relative underdog.

Some of the additional features provided by GitLab may be utilized, however the underlying functionality will be defined within the source code or similarly and GitLab will simply act as an execution agent.

GitLab vs. GitHub

GNU/Linux

Motivation

GNU/Linux is adopted as it is open-source, has a very large community and widespread support, and is also the primary target for containerized deployments which allows my local environment to more closely match anything which may be deployed (and therefore more work can map more directly).

  • Alternatives
    • Microsoft

      For tool compatibility, any preferred OS would be POSIX compliant. Last time I used a Microsoft OS (~10 years ago) it didn't fit naturally into that category; I think they may be more in line currently, but the fear is that it still represents too significant an ecosystem split to warrant crossing (maybe next time I get a device).

    • Apple

      Apple OSes are a compelling alternative, particularly given that OS X allows for convenient development against both OS X and Linux whereas the converse case of developing for OS X on Linux is non-trivial. I often use Macs for work computers and am also likely to recommend Apple devices generally to others, however I prefer Linux due to it supporting a higher level of tinkering and a wider range of devices. As previously mentioned Linux is also far more likely to be aligned with deployment environments (Apple seems to have little penetration in the server space), and in particular the differences between OS X and Linux can often be subtle enough to not be immediately obvious but cause subsequent consternation.

    • Others

      There are a variety of other POSIXy operating systems available, some of which promise conceptual advantages over Linux (which largely adopts models similar to older versions of UNIX; these have proven remarkably robust but have been incrementally augmented in other OSes). Linux has a far larger community and knowledge base from which to draw and therefore promises more pragmatic economy.

MongoDB

Restoring Data Onto a Different Replica Set

  • TODO Add sources

    In an ideal MongoDB deployment you have fully operationalized and independent environments with things like proper disaster recovery in place for each environment. In reality, however, the production environment may be the only one that's actually important and any pre-production environments may ultimately just act as a disposable proxy for production. In any relatively small engineering team such an environment may not justify significant investment and may suffer some minor neglect as a result. In such a scenario, if that more disposable environment does end up being disposed of, a recovery plan may involve loading a sanitized copy of production data into that pre-production environment.

    The MongoDB documentation provides ample information on restoring backups from data files, and messing with some of the internal databases should allow you to get all of your replica sets and any relevant configuration servers for sharding mapped out properly. This can be a hassle if you're not using some kind of management interface that eases the pain, but at the end everything should be pointing to the right place.

    BUT…those instructions are suitably focused on restoring a backup to a cluster which matches that from which the backup was taken and there can be some additional wrinkles if you're doing something like restoring a backup from one environment into another.

    A particular issue I've run into is around the target environment being on a much lower "term" than the source environment. Unfortunately `term` is a generic enough…umm…term that it's likely to not be a useful hint by itself without a fair amount of background context and so its significance in associated errors may be glossed over. I no longer have the specifics for when I encountered this error and will need to dig in a bit to collect more concrete information. At the time I ran into the issue I ended up having to poke through the MongoDB source code for the significance of the term "term" to click (which was also referenced in a relevant StackOverflow post). This could affect data nodes or also config nodes as newer versions of Mongo use replica sets on both.

    The "term" here is a reference to an election term which is part of with the Raft-based consensus that MongoDB replica sets use to coordinate understanding of which node is the primary/master node that should be accepting and replicating writes to the data: thereby being responsible for consistency and durability. The term number is used to make sure that all of the nodes in the replica set are up to date as operations are replicated to them, and so if a term number in the oplog (such as retrieved from a backup) is higher than that of the cluster then the cluster ends up in a state where it considers itself out of date and therefore unable to safely handle writes. There are likely more rigourous and correct explanations of this around.

    Messing with this number is not something which is readily exposed, and while there is likely a better way to handle this issue a quick solution is to just force the cluster ahead as many terms as are needed. Note in particular that this solution may be suitable for a relatively disposable cluster but elections impose a short period of unavailability and therefore triggering a flurry of elections will cause a proportional amount of downtime which would be unacceptable for any service that should have relatively high availability.

    An election can be instigated by having the primary step down, and therefore terms can be incremented by repeatedly issuing a stepDown. As this is a relatively hacky approach in any case, the simplest solution would be to issue a stepDown request to all of the nodes in the cluster in a loop until the required term is reached. Such a solution could be done in bash with something along the lines of:

    while true; do
      for h in ${mongohosts}; do
        mongo --host "${h}" ${mongoauthoptions} --eval 'rs.stepDown()'
      done
      sleep 10
    done

    That command may certainly need some tuning as I'm not able to easily verify anything about it, and it may also need adjustment based on how connections to Mongo can be established. Harmless errors/noise will be returned for the secondaries. The outer loop could also be modified to break at the desired term number, but that's out of scope for this quickly thrown together memory; similarly, retrieval of the active term may need an additional line in there. Some variation of the above with some trivial monitoring should do the trick, and a key safety factor is that so long as the cluster has a term at least as high as the terms in the operations things should work, so if the term is advanced too far it shouldn't cause issues.
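    For that monitoring, something along these lines should report the current term when run against the primary; hedging as above, `${primaryhost}` is a placeholder like the host list in the loop, and the exact layout of the `rs.status()` output varies a bit across MongoDB versions:

      # ${primaryhost} is a placeholder for the current primary's host.
      # Report the replica set's current election term.
      mongo --host "${primaryhost}" ${mongoauthoptions} --quiet --eval 'rs.status().term'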

Sources


Author: root

Created: 2020-08-04 Tue 00:05
