Matt Duck

Why I write commit messages

hi@mattduck.com (Matt Duck) — Mon, 04 Sep 2023 13:35:00 +0100

1.1. Intro

When I'm working on professional software projects, I like to include git commit messages for my changes. From my experience so far¹, most engineers do not write commit messages at work, but I've always found it hard to see a strong argument against doing it.

What do I mean by a commit message? I don't mean using git commit -m to add a one-line subject like fix: #123 fix db null issue. I mean letting git commit open your $EDITOR and typing in sentences or bullet points to write a body for the message, similar to how you'd include a description in your pull requests.
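
To make the subject vs body distinction concrete, here's a minimal sketch in a throwaway repository. It assumes git is installed; the author identity, the file name and the body text are all invented for illustration. A script can't open $EDITOR, so it feeds the message in on stdin with -F - instead, which gives the same layout of subject, blank line, body:

```shell
set -eu

# Throwaway repository so the example never touches real history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "hello" > file.txt
git add file.txt

# Plain `git commit` would open $EDITOR here; -F - reads the message
# from stdin instead: subject line, blank line, then the body.
git commit -q -F - <<'EOF'
fix: #123 fix db null issue

Changes:
- Treat a NULL value as "not set" instead of raising.

Context:
- Older rows predate the NOT NULL constraint.
EOF

subject=$(git log -1 --format=%s)   # first line only
body=$(git log -1 --format=%b)      # everything after the blank line

printf 'subject: %s\n' "$subject"
printf 'body:\n%s\n' "$body"
```

The %s and %b placeholders are how git itself separates the two parts, which is why a one-line -m message leaves the body empty.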

I've written messages for a lot of the changes I've made in my software career so far, and I find myself doing it more as time goes on. Coming into my first proper software job, I had read content like Tim Pope's² post on writing well-formed commit messages, and I assumed that other engineers were going to expect this of me, because it sounded like a sensible professional practice. This turned out to be wrong: amongst the engineers I've worked with – good and bad, from CTOs to interns – hardly anyone ever includes a message body in their commits.

This post will run through the kind of things I write about, why I find it useful, and some arguments against writing messages.

1.2. What I write in the message

I try to focus on providing the most important context that can't quickly be inferred from the diff. I'm writing for two audiences: the engineers I'm working with now, and unknown readers in the future. I don't know who the future reader will be or how much they already know, so I don't assume much existing familiarity. I also try to lay things out clearly and explicitly, just in case they're trying to debug a production issue late at night.

Changes, context, other considerations

I usually focus on three things:

What I'm changing: yes, this is technically described in the code, but if your change is more than a couple of lines, it puts a lot of work on the reader to make them figure it out from the diff, and that gets amplified if the reader is looking at multiple commits. And, the literal code committed won't necessarily have the impact you were trying to make, so it's helpful to describe the intent of the change.
Context for why I'm making the change: the "why" is often harder to infer than the "what", so requires some description. I try not to just link to tickets, because again that puts all the work on the reader.
Considerations for the reader or for code review: the risks of merging the change, gotchas, anything non-obvious that's worth highlighting. I try to make it explicit if there are parts of the change that I'm unsure about or don't understand well, or if I intend to follow up this change with further work.

I often format the message to show the what vs why vs other considerations separately, as this makes it easy to read at a glance, and it's also a nice format to copy straight into a PR description.

Examples

This is the kind of thing I'll write. I've made these up, but they're inspired by changes that I worked on in past roles:

Mitigating a production issue

fix: IM-456 change error handling order to defend OOMs and similar

Changes:
- In the main ingestion process, change the processing order to exit
  early: fetch the job, check if it's been taken off the queue more than
  3 times already, and if so bury it straight away instead of trying to
  process it.

Context:
- New payloads from account X are causing this process to OOM when we
  try to update their customer list. The way error handling was working
  was a big try/except: if it errors, catch the exception, and check if
  we've attempted the job more than 3 times, then bury it. This was
  problematic because the OOM was preventing us from ever hitting the
  "bury it" stage.

- This is currently affecting production and significantly decreasing
  throughput of the ingestion pipeline because the process keeps picking
  these messages up and OOMing. In theory burying should be our
  dead-letter solution but it's preventing that. Need to fix ASAP.

Considerations:
- One consequence I'm unsure about: we do this thing where we catch
  postgres lock exceptions and retry them indefinitely. They will now
  only retry 3 times -- what's the impact of that? We're pretty happy
  that it's safe temporarily.

- I've NOT deleted the old code as part of this. I want to minimise
  surface area of this change to reduce risk as test coverage isn't
  great.

- Alternate investigation ongoing: what is causing the OOM and how do we
  fix it. That's a bigger change.

A basic refactor

refactor(corelibs): DEF-789 move functions from helpers to lib

Changes:
- Move the db and error helper files to live in the lib directory.

- Delete the helpers directory.

- While I'm here, delete a few old util functions that aren't used.

Context:
- We have three separate places that we use to store util functions. The
  duplication is confusing and it's not clear if there's a difference
  between them or where we should put new functions. Nobody is sure, so
  this removes one of them to start simplifying.

Considerations:
- More work to do here, I'm just starting it as I have a spare 30
  mins. This at least moves it in a more sensible direction.

A performance improvement

perf(shipments): ABC-123 improve latency of shipments SQL query

Changes:
- In the SQL query used to power the shipments page, pull the aggregate
  subquery up into the top-level query.

Context:
- This allows postgres to optimize by only running the JSON aggregation
  functions on the selected rows, instead of on the whole underlying
  tables.

- The result on prod data is that loading the base report on account X
  I see a decrease from ~400ms to ~40ms, and it improves the more you
  narrow the base query. If I add filtering for the "customer segment"
  attribute, the difference becomes 400ms vs 1ms.

Considerations:
- I've used `select * from old_version except select * from new_version`
  to confirm these return the same results. If tests pass I see no risk.

- We see regressions on this query sometimes - really need to get some
  performance testing in place to prevent that, but that's out of scope
  here.

Things not to care about

Some of the more prominent commit message advice is based around formatting: writing in a particular tense, keeping subjects at 50 characters, bodies at 72 characters, etc. Personally I like to see that consistency, but it's not where the value is, and if anything it creates barriers to entry if you're dogmatic about message style. You have to pick your battles when collaborating at work, and as long as someone doesn't break standard git tools, then the part that's valuable is providing the information.

1.3. How I find commit messages useful

So, we have a git log filled with these messages… why is this useful? Over the years I've seen various benefits:

Knowing my past intentions

Even if nobody else read my commits, I would continue to write messages for myself. I will not remember every change I worked on in a few months, let alone a few years, so it's helpful to have those messages from the past. If I think my old code is terrible, then I can hit git blame or read the log for that file, and see that I articulated a particular reason for doing it that way two years ago – or more likely, see that there wasn't a good reason and the code really is just terrible. Either way it helps me make better decisions.

Another consequence of this is that reading a proper description jogs my memory a lot more than looking at a one-line link to a ticket number. Or, sometimes the reverse happens: a colleague will ask me a question, and I won't remember the exact answer but I'll remember that I worked on the relevant code change and I can find the answer quickly by grepping the git log.

Providing context for other engineers

Commit messages scale nicely, not just through time, but also out to a lot of people: the messages are useful for other engineers who don't know what was in my head when I made the change.

I believe there should be a professional expectation amongst software engineers that we'll provide some context for our changes somewhere. There's nothing more annoying than being confused about a piece of code that's breaking production, and finding the author made absolutely no effort to communicate what the change was and why it was made: no commit message, no PR description, no detail on the ticket, no code comments, no documentation, and they left the company last month – great, I guess I'll spend the rest of my day figuring that out then. Better hope this is happening at 10am and not 10pm.

My hope with my own work is that, if I'm away, and even if it's a few years on, other engineers on the team will have an easily-accessible window into some of my rationale and thought process: what I was trying to do, why I was trying to do it, risks I knew about, anything I thought was non-obvious at the time. My reasons might hold up, they might not – what's important is that other engineers can see what I was thinking.

This is especially important if the team has low confidence when making code changes: maybe the relevant experts have left, or there are legacy sections of the codebase that haven't been changed for a few years, or test coverage isn't good enough to provide certainty when refactoring code, or there's an ongoing incident and everyone is anxious about pushing the right fix. Again, the benefit is being able to make better decisions – quicker and with more confidence – because you have immediate access to information that improves your understanding of the situation.

Looking back at accomplishments

This is another personal benefit. If I run git log --author=duck, I see a clear chronological list of all the code changes I've made on a project, with much higher fidelity than I could possibly provide if I had only written a one-line subject.

I find this a useful tool for understanding what I've actually done this quarter, or year, or even in my time at a company³. Whenever I've left a job, I've always checked the git logs shortly before leaving, and each time I've found changes that I had forgotten about but that were valuable contributions or interesting technical items, suitable for pulling into future interview conversations, or to include in my CV.

Spreading knowledge and awareness

I'll often run git log to see what changes have happened in a project recently. Usually I come away with limited information – I can see who is committing to a repo and some of the keywords in the commit subject lines. But when those messages contain just a few sentences on what the change is and why it's being made, then I build up better awareness of what's being done and I learn more from my co-workers. Particularly in a remote-heavy environment, you have to be explicit about building that kind of awareness in a team by repeating the information and making it easy to find, and the git log is an ideal place to do that because people will see it naturally.

This kind of knowledge-sharing benefit also applies to historic commits: the git log can be a great learning resource when somebody new joins the team, as long as it contains appropriate detail.

Focusing my code changes

The act of writing a commit message forces me to think about the work: if I can't write a concise explanation of what I'm doing and why, then maybe I shouldn't be doing it, or maybe I need to be breaking the work into smaller logical changes. Or, like rubber duck debugging, I might realise something new about my implementation when I try to explain it. Or maybe I can't articulate a message because I don't understand the technical or product details well enough, and I need to take a step back and do some learning. By doing this in the commit message, it forces me to confront it early and explicitly.

Having the information at my fingertips

Some of the value I've mentioned applies to the practice of writing PR messages more generally on a git forge service like Github or Gitlab. And, those services do have benefits that aren't easily replicated in the git log – eg. videos and images, or comment threads.

But the big advantage git has is that, if you're using it, it's at your fingertips through the whole process of writing software. I will probably check the git log after I've run pull. I'll certainly check it as part of any rebasing or merging work. If I'm curious about a line of code, I'll run git blame and look at the commit that introduced it. It's there throughout the development process, and there's value in having that extra information right in front of you in the tool you use every day, in a way that's seamless to access.
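
As a rough sketch of that blame-then-read-the-message loop, the script below builds a two-commit throwaway repository and then asks which commit last touched a given line. It assumes git is installed; the settings.conf file, its contents and the commit messages are made up for the example:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "timeout = 30" > settings.conf
git add settings.conf
git commit -q -m "chore: add settings file"

# A second -m becomes a separate paragraph, i.e. the message body.
echo "retries = 3" >> settings.conf
git add settings.conf
git commit -q -m "fix: cap job retries at 3" \
  -m "Context: retrying forever hid a failing job. Three attempts is a guess; revisit once we have metrics."

# Which commit last touched line 2? --porcelain prints the full hash first.
sha=$(git blame -L 2,2 --porcelain settings.conf | head -n 1 | cut -d ' ' -f 1)

# -s suppresses the diff; %B is the raw subject plus body.
message=$(git show -s --format=%B "$sha")
printf '%s\n' "$message"
```

Day to day I'd expect most people to do this through an editor integration or by reading the hash off plain git blame output; the script form is only there to show the steps.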

I don't think having a merge commit that links to a PR or a ticket is an adequate replacement. Maybe as a one-off, the few extra seconds clicking through the link aren't a problem, but for something that I do many times per day it adds a lot of friction. If I'm debugging a production issue and I have five commits that I'm interested in, and they all just link out to other PRs, and then some of those PRs just link through to other tickets, it significantly reduces how quickly I can understand details about the codebase and its changes, and increases the cognitive load at the exact time that I need things to be quick and easy to understand so I can debug an urgent problem.

Another place where git has an advantage is searching: yes, I can go to Github, search for a term, queue up browser tabs for all the found PRs, and look through each of them individually to find what I want. But searching the git log in my terminal is absolutely trivial and instant: I type git log, it opens less or another pager, I hit forward slash, I type my search term, and I can page through the entire history of the repo interactively in a few seconds. Or I can run it through grep, or narrow it by directory or a particular set of files. The experience isn't comparable – it's literally 10x faster to find what I want using git, the data is all local so I can do it offline, and we don't need any special tools other than git itself.
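
The pager search is interactive, but the same kind of lookup can be sketched non-interactively. This example assumes git is installed and uses an invented two-commit repository (the paths and messages are hypothetical) to show searching message text and narrowing by directory:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"
mkdir ingest lib

echo "bury early" > ingest/worker.txt
git add ingest
git commit -q -m "fix: bury jobs early" \
  -m "Context: payloads from account X cause the process to OOM."

echo "db helpers" > lib/db.txt
git add lib
git commit -q -m "refactor: move helpers to lib" \
  -m "Context: three places store util functions."

# Search subjects and bodies for a term; -i makes the match case-insensitive.
by_term=$(git log -i --grep='oom' --format=%s)

# Narrow the history to commits that touched one directory.
by_path=$(git log --format=%s -- lib)

printf 'mentions OOM: %s\n' "$by_term"
printf 'touched lib/: %s\n' "$by_path"
```

Note that the first search only finds its commit because the term appears in the message body; with subject-only commits there would be nothing to match.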

1.4. Arguments against?

I think there are valid and less-valid arguments against including message bodies. I'll include them roughly in order of how strong I think the argument is:

Production is broken, I don't have time for this

That's fair – if it's very urgent then link to a ticket or postmortem doc that can be updated later. If you can spare a few minutes though, keep in mind that production incidents are a time when clear, explicit reasoning and communication around the code change are particularly important.

I've already included this info in code comments

That's a pretty good reason. If my diff contains useful comments or docstrings I'll sometimes just mention that in the commit message. You can also copy useful info from the code comments into the commit message; the duplication isn't a problem.

Laying out my thought process is intimidating

For more junior engineers, I think this is understandable. The times where I've really not wanted to explain myself are times where I've not entirely understood what I'm doing or why I've been asked to make a particular change. If I just don't leave a message, then I avoid the risk of getting it wrong and looking silly, I don't have to admit to anyone that I don't understand, and I won't have my ignorance encoded in writing forever. It's easier to not do it.

This is just something you have to overcome: the career involves a lot of not knowing things and having to learn them, and that's normal. For me it's become easier over time to share when I don't know something, because I'm more secure that there are things that I do know, and I've had times where I've seen the payoff in those old messages – the value has outweighed the embarrassment.

I prefer to put this information in the PR or somewhere else

This is the reason I've heard the most – people think it's valuable to leave some contextual information somewhere, but prefer to do it on the PR or in an issue tracker.

The git log certainly has limitations. It's not a place for video or images, it doesn't contain your review comment threads, it's not very accessible for Product or other orgs, and it only captures state at the point in time that you made the code change. By using a tool that's decoupled from the commit, you can use richer media or access information that wasn't available when the commit was made – eg. maybe you can see deployments in your git forge tool or alongside a ticket. That information can be extremely valuable for understanding the particular code change and its impact, and so there are more suitable places to put this contextual information than the git commit.

To this I'd argue: if your tooling is consistent through the team, it works and it puts all that information in front of you instantly in the same way that running "git log" does, then that's excellent and it's a workflow to aspire to.

But, just because PRs and other tools can hold similar information, it doesn't make the git log useless. If you're using git to commit changes, then people will very likely want to look at the log at some point, and it doesn't hurt to duplicate some information so that the commits can be read more easily.

As mentioned above, I think git puts the information in front of you in a more naturally accessible way than forge tools – I don't find PRs an adequate replacement. But even if you prefer the UX of web-based tools, there's another argument for git: open-source version control tools are likely to outlive the data in SaaS services. This is admittedly low down on the list of problems for startups, but people do make repos private, they do misconfigure Jira and lose state, git forge companies accidentally delete production databases, license or pricing changes can force you out of a service, servers go down, accounts get closed, companies go out of business, and sometimes you just decide to change to another provider – shit happens and that data can go away, or the ties between the data and the commit can be broken. Meanwhile every engineer in the team has their own immutable offline copy of the git history, where the commit messages will persist for as long as the code does.

I'd have to significantly change my git workflow

Some personal git workflows make it harder to write messages. If that's the case for you, then that probably means the workflow also makes it hard to make and merge useful logical commits, and I'd argue that you should consider changing it⁴.

Some team workflows also make it harder to write messages, and if you can't influence the team to change conventions this one can be awkward to work around. For example, if your team's tooling is forcing a squash of your commits when the PR is merged, this will replace your deliberate commits. If that's the case, hopefully you can disable that feature on a per-PR basis – if not, one thing you can do is include a clear message in the merge commit description.

I explain all this stuff over video calls or IRL

Face-to-face communication can be higher-signal, and some things can be better communicated verbally, but unless you're recording the call and providing a transcript or notes, it's all lost the moment the call ends, and it's not accessible beyond the participants of that one meeting. If I come across the commit in six months' time, I won't remember the full content of that discussion, and I might not even remember that we had a meeting about this change. Writing is more permanent and increases in value over time as people forget. At the very least, record the video or audio and link to it.

It takes a long time to write a message

If you don't want to spend 10 minutes extra writing a message for a logical code change, I think you should reconsider as the ROI for the team is significantly higher than that investment⁵. If you really struggle and it takes hours then this could be a valid reason. If you have a way to share screencasts with your team, that can be an alternative solution.

If you already write notes on your PRs, then there's almost no extra cost at all, as you're already spending time writing about the change.

Commit messages can be wrong so you should just look at the diff

Useful commit messages will highlight context that can't be reliably inferred from just reading through the diff – that's the value of them. Certainly it's possible to write bad messages, but I don't think the potential existence of bad messages should prevent you from trying to write a good one.

I prefer seeing a one-line commit log without the extra noise

I want that too sometimes – there are various ways to achieve it, including the git log --oneline flag.

I see no value in the messages

I do appreciate that not everyone will want to read messages or find them useful, but I've seen it help myself and other engineers in the past, and I've seen the cost and frustration when key context about a change hasn't been made easily available. Even if you personally never want to read the messages, there's no big cost to adding them: you're not making a tradeoff where you're losing something by including extra information in your commit message, and they're easier to write than code comments because you don't have to worry about how they'll change in the future – you just have to capture what you're thinking now. The only thing you'll lose is a small amount of time invested in writing and sometimes editing the messages.

1.5. I'm sold! How can I get started?

The short answer is just don't use the -m flag, write message bodies in your editor or tool of choice, and focus on information that won't be apparent to the reader from the diff. For me this usually falls under one of three categories: the changes I'm making, the context of why I'm making them, and non-obvious considerations for the reviewer or other future readers. The more you do it, the easier it will become.
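
One optional nudge, and this is a suggestion rather than anything the workflow above requires: git's commit.template setting can pre-fill the editor with those three headings. A minimal sketch, assuming git is installed and using a hypothetical .gitmessage file inside a throwaway repository:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical template file; the headings mirror the three categories.
# The first (blank) line is where the subject goes.
cat > "$repo/.gitmessage" <<'EOF'

Changes:
-

Context:
-

Considerations:
-

# Line 1 is the subject. By default, lines starting with '#' are
# stripped when the message is edited and committed.
EOF

# Point this repository at the template (absolute path to avoid ambiguity).
git config commit.template "$repo/.gitmessage"

template_path=$(git config commit.template)
printf 'template in use: %s\n' "$template_path"
```

With that in place, a plain git commit opens the editor pre-filled with the headings, and git aborts the commit if the template is left unedited. Adding --global to the git config call would apply it to every repository instead of just this one.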

I started writing more detail on this section, but nobody wants to read a post this long about commit messages. Maybe another time.

1.6. Summary

TL;DR: The best time to write a commit message was 20 years ago⁶, the second best time is now. I find having detailed commit messages very valuable, and encourage everyone to do it if you're collaborating on professional software that needs maintaining. It's not much different to writing PR descriptions – you just put the info in the commit message body so it's easily accessible in git commands like log and blame. The benefits include:

Being able to see what you were thinking months or years ago when you made a change.
Being able to see what other engineers were thinking when they made their changes.
Having a clear chronological log of work you can look back through.
General spreading of knowledge and awareness in a team.
Forcing you to explicitly understand what changes you're making and why by writing it down.
Having this information at your fingertips in the tool we all use to manage changes in our codebase every day, and in a format that will persist for as long as the code does.

I've long given up thinking I can convince everyone to do this, and sometimes I wonder if I'm missing something because I've worked with plenty of good engineers who don't do it. Sure, it's not as important as shipping features, and you can certainly do good work without caring about commit messages. But my view is that the pros outweigh the cons and the ROI on writing messages for your git log is well worth it. It costs very little, it scales well across people and timezones, its value increases over time, and it presents useful information in an easily-accessible way that you don't get from just using the web tools like Github.

Extending use-package's :bind to support evil and keymaps

hi@mattduck.com (Matt Duck) — Mon, 28 Aug 2023 09:55:00 +0100

Use-package is a popular macro for declaring, configuring and organising packages in your Emacs config. One of the features it offers is the :bind keyword, which allows you to declare bindings like this:

(use-package org
  :bind (:map org-mode-map
              ("C-c C-y" . org-store-link)))

This will bind C-c C-y to the function (org-store-link) in the org-mode-map keymap.

There are two limitations that I've wanted to fix for a while. The first is that I use evil to provide vim-style modal bindings. Evil's use of keymaps is slightly different to the usual Emacs convention as it contains an additional piece of state -- the editing mode. It provides (evil-define-key), which lets you do something like...

(evil-define-key 'normal org-mode-map "gk" 'outline-previous-visible-heading)

...to bind to org-mode-map specifically in evil's normal state. I don't believe it's possible to make these evil-state assignments using the :bind keyword.

The second limitation is that sometimes I want to bind a key to a keymap. Eg. maybe I want to assign h to help-map:

(use-package emacs
  :bind (:map my/leader-map
              ("h" . help-map)))

This doesn't work, because when the :bind handler expands, it quotes all of the values of the associations, so you end up with (bind-key "h" 'help-map my/leader-map). This gets interpreted as a function and fails -- to reference the keymap you need to pass it in without the quote⁷. Use-package does provide a :bind-keymap feature, but that expects the keymap to be defined in the package you're configuring, and it doesn't allow you to specify a :map -- you can only create global keymap bindings.

2.1. Extending use-package with new keywords

Enter: use-package extensions, which I explored yesterday for the first time. It's pretty easy to add new keywords to use-package. You have to add three things:

;; 1. Extend use-package-keywords
(add-to-list 'use-package-keywords :my-keyword t)

;; 2. Validate and return the args passed to the keyword
(defun use-package-normalize/:my-keyword (name keyword args)
  args)

;; 3. Handle the keyword -- expand into the code you want it to run.
(defun use-package-handler/:my-keyword (name _keyword args rest state)
  (let ((body (use-package-process-keywords name rest state)))
    ;; Prepend our form to the expansion of the remaining keywords.
    (use-package-concat
     `((with-eval-after-load ',name
         (message "%s" ,@args)))
     body)))

;; This will print Hello, world after org has loaded.
(use-package org
  :my-keyword "Hello, world")

If you use pp-macroexpand-last-sexp, you can see how the macro expands, which makes it a lot easier to understand what use-package does under the hood. For me, our use-package declaration above expands to:

(progn
  (straight-use-package 'org)
  (defvar use-package--warning46
    #'(lambda
        (keyword err)
        (let
            ((msg
              (format "%s/%s: %s" 'org keyword
                      (error-message-string err))))
          (display-warning 'use-package msg :error))))
  (condition-case-unless-debug err
      (let
          ((now
            (current-time)))
        (message "%s..." "Loading package org")
        (prog1
            (if
                (not
                 (require 'org nil t))
                (display-warning 'use-package
                                 (format "Cannot load %s" 'org)
                                 :error)
              (with-eval-after-load 'org
                (message "%s" "Hello, world")))
          (let
              ((elapsed
                (float-time
                 (time-subtract
                  (current-time)
                  now))))
            (if
                (> elapsed 0.001)
                (message "%s...done (%.3fs)" "Loading package org" elapsed)
              (message "%s...done" "Loading package org")))))
    (error
     (funcall use-package--warning46 :catch err))))

2.2. The implementation

Back to the :bind issue: I don't want to modify the default behaviour of :bind, but I can imagine a new keyword that provides similar functionality. The implementation I got to was this:

(add-to-list 'use-package-keywords :md/bind t)

(defun use-package-normalize/:md/bind (name keyword args)
  "Custom use-package keyword :md/bind. I use this to provide something similar to ':bind',
but with two additional features that I miss from the default implementation:

1. Integration with 'evil-define-key', so I can extend the keymap declaration
   to specify one or more evil states that the binding should apply to.

2. The ability to detect keymaps that aren't defined as prefix commands. This
   allows me to define a binding to a keymap variable, eg. maybe I want 'h'
   to trigger 'help-map'. This fails using the default ':bind', meaning that I
   have to fall back to calling 'bind-key' manually if I want to assign a
   prefix.

The expected form is slightly different to ':bind':

((:map (KEYMAP . STATE) (KEY . FUNC) (KEY . FUNC) ...)
 (:map (KEYMAP . STATE) (KEY . FUNC) (KEY . FUNC) ...) ...)

STATE is the evil state. It can be nil or omitted entirely. If given, it should be an
argument suitable for passing to 'evil-define-key' -- meaning a symbol like 'normal', or
a list like '(normal insert)'."
  (setq args (car args))
  (unless (listp args)
    (use-package-error ":md/bind expects ((:map (MAP . STATE) (KEY . FUNC) ..) ..)"))
  (dolist (def args args)
    (unless (and (eq (car def) :map)
                 (consp (cdr def))
                 (listp (cddr def)))
      (use-package-error ":md/bind expects ((:map (MAP . STATE) (KEY . FUNC) ..) ..)"))))

(defun use-package-handler/:md/bind (name _keyword args rest state)
  "Handler for ':md/bind' use-package extension. See 'use-package-normalize/:md/bind' for docs."
  (let ((body (use-package-process-keywords name rest
                (use-package-plist-delete state :md/bind))))
    (use-package-concat
     `((with-eval-after-load ',name
         ,@(mapcan
            (lambda (entry)
              (let ((keymap (car (cadr entry)))
                    (state (cdr (cadr entry)))
                    (bindings (cddr entry)))
                (mapcar
                 (lambda (binding)
                   (let ((key (car binding))
                         (val (if (and (boundp (cdr binding)) (keymapp (symbol-value (cdr binding))))
                                  ;; Keymaps need to be vars without quotes
                                  (cdr binding)
                                  ;; But functions need to be quoted symbols
                                  `(quote ,(cdr binding)))))
                     ;; When state is provided, use evil-define-key. Otherwise fall back to bind-key.
                     (if state
                         `(evil-define-key ',state ,keymap (kbd ,key) ,val)
                         `(bind-key ,key ,val ,keymap))))
                 bindings)))
            args)))
     body)))

If you can look past all the cars and cdrs, the main part of the handler logic here is that, if an evil state is provided, we pass the definition to (evil-define-key) instead of (bind-key). And, we check to see if the passed in variable is bound to a keymap – and then pass it in unquoted.

2.3. The result

The result is that I can finally get all of my bindings into the use-package declaration:

(use-package org
    :after (evil)
    :md/bind ((:map (org-mode-map) ;; Expands to (bind-key), similar to :bind
                    ("C-c C-y" . org-store-link))
              (:map (org-mode-map . normal) ;; Only bind in normal mode
                    ("gk" . outline-previous-visible-heading)
                    ("gj" . outline-next-visible-heading))
              (:map (org-mode-map . (normal insert))  ;; Bind in normal and insert mode
                    ("M-k" . org-metaup)
                    ("M-j" . org-metadown))
              (:map (md/leader-map)  ;; Bind to a keymap
                    ("h" . help-map))))

This is a lot cleaner than what I had previously: no more calls to (evil-define-key) and (bind-key) littered everywhere, and I can see everything defined in one place.

One nice thing about this is that I didn't need to do any special handling to support passing in a list of evil states, because (evil-define-key) supports that by default.

This implementation isn't at feature parity with the original :bind keyword. I haven't considered any kind of lazy-loading of bound commands here, and I don't pass the arguments through to the bind-keys macro, meaning that lots of the documented features aren't implemented (:repeat-map, :prefix etc.). I don't use these things though, so I think this new implementation is actually all I need.

The code can be found in my dotfiles repo.

Preventing checkbox inheritance in org-mode

hi@mattduck.com (Matt Duck) — Fri, 25 Aug 2023 15:17:00 +0100

3.1. Background

One feature in org-mode that I've wished I could customise for a long time is the checkbox inheritance feature. Org's behaviour is that if I have a list like this…

- [ ] This is a parent item
  - [ ] This is a child nested item

…then, as soon as I check all the child items as [X], the parent item will automatically be updated to [X] too.

This doesn't fit with how I like to use nested checkboxes: generally I do use them to represent a kind of "subtask" concept, but it's not necessarily the full set of items that make up the "parent" task, and completing them doesn't mean I'm done with the parent item. I'd prefer to be able to manually trigger the two checkboxes independently: changing state on the child item shouldn't affect the parent.

3.2. Fixing it by disabling org-list-struct-fix-box

A lot of org-mode's behaviour can be customised via variables, hooks etc. However, this particular behaviour doesn't seem to be customisable, and when searching I've not found any immediate solutions or ideas that people use to achieve this.

After some poking around in the source this week, I came up with something that seems to work for me: this checkbox behaviour is implemented in a function named org-list-struct-fix-box. That's the only thing this function does, and so if we turn this function into a no-op, then I get the behaviour that I want: updating child checkbox items has no effect on the parent.

To achieve this, I define advice⁸ around the original function that prevents the original implementation from being called:

(defadvice org-list-struct-fix-box (around md/noop last activate)
    "Turn org-list-struct-fix-box into a no-op.

By default, if an org list item is checked using the square-bracket
syntax [X], then org will look for a parent checkbox, and if all child items are
checked, it will set [X] on the parent too. This isn't how I personally use
child items -- I'll often use child checkboxes as subtasks, but it's almost
never an exhaustive list of everything that has to be done to close out the
parent -- and so I'd prefer to just control the parent checkbox state manually.

AFAICT org-mode doesn't provide a way to customise this behaviour, /but/ the
behaviour all seems to be implemented in 'org-list-struct-fix-box'. And so I'm
trying something out by turning it into a no-op. It seems to work nicely initially,
but I won't be surprised if it causes an issue at some point because it's very
hacky."
nil)

That's it: all this time and I only had to write one line of code to achieve what I want.
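
The same no-op can probably be expressed with the newer advice-add interface from nadvice.el, which the Emacs Lisp manual recommends over defadvice for new code. A minimal sketch, assuming the built-in ignore function (which accepts any arguments and returns nil) is an acceptable replacement body:

```elisp
;; Sketch only: replace the function with the built-in `ignore',
;; which accepts any arguments and returns nil.
(advice-add 'org-list-struct-fix-box :override #'ignore)

;; To undo it later:
;; (advice-remove 'org-list-struct-fix-box #'ignore)
```

Only one of the two forms of advice is needed; using both would just apply the no-op twice.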

3.3. Why use advice over `defun`?

Instead of advice, another approach would be to just redefine the original function using defun. I prefer to use advice though, so I can still see the original function, still jump to its definition, explicitly know that it exists and that I'm modifying it, etc. If I use describe-function on org-list-struct-fix-box, it will tell me that the function is advised, and as I'm using the excellent helpful package, I can see additional information including the definition of both the original function and the advice.

3.4. Isn't this pretty hacky?

Yes. I've only been using it for a few days. org-list-struct-fix-box doesn't seem to be called in many places though, so hopefully nothing major will break. The most likely issue is that I break third-party packages that depend on the original behaviour, but I don't think any of the packages I'm using are likely to depend on this particular function.

I think this kind of approach can be risky in a prod-environment program that other people depend on, but as it's just my personal config, it's fine – I'm not sure I'd go as far as to say it's encouraged, but this "advice" patching concept is one of the tools provided by Emacs to customise a module's behaviour for this exact kind of situation, where you otherwise don't have the ability to achieve the behaviour you want without redefining whole functions.

3.5. Contributing upstream?

It doesn't seem far-fetched to me that this could be contributed upstream: not as advice, but as a new custom variable that can be used to disable plain checklist inheritance – you might even be able to get away with an implementation that just looks at the value of this new variable in org-list-struct-fix-box.
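
To make the idea concrete, here is a rough sketch of what such a switch could look like from the outside, written as advice because that can be tried without modifying org itself. The option name my/org-checkbox-inherit-from-children is hypothetical; as far as this sketch assumes, org has no such variable today.

```elisp
;; Hypothetical option: nothing with this name exists in org-mode.
(defcustom my/org-checkbox-inherit-from-children t
  "When non-nil, keep org's default behaviour of updating parent checkboxes."
  :type 'boolean
  :group 'org)

(defun my/org-list-struct-fix-box-maybe (orig-fn &rest args)
  "Call ORIG-FN with ARGS only when checkbox inheritance is enabled."
  (when my/org-checkbox-inherit-from-children
    (apply orig-fn args)))

(advice-add 'org-list-struct-fix-box :around #'my/org-list-struct-fix-box-maybe)

;; Opt out of inheritance:
;; (setq my/org-checkbox-inherit-from-children nil)
```

An upstream version would presumably do the same check inside org-list-struct-fix-box rather than around it.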

Another more complex approach would be to support new syntax for opting into the "inheritance" behaviour on a per-list basis, similar to how you can set your checkbox value to [/] to have it automatically show the count of completed child items.

3.6. Am I missing something?

If you're aware of a way to achieve this without patching, please let me know!

You can find my config in my dotfiles repo.

Auto-show latest heading state in org-mode links

hi@mattduck.com (Matt Duck) — Mon, 14 Aug 2023 11:55:00 +0100

4.1. Intro

I just added some features to my org-mode setup to easily update links to other org headings, to pull in the current keyword and headline from the linked item.

The idea here is that I want to treat links to org items more like a current reference to the original heading item, instead of a stale duplicate. Org-agenda is a good example of doing this: you run the agenda command and it shows an up-to-date view of the current items, their keyword state and tags, and you can easily jump from the item to the original source. Links, on the other hand, have no awareness of their target item's state, and even if I copy the name of the headline item when I create the link, it will become stale as soon as I edit the original headline⁹.

The reason I want this is because sometimes I want to build ad-hoc, informal collections of org items regardless of their current state. Eg. maybe I want to pull in some items that I need on hand this week, without having to formalise what those items are with keywords or tags, and without having to keep editing the link description to always be in sync with the original item.

4.2. Demo

To make it easier to understand, here it is in action:

[Demo video]

4.3. First: org-mode link features

Before we get to the implementation I want to run through some of the default org link behaviour, because I don't find it intuitive.

Storing text search links

There's a function (org-store-link), which is the entry point for storing a link to an object. You can call (org-store-link) on an org item to store it in a data structure¹⁰, and then use (org-insert-link) to insert the link. If you imagine we've called (org-store-link) on an item named My target heading, the result would look like this:

I want to link to [[*My target heading][My target heading]].

Now if I press C-c C-o on that link, it will run (org-open-at-point), which will open the linked item. If you're linking to a heading in a different file, then the inserted link will contain a prefix like file:/path/to/myfile.org::.

This is simple, but it has a significant downside: it uses text search to identify the target item, and it therefore depends on an exact match to the headline. If I edit My target heading at all, it will break my link. This makes it a non-starter for me because I consider all my org headlines to be mutable.

The `<<target>>` ID syntax

One way you can get around this problem is to use the <<target>> double angle bracket syntax to specify a target, similar to setting an ID on an HTML element. You can reference this ID in a link. Eg.

* TODO <<mytarget>> My target heading

* My other heading

I want to link to [[mytarget][My target heading]].

This works, and it persists if you change the headline description, as long as you keep the angle bracket ID the same. It can also be used to flexibly link to any point in an org file, not just the headline. It still has a couple of downsides though:

You have to manually think of your link name and make sure it's unique in the document.
The <<target>> annotations are going to be visible all over your document. I don't really want to pollute all my headlines like that.

The `:ID:` property

A more reliable solution to this problem is provided by the included org-id library. This provides a set of features based around the :ID: property, which gets stored in the :PROPERTIES: drawer like this:

* TODO My target heading
:PROPERTIES:
:ID:       63F85B1C-68C7-4F15-B558-BAD6810812D1
:END:

One way you can get started with this is to call (org-id-get-create), which will generate a UUID¹¹ if one doesn't already exist, and store it in the properties drawer.

Org maintains a map of your org files and their IDs in ~/.emacs.d/.org-id-locations, so that the file path doesn't have to be encoded in the ID.

For a long time I didn't find the org-id package very intuitive, because it didn't seem to play nicely with (org-store-link). For example, there's an (org-id-store-link) function, which automatically creates the :ID: property if it doesn't exist, and presumably saves it somewhere. But then if you call (org-insert-link), it doesn't actually have awareness of the state saved by (org-id-store-link), and if you use (org-store-link) it still saves a text-search version of the link.

`org-id-link-to-org-use-id`

The way I solved this weird mismatch turned out to be through a variable named org-id-link-to-org-use-id. If you set this to t or another supported value like 'create-if-interactive, then it tells (org-store-link) to always use the :ID: approach, instead of the default text-search approach. (org-store-link) and (org-insert-link) will then use the :ID: linking method as you'd expect.

The `:CUSTOM_ID:` property

To further confuse the situation, there's another property that org understands named :CUSTOM_ID:. My understanding is that this is more like the angle bracket ID syntax but is used to reference a headline item and stored in the property drawer (whereas the angle bracket can be any point in the document). If you're exporting your org document, it can be used to create readable anchors instead of the random IDs that org gives you by default. To reference it in a link, you have to prefix your link target with a hash character.

It seems that if org-id-link-to-org-use-id is set to a non-nil value, then (org-store-link) will prefer to store a link to :ID:, but will fall back to :CUSTOM_ID: if :ID: doesn't exist. You can set org-id-link-to-org-use-id to 'create-if-interactive-and-no-custom-id to only create the :ID: property if :CUSTOM_ID: doesn't already exist. This would allow you to use :CUSTOM_ID: manually where you prefer to specify it - but you have to make sure it's unique within the current document.

Sidenote on footnotes

Not strictly a link feature, but I only recently looked into the footnotes feature for the first time. This allows you to use syntax like [fn::Here's my inline footnote] to define a footnote inline, or [fn:1] to link to a footnote definition that you put somewhere else in the file.

4.4. What I'm implementing

The main thing I want is a function which, if my cursor is on a link, will update the description/contents of the link to reflect the current headline and its current todo keyword, overwriting whatever the previous description was. I want to somehow hook this into C-c C-c, because I'm used to pressing that to update different things in org.

Additionally, I want configuration and bindings to:

Use org :ID: links automatically instead of the default text search method.
Easily store an :ID: link in org-mode and org-agenda-mode.
Easily insert the stored :ID: link in org-mode.

4.5. Implementation

Using IDs instead of text match

This part is easy - I just have to do:

(setq org-id-link-to-org-use-id 'create-if-interactive)

Now (org-store-link) will automatically generate and use :ID: links.

I could also set this to t. The documentation suggests there are circumstances where it might not be desirable to always do this though - something to do with (org-capture), so I'm starting out by keeping it interactive-only.

Updating the link state

Now I want my function which, if the cursor is on an org link, will look up the linked headline and update the link to match it. To start I'm only going to support :ID: links, because that's what I intend to use and I don't yet want to spend time handling other link types.

The function ended up looking like this:

(defun md/org-link-sync ()
    "Sync an org-link to show the target headline as the contents.

When the cursor is on an org-link that uses the ID type, lookup the current state of the linked
headline, and replace the link contents with the current headline value.

For example, an \"outdated\" link like this:

    [[id:3C5473CB-3DCF-4A9B-9387-750730DAEB7B][My link contents description]]

Might be replaced by an up-to-date link like this:

    [[id:3C5473CB-3DCF-4A9B-9387-750730DAEB7B][DONE [#A] The current description of the headline]]"
  (interactive)
  (let* ((link-context (org-element-context))
         (type (org-element-property :type link-context))
         (path (org-element-property :path link-context))
         (point-begin (org-element-property :contents-begin link-context))
         (point-end (org-element-property :contents-end link-context)))
    (when (and path (equal type "id"))
      (let ((new-link-text
             (md/with-widened-buffer (md/find-file-buffer (org-id-find-id-file path))
                                     (save-window-excursion
                                       (org-open-at-point)
                                       (org-get-heading t nil nil nil)))))
        (goto-char point-begin)
        (delete-region point-begin point-end)
        (insert new-link-text))
      (goto-char point-begin))))

Putting aside (md/with-widened-buffer) for a second, the steps in the function are:

Use (org-element-context) and (org-element-property) to retrieve information about the link. This includes the type of link (we want "id" links), the path (ie. the ID value itself), and the point begin and end values which denote where the link description starts and ends (this is the part we want to overwrite).
Use (org-open-at-point) to follow the link, and (org-get-heading t) to save our new heading description, which will include keywords but not tags¹². (save-window-excursion) prevents our visible windows from changing when (org-open-at-point) is called.
Use (delete-region point-begin point-end) to delete the existing contents portion of the link. We prefix this with (goto-char point-begin) to put the cursor in the right place for insert.
With the cursor in the right place, we call (insert new-link-text) with the contents. We then call (goto-char point-begin) again - this is just a quick way to put the cursor in a useful place, although it would be nicer if this could attempt to keep the cursor in the same place as it was originally, and only move it if it's now out of bounds of the new description.

That's it - the broad approach is pretty easy.

Handling narrowed buffers

The one complication here is narrowed/restricted buffers. If you've narrowed the buffer that the link points to, then (org-open-at-point) will open the right buffer but won't jump to the right place because the buffer contents will be restricted, and (org-get-heading) will then return the wrong information. AFAIK org just doesn't handle following links to a narrowed buffer.

Emacs does provide a (save-restriction) macro, which works like (save-excursion) or (save-window-excursion) but for restoring any current buffer restrictions. So the goal here is that we'll need to jump to the buffer that the link points to, save the restriction, widen that buffer, then go back to the original buffer, and call (org-open-at-point) to follow the link - and it should always hit the correct heading because it will have access to the full widened buffer. And then the (save-restriction) macro should exit and restore any restrictions in the linked buffer.
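
As a small illustration of how (save-restriction) behaves on its own, the sketch below narrows a temporary buffer, reads the full contents from inside (save-restriction), and then checks that the narrowing is back in place afterwards. The function name my/save-restriction-demo is made up for this example.

```elisp
(defun my/save-restriction-demo ()
  "Return (NARROWED-TEXT WIDENED-TEXT STILL-NARROWED-P) for a temp buffer."
  (with-temp-buffer
    (insert "line one\nline two\nline three\n")
    (goto-char (point-min))
    (forward-line 1)
    ;; Restrict the buffer to the second line only.
    (narrow-to-region (point) (line-end-position))
    (let ((narrowed (buffer-string))
          ;; Temporarily widen; the restriction is restored on exit.
          (widened (save-restriction
                     (widen)
                     (buffer-string))))
      (list narrowed widened (buffer-narrowed-p)))))

(my/save-restriction-demo)
;; => ("line two" "line one\nline two\nline three\n" t)
```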

The ordering of these operations is a bit awkward, because (save-restriction) operates on the current buffer at time of calling. And so I encapsulated it in a macro (md/with-widened-buffer), which accepts a buffer object (or name of the buffer) that you want to widen and restore.

  (defmacro md/with-widened-buffer (buffer-or-name &rest body)
    "Widen the given BUFFER-OR-NAME, execute BODY in the context of your current buffer, and restore restrictions on the given buffer.

This allows the calling code to not have to worry about manually handling
narrowed vs widened state."
    (let ((orig-buffer (gensym "orig-buffer")))
      `(let ((,orig-buffer (current-buffer)))
         (with-current-buffer ,buffer-or-name
           (save-restriction
             (save-excursion
               (widen)
               (with-current-buffer ,orig-buffer
                 ,@body)))))))

One detail here is that we use (gensym) to generate a unique symbol for the orig-buffer variable, so it can't clash with any variable referenced in the body that gets passed to the macro.

The final detail is that we need to grab the source file from the org link before we call (org-open-at-point), so we can jump to that buffer and widen it first. org-id provides the (org-id-find-id-file) function to grab the file path associated with a particular UUID. We then need to convert the returned file path to a buffer object in order to pass it to our macro. The (md/find-file-buffer) helper function handles this:

  (defun md/find-file-buffer (path)
    "Get or create a buffer visiting PATH without affecting current windows.

This is useful in situations where you have functions that accept a buffer object but you
only have the file path."
    (save-window-excursion
      (find-file path)
      (current-buffer)))

Arguably this could just be inlined somewhere but I figured it won't be the only time I need to do this and I don't like having to manually manage the restore macros like (save-window-excursion) in the calling code.
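
A possible alternative is the built-in (find-file-noselect), which returns a buffer visiting the file without selecting it or displaying it in a window. The following is a minimal sketch of that variant under a different, hypothetical name so it doesn't replace the helper above:

```elisp
;; Hypothetical variant of the helper, built on `find-file-noselect'.
(defun my/find-file-buffer (path)
  "Return a buffer visiting PATH without selecting it or changing windows."
  (find-file-noselect path))
```

One difference worth checking before swapping it in: (find-file) may run extra interactive behaviour that (find-file-noselect) skips, so the two are not guaranteed to be identical in every setup.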

Hooking into C-c C-c

With the function in place, how do we call it? I want to hook into C-c C-c, which feels intuitive because it's the binding you hit in org-mode to update various different elements.

I thought this would require advising (org-ctrl-c-ctrl-c), but org actually provides a hook named org-ctrl-c-ctrl-c-hook, which is designed exactly for this - it lets you extend C-c C-c to support your own behaviour. The function has to look up the current org element/context, do whatever it wants to do, and then return t if it did something, and nil if it didn't.

  (defun md/org-ctrl-c-ctrl-c ()
    "I use this to add custom handlers and behaviour to C-c C-c.

For example, C-c C-c is often used to update the state of org elements, and so
it feels like a natural way for me to call md/org-link-sync, because that
function updates the state of an ID link to be in sync with the target heading."
    (condition-case nil
        (let* ((link-context (org-element-context))
               (type (org-element-property :type link-context)))
          (cond
           ((and (eq (car link-context) 'link) (equal type "id"))
            (md/org-link-sync)
            t)  ; Returning t tells org-ctrl-c-ctrl-c that we did something
           (t nil)))  ; Tell org-ctrl-c-ctrl-c there was no match
      (error nil)))  ; Catch any errors in case org-element-context failed

  (add-hook 'org-ctrl-c-ctrl-c-hook 'md/org-ctrl-c-ctrl-c)

The implementation looks similar to the link update function - we use (org-element-context) and (org-element-property) to detect if we're on a supported element, and if so we call (md/org-link-sync). Now if I hit C-c C-c on an ID link, it will call the function and update the link to show the latest keyword and headline.

Bindings

The last thing I wanted was some bindings - these aren't very interesting. I mapped (org-store-link) to be accessible via C-c y in both org-mode and org-agenda-mode, and I mapped C-c L to run (org-insert-last-stored-link), which I find nicer than (org-insert-link) as I never actually need the menu of choices that (org-insert-link) forces you to choose from.
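
For reference, bindings like these can be sketched with plain (define-key) calls. This is an illustration using only built-in functions, and the actual config may well declare them differently (for example through use-package):

```elisp
;; Sketch of the bindings described above, using plain `define-key'.
(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c y") #'org-store-link)
  (define-key org-mode-map (kbd "C-c L") #'org-insert-last-stored-link))

(with-eval-after-load 'org-agenda
  (define-key org-agenda-mode-map (kbd "C-c y") #'org-store-link))
```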

4.6. Next steps - informal org agendas?

This all seems to work well and I think it's a nice improvement that makes links more useful for me. There are a few things I could look into next:

You could take the "mini informal org-agenda" idea further, and update keyword or tag state from the link in the same way you can with org agenda. Eg. maybe if I press C-c C-t on an org link, it could update the keyword state on the linked item.
You could automatically update multiple links at once, or update links on save, rather than requiring the links to be updated manually - this way they become live references to other heading items.
(org-insert-last-stored-link) inserts a newline after the link, and doesn't include the keyword. It would be nice if I could insert the link and automatically call (md/org-link-sync) to see the keyword, instead of having to insert the link and immediately call the function.
(md/org-link-sync) is inserting the target headline into the mark ring and posting a message about it - not sure that I really need this.
Supporting :CUSTOM_ID: links could be useful.

You can find the code I'm actually using in my dotfiles.

Firefox extension updates, thoughts on ChatGPT

hi@mattduck.com (Matt Duck) — Sat, 05 Aug 2023 21:13:00 +0100

5.1. Fixing a Firefox extension

I just fixed an issue in a Firefox extension that I built a few years ago. The extension itself is not particularly interesting: it lets me press Cmd-Shift-L to highlight any selected text, and then I can copy all the highlights to the clipboard. It's very basic but feels intuitive to me and I often use it to highlight things as I'm reading through web pages.

Unfortunately, it has a notable problem: it just wraps the selected text in a single span element, and this immediately breaks down as soon as you select text that spans multiple elements. It's too basic.

For example, selecting text within a single element would work fine:

[Diagram: "This works" - a selection contained within one element]

But selecting text across multiple elements would fail to highlight anything:

[Diagram: "This fails as we cross elements" - a selection spanning three adjacent elements]

The root problem is that I didn't think very hard when I built it: I did it in an hour or two of Stack Overflow-driven development, searching for how to make a Firefox extension and any relevant browser APIs, and changing the code until it worked. It did work, but barely.

5.2. From Stack Overflow to ChatGPT

Three years later, instead of Stack Overflow, I'm using ChatGPT to fix the code. It was interesting as it felt like a comparable exercise to when I first built the extension. It's still not production-grade code, I'm still the only one using it, I still don't need to understand all the implementation details in depth, and I still don't want to spend much time on it: I just want something that I can get to work within a couple of hours, without much effort.

I copied my old code into ChatGPT, described the problem, and it came back with an initial approach. From there I iterated it and asked for changes and fixes:

It initially wanted to replace whole parts of the DOM, which wasn't working and seemed like a rabbit hole - so I suggested we just take our initial approach of wrapping text in a span element, but parse the DOM and apply it to all the elements in the selection.
There were then a few parsing errors that had to be corrected.
It was adding unnecessary spans to elements with no text content, which I asked to remove.
I realised I wanted to skip any highlighted text that wasn't visible to the user but that existed in the underlying DOM.
When testing, I was confused why my range wasn't going beyond a particular svg element. I asked about this, and it helped me figure out that I needed to use the selection's rangeCount along with document.getSelection().getRangeAt(i), because certain elements break up the cursor selection state into multiple ranges, which you have to operate on individually.

It wasn't perfect: I had to prompt it to fix things a fair bit, tell it what approach I wanted to take, do some console logging to check why something wasn't working, make some manual corrections, tidy up the code myself, etc.

Overall though, there was a notable contrast in how far I could get in a couple of hours compared to three years ago when I was using search engines and Stack Overflow for this exact same task. I can't imagine a more basic implementation than the one I arrived at last time. This time around, with similar amounts of laziness, we're doing things that I didn't achieve before¹³: traversing the DOM, checking range boundary points, handling edge cases, etc.

I think the big differentiator is in how detailed and tailored the inputs and outputs are. Using Stack Overflow I tend to be focused on very narrow generic questions - how do I do X in JavaScript? I don't think I've ever actually copy-pasted code from Stack Overflow, because the code is always small enough to type, and often needs customising. But with ChatGPT it's more specifically tailored to your situation and a copy-paste workflow can actually get you started off ok: confirm the code looks safe to run, check the output, and iterate from there.

5.3. Conclusion

Obviously there are downsides to this approach¹⁴. I didn't think about the problem or solution deeply, I didn't learn anything reliable, I've certainly missed other edge cases. If I did all my coding like this then I'd be missing out on a lot of understanding, shipping a lot of bad code, and I'd probably get into a state where I can't debug problems because I don't understand my code well enough. I'm glad these capabilities weren't around when I was a junior engineer, because the temptation to operate in copy-paste mode instead of thinking properly or reading the docs would have been very strong.

But, for this kind of situation where I don't care about understanding the problem or solution well, I'm not putting anything into production, I'm not doing anything novel, I just want something done quickly that appears to work - then unquestionably I can get there faster and with less effort using ChatGPT than I used to be able to via other tools. I don't have strong opinions on the future of LLMs yet, but I agree with the view that someone who uses these tools is going to have the capability to do certain things more quickly and effectively than before, and for me personally I'm finding it pretty useful - not to write all the code for me, but as a much more capable Stack Overflow-type tool.

Reorganising my encrypted partitions and backups

hi@mattduck.com (Matt Duck) — Fri, 12 Nov 2021 23:41:00 +0000

These notes give a high-level overview of some work I did to restructure how I was backing up data at home. YMMV - these aren't optimal decisions and the tools I'm using here will result in data loss if used incorrectly, so always be careful if you're doing anything similar.

6.1. My setup

My main home machine is a Thinkpad T450s running Arch. It has a single disk. The vast majority of my user data (around 1TB) is stored on one partition encrypted using LUKS, and for a few years I've been using Duplicity to make two separate backups of this data - one to S3, and one to a local disk.

This has worked OK so far, but if I ever need to make another full (ie. non-incremental) backup with Duplicity it's extremely slow, and I suspected that some of this data didn't actually need to be in an incremental backup system because either (1) it wasn't going to change again, or (2) I didn't actually need it in the first place. So I wanted to check what data I had and make some changes.

Analysing disk usage with `duc`

There are lots of tools that let you analyse disk usage, but I particularly like duc, as you can run duc index to build up the database and then query it separately without having to wait. The interactive visualisations provided by duc gui also look nice.

The changes

From the duc results I found some data that I was happy to delete. More significantly though, I realised that I only actually had a few GB of active working files, and hundreds of gigs of old files that I'm not going to change again. The storage and usage requirements of these files were different enough that it didn't make sense to use the same backup system for both.

So the plan was:

Split my main data partition into two. Move the old files that aren't going to change to a new encrypted partition and mount it as readonly.
Take some one-off backups of this "readonly" partition.
Ensure my future Duplicity backups ignore this readonly data.
While I'm here, securely delete the data on some old hard drives that I have lying around - as I've been putting this off for ages.

6.2. Resizing a LUKS-encrypted partition with gparted

I had to decrease the size of my main data partition to make space on the disk for the new "readonly" partition. At first my web-searching led to this Arch wiki page on resizing LVM on LUKS. I thought this might be applicable, but I'm not actually using LVM (see Logical Volume Management for background on what LVM is - it basically provides virtual partitions that exist as an abstraction separately from the physical partitions on disk).

All I really needed to do was to decrease the partition size in a way that was compatible with the LUKS encrypted volume. It turns out that gparted does all of this for you.

You first have to make sure the encrypted partition is decrypted, so that gparted can see that it's not full (otherwise it will prevent you from decreasing the partition size):

$ export EXAMPLE_DEVICE=/dev/sdb1
$ cryptsetup open $EXAMPLE_DEVICE mydevice

From there the "resize" feature in gparted just works. If you look at the logs it produces you can see what operations it applies, which are:

Decrease the partition size.
e2fsck -f -y -v -C 0 /dev/mapper/mydevice - check the filesystem inside the unlocked volume.
resize2fs -p /dev/mapper/mydevice 1044475904K - resize the filesystem.
cryptsetup -v --size 2088951808 resize 'mydevice' - resize the LUKS volume (if you were doing this manually this step wouldn't necessarily be required - LUKS calculates the volume size automatically whenever the volume is decrypted, so the resize command is only useful for a "live" resize).

6.3. Creating a new partition with cryptsetup

Once the main partition had been shrunk, I could use gparted to create a new partition in the now-unallocated space.

To set up the new LUKS volume, you can use cryptsetup:

$ export MY_NEW_PARTITION=/dev/sda6
$ cryptsetup luksFormat -y -v --type luks2 $MY_NEW_PARTITION

This sets up the encrypted volume, prompting you for a passphrase.

You can then use cryptsetup open $MY_NEW_PARTITION mynewpartition, and it will appear as /dev/mapper/mynewpartition.

With the volume decrypted, you then need to create a filesystem:

$ mkfs.ext4 /dev/mapper/mynewpartition
mke2fs 1.46.4 (18-Aug-2021)
Creating filesystem with 2555904 4k blocks and 638976 inodes
Filesystem UUID: 2ea513e3-4cd2-479e-9ac2-1288cb99eb22
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

You can then mount this to confirm it works:

$ mkdir /mnt/test
$ mount /dev/mapper/mynewpartition /mnt/test

With this mounted, I could move the desired files onto the new partition.

6.4. Mounting the partition as readonly in fstab

By default Arch uses systemd-boot. Assuming that you've got the sd-encrypt hook configured in /etc/mkinitcpio.conf, you can configure the boot loader to decrypt a volume on startup by editing /boot/loader/entries/$myfile.conf (for non-root partitions you can also configure /etc/crypttab, but as my root partition is encrypted I need to do that one here anyway, and so for consistency I've got the other partitions configured here too).

For me the boot loader config looks something like this:

$ cat /boot/loader/entries/arch.conf
title Arch Linux
machine-id b4f148a3b61e89219feec13da2c5cfe6
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img
options rd.luks.name=29e50f84-592e-4dd7-bba5-5a91132341df=arch root=/dev/mapper/arch rootfstype=ext4 rw rd.luks.name=b385ad32-233d-403b-a1a4-6599d4586f30=mynewpartition

The significant part here is rd.luks.name=b385ad32-233d-403b-a1a4-6599d4586f30=mynewpartition. This will map the device with the ID b385ad32-233d-403b-a1a4-6599d4586f30 to /dev/mapper/mynewpartition. To find the right ID for your partition, you can use lsblk, eg:

$ lsblk -f
NAME     FSTYPE      FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1   vfat        FAT32       41EA-BC5D                               484M    12% /boot
├─sda2   swap        1           a099ba9d-ab55-4e04-8308-79482bbcdf14
├─sda3   crypto_LUKS 2           29e5af84-592e-4d03-be85-5811957011df
│ └─arch ext4        1.0         da6db103-d6a5-43e8-a19b-a1d280184baa   13.2G    60% /
├─sda4   crypto_LUKS 2           b385ad32-233d-403b-a1a4-6599d4586f30
│ └─test ext4        1.0         e099b20-d45c-44e2-8ea1-31fcfa93ee58  823.2G    42% /f
└─sda5   ext4        1.0         08bacd07-b586-415c-b498-7105c41bdf4e

To mount the filesystem on boot, you can then edit /etc/fstab to add an entry for the partition, which will look something like this:

/dev/mapper/mynewpartition      /readonly       ext4        discard,ro,nofail      0 2

This will mount the decrypted volume to /readonly. It can be set as readonly just by providing the ro option.

6.5. Copying the partition to another disk with pv

Now that my "readonly" partition contains the data I want and automatically decrypts on boot (if I enter the passphrase), I want to back up the data to external disks. I want to do a block-level copy of the encrypted partition as this is simple and gives me some guarantees that the data is the same.

The first step here is to create an appropriate partition on the target disk. Gparted asks for the desired partition size in Mebibytes (MiB). This means I need to know the source partition size in MiB. I couldn't find a command that provided this directly, so had to figure it out using the byte value returned by blockdev:

$ echo $(($(sudo blockdev --getsize64 /dev/sda7) / 1048576))
600000
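
One caveat with this calculation is that shell arithmetic truncates, so if the source size isn't an exact multiple of 1 MiB the result comes out slightly too small for the target partition. A minimal sketch of a helper that rounds up instead (bytes_to_mib is a hypothetical name, and the device path in the comment is a placeholder):

```shell
#!/bin/bash
# Convert a byte count to MiB, rounding up so that a partition of the
# resulting size is never smaller than the source.
bytes_to_mib() {
    local bytes=$1
    local mib=$((1024 * 1024))
    echo $(( (bytes + mib - 1) / mib ))
}

# Example usage (placeholder device path; blockdev needs root):
# bytes_to_mib "$(blockdev --getsize64 /dev/sdX1)"
```

For a source that is an exact number of MiB this gives the same answer as the plain division.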

There are various ways to then copy from one device to another. I used pv, as it provides useful feedback on progress:

$ export MY_SOURCE_DEVICE=/dev/sda123
$ export MY_TARGET_DEVICE=/dev/sdb456
$ pv --timer --rate --progress --fineta -s "$(blockdev --getsize64 $MY_SOURCE_DEVICE)" $MY_SOURCE_DEVICE > $MY_TARGET_DEVICE

The -s argument is useful - it tells pv how many bytes to expect, so that it can display accurate progress information.

Once it's done, you can check that the partitions are the same, and verify that it worked by opening the new partition with cryptsetup and mounting it again.
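
As a rough sketch of the "check that the partitions are the same" step: if the target partition is larger than the source, a plain comparison reports a difference once it reaches the end of the shorter device, so it helps to compare only the first N bytes. This assumes a cmp that supports the -n byte limit (GNU cmp does). same_prefix is a hypothetical helper name, and the device variables in the comment are the ones exported above.

```shell
#!/bin/bash
# Compare only the first BYTES bytes of two files or block devices.
# Returns 0 if they match; otherwise cmp reports the first difference
# and the function returns non-zero.
same_prefix() {
    local src=$1 dst=$2 bytes=$3
    cmp -n "$bytes" "$src" "$dst"
}

# Example usage (needs root for block devices):
# same_prefix "$MY_SOURCE_DEVICE" "$MY_TARGET_DEVICE" \
#     "$(blockdev --getsize64 "$MY_SOURCE_DEVICE")" && echo "copy matches"
```

This reads both devices in full, so expect it to take roughly as long as the copy itself.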

6.6. Backing up LUKS headers

LUKS volumes contain a metadata header. It's useful to back up this header to avoid data loss. Quoting from the cryptsetup manpage:

If the header of a LUKS volume gets damaged, all data is permanently lost unless you have a header-backup. If a key-slot is damaged, it can only be restored from a header-backup or if another active key-slot with known passphrase is undamaged. Damaging the LUKS header is something people manage to do with surprising frequency. This risk is the result of a trade-off between security and safety, as LUKS is designed for fast and secure wiping by just overwriting header and key-slot area.

To support this, cryptsetup provides both a luksHeaderBackup command and a luksHeaderRestore command.

6.7. Erasing old disks

I also wanted to securely erase some old disks I had. There are a few ways to do this, but I just looked at two:

Using secure erase

It's possible to issue a Secure Erase ATA instruction to supported devices. This does a firmware-level erase, which can offer advantages over just writing bits to disk (eg. it can be significantly faster, and can erase things that can't be erased by writing to the device, like bad sectors).

You can check whether your device supports secure erase by using hdparm:

$ sudo hdparm -I /dev/sda | grep -i erase
                supported: enhanced erase
        4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.

$ sudo hdparm -I /dev/sdb | grep -i erase
                supported: enhanced erase
        396min for SECURITY ERASE UNIT. 396min for ENHANCED SECURITY ERASE UNIT.

If your disk does support this, then you can use hdparm to issue the erase in a couple of commands - see ATA Secure Erase for instructions.

Using pv (or similar)

If your disk doesn't support secure erase, then there are various options that basically involve writing bytes to the disk using either a dedicated erase tool like shred, or by copying bytes to the device manually using a tool like dd or cat. For these disks I used pv again as I just wanted to do something simple and (relatively) fast - eg. I'm not worried about doing multiple iterations.

One choice here is what your source data should be. There are basically two sensible candidates:

/dev/zero: write zeros to the disk. This is often fine, but can supposedly be undone on older disks by amplifying the signal coming from the disk head to differentiate a "zero" that used to store 1 from one that used to store 0.
/dev/urandom: write random data to the disk. Slower than /dev/zero, but safer if your security model involves protecting against that read amplification attack. I'm curious how much slower this is in practice, but haven't tested it.

6.8. Monitoring my duplicity backup status

With these changes done I took the opportunity to swap my duplicity backup to a new S3 bucket, to remove any dependency on the old data.

My duplicity script runs on a cron schedule, and essentially just does this:

#!/bin/bash
set -e

echo "$$ | Starting job: $0"

# See man flock for this snippet. It locks so only one version of the script can
# run at once.
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock --verbose -en "$0" "$0" "$@" || :
echo "$$ | Obtained flock"

# Run the backup
duplicity /mysource boto3+s3://mybucket \
          --name mybackupname \
          --s3-use-ia \
          --archive-dir=/myarchivedir \
          --exclude-filelist=/mylist.exclude \
          --full-if-older-than 365D

# Then some cleanup steps that I'll ignore

# Store the last time the backup finished
date --iso-8601=seconds > /last_success

It would be easy for me to not notice if the backup stopped working, so I display the time of last success in my i3/exwm status bar:

This means the "S3" backup last succeeded 7 minutes ago. The logic for building this string is something like:

if test -f /last_success; then
    DUP_S3_DATE=$(cat /last_success)
    DUP_S3_SECONDS_AGO=$(( ( $(date +%s) - $(date -d "$DUP_S3_DATE" +%s) )))
    DUP_S3_MINUTES_AGO=$(($DUP_S3_SECONDS_AGO / 60))
    DUP_S3_HOURS_AGO=$(($DUP_S3_SECONDS_AGO / (60 * 60)))
    DUP_S3_DAYS_AGO=$(($DUP_S3_SECONDS_AGO / (24 * 60 * 60)))
    if [ $DUP_S3_DAYS_AGO -gt 1 ]; then DUP_S3_MSG="${DUP_S3_DAYS_AGO}d";
    elif [ $DUP_S3_HOURS_AGO -gt 1 ]; then DUP_S3_MSG="${DUP_S3_HOURS_AGO}h";
    elif [ $DUP_S3_MINUTES_AGO -gt 1 ]; then DUP_S3_MSG="${DUP_S3_MINUTES_AGO}m";
    elif [ $DUP_S3_SECONDS_AGO -gt 1 ]; then DUP_S3_MSG="${DUP_S3_SECONDS_AGO}s";
    else DUP_S3_MSG="?";
    fi
else DUP_S3_MSG="[nofile]";
fi
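
As a rough sketch, the unit selection could be pulled out into a function that takes an age in seconds, which makes it possible to exercise the logic without a real /last_success file. format_age is a hypothetical name. Note that this variation uses "at least one whole unit" thresholds rather than the -gt 1 comparisons above, so an age of exactly one day reads as 1d rather than 24h.

```shell
#!/bin/bash
# Format an age in seconds as a short string using the largest unit
# that fits at least once: days, hours, minutes, then seconds.
# Negative input (eg. clock skew) yields "?".
format_age() {
    local seconds=$1
    if [ "$seconds" -ge 86400 ]; then echo "$((seconds / 86400))d"
    elif [ "$seconds" -ge 3600 ]; then echo "$((seconds / 3600))h"
    elif [ "$seconds" -ge 60 ]; then echo "$((seconds / 60))m"
    elif [ "$seconds" -ge 0 ]; then echo "${seconds}s"
    else echo "?"
    fi
}

# Example usage with the timestamp file from the backup script:
# format_age "$(( $(date +%s) - $(date -d "$(cat /last_success)" +%s) ))"
```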

Thinking about it, it wouldn't be much work to also do an automatic restore to make sure I'm still able to recover, and to display that time too.

6.9. Backups are hard

My macbook was stolen a few years back, and at the time I was just running one regular backup to S3 - so I had no other copies of my non-cloud data. I hadn't tested restoring this backup from any machine other than the macbook itself, and when I did need to access the data in an emergency it was pretty stressful not knowing if I had all the right credentials accessible and if the restore was going to work. It was a big relief when it did work (thanks Arq).

We're approaching 2022 and I think backups for personal data are still hard to get right, even for technical users. Sure, I make things more complicated by using linux and being more conservative than a lot of people are about sharing private data with cloud services. But I don't think my requirements are too outrageous:

I have local data which doesn't natively belong to a cloud service, in the order of single-digit TBs.
I want my data to be protected if I accidentally erase something or if my hardware dies.
I want my data to be protected in case of house theft or fire - so I need at least one off-site copy.
I use my laptop on the sofa and other places where it's inconvenient to have an external drive connected via cable.
For a cloud service, I need to be able to control the encryption myself, or the provider needs to be very reputable so I'm comfortable that they can't access my data.
I don't want to be locked in to a single cloud service provider.

There's a lot to figure out to achieve this - it's not common knowledge and it's a big enough topic that it can't be self-taught in a quick web search. You have to think about choice of backup software, cost analysis of cloud services, purchasing disks and/or NAS devices, waiting days for the initial backups to run, doing maintenance to confirm that the restore process actually works, potentially changing a cloud provider which means another long transfer process, etc.

I can't speak for Windows, but at least on macOS this doesn't "just work". Time Machine is nice if you have an external disk connected, but I'd have to do some research to figure out how to set it up with a network drive, and then solve the "offsite" backup problem myself separately - and Time Machine isn't even enabled by default (probably as it requires an external drive).

I don't think this problem can be solved without requiring some thought from the user, but it feels like the default behaviour for consumer-facing OSes could be better - because I think most people I know don't go out of their way to back up anything at all. Which can be fine - in practice you can go for many years without losing data, and many people nowadays may not even have much data that doesn't already live in a cloud service. But it sucks if you do lose something important.

I think the closest I've seen to a multi-backend, set-and-forget solution is Arq, which is very nice but sadly doesn't support linux, and still requires some work to figure out your backend configurations. Tarsnap and Backblaze also have their uses - I have memories of very slow upload speeds for Backblaze but I suspect that's better now through some combination of disk/network/software performance.

Capturing screenshots and recordings with exwm

hi@mattduck.com (Matt Duck) — Fri, 18 Jun 2021 12:41:00 +0100

I've been using exwm as my window manager for a while now. I like the approach but there are a few rough edges. One of these was that I needed to implement something to take simple screenshots and screen recordings. Here's how I got it working.

7.1. The result

This image shows me pressing super-shift-4 to capture a screenshot by running a function md/screenshot-image-selection. After capture the image is automatically opened in firefox.

The video itself was taken by pressing super-shift-5 to trigger a function md/screenshot-video-selection-start, and then another key to run md/screenshot-video-stop to stop recording.

7.2. The exwm bindings

I wanted something similar to the macOS cmd-shift-4/5 bindings, so I added them to exwm-input-global-keys:

(setq exwm-input-global-keys
      `(;; Various other keys...

        ;; Prompt for a selection and take a screenshot
        (,(kbd "s-$") . md/screenshot-image-selection)
        ;; Prompt for a selection and start a video
        (,(kbd "s-%") . md/screenshot-video-selection-start)
        ;; Stop the video
        (,(kbd "s-^") . md/screenshot-video-stop)))

7.3. The elisp functions

The functions themselves just call out to a script I wrote named ,screenshot:

(defun md/screenshot-image-selection ()
  (interactive)
  (shell-command ",screenshot --image-selection"))

(defun md/screenshot-video-selection-start ()
  (interactive)
  (shell-command ",screenshot --video-selection-start"))

(defun md/screenshot-video-stop ()
  (interactive)
  (shell-command ",screenshot --video-stop"))

7.4. The script

The ,screenshot script implements shell equivalents of all three elisp functions. This could easily be moved inline to the elisp code rather than in a separate script, but I wanted to be able to use it elsewhere:

#!/bin/bash
#
# Features for capturing the screen as image or video.

_THIS_DATE="$(date --iso-8601=second)"
_IMAGE_OUTPUT="/f/inbox/screenshots/${_THIS_DATE}.png"
_VIDEO_OUTPUT="/f/inbox/screenshots/${_THIS_DATE}.mp4"

function image-selection () {
    maim -s >"$_IMAGE_OUTPUT" && firefox "$_IMAGE_OUTPUT"
}

function video-selection-start () {
    # Use slop to grab screen area
    slop=$(slop -f "%x %y %w %h %g %i") || exit 1
    read -r X Y W H G ID < <(echo "$slop")

    # make the width + height divisible by 2 so ffmpeg doesn't error
    if ! [ $((W%2)) -eq 0 ]; then W=$((W+1)); fi
    if ! [ $((H%2)) -eq 0 ]; then H=$((H+1)); fi

    # start capturing video.
    # We use yuv420p here otherwise it can't be played by Firefox.
    # See https://bugzilla.mozilla.org/show_bug.cgi?id=1368063
    ffmpeg -f x11grab -s "$W"x"$H" -r 60 -i :0.0+"$X","$Y" -vcodec h264 -crf 18 -pix_fmt yuv420p -y "$_VIDEO_OUTPUT" >> /tmp/ffmpeg-record.log 2>&1 &

    # store pid
    echo $! >/tmp/ffmpeg-record.pid
    echo "$_VIDEO_OUTPUT" >/tmp/ffmpeg-record.filename
}

function video-stop() {
   pkill --signal INT --pidfile /tmp/ffmpeg-record.pid && firefox "$(cat /tmp/ffmpeg-record.filename)"
}

case "$1" in
    --image-selection) image-selection;;
    --video-selection-start) video-selection-start;;
    --video-stop) video-stop;;
    *) echo "argument invalid or not provided, exiting." && exit 1
esac

How screenshots work

The image-selection function calls out to a simple terminal tool called maim, which prompts you for a selection/window and takes the screenshot.

How videos work

The video-selection-start function uses ffmpeg with x11grab as the input source. ffmpeg doesn't provide an easy way to choose what part of the screen to record - you instead have to pass coordinates as arguments. So we use slop to make a selection on the screen and extract the coordinates, and then pass those to ffmpeg.

Unlike screenshots, videos also need a stop instruction. ffmpeg expects to receive a SIGINT to stop recording. So we write two files to /tmp:

A pid file. This allows us to stop the video by doing pkill --signal INT --pidfile /tmp/ffmpeg-record.pid
The output path of the video. This allows us to automatically open the video with firefox.

7.5. FFmpeg gotchas

I ran into a few issues configuring ffmpeg:

Firefox doesn't recognise the video format

The default ffmpeg video output was failing to play on Firefox - "Video can't be played because the file is corrupt".

The problem turned out not to be that the file was corrupt, but that Firefox doesn't support ffmpeg's default YUV444 chroma subsampling setting for H.264. A workaround is to specify -pix_fmt yuv420p. (I'm not sure whether this affects codecs other than H.264).

Dimensions must be divisible by 2

Sometimes slop would produce an odd number for the input height, which is invalid for H.264 and causes ffmpeg to throw a "height not divisible by 2" error. There are a few solutions suggested on this thread, but for now I'm just adding 1 to my width/height if they're not even because I don't need the values to be exact.
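
The adjustment can be sketched as a pair of small helpers (round_up_even and round_down_even are hypothetical names, and both assume non-negative integers). Rounding up mirrors what the script does. Rounding down is an alternative that might be safer when the selection touches the edge of the screen, since a capture area that extends past the screen can make x11grab fail.

```shell
#!/bin/bash
# Round a non-negative integer up to the next even number
# (same effect as adding 1 to odd values in the script).
round_up_even() {
    local n=$1
    echo $(( n + n % 2 ))
}

# Round a non-negative integer down to the previous even number.
round_down_even() {
    local n=$1
    echo $(( n - n % 2 ))
}

# Example usage inside video-selection-start:
# W=$(round_down_even "$W")
# H=$(round_down_even "$H")
```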

Screen tearing

Finally I had an issue where during playback I was seeing screen tearing in the video. This wasn't specific to exwm or ffmpeg - it also occurred when I was using i3 and other video recording tools.

I was using picom as an X compositor, and was able to isolate the issue to only occur when picom was running with the glx backend and vsync=true. There seem to be a few options for fixing it - eg. setting vsync=false in picom, switching to the xrender backend instead of glx, or even disabling picom entirely. Right now I'm disabling it entirely as I don't notice much difference when it's on.

With this fixed everything is working well. You can find the code in my dotfiles.

A fun problem with fzf-tab-completion and echo

hi@mattduck.com (Matt Duck) — Sun, 09 May 2021 16:25:00 +0100

Fzf is great, but I've always thought the bash tab completion wasn't as polished as it could be. I want to be able to hit <TAB> and have fzf pop up to navigate the candidates, but it doesn't hook into bash completion like you'd expect. Instead, there are special implementations of completion for different commands - eg. you might have to call _fzf_setup_completion dir tree to configure an fzf completion that prompts you for directories when you run tree. In addition to requiring extra setup, it also doesn't provide support for ad-hoc flags, so it's pretty limited.

Yesterday I was looking for a better way to do this, and came across lincheney's fzf-tab-completion. This seems to work pretty well, but I had to fix a couple of things to get it working properly.

8.1. First - macOS support

You can check the github page for instructions on how to use fzf-tab-completion - basically you just source a bash script which defines a function fzf_bash_completion, and then you use bind to run fzf_bash_completion on a particular keypress (eg. <TAB>).

I tried this on macOS and hit a common problem: the GNU and BSD versions of certain utils aren't compatible - they have various differences in their flags/behaviour, and the bash script is assuming the GNU versions of sed and awk. On macOS you can install the GNU versions with Homebrew (brew install gnu-sed gawk), but they're installed under g-prefixed names (gsed, gawk) to avoid breaking anything that assumes the default BSD versions.

There are a few approaches to fixing this. I just opted for the one that required the least work - changing the script to prefer gawk and gsed if they're available.
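
A minimal sketch of the "prefer the GNU tool if it's available" approach. This is not the actual patch to fzf-tab-completion: pick_tool, SED and AWK are hypothetical names used for illustration.

```shell
#!/bin/bash
# Print the first of the given command names that exists on PATH.
# Returns non-zero if none of them are available.
pick_tool() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the g-prefixed GNU versions, falling back to the defaults.
SED=$(pick_tool gsed sed)
AWK=$(pick_tool gawk awk)

# The rest of the script would then call "$SED" and "$AWK" instead of
# hard-coding sed and awk.
```

On a Linux machine this normally resolves to plain sed and awk (or gawk if it's installed), so the same script can run unchanged on both platforms.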

8.2. The main event - why doesn't kubectl complete namespaces

After that it seemed to be working:

I could type git checkout <TAB> and use fzf to complete the branch names, or ls --<TAB> and select a long flag. It even seems I can complete multiple flags at once by adding --multi to FZF_COMPLETION_OPTS.

One place where I wanted to use fzf for completion was kubectl. There are various tools that add fzf features to kubectl or that provide nice replacements for kubectl commands, but I'd find it useful to have fzf completion that "just works" when I'm running ad-hoc kubectl commands.

I tried this for a bit and it did work:

kubectl get pod <TAB> - completes pod names
kubectl get <TAB> - completes resources
kubectl -n foo rollout restart deployment <TAB> - completes deployments in the "foo" namespace

Then I hit a problem running kubectl get pod -n <TAB>. This is supposed to complete namespaces, but it was giving me a list of pods. I double-checked that the standard kubectl bash completion does complete namespaces as expected in this situation, so it looked like something might be wrong with the fzf completion.

8.3. Debugging

After trying a few variations I realised that the fzf completion would work if I specified the namespace flag as -n=<TAB> instead of -n <TAB>. I knew the bash script contained a decent amount of regex/parsing logic, so wondered if something could be wrong with how it handled flags.

I started decorating the script with debug lines like echo "some_variable: $SOME_VARIABLE" >> /tmp/debug.log, or | tee -a /tmp/debug.log. It turned out there were certain lines where a flag would usually be printed, but if I used -n, it would be missing. For example:

# (some printed output with -n=)
line: k get pod -n=
SHELL SPLIT ----
line: -n=
buffer:
line: pod
buffer:
line: get
buffer:
line: k
buffer:
COMP_WORDS: k get pod -n=
COMP_CWORD: 3

# (some printed output with -n)
line: k get pod -n
SHELL SPLIT ----
line: -n
buffer:
line: pod
buffer:
line: get
buffer:
line: k
buffer:
COMP_WORDS: k get pod  # <-- where did -n go?
COMP_CWORD: 3

I continued to try different inputs and then noticed that -e was also affected. At this point I ran through the whole lowercase alphabet… and -n and -e were the only flags that didn't work.

I wanted to narrow down where the flags were being stripped in the script. The -n and -e flags should have been a clue but I was still thinking about regex at this point, so I started by commenting out some of the regex/sed code, and also calling some of the intermediate functions manually.

Eventually I narrowed it to this function which gets called on each argument:

$ printf '%s\n' '-a' | _fzf_bash_completion_flatten_subshells
>> -a

$ printf '%s\n' '-n' | _fzf_bash_completion_flatten_subshells
>>

From there it was easy to find the offending line. It was this:

echo "$line$buffer"

This looks harmless enough, but $buffer was an empty variable, and $line contained our flag, so this was executing echo "-n" or echo "-e". -n and -e are echo flags: echo consumes the argument as an option rather than printing it, so the flag never reaches the output (echo "-n" prints nothing at all, and echo "-e" prints just a newline). This is why only these particular flags were affected.

(Had I tried capital letters, I'd have found that -E also gets stripped. I think there can be issues with other input too but I haven't delved into it).

8.4. The solution: printf

Weirdly, there doesn't seem to be a straightforward way to print the literal -n using echo. I'm sure there are various alternative ways to solve this, but I opted for replacing the invocations of echo "$foo" with printf '%s\n' "$foo". printf is available both as a bash builtin and as a coreutils command, and works like C's printf function. It's commonly available and was already used elsewhere in the fzf-tab-completion script, so it was an obvious choice.
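
The difference is easy to reproduce in isolation. A minimal sketch, assuming bash with its default builtin echo (print_line is a hypothetical helper name):

```shell
#!/bin/bash
# Print a value followed by a newline, even if it looks like an echo flag.
# The value is passed as an argument to the '%s\n' format, so it is never
# interpreted as an option.
print_line() {
    printf '%s\n' "$1"
}

# Compare what echo and printf produce for a few flag-like values.
for flag in -n -e -E -a; do
    echo_out=$(echo "$flag")
    printf_out=$(print_line "$flag")
    printf 'echo: [%s]  printf: [%s]\n' "$echo_out" "$printf_out"
done
```

In bash the echo column comes out empty for -n, -e and -E, while the printf column always contains the flag.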

After that it worked! I've tried various combinations of flags and everything is completing nicely so far. It's not as instant as using the standard bash completion, but it's going to be a lot nicer for certain commands, and is much more useful for me than the builtin tab completion that comes with fzf.

8.5. What version of echo was this?

Running which echo or man echo might point to the GNU echo program, but when you run echo in bash then by default you'll be using bash's builtin echo function. There are slight differences between the two, but both are affected by this issue. You can test this by doing enable -n echo to disable bash's builtin echo.

The GNU version has an additional fun (but easier to debug) problem: if I had been using this program and included the flag --help, it would have printed the help text instead of the literal --help value.

8.6. Why is the GNU echo program/bash function designed like this?

I'm curious about this - I think I had come across this problem before but I didn't have it in mind when I was debugging this issue. It's obviously a gotcha that many people will be aware of, but it seems like bad design from a couple of angles:

If you're unlucky enough to not know about this issue, it will violate expectations - we have a function that prints almost everything except when your string happens to clash with echo's own flags.
It's implicit and doesn't give feedback to the user. The risk isn't documented in the GNU manpage or the bash help text, and there's nothing in the output of the command that hints to the user that this issue may have occurred or why it happened.

One sign that it's easy to forget this issue is that even when I was thinking about this problem and trying to debug the script, I was using echo out of habit, and nearly fell into the exact same problem by doing echo "$variable_that_can_potentially_hold_-n".

I'm sure there are some historical/standards design reasons for it being this way, but I haven't looked into it yet.

8.7. Lesson - don't use echo to print variables

This is something I'll be conscious to pick up in code reviews and my own scripts going forwards. It seems wise to avoid echo for anything that handles user input, and particularly anything related to parsing command-line args.

fzf-tab-completion is available on github. Hopefully these fixes (or similar) can be made available for everybody upstream at some point, but if not you should be able to find them via that page.

Upgrading to Emacs 28.0 for native compilation

hi@mattduck.com (Matt Duck) — Sat, 08 May 2021 10:11:00 +0100

This is a record of how I built Emacs 28 with native compilation on macOS (Intel), and the issues I encountered upgrading my config from 26.3. I'm primarily writing this for myself in case I run into similar problems in the future, but hopefully it can be useful for somebody else too.

9.1. Why I wanted to upgrade

Emacs 27.1 - faster JSON with libjansson

Emacs 27.1 was released in August 2020. One of the changes it introduced was support for libjansson, a C library for working with JSON, which is significantly faster than json.el. A place where this is particularly beneficial is for LSP performance, as the LSP clients and servers communicate using JSON.

Emacs 28 - faster everything with libgccjit

This was my motivation for the upgrade. 2 weeks ago the native compilation branch (led by Andrea Corallo and previously named gccemacs) was merged into master (the development branch for what will later become Emacs 28.1). This project uses libgccjit to perform ahead-of-time compilation of emacs-lisp bytecode (.elc files) to native code (new .eln files), which adds general latency improvements to Emacs across the board.

9.2. Install steps for macOS

I used jimeh's build-emacs-for-macos script, which does most of the work for you. I ran:

brew bundle - this installs all the dependencies contained in the Brewfile.
./build-emacs-for-macos --no-frame-refocus --git-sha 83a915d3dfafd5f3d737afe1e13b75e4dd3aef96 master - this compiles Emacs.
cd builds && tar -xjf Emacs.app-\[master\]\[2021-04-25\]\[83a915d\]\[macOS-10.15\]\[x86_64\].tbz - this extracts the compiled Emacs.app from the builds directory.
./Emacs.app/Contents/MacOS/Emacs --debug-init - start Emacs and see what issues occur.
After fixing a lot of new errors I replaced /Applications/Emacs.app with the new Emacs.app.

Some notes on the flags for the build script:

I chose commit 83a915d3dfafd5f3d737afe1e13b75e4dd3aef96 because it was the most recent commit in jimeh's list of known good commits for native-comp.
Native compilation is controlled by the --with-native-comp flag, but is enabled by default.
--no-frame-refocus prevents Emacs from raising another frame when a frame is closed. See Álvaro Ramírez's post on this.

9.3. Enabling native compilation

Native compilation should be enabled by default. Some site elisp files will have been compiled during Emacs' compilation, but other libraries will be compiled asynchronously when you load them (which means you don't get all the benefits straight away - after starting Emacs you have to wait for libraries to be compiled).

It was immediately clear to me that this compilation was executing, as my *Warnings* buffer started to fill with warnings. Some were harmless, like "assignment to free variable" when compiling my init.el, and others were actual errors.

I only had to make one change to enable something related to native compilation. I had a couple of places where I was using (load-file) rather than (require), and these didn't seem to be compiled automatically, so I did:

(when (fboundp 'native-compile-async)
  (native-compile-async "path-to-my-file.el"))

9.4. Things that went wrong

I ended up having to fix a lot of small issues before I could run my existing init.el without it failing on startup (or on native compilation). Most of the problems were due to upgrading Emacs and individual package versions rather than native compilation itself.

jka-compr recursive load issue when opening Emacs.app

For some reason this did not affect Emacs when Emacs.app/Contents/MacOS/Emacs was executed from the terminal - only when opened as an application. Whenever a package had a dependency on jka-compr, it would hit a recursive load error:

Error (use-package): evil/:catch: Recursive load: "/Applications/Emacs.app/Contents/Resources/lisp/jka-compr.el.gz", "/Applications/Emacs.app/Contents/Resources/lisp/jka-compr.el.gz", "/Applications/Emacs.app/Contents/Resources/lisp/jka-compr.el.gz", "/Applications/Emacs.app/Contents/Resources/lisp/jka-compr.el.gz", "/Applications/Emacs.app/Contents/Resources/lisp/jka-compr.el.gz", "/Applications/Emacs.app/Contents/Resources/lisp/rect.el.gz", "/Users/mattduck/.emacs.d/eln-cache/28.0.50-6e08c520/evil-common-4cbe422e-ef770841.eln", "/Users/mattduck/.emacs.d/elpa/evil-20210424.1855/evil.elc"

There turned out to be various issues reported for this (eg. here). It's caused by load-prefer-newer, which is a variable that controls what happens if Emacs finds multiple versions of the same file (.el, .elc, .so). When true, Emacs will load the newest one.

The workaround was to disable load-prefer-newer before loading jka-compr:

(setq load-prefer-newer nil)
(require 'jka-compr)
(setq load-prefer-newer t)

This worked fine, but I'm not sure what the root cause is, or why I can find examples of this error going back a few years but I've never seen it before now.

package-refresh-contents hangs when using Marmalade

I was having issues with package-refresh-contents hanging, but they disappeared when I removed Marmalade from package-archives. I'm not sure if this was related to the upgrade or not. Either way I've hardly ever used Marmalade so I just removed it from my package-archives definition.

wrong number of arguments window--display-buffer 5

The signature of the builtin function window--display-buffer changed in 27.1 - it removed the fifth argument DEDICATED. For me this broke my fork of shackle, which uses this function in a couple of places. It's fixed in the upstream repo at https://depp.brause.cc/shackle/, so I just pulled in the fix.

pyvenv-tracking-mode slowing everything down

This was a weird one. I set pyvenv-workon globally in my init.el, but I also had a dir-locals setting for a project that was setting pyvenv-workon to a project-specific virtualenv using a symbol like this:

((python-mode
  (pyvenv-workon . foo\.bar)))

After upgrading, when pyvenv-tracking-mode was enabled, python buffers were extremely slow to respond to input. I pinned the issue down to pyvenv and upgraded pyvenv to 20201227.1623, which didn't help. I eventually realised that pyvenv was constantly switching between my dir-locals virtualenv and my global virtualenv.

The reason for the constant virtualenv switching was that pyvenv-tracking-mode runs a function called pyvenv-track-virtualenv on post-command-hook, and this command was comparing pyvenv-virtual-env-name as a string ("foo.bar") to the dir-locals declaration for pyvenv-workon as a symbol (foo\.bar). Because the string wasn't equal to the symbol, the virtualenv would keep getting reset every time a command was run in the buffer.

Updating the dir-locals declaration to use a string fixed it. I'm not sure what changed to make this issue start occurring now, as it wasn't anything specifically in my setup or pyvenv.

An org-eldoc "wrong-number-of-arguments" error

This error appeared on init:

eldoc error: (wrong-number-of-arguments (lambda nil Return breadcrumbs when on a headline, args for src block header-line,
  calls other documentation functions depending on lang when inside src body. (or (org-eldoc-get-breadcrumb) (org-eldoc-get-src-header) (let ((lang (org-eldoc-get-src-lang))) (cond ((or (string= lang emacs-lisp) (string= lang elisp)) (if (fboundp 'elisp-eldoc-documentation-function) (elisp-eldoc-documentation-function) (let (eldoc-documentation-function) (eldoc-print-current-symbol-info)))) ((or (string= lang c) (string= lang C)) (if (require 'c-eldoc nil t) (progn (c-eldoc-print-current-symbol-info)))) ((string= lang css) (if (require 'css-eldoc nil t) (progn (css-eldoc-function)))) ((string= lang php) (if (require 'php-eldoc nil t) (progn (php-eldoc-function)))) ((or (string= lang go) (string= lang golang)) (if (require 'go-eldoc nil t) (progn (go-eldoc--documentation-function)))) (t (let ((doc-fun (org-eldoc-get-mode-local-documentation-function lang))) (if (functionp doc-fun) (progn (funcall doc-fun))))))))) 1)

It was coming from org-eldoc, which is part of org-plus-contrib. It went away when I upgraded from org-plus-contrib version 20200518 to 20210426 (in the org package repo).

lua-mode/:catch: Unknown rx form ‘symbol’

I ran into this error in lua-mode:

lua-mode/:catch: Unknown rx form ‘symbol’

It can be fixed by removing the existing lua-mode.elc file.

Error: Wrong number of arguments (3 . 4)

This "wrong number of arguments" error appeared when loading various packages. I didn't even look into where this was occurring as it disappeared as soon as I upgraded the packages. The upgrades were:

Package	Old version	New version
projectile	20200329.1908	20210407.707
dockerfile-mode	20200106.2126	20210404.2224
lsp-mode	20200425.434	20210501.508
evil	20200417.1238	20210424.1855

LSP dependencies needed upgrading

After upgrading LSP, my Python setup stopped working (eg. the language server would return errors that it couldn't find various modules for my python projects). This was fixed by upgrading to the latest versions of python-language-server, pyls-black, pyls-mypy and pyls-isort.

powerline.el: Error: List contains a loop ("22", . #0)

During native compilation, this warning is displayed for powerline:

Warning (comp): /Users/mattduck/.emacs.d/elpa/powerline-20200105.2053/powerline.el: Error: List contains a loop ("22", . #0)

This is an outstanding issue for powerline. For now the fix is to not compile it by doing (setq comp-deferred-compilation-deny-list '("powerline")).

use-package's :pin feature doesn't work

This was another strange one. In my init.el I was using the use-package :pin argument to pin org-mode to use the org package archive instead of melpa. I had got to the point where everything would work OK when loading Emacs the first time. But after native-compiling init.el, the next time I opened Emacs it would skip my use-package declaration for org-mode. It turned out that this only happened when I included the :pin argument.

Removing :pin fixed the issue, and didn't have any detrimental effect for me because I don't have multiple org versions installed anymore. I'm curious what this use-package issue is though and whether it's specific to my setup.

Loading an external elisp file failed if compiled

I was using :load-path with use-package to load some elisp related to managing windows and buffers, which I keep in a separate repo. The first time I opened Emacs this worked fine, but if I opened Emacs after native-compiling, some of my config would error because it referenced void symbols from this external file.

I'm not sure what this problem was - I just copied the contents of the external file inline to init.el to work around it. This was acceptable for me as I had been planning to move that code inline anyway.

SVG support didn't work

I tried to call build-emacs-for-macos with the --rsvg flag, which is supposed to provide svg support via librsvg. The log output suggested that it was being compiled as expected, but when I opened Emacs svg wasn't supported. I didn't fix this, and haven't looked into what it is yet.

9.5. Too many errors?

I think this is an unacceptable amount of work for the majority of people. There were a lot of separate issues that I had to investigate, and most of them remain a mystery to me - it just wasn't practical to invest time in understanding all the causes. Fortunately there were easy workarounds and I didn't have to disable anything that I actually use.

Some of this is on me: I have a lot of config code which has been hacked together over 7+ years, tons of packages installed, I hadn't upgraded for a while, I used master rather than a stable release, etc. But I think some of it is just the nature of Emacs and the ecosystem - packages are supported by a small number of contributors, breaking changes are reasonably common, you won't always find issues reported when you encounter a problem.

I'd expect this to be slightly easier if you're using a distribution like Spacemacs or Doom, because of their popularity and because a large chunk of the configuration is managed for you. But if you're starting with vanilla Emacs and you want to customise it with a lot of packages, the reality is that you're going to have to spend time debugging problems like this, and you probably have to be invested in DIY editors, Emacs, Lisp etc. for it to be worth it.

For me upgrading has been a nice improvement. Emacs feels noticeably quicker than I've experienced before, and more aligned with what you'd expect in a modern IDE. It will be pretty cool when native compilation gets a proper release in 28.1.

You can find my config on github, and you can watch Andrea talk more about native compilation here.

A philosophy of software design - John Ousterhout

hi@mattduck.com (Matt Duck) — Wed, 07 Apr 2021 22:15:00 +0100

I read A Philosophy of Software Design about 18 months back. It's a well-structured, concise read about managing complexity in software design. I don't think the suggested approaches are applicable in all situations (and John Ousterhout says this himself IIRC), but I recognised a lot of the problems described in the book and found it provided some useful ways to articulate concepts during code reviews (eg. whether adding a shallow function is increasing complexity in a codebase, if complexity can be pulled down into an implementation, or where it's useful to have consistency in the code).

Below are the notes I made on takeaways from the book and my thoughts on a couple of the ideas (minus some fun references to real code that I've worked on). I'm publishing the notes as it's a nice way for me to re-read them and retain the information. This doesn't cover all the content in the book, and it's possible that I misrepresent the author in some of my paraphrasing. If you're interested in the content I definitely recommend buying a copy - it's not expensive and it's an easy read.

If anyone connected to the author thinks this is sharing too much detail then I'm happy to take it down.

10.1. Summary

When building software systems, the core challenge is managing complexity. Complexity makes it more difficult for a programmer to understand and change software, it increases the rate of errors, it slows development velocity, and it has other negative effects.

Software design is one of the key tools for managing complexity. Ousterhout discusses the different types and causes of complexity, and then various software design considerations and their relationship to complexity - patterns, antipatterns, questions to ask, etc.

10.2. The nature of complexity

Ultimately, complexity makes it more expensive to modify a program: changes are more difficult, take longer, and are more likely to introduce errors to a program.

Ousterhout identifies three general ways that complexity manifests itself:

Change amplification: where a seemingly simple change requires code modifications in many different places.
Cognitive load: where a developer needs to know a large number of things in order to complete a task.
Unknown unknowns: where it's unclear what to do, or whether a proposed solution will even work.

The overall complexity of a system can be determined by the complexity of each part, weighted by the fraction of time developers spend working on that part. If you isolate complexity in a place where it will never be seen, then that's almost as good as eliminating it entirely.
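As a rough illustration of that weighting, here is a minimal sketch. The module names, complexity scores and time fractions are all made up for the example, and the book doesn't prescribe any particular scale.

```python
# Hypothetical numbers: (complexity score, fraction of developer time spent there).
modules = {
    "billing": (9.0, 0.05),    # very complex, but rarely touched
    "api_views": (4.0, 0.60),  # moderately complex, touched constantly
    "utils": (2.0, 0.35),
}


def overall_complexity(parts):
    """Sum each part's complexity, weighted by the time spent working on it."""
    return sum(score * fraction for score, fraction in parts.values())


total = overall_complexity(modules)
```

With these numbers the rarely-touched billing module contributes far less to the total than api_views, even though its own score is more than twice as high.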

10.3. Causes of complexity

Ousterhout recognises two main causes of complexity.

Dependencies between software components, which can lead to change amplification and a high cognitive load. Dependencies are a fundamental part of software and can't be eliminated, but one of the goals of software design is to eliminate dependencies where possible, and to make the dependencies that remain as simple and obvious as possible.
Obscurity - when important information is not obvious. This creates unknown unknowns, and also contributes to cognitive load.

10.4. Tactical vs strategic programming

Ousterhout advocates for a strategic approach to software development, rather than a wholly tactical approach. This essentially just means ongoing, regular investment of some of your development time towards system design, rather than just working code.

One pitfall is that complexity in software development is incremental. A single shortcut or tactical decision that adds complexity won't have much impact, but small decisions can accumulate to dozens or hundreds of things that do have an impact. Then refactoring becomes a big task that you can't easily schedule with the business, so you look for quick patches, and this creates yet more complexity, which requires more patches, and so forth.

Once a codebase gets complex enough, it is nearly impossible to fix, and you will probably pay high development costs for the rest of its life.

Agile does not encourage strategic programming

"Agile" and similar approaches to software development tend to be focused on small, tactical changes. It's easy in this environment to forget about investing in the codebase, especially in startup companies that have a lot of pressure to deliver features.

10.5. Write code for the reader, not the writer

If someone says your code is not obvious, then it isn't. This is stated a few times through the book.

10.6. Deep modules are less complex than shallow modules

This is one of the core themes in the book. A software system is usually decomposed into a collection of modules that are relatively independent. Modules inevitably have dependencies between them - they work together by calling each other, and therefore must know about each other. In order to manage these dependencies, we think about a module in two parts: an interface (what the module does), and an implementation (how it does it).

The idea of abstraction is closely related to modules. An abstraction is a simplified view of an entity which omits unimportant details, making it easier for us to think about and manipulate complex things.

Ousterhout argues that the best modules are those that provide powerful functionality, but have a simple interface. He describes these as deep modules, in contrast to shallow modules, which have a complex interface but not much functionality, thereby not hiding significant complexity.

The file I/O interface provided by Unix is a good example of a deep interface - the API only has a few system calls (open, read, write, seek, close), but hides a huge amount of complexity around implementation of files, directories, permissions, concurrent access, etc.
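To make the size of that interface concrete, here is a small sketch using Python's thin os wrappers around the same calls. The scratch file and its contents are just placeholders for the example.

```python
import os
import tempfile

# Create a scratch file to work with.
fd, path = tempfile.mkstemp()
os.close(fd)

fd = os.open(path, os.O_RDWR)   # open
os.write(fd, b"hello world")    # write
os.lseek(fd, 6, os.SEEK_SET)    # seek to byte offset 6
tail = os.read(fd, 5)           # read the next 5 bytes
os.close(fd)                    # close

os.remove(path)
```

Nothing in the calling code has to know about disk blocks, caching, permissions checks or directory structures.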

Programmers are often encouraged to write small modules

Ousterhout says that the conventional wisdom is to write small components (keeping the LoC low in each method) rather than deep components, but this results in large numbers of shallow classes and methods, which add to overall system complexity.

I have seen this myself, and also contributed to this problem. People often break a routine into multiple functions for the purpose of making it "easier" to read, or to avoid some code duplication, or to lower a cyclomatic complexity score. This kind of decomposition has a different purpose to designing public interfaces, but the resulting functions sometimes get added to the public API of a class or module.

10.7. General-purpose modules are deeper

This doesn't mean "generalised" implementations that support extra features that you don't need - it means writing methods that are not overly specialised. The sweet spot is a somewhat general-purpose approach, which hopefully provides a simpler and deeper interface.

"If you reduce the number of methods in an API without reducing its overall capabilities, then you are probably creating more general-purpose methods."

10.8. Pass-through variables add complexity

Pass-through variables add complexity because they force intermediate methods to be aware of their existence, even though the methods have no use for the variables.

Eliminating this anti-pattern can be very difficult. The two main approaches are to see if you can add the pass-through state onto an object that is already shared between the top and bottom methods, or to make it a global variable. These can have their own problems.

Ousterhout's most often-used approach to this problem is to introduce a context object, which stores the application's global state - anything that would otherwise be a pass-through or global variable. They are not an ideal solution because they have a lot of the disadvantages of global variables, but they can reduce the complexity of a method's signature.
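A minimal sketch of the context-object idea follows. The Context class, its fields and the three functions are hypothetical names invented for the example, not something taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical holder for state that would otherwise be passed through."""
    timeout_seconds: float
    locale: str


def handle_request(ctx, payload):
    # Intermediate functions forward the context without needing to know
    # which of its fields the lower layers use.
    return render(ctx, payload)


def render(ctx, payload):
    return format_message(ctx, payload)


def format_message(ctx, payload):
    # Only the bottom function actually reads the value.
    greeting = "Bonjour" if ctx.locale == "fr" else "Hello"
    return f"{greeting}, {payload}"


result = handle_request(Context(timeout_seconds=2.0, locale="fr"), "Matt")
```

Adding another field to Context later doesn't change the signature of handle_request or render, which is the main gain over threading each variable through separately.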

10.9. Pass-through methods are shallow and add complexity

A pass-through method is one that does little except invoke another method with a similar signature. This increases the interface complexity of a module, without increasing the total functionality of the system. It can indicate that there is confusion over the division of responsibility between modules or classes.

This is not always bad - the important thing is that each new method should contribute significant functionality. For example, a dispatcher method is a pass-through method that can be very useful.

Decorators are often shallow pass-through methods that add a lot of boilerplate without adding much new functionality.

10.10. Pull complexity downwards

It's more important for a module to have a simple interface than a simple implementation. If you have complexity that is closely related to your module's functionality, you should consider pulling that complexity into the module's implementation.

One example Ousterhout provides is configuration parameters, which are an example of moving complexity upwards rather than down. Consider a network protocol that has to deal with lost packets: one way to determine an appropriate retry interval is to introduce a configuration parameter to provide control to the user. However, it may be preferable to compute a reasonable retry value automatically during runtime (by eg. measuring the response time). This approach pulls system complexity downwards.
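One way to picture the second option is a connection that derives its own retry interval from observed response times. This is only a sketch under assumed names and constants (Connection, the smoothing factor, the safety multiplier), not a real protocol implementation.

```python
class Connection:
    """Hypothetical connection that works out its own retry interval."""

    _SMOOTHING = 0.125      # weight given to each new measurement
    _SAFETY_FACTOR = 2.0    # retry after twice the typical response time

    def __init__(self):
        self._smoothed_response_time = 1.0  # initial guess, in seconds

    def record_response_time(self, seconds):
        # Exponentially weighted moving average of observed response times.
        delta = seconds - self._smoothed_response_time
        self._smoothed_response_time += self._SMOOTHING * delta

    def retry_interval(self):
        return self._SAFETY_FACTOR * self._smoothed_response_time


conn = Connection()
for observed in (0.2, 0.2, 0.2):
    conn.record_response_time(observed)
interval = conn.retry_interval()
```

The caller never has to choose a retry value; the interval drifts towards the measured response time as more samples arrive.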

10.11. Define errors out of existence

"Exception handling is one of the worst sources of complexity in software systems". Exceptions can leak abstraction details upwards, making for a shallower abstraction. Programmers are often taught that they need to handle exceptional cases, leading to an over-defensive programming style.

There are two main ways to handle exceptions, each with their own complexity. The first is to complete the work in spite of the exception, and the second is to report the exception upwards (perhaps also running some unwinding / handling logic).

Ousterhout argues in favour of defining errors out of existence, by automatically handling certain cases. He contrasts the behaviour of deleting a file on Windows vs Unix. On Windows, if a process is using the file to be deleted, an error is raised. In the Unix implementation, if another process is using the file, the file is internally marked for deletion and the operation returns successfully. Unix actually deletes the file when the other process has finished using it. Errors are avoided in both the process that initiated the deletion, and the other process that was using the file. Another common example is when accessing a substring by index: you could raise an out of bounds error, or you could just return the whole string.
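Python's slicing is a handy illustration of the substring case, because out-of-range indexes are clamped rather than raising. The substring helper below is a hypothetical sketch; note that this variant returns the overlapping part of the string rather than the whole string.

```python
def substring(text, start, end):
    """Return the part of text that overlaps [start, end), never raising."""
    # Slices clamp indexes past the end of the string, and max() maps
    # negative indexes to 0, so no bounds checks or exceptions are needed.
    return text[max(start, 0):max(end, 0)]


partly_out_of_range = substring("hello", 2, 100)
fully_out_of_range = substring("hello", 10, 20)
negative_start = substring("hello", -3, 2)
```

Callers get an empty or shortened string instead of an exception, so there is no error path for them to handle.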

Where you must handle exceptions, you should prefer to handle many exceptions with a single piece of code.

10.12. A thoughtful approach to code comments

Ousterhout shares a few opinions on comments:

A programming language can't capture all of the important information that was in the mind of the developer when the code was written. Comments should be used to describe things that aren't obvious from the code.
"Developers should be able to understand the abstraction provided by a module without reading any code other than its externally visible declarations." If you want code that presents good abstractions, you must document those abstractions with comments.
Write comments early, because they can help to improve the system design. Comments are likely to be low-quality if you write them as a token gesture at the end of a piece of work.
Comments can indicate complexity: if a method or variable requires a long comment, it is a red flag that you don't have a good abstraction.
Comments should not repeat the code. Could somebody who has never seen the code write your comment just by looking at the code? If so, the comment is worthless.
Comments should be written at a different level of detail to the code - either higher (eg. concerned with top-level behaviour), or lower (concerned with specific details that are not obvious by reading the code). Comments are easier to maintain if they are higher-level and more abstract than the code. This is because they're less likely to be affected by minor changes in the code.
Comments should occur close to the code that they describe. This increases the ease of reading and updating them. It's more important that comments are in the code, than in the commit log. This is because it's significantly easier to find the comments if they're close to the code - finding the right log message is difficult.
Try to document each design decision exactly once. If information is already documented outside your program, don't repeat the documentation inside the program, just reference the external documentation.

Refuting arguments against comments

"Good code is self documenting": to this, Ousterhout argues that there are many things that can't be described in the code, such as the rationale for a particular design decision, or the conditions under which it makes sense to call a particular method. More importantly, if there are no comments accompanying the interface of a method, then there is no abstraction: you must read the method's code, and all of its complexity is exposed.
"I don't have time": good comments shouldn't add more than 10% to your development time. The benefits of having good documentation should quickly offset this cost.
"Comments become out of date and misleading": keeping comments up to date is not an enormous effort - it should be flagged in code reviews. Large changes to documentation should only be required if there are large changes to the codebase. You do have to take some care to structure comments to be useful though.
"All the comments I have seen are worthless": Ousterhout agrees that most documentation is "so-so at best".

Different types of comments

Ousterhout identifies some different types of comments:

Interface: a comment block that goes with the declaration of a class, data structure, function, or method. They describe the thing's interface - the overall behaviour, arguments and return values, any side effects or exceptions, and other requirements that the caller must satisfy. They can also fill in missing details: the units for a variable, whether null values are permitted, whether boundary conditions are inclusive or exclusive, etc.
Data structure member: a comment next to the declaration of a field, describing what it represents.
Implementation: a comment describing how a piece of code works.
Cross-module: a comment describing dependencies that cross module boundaries.

It's important not to mix up these purposes. Comments describing the interface of a component should not have to share any implementation details: this is information leakage, and it adds complexity for the reader. If interface comments must also describe the implementation, then the class or method is shallow.
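As a small illustration of keeping the two apart, here is a sketch where the docstring only describes the interface and the inline comment only describes the implementation. The find_index function is a hypothetical example, not one from the book.

```python
def find_index(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent.

    sorted_values must be in ascending order; the result is undefined
    otherwise. The input is not modified.
    """
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        # Implementation comment: halve the search range on each pass,
        # so that target can only ever be within [low, high].
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```

A caller can use the function from the docstring alone, without knowing that it happens to be a binary search.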

Comments and programming fluency

My own thought on this is that in order to write and work with high-quality comments, you need to be at a certain level of fluency. When I started programming, I had to use comments liberally to describe to myself what certain lines of code meant, because it was significantly faster for me to understand an English description than to read a block of 4-5 lines of code.

I suspect that some of the "bad" comments where the comment just duplicates the information in the code fall under this category: for some people, those comments are helpful, but for others who are more fluent in the programming language, they're less useful. In a professional environment there should ideally be a base level of understanding and verbose comments that duplicate what the code is doing should not be required often.

10.13. Naming is important

Good names are a form of documentation, and another place where all programmers have seen incremental complexity: a single bad name does not hurt a program, but a program full of poorly-named components can be very difficult to use.

Names should be precise (you usually want to avoid overly-general names like data), and consistent - if a name is used in multiple places it should refer to the same thing. If the same name is used to refer to different concepts, then at some point a reader will be confused, which is likely to introduce errors to the program.

10.14. Whitespace can help break up code into logical blocks

One way to make code more obvious to the reader is to have blank lines between parts that are logically separate, and maybe to preface the code block with an implementation comment.

10.15. TDD focuses on features rather than abstractions and design

"The problem with test-driven development is that it focuses on getting specific features working, rather than finding the best design." Ousterhout argues that the unit of development should be abstractions rather than features, and that once you discover the need for an abstraction, you should design it all at once.

He says that TDD does make sense when fixing bugs (which aligns with my experience).

10.16. Consistency is important

Consistency minimises complexity because it provides cognitive leverage: you learn how something is done in one place, and you can immediately understand other places that use the same approach. If a system is not implemented consistently, then developers must learn about each situation separately, which takes more time.

Consistency also reduces mistakes: if a system is not consistent, two situations that appear the same may actually be different.

Consistency can be considered in things like:

Naming.
Coding style.
Interfaces.
Design patterns.
Invariants - cases that are always true. For example, a data structure storing lines of text might enforce an invariant that each line is terminated by a newline character. The programmer can then always assume this to be true.
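The newline invariant could be sketched like this. TextBuffer is a hypothetical class for illustration, and it assumes callers add one line at a time.

```python
class TextBuffer:
    """Hypothetical line store: every stored line ends with exactly one newline."""

    def __init__(self):
        self._lines = []

    def append(self, line):
        # Enforce the invariant at the only place lines enter the structure.
        self._lines.append(line.rstrip("\n") + "\n")

    def text(self):
        # Readers can rely on the invariant and simply concatenate.
        return "".join(self._lines)


buffer = TextBuffer()
buffer.append("first")
buffer.append("second\n")
joined = buffer.text()
```

Because the invariant is enforced in append, no other code has to check whether a line already ends with a newline.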

There are things you can do to ensure consistency:

Write conventions down.
Enforce conventions, ideally automatically through tools.
"When in Rome…" follow existing conventions. Having a better idea is not a sufficient excuse to introduce inconsistencies. Is your new approach so much better that it's worth taking the time to update all of the old uses?

10.17. Implementation inheritance increases complexity

Implementation inheritance creates dependencies between the parent class and each of its subclasses, results in information leakage between the parent and child classes, and makes it hard to modify one class in the hierarchy without looking at the other.

Ousterhout suggests composition can be a less-complex alternative to inheritance.
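A minimal sketch of what composition can look like, using made-up class names: Report holds a serializer and delegates to it, rather than inheriting from a serializer base class.

```python
import json


class JsonSerializer:
    def dumps(self, record):
        return json.dumps(record, sort_keys=True)


class Report:
    """Uses a serializer rather than inheriting from one."""

    def __init__(self, serializer):
        self._serializer = serializer

    def export(self, record):
        # Report only depends on the serializer's small dumps() interface.
        return self._serializer.dumps(record)


exported = Report(JsonSerializer()).export({"b": 1, "a": 2})
```

Swapping in a different serializer doesn't require touching a class hierarchy, only passing a different object with a dumps() method.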

The first example that jumps to mind personally is Django - in a lot of cases the class hierarchy seemed to add more complexity than it removed. In general I think it's very easy for the complexity of inheritance to outweigh the utility.

10.18. Event-driven programming makes code less obvious

Event-driven programming makes it hard to follow the flow of control. This is because handler functions aren't invoked directly - it depends on which handlers were registered at runtime.

This doesn't mean that there is no use-case for event-driven programming, but care should be taken to mitigate this complexity.
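The indirection is easy to see in even a tiny event system. The on/emit helpers and the event name below are hypothetical, written only to show that reading emit() alone doesn't reveal which code will run.

```python
handlers = {}


def on(event_name, handler):
    """Register a handler to be called when event_name is emitted."""
    handlers.setdefault(event_name, []).append(handler)


def emit(event_name, payload):
    # Which functions run here depends entirely on what was registered
    # earlier at runtime, not on anything visible at this call site.
    return [handler(payload) for handler in handlers.get(event_name, [])]


on("user_created", lambda user: f"welcome email to {user}")
on("user_created", lambda user: f"audit log for {user}")
results = emit("user_created", "matt")
```

To find out what emit("user_created", ...) does, a reader has to track down every on() call, which may be spread across the codebase.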

10.19. Design it twice

You'll end up with a much better result if you consider multiple options for each major design decision. There isn't much to say about this, other than that I intuitively agree.

Filtering Django querysets without re-querying the database

hi@mattduck.com (Matt Duck) — Sun, 31 Jan 2021 16:26:00 +0000

11.1. ORM problems

I think most ORM patterns I've seen present a bad abstraction. Rather than providing an interface that safely hides the details of the underlying SQL, they make it extremely easy to trip yourself up by writing OOP code which looks conventional but is actually problematic when translated to relational data access patterns. One impact this has for users of your program is increased latency, as the impedance mismatch between SQL and the ORM code causes time to be wasted on:

Issuing many more queries to the database than required (See the N+1 problem).
Reading columns over the wire that aren't actually used in the application (SELECT * FROM mytable).
Querying the same data multiple times in a request.
Loading excess rows into the application to do processing that could easily have been pushed into the database.

(This doesn't mean ORMs don't have a place - the downsides may be worth it in situations where you can work much quicker with ORM classes than with SQL, performance isn't a concern, you have a very small dataset, etc.).

11.2. Django ORM and the QuerySet cache

Django's ORM suffers from these problems. Unlike SQLAlchemy, it doesn't provide a core API to compose SQL - you're forced to deal with Django's Model and QuerySet interfaces. It's very easy to end up in a place where your functions and SQL queries are mismatched, with queries fired off at various arbitrary points in the request path.

One of the ways that Django attempts to mitigate this problem is through the queryset result cache. Querysets try to be smart about not re-querying the database to answer basic questions that they already have the answer to. For example, if I iterate over a queryset multiple times, the results will be cached:

users = User.objects.all()
for user in users:  # SQL: SELECT id, name, etc FROM users
    print(user)

for user in users:  # Does not issue SQL
    print(user)

Similarly, if I iterate over the queryset and later call .count(), the queryset will return the length of the cached results, instead of issuing SELECT count(*) FROM users:

users = User.objects.all()
for user in users:  # SQL: SELECT id, name, etc FROM users
    print(user)

print(users.count())  # Does not issue SQL

11.3. How the result cache is implemented

The QuerySet class stores the cache on self._result_cache. When populated, this object appears to just be a list of model instances. It gets initialised to None when building a new queryset:

class QuerySet:
    """Represent a lazy database lookup for a set of objects."""

    def __init__(self, model=None, query=None, using=None, hints=None):
        self.model = model
        self._db = using
        self._hints = hints or {}
        self._query = query or sql.Query(self.model)
        self._result_cache = None
        # ...

You can read the Django source code on Github (these snippets are copied from master at time of writing). Here's the full implementation of .count(), which will only issue a COUNT(*) statement if the result cache is empty:

def count(self):
    """
    Perform a SELECT COUNT() and return the number of records as an
    integer.
    If the QuerySet is already fully cached, return the length of the
    cached results set to avoid multiple SELECT COUNT(*) calls.
    """
    if self._result_cache is not None:
        return len(self._result_cache)

    return self.query.get_count(using=self.db)

self._result_cache is mentioned 23 times in django/db/models/query.py. In addition to the count() implementation, it's used:

In exists(), __len__(), __iter__(), __bool__() and __getitem__() - these all operate on the cache if it's populated.
In _fetch_all() - this queries the database to populate the cache if it isn't already populated, and is called by methods like __len__().
In delete() and update() - these both clear the cache.
In __deepcopy__() - this explicitly ignores the cache so it isn't copied to new querysets.

11.4. When the cache is cleared

There are two main places where the result cache is invalidated:

After writes, and
when returning a clone of the queryset, which happens after calling common functions like .filter().

Avoid using .all()

Something to watch out for is that calling .all() returns a new queryset without any cached results. The code below looks very similar to the first snippet, but issues a redundant query:

users = User.objects.all()
for user in users:  # SQL: SELECT id, name, etc FROM users
    print(user)

for user in users.all():  # SQL: SELECT id, name etc FROM users
    print(user)

These additional queries can add up fast in real-world cases - particularly if you've used select_related() and prefetch_related() in an attempt to avoid N+1 problems, in which case calling .all() again could issue multiple queries to populate your pre-fetching.
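The caching and cloning behaviour described so far can be modelled with a toy class. This is not Django's implementation: LazyResults and fetch_users are hypothetical stand-ins, and the calls list just counts how often the pretend database is hit.

```python
class LazyResults:
    """Toy stand-in for a queryset; NOT Django's implementation."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable standing in for a SQL query
        self._result_cache = None

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(self._fetch_rows())

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)
        return len(list(self._fetch_rows()))  # stands in for SELECT COUNT(*)

    def all(self):
        # A clone starts with an empty cache, like a new queryset would.
        return LazyResults(self._fetch_rows)


calls = []


def fetch_users():
    calls.append("SELECT")
    return ["alice", "bob"]


users = LazyResults(fetch_users)
first_pass = list(users)          # hits the pretend database
second_pass = list(users)         # served from the cache
cached_count = users.count()      # served from the cache
cloned_pass = list(users.all())   # clone has no cache, so it fetches again
```

After running this, calls contains two entries: one for the first iteration and one for the clone returned by all().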

11.5. Where the cache breaks down

There are times where it seems like Django could re-use the cached results, but doesn't. For example, imagine we have already retrieved all the users, and we want to narrow the queryset further:

users = User.objects.all()
for user in users:  # SQL: SELECT id, name, etc FROM users
    print(user)

narrowed_users = users.filter(id__in=[1, 2, 3])
for user in narrowed_users:   # SELECT id, name, etc FROM users WHERE id IN (1, 2, 3)
    print(user)

For complex filters that mimic SQL operations, we of course want to push that into the database. But it's easy to imagine how this particular filter could be implemented in Python - we know the rows are already cached in memory, and the filter we're applying can be done in one line of Python ([row for row in result_cache if row.id in (1, 2, 3)]). But filter() returns a new QuerySet with our new IN clause and re-fetches the matching User rows from the database.

11.6. Why doesn't the QuerySet provide an API to support this?

It would be nice to have an additional API to filter against cached data if the queryset has already fetched rows from the database. It must be pretty common that Django codebases end up with business logic split between different parts of the codebase - sometimes operating on querysets, sometimes on model instances, sometimes issuing new queries, etc. Eg. I can imagine a code path like:

You do some expensive pre-fetching at the top of your request path, and you pass the queryset along to some other code.
You need to call a method on your model instances which returns a value that you want to use to narrow the queryset further. Iterating your queryset to call each model instance is OK because you're re-using the cache. But then…
You have to call a third-party library which expects to operate on a QuerySet. You can give it a new queryset using .filter(myfield__in=my_matching_row_ids), but this is going to hit the database again and re-run all your expensive prefetching queries.

For this case, it would be useful if you could pass a queryset to the third party API which contains a narrowed view of the queryset that you already fetched from the database in step 1, with the cache already populated.

There are definitely arguments against offering an API like this - eg. it could introduce confusion around whether data is up to date or not, and "ideally" you should have structured your code using the right patterns to not get into these problems. But when that ship has already sailed I think a tool like this would have uses.

11.7. Hacking the result cache to avoid hitting the database

One way we can implement this behaviour is by overriding the result cache ourselves. For example:

# Build the root queryset and fetch the data
users = User.objects.all()
for user in users:  # SQL: SELECT id, name, etc FROM users
    print(user)

# Narrow your models to a list
narrowed_users = [u for u in users if u.id in (1, 2, 3)]

# Build your filtered queryset
narrowed_queryset = users.filter(id__in=[u.id for u in narrowed_users])

# Set the cache on the queryset
narrowed_queryset._result_cache = narrowed_users

# Iterate the queryset freely
for user in narrowed_queryset:   # Does not re-issue the query
    print(user)

# Further filtering will issue a new queryset that includes our filter
registered_users = narrowed_queryset.filter(is_registered=True)
for user in registered_users:   # SELECT id, name, etc FROM users WHERE is_registered=true AND id IN (1, 2, 3)
    print(user)

This solves the problem we had - now we can call some Model/OOP functions to get the IDs of the rows we care about, and pass a new QuerySet downstream which matches on those IDs without re-fetching rows we already had from the database. They will only be re-fetched if the downstream code needs to further modify the queryset.
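The steps above could be wrapped in a small helper. This is only a sketch, not a supported Django API: narrow_with_cache is a hypothetical name, and it relies on the private _result_cache attribute, so it carries the same caveats as the hack itself.

```python
def narrow_with_cache(queryset, predicate):
    """Return a narrowed queryset whose result cache is already populated.

    Hypothetical helper: relies on the private ``_result_cache`` attribute.
    """
    # Iterating re-uses the rows that `queryset` has already fetched
    # (or fetches them once if it hasn't been evaluated yet).
    kept = [row for row in queryset if predicate(row)]

    # Build a real queryset carrying the IN clause, so that any further
    # .filter() calls downstream still produce the right SQL.
    narrowed = queryset.filter(pk__in=[row.pk for row in kept])

    # Pre-populate the cache so that iterating `narrowed` skips the database.
    narrowed._result_cache = kept
    return narrowed
```

With the example above, this would be called as narrow_with_cache(users, lambda u: u.id in (1, 2, 3)).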

Is this terrible?

I searched for a bit and couldn't find examples of people doing this. I wouldn't expect it to be "supported" behaviour and it isn't something I'd want to do without a good test suite around my program to catch any changes in how QuerySet uses self._result_cache. You also want to be careful about using IN for large result sets. But it does seem to work nicely for some cases.

11.8. Further reading

There are a few Django libraries that provide different solutions for ORM caching. I don't think any of them would be helpful in this exact situation, but it's probably worth understanding what they can do.

You can find the QuerySet source code at django/db/models/query.py. Here it is on Github. It's quite readable, and is useful for understanding what a queryset does.

There are many articles about ORM pitfalls. I think a good starting point is Cal Paterson's The troublesome "Active Record" pattern and the pages linked there.

Questions about AWS Transfer

hi@mattduck.com (Matt Duck) — Sun, 24 Jan 2021 21:41:00 +0000

AWS Transfer Family is an AWS product that provides hosted FTP services as a frontend to manage files in S3 (or in EFS, newly added in January 2021, which I'm ignoring for this post). This is just me exploring a few of the options and concepts involved in configuring a transfer server.

12.1. What decisions are required when setting up a transfer server?

There are a few things to consider, including:

How to design and configure your S3 bucket(s).
Choosing the protocol(s) to use - FTP, SFTP or FTPS, and authentication methods (eg. whether to hook into external services like Active Directory).
Network design - whether this is an internet-facing or internal service, whether you need a fixed IP, DNS configuration, any ingress network security rules, or running on a non-standard port (eg. SFTP listening on something other than port 22).
Designing the IAM role setup, and the mapping of FTP users to home directories, to control the S3 paths that users have access to.

I'm primarily interested in configuring a public-facing SFTP service, so will focus on that.

12.2. How are IP addresses assigned to my server?

This depends on the type of "endpoint" chosen for your transfer service. There are two endpoint types: public and VPC. Public endpoints are only accessible over the internet. They have fewer network configuration choices to think about, but come with limitations. One of these limitations is that IPs are assigned by AWS and subject to change. This makes public endpoints unsuitable for cases where the FTP client is on a network with IP-based firewall rules.

If you want to assign a static IP to the FTP server, you must choose the VPC endpoint type, which means access to the server will be controlled by VPC security groups.

12.3. If I use a public endpoint type, will the server IP be in a known range?

AWS publishes a list of IP ranges used by its services and regions. It's not immediately clear to me which "service" the Transfer product corresponds to - I guess S3. These ranges can change, so if you want to make use of this information you have to be able to handle updates. AWS publishes updates as an SNS topic, which enables people to do this programmatically. I'm not sure how often these updates are published. If your only use-case is setting up an FTP service, it will be significantly simpler to use a static IP for the service than to worry about IP ranges.
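As a rough illustration of what handling these ranges programmatically involves, here is a minimal sketch that filters a parsed copy of the published JSON by service and region. The SAMPLE data is made up (it uses documentation-only address ranges, not real AWS prefixes), the function names are hypothetical, and "S3" is used only because of the guess above, not because it's confirmed to be the right service for Transfer.

```python
import ipaddress


def ranges_for(ip_ranges, service, region):
    """Return the IPv4 networks listed for one service in one region."""
    return [
        ipaddress.ip_network(prefix["ip_prefix"])
        for prefix in ip_ranges["prefixes"]
        if prefix["service"] == service and prefix["region"] == region
    ]


def ip_in_ranges(ip, networks):
    """Check whether an address falls inside any of the given networks."""
    return any(ipaddress.ip_address(ip) in network for network in networks)


# Illustrative sample in the shape of the published file. These are
# documentation-only ranges, NOT real AWS prefixes.
SAMPLE = {
    "syncToken": "0",
    "createDate": "2021-01-24-00-00-00",
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1", "service": "S3"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "S3"},
    ],
}

networks = ranges_for(SAMPLE, "S3", "eu-west-1")
print(networks)
print(ip_in_ranges("192.0.2.10", networks))
```

In a real setup you would load the current file instead of SAMPLE, and re-run the filter whenever an update is published.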

12.4. What other features depend on the endpoint type?

There are a few differences. For example, public endpoints must use the SFTP protocol, and have no way to configure an allow-list of client IPs, or to change the port used for the FTP service.

AWS recommends using the VPC endpoint type as it provides more security features. I haven't seen any features that you get with the public endpoint type that can't be replicated with the VPC endpoint type. The AWS docs include a comparison of the different endpoint types.

12.5. What's a VPC?

Virtual Private Cloud is Amazon's "virtual network" abstraction - it's what provides your networking configuration. It's a logically-isolated network that you define with traditional network configuration features like subnets, routing tables and internet gateways.

VPC isn't a feature that you can opt out of - certain AWS resources (eg. EC2 instances) must be launched into this network. If you don't specifically configure a VPC, AWS will initialise a "default" VPC when you first provision EC2 resources.

Security groups (which control the traffic that is allowed in/out of an EC2 instance) and network ACLs (which control traffic entering and exiting a subnet) are both features of the VPC.

12.6. How do I assign a static IP to a VPC-hosted transfer server?

VPC-hosted transfer servers can either be internal to the VPC, or internet-facing. To be internet-facing, they must be allocated an Elastic IP.

Elastic IPs are public IPv4 addresses that you can associate with resources in your AWS account. You have the option of importing your own address range, or receiving an address from Amazon's IPv4 pool. There are limits on the number of IPs you can request from Amazon (the default limit is 5).

If you're using the AWS console to create your transfer server, the endpoint details are pretty self-explanatory - there are dropdown fields to select a VPC, subnet and Elastic IP.

If you're using Terraform's aws_transfer_server to create the transfer server, you have to provide the VPC details as endpoint_details. This gives you the option of specifying the VPC ID, subnet, and a list of address_allocation_ids, which point to Elastic IPs. Allocation IDs are essentially just IDs used to identify an Elastic IP - you can see them clearly in the Elastic IP section of the console.

12.7. Why does the AWS console prompt me for a custom hostname?

If you're using Route53 to manage DNS for your domain, you can configure an "alias" record, which is a Route53-specific feature to automatically route traffic to an AWS resource.

In the AWS console, if you select to use "other DNS" instead of Route53, the transfer server form still contains a field for filling in the hostname - eg. I can enter ftp.mattduck.com. This confused me initially as I wasn't sure the purpose of entering a hostname if your DNS is configured externally. It turned out that this will create a new public zone in Route53 for the domain that you enter (mattduck.com), with a CNAME record (ftp) pointing to the auto-generated AWS hostname of the server.

Although this is a public zone, Route53 isn't the authoritative nameserver for mattduck.com, so it has no effect if I try to access ftp.mattduck.com on my laptop. I believe EC2 images are configured by default to use AWS nameservers, and that the CNAME entry would be resolved on an EC2 box within the VPC (subject to some VPC settings like enableDnsSupport).

12.8. How do I configure an external DNS entry to resolve to the transfer server?

You just create a CNAME record where the value is the public hostname of the transfer server. I'm not aware of any approaches other than this.

12.9. What types of server-side encryption does S3 support? How are these supported in AWS Transfer?

This is very surface-level, but there are three kinds of server-side encryption in S3:

SSE-S3: this is where AWS encrypts each object with a unique key, which it stores to automatically decrypt the object on read. Each key is itself encrypted with a master key, for which AWS manages rotation. If your bucket/objects use SSE-S3 encryption, then the only consideration is making sure the policy associated with the FTP user has the appropriate S3 read/write permissions - there's nothing extra to worry about.
SSE-KMS: KMS stands for Key Management Service. Instead of Amazon's master keys, encryption uses Customer Master Keys, which are managed in your account, and which have some additional features like being limited via policies and having audit trails for access. For buckets/objects that are encrypted with KMS, the policy associated with the FTP user must have some additional permissions for the relevant key - kms:Decrypt, kms:ReEncrypt, kms:GenerateDataKey and kms:DescribeKey.
SSE-C: in this option, the user must provide an encryption key when uploading an object to S3, and then that same key must be provided in subsequent requests to retrieve the object. This is not supported in AWS Transfer - which makes sense because FTP doesn't provide a way to specify this key.

12.10. How do SFTP users map to policies in AWS?

This is pretty simple, particularly using Terraform - you create an aws_transfer_user, assign them a role with the appropriate policy, and then create an aws_transfer_ssh_key. The user can then authenticate with the SFTP server and perform the actions granted by the policy.

12.11. How do you limit FTP users to particular directories?

This is probably the most involved / fiddly part to get right (at least if you want to support multiple isolated users with their own directories). There are two types of home directory - PATH and LOGICAL. With the path-type, you're dependent on IAM policies to control bucket access. With the logical-type, you can create a map of the user's FTP directories to particular S3 targets.

The IAM policies support some user-relative variables like ${Transfer:UserName} and ${Transfer:HomeDirectory}, to make it easier to reuse policies. I haven't dived into this but I think there are some limitations on where they can be used - eg. only on roles but not users (or vice-versa).
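To make the logical type more concrete, here is a minimal sketch of building one mapping per user, so that each user sees "/" but actually lands in their own S3 prefix. The Entry/Target keys follow the shape of the Transfer API's home directory mappings as I understand it. The bucket name, the home/ prefix, the usernames and the function name are all hypothetical.

```python
def logical_mappings(bucket, username):
    """Build a logical home directory mapping that pins a user to one S3 prefix."""
    # "Entry" is the path the FTP user sees; "Target" is the S3 location behind it.
    return [{"Entry": "/", "Target": f"/{bucket}/home/{username}"}]


# Hypothetical bucket and users, for illustration only.
for user in ("alice", "bob"):
    print(user, logical_mappings("example-transfer-bucket", user))
```

The resulting list is what you would pass as the user's home directory mappings, alongside a home directory type of LOGICAL.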

12.12. Does the SFTP server support 4096-bit RSA keys?

Yes. I was curious about this. The docs state that the public key has a maximum length of 2048. I wondered if this meant the key size when using ssh-keygen would be limited to 2048. In fact, the concept of ssh key sizes (when specifying the number of bits via eg. ssh-keygen -b 3072) refers to the length of the modulus value used to compute the RSA key pair - it's not directly related to the size of the files. When decoded, the files themselves contain the modulus, respective public/private values and some other data.

I think the AWS public key length just refers to the number of characters in the public key. When I use default ssh-keygen options with -b 4096 this comes out at 735.
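The relationship between modulus size and character count can be worked out by hand. Here is a rough sketch of that arithmetic, assuming the standard ssh-rsa public key layout (a length-prefixed key type, exponent and modulus, then base64-encoded) and assuming the public exponent is 65537. The function names are hypothetical.

```python
import math


def mpint_len(value):
    """Bytes needed to store a positive integer as an SSH "mpint" (without its length prefix)."""
    # One extra bit is reserved for the sign, so a 4096-bit modulus takes 513 bytes.
    return value.bit_length() // 8 + 1


def ssh_rsa_pubkey_chars(modulus_bits, exponent=65537):
    """Character count of the base64 part of an ssh-rsa public key."""
    modulus = 1 << (modulus_bits - 1)  # any number with exactly `modulus_bits` bits
    blob = (
        4 + len("ssh-rsa")          # length-prefixed key type
        + 4 + mpint_len(exponent)   # length-prefixed public exponent
        + 4 + mpint_len(modulus)    # length-prefixed modulus
    )
    return 4 * math.ceil(blob / 3)  # base64 turns every 3 bytes into 4 characters


for bits in (2048, 3072, 4096):
    print(bits, ssh_rsa_pubkey_chars(bits))
# 2048 372
# 3072 544
# 4096 716
```

So a 4096-bit key has 716 base64 characters. The rest of the 735 would plausibly be the "ssh-rsa " prefix plus the trailing comment, which is still well under 2048.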

Building a Firefox extension to highlight text

hi@mattduck.com (Matt Duck) — Sat, 27 Jun 2020 16:08:00 +0100

When I'm reading technical books on paper, I often highlight takeaways as I go, and then come back later to skim the highlighted passages and write notes.

For a while now I've wanted to be able to do this when reading long web pages, but browsers don't provide this as a feature, and I couldn't find a suitable Firefox extension, so I wrote one:

(If you're dying to find out how the text in the screenshot ends, you can find out here.)

If you press Ctrl + Shift + L when some text is selected, it will be highlighted as in the screenshot. If you press Ctrl + Shift + :, then all highlighted snippets on the page will be copied to the clipboard. Both these functions are also available through the right-click context menu.

That's it. Highlighting is not persisted between page visits.

13.1. Existing solutions

There are some existing Firefox extensions that provide highlighting features (eg. Textmarker), but they tend to either:

have a lot of features that I don't want to manage,
not have keyboard shortcuts,
not offer clipboard export, and/or
have very few users, in which case I prefer not to use them, because highlighting features like this require permissions to view all website data.

13.2. Learning how to write Firefox extensions

I hadn't done any webextensions work previously, but I suspected that it wouldn't be much effort to build these features. I needed to find out:

How to lay out an extension and get it to run locally

This is covered in Mozilla's your first extension tutorial. Ultimately I just needed three files:

manifest.json, which contains some metadata about the extension,
content.js, which runs in the context of each web page, and
background.js, which runs independently of any particular pages or windows.

These files have to be bundled into a zip folder, which can be installed directly using Firefox Developer Edition.

What permissions configuration would be required

A host permission matching all URLs was required to run on all web pages, contextMenus was required to add items to the right-click menu, and clipboardWrite was required to write to the clipboard.

How to add keyboard shortcuts

This was accomplished through the usual JS event listeners for keydown.

How to write to the clipboard

There is a navigator.clipboard API, which worked fine.

How to highlight text

This part I already knew - it's trivial to add some CSS styling.

How to add right-click context menus

This required using the background script, and sending messages between background.js and content.js.

How to get the extension signed so I could run it on Firefox

I wanted to do this without having to go through the audit process for AMO and publishing it publicly. Once I had read the docs it was simple: you sign up at https://addons.mozilla.org, upload an extension, choose the option to not publish a public version, and then wait for the automated signing process to run. After a few minutes you receive an email, and a new "approved" version will appear on the addon page in your account. You can click to install the new version in Firefox.

13.3. The result: a super dumb web highlighter

You can find the code on Github: https://github.com/mattduck/firefox-sdwh. It's not very good, but it (mostly) works, and I'm finding it helpful.

I haven't made it downloadable publicly. I may do that later when I've used it for a while, am confident there aren't any issues, and have put the time in to let users customise it (eg. right now the keyboard shortcuts are hard-coded).

13.4. Extending your tools is fun and useful

I'm pretty fluent at extending typical developer tools (the shell, Emacs etc.), but hadn't applied this to the browser before. Now that I've done it once, it will be trivial to add small features in the future.

I think this is one of the major benefits of knowing how to code - having the ability to build and customise your software tools in a way that makes sense for you personally. If you have any ideas for features that you wish were included in Firefox, I recommend finding some boilerplate from the Mozilla docs and giving it a go.

Getting started with lsp-mode for Python

hi@mattduck.com (Matt Duck) — Tue, 28 Apr 2020 08:20:00 +0100

Language Server Protocol is a JSON-RPC protocol that allows text editors and IDEs to delegate language-aware features (such as autocomplete, or jump-to-definition) to a common server process. It means that editors can support smart language features just by implementing a generic LSP client.
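To give a feel for what goes over the wire, here is a minimal sketch of how a client might frame a jump-to-definition request. It assumes the base protocol's Content-Length header followed by a JSON body. The file URI and position are hypothetical, and frame_message is just an illustrative name, not part of any LSP library.

```python
import json


def frame_message(payload):
    """Encode a JSON-RPC payload with the Content-Length header that LSP uses."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.py"},  # hypothetical file
        "position": {"line": 10, "character": 4},           # zero-based position
    },
}

print(frame_message(request).decode("utf-8"))
```

The editor's LSP client builds messages like this and the language server replies in the same format, which is why one generic client can talk to many servers.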

I'm migrating my Emacs config to use LSP, starting with Python support. It's more powerful than my old setup, and I expect it will only improve over time, as language servers will benefit from more community attention than the long tail of Emacs packages.

Although there was not much configuration required to get LSP working in Emacs, it wasn't always obvious what I needed to do, and I couldn't find many examples of a full Python configuration to get started.

Here are a few questions and issues I encountered.

14.1. What do I need to install?

The most popular LSP client for Emacs is lsp-mode (although eglot is also in active development). lsp-mode tries to integrate with sensible existing tools to minimise user configuration - it supports popular language servers, and it hooks into Emacs packages like Flycheck and Company.

You will at least want to install lsp-mode, and probably also lsp-ui (which is focused on UI-altering features like popups and "sideline" information).

For Python support, there are two main language servers - pyls and Microsoft Python Language Server. I have used pyls. You have two options for installing pyls and its dependencies:

pip install python-language-server[all]. This will install pyls, and also install its various dependencies that provide particular features: rope for renaming, pyflakes for detecting errors, mccabe for complexity, etc.
pip install python-language-server, and install the dependencies you want directly.

14.2. Fixing a Jedi compatibility issue with pyls

At time of writing, python-language-server is not compatible with jedi version 0.16. Make sure you adhere to jedi<0.16,>=0.14.1.

14.3. How do I enable `lsp-mode` for Python?

For the base lsp-mode, the only required config is to call (lsp) when in python-mode:

(use-package lsp-mode
  :hook
  ((python-mode . lsp)))

(use-package lsp-ui
  :commands lsp-ui-mode)

14.4. pyls plugins: mypy, isort and black

Some integrations are not available by default in pyls, but are supported by plugins. You can install these with pip install pyls-black pyls-isort pyls-mypy.

To then enable them in lsp-mode, you can use (lsp-register-custom-settings):

(use-package lsp-mode
  :config
  (lsp-register-custom-settings
   '(("pyls.plugins.pyls_mypy.enabled" t t)
     ("pyls.plugins.pyls_mypy.live_mode" nil t)
     ("pyls.plugins.pyls_black.enabled" t t)
     ("pyls.plugins.pyls_isort.enabled" t t)))
  :hook
  ((python-mode . lsp)))

14.5. Fixing a pyls-mypy issue

At time of writing, the mypy plugin has an issue due to a missing future import. It can be resolved by pip install future. See pyls-mypy #37.

14.6. What about flake8?

flake8 is not mentioned in the pyls README, but it is supported. There are two options to enable it:

You can use (lsp-register-custom-settings) as before:

(lsp-register-custom-settings
 '(("pyls.plugins.flake8.enabled" t t)))

Alternatively, lsp-mode automatically turns some configuration parameters into custom variables, including the flake8 parameters. So:

(setq lsp-pyls-plugins-flake8-enabled t)

14.7. What other options does `pyls` support?

I'm not sure if there is a standard way to retrieve all supported configuration options from a language server. There is a pretty long list of pyls options in the vscode-client.

14.8. How do I inspect what lsp-mode is doing?

There are a few places you can look for info:

The *pyls::stderr* buffer. If something isn't working as expected, this may help identify the problem - eg. it will show issues loading particular plugins.
(setq lsp-log-io t) - this will log messages between the lsp client and server to a buffer. You can view the buffer by calling (lsp-workspace-show-log).
The lsp-client-settings variable. This contains all the lsp-mode settings for different language servers, and seems to be what you eventually modify when you run (lsp-register-custom-settings).
(lsp-describe-session) shows the capabilities of the current session. See the troubleshooting section of the lsp-mode README.

14.9. How do I support multiple projects?

Python projects often use virtual environments to manage project dependencies. My understanding is that pyls has to be installed in the project's virtualenv, in order to access dependencies for project-aware features like checking symbol references.

Neither lsp-mode nor pyls seems to provide features to manage multiple projects - the appropriate virtualenv needs to be managed separately.

In the past I've used pyvenv for this with some success. For various reasons pyvenv is only able to set a single global virtualenv at a time, but it does have a "tracking" mode, which can automatically change the global virtualenv using dir-locals.

I use a default "emacs" virtualenv as a fallback for editing things like org-mode Python buffers.

(use-package pyvenv
  :demand t
  :config
  (setq pyvenv-workon "emacs")  ; Default venv
  (pyvenv-tracking-mode 1))  ; Automatically use pyvenv-workon via dir-locals

You can use (add-dir-local-variable) to set pyvenv-workon for a particular project.

14.10. The final code

My initial config for lsp-mode also included a few other settings. It looked roughly like this:

(use-package lsp-mode
  :config
  (setq lsp-idle-delay 0.5
        lsp-enable-symbol-highlighting t
        lsp-enable-snippet nil  ;; Not supported by company capf, which is the recommended company backend
        lsp-pyls-plugins-flake8-enabled t)
  (lsp-register-custom-settings
   '(("pyls.plugins.pyls_mypy.enabled" t t)
     ("pyls.plugins.pyls_mypy.live_mode" nil t)
     ("pyls.plugins.pyls_black.enabled" t t)
     ("pyls.plugins.pyls_isort.enabled" t t)

     ;; Disable these as they're duplicated by flake8
     ("pyls.plugins.pycodestyle.enabled" nil t)
     ("pyls.plugins.mccabe.enabled" nil t)
     ("pyls.plugins.pyflakes.enabled" nil t)))
  :hook
  ((python-mode . lsp)
   (lsp-mode . lsp-enable-which-key-integration))
  :bind (:map evil-normal-state-map
              ("gh" . lsp-describe-thing-at-point)
              :map md/leader-map
              ("Ff" . lsp-format-buffer)
              ("FR" . lsp-rename)))

(use-package lsp-ui
  :config (setq lsp-ui-sideline-show-hover t
                lsp-ui-sideline-delay 0.5
                lsp-ui-doc-delay 5
                lsp-ui-sideline-ignore-duplicates t
                lsp-ui-doc-position 'bottom
                lsp-ui-doc-alignment 'frame
                lsp-ui-doc-header nil
                lsp-ui-doc-include-signature t
                lsp-ui-doc-use-childframe t)
  :commands lsp-ui-mode
  :bind (:map evil-normal-state-map
              ("gd" . lsp-ui-peek-find-definitions)
              ("gr" . lsp-ui-peek-find-references)
              :map md/leader-map
              ("Ni" . lsp-ui-imenu)))

(use-package pyvenv
  :demand t
  :config
  (setq pyvenv-workon "emacs")  ; Default venv
  (pyvenv-tracking-mode 1))  ; Automatically use pyvenv-workon via dir-locals

This is not perfect and will definitely require future work, but it's a useful start. In theory, adding support for a new language should only require installing the language server and adding a couple of lines of elisp to enable the new language.

I'll be updating my config on github.

Some Emacs Lisp exercises

hi@mattduck.com (Matt Duck) — Thu, 05 Mar 2020 15:15:00 +0000

15.1. Background: emacs.london

Since mid-last year I've been helping to organise a monthly Emacs event. It's a pretty low-key affair: we hang out in a room, sometimes do hands-on coding, sometimes share things on a screen, and then sometimes go to the pub afterwards.

Our usual format doesn't include speakers. This is partly because the best part of the meetup for us is meeting other attendees - presentations can be hit-or-miss. But it's also because it's a lot of work to arrange speakers for a monthly event, and a low-maintenance format means we can reliably keep it running.

Sometimes though, it can be too unstructured, and it would be nice to be given some direction on what to work on. We thought it would be fun to write some Emacs Lisp exercises to help with these situations, in the spirit of 4clojure.

15.2. The exercises

You can find the most recent exercises at https://emacs.london/dojo.html. They're written as a single org-mode file, which you can get directly from github.

15.3. How it works

The idea is similar to the 4clojure exercises: you should fill in the function labelled __, so that when you execute it, the Pass: message shows t.

Here's the very first exercise. You just have to write a function that transforms the given string into uppercase:

(defun __ (s))

(let* ((result (__ "helloworld"))
       (pass (string= result "HELLOWORLD")))
  (format "Pass:%S\n%S" pass result))

The above code snippet can be executed in org-mode by pressing C-c C-c. When you do this, you'll see the results message added to the buffer:

#+RESULTS:
: Pass:nil
: nil

The Pass:nil indicates that the test failed. The second line shows us the value that our function returned - in this case also nil.

Let's plug in a solution that uses the standard upcase function:

(defun __ (s)
  (upcase s))

(let* ((result (__ "helloworld"))
       (pass (string= result "HELLOWORLD")))
  (format "Pass:%S\n%S" pass result))

Now on execution we see:

#+RESULTS:
: Pass:t
: "HELLOWORLD"

This time we're good.

15.4. You can contribute on github

We're still in the early stages of building these materials - we may or may not find them useful over time. More exercises would be welcome though, and I'm sure there are some easy improvements or mistakes that can be corrected.

If you're interested in any of this then you'll be very welcome at the meetup.

More Emacs fuzzy integrations

hi@mattduck.com (Matt Duck) — Tue, 03 Mar 2020 16:14:00 +0000

A week ago I wrote about using a Helm pop-up frame as a fuzzy launcher, to provide an Alfred-like interface for launching programs, running system commands and making web searches.

Since then, I've added some new integrations that I'm finding really useful, and I'm wondering why I didn't write them a long time ago.

16.1. How it looks

Everything below is using the Emacs theme named intellij.

Opening org-mode headlines

This opens a headline matching a particular keyword in a popup frame, using an indirect buffer:

Running org-capture from anywhere

This displays a frame which is automatically closed when the capture is finished:

Stopping the org-mode clock from anywhere

I always found it awkward that I had to open an org-mode buffer in Emacs just to clock out:

Serving a particular directory

There isn't much to see here, but in the background it's starting a webserver in a pre-configured directory:

Listing and killing processes

This could easily be extended to add other actions, eg. yanking the PID:

Opening directories with a file browser

This uses fd to find directories:

Opening files with their default program

This uses fd to find files, and xdg-open to open them:

16.2. Emacs features now feel natural from anywhere

Certain Emacs features are core to how I work. I don't mean coding features, but managing my work: capturing notes quickly in meetings, checking my outstanding notes for the day, recording time spent on particular tasks, etc.

I want these features to be available at all times, and I want invoking them to feel second nature. Not because it saves any substantial amount of time, but because it minimises interruption, and keeps me focused on the content of my work.

Although I've always been very quick at invoking commands in Emacs, it hasn't always felt like a natural interaction. I have to bring up Emacs in my window manager, potentially mess up my existing buffer arrangement, and display unrelated work side-by-side (eg. my org-capture window will appear alongside the code that I was working on).

This popup frame approach fixes that: I'm now one global keybinding and a couple of letters away from whatever I need. It feels much better, and I expect it to become a permanent part of my workflow.

16.3. The code

You can find the code for everything above in my dotfiles. It's all very similar to my previous post, which covers the base "Alfred" code in more depth, including the i3 settings to create a global key binding and configure floating frames.

Emacs as a fuzzy launcher and Alfred-replacement

hi@mattduck.com (Matt Duck) — Tue, 25 Feb 2020 00:00:00 +0000

When I was using MacOS, I used to like Alfred, the multi-purpose fuzzy finder/program-launcher. If you haven't used it, it's similar nowadays to Spotlight, which is built-in and accessed by pressing cmd+space.

On Linux, the closest thing I've used to Alfred is Albert, which has a lot of the same functionality. It's a good project, and I was happy with it for a while.

Sometime last year though I read a great post by Álvaro Ramírez, demonstrating a proof of concept for building a similar interface in Emacs, using Ivy and Hammerspoon (on MacOS).

For six months or so I've been using my own version of this to launch programs, run system commands and perform searches:

17.1. How?

Most of the credit goes to Álvaro. I just adapted his frame-managing code to work with Helm and i3, and wrote some Helm sources to implement the features that I want.

17.2. Tell me more

Helm is a popular fuzzy completion framework for Emacs. I use it for many things already (selecting buffers, M-x commands, help commands, etc.), so it's the natural choice for me to implement any kind of fuzzy matching feature.

The entry point is the md/alfred function below, which does a few things:

Create a buffer named *alfred*.
Make a new "frame" (ie. a new X window) for this buffer, applying some parameters to resize and bring it into focus.
Set variables to disable the mode-line and the message area, and apply some Helm styling parameters.
Call Helm with a list of custom "sources" (which we'll get to next), telling it to use our new buffer.
After Helm is done, delete the new frame and kill our *alfred* buffer.

(defun md/alfred ()
  (interactive)
  (with-current-buffer (get-buffer-create "*alfred*")
    (let ((frame (make-frame '((name . "alfred")
                               (window-system . x)
                               (auto-raise . t) ; focus on this frame
                               (height . 10)
                               (internal-border-width . 20)
                               (left . 0.33)
                               (left-fringe . 0)
                               (line-spacing . 3)
                               (menu-bar-lines . 0)
                               (right-fringe . 0)
                               (tool-bar-lines . 0)
                               (top . 48)
                               (undecorated . nil) ; enable to remove frame border
                               (unsplittable . t)
                               (vertical-scroll-bars . nil)
                               (width . 110))))
          (alert-hide-all-notifications t)
          (inhibit-message t)
          (mode-line-format nil)
          (helm-mode-line-string nil)
          (helm-full-frame t)
          (helm-display-header-line nil)
          (helm-use-undecorated-frame-option nil))
      (helm :sources (list (md/alfred-source-system)
                           (md/alfred-source-apps)
                           (md/alfred-source-search))
            :prompt ""
            :buffer "*alfred*")
      (delete-frame frame)
      ;; If we don't kill the buffer it messes up future state.
      (kill-buffer "*alfred*")
      ;; I don't want this to cause the main frame to flash
      (x-urgency-hint (selected-frame) nil))))

17.3. Helm sources

The code above provides a pop-up window that runs Helm, but that's it. To implement useful functionality, we need to write some Helm "sources", which control the input and output integrations with Helm.

17.4. System commands: lock, sleep, restart, shutdown

Sleep, restart and shutdown are all features of systemctl. Lock features are also accessible via the command line. This means we just have to build a Helm source that runs an external command when we enter a particular word.

There are two parameters to helm-build-sync-source that make this easy: :candidates and :action.

(defun md/alfred-source-system ()
  (helm-build-sync-source "System"
    :multimatch nil
    :requires-pattern nil
    :candidates '(("Lock" . "xset dpms force off")  ;; turns laptop screen off and triggers i3lock
                  ("Sleep" . "systemctl suspend -i")
                  ("Restart" . "systemctl reboot -i")
                  ("Shutdown" . "systemctl poweroff -i"))
    :action '(("Execute" . (lambda (candidate)
                             (shell-command (concat candidate " >/dev/null 2>&1 & disown") nil nil))))))

:candidates is an alist: each item is a cons cell whose car (the left side) is the value displayed in Helm, and whose cdr (the right side) is the value that gets passed to our action.

:action defines a lambda which operates on these right-side values when selected. We just have to use (shell-command) to execute the selected command. We redirect all output to /dev/null so it doesn't display anywhere, and also run disown so that the process is no longer owned by the shell - this will let you close Emacs without affecting any program that you've executed with Helm.

17.5. Launching programs

This has a similar solution to our system commands implementation, with one extra step: where do we find the list of GUI programs to launch? You could define this manually, but it would be nice if we could automatically retrieve them, Alfred-style.

On Arch Linux, you can find a list of .desktop files installed in /usr/share/applications. These Desktop entries implement the XDG Desktop Menu specification, which tells environments like GNOME and KDE how to launch GUI programs, what name to display in a launcher menu, what icons to use, etc.

In theory, we could parse these files to get the user-friendly name for the program (maybe by using lsdesktop). Instead, I've done something worse but much quicker to implement: we just list all the .desktop files in the directory, and then pass them to gtk-launch to execute them.

As above, just make sure to disown the process, so that it isn't coupled to Emacs:

(defun md/alfred-source-apps ()
  (helm-build-sync-source "Apps"
    :multimatch nil
    :requires-pattern nil
    :candidates (lambda ()
                  (-map
                   (lambda (item)
                     (s-chop-suffix ".desktop" item))
                   (-filter (lambda (d) (not (or (string= d ".") (string= d ".."))))
                            (directory-files "/usr/share/applications"))))
    :action '(("Launch" . (lambda (candidate)
                            (shell-command (concat "gtk-launch " candidate " >/dev/null 2>&1 & disown") nil nil))))))

17.6. Web search

Web search is a bit different, as we're not directly launching programs - we instead need to build a URL with the typed search term.

I also want one more feature from Alfred: key prefixes to trigger particular searches. Typing d my search term should open a search in DuckDuckGo, and typing g my search term should search Google.

So let's again define a :candidates list, with the displayed value as the car and the actual value as the cdr. This time though, our "value" is itself going to be a cons cell, containing the letter prefix and the URL structure for that search:

(defvar md/alfred-source-search-candidates
  '(("DuckDuckGo" . ("d" . "https://www.duckduckgo.com/?q=%s"))
    ("Google" . ("g" . "https://www.google.co.uk/search?q=%s"))))

In our previous Helm sources, we would type something, and Helm would match it against the display value of our candidate: eg. if I typed "Loc", it would put me on the entry named "Lock". This time, we're going to use :match to define a custom matching function, which will look up the assigned letter for each candidate.

...
  ;; Count it as a match if the prefix matches, eg. "d ..."
  :match '((lambda (candidate)
             (string= (car (cdr (assoc candidate md/alfred-source-search-candidates)))
                      (car (split-string helm-pattern)))))
...

This should work, but we can make it a bit nicer to use: instead of just displaying "DuckDuckGo" as the selected item in Helm, we could display "DuckDuckGo: my current search term". This can be done with :filtered-candidate-transformer, which transforms the displayed value for our currently-narrowed list of candidates:

...
  ;; Instead of displaying the exact thing that you type, display "DuckDuckGo: %s..."
  :filtered-candidate-transformer '((lambda (candidates source)
                                      (map 'list (lambda (c)
                                                   (cons (format "%s: %s" (car c)
                                                                 (md/strip-first-word helm-pattern)) (cdr c)))
                                           candidates)))
...

Finally, we have the :action stage. This is simple: it will just build and encode the URL, and use (browse-url) to open it in our preferred browser.

Overall, it looks like this:

(defvar md/alfred-source-search-candidates
  '(("DuckDuckGo" . ("d" . "https://www.duckduckgo.com/?q=%s"))
    ("Google" . ("g" . "https://www.google.co.uk/search?q=%s"))))

(defun md/strip-first-word (s)
  "Remove the first word from a string"
  (string-remove-prefix (format "%s " (car (split-string s))) s))

(defun md/alfred-source-search ()
  (helm-build-sync-source "Search"
    :nohighlight t
    :nomark t
    :multimatch nil
    :requires-pattern t
    :candidates md/alfred-source-search-candidates
    ;; Count it as a match if the prefix matches, eg. "d ..."
    :match '((lambda (candidate)
               (string= (car (cdr (assoc candidate md/alfred-source-search-candidates)))
                        (car (split-string helm-pattern)))))
    :fuzzy-match nil
    ;; Instead of displaying the exact thing that you type, display "DuckDuckGo: %s..."
    :filtered-candidate-transformer '((lambda (candidates source)
                                        (map 'list (lambda (c)
                                                     (cons (format "%s: %s" (car c)
                                                                   (md/strip-first-word helm-pattern)) (cdr c)))
                                             candidates)))
    ;; Build the URL, replacing %s with your input. Open it with browse-url.
    :action '(("Search" . (lambda (candidate)
                            (browse-url (format (cdr candidate) ;; the url
                                                (url-hexify-string
                                                 ;; This removes the "g " part from the string
                                                 (md/strip-first-word helm-pattern)))))))))

17.7. Launching `(md/alfred)` with i3

We now have all the above functionality inside Emacs. It can be launched with (md/alfred). However, to properly take advantage of these features, we need a global keybinding.

My window manager is i3, which allows you to configure keybindings by editing ~/.config/i3/config. We can add a binding like this:

bindsym $mod+space fullscreen disable; exec "emacsclient -ne '(call-interactively (quote md/alfred))'"

When I press $mod+space, it will now disable fullscreen, execute Emacs and call (md/alfred). I use emacsclient because it's significantly faster than starting a new Emacs instance.

Floating window

To ensure that the window doesn't get tiled by i3, I set frames marked as "alfred" to be floating:

for_window [class="^Emacs$" instance="^alfred$"] floating enable

You can also use this selection to enable other custom parameters for the frame. For example, I can set a border width:

for_window [class="^Emacs$" instance="^alfred$"] border pixel 1

17.8. The end

That's it. I've been happily using the described setup for six months now (with a couple of extra features).

17.9. Why?

Aside from the mega fun we just had, I think there are some genuine upsides:

It can work cross-platform. Having the same launcher and fuzzy features across OSes seems really useful. It will require some platform-specific code (eg. to launch programs appropriately), but that's not a huge amount of work.
Compared to Spotlight, Alfred, and even Albert, it's really easy to edit. I can even do it on the fly - just eval something in Emacs and I'll see the result immediately. You have close control over appearance if that's important to you: you can set Helm faces, frame parameters, etc. You also have fully customisable keybindings - it's Emacs, so you can do whatever you want there.
When I used Alfred, I found myself installing it on different machines, and having to manually set up my preferred searches etc. each time. Now it's just part of my dotfiles and works automatically.
It means one less program to care about.

17.10. Next steps

There are a few features that I'd like to expand:

A proper .desktop-aware program launcher.
A calculator that can eval basic math on the fly (without me having to write it as lisp).
A dictionary lookup feature. A workaround is to add a search feature for a dictionary website, but something more native would be nice.
A clipboard manager: this has the most unknowns for me. Alfred had features where it could retrieve clipboard history, but I'm not sure if this is achievable via Emacs. If anybody has any pointers then I'd be very happy to hear them.

You can look through my init.el to see more details of my implementation.

Undo implementations in classic text editors

hi@mattduck.com (Matt Duck) — Tue, 28 Jan 2020 00:00:00 +0000

I wanted to add undo/redo support to my version of the kilo text editor. I wrote some separate notes on that here. As part of that work I had a look at how undo features are implemented in Nano, Emacs and Neovim.

18.1. Common patterns for undo

I think undo/redo can be considered a solved problem. It has been implemented many times in different types of software, and there are a few standard solutions that people go to.

One solution is the command pattern. The idea is that for each action in your program, you can execute a corresponding undo action. For example, for an action like "insert these 5 characters at cursor position xy", the undo action might be "delete 5 characters at cursor position xy". You store these commands on respective undo and redo stacks. When the user wants to undo an action, you pop it off the undo stack, perform the undo operation and push the original action onto the redo stack.
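As a rough illustration of the command pattern, here is a minimal C sketch. Every name in it (Command, Stack, cmd_do_insert() and so on) is hypothetical and not taken from any of the editors discussed in this post. It only supports inserting short strings into one fixed-size buffer, and it assumes the common convention that a new edit clears the redo stack.

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 256
#define STACK_MAX 32

/* Hypothetical command: "insert `len` chars of `text` at `pos`". */
typedef struct {
  size_t pos;
  size_t len;
  char text[32];
} Command;

typedef struct {
  Command items[STACK_MAX];
  int count;
} Stack;

static char cmd_buf[BUF_MAX];  /* the "document", starts as an empty string */
static Stack undo_stack, redo_stack;

static void push(Stack *s, Command c) {
  assert(s->count < STACK_MAX);
  s->items[s->count++] = c;
}

static Command pop(Stack *s) {
  assert(s->count > 0);
  return s->items[--s->count];
}

/* Forward action: insert c->text at c->pos. */
static void apply_insert(const Command *c) {
  size_t n = strlen(cmd_buf);
  assert(c->pos <= n && n + c->len < BUF_MAX);
  memmove(cmd_buf + c->pos + c->len, cmd_buf + c->pos, n - c->pos + 1);
  memcpy(cmd_buf + c->pos, c->text, c->len);
}

/* Reverse action: delete c->len chars at c->pos. */
static void apply_delete(const Command *c) {
  size_t n = strlen(cmd_buf);
  assert(c->pos + c->len <= n);
  memmove(cmd_buf + c->pos, cmd_buf + c->pos + c->len,
          n - c->pos - c->len + 1);
}

void cmd_do_insert(size_t pos, const char *text) {
  Command c = { .pos = pos, .len = strlen(text) };
  assert(c.len < sizeof c.text);
  strcpy(c.text, text);
  apply_insert(&c);
  push(&undo_stack, c);
  redo_stack.count = 0;  /* assumption: a new edit discards redo history */
}

void cmd_undo(void) {
  if (undo_stack.count == 0) return;
  Command c = pop(&undo_stack);
  apply_delete(&c);
  push(&redo_stack, c);  /* keep the original action so it can be redone */
}

void cmd_redo(void) {
  if (redo_stack.count == 0) return;
  Command c = pop(&redo_stack);
  apply_insert(&c);
  push(&undo_stack, c);
}
```

Note that each history item only stores the inserted text and its position, never a copy of the whole buffer.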

The most written-about alternative (in my experience reading online) is often named the memento pattern. The key difference is that rather than storing an undo command that can mutate state, you store the past state itself (eg. the state of the rows). Your undo operation can then just restore the previous state.
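For comparison, a state-restoring approach might look something like the C sketch below. The State and History types and the snapshot_*() functions are made up for illustration; the only point is that undo copies a whole saved state back, with no knowledge of what the user action was.

```c
#include <assert.h>
#include <stddef.h>

#define HISTORY_MAX 16

/* Hypothetical editor state: one small text buffer and a cursor. */
typedef struct {
  char text[128];
  int cursor;
} State;

typedef struct {
  State snapshots[HISTORY_MAX];  /* full copies of past states */
  int count;
} History;

/* Save a copy of the current state before mutating it.
 * Returns 0 on success, -1 if the history is full. */
int snapshot_save(History *h, const State *current) {
  assert(h != NULL && current != NULL);
  if (h->count == HISTORY_MAX) return -1;
  h->snapshots[h->count++] = *current;  /* struct copy */
  return 0;
}

/* Undo: overwrite the current state with the most recent snapshot.
 * Returns 0 on success, -1 if there is nothing to undo. */
int snapshot_undo(History *h, State *current) {
  assert(h != NULL && current != NULL);
  if (h->count == 0) return -1;
  *current = h->snapshots[--h->count];
  return 0;
}
```

The trade-off is visible in the types: each history entry here is sizeof(State) bytes regardless of how small the edit was.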

Both these patterns are described in the GoF Design Patterns book - I suspect this has popularised the terminology that I see online. There are many ways you can implement undo features that roughly follow the described patterns, but don't strictly adhere to the class structure or functions in the book.

Broadly though we have two approaches - one that stores actions that can be reversed later, and one that stores state that can be restored.

18.2. How do the classic text editors do it?

These programs have been undoing for decades, so they must be doing something right.

18.3. Nano

Nano clearly follows the reverse-an-action approach. nano.h describes an enum of supported undo actions, named undo_type. Actions have labels like INSERT, JOIN, and COMMENT - these are specific, granular actions that have dedicated undo operations.

A history item is represented by the undostruct type, which stores the action and some associated data (this includes a *next pointer to go further back in time):

typedef struct undostruct {
	undo_type type;
		/* The operation type that this undo item is for. */
	int xflags;
		/* Some flag data to mark certain corner cases. */
	ssize_t lineno;
		/* The line number where the operation began or ended. */
	size_t begin;
		/* The x position where the operation began or ended. */
	char *strdata;
		/* String data to help restore the affected line. */
	size_t wassize;
		/* The file size before the action. */
	size_t newsize;
		/* The file size after the action. */
	groupstruct *grouping;
		/* Undo info specific to groups of lines. */
	linestruct *cutbuffer;
		/* A copy of the cutbuffer. */
	ssize_t mark_begin_lineno;
		/* Mostly the line number of the current line; sometimes something else. */
	size_t mark_begin_x;
		/* The x position corresponding to the above line number. */
	struct undostruct *next;
		/* A pointer to the undo item of the preceding action. */
} undostruct;

Throughout the codebase, there are calls to a function add_undo(), which accepts an undo_type, creates an undostruct with the appropriate metadata, and assigns it to a field named undotop, which represents the top of the undo history.
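To make the shape of that concrete, here is a heavily simplified sketch of pushing an item onto a linked undo history. This is not Nano's code: the toy_* names and the reduced set of fields are my own assumptions, and the real add_undo() records far more metadata.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for undo_type and undostruct. */
typedef enum { TOY_ADD, TOY_JOIN, TOY_INSERT } toy_undo_type;

typedef struct toy_undo_item {
  toy_undo_type type;          /* which action this item can reverse */
  long lineno;                 /* where the action happened */
  struct toy_undo_item *next;  /* the preceding (older) action */
} toy_undo_item;

typedef struct {
  toy_undo_item *undotop;      /* most recent action, or NULL */
} toy_buffer;

/* Create a history item and make it the new top of the history.
 * Returns NULL if allocation fails. */
toy_undo_item *toy_add_undo(toy_buffer *b, toy_undo_type type, long lineno) {
  assert(b != NULL);
  toy_undo_item *u = malloc(sizeof *u);
  if (u == NULL) return NULL;
  u->type = type;
  u->lineno = lineno;
  u->next = b->undotop;  /* link back to the older history */
  b->undotop = u;
  return u;
}
```

Walking u->next from undotop then visits the actions from newest to oldest.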

The do_undo() function then executes appropriate logic in a switch to reverse an action. Here's a snippet of the main switch statement in do_undo() (from text.c):

	case INDENT:
		handle_indent_action(u, TRUE, TRUE);
		undidmsg = _("indent");
		break;
	case UNINDENT:
		handle_indent_action(u, TRUE, FALSE);
		undidmsg = _("unindent");
		break;
#ifdef ENABLE_COMMENT
	case COMMENT:
		handle_comment_action(u, TRUE, TRUE);
		undidmsg = _("comment");
		break;
	case UNCOMMENT:
		handle_comment_action(u, TRUE, FALSE);
		undidmsg = _("uncomment");
		break;
#endif

It's pretty easy to read the undo code in Nano - most of it can be found in the linked files.

18.4. Emacs

Emacs also has an action-oriented implementation, although it's a much bigger codebase.

The Emacs source code consists of a smaller C core which implements some central components and the emacs-lisp interpreter. Most of the code is then written in emacs-lisp. Undo features span both of these layers. We start at undo.c.

Fun fact alert: using git blame, parts of undo.c go back to a version labelled "Initial revision" from April 1991 (29 years ago! According to Wikipedia the first GNU Emacs release was 1985, so I'm not sure of the significance of this exact commit).

In undo.c, we can see a structure named undo_list, which stores the undo history for a buffer. Like with Nano, multiple action types are supported. One difference to the Nano implementation is that undo_list can store a nil value as a "boundary". The boundary is used to group multiple actions into a single undo operation.
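The boundary idea can be sketched in a few lines of C. This is only loosely modelled on the description above, not on the Emacs source: the toy_* names are hypothetical, and the entries carry no data because the sketch only shows how a boundary groups several recorded actions into one undo step.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LIST_MAX 64

/* Hypothetical entry kinds; TOY_BOUNDARY plays the role of nil. */
typedef enum { TOY_BOUNDARY, TOY_INSERTION, TOY_DELETION } toy_entry_kind;

typedef struct {
  toy_entry_kind items[TOY_LIST_MAX];
  size_t count;  /* items[count - 1] is the most recent entry */
} toy_undo_list;

void toy_record(toy_undo_list *l, toy_entry_kind kind) {
  assert(l->count < TOY_LIST_MAX);
  l->items[l->count++] = kind;
}

/* Undo one group: skip any boundaries on top, then pop entries until
 * the next boundary. Returns the number of actions that were undone. */
size_t toy_undo_group(toy_undo_list *l) {
  size_t undone = 0;
  while (l->count > 0 && l->items[l->count - 1] == TOY_BOUNDARY)
    l->count--;
  while (l->count > 0 && l->items[l->count - 1] != TOY_BOUNDARY) {
    l->count--;  /* a real editor would reverse the action here */
    undone++;
  }
  return undone;
}
```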

In the elisp layer, undo_list is represented by the buffer-undo-list variable. If you're in Emacs and you call (describe-variable 'buffer-undo-list) you'll see an excellent overview of the types of data in this list, and you can also see its current state. Here's a snippet of buffer-undo-list for the Emacs buffer that I'm using to write this post:

(nil
 (4720 . 4721)  ;; insertion at point
 4719  ;; cursor movement ?
 nil  ;; boundary
 (4697 . 4702)  ;; insertion at point
 (#("w" 0 1  ;; ?? this one isn't obvious. Maybe a deletion or property change.
    (fontified t))
  . -4697)
 (# . -1)  ;; changes to a "marker" point in the file
 (# . -1)
 (# . -1)
 ...)

My history continues for about 2000 lines just in this buffer. There is a global undo limit (defined as 80,000 bytes for me in the C source). According to the undo-limit documentation, old undo history gets cleaned up in garbage collection.

undo.c describes some functions that add state to the undo_list. For example, record_point() ("point" means cursor position), record_insert(), record_delete(), etc. These record_$thing() functions are called in a few places in the C code to store actions to later undo (for example in insdel.c). I'm sure there are some similar functions in the elisp layer too, but I haven't looked in depth.

The actual execution of the undo actions also happens in the elisp layer, in (primitive-undo). The comments make it reasonably easy to understand. For example, in order to undo a text insert, it does (delete-region beg end). The snippet below is from the equivalent of Nano's switch statement:

...
;; Element (nil PROP VAL BEG . END) is property change.
(`(nil . ,(or `(,prop ,val ,beg . ,end) pcase--dontcare))
 (when (or (> (point-min) beg) (< (point-max) end))
   (error "Changes to be undone are outside visible portion of buffer"))
 (put-text-property beg end prop val))
...
;; Element (BEG . END) means range was inserted.
(`(,(and beg (pred integerp)) . ,(and end (pred integerp)))
 (when (or (> (point-min) beg) (< (point-max) end))
   (error "Changes to be undone are outside visible portion of buffer"))
 ;; Set point first thing, so that undoing this undo
 ;; does not send point back to where it is now.
 (goto-char beg)
 (delete-region beg end))
...
;; Element (apply FUN . ARGS) means call FUN to undo.
...

You can see that one of the action types is an arbitrary function that gets applied when executing the undo. I haven't looked at exactly where this is used, but I assume that one of the use-cases is so third-party extensions can integrate with the undo feature.

I'm sure there is a lot of undo-related code through the codebase that I haven't mentioned here, especially in the lisp parts. I think these cover the basics though.

18.5. Neovim

Neovim's implementation is the most daunting to jump into - its undo.c comes in at 3000 lines. It has similarities to the other implementations, but stores and restores some general state fields, with less explicit differentiation between types of user actions.

The undo functions and data structures are prefixed with u_. The first main struct (defined in undo_defs.h) is u_header, which holds pointers to manage the undo history, and also holds some buffer state like cursor position.

u_header also stores the time of the undo action. This is used to support the handy time-based command :earlier, which lets you revert to a previous state by time (eg. :earlier 2h will undo back to the state recorded 2 hours ago). As far as I'm aware neither Nano nor Emacs supports this.
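Storing a timestamp per history entry is what makes a time-based lookup possible. Here is a small sketch of the idea in C. It is not Neovim's algorithm: toy_header and toy_find_state_at() are hypothetical, and I'm assuming the headers sit in a plain array sorted from oldest to newest.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical history header: when the state was saved, plus an id. */
typedef struct {
  time_t saved_at;
  int seq;
} toy_header;

/* Return the index of the newest header saved at or before `cutoff`,
 * or -1 if every header is newer than `cutoff`.
 * Assumes `headers` is sorted oldest first. */
int toy_find_state_at(const toy_header *headers, int n, time_t cutoff) {
  assert(headers != NULL || n == 0);
  int found = -1;
  for (int i = 0; i < n; i++) {
    if (headers[i].saved_at > cutoff) break;
    found = i;
  }
  return found;
}
```

A command like :earlier 2h would then amount to calling this with a cutoff of "now minus two hours" and restoring the state at the returned index.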

The second important struct is u_entry_T, which stores an array of text lines in **ue_array (which is used to restore buffer text), and some other metadata. I've copied u_entry_T below:

typedef struct u_entry u_entry_T;
struct u_entry {
  u_entry_T   *ue_next;         /* pointer to next entry in list */
  linenr_T ue_top;              /* number of line above undo block */
  linenr_T ue_bot;              /* number of line below undo block */
  linenr_T ue_lcount;           /* linecount when u_save called */
  char_u      **ue_array;       /* array of lines in undo block */
  long ue_size;                 /* number of lines in ue_array */
#ifdef U_DEBUG
  int ue_magic;                 /* magic number to check allocation */
#endif
};

undo.c defines a few functions to record an undo, such as u_save(), u_save_cursor() and u_savedel(), although most of the logic is delegated to u_savecommon(), which generates new u_entry_T and u_header instances where appropriate. These functions are then called in various places in the codebase where the buffer is modified, such as ops.c, edit.c, normal.c and ex_cmds.c.

There are a couple of entry points to perform an undo operation. The handler for pressing u in normal mode is nv_undo(). This calls a chain of functions: nv_kundo() -> u_undo() (back in undo.c) -> u_doit() -> u_undoredo().

u_undoredo() is where the main undo logic is applied. Some of the more significant lines are the calls to ml_delete(), ml_replace() and ml_append(), which delete or update text lines (all defined in memline.c). The text state is retrieved from u_entry_T->ue_array. For example:

/* insert the lines in u_array between top and bot */
if (newsize) {
  for (lnum = top, i = 0; i < newsize; ++i, ++lnum) {
    /*
     * If the file is empty, there is an empty line 1 that we
     * should get rid of, by replacing it with the new line
     */
    if (empty_buffer && lnum == 0) {
      ml_replace((linenr_T)1, uep->ue_array[i], true);
    } else {
      ml_append(lnum, uep->ue_array[i], (colnr_T)0, false);
    }
    xfree(uep->ue_array[i]);
  }
  xfree((char_u *)uep->ue_array);
 }

Reading u_undoredo(), it's clear that Vim's implementation is more generalised than in Nano and Emacs, which both have clear switch-like statements that say "if you find item x on the undo history, apply operation y". In Nano different actions are represented by an enum, and in Emacs different actions have different lisp structures.

In Vim, regardless of the type of user action performed, general buffer state is stored and restored using u_header_T, u_entry_T and visualinfo_T. The data they contain looks quite general - I don't see anything action-specific here, like the JOIN type in Nano, or the custom function that you can store in Emacs. Emacs' items are comparatively simple as they only record state that is needed - eg. an insertion can just be represented in the undo history by (BEG . END).

It looks like there is a lot more to uncover in this implementation, but hopefully that describes some of the basic points. Let's look at something different.

18.6. A reusable solution that copies state: Redux

I'm not super fluent in modern JS frameworks, but I knew that Redux was based around reducers that operate on immutable state, and that there were generalised undo solutions that could implement undo/redo by keeping a copy of that state.

The Redux documentation actually describes an implementation for using Redux to implement undo by restoring past state. Coding this from scratch isn't required though as there is a popular implementation: Redux Undo.

Redux Undo is a generalised implementation of the restore-past-state approach: you only need to decorate specific reducers as undoable, and it will store all the state changes for you.

The advantage here is that you can add undo/redo behaviour to components without having to do any specific work to record or reverse particular operations: the actions are the same for any use-case as you're just copying and restoring your state.

At its most basic, you can start recording your state history in a few lines of code (there are more features and configuration options documented in the README):

// context: 'counter' is a reducer that increments or decrements a number.

import { combineReducers } from 'redux'; // Redux util functions.
import undoable from 'redux-undo'; // the undoable higher-order reducer.
combineReducers({
  counter: undoable(counter, {
    limit: 10 // set a limit for the size of the history.
  })
})

// See https://github.com/omnidan/redux-undo for more details.

18.7. What's the catch?

This sounds exciting - it looks like much less work to implement than even Nano's undo feature. The ease of implementation doesn't come for free though. The main cost I see is that state copies will need to be stored in memory. Whereas with a command-based approach you could have said "delete the characters at position xy to reverse the user's insert action", you're now storing the old state in order to restore it. This means you need to be conscious that you're not copying extra state when it hasn't changed, and that you have sensible limits on your undo history.

I'm not sure how often this is an issue with Redux Undo in practice. I did read this redux-undo issue discussion about using diffs between states to reduce the memory overhead. It led me to a couple of small projects that suggest it has been a concern at least for some people:

Undox is like Redux Undo but uses actions instead of storing state, to reduce the memory cost.
Redux Undo Stack is like Redux Undo but stores incremental changes instead of entire states.

I doubt that a third-party solution like this would work for a performance-conscious text editor, but it's definitely a valuable tool.

18.8. Conclusions

I'm not sure we learned anything profound here, but it's cool to see a few different approaches to a common software feature.

I'd like to look at the Vim implementation more as I've only scratched the surface.

I think it's worth reading through parts of the Nano/Vim/Emacs codebases. They're three ubiquitous text-editing programs that have been around for ages and are all written in C (at least partially), but that have some major differences. Nano is the easiest place to start unless you're particularly enamoured with one of the others, because it's smaller and you don't need to learn about editor-specific concepts to follow it.

Adding undo/redo to kilo (and debugging memory usage)

hi@mattduck.com (Matt Duck) — Tue, 28 Jan 2020 00:00:00 +0000

I add a naive undo feature to kilo and then go on a detour to investigate memory usage.

19.1. The code

The code is on GitHub. The important context is that we have a couple of structs: editorConfig, which represents the global state of our editor, and erow, which represents a row of text and some associated data:

typedef struct erow {
  int idx;  // which row in the buffer it represents
  int size;  // the row length
  char *chars;  // the characters in the line
  char *render;  // the "rendered" characters in the line, where eg. \t will expand to spaces
  int rsize;  // the length of the "rendered" line
  unsigned char *hl;  // the highlight property of a character
  int hl_open_comment;  // whether this line is part of a multiline comment
} erow;

typedef struct editorConfig {
  int cx, cy;  // cursor position
  int rx;  // render index, as some chars are multi-width (eg. tabs)
  int rowoff; // row offset (vertical scroll position)
  int coloff; // column offset (horizontal scroll position)
  int screenrows; // size of the terminal
  int screencols; // size of the terminal
  int numrows;  // size of the buffer
  erow *row;  // array of rows that make up the buffer
  int dirty;  // is modified?
  char *filename;  // name of file linked to the buffer
  char statusmsg[80];  // status message displayed at the bottom of the buffer
  time_t statusmsg_time;  // how long ago status message was written, so we can make it disappear
  struct editorSyntax *syntax;  // the syntax rules that apply to the buffer
  struct termios orig_termios;  // the terminal state taken at startup; used to restore on exit
  int mode;  // vim-like normal/insert mode
} editorConfig;

19.2. The approach

A common undo solution is to store the user action, and then perform an equivalent reverse operation when the user presses "undo". For example, if the user inserts 5 characters at cursor position xy, we later may want to delete 5 characters at cursor position xy.

I wanted something more general though: this is a toy editor and I'd like to be able to hack arbitrary features onto it without having to code a corresponding undo feature. I started with the easiest possible approach: just copy and restore all the global state. This could be done with relatively few changes:

19.3. Turn our `editorConfig` value into a pointer

The editor stores a global editorConfig in a variable E. This holds state like the cursor position and pointers to the erows that make up the buffer. I wanted to be able to swap this out for an older version, so I turned it into a pointer that represents the current global state.

I also added undo and redo pointers to the E struct. These can be used to link a linear chain of undo/redo states.

19.4. Write a `history_push()` function

…which uses malloc() and memcpy() to clone editorConfig and the erow structs that make up all the global state.

I can then update E to be a pointer to the current state, and set E->undo and E->undo->redo to point to the right versions of state based on the undo history.
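The shape of that function is roughly as follows. This is a sketch rather than the code from my repo: ToyRow and ToyConfig are cut-down, hypothetical stand-ins for erow and editorConfig, and it assumes each row's chars holds size bytes plus a terminating NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced versions of erow and editorConfig. */
typedef struct {
  int size;     /* row length, not counting the NUL */
  char *chars;  /* assumed to hold size + 1 bytes */
} ToyRow;

typedef struct ToyConfig {
  int cx, cy;
  int numrows;
  ToyRow *row;
  struct ToyConfig *undo, *redo;
} ToyConfig;

/* Deep-copy `cur` and return the copy as the new current state, linked
 * so that copy->undo == cur and cur->redo == copy.
 * Returns NULL if any allocation fails. */
ToyConfig *toy_history_push(ToyConfig *cur) {
  assert(cur != NULL && cur->numrows >= 0);
  ToyConfig *next = malloc(sizeof *next);
  if (next == NULL) return NULL;
  memcpy(next, cur, sizeof *next);  /* copies the scalar fields */

  size_t n = (size_t)cur->numrows;
  next->row = calloc(n ? n : 1, sizeof(ToyRow));
  if (next->row == NULL) { free(next); return NULL; }

  for (size_t i = 0; i < n; i++) {
    size_t bytes = (size_t)cur->row[i].size + 1;
    next->row[i].size = cur->row[i].size;
    next->row[i].chars = malloc(bytes);
    if (next->row[i].chars == NULL) {
      while (i > 0) free(next->row[--i].chars);  /* unwind partial copy */
      free(next->row);
      free(next);
      return NULL;
    }
    memcpy(next->row[i].chars, cur->row[i].chars, bytes);
  }

  next->undo = cur;
  next->redo = NULL;
  cur->redo = next;
  return next;
}
```

A caller would use it as E = toy_history_push(E), keeping the old E if it returns NULL.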

I added a keyboard handler so I could manually invoke history_push() to test it.

19.5. Add `history_undo()` and `history_redo()`

These functions just return the appropriate E->undo/redo values. I added vim-compatible u and R keybindings to invoke these.
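In sketch form, with a hypothetical ToyState standing in for editorConfig, that amounts to little more than following a pointer. One assumption I've added here is that the functions return the current state unchanged when there is nothing to undo or redo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for editorConfig: only the history links. */
typedef struct ToyState {
  int version;
  struct ToyState *undo, *redo;
} ToyState;

/* Return the previous state, or `cur` itself at the start of history. */
ToyState *toy_history_undo(ToyState *cur) {
  assert(cur != NULL);
  return cur->undo ? cur->undo : cur;
}

/* Return the next state, or `cur` itself at the end of history. */
ToyState *toy_history_redo(ToyState *cur) {
  assert(cur != NULL);
  return cur->redo ? cur->redo : cur;
}
```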

19.6. Behold:

That's about it. We have an undo feature.

19.7. Job done?

I suspect not! My biggest concern is that this is surely very memory inefficient. Let's validate that.

19.8. Snapshotting process memory changes with `pmap`

I want to see how the editor's memory usage changes when I run my history_push() function. I'm sure there are heap-profiling or C-debugging tools I can use to inspect state as it changes, but for speed I'm just going to take some snapshots using pmap.

To see the delta in subsequent pmap calls we can write to a tempfile and use diff:

#!/bin/bash
#
# pmap-diff.sh
#
# Snapshot the results of pmap to a file, then diff it on
# subsequent calls. Scope per PID.
ARG="$1"
if [ -z "$ARG" ]; then exit 1; fi
pidof "$ARG" || exit 1;
PROCESSPID="$(pidof $ARG)"
F="/tmp/pmap-$ARG.$PROCESSPID"

if [ ! -f "$F" ]; then
    # pmap hasn't been called yet for this PID. Display the whole output.
    pmap "$(pidof $ARG)" -x > "$F";
    cat "$F";
    exit 0;
else
    # pmap has been called. Display the diff
    mv "$F" "$F".previous;
    pmap "$(pidof $ARG)" -x > "$F";
    head -n 2 "$F";
    diff "$F".previous "$F" && echo "no change";
    rm "$F".previous;
fi

19.9. The test

I have an example text file that we can use for this. It's about 28.7KB:

$ wc kilo-org.c
 1056  3826 28722 kilo-org.c

I open this file with the editor:

./kilo kilo-org.c

When I run ./pmap-diff.sh kilo for the first time, I see the full output of pmap:

Address           Kbytes     RSS   Dirty Mode  Mapping
00005652b5815000       8       8       0 r---- kilo
00005652b5817000      24      24       0 r-x-- kilo
00005652b581d000       8       8       0 r---- kilo
00005652b581f000       4       4       4 r---- kilo
00005652b5820000       4       4       4 rw--- kilo
00005652b67a1000     288     216     216 rw---   [ anon ] // this is the heap
00007f486d32b000     148     148       0 r---- libc-2.30.so
00007f486d350000    1332     888       0 r-x-- libc-2.30.so
00007f486d49d000     296     128       0 r---- libc-2.30.so
00007f486d4e7000       4       0       0 ----- libc-2.30.so
00007f486d4e8000      12      12      12 r---- libc-2.30.so
00007f486d4eb000      12      12      12 rw--- libc-2.30.so
00007f486d4ee000      24      24      24 rw---   [ anon ]
00007f486d52e000       8       8       0 r---- ld-2.30.so
00007f486d530000     124     124       0 r-x-- ld-2.30.so
00007f486d54f000      32      32       0 r---- ld-2.30.so
00007f486d558000       4       4       4 r---- ld-2.30.so
00007f486d559000       4       4       4 rw--- ld-2.30.so
00007f486d55a000       4       4       4 rw---   [ anon ]
00007ffc63ea8000     136      20      20 rw---   [ stack ]
00007ffc63f00000      12       0       0 r----   [ anon ]
00007ffc63f03000       4       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 --x--   [ anon ]
---------------- ------- ------- -------
total kB            2496    1676     304

I have a lot to learn about the intricacies of measuring process memory usage, but I know that RSS represents memory that is held in RAM, and Kbytes is the total usage including virtual memory, paging etc.

The row at address 00005652b67a1000 is the heap (you can confirm this by running pmap -X to see more verbose output):

00005652b67a1000     288     216     216 rw---   [ anon ]

Out of curiosity, if I run kilo without opening a file, the heap size is about 156KB smaller:

00005601d9ca8000     132       4       4 rw---   [ anon ]

Back on the version of kilo that has my example file open, if I invoke history_push() and run my pmap script again, I just see the diff since last time:

Address           Kbytes     RSS   Dirty Mode  Mapping
8c8
< 00005652b67a1000     288     216     216 rw---   [ anon ]
---
> 00005652b67a1000     420     400     400 rw---   [ anon ]
27c27
< total kB            2496    1676     304
---
> total kB            2628    1860     488 // +132, +184, +184

The only change is the heap size. My total heap memory has increased by 132K, and the RSS by 184K. That's not a great start considering my whole file is less than 29K.

Let's see what happens after a few more invocations of history_push(). Below are the changes. Each row is a diff against the row above it:

num calls	Heap Kbytes	Heap RSS
start	288	216
1	+132	+184
2	+288	+188
3	+132	+192
4	+132	+192
5	+312	+192
6-10 (total)	+836 (avg 167.2 per call)	+956 (avg 191.2 per call)
end	2120	2120

So it looks like our RSS memory allocation increases consistently by 192KB with every state copy. The first couple of calls instead increase by 184 and 188 respectively, but I'm ignoring this as I think it's easily explainable by the editor behaviour. However, I do wonder why sometimes our total heap size increases by 132, and other times it's more.

19.10. Where is this memory going?

We can reason about some of this heap increase by looking at the size of our data. Using sizeof() on our copied structs, I can see the following:

type	size (bytes)
struct editorConfig	232
struct termios	60
struct erow	48

We only make one copy of editorConfig and termios, so their footprint should be negligible. However, although erow is only 48 bytes, a new erow is put on the heap for each line in the file. There are 1056 lines, so (sizeof(erow) * 1056) immediately accounts for about 50.7KB.

The char array for each row is stored in erow->chars. This stores all the text in the file, and we know the sum of all erow->chars arrays should come to about 29KB (because that's the file size). But there are two other char arrays of approximately the same size: erow->render and erow->hl, which respectively store a pretty-rendered version of the row and the syntax highlighting data for the row. Taking all three arrays into account gives 3 * 28.7 = ~86KB.

We can now account for roughly 135KB between the size of the erow struct, and the contents of the erow arrays. I was initially expecting it might be close to 192KB, as we see the 192KB increase consistently. 135KB doesn't exactly match anything, although it is close to the 132KB total memory increases that we see.
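As a rough check on that arithmetic, here is a minimal Python sketch using the figures quoted above. It assumes that erow->render and erow->hl are each the same size as erow->chars (the post only says "approximately"), and it ignores allocator overhead, so treat the result as a lower bound rather than a measurement. The function name is made up for illustration.

```python
# Figures quoted in the post for the example file kilo-org.c.
NUM_ROWS = 1056      # lines in the file (from `wc`)
FILE_BYTES = 28722   # file size in bytes (from `wc`)
EROW_BYTES = 48      # sizeof(struct erow)


def estimate_copy_bytes(num_rows, text_bytes, erow_bytes=EROW_BYTES, arrays_per_row=3):
    """Rough lower bound for one deep copy of the editor rows.

    Assumes the chars, render and hl arrays are all the same size as the
    text itself, and ignores malloc bookkeeping.
    """
    struct_bytes = num_rows * erow_bytes
    array_bytes = arrays_per_row * text_bytes
    return struct_bytes + array_bytes


total = estimate_copy_bytes(NUM_ROWS, FILE_BYTES)
print(f"erow structs: {NUM_ROWS * EROW_BYTES / 1000:.1f} KB")
print(f"char arrays:  {3 * FILE_BYTES / 1000:.1f} KB")
print(f"total:        {total / 1000:.1f} KB")
```

With unrounded inputs the total lands a little above the rounded ~135KB figure, which is still in the same ballpark as the 132KB heap increases.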

19.11. What about the other increases?

Why do we consistently see +192KB in RSS? I'm actually not sure without diving into some more detailed profiling. There isn't any obvious new data being allocated to the heap in the code. I ran a test where I disabled most of the editor's code other than history_push(), and still saw this +192KB pattern. I wonder if the copying logic makes the program move some other data that was already allocated to memory into RSS.

I have no good explanation for the total heap increases that are higher than 132KB. I think the most likely cause is a memory issue in my code, so I need to run valgrind to check. I'm also curious why sometimes the total heap memory and the RSS memory are the same size, but sometimes they're different - that's a whole separate investigation though.

19.12. How can we make this feature more efficient?

I have a few questions about those memory numbers, but regardless we have some useful information: each copy of history will cost a fixed 48 bytes per row for the erow struct, and then an extra 3x the actual text size of the row (for row->chars, row->render and row->hl respectively). For our example file, that's ((1056 * 48bytes) + (28.7KB * 3)), which in total is about 135KB.

If I want to keep 50 of these revisions, it will cost at least ~6.75MB, and in practice it will be higher (as we have these unaccounted-for memory increases).

There are various ways we could try to reduce this cost. Some that immediately come to mind:

Stop storing the `erow->render` and `erow->hl` fields

…and instead just re-compute them on restore, as they can be completely recreated from the erow->chars state. This is cheap to implement, and will reduce the cost of storing the text data by 3x.

Don't store an `erow` struct at all in our undo state

All of the state in the erow can be recomputed based on the contents of our text. Every erow struct costs 48 bytes, but in our example text file, we average at 27.2 bytes per line. It would be cheaper to just store the entire contents of the file and recreate the erows later based on the text contents.
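To put numbers on the first two options, the sketch below compares the estimated storage per revision for the current deep copy, for storing only erow->chars, and for storing only the raw text. It reuses the figures from the example file, assumes render and hl are the same size as chars, and ignores allocator overhead. The strategy names are labels invented for this example.

```python
# Figures quoted in the post for kilo-org.c.
NUM_ROWS = 1056
FILE_BYTES = 28722
EROW_BYTES = 48


def bytes_per_revision(strategy):
    """Estimated bytes stored per undo revision (allocator overhead ignored)."""
    structs = NUM_ROWS * EROW_BYTES
    if strategy == "deep copy":    # erow structs + chars + render + hl
        return structs + 3 * FILE_BYTES
    if strategy == "chars only":   # erow structs + chars; render/hl recomputed
        return structs + FILE_BYTES
    if strategy == "text only":    # raw file contents; erows rebuilt on restore
        return FILE_BYTES
    raise ValueError(f"unknown strategy: {strategy}")


REVISIONS = 50
for name in ("deep copy", "chars only", "text only"):
    per = bytes_per_revision(name)
    print(f"{name:<10} {per / 1000:6.1f} KB each, "
          f"{REVISIONS * per / 1e6:.2f} MB for {REVISIONS} revisions")
```

The tradeoff is restore latency: the cheaper the stored state, the more work there is to rebuild the rows when the user hits undo.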

Shallow copy instead of deep copy

We're deep copying everything, even for parts of the buffer that don't change. This isn't necessary - we could just create pointers to rows that haven't changed. This could prevent a lot of duplicate storage.
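The idea can be sketched in Python, where a list holds references to row objects rather than the rows themselves. This is only an illustration of the sharing, not how it would look in kilo: in C the rows would need reference counting or copy-on-write so that a shared row isn't freed or mutated while a snapshot still points at it.

```python
def snapshot(rows):
    """Copy the list of row references, not the row contents."""
    return list(rows)


rows = ["int main(void) {", "    return 0;", "}"]
history = [snapshot(rows)]

# Edit one row by replacing it with a new string; the other rows are untouched.
rows[1] = "    return 1;"
history.append(snapshot(rows))

old, new = history
# Count the rows that are the very same object in both snapshots.
shared = sum(a is b for a, b in zip(old, new))
print(f"{shared} of {len(new)} rows are shared between the two snapshots")
```

Each snapshot still costs one pointer per row, so a very long file with frequent snapshots would still add up, just far more slowly than a deep copy.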

Serialize and compress the state history

If we have to store the editor state, maybe we can make it smaller. Gzip can compress our example text file with ratios between 2.8:1 (quickest compression level) and 3.4:1 (slowest compression level). This is something that I could prototype in a few lines of Python or Go, but in C I'm not familiar with the ecosystem, so it might take some time to develop.
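For reference, the Python prototype mentioned above might look something like the sketch below. It uses zlib, which implements the same DEFLATE algorithm that gzip uses, and it assumes rows never contain a newline so that joining and splitting on "\n" round-trips. The sample rows are made up, so the ratio it prints says nothing about the real file.

```python
import zlib


def compress_rows(rows, level=6):
    """Serialize rows as newline-joined UTF-8 text and compress it."""
    return zlib.compress("\n".join(rows).encode("utf-8"), level)


def restore_rows(blob):
    """Inverse of compress_rows (assumes no row contains a newline)."""
    return zlib.decompress(blob).decode("utf-8").split("\n")


# Made-up sample content standing in for a source file.
rows = [f"    row->chars[{i}] = c;  /* sample line {i} */" for i in range(500)]
raw_size = len("\n".join(rows).encode("utf-8"))

for level in (1, 9):
    blob = compress_rows(rows, level)
    print(f"level {level}: {raw_size} -> {len(blob)} bytes")
```

In C the equivalent would probably be a call into zlib itself, but the buffer management around it is where the extra development time would go.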

Compute a delta between different states

…and use it to recreate state on restore. This is a similar situation to serialization/compression for me - it's a lot more work to do this than to just write individual undo operations for the commands in the editor.
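To give a sense of what a delta could look like, here is a small Python sketch built on the standard library's difflib. It works at row granularity and stores only the spans that differ. For undo, the delta is computed from the new state back to the old one. The function names are invented for this example, and a C version would need its own diffing code.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Record only the spans of `src` that must be replaced to obtain `dst`."""
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    return [
        (i1, i2, dst[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(src, delta):
    """Rebuild the target rows from `src` plus a delta from make_delta()."""
    out = list(src)
    # Apply from the end so that earlier indexes stay valid.
    for i1, i2, replacement in reversed(delta):
        out[i1:i2] = replacement
    return out


old = ["#include <stdio.h>", "int main(void) {", "    return 0;", "}"]
new = ["#include <stdio.h>", "int main(void) {", '    puts("hi");', "    return 1;", "}"]

undo_delta = make_delta(new, old)   # what to change in `new` to get `old` back
print(undo_delta)
print(apply_delta(new, undo_delta) == old)
```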

19.13. Rethink the generalised state-restore approach entirely?

Ultimately there are cases where all of these approaches can fall down, especially when dealing with very large files, or files with very long lines. There are also tradeoffs to make on whether we can compute the values that we're no longer storing in memory within a latency that's suitable for the user.

For any performance-conscious use case I think it will be more efficient to just store specific undo operations for each action that you want to undo - ie. follow the command pattern.
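A minimal sketch of the command pattern, in Python for brevity: each command knows how to apply itself and how to reverse itself, so the history only stores what changed. The class names and the two operations shown are hypothetical and are not taken from kilo.

```python
class InsertText:
    """Insert `text` at (row, col); undo removes the same span."""

    def __init__(self, row, col, text):
        self.row, self.col, self.text = row, col, text

    def apply(self, rows):
        line = rows[self.row]
        rows[self.row] = line[:self.col] + self.text + line[self.col:]

    def undo(self, rows):
        line = rows[self.row]
        rows[self.row] = line[:self.col] + line[self.col + len(self.text):]


class DeleteRow:
    """Delete a whole row; undo re-inserts the removed text."""

    def __init__(self, row):
        self.row = row
        self.removed = None

    def apply(self, rows):
        self.removed = rows.pop(self.row)

    def undo(self, rows):
        rows.insert(self.row, self.removed)


class History:
    """Stack of applied commands; undo pops and reverses the latest one."""

    def __init__(self):
        self.done = []

    def run(self, command, rows):
        command.apply(rows)
        self.done.append(command)

    def undo(self, rows):
        if self.done:
            self.done.pop().undo(rows)


rows = ["hello", "world"]
history = History()
history.run(InsertText(0, 5, "!"), rows)
history.run(DeleteRow(1), rows)
history.undo(rows)
history.undo(rows)
print(rows)
```

The memory cost per step is proportional to the size of the edit rather than the size of the file, at the price of writing an undo for every operation.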

19.14. Next steps

I could have implemented shallow copy (as that sounds interesting), or even just written specific undo actions for all my editor operations.

Instead, I went on a massive tangent to see how Emacs, Vim and Nano implemented their undo features.

"Build your own text editor"

hi@mattduck.com (Matt Duck) — Wed, 22 Jan 2020 00:00:00 +0000

I recently followed snaptoken's build your own text editor booklet, which talks you through building a basic text editor in about 1000 lines of C (the kilo editor, written by Antirez). It was fun, and I'd recommend it to anybody who either (1) is interested in how graphical terminal programs work, or (2) wants to play a bit with C.

20.1. What was in the chapters?

Roughly in order, the steps were:

Write a main loop that uses read() to respond to input from stdin.
Put the terminal into "raw" mode - disable echoing, read one keypress at a time, etc. Save and restore the terminal configuration on program exit.
Add cursor movement.
Add file I/O and the ability to view files.
Add scrolling for when the file is bigger than the screen size.
Add a "rendering" translation layer which can be used to eg. display \t as a fixed number of spaces.
Add a status bar that shows the filename, current line etc. Also add a message area that can display user messages.
Add the ability to insert and delete text, with a "dirty" flag that tells the user if the buffer has been modified since last save.
Add a generic prompt, and then use it to implement incremental search.
Add basic syntax highlighting, which is triggered by filetype detection. This only supports C files, but can be extended.
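As an illustration of the "rendering" step in that list, the sketch below turns the stored characters of a row into what gets drawn on screen. It is written in Python rather than C, the tab width of 8 is an assumed value, and it pads each tab to the next tab stop, which is one common way of handling tabs and not necessarily exactly what the booklet does.

```python
TAB_STOP = 8  # assumed tab width for this sketch


def render_row(chars, tab_stop=TAB_STOP):
    """Return the display version of a row, with tabs expanded to spaces."""
    out = []
    for ch in chars:
        if ch == "\t":
            # Always emit at least one space, then pad to the next tab stop.
            out.append(" ")
            while len(out) % tab_stop != 0:
                out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


print(repr(render_row("a\tb")))
print(repr(render_row("\tif (x) {")))
```

Keeping the raw and rendered versions separate means the cursor position in the file and the cursor position on screen can differ, which is why the row carries both.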

20.2. The program structure

It was quite simple. There were two main data structures: the global editor, and the row. The editor kept an array of pointers to rows, each row representing a single line of text (plus some metadata, eg. the row size). The editor also kept track of the cursor position, the file offset for scrolling, the current status message, etc.

The program roughly just did:

main loop:
  read keypress;
  in response to keypress:
    update global editor state and row state;
    maybe quit if "q" was pressed;
  refresh screen with latest state;

The terminal interaction and cursor movement are all done using VT100 terminal escape sequences. I'm not sure how portable this is, but in practice I think it works for the few terminal emulators that I use.

The syntax highlighting has one feature that can tokenize across multiple lines: it recognises when comments begin and end. It is otherwise pretty naive, just matching keywords, strings and numbers. Having said that, at a glance it looks pretty similar to the highlighting I see in Vim.

Many of the functions operate on the global editor state. If I was going to seriously work on this project, I'd want to rewrite some of them to accept the editor as an argument rather than all mutating a single variable.

20.3. It was easy to extend the project

I added a few features that I use in Vim and Emacs (see Github):

Splitting user input into normal and insert modes.
Word-based cursor movement that is normally found with w/W/b/B.
A new prompt to simulate :wq and :q!.
Standard cursor movement with hjkl, ^/$, C-f/C-b, gg and G.
Using dd to remove lines, and J to join lines.
Adding the jj and jk bindings that I use in insert mode to exit to normal mode (which means waiting for a follow-up key to j, and inserting it into the row if it doesn't come after a set timeout).
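The jj/jk binding in that list is the least obvious one, so here is a hedged sketch of the waiting logic in Python. It is not the code from my fork: the class name, the method names and the 0.5 second timeout are all assumptions made for this example. The caller passes in timestamps, which keeps the logic easy to test.

```python
class InsertModeEscape:
    """Detect `jj` / `jk` typed quickly in insert mode.

    feed() returns (text_to_insert, exit_insert_mode). A lone `j` is held
    back until either a follow-up key arrives or flush() sees the timeout pass.
    """

    def __init__(self, timeout=0.5):
        self.timeout = timeout
        self.pending_since = None  # time the held-back `j` was typed

    def feed(self, key, now):
        if self.pending_since is not None:
            waited = now - self.pending_since
            self.pending_since = None
            if waited <= self.timeout and key in ("j", "k"):
                return "", True          # jj or jk: drop both keys, exit mode
            if key == "j":
                self.pending_since = now  # old j is literal, new j is held
                return "j", False
            return "j" + key, False       # old j was literal text after all
        if key == "j":
            self.pending_since = now
            return "", False
        return key, False

    def flush(self, now):
        """Call periodically: releases a held `j` once the timeout has passed."""
        if self.pending_since is not None and now - self.pending_since > self.timeout:
            self.pending_since = None
            return "j"
        return ""


esc = InsertModeEscape()
print(esc.feed("j", now=0.0))
print(esc.feed("k", now=0.1))
```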

I was surprised at how much it felt like my usual environment for file browsing and basic editing. Although my implementation wasn't extensible or composable, most of the time I just rely on the same few bindings.

Given that my personal Emacs config has grown to about 5x the size of this program, it almost seems worth just writing the features I want from scratch!

20.4. I did it as literate programming with org-mode

I wanted to try writing notes alongside the code as I progressed, using org-mode. I compiled kilo from my README.org file, which can be done in a couple of lines of lisp:

;; Don't ask me to confirm each time I evaluate the file
(setq-local org-confirm-babel-evaluate nil)
;; concatenate all embedded C snippets to kilo-org.c
(org-babel-tangle nil "kilo-org.c" "c")
;; run make
(compile "make")

In the end I didn't find it very useful - it hid the actual source code too much, which made it harder to refactor and jump between sections of code quickly. I wonder if the org-mode approach is more useful for detailing one-off scripts and troubleshooting.

20.5. What similar projects exist?

openemacs is a small fork of kilo that implements some Emacs navigation features, which is worth reading if you're interested in modifying kilo.

The other good "build something from scratch" project that I've followed was The Elements of Computing Systems, where you build a (virtual) computer from first principles.

The Destroy All Software "From Scratch" screencasts follow the same idea, and are pretty enjoyable.

I'm interested to know what else is out there. Although not so interested to have ever searched for it. If you know of anything good, let me know.

pytest-it: a plugin for building BDD test specs

hi@mattduck.com (Matt Duck) — Sun, 21 Jul 2019 00:00:00 +0100

I recently published pytest-it. It's a pytest plugin to decorate tests with markers inspired by Rspec (describe, context, it). These markers are used in the pytest test reporting to display a plaintext spec of the features under test:

from pytest import mark as m

@m.describe("The test function report format")
class TestPytestItExample(object):

    @m.context("When @pytest.mark.it is used")
    @m.it("Displays an '- It: ' block matching the decorator")
    def test_it_decorator(self):
        pass

pytest --it
...

* tests/test_pytest_it.py...

- Describe: The test function report format...
  - ✓ It: Displays a test pass using '- ✓ '
  - ✓ It: Displays a test fail using '- F '
  - ✓ It: Displays a test skip using '- s '
  - ✓ It: Displays the pytest ID for parameterised tests
  - ✓ It: Does not use the docstring in the test name

  - Context: When @pytest.mark.it is used...
    - ✓ It: Displays an '- It: ' block matching the decorator

    - ...when -v is higher than 0...
      - ✓ It: Displays the full module::class::function prefix to the test

  - Context: When @pytest.mark.it is not used...
    - ✓ It: Displays the test function name

    - ...but the test name starts with 'test_it_'...
      - ✓ It: Prettifies the test name into the 'It: ' value

  - Context: When multiple @pytest.mark.it markers are used...
    - ✓ It: Uses the lowest decorator for the 'It : ' value

We use pytest extensively at Ometria. It provides a lot of features (eg. composable fixtures, parameterised tests, assert rewrites) that make it easy and pleasant to write test code in Python. So why is pytest-it useful?

21.1. The problem

Software gets complex, and clear communication is crucial to mitigate this. Tests are a critical tool for reducing unwanted changes in software behaviour, and in some cases they will be the only spec you have for legacy software.

Conventional Python test code (including pytest) has up to three levels of hierarchy:

# Level 1 - module
"""
test_my_module.py
"""

# Level 2 - class
class TestMyClass(object):

    # Level 3 - method
    def test_my_method(self):
        """
        This is a test.
        """
        assert 2 + 2 == 4

Under these constraints, it requires a lot of discipline to communicate clearly to the reader, especially as software evolves and requirements change over time:

Is there clear separation between setup, test and teardown code?
What input situations are tested?
What is the output behaviour under test? Is it logic that should only be changed in coordination with the business, or is it a side-effect?
Are multiple business rules tested by the same test function?
What business logic is not covered by tests?

Despite their importance, tests tend to receive less attention than the main program code (in my experience), with copy/paste approaches being more common, and less care given to structure, naming and documentation.

These are problems that BDD and other testing tools (including pytest-it) try to mitigate.

21.2. Trivial to adopt

There are various methods to help improve test clarity and bring it closer to business logic. For example:

Rewrite the test code using a BDD framework that provides tools for this purpose (eg. Behave, Mamba).
Rewrite the program code to facilitate better test structure.
Write a spec of component behaviours that exists separately from the system and test code.

Decorating an existing pytest codebase with pytest-it is an incremental step. It's cheap to adopt compared to the above solutions, as it can be retroactively applied to a test suite without requiring any other changes - the decorators only provide the spec, they don't alter the behaviour of the test.

This is also a good way to introduce BDD thinking to a Python codebase (or team) without requiring a major change in tooling or workflow.

21.3. More expressive

I often see test names or docstrings that describe the context of the test input (eg. test_new_customer_with_one_order), but not the output. I can see that the test expects the output of a function to be 3, but what does that represent?

The Rspec semantics provide a framework to specifically communicate these concepts, and by using decorators to apply these states to a test, we can display them clearly in the test report. This is significantly more expressive than trying to convey that information in a function name, and the decorators make it easier to parse compared to using a docstring.
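As a hedged example of the difference, here is the test_new_customer_with_one_order case written with the describe, context and it markers. The count_orders function and the customer dict are hypothetical stand-ins for real application code. The markers only attach metadata, so the test body runs exactly as it would without them.

```python
import pytest


def count_orders(customer):
    """Hypothetical function under test: number of orders for a customer."""
    return len(customer["orders"])


@pytest.mark.describe("count_orders")
class TestCountOrders:

    @pytest.mark.context("When a new customer has placed one order")
    @pytest.mark.it("Returns an order count of 1")
    def test_new_customer_with_one_order(self):
        customer = {"name": "new customer", "orders": ["order-1"]}
        assert count_orders(customer) == 1
```

The context marker carries the input situation that the function name was already trying to express, and the it marker adds the expected output that the name left out.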

21.4. Behaviour at a glance

Even in well-structured test suites, it can be difficult to scan the behaviour under test, because docstrings are scattered throughout the file(s). Pulling this state into a plain-text spec makes it easier to read, think about and modify.
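To make the idea concrete, the sketch below groups (describe, context, it) labels into an indented outline. It is not how pytest-it is implemented, and the output only loosely resembles the plugin's report. The function name and the input format are assumptions for illustration.

```python
def render_spec(tests):
    """Render (describe, context, it) tuples, in file order, as an outline.

    `context` may be None for tests that sit directly under a describe block.
    """
    lines = []
    last_describe = last_context = None
    for describe, context, it in tests:
        if describe != last_describe:
            lines.append(f"- Describe: {describe}...")
            last_describe, last_context = describe, None
        if context != last_context:
            if context is not None:
                lines.append(f"  - Context: {context}...")
            last_context = context
        indent = "    " if context is not None else "  "
        lines.append(f"{indent}- It: {it}")
    return "\n".join(lines)


tests = [
    ("count_orders", None, "Returns 0 for a customer with no orders"),
    ("count_orders", "When a new customer has placed one order", "Returns an order count of 1"),
    ("count_orders", "When a new customer has placed one order", "Ignores cancelled orders"),
]
print(render_spec(tests))
```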

21.5. Org-mode

For those who use org-mode, the spec output can be copied straight into an org buffer to work with. This is the killer feature for me, as it means I can focus on the behaviour of the program: where tests need to be added, deleted or moved, and where program behaviour needs to be clarified with the business.

21.6. Habit forming

Introducing a framework to structure tests can encourage care around test code and communication. The semantics provide a sensible default that can be implemented without requiring a lot of extra tooling or thought. Integrating with pytest also provides an easy way to evaluate the tradeoffs of BDD approaches compared to the classic Python unittest structure.

21.7. Further information

You can try pytest-it by running:

pip install pytest-it
pytest --it

See Github for more information.