Undo/redo implementations in text editors

I wanted to add undo/redo support to my version of the kilo text editor. I wrote some separate notes on that here. As part of that work I had a look at how undo features are implemented in Nano, Emacs and Neovim.

1. Common patterns for undo

I think undo/redo can be considered a solved problem. It has been implemented many times in different types of software, and there are a few standard solutions that people go to.

One solution is the command pattern. The idea is that for each action in your program, you can execute a corresponding undo action. For example, for an action like "insert these 5 characters at cursor position xy", the undo action might be "delete 5 characters at cursor position xy". You store these commands on respective undo and redo stacks. When the user wants to undo an action, you perform the undo operation and pop the original action onto the redo stack.

The most written-about alternative (in my experience reading online) is often named the memento pattern. The key difference is that rather than storing an undo command that can mutate state, you store the past state itself (eg. the state of the rows). Your undo operation can then just restore the previous state.

Both these patterns are described in the GoF Design Patterns book - I suspect this has popularised the terminology that I see online. There are many ways you can implement undo features that roughly follow the described patterns, but don't strictly adhere to the class structure or functions in the book.

Broadly though we have two approaches - one that stores actions that can be reversed later, and one that stores state that can be restored.

2. How do the classic text editors do it?

These programs have been undoing for decades, so they must be doing something right.

3. Nano

Nano clearly follows the reverse-an-action approach. nano.h describes an enum of supported undo actions, named undo_type. Actions have labels like INSERT, JOIN, and COMMENT - these are specific, granular actions that have dedicated undo operations.

A history item is represented by the undostruct type, which stores the action and some associated data (this includes a *next pointer to go further back in time):

typedef struct undostruct {
	undo_type type;
		/* The operation type that this undo item is for. */
	int xflags;
		/* Some flag data to mark certain corner cases. */
	ssize_t lineno;
		/* The line number where the operation began or ended. */
	size_t begin;
		/* The x position where the operation began or ended. */
	char *strdata;
		/* String data to help restore the affected line. */
	size_t wassize;
		/* The file size before the action. */
	size_t newsize;
		/* The file size after the action. */
	groupstruct *grouping;
		/* Undo info specific to groups of lines. */
	linestruct *cutbuffer;
		/* A copy of the cutbuffer. */
	ssize_t mark_begin_lineno;
		/* Mostly the line number of the current line; sometimes something else. */
	size_t mark_begin_x;
		/* The x position corresponding to the above line number. */
	struct undostruct *next;
		/* A pointer to the undo item of the preceding action. */
} undostruct;

Throughout the codebase, there are calls to a function add_undo(), which accepts an undo_type, creates an undostruct with the appropriate metadata, and sets it to a field named undotop, which represents the top of the undo history.

The do_undo() function then executes appropriate logic in a switch to reverse an action. Here's a snippet of the main switch statement in do_undo() (from text.c):

	case INDENT:
		handle_indent_action(u, TRUE, TRUE);
		undidmsg = _("indent");
		break;
	case UNINDENT:
		handle_indent_action(u, TRUE, FALSE);
		undidmsg = _("unindent");
		break;
#ifdef ENABLE_COMMENT
	case COMMENT:
		handle_comment_action(u, TRUE, TRUE);
		undidmsg = _("comment");
		break;
	case UNCOMMENT:
		handle_comment_action(u, TRUE, FALSE);
		undidmsg = _("uncomment");
		break;
#endif

It's pretty easy to read the undo code in Nano - most of it can be found in the linked files.

4. Emacs

Emacs also has an action-oriented implementation, although it's a much bigger codebase.

The Emacs source code consists of a smaller C core which implements some central components and the emacs-lisp interpreter. Most of the code is then written in emacs-lisp. Undo features span both of these layers. We start at undo.c.

Fun fact alert: using git blame, parts of undo.c go back to a version labelled "Initial revision" from April 1991 (29 years ago! According to Wikipedia the first GNU Emacs release was 1985, so I'm not sure the significance of this exact commit).

In undo.c, we can see a structure named undo_list, which stores the undo history for a buffer. Like with Nano, multiple action types are supported. One difference to the Nano implementation is that undo_list can store a nil value as a "boundary". The boundary is used to group multiple actions into a single undo operation.

In the elisp layer, undo_list is represented by the buffer-undo-list variable. If you're in Emacs and you call (describe-variable 'buffer-undo-list) you'll see an excellent overview of the types of data in this file, and you can also see its current state. Here's a snippet of buffer-undo-list for the Emacs buffer that I'm using to write this post:

(nil
 (4720 . 4721)  ;; insertion at point
 4719  ;; cursor movement ?
 nil  ;; boundary
 (4697 . 4702)  ;; insertion at point
 (#("w" 0 1  ;; ?? this one isn't obvious. Maybe a deletion or property change.
    (fontified t))
  . -4697)
 (#<marker at 4721 in undo-redo.org> . -1)  ;; changes to a "marker" point in the file
 (#<marker at 4697 in undo-redo.org> . -1)
 (#<marker at 4697 in undo-redo.org> . -1)
 ...)

My history continues for about 2000 lines just in this buffer. There is a global undo limit (defined as 80,000 for me in the C source). According to the undo-limit documentation, old undo history gets cleaned up in garbage collection.

undo.c describes some functions that add state to the undo_list. For example, record_point() ("point" means cursor position), record_insert(), record_delete(), etc. These record_$thing() functions are called in a few places in the C code to store actions to later undo (for example in insdel.c). I'm sure there are some similar functions in the elisp layer too, but I haven't looked in depth.

The actual execution of the undo actions also happens in the elisp layer, in (primitive-undo). The comments make it reasonably clear to understand. For example, in order to undo a text insert, it does (delete-region beg end). The snippet below is from the equivalent of Nano's switch statement:

---
;; Element (nil PROP VAL BEG . END) is property change.
(`(nil . ,(or `(,prop ,val ,beg . ,end) pcase--dontcare))
 (when (or (> (point-min) beg) (< (point-max) end))
   (error "Changes to be undone are outside visible portion of buffer"))
 (put-text-property beg end prop val))
...
;; Element (BEG . END) means range was inserted.
(`(,(and beg (pred integerp)) . ,(and end (pred integerp)))
 (when (or (> (point-min) beg) (< (point-max) end))
   (error "Changes to be undone are outside visible portion of buffer"))
 ;; Set point first thing, so that undoing this undo
 ;; does not send point back to where it is now.
 (goto-char beg)
 (delete-region beg end))
...
;; Element (apply FUN . ARGS) means call FUN to undo.
...

You can see that one of the action types is an arbitrary function that gets applied when executing the undo. I haven't looked at exactly where this is used, but I assume that one of the use-cases is so third-party extensions can integrate with the undo feature.

I'm sure there is a lot of undo-related code through the codebase that I haven't mentioned here, especially in the lisp parts. I think these cover the basics though.

5. Neovim

Neovim's implementation is the most daunting to jump into - its undo.c comes in at 3000 lines. It has similarities to the other implementations, but stores and restores some general state fields, with less explicit differentiation between types of user actions.

The undo functions and data structures are prefixed with u_. The first main struct (defined in undo_defs.h) is u_header, which holds pointers to manage the undo history, and also holds some buffer state like cursor position.

u_header also stores the time of the undo action. This is used to support the handy time-based command :earlier, which lets you revert back to previous state by time (eg. :earlier 2h will undo back to the state recorded 2 hours ago). As far as I'm aware neither Nano nor Emacs support this.

The second important struct is u_entry_T, which stores an array of text lines in **ue_array (which is used to restore buffer text), and some other metadata. I've copied u_entry_T below:

typedef struct u_entry u_entry_T;
struct u_entry {
  u_entry_T   *ue_next;         /* pointer to next entry in list */
  linenr_T ue_top;              /* number of line above undo block */
  linenr_T ue_bot;              /* number of line below undo block */
  linenr_T ue_lcount;           /* linecount when u_save called */
  char_u      **ue_array;       /* array of lines in undo block */
  long ue_size;                 /* number of lines in ue_array */
#ifdef U_DEBUG
  int ue_magic;                 /* magic number to check allocation */
#endif
};

undo.c defines a few functions to record an undo, such as u_save(), u_save_cursor() and u_savedel(), although most of the logic is delegated to u_savecommon(), which generates new u_entry_T and u_header instances where appropriate. These functions are then called in various places in the codebase where the buffer is modified, such as ops.c, edit.c, normal.c and ex_cmds.c.

There are a couple of entry points to perform an undo operation. The handler for pressing u in normal mode is nv_undo(). This calls a chain of functions: nv_kundo() -> u_undo() (back in undo.c) -> u_doit() -> u_undoredo().

u_undoredo() is where the main undo logic is applied. Some of the more significant lines are the calls to ml_delete(), ml_replace() and ml_append(), which delete or update text lines (all defined in memline.c). The text state is retrieved from u_entry_T->ue_array. For example:

/* insert the lines in u_array between top and bot */
if (newsize) {
  for (lnum = top, i = 0; i < newsize; ++i, ++lnum) {
    /*
     * If the file is empty, there is an empty line 1 that we
     * should get rid of, by replacing it with the new line
     */
    if (empty_buffer && lnum == 0) {
      ml_replace((linenr_T)1, uep->ue_array[i], true);
    } else {
      ml_append(lnum, uep->ue_array[i], (colnr_T)0, false);
    }
    xfree(uep->ue_array[i]);
  }
  xfree((char_u *)uep->ue_array);
 }

Reading u_undoredo(), it's clear that Vim's implementation is more generalised than in Nano and Emacs, which both have clear switch-like statements that say "if you find item x on the undo history, apply operation y". In Nano different actions are represented by an enum, and in Emacs different actions have different lisp structures.

In Vim, regardless of the type of user action performed, general buffer state is stored and restored using u_header_T, u_entry_T and visualinfo_T. The data they contain looks quite general - I don't see anything action-specific here, like the JOIN type in Nano, or the custom function that you can store in Emacs. Emacs' items are comparatively simple as they only record state that is needed - eg. an insertion can just be represented in the undo history by (BEG . END).

It looks like there is a lot more to uncover in this implementation, but hopefully that describes some of the basic points. Let's look at something different.

6. A reusable solution that copies state: Redux

I'm not super fluent in modern JS frameworks, but I knew that Redux was based around reducers that operate on immutable state, and that there were generalised undo solutions that could implement undo/redo by keeping a copy of that state.

The Redux documentation actually describes an implementation for using Redux to implement undo by restoring past state. Coding this from scratch isn't required though as there is a popular implementation: Redux Undo.

Redux Undo is a generalised implementation of the restore-past-state approach: you only need to decorate specific reducers as undoable, and it will store all the state changes for you.

The advantage here is that you can add undo/redo behaviour to components without having to do any specific work to record or reverse particular operations: the actions are the same for any use-case as you're just copying and restoring your state.

At its most basic, you can start recording your state history in a few lines of code (There are more features and various configuration documented in the README):

// context: 'counter' is a reducer that increments or decrements a number.

import { combineReducers } from 'redux'; // Redux util functions.
import undoable from 'redux-undo'; // the undoable higher-order reducer.
combineReducers({
  counter: undoable(counter, {
    limit: 10 // set a limit for the size of the history.
  })
})

// See https://github.com/omnidan/redux-undo for more details.

7. What's the catch?

This sounds exciting - it looks like much less work to implement than even Nano's undo feature. The ease of implementation doesn't come for free though. The main cost I see is that state copies will need to be stored in memory. Whereas with a command-based approach you could have said "delete the characters at position xy to reverse the user's insert action", you're now storing the old state in order to restore it. This means you need to be conscious that you're not copying extra state when it hasn't changed, and that you have sensible limits on your undo history.

I'm not sure how often this is an issue with Redux Undo in practice. I did read this redux-undo issue discussion about using diffs between states to reduce the memory overhead. It led me to a couple of small projects that suggest it has been a concern at least for some people:

Undox is like Redux Undo but uses actions instead of storing state, to reduce the memory cost.
Redux Undo Stack is like Redux Undo but stores incremental changes instead of entire states.

I doubt that a third-party solution like this would work for a performance-conscious text editor, but it's definitely a valuable tool.

8. Conclusions

I'm not sure we learned anything profound here, but it's cool to see a few different approaches to a common software feature.

I'd like to look at the Vim implementation more as I've only scratched the surface.

I think it's worth reading through parts of the Nano/Vim/Emacs codebases. They're three ubiquitous text-editing programs that have been around for ages and are all written in C (at least partially), but that have some major differences. Nano is the easiest place to start unless you're particular enamoured with one of the others, because it's smaller and you don't need to learn about editor-specific concepts to follow it.

2020-Jan-28