A fun problem with fzf-tab-completion and echo

Fzf is great, but I've always thought the bash tab completion wasn't as polished as it could be. I want to be able to hit <TAB> and have fzf pop up to navigate the candidates, but it doesn't hook into bash completion like you'd expect. Instead, there are special implementations of completion for different commands, e.g. you might have to call _fzf_setup_completion dir tree to configure an fzf completion that prompts you for directories when you run tree. As well as requiring extra setup, it doesn't support ad-hoc flags, so it's pretty limited.

Yesterday I was looking for a better way to do this, and came across lincheney's fzf-tab-completion. This seems to work pretty well, but I had to fix a couple of things to get it working properly.

1. First - macOS support

You can check the GitHub page for instructions on how to use fzf-tab-completion. Basically you just source a bash script which defines a function fzf_bash_completion, and then you use bind to run fzf_bash_completion on a particular keypress (e.g. <TAB>).
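As a rough sketch, the setup in ~/.bashrc could look something like the following. The clone location and the FZF_TAB_COMPLETION_SCRIPT variable name are placeholders of mine, and the script's path inside the repository is an assumption, so check the project's README for the exact details.

```bash
# Placeholder path: point this at wherever you cloned the repository.
FZF_TAB_COMPLETION_SCRIPT="$HOME/src/fzf-tab-completion/bash/fzf-bash-completion.sh"

# Only wire things up if the script is actually there.
if [ -r "$FZF_TAB_COMPLETION_SCRIPT" ]; then
    # Defines the fzf_bash_completion function.
    source "$FZF_TAB_COMPLETION_SCRIPT"
    # Run fzf_bash_completion whenever TAB is pressed.
    bind -x '"\t": fzf_bash_completion'
fi
```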

I tried this on macOS and hit a common problem: the GNU and BSD versions of certain utils aren't compatible. They differ in their flags and behaviour, and the bash script assumes the GNU versions of sed and awk. On macOS you can get the GNU versions from Homebrew (brew install gawk gnu-sed), but they are installed as gawk and gsed so that they don't break anything that assumes the default BSD versions.

There are a few approaches to fixing this. I just opted for the one that required the least work - changing the script to prefer gawk and gsed if they're available.
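A minimal sketch of that kind of fallback is below. This is not the actual patch: pick_tool, SED_CMD and AWK_CMD are names I've made up for illustration.

```bash
# Print the first of the given command names that exists on PATH.
pick_tool() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the GNU-prefixed names, fall back to whatever the system provides.
SED_CMD=$(pick_tool gsed sed) || SED_CMD=sed
AWK_CMD=$(pick_tool gawk awk) || AWK_CMD=awk
```

The rest of the script would then call "$SED_CMD" and "$AWK_CMD" instead of hard-coding sed and awk.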

2. The main event - why doesn't kubectl complete namespaces?

After that it seemed to be working:

[Screenshot: fzf-tab-completion completing git checkout]

I could type git checkout <TAB> and use fzf to complete the branch names, or ls --<TAB> and select a long flag. It even seems I can complete multiple flags at once by adding --multi to FZF_COMPLETION_OPTS.

One place where I wanted to use fzf for completion was kubectl. There are various tools that add fzf features to kubectl or that provide nice replacements for kubectl commands, but I'd find it useful to have fzf completion that "just works" when I'm running ad-hoc kubectl commands.

I tried this for a bit and it did work:

kubectl get pod <TAB> - completes pod names
kubectl get <TAB> - completes resources
kubectl -n foo rollout restart deployment <TAB> - completes deployments in the "foo" namespace

Then I hit a problem running kubectl get pod -n <TAB>. This is supposed to complete namespaces, but it was giving me a list of pods. I double-checked that the standard kubectl bash completion does complete namespaces as expected in this situation, so it looked like something might be wrong with the fzf completion.

3. Debugging

After trying a few variations I realised that the fzf completion would work if I specified the namespace flag as -n=<TAB> instead of -n <TAB>. I knew the bash script contained a decent amount of regex/parsing logic, so wondered if something could be wrong with how it handled flags.

I started decorating the script with debug lines like echo "some_variable: $SOME_VARIABLE" >> /tmp/debug.log, or | tee -a /tmp/debug.log. It turned out there were certain lines where a flag would usually be printed, but if I used -n, it would be missing. For example:

# (some printed output with -n=)
line: k get pod -n=
SHELL SPLIT ----
line: -n=
buffer:
line: pod
buffer:
line: get
buffer:
line: k
buffer:
COMP_WORDS: k get pod -n=
COMP_CWORD: 3

# (some printed output with -n)
line: k get pod -n
SHELL SPLIT ----
line: -n
buffer:
line: pod
buffer:
line: get
buffer:
line: k
buffer:
COMP_WORDS: k get pod  # <-- where did -n go?
COMP_CWORD: 3

I continued to try different inputs and then noticed that -e was also affected. At this point I ran through the whole lowercase alphabet… and -n and -e were the only flags that didn't work.

I wanted to narrow down where the flags were being stripped in the script. The -n and -e flags should have been a clue but I was still thinking about regex at this point, so I started by commenting out some of the regex/sed code, and also calling some of the intermediate functions manually.

Eventually I narrowed it to this function which gets called on each argument:

$ printf '%s\n' '-a' | _fzf_bash_completion_flatten_subshells
>> -a

$ printf '%s\n' '-n' | _fzf_bash_completion_flatten_subshells
>>

From there it was easy to find the offending line. It was this:

echo "$line$buffer"

This looks harmless enough, but $buffer was an empty variable, and $line contained our flag, so this was executing echo "-n" or echo "-e". -n and -e are echo flags, so echo treats the argument as an option rather than as text to print: echo "-n" prints nothing at all, and echo "-e" prints only an empty line. Either way the flag itself is lost. This is why only these particular flags were affected.

(Had I tried capital letters, I'd have found that -E also gets stripped. Combinations of these flags, such as -ne, are swallowed too. I think there can be issues with other input as well, but I haven't delved into it.)
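The behaviour is easy to reproduce in isolation. This small loop, run in bash, captures what the builtin echo prints for a few flag-like strings (the brackets make empty output visible):

```bash
# For each string, show what echo actually prints when given it as its only argument.
for arg in -a -n -e -E; do
    printf 'echo "%s" -> [%s]\n' "$arg" "$(echo "$arg")"
done
# In bash this prints:
#   echo "-a" -> [-a]
#   echo "-n" -> []
#   echo "-e" -> []
#   echo "-E" -> []
```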

4. The solution: printf

Weirdly, there doesn't seem to be a straightforward way to print the literal -n using echo. I'm sure there are various alternative ways to solve this, but I opted for replacing the invocations of echo "$foo" with printf '%s\n' "$foo". printf is a bash builtin (it also exists as a standalone coreutils command) modelled on C's printf function. It's commonly available and was already used elsewhere in the fzf-tab-completion script, so it was an obvious choice.
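To illustrate the difference, here is a toy stand-in for a line-by-line filter. It is not the real _fzf_bash_completion_flatten_subshells, and both function names are hypothetical; each one just reads lines on stdin and writes them back out.

```bash
# Passes lines through with echo: lines that look like echo options get lost.
passthrough_echo() {
    local line
    while IFS= read -r line; do
        echo "$line"
    done
}

# Passes lines through with printf: the format string is fixed, so the
# data is only ever substituted for %s and never parsed as an option.
passthrough_printf() {
    local line
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done
}

printf '%s\n' get pod -n | passthrough_echo     # prints "get" and "pod" only
printf '%s\n' get pod -n | passthrough_printf   # prints "get", "pod" and "-n"
```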

After that it worked! I've tried various combinations of flags and everything is completing nicely so far. It's not as instant as using the standard bash completion, but it's going to be a lot nicer for certain commands, and is much more useful for me than the builtin tab completion that comes with fzf.

5. What version of echo was this?

Running which echo or man echo might point to the GNU echo program, but when you run echo in bash you'll be using bash's builtin echo command by default. There are slight differences between the two, but both are affected by this issue. You can test this by running enable -n echo to disable bash's builtin echo.
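Another way to check the external program, without disabling the builtin, is to go through env, which looks the command up on PATH. This is a sketch that assumes the echo on your PATH is the GNU or BSD one; other implementations may behave differently.

```bash
# Ask the shell what a bare "echo" resolves to (in bash, a shell builtin).
type echo

# env runs the echo program found on PATH, bypassing the builtin.
external_out=$(env echo "-n")
printf 'external echo printed: [%s]\n' "$external_out"
```

If the brackets come back empty, the external echo swallowed the -n as well.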

The GNU version has an additional fun (but easier to debug) problem: if I had been using this program and included the flag --help, it would have printed the help text instead of the literal --help value.

6. Why is the GNU echo program/bash builtin designed like this?

I'm curious about this - I think I had come across this problem before but I didn't have it in mind when I was debugging this issue. It's obviously a gotcha that many people will be aware of, but it seems like bad design from a couple of angles:

If you're unlucky enough to not know about this issue, it will violate expectations - we have a function that prints almost everything except when your string happens to clash with echo's own flags.
It's implicit and doesn't give feedback to the user. The risk isn't documented in the GNU manpage or the bash help text, and there's nothing in the output of the command that hints to the user that this issue may have occurred or why it happened.

One sign that it's easy to forget this issue is that even when I was thinking about this problem and trying to debug the script, I was using echo out of habit, and nearly fell into the exact same problem by doing echo "$variable_that_can_potentially_hold_-n".
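For debug output specifically, a tiny helper along these lines would sidestep the problem. It's only a sketch: debug_var and the DEBUG_LOG variable are names I've invented here, with /tmp/debug.log as the default to match the log file used earlier.

```bash
# Where debug lines are appended; override by setting DEBUG_LOG beforehand.
DEBUG_LOG="${DEBUG_LOG:-/tmp/debug.log}"

# debug_var LABEL VALUE
# Appends "LABEL: VALUE" to the log. The value is only ever data for %s,
# so a value such as -n or -e is written out as-is.
debug_var() {
    printf '%s: %s\n' "$1" "$2" >> "$DEBUG_LOG"
}
```

Usage would be something like debug_var line "$line" at the point of interest in the script.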

I'm sure there are some historical/standards design reasons for it being this way, but I haven't looked into it yet.

7. Lesson - don't use echo to print variables

This is something I'll be conscious to pick up in code reviews and my own scripts going forwards. It seems wise to avoid echo for anything that handles user input, and particularly anything related to parsing command-line args.

fzf-tab-completion is available on GitHub. Hopefully these fixes (or similar) can be made available for everybody upstream at some point, but if not you should be able to find them via that page.

2021-May-08