Detecting dubious shadowing in Go

If you hadn’t already heard, we’re porting the TypeScript compiler to Go. This is a certified Big Deal, no small feat. Language choice is one of those contentious things, and I’m not going to go into great detail about it here, but one factor in the language choice is that the ported code is very, very close to the original TypeScript code.

But obviously Go isn’t TypeScript. There are whole classes of bugs that can happen in Go that won’t happen in TypeScript (and vice versa). If you aren’t careful, a direct translation could behave differently, or you could add all new bugs.

There was one specific kind of bug that came up the most as more and more code was ported: unintentional shadowing.

Enter the Shadow Realm

Can you spot the bug?

func (c *Checker) getUnresolvedSymbolForEntityName(name *ast.Node) *ast.Symbol {
    // ...
    result := c.unresolvedSymbols[path]
    if result == nil {
        result := c.newSymbol(ast.SymbolFlagsTypeAlias, text)
        c.unresolvedSymbols[path] = result
        result.Parent = parentSymbol
        c.declaredTypeLinks.Get(result).declaredType = c.unresolvedType
    }
    return result
}

The intent here was to return the new symbol, but Go’s := operator creates a new variable within the scope of the if statement’s block, so this function always returns nil. := should have been =. One character, very hard to spot.

Seasoned Go devs are probably screaming right now. “NoOoOoOo why aren’t you using an early return here??” And they’re right! A more idiomatic translation would be:

func (c *Checker) getUnresolvedSymbolForEntityName(name *ast.Node) *ast.Symbol {
    // ...
    if result := c.unresolvedSymbols[path]; result != nil {
        return result
    }
    result := c.newSymbol(ast.SymbolFlagsTypeAlias, text)
    c.unresolvedSymbols[path] = result
    result.Parent = parentSymbol
    c.declaredTypeLinks.Get(result).declaredType = c.unresolvedType
    return result
}

But remember, this is a port. We’re not really trying to change the style of the code all of the time. In fact, the code may have been partially autogenerated and then copy/pasted. Or just split-screened with the original code and typed out (i.e., error-prone).

If you look at the original TypeScript code, you’ll see what we’re trying to emulate:

function getUnresolvedSymbolForEntityName(
    name: EntityNameOrEntityNameExpression,
) {
    // ...
    let result = unresolvedSymbols.get(path);
    if (!result) {
        unresolvedSymbols.set(
            path,
            result = createSymbol(SymbolFlags.TypeAlias, text),
        );
        result.parent = parentSymbol;
        result.links.declaredType = unresolvedType;
    }
    return result;
}

Should this have been an early return? In my opinion? Yes, absolutely.¹ Regardless of the language. But, I didn’t write this code, it just is what it is.

This bug kept happening over and over during the port; multiple people on the team complained about it. It’s not hard to see why; it becomes muscle memory to type :=, and then it’s just one character away from = so not too easy to notice (yourself, or in review). The above example is one of the easier ones to see visually, but we have other examples too. For example, this kind of code is all over our type relations code:

switch {
// ...
case source.flags&TypeFlagsIndexedAccess != 0:
    result = r.isRelatedTo(source.objectType, target.objectType)
    if result != TernaryFalse {
        result &= r.isRelatedTo(source.indexType, target.indexType)
        if result != TernaryFalse {
            return result
        }
    }
// ...
}

You can imagine maybe that assignment should have been a :=.² Or maybe something else is wrong? Hard to say.

A little `go/analysis` goes a long way

How do we avoid these problems?

As a compiler dev, I only have one solution to every problem: static analysis.

Go has a great static analysis framework called go/analysis, which is itself built upon Go’s built-in AST and type checking packages (go/ast, go/types). With it, we can create our own analyzers to run over our code and find errors. For convenience, those analyzers can then be compiled into golangci-lint to be run with the rest of the linters.

An Analyzer looks like this:

var myAnalyzer = &analysis.Analyzer{
    Name: "myAnalyzer",
    Run: func(pass *analysis.Pass) (any, error) {
        // ...
        pass.Reportf(node.Pos(), "%s is doing a bad thing!", node.Name)
        // ...
    },
}

Declare an Analyzer, then the Run function is given all of the information about the code being analyzed, report errors (even fixes and related positions, LSP-style), even return analysis results to be consumed by other passes.

The Go team has actually already created a shadow analyzer that looks for mistakenly shadowed variables. How does it work? Consider the following code:

func f() int {
    value := 1
    println(value)
    if condition {
        value := 2
        // ...
        println(value)
    }
    return value
}

This code probably has a shadowing bug. The shadow pass determines this by checking every variable to see if there’s another variable it could shadow (same name in a parent scope, with the same type), then checks to see if that potentially shadowed variable is used “after” the inner declaration, where “after” is a position check. This avoids false positives like when we don’t use the outer value later:

func f() int {
    value := 1
    println(value)
    if condition {
        value := 2
        // ...
        println(value)
    }
    // no use of "value" here!
    // ...
    return someOtherValue
}

This method is pretty good and does catch many bugs, but it’s not perfect. Take for example:

func f() int {
    value := 1
    if condition {
        value := 2
        // ...
        println(value)
        return value + 1234
    }
    return value
}

Is this a shadowing bug? I’d say it isn’t. Nothing we do in the inner scope is observable in the outer scope since we’re returning early. In different words, no use of the outer scope’s value is reachable from the shadowing in the inner scope.

Going with the (control) flow

Yes, I said it, the magic word, reachable. The thing we want to be checking is whether any assignment to the inner declaration could reach a use of the outer declaration. This is all just a dataflow analysis question in disguise.³

The existing shadow pass approximates reachability using source positions. This is fast and easy for sure; just look up the scope chain, find what you might shadow, then check if any use is “after” the shadowing.

But this had enough false positives that I wasn’t comfortable enabling it.⁴ I figured there must be a way to use a proper control-flow graph (CFG) to figure this out.

Fortunately, control-flow graphs are a well established concept and the Go tooling has ways to get them. The predominant way to do this is to use the go/ssa package, which builds a static single assignment (SSA) representation of the code. I opted not to use it, however. The SSA representation is pretty fine-grained with its own quirks and (in my opinion) better suited for more complicated analyses.⁵

Instead, I used go/cfg, which builds simple control flow graphs out of the AST.

Asking for this info is pretty straightforward; just declare that your analyzer needs it, and then grab its result.

var shadowAnalyzer = &analysis.Analyzer{
    // ...
    Requires: []*analysis.Analyzer{ctrlflow.Analyzer},
    Run: func(pass *analysis.Pass) (any, error) {
        // ...
        cfgs := pass.ResultOf[ctrlflow.Analyzer].(*ctrlflow.CFGs)
        cfg := cfgs.FuncDecl(node)
        // ...
    },
}

What does the CFG look like for our previous example? For reference, the code was:

func f() int {
    value := 1
    if condition {
        value := 2
        // ...
        println(value)
        return value + 1234
    }
    return value
}

This produces a CFG that looks like:

Expand to see the above in textual form

.0: # Body@L5
        value := 1
        condition
        succs: 1 2

.1: # IfThen@L7
        value := 2
        println(value)
        return value + 1234

.2: # IfDone@L7
        return value

This is exactly what we need! If we start at the inner declaration of value, we can see that it doesn’t flow into any use of the outer declaration, so this code is safe.

Let’s try something a little more complicated:

func f() int {
    value := 1
    for i := 0; i < N; i++ {
        value := 2
        // ...
        if i%2 == 0 {
            println("continue!")
            continue
        }
        println(value)
        return value + 1234
    }
    return value
}

Is there a shadowing bug here? Let’s consult the CFG:

Expand to see the above in textual form

.0: # Body@L5
        value := 1
        i := 0
        succs: 3

.1: # ForBody@L7
        value := 2
        i%2 == 0
        succs: 5 6

.2: # ForDone@L7
        return value

.3: # ForLoop@L7
        i < N
        succs: 1 2

.4: # ForPost@L7
        i++
        succs: 3

.5: # IfThen@L10
        println("continue!")
        succs: 4

.6: # IfDone@L10
        println(value)
        return value + 1234

If we follow the control flow from the node containing the inner declaration, we can see that there is a path to return value in the outer scope. If := was meant to be =, the behavior would change; the function would return 2 instead of 1. So, we say this is a potential shadowing bug, and so that inner variable would be best renamed.

This CFG-based approach also works nicely for other constructs as well; we can follow the inner declaration all the way through if statements, loops, switches, everything; no special casing required.

The only interesting case is function literals. Is there a shadowing bug in this code?

func f() int {
    value := 1

    callIt(func() {
        value := 2
        println(value)
    })

    return value
}

I’d say “maybe”, since swapping := to = could change the behavior of the code. The CFG wouldn’t exactly show this, since the two functions have different CFGs entirely. How we choose to relate them is pure choice! It’s actually a pretty similar problem to the infamous TypeScript issue 9998; do we consider this code to have executed immediately? Never? Potentially at any time? I chose to ignore the problem and simply disallow shadowing across function boundaries, which thankfully didn’t have too high of a false-positive rate to feel bad.

Enabling the analyzer

With all of this in place, I created a lint rule that we could add to our customlint plugin for golangci-lint, which you can see in typescript-go PR #365. Surprisingly, this didn’t negatively affect lint time. Most of the time, there isn’t shadowing, ast/inspector avoids the need to check a lot of the AST⁶, and there are some simple checks that we can do on top of that.

The PR only fixed one bug, which is pretty good, but less than I was hoping for; it turns out that people on the team had already been wasting time debugging to find these issues, so they were all pretty much fixed already. Thankfully, the rule seems to be very reliable, and no new shadowing bugs have appeared.

Anyway, hopefully that was an interesting look at one source of potential bugs in Go (especially for ported code), and how static analysis can help. There are of course other sources of bugs in Go, and I’m planning to write about how I was above to catch those too. Stay tuned for that!

I wouldn’t have used an assignment expression either. Expressions shouldn’t assign things! You cannot convince me otherwise. My brain just isn’t wired for looking for side effects in the middle of something else. Sorry (not sorry). ↩︎
Or maybe you can imagine this code should have been written with early returns too. I digress. ↩︎
There’s probably some proper way to describe this in terms of “dominators”, “dominance frontiers”, a “dominator tree”, something like that. But if I do the math, it’s been almost 10 years since I took a compiler optimization course, and I can’t say I understood the terminology back then either. So let’s just stick with “reachability”. ↩︎
I think the Go team would agree; the docs say: “[this analyzer] generates too many false positives and is not yet enabled by default”. ↩︎
I found that it modified the structure of the code enough to make it difficult to work with for this particular use, doing things like constant propagation and simplification. This is definitely great for other analyzers, but it seemed a little challenging to use in my case. It’s a shame; go/analysis allows passes to share results, so using go/ssa would have effectively been “free” via one of the other lint rules we run. Thankfully, go/cfg is plenty fast and it doesn’t seem to have much of an impact on our lint time. ↩︎
This deserves its own post, honestly. In short, the ast/inspector package (along with the inspector pass) amortizes the cost of AST walks looking for specific node kinds by walking all ASTs once, keeping track of which nodes are present in 32-bit bitmasks, which works since Go has fewer than 32 node types. Then you can repeatedly ask for certain nodes, and “is this one of the nodes I need” is just a bitwise OR. I briefly thought “oh man, can we do this in TypeScript?”, but then I realized that our AST has 349 node kinds, way more than could fit in a single integer. Drat. It turns out that simple languages are in fact faster to analyze (shocker). ↩︎

Enter the Shadow Realm#

A little go/analysis goes a long way#

Going with the (control) flow#

Enabling the analyzer#

Enter the Shadow Realm

A little `go/analysis` goes a long way

Going with the (control) flow

Enabling the analyzer