Avoiding Reflection (And Such) In Go

2015-12-03 (Last Modified: 2021-05-11)

So, as previous posts show, I like Go well enough. But as a computer-language polyglot, "Go programmer" is not part of my identity, and I try for a balanced view. It is very human to go overboard in both praise and criticism, both easy to find online.

I open with this because this post will take a, ahh, let's say nuanced position on Go, in that it is going to agree a bit with both sides. Go's type system is weak and there are cases where you can only accomplish something via interface{}, reflection, copy/paste, or code generation... but in a rush to talk about what Go doesn't have, what it does have is too often neglected. When using a tool to solve problems what matters is whether there exists a good solution with that tool, not whether a direct port of a good solution from some other tool works. There certainly are real problems that lack good solutions in Go, but that set of problems is smaller than is sometimes supposed.

Some languages provide many power tools, and it is easy to use a bit of one and a bit of another without ever using any individual tool very deeply. Go provides you only a few tools, so you should use each fully.

If your Go code is a horrible copy/paste disaster, you must figure out whether it is because you are not using tools fully, or if you are missing tools. If you are missing tools, use another tool set. I do not consider that any sort of "concession", because I'm not too interested in whether you use Go; I'm interested in showing people how to fully use tools. All of these techniques will work in other languages as well, with varying degrees of convenience.

The Ground Rules

Let me set the ground rules for what I'm trying to accomplish here.

I'm not out to completely avoid interface{}, reflection, copy/paste, or code generation in every way. In particular, I really don't care if a library wraps some code that uses any of them, but offers a type-safe public API.

It's like a Haskell library that uses unsafePerformIO internally but exposes a pure interface, or a Rust library that judiciously uses unsafe code. I know that in theory you can pass something in via interface{} that causes a runtime failure, but for that to happen, you'd have to be using interface{}-typed values carelessly enough that your code can be convinced to make a mistake. If your code solely uses concrete types, this statically cannot happen. This code is still de facto statically-typed even if the compiler can't see it.

But, nevertheless, it is weaker than a language that cleanly has a type that expresses exactly what is and is not JSON-encodable. I don't deny this. I like Haskell. I like type families and the full power of lens and pipes and all that stuff. I acknowledge that these issues can compound in sufficiently large programs, but, at the same time, I know I can control a lot of these issues and that many programs will never have any units that reach that "sufficiently large" size.

So, with that in mind, what have I got up my sleeve?

Little Types

As a prerequisite for much of the following, remember how easy it is to create new types.

type MyID uint32

Done.

Even in not-Go, I like little types a lot. I think any future language that makes it much harder than that to create a new type ought to rethink things. I use a lot of little types just to prevent mixing different things together accidentally, or to prevent function signatures like

func Operate(int, int, int, int)

when I really ought to be able to say what's an ID, a width, a height, etc., right there in the type, which is the most important part of the function's documentation:

func Set(ID, Width, Height, Color)

In fact, at least in the code I write, string specifically means "I know nothing more about this than that it is a series of bytes", and only shows up when interacting with things that require exactly that semantic.

But even ignoring those things, I find little types almost invariably grow multiple useful methods in under five minutes.

Many languages have enough tools that you can get away with flinging ints and strings about. I still think it's a bad idea, but it "works". In Go you really can't afford to leave this tool unused.
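Here is a minimal sketch of what I mean, not anything canonical. Width, Height, and the Area and String methods are names I made up for illustration; the point is only that the compiler starts catching mix-ups and that the methods have an obvious home.

```go
package main

import "fmt"

// Width and Height are distinct types even though both are ints underneath.
type Width int
type Height int

// Area is a hypothetical method; little types tend to grow these quickly.
func (w Width) Area(h Height) int {
	return int(w) * int(h)
}

// String makes a Width print with its unit.
func (w Width) String() string {
	return fmt.Sprintf("%dpx", int(w))
}

func main() {
	w, h := Width(3), Height(4)
	fmt.Println(w, w.Area(h))
	// w.Area(w) would not compile: a Width is not a Height.
}
```

The conversions inside Area are the only place the underlying int shows through; callers never get the chance to swap the arguments silently.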

Composition/Decoration As A Default Must Be Learned

If you've clocked a lot of OO time, you naturally design OO solutions in terms of inheritance. Most languages syntactically privilege inheritance and make you work for composition. Go syntactically privileges composition and makes you work for inheritance.

Even with my experience in functional languages, it took me a couple of weeks of bashing against this to tune my brain to think in terms of OO!composition instead of OO!inheritance. You know you're encountering this problem when you find yourself trying to factor out some chunk of code into a small object that then has to manually be passed a pointer to the "object" containing it.

It's worth working through this process though, because the experience of the OO community in general has turned to favoring composition. This is particularly remarkable in light of the fact that most languages are still OO!inheritance, which means this had to be discovered against the grain of most existing languages of the past 30 years.

Suppose you have many different types of things that need unique IDs. Suppose each ID serializes to the type of the thing followed by the ID number, e.g., "employee/23". The natural inheritance-based OO formulation is to create either a base class or some sort of mixin of "things that have an ID", and then inherit from that class or mixin to get IDs. A naive first stab in Go could look like this:

type ID struct {
    ID uint64
}

type ThingWithID struct {
    Field1 string // some ThingWithID-local field
    ID            // "subclass" ID, right?
}

func (id *ID) GetID1() string {
    // no, wait, this doesn't work, ID doesn't get its
    // "parent" value...
}

The function signature for GetID1 is exact; it receives the *ID, precisely that, with no trace of the calling ThingWithID. No inheritance here; the type signature is WYSIWYG. So you'll naturally try a couple of other things...

func (id *ID) GetID2(thing *ThingWithID) string {
    // This doesn't work because we need to be able 
    // to work with more than just the ThingWithID, meaning
    // the ID struct would need one method per struct
    // it ends up in. This won't work at all if you want to
    // inherit from a "class" in another package,
    // because that's a guaranteed circular dependency.
}

// So, maybe declare an interface for getting the type?
type Typeable interface {
    GetType() string
}

func (twi *ThingWithID) GetType() string {
    return "thing"
}

func (id *ID) GetID3(t Typeable) string {
    return fmt.Sprintf("%s/%d", t.GetType(), id.ID)
}

// which then has to be used like:

    id := NewID()
    thing := &ThingWithID{"my value", id}

    // want the ID? what is this weird stuttering with the
    // "thing" value?
    fmt.Println("ID is:", thing.GetID3(thing))

An object call that has to pass itself as a parameter? That there is some weird stuff. Plus there's no guarantee that something embedding ID conforms to the Typeable interface.

The composition solution:

type ID struct {
    value uint64
    typ   string // previous ID did not have this ("type" itself is a Go keyword)
}

func (i *ID) GetID() string {
    return fmt.Sprintf("%s/%d", i.typ, i.value)
}

type ThingWithID struct {
    Field1 string
    ID
}

func NewID(s string) ID {
    nextValue := somethingToGetNextUint64Value()
    return ID{nextValue, s}
}

func NewThingWithID(s string) ThingWithID {
    return ThingWithID{s, NewID("thing")}
}

This makes it so that anything that embeds an ID has the GetID() call.

Note the critical thing is that the NewThingWithID constructor is responsible for initializing the ID with everything it needs to know, because once the value is constructed, it needs to be able to do its job without the "base" object. However, because the ID struct is composed in, it turns out that even if you were separating out the ID concept during a refactoring of ThingWithID, you wouldn't even need to rewrite any place you referenced twi.typ; that's still syntactically valid. The composing object still "has", in the full sense of the term, all the composed fields. (Subject to the Go visibility rules.)

To get closer yet to a "subclass", you can declare:

type HasID interface {
    GetID() string
}

and now you can refer to anything that composes in an ID as a HasID. Note the composing object gets this by doing the easiest thing to the ID that takes the fewest tokens; if you compose an ID in, no additional work is required by that struct to conform to HasID. Code that manipulates objects entirely in terms of interfaces can sort of "drop privs" on their arguments, enabling a rough & ready parametric polymorphism.
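To make that concrete, here is a small runnable sketch under my own assumptions. Employee, Invoice, Describe, and the package-level counter are invented for illustration, and the counter is deliberately naive (not safe for concurrent use).

```go
package main

import "fmt"

// ID carries everything it needs to render itself.
type ID struct {
	value uint64
	typ   string
}

func (i *ID) GetID() string {
	return fmt.Sprintf("%s/%d", i.typ, i.value)
}

// HasID is satisfied by a pointer to anything that embeds an ID.
type HasID interface {
	GetID() string
}

// nextValue is a stand-in counter for illustration only.
var nextValue uint64

func NewID(typ string) ID {
	nextValue++
	return ID{nextValue, typ}
}

// Employee and Invoice are hypothetical; neither mentions GetID.
type Employee struct {
	Name string
	ID
}

type Invoice struct {
	Total int
	ID
}

// Describe works purely in terms of the interface.
func Describe(h HasID) string {
	return "id=" + h.GetID()
}

func main() {
	e := &Employee{"Ada", NewID("employee")}
	inv := &Invoice{100, NewID("invoice")}
	fmt.Println(Describe(e), Describe(inv))
}
```

Because GetID has a pointer receiver, it is *Employee and *Invoice that satisfy HasID here, which is why the sketch passes pointers around.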

Composition also makes it easy to do things like pass environments around. In a nested series of composed structs, only the "bottom" one needs to compose in an environment. All other levels of the composed object automatically receive the full environment as local methods. Dependency injection often requires a lot of code in OO!inheritance, but despite not having it as an official "feature", in OO!composition it often has a relatively small code footprint, because the objects get the methods of the composed object automatically added. Dependency injection in OO!composition is so easy it's hardly even a "pattern", it's just a thing that happens.
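A rough sketch of the environment idea, with every name in it (Env, Logf, Base, Middle, Top) being hypothetical and chosen only for this example. Only Base names the environment; the outer structs pick up its methods through promotion.

```go
package main

import (
	"fmt"
	"strings"
)

// Env is a hypothetical environment: here just a log sink.
type Env struct {
	log strings.Builder
}

func NewEnv() *Env { return &Env{} }

func (e *Env) Logf(format string, args ...interface{}) {
	fmt.Fprintf(&e.log, format+"\n", args...)
}

// Output returns everything logged so far.
func (e *Env) Output() string { return e.log.String() }

// Base is the "bottom" struct; it is the only one that mentions Env.
type Base struct {
	*Env
}

// Middle and Top compose downward and never name Env themselves.
type Middle struct {
	Base
}

type Top struct {
	Middle
	Name string
}

func (t *Top) Run() {
	// Logf is promoted all the way up from Env.
	t.Logf("running %s", t.Name)
}

// NewTop is where the dependency gets injected.
func NewTop(env *Env, name string) *Top {
	return &Top{Middle{Base{env}}, name}
}

func main() {
	env := NewEnv()
	NewTop(env, "job-1").Run()
	fmt.Print(env.Output())
}
```

Swapping in a different Env for tests is just a matter of passing a different value to the constructor, which is about all the "dependency injection framework" this needs.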

Finally, I find that when natively thinking in OO!composition, what are monolithic objects in OO!inheritance naturally get sharded into multiple small objects. Many classes have natural cleave planes, where inheritance is forcing together things that ought to be separate. And I find even when I don't expect to reuse a smaller chunk, I'm often doing so five minutes later anyhow. There's a reason the OO community has been moving in the direction of composition, and this is one of those places where it can genuinely be a pleasure to work in Go, because it syntactically supports this operation.

As I said before, you hear so much about what Go doesn't have that you don't often hear about what it does, and when you do hear about Go features, it's often all about the channels and interfaces. After using Go for a while, I'd submit that supporting OO!composition is every bit as important as the headline features, and a legitimate thing that Go has that few other languages do. Few languages make it this syntactically transparent. Because composition doesn't intermingle objects the way inheritance does, it allows much more interesting "chains" of objects to be created that extend each other, without the complexity exploding.

(Those with a hankering to write a new, popular language, take note that this space is still surprisingly poorly explored by current languages!)

The Bits of Generics Go Does Have

"Generics" are a large concept, which includes at least "generic data structures" and "generic algorithms". Go lacks the former, but interfaces are the latter. So, "Go doesn't have generics" is only half true.

Both the good and the bad can be seen in the standard library sort package, which provides a generic sorting algorithm to types that implement an interface that provides the operations the sort algorithm needs.

It is good that the sort routine can function through an interface. It can be profitably argued that the primary virtue of "generics" is generic algorithms rather than generic types. C++, for instance, is very big on the generic algorithms despite not having generic types at runtime.

It is bad that there's no way to implement the sort interface short of direct implementation of the relevant methods. There is no way at the Go code level to factor anything out. It's one thing to have to specify how to sort your custom tree-based data structure; one hardly expects a useful default implementation out of an imperative language for that. But having to explain how to sort arrays of this particular type of integer gets old.

I am very serious about that. It is dangerous, because cognitive energy burned on writing the repetitive code to "sort these integers" means it's that much more tempting to do something easy rather than correct. For instance, you might choose to stick to a slice of int so you can use the default IntSlice sorting, giving up the advantages of tiny types. That's bad. I take cognitive energy expenditure very seriously.
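For reference, this is roughly what the boilerplate looks like for a little type. MyIDs is a name I'm inventing here, and the Sort convenience method is my own addition, mirroring the one on sort.IntSlice.

```go
package main

import (
	"fmt"
	"sort"
)

type MyID uint32

// MyIDs implements sort.Interface; these three methods have to be
// written out again for every slice type you want to sort this way.
type MyIDs []MyID

func (s MyIDs) Len() int           { return len(s) }
func (s MyIDs) Less(i, j int) bool { return s[i] < s[j] }
func (s MyIDs) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// Sort is a convenience wrapper around sort.Sort.
func (s MyIDs) Sort() { sort.Sort(s) }

func main() {
	ids := MyIDs{42, 7, 19}
	ids.Sort()
	fmt.Println(ids)
}
```

It's only a handful of lines, but they are a handful of lines per type, and that is exactly the kind of friction that tempts you back to a bare []int.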

However, I find that being able to copy/paste the implementation of a generic interface is the exception rather than the rule. Generally when you encounter the need for a generic or templated (in the design patterns sense) algorithm, the implementation is less trivial and repetitive, since it's often on complicated objects rather than simple ints. So, if you see:

type MyData struct {
    payload interface{}
    ...
}

and you've got some method like

func (md *MyData) Output(w io.Writer) error {
    switch p := md.payload.(type) {
    case A:
        // extract some data from p
        _, err := w.Write(somedata)
        return err
    case B:
        // extract something else from p
        _, err := w.Write(someotherdata)
        return err
    // and so on
    }
    return nil
}

The act of "extracting data" will often be quite specific and non-repetitive to each type.

You should consider whether you can abstract out the act of "extracting data":

type DataWriter interface {
    WriteData(io.Writer) error
}

and rewrite your original MyData type to:

type MyData struct {
    payload DataWriter
}

As you find other operations your MyData wrapper may need to do, add them to the interface.

As will be a recurring theme in this post, this isn't new, and is good OO engineering anyhow: This is Replace Conditionals With Polymorphism. Most (but not all) case statements switching on the internal type of an interface{} should be replaced with this.
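Spelled out as a runnable sketch, assuming two made-up payload types A and B with invented fields, plus a small render helper that exists only so the result is easy to inspect:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

type DataWriter interface {
	WriteData(io.Writer) error
}

// A and B are hypothetical payload types; each knows how to write itself.
type A struct{ Name string }

func (a A) WriteData(w io.Writer) error {
	_, err := fmt.Fprintf(w, "A:%s\n", a.Name)
	return err
}

type B struct{ Count int }

func (b B) WriteData(w io.Writer) error {
	_, err := fmt.Fprintf(w, "B:%d\n", b.Count)
	return err
}

type MyData struct {
	payload DataWriter
}

// Output no longer needs a type switch.
func (md *MyData) Output(w io.Writer) error {
	return md.payload.WriteData(w)
}

// render collects the output into a string.
func render(md *MyData) (string, error) {
	var sb strings.Builder
	err := md.Output(&sb)
	return sb.String(), err
}

func main() {
	for _, md := range []*MyData{{A{"x"}}, {B{3}}} {
		if err := md.Output(os.Stdout); err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
	}
}
```

Adding a third payload type now means writing one new method next to that type, rather than hunting down every switch statement that needs another case.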

All the type-switches I still have in my code are switching on types internal to the package, often used for internal protocol types.

(As a side note, if Go ever does grow full generics, I expect it to do so by extending the half it has, rather than adding a new concept of generics on the side. It's easy to imagine a generic BalancedBinaryTree that requires its nodes to conform to a LessThan interface. I also submit that the fact that full generics could conceivably be obtained by extending what exists is further evidence that Go does have half of generics. If it had nothing, there would be no features to overlap.)

You Don't Need Permission To Use Class Methods

I have a generic "game-playing programming contest" server that I wrote. It has an "engine" that implements routines for user and team management, real-time scoring, win/lose tracking, network communication, scheduling who is playing whom, tracking how long programs take to move, and a bunch of other generic functionality not specific to a given game. It then allows you to "plug in" a game to play, which need only contain the game logic and use highly abstracted message passing to the clients.

The Engine creates millions of Game objects in the contest, so that the Engine can run the .Play() method of the Game. To ensure deterministic playback is possible by simply logging the moves each player makes, each game object type has a "seed" type that stores the full initial state of the game. This Game is of course behind an interface since the entire point of the Engine is to avoid being tied to one concrete implementation.

Someone who has spent too much time staring at the reflection module, and perhaps has heard too much about how bad Go's type system is, might be tempted to create a new game by reflecting in to the type object for the game, then reflecting in to get the seed type, then reflecting in to the seed type generically to figure out some generic way of configuring a new random seed (perhaps using the struct tags to specify things, because that seems really cool), probably with some recursive reflection if the seed or game itself has further structure, then feeding that into a new game object obtained via reflection before finally reflecting in on the .Play method to finally run the result of this mess of reflection that you probably couldn't even follow the description of without your eyes glazing over.

After this exhausting code, the programmer will probably join in on being angry at Go.

Alternatively, if you need to be able to create new instances of a given type, just... say so.

type Game interface {
    // returns a new game, generates a new "seed"
    // automatically, because a Game is what knows
    // how to do that.
    NewGame() Game

    // returns the Seed the game used; must round-trip
    // through encoding/json
    Seed() interface{}
}

Then, the call to create my GameServer looks something like:

func NewGameServer(game Game) *GameServer {
    return &GameServer{game: game}
}

and usage looks like

    gameserver := NewGameServer((*ConcreteGame)(nil))

where ConcreteGame is the type that actually implements the particular game in question (Tic-Tac-Toe vs. Chess, let's say). If all the methods are implemented on the pointer, which they probably are in this case since the methods will modify the Game, we don't even need an object, just a correctly-typed nil. (Which is perfectly legal to call methods on.)

This is, again, not new: It's a "class method". Go doesn't have "class methods"... except that of course it does, all OO languages have "class methods". A "class method" is merely a method that doesn't refer to the object's instance data. Some languages provide syntactic support for not needing an instance in hand to run class methods, but this is not required, merely convenient. And not really that much more convenient than a correctly-typed nil.

(Pardon the mismatch between "struct" and "class" here; "class methods" are an established terminology, whereas nobody's ever heard of a "struct type method".)
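A small sketch of the typed-nil trick, with the caveat that TicTacToe and the nextSeed counter are stand-ins I made up; a real game would generate its seed randomly.

```go
package main

import "fmt"

type Game interface {
	NewGame() Game
	Seed() interface{}
}

// TicTacToe is a hypothetical concrete game.
type TicTacToe struct {
	seed int
}

// nextSeed stands in for real random seed generation.
var nextSeed int

// NewGame never touches the receiver, so calling it on nil is safe.
func (*TicTacToe) NewGame() Game {
	nextSeed++
	return &TicTacToe{seed: nextSeed}
}

// Seed does read the receiver, so it needs a real instance.
func (t *TicTacToe) Seed() interface{} {
	return t.seed
}

type GameServer struct {
	game Game
}

func NewGameServer(game Game) *GameServer {
	return &GameServer{game: game}
}

func main() {
	// A correctly-typed nil is enough to reach the "class method".
	gs := NewGameServer((*TicTacToe)(nil))
	g := gs.game.NewGame()
	fmt.Println(g.Seed())
}
```

The discipline required is simply that "class methods" must not dereference the receiver; calling Seed on the nil prototype would panic.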

Similarly, there's nothing wrong with an interface having a method or three that are essentially just "getters" for some property, such as some sort of "size" or a "name". And there's a lot of metaprogramming you can do with a map[string]CanConstructInstances. I find myself particularly often using the combination of

type Registrable interface {
    Name() string
    NewInstance() Registrable
}

var registry = map[string]Registrable{}

func Register(r Registrable) {
    registry[r.Name()] = r
}

(Error handling elided for clarity.)

If you have an instance in hand, or look up an instance in a table, or in any manner lay your hands on the correct type wrapped in an interface, you can probably avoid reflection, simply by making whatever thing you wanted to obtain via reflection the result of some method in the interface.
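Here is how that registry tends to get used, as a sketch. Chess, Checkers, and the New lookup function are hypothetical names for this example only.

```go
package main

import "fmt"

type Registrable interface {
	Name() string
	NewInstance() Registrable
}

var registry = map[string]Registrable{}

func Register(r Registrable) {
	registry[r.Name()] = r
}

// New looks up a prototype by name and asks it for a fresh instance.
func New(name string) (Registrable, bool) {
	proto, ok := registry[name]
	if !ok {
		return nil, false
	}
	return proto.NewInstance(), true
}

// Chess and Checkers are hypothetical registrants. Their methods do
// not touch the receiver, so a typed nil works as the prototype.
type Chess struct{ Moves []string }

func (*Chess) Name() string             { return "chess" }
func (*Chess) NewInstance() Registrable { return &Chess{} }

type Checkers struct{ Moves []string }

func (*Checkers) Name() string             { return "checkers" }
func (*Checkers) NewInstance() Registrable { return &Checkers{} }

func init() {
	Register((*Chess)(nil))
	Register((*Checkers)(nil))
}

func main() {
	g, ok := New("chess")
	fmt.Printf("%T %v\n", g, ok)
}
```

A string from a config file or a network message picks the type, and no reflect import is anywhere in sight.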

Move the Responsibility Around

Going back to my game server example, the first Game interface I wrote looked like this:

type Seed interface{}

type Game interface {
    GenerateNewSeed() Seed
    NewGame(Seed)
}

In this case, the Seed is not just "let this be encodable to JSON"; it is the bad sort of interface{}, where the NewGame method of the relevant object will have to type assert the interface{} into the correct type. That is stupid, because a given type of Game will generate only one type of Seed.

When I first wrote the engine, I tried to abstract out the process of generating a Seed, but why was an Engine even touching a value it knew nothing about? That's a violation of the principle of least knowledge. The code was trying to tell me the Engine shouldn't be trying to route the Seed.

The answer to that was, as seen above, to move the responsibility around. The Game becomes responsible for its "Seed", the server only needs a more limited query that says "give me something serializable", and in the end, the code contains no type conversions on either side.

If you've got something that's an interface{} because it's "passing through" some code that doesn't know exactly what the value is, you have a lot of tools for instead keeping the value in code that does understand the value by shifting responsibility around. While better OO design is the best first choice, when that doesn't work, have a look at "closures", which provide a nice ability to pack up some "understanding" into some code that other things can call.
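As one possible illustration of the closure approach (this is not how my engine does it; chessSeed, Describer, NewChessFactory, and Replay are all invented for the sketch), the typed value stays inside a closure and the generic code only ever holds a function:

```go
package main

import "fmt"

// chessSeed is a hypothetical concrete seed type; the engine never sees it.
type chessSeed struct {
	firstMove string
}

type chessGame struct {
	seed chessSeed
}

func (g *chessGame) Describe() string {
	return "chess opening with " + g.seed.firstMove
}

// Describer is all the engine knows about a game in this sketch.
type Describer interface {
	Describe() string
}

// NewChessFactory returns a closure that captures the typed seed.
// The engine can hold and call it without any type assertion.
func NewChessFactory(firstMove string) func() Describer {
	seed := chessSeed{firstMove: firstMove}
	return func() Describer {
		return &chessGame{seed: seed}
	}
}

// Engine stores only the closure, not the seed.
type Engine struct {
	newGame func() Describer
}

func (e *Engine) Replay() string {
	return e.newGame().Describe()
}

func main() {
	e := &Engine{newGame: NewChessFactory("e4")}
	fmt.Println(e.Replay())
}
```

The seed never becomes an interface{} at any point; the code that understands it wrapped it up before handing anything to the code that doesn't.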

Unfortunately, if the value is ever "at rest", such as in an abstract data type where you may just want to stick a value and come back to it only much later, this isn't an option.

"Sum Types" in Go

Another tool for avoiding bare interface{} usage is something I previously wrote about already, "sum types" in Go. If you have a constrained set of messages, for instance, you can use the type system to more carefully declare them as coming from a particular set of types, not just interface{}. I've seen a number of cases in Go codebases where interface{} is used simply because "it's how you do dynamic types" when there's a perfectly sensible interface that could have been defined instead.

(I scare quote it because I know they aren't really "sum types". But it's decently close, considered across the whole of programming. Terms are often globally fuzzier than people deeply connected to only one local community think.)

While we're on the topic, don't forget interfaces can be made private to a package by making them contain an unexported (lowercase-named) method. Types in other packages cannot implement that method; even if they try to define a method of the same name, Go won't accept it as conforming to the interface. (Embedding a type that already conforms is the one way around this.) This allows you to manually strive for totality. (Perhaps a checker could be written.)
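A minimal sketch of the unexported-method trick, assuming a made-up chat protocol; Message, Join, Leave, Say, and Render are all hypothetical names.

```go
package main

import "fmt"

// Message is "sealed": isMessage is unexported, so only types in this
// package can declare it.
type Message interface {
	isMessage()
}

// Join, Leave and Say are hypothetical protocol messages.
type Join struct{ User string }
type Leave struct{ User string }
type Say struct{ User, Text string }

func (Join) isMessage()  {}
func (Leave) isMessage() {}
func (Say) isMessage()   {}

// Render handles every known Message. The default case is the manual
// part of "striving for totality": the compiler will not flag a
// missing case on its own.
func Render(m Message) string {
	switch m := m.(type) {
	case Join:
		return m.User + " joined"
	case Leave:
		return m.User + " left"
	case Say:
		return m.User + ": " + m.Text
	default:
		panic(fmt.Sprintf("unhandled message type %T", m))
	}
}

func main() {
	for _, m := range []Message{Join{"ann"}, Say{"ann", "hi"}, Leave{"ann"}} {
		fmt.Println(Render(m))
	}
}
```

Compared to passing interface{} around, a function that accepts Message at least rules out strings, ints, and everything else that was never meant to be a message.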

Use interface{} As "In" Params Only

The Atlassian Stash REST API has a recurring pattern for paged results. This is an obvious place where we'd like to abstract out the act of paging, rather than implement it once per paged API. I think this is a classic "obviously impossible to do type-safely" thing in Go. So let's do it.

I'll cut a few of the fields out and the error handling, of course, but you start with the obvious:

type Paged struct {
    Size       int  `json:"size"`
    IsLastPage bool `json:"isLastPage"`
    // other fields

    // and the payload:
    Values interface{} `json:"values"`
}

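To use it, you must give it a pointer to a value of the type that will go in to the Values when you decode the JSON from the server. (It has to be a pointer: when an interface{} holds a non-pointer value, encoding/json discards it and decodes into generic []interface{} and map[string]interface{} values instead.)

    // PullRequest defined elsewhere as the struct for
    // pull requests
    pagedPRs := &Paged{Values: &[]PullRequest{}}

    jsonStream := ... // get some JSON from wherever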
    decoder := json.NewDecoder(jsonStream)
    decoder.Decode(pagedPRs)

But if you use pagedPRs.Values once the API call is complete and it has been populated, you'll have to type-assert it. The type got lost. This turns out to be easy to fix, though. Keep your own typed variable and hand Paged a pointer to it instead:

    prs := []PullRequest{}
    pagedPRs := &Paged{Values: &prs}

    jsonStream := ... // get some JSON from wherever
    decoder := json.NewDecoder(jsonStream)
    decoder.Decode(pagedPRs)

The Paged type may have an interface{}, but here you've still got a reference to the original value; you now have the first page of the PullRequests in the prs value.
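A self-contained sketch of that decode step, using a canned JSON string in place of the server. The PullRequest fields, the samplePage contents, and the decodePage helper are all made up for the example. One detail worth calling out: encoding/json only decodes "through" an interface{} when it holds a non-nil pointer, so Values gets the address of prs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PullRequest is cut down to a single hypothetical field.
type PullRequest struct {
	ID int `json:"id"`
}

type Paged struct {
	Size       int         `json:"size"`
	IsLastPage bool        `json:"isLastPage"`
	Values     interface{} `json:"values"`
}

// decodePage decodes one page of JSON into p.
func decodePage(p *Paged, page string) error {
	return json.NewDecoder(strings.NewReader(page)).Decode(p)
}

// samplePage is invented data standing in for a server response.
const samplePage = `{"size": 2, "isLastPage": true, "values": [{"id": 7}, {"id": 9}]}`

func main() {
	prs := []PullRequest{}
	// Values must hold a pointer; with a plain slice value the decoder
	// would replace it with a generic []interface{}.
	paged := &Paged{Values: &prs}
	if err := decodePage(paged, samplePage); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(prs), prs[0].ID, prs[1].ID)
}
```

After the decode, prs is a fully typed []PullRequest and nothing in the calling code ever has to look inside paged.Values.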

It's now possible to write a .Next method for the Paged type. My local instance is tied to some other code decisions about how the API is accessed and how base URLs are managed, so it's not useful to paste in here. But here's something else that is nice to see: Once you have a .Next(), you can easily do this:

func (p *Paged) Each(f func() error) error {
    for {
        err := p.Next()
        if err != nil {
            return err
        }
        err = f()
        if err != nil {
            return err
        }
    }
}

Now you can easily write code to iterate over the returned pages of results that doesn't look half bad for a "static" language, with a bit of the actual URL munging elided:

    prs := []PullRequest{}
    paged := Paged{Values: &prs}
    // some stuff here to configure the URL you are querying

    err := paged.Each(func() error {
        for _, pr := range prs {
            fmt.Printf("PR #%d\n", pr.ID)
        }
        return nil
    })

The inner "for" loop will see every pr that matches your REST query. You can abort iteration early by returning an "error". (If you like, make this more explicit in the Each implementation, perhaps by making it eat an ErrIterationDone error and returning nil to the caller in that case, rather like Python's StopIteration exception.) You get this effect where the library using the interface{} is as type-safe as the calling code... if the calling code uses strong types, the library code is safe at run time, but if the calling code is sloppy and doesn't itself know what is inside the interface{}, the library doesn't impose any additional structure and the result is as unsafe as the calling code (but no more).

(Sidebar: Since every call to errors.New returns a distinct value, it's safe to declare ErrIterationDone = errors.New("Iteration done") once at package level, and that becomes a distinguished unique value that compares equal to itself and nothing else.)
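Putting the pieces together as a sketch: this is not my real Next, which talks to a server. Here the pages field holds canned JSON strings so the example can run on its own, and collectIDs is a made-up caller. Each treats ErrIterationDone from either Next or the callback as a normal end of iteration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

var ErrIterationDone = errors.New("iteration done")

type PullRequest struct {
	ID int `json:"id"`
}

type Paged struct {
	IsLastPage bool        `json:"isLastPage"`
	Values     interface{} `json:"values"`

	// pages is a stand-in for the REST calls: canned JSON, one per page.
	pages []string
	done  bool
}

// Next decodes the next canned page into p, or reports ErrIterationDone.
func (p *Paged) Next() error {
	if p.done || len(p.pages) == 0 {
		return ErrIterationDone
	}
	page := p.pages[0]
	p.pages = p.pages[1:]
	if err := json.Unmarshal([]byte(page), p); err != nil {
		return err
	}
	p.done = p.IsLastPage
	return nil
}

// Each calls f once per page. ErrIterationDone ends the loop quietly;
// any other error is returned to the caller.
func (p *Paged) Each(f func() error) error {
	for {
		err := p.Next()
		if err == nil {
			err = f()
		}
		if errors.Is(err, ErrIterationDone) {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// collectIDs is a hypothetical caller that gathers every ID.
func collectIDs(pages []string) ([]int, error) {
	prs := []PullRequest{}
	paged := &Paged{Values: &prs, pages: pages}
	var ids []int
	err := paged.Each(func() error {
		for _, pr := range prs {
			ids = append(ids, pr.ID)
		}
		return nil
	})
	return ids, err
}

func main() {
	ids, err := collectIDs([]string{
		`{"isLastPage": false, "values": [{"id": 1}, {"id": 2}]}`,
		`{"isLastPage": true, "values": [{"id": 3}]}`,
	})
	fmt.Println(ids, err)
}
```

The closure reads prs directly, so it sees each freshly decoded page with its real type, and returning ErrIterationDone from the closure is how a caller would bail out early.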

There is a bit of a downside, which is that the closure in the .Each function needs to be in a scope in which the type of the variable being iterated over is somehow known. This does provide a small limitation on exactly how much you can abstract this. But this is a manageable language quirk, as opposed to being unable to solve this at all without extensive copy & paste to handle the paging abstraction.

In Conclusion...

By no means do these tricks, and any other such tricks people may have, cover every case. None of these permit type-safe generic data structures. You still can't use these tricks to avoid every use of interface{}. You still can't use these tricks to avoid every use of reflection.

But on the other hand, you often have more options than may first appear.

(Especially if you're not yet thinking in composition.)

As I've said before, for instance, I would not care to write Servo in Go. I wouldn't care to write any sort of massive desktop program like an office suite. I wouldn't care to write a math-heavy program in Go. I think it's got an "OK at best" story for writing compilers. It's great that the race detector is built in and that the language affords good threading patterns, but I still wish I could get static guarantees of Erlang-level isolation between threads somehow. There are many other things I wouldn't want to use it for. I mean those statements quite deeply, and I will reiterate, I don't consider them a "concession", I consider it simply a description of the situation.

Taking a modern kitchen-sink language as an example: When you compare Scala to Go, Scala covers a lot more ground than Go. But I can't help but wonder whether either those really sophisticated features must, on average, each cover a very small space, or there must be a great deal more non-orthogonality to the features than meets the eye. Because for all the apparent difference in features, Scala doesn't necessarily cover as much more of the space as you might expect, since Go still manages to cover so much with its meager feature set. There's a lesson here for language nerds. I don't fully know what it is yet, but I'm working on it.