Reversing Go - Part 2

Reversing Golang Binaries

Recognizing common constructs


Append is implemented using growslice

Case 1 - General Use

package main

import (
    "fmt"
    "os"
    "unsafe"
)

func main() {
    t := os.Environ()
    p := append(t, "A", "B", "C")
    fmt.Printf("t = %x\np = %x\n", unsafe.Pointer(&t), unsafe.Pointer(&p))
}

Now the compiler compiles it to

.text:004A7556    lea     rax, string_autogen_SNN2L2
.text:004A755D    mov     [rsp+90h+var_90], rax
.text:004A7561    call    runtime_newobject
.text:004A7566    mov     rax, [rsp+90h+var_88]
.text:004A756B    mov     [rsp+90h+newSlice], rax
.text:004A7570    call    syscall_Environ
.text:004A7575    mov     rax, [rsp+90h+var_90]
.text:004A7579    mov     rcx, [rsp+90h+var_88]
.text:004A757E    mov     rdx, [rsp+90h+var_80]
.text:004A7583    mov     rdi, [rsp+90h+newSlice]
.text:004A7588    mov     [rdi+slice.len], rcx
.text:004A758C    mov     [rdi+slice.cap], rdx
.text:004A7590    cmp     cs:runtime_writeBarrier, 0
.text:004A7597    jnz     loc_4A77A5
.text:004A759D    mov     [rdi+slice.ptr], rax

A new slice (3 words) is malloc’d using runtime.newobject and the return value of syscall.Environ is assigned to the newly created empty slice.

Next comes the call to append.

.text:004A75A0    lea     rax, string_autogen_SNN2L2
.text:004A75A7    mov     [rsp+90h+var_90], rax
.text:004A75AB    call    runtime_newobject
.text:004A75B0    mov     rdi, [rsp+90h+var_88]
.text:004A75B5    mov     rax, [rsp+90h+newSlice]
.text:004A75BA    mov     rcx, [rax+8]              ; newSlice.len
.text:004A75BE    mov     rdx, [rax+10h]            ; newSlice.cap
.text:004A75C2    mov     rbx, [rax]                ; newSlice.ptr
.text:004A75C5    lea     rsi, [rcx+3]
.text:004A75C9    cmp     rsi, rdx
.text:004A75CC    ja      need_more_space

It checks whether the capacity of newSlice is enough to accommodate 3 more elements. If the capacity is too small, a new slice is allocated using growslice.

.text:004A775A    lea     rax, string_autogen_PMMZGP
.text:004A7761    mov     [rsp+90h+var_90], rax
.text:004A7765    mov     [rsp+90h+var_88], rbx
.text:004A776A    mov     [rsp+90h+var_80], rcx
.text:004A776F    mov     [rsp+90h+var_78], rdx
.text:004A7774    mov     [rsp+90h+var_70], rsi
.text:004A7779    call    runtime_growslice
.text:004A777E    mov     rbx, [rsp+90h+var_68] ; ptr
.text:004A7783    mov     rax, [rsp+90h+var_60] ; len
.text:004A7788    mov     rdx, [rsp+90h+var_58] ; cap
.text:004A778D    lea     rsi, [rax+3]
.text:004A7791    mov     rax, [rsp+90h+newSlice]
.text:004A7796    mov     rcx, [rsp+90h+var_40]
.text:004A779B    mov     rdi, [rsp+90h+var_38]
.text:004A77A0    jmp     append_elements

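Since we already know how to deduce parameters from the stack layout, we can say that growslice has the signature

func growslice(tp *rtype, oldSlice slice, newCap int) slice

Once there is enough capacity, the three strings are written into the slice.

.text:004A75D2    shl     rcx, 4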
.text:004A75D6    mov     qword ptr [rbx+rcx+8], 1  ; set length of new string
.text:004A75DF    lea     r8, [rbx+rcx]
.text:004A75E3    lea     r9, [rbx+rcx]
.text:004A75E7    lea     r9, [r9+10h]
.text:004A75EB    lea     r10, [rbx+rcx]
.text:004A75EF    lea     r10, [r10+20h]
.text:004A75F3    cmp     cs:runtime_writeBarrier, 0
.text:004A75FA    nop     word ptr [rax+rax+00h]
.text:004A7600    jnz     loc_4A7736
.text:004A7606    lea     r8, a5Abclmnpsz+0Dh ; "ABCLMNPSZ[\\\n\t"
.text:004A760D    mov     [rbx+rcx], r8

rbx points to the slice’s ptr and rcx contains the length of the slice, which shl rcx, 4 multiplies by 16, the size of one string. This snippet sets the word at offset len*16+8 to 1 and the word at offset len*16+0 to the address of the string “ABCL..”.

From Part-1 we know that a string has two words - ptr and len. So, here we are assigning a string of length 1 to the index stored in rcx. A string slice of n+1 elements looks something like this -

+00 ptr -> [ptr[0], len[0], ptr[1], len[1], ..., ptr[n], len[n]]
+08 len
+10 cap

.text:004A7611    mov     qword ptr [rbx+rcx+18h], 1
.text:004A761A    cmp     cs:runtime_writeBarrier, 0
.text:004A7621    jnz     loc_4A7719
.text:004A7627    lea     r8, a5Abclmnpsz+0Eh ; "BCLMNPSZ[\\\n\t"
.text:004A762E    mov     [rbx+rcx+10h], r8

At index rcx+1, it assigns the string “B”

.text:004A7633    mov     qword ptr [rbx+rcx+28h], 1
.text:004A763C    cmp     cs:runtime_writeBarrier, 0
.text:004A7643    jnz     loc_4A76FF
.text:004A7649    lea     r8, a5Abclmnpsz+0Fh ; "CLMNPSZ[\\\n\t"
.text:004A7650    mov     [rbx+rcx+20h], r8

At index rcx+2, it assigns the string “C”
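The 16-byte stride between consecutive elements in these listings follows from the size of a string (two words). A minimal sketch that measures it from Go, assuming the gc layout; strideWords and stringWords are hypothetical helper names:

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringWords reports how many machine words one string occupies.
func stringWords() uintptr {
	return unsafe.Sizeof("") / unsafe.Sizeof(uintptr(0))
}

// strideWords measures the distance between s[0] and s[1] in machine words.
// s must have at least two elements.
func strideWords(s []string) uintptr {
	a := uintptr(unsafe.Pointer(&s[0]))
	b := uintptr(unsafe.Pointer(&s[1]))
	return (b - a) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := []string{"A", "B", "C"}
	fmt.Println(stringWords()) // ptr and len
	fmt.Println(strideWords(s))
}
```

On a 64-bit target two words are 16 bytes, which is why the element at index i lives at ptr + i*16 and its length at ptr + i*16 + 8.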


Summarizing,

newSlice = append(newSlice, oldSlice...)

is compiled to

oldLen := len(newSlice)
newLen := oldLen + len(oldSlice)
if newLen > cap(newSlice) {
    // allocate a larger slice
    newSlice = growslice(newSlice, newLen)
}
newSlice[oldLen] = oldSlice[0]
newSlice[oldLen+1] = oldSlice[1]
// ...
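The same steps can be written as ordinary Go. This is a hand-written sketch, not the runtime’s code: appendStrings is a hypothetical name, and the doubling policy is an assumption made for illustration (runtime.growslice chooses capacities differently).

```go
package main

import "fmt"

// appendStrings mimics the check / grow / store sequence of a lowered append.
func appendStrings(dst []string, src ...string) []string {
	oldLen := len(dst)
	newLen := oldLen + len(src)
	if newLen > cap(dst) {
		// Slow path: allocate a larger backing array and copy the old elements.
		// Doubling is an assumed policy, not what growslice really does.
		newCap := 2 * cap(dst)
		if newCap < newLen {
			newCap = newLen
		}
		grown := make([]string, oldLen, newCap)
		copy(grown, dst)
		dst = grown
	}
	// Fast path: extend the length and store each element in turn.
	dst = dst[:newLen]
	for i, s := range src {
		dst[oldLen+i] = s
	}
	return dst
}

func main() {
	t := make([]string, 1, 8)
	t[0] = "x"
	p := appendStrings(t, "A", "B", "C") // fits in the existing capacity
	fmt.Println(p, len(p), cap(p))

	q := appendStrings([]string{"x"}, "A", "B", "C") // has to grow
	fmt.Println(q, len(q), cap(q))
}
```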

Case 2 - A special case of append

We know that make calls a specialized implementation depending on the type it is used with. If we call make with a slice type, the compiler calls runtime.makeslice; for channels, it uses runtime.makechan; and for maps, runtime.makemap.

Let’s consider the following snippet

// ...
a = append(a, make([]string, 1024)...)

Does Go call makeslice and then growslice? No, it doesn’t. If you look at the implementation of makeslice, you will find that it calls mallocgc with the third param (zero out the allocated memory) set to true. So, makeslice always returns a zeroed out slice, which means that in this case we are appending zeros to a. The compiler knows this, so it removes the call to makeslice. Why?

append calls growslice, which in turn calls mallocgc to allocate memory if the capacity is insufficient. So, instead of two calls to mallocgc, at most one call is required.

.text:004A707C    lea     rax, string_autogen_I57PDL
.text:004A7083    mov     [rsp+0A0h+var_A0], rax
.text:004A7087    call    runtime_newobject
.text:004A708C    mov     rdi, [rsp+0A0h+var_98]
.text:004A7091    mov     [rsp+0A0h+p], rdi
.text:004A7096    mov     rax, [rsp+0A0h+t]
.text:004A709B    mov     rcx, [rax]        ; ptr
.text:004A709E    mov     rdx, [rax+8]      ; len
.text:004A70A2    mov     rbx, [rax+10h]    ; cap
.text:004A70A6    lea     rsi, [rdx+1024]   ; just increase the length
.text:004A70AD    mov     [rsp+0A0h+var_48], rsi
.text:004A70B2    cmp     rsi, rbx
.text:004A70B5    ja      need_more_space       ; growslice

If growslice is not called, it zeroes out the part of the slice that needs to be appended - 1024*16 bytes starting from len(a).

.text:004A70C0    cmp     r8, rcx
.text:004A70C3    jz      clear_memory
; ...
.text:004A7189    shl     rdx, 4
.text:004A718D    lea     rax, [rcx+rdx]
.text:004A7191    mov     [rsp+0A0h+var_A0], rax
.text:004A7195    mov     [rsp+0A0h+var_98], 4000h  ; clear 16*1024 bytes
.text:004A719E    xchg    ax, ax
.text:004A71A0    call    runtime_memclrHasPointers

From the docs,

memclrHasPointers clears n bytes of typed memory starting at ptr. The caller must ensure that the type of the object at ptr has pointers, usually by checking typ.ptrdata. However, ptr does not have to point to the start of the allocation.

This makes sense, since a string is composed of a pointer and its length.

Summarizing, we have

// ...
a = append(a, make([]string, N)...)

is implemented by

oldLen := len(a)
newLen := N + oldLen
ptr := &a[0]
if newLen > cap(a) {
    a = growslice(a, newLen)
}
if ptr == &a[0] {
    memset(a[oldLen:], 0, N*sizeof(a[0]))
    // of course memset is not there in go
    // equivalent would be memclr family of functions
}
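The clearing step is observable from plain Go: when the capacity is already sufficient, the appended region must read as zero values even if the backing array still holds old data. A small sketch (extendZero is a hypothetical helper name):

```go
package main

import "fmt"

// extendZero appends n zero-valued strings to a, using the
// append(a, make([]string, n)...) pattern discussed above.
func extendZero(a []string, n int) []string {
	return append(a, make([]string, n)...)
}

func main() {
	// A backing array with spare capacity that still holds stale data.
	backing := []string{"a", "b", "stale", "stale", "stale"}
	a := backing[:2]

	// 2 + 3 fits in cap(a) == 5, so append reuses the backing array.
	a = extendZero(a, 3)

	fmt.Printf("%q\n", a)
	fmt.Println(&a[0] == &backing[0]) // same backing array
}
```

Because the array is reused, the stale entries have to be cleared in place, which is the job of the memclr call in the listing.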

Strings, Bytes and Runes

Rune to String

a = string(rune(R))

is compiled to

a = intstring(nil, R)
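At the source level the conversion produces the UTF-8 encoding of the code point, and invalid code points become U+FFFD. A short sketch (runeToString is a hypothetical wrapper that keeps the rune from being a compile-time constant):

```go
package main

import "fmt"

// runeToString performs a rune-to-string conversion on a non-constant rune.
func runeToString(r rune) string {
	return string(r)
}

func main() {
	fmt.Printf("%q\n", runeToString(0x41))   // a one-byte encoding
	fmt.Printf("%q\n", runeToString(0x20AC)) // the euro sign, three bytes in UTF-8
	fmt.Printf("%q\n", runeToString(-1))     // invalid code point, becomes U+FFFD
}
```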

Slice of Runes/Bytes to string

for runes,

var t []rune
// ...
a = string(t)

compiles to

a = slicerunetostring(tmpBufPtr, t)

for bytes,

var t []byte
// ...
a = string(t)

compiles to

a = slicebytetostring(tmpBufPtr, t)

Here, tmpBufPtr is a pointer to a 32-byte array on the stack if the resulting string does not escape to the heap. If the result escapes to the heap, tmpBufPtr is nil.
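The two situations look like this in source form. This is a sketch under assumptions: countO and toString are hypothetical functions, and whether the stack buffer is really used depends on the compiler version and its escape analysis (go build -gcflags=-m prints the escape decisions).

```go
package main

import "fmt"

// countO uses the converted string only locally, so escape analysis may
// allow the runtime to build it in a small stack buffer.
func countO(b []byte) int {
	s := string(b)
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == 'o' {
			n++
		}
	}
	return n
}

// toString returns the converted string, so the result escapes and has
// to be allocated on the heap.
func toString(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("I love Go")
	s := toString(b)
	b[0] = 'X' // the string owns a copy, so s is unchanged
	fmt.Println(s, countO(b))
}
```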

String to Byte slice/Rune slice

func main() {
    fmt.Println([]byte("I love Go"))
}

is compiled to

.text:004A57C8    lea     rax, stru_4B2CE0  ; [9]byte
.text:004A57CF    mov     [rsp+68h+var_68], rax
.text:004A57D3    call    runtime_newobject
.text:004A57D8    mov     rax, [rsp+68h+var_60]
.text:004A57DD    mov     rcx, 'G evol I'
.text:004A57E7    mov     [rax], rcx
.text:004A57EA    mov     byte ptr [rax+8], 'o'
.text:004A57EE    mov     [rsp+68h+var_68], rax
.text:004A57F2    mov     [rsp+68h+var_60], 9
.text:004A57FB    mov     [rsp+68h+var_58], 9
.text:004A5804    call    runtime_convTslice

So, as you can see, an array of [9]byte is created using newobject and the string is copied into that array. I will explain the convTN family later.
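When reading such listings it helps to turn the immediates back into text. x86-64 is little-endian, so the lowest byte of the 64-bit immediate is stored first, which is why IDA renders the constant reversed. A small sketch (decodeImm is a hypothetical helper; the constant is the numeric value of the character literal shown in the listing):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeImm returns the 8 bytes that a 64-bit immediate occupies in memory
// after a little-endian store.
func decodeImm(imm uint64) string {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], imm)
	return string(b[:])
}

func main() {
	// 0x472065766F6C2049 is the value IDA displays as 'G evol I'.
	head := decodeImm(0x472065766F6C2049)
	tail := "o" // written separately by: mov byte ptr [rax+8], 'o'
	fmt.Printf("%q\n", head+tail)
}
```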

.text:004A585F    mov     rcx, cs:os_Argc
.text:004A5866    mov     rax, cs:os_Args       ; []string
.text:004A586D    test    rcx, rcx
.text:004A5870    jbe     loc_4A5916
.text:004A5876    mov     rcx, [rax]            ; os.Args[0]
.text:004A5879    mov     rax, [rax+8]          ; os.Args[0].len
.text:004A587D    mov     [rsp+68h+var_68], 0   ; tmpBufPtr
.text:004A5885    mov     [rsp+68h+var_60], rcx ; str.ptr
.text:004A588A    mov     [rsp+68h+var_58], rax ; str.len
.text:004A588F    call    runtime_stringtoslicebyte

If the string is not a literal, then stringtoslicebyte is called to get a byte slice. For runes, the corresponding function is stringtoslicerune. Here are the signatures of the functions

func stringtoslicebyte(*[32]byte, string) []byte
func stringtoslicerune(*[32]rune, string) []rune
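Seen from the source side, both conversions return a fresh copy, and the two results differ in length as soon as the string contains multi-byte characters. A minimal sketch (toBytes and toRunes are hypothetical wrappers around the conversions):

```go
package main

import "fmt"

// toBytes copies the UTF-8 bytes of s into a new slice.
func toBytes(s string) []byte { return []byte(s) }

// toRunes decodes s into a new slice of code points.
func toRunes(s string) []rune { return []rune(s) }

func main() {
	s := "h\u00e9llo" // the e-acute takes two bytes in UTF-8
	b := toBytes(s)
	r := toRunes(s)
	fmt.Println(len(b), len(r))

	b[0] = 'H' // b is a copy, so s is unaffected
	fmt.Println(s, string(b))
}
```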

convTN family of functions

from primitive types

convT16, convT32, convT64, convTstring and convTslice allocate the respective value on the heap and return a pointer to it. convT16, convT32 and convT64 are used for 16, 32 and 64 bit types.

When are these functions used? They are used when we convert a value of a primitive data type to an interface{}.

For example,

func main() {
    var tv, iv interface{}
    iv = os.Args[0]
    iv = 0xcafe
    iv = "I love Go!"
    iv = []byte("I love Rust!")
    tv = []byte("I love Go and Rust!")
    iv = tv
    // ...
}

For iv = os.Args[0], we have the following,

.text:004A57F4    mov     [rsp+0C8h+var_C8], rax
.text:004A57F8    mov     [rsp+0C8h+var_C0], rcx
.text:004A57FD    nop     dword ptr [rax]
.text:004A5800    call    runtime_convTstring
.text:004A5805    mov     rax, [rsp+0C8h+var_B8]
.text:004A580A    xorps   xmm0, xmm0
.text:004A580D    movups  [rsp+0C8h+var_18], xmm0
; make eface
.text:004A5815    lea     rcx, string_autogen_CT9221    ; *string
.text:004A581C    mov     qword ptr [rsp+0C8h+var_18], rcx
.text:004A5824    mov     qword ptr [rsp+0C8h+var_18+8], rax

convTstring copies os.Args[0] to the heap and returns a pointer to the copy (a *string). That pointer is stored as the data word of the interface{} value, next to the type word.

For literals, the interface is constructed directly from the address of the statically allocated object.

.text:004A5872    lea     rax, int
.text:004A5879    mov     qword ptr [rsp+0C8h+var_28], rax
.text:004A5881    lea     rax, qword_4E9EC0 ; 0xcafe
.text:004A5888    mov     qword ptr [rsp+0C8h+var_28+8], rax
; ...
.text:004A58D6    lea     rax, string_autogen_CT9221
.text:004A58DD    mov     qword ptr [rsp+0C8h+var_38], rax
.text:004A58E5    lea     rax, off_4EA330   ; *string
.text:004A58EC    mov     qword ptr [rsp+0C8h+var_38+8], rax
; ...
.rdata:004EA330 off_4EA330  dq offset aILoveGo  ; "I love Go!"
.rdata:004EA338             dq 0Ah

For slices, an array is constructed using newobject and then convTslice is used to construct an interface.

.text:004A592F    lea     rax, stru_4B15C0  ; *[19]byte
.text:004A5936    mov     [rsp+0C8h+var_C8], rax
.text:004A59E5    call    runtime_newobject
.text:004A59EA    mov     rax, [rsp+0C8h+var_C0]
.text:004A59EF    mov     rcx, 'G evol I'
.text:004A59F9    mov     [rax], rcx
.text:004A59FC    mov     rcx, 'a oG evo'
.text:004A5A06    mov     [rax+3], rcx
.text:004A5A0A    mov     rcx, '!tsuR dn'
.text:004A5A14    mov     [rax+0Bh], rcx
.text:004A5A18    mov     [rsp+0C8h+var_C8], rax
.text:004A5A1C    mov     [rsp+0C8h+var_C0], 13h
.text:004A5A25    mov     [rsp+0C8h+var_B8], 13h
.text:004A5A2E    call    runtime_convTslice
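From the program’s point of view, the result of these conversions is an interface value that carries its dynamic type and its own copy of the data. A short sketch (describe and boxInt are hypothetical helpers):

```go
package main

import "fmt"

// describe reports the dynamic type recorded in the interface value.
func describe(v interface{}) string {
	switch v.(type) {
	case int:
		return "int"
	case string:
		return "string"
	case []byte:
		return "[]byte"
	default:
		return "other"
	}
}

// boxInt stores x in an interface{}; the interface holds its own copy.
func boxInt(x int) interface{} { return x }

func main() {
	x := 0xcafe
	iv := boxInt(x)
	x = 1 // does not affect the boxed copy
	fmt.Println(describe(iv), iv.(int) == 0xcafe, x)
	fmt.Println(describe("I love Go!"), describe([]byte("I love Rust!")))
}
```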

from non-primitive types

Let’s consider the following snippet

type I1 interface {
	Method1()
}

type I2 interface {
	Method1()
	Method2()
}

type S struct {
	x, y int64
}

func (S) Method1() {}
func (S) Method2() {}

func main() {
	var e interface{}
	var s S
	var i1 I1
	var i2 I2
	e = s           // convT2E
	i1 = s          // convT2I
	e = i1          // no conversion
	i2 = s          // convT2I
	i1 = i2         // convI2I
	// ...
}
For e = s, we have the following,

.text:004A5976    xorps   xmm0, xmm0
.text:004A5979    movups  [rsp+0B8h+var_70], xmm0
.text:004A597E    lea     rax, main_S
.text:004A5985    mov     [rsp+0B8h+var_B8], rax
.text:004A5989    lea     rax, [rsp+0B8h+var_70]
.text:004A598E    mov     [rsp+0B8h+var_B0], rax
.text:004A5993    call    runtime_convT2Enoptr

When we assign a value of type T (which is not a 64 bit word, a slice or a string) to an interface, convT2E or convT2Enoptr is used. If T has embedded pointers, convT2E is used; otherwise, as for S here, convT2Enoptr is used. Both construct an eface (empty interface) instance from a value of type T.

func convT2E(t *_type, elem unsafe.Pointer) (e eface)

Now what if the target of the assignment is an iface (a non-empty interface, whereas eface is an empty interface)? Then convT2I is used.

Consider the statement i1 = s in the above code. We are assigning a struct instance to a non-empty interface, so convT2I (here its noptr variant, convT2Inoptr) is used.

.text:004A59F8    xorps   xmm0, xmm0
.text:004A59FB    movups  [rsp+0B8h+var_70], xmm0
.text:004A5A00    lea     rax, go_itab_main_S_main_I1
.text:004A5A07    mov     [rsp+0B8h+var_B8], rax
.text:004A5A0B    lea     rax, [rsp+0B8h+var_70]
.text:004A5A10    mov     [rsp+0B8h+var_B0], rax
.text:004A5A15    call    runtime_convT2Inoptr

convT2I converts a value of type T to a non-empty interface (an interface that declares methods).

type iface struct {
	tab  *itab
	data unsafe.Pointer
}

type eface struct {
	utype *_type
	data  unsafe.Pointer
}

func convT2E(t *_type, elem unsafe.Pointer) (e eface)
func convT2I(tab *itab, elem unsafe.Pointer) (i iface)
func convI2I(inter *interfacetype, i iface) (r iface)

The itab structure is discussed in part-1.
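The two-word shape of an empty interface can be inspected with unsafe. This is a sketch for exploration only: efaceWords, words and sameTypeWord are hypothetical names, and the code depends on the gc runtime layout, which is not part of the language specification.

```go
package main

import (
	"fmt"
	"unsafe"
)

// efaceWords is a hypothetical mirror of the two words of an interface{}.
type efaceWords struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// words reinterprets an interface{} as its raw type and data words.
func words(v *interface{}) efaceWords {
	return *(*efaceWords)(unsafe.Pointer(v))
}

// sameTypeWord reports whether two interface values share a type word.
func sameTypeWord(a, b interface{}) bool {
	return words(&a).typ == words(&b).typ
}

func main() {
	var e interface{}
	fmt.Println(unsafe.Sizeof(e) / unsafe.Sizeof(uintptr(0))) // words per interface{}
	fmt.Println(sameTypeWord("go", "rust"))                   // both hold a string
	fmt.Println(sameTypeWord("go", 0xcafe))                   // string vs int
}
```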

For the statement i1 = i2, the assignment is allowed because the methods exposed by interface I1 are contained in the interface I2. The Go compiler uses convI2I for this scenario.

.text:004A5B28    lea     rax, main_I1  ; *interfaceType
.text:004A5B2F    mov     [rsp+0B8h+var_B8], rax
.text:004A5B33    mov     rax, [rsp+0B8h+var_78]    ; i2.tab
.text:004A5B38    mov     [rsp+0B8h+var_B0], rax
.text:004A5B3D    mov     rax, [rsp+0B8h+var_60]    ; i2.data
.text:004A5B42    mov     [rsp+0B8h+var_A8], rax
.text:004A5B47    call    runtime_convI2I

convI2I takes the target interfaceType, and the interface (iface) we want to convert and returns the iface pointing to the target type. How does convI2I do it?

func convI2I(inter *interfacetype, i iface) (r iface) {
	tab := i.tab
	if tab == nil {
		return
	}
	if tab.inter == inter {
		r.tab = tab
		r.data = i.data
		return
	}
	r.tab = getitab(inter, tab._type, false)
	r.data = i.data
	return
}

If we are assigning between interfaces of the same type, like some instance of I1 to I1, then the itab is retained. Otherwise, getitab searches the global table of itabs (itabTable) for an entry whose interface type is the one we want to convert to (i1’s type) and whose underlying type is the one we are converting from (i2’s underlying type).

In this example, the underlying type of i2 is the struct S, and the interface type of i1 is the interfaceType I1. getitab searches for an itab with interface type I1 and underlying type S and returns a pointer to it. It must return go_itab_main_S_main_I1, the itab we saw in the convT2I call, since this is the itab that satisfies the conditions.
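The source-level effect of an interface-to-interface conversion can be sketched as follows. The method sets of I1 and I2 are assumptions chosen so that I1’s methods are a subset of I2’s, and narrow is a hypothetical helper. Which runtime function the compiler emits for the conversion depends on the Go version.

```go
package main

import "fmt"

type I1 interface{ Method1() }

type I2 interface {
	Method1()
	Method2()
}

type S struct{ x, y int64 }

func (S) Method1() {}
func (S) Method2() {}

// narrow converts an I2 to an I1. The interface type changes, while the
// underlying value stays the same.
func narrow(i2 I2) I1 { return i2 }

func main() {
	var i2 I2 = S{x: 1, y: 2}
	i1 := narrow(i2)

	// The dynamic type and value survive the conversion.
	s, ok := i1.(S)
	fmt.Println(ok, s.x, s.y)

	// Going back to I2 needs a runtime check, because I1 alone does not
	// promise Method2.
	_, ok = i1.(I2)
	fmt.Println(ok)
}
```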

What did we learn?

  1. append function
  2. type conversions