Recognizing common constructs
Append
append is implemented with the help of runtime.growslice, which is called only when the existing capacity is not sufficient.
Case 1 - General Use
package main

import (
	"fmt"
	"os"
	"unsafe"
)

func main() {
	t := os.Environ()
	p := append(t, "A", "B", "C")
	fmt.Printf("t = %x\np = %x\n", unsafe.Pointer(&t), unsafe.Pointer(&p))
}
Now the compiler compiles it to
.text:004A7556 lea rax, string_autogen_SNN2L2
.text:004A755D mov [rsp+90h+var_90], rax
.text:004A7561 call runtime_newobject
.text:004A7566 mov rax, [rsp+90h+var_88]
.text:004A756B mov [rsp+90h+newSlice], rax
.text:004A7570 call syscall_Environ
.text:004A7575 mov rax, [rsp+90h+var_90]
.text:004A7579 mov rcx, [rsp+90h+var_88]
.text:004A757E mov rdx, [rsp+90h+var_80]
.text:004A7583 mov rdi, [rsp+90h+newSlice]
.text:004A7588 mov [rdi+slice.len], rcx
.text:004A758C mov [rdi+slice.cap], rdx
.text:004A7590 cmp cs:runtime_writeBarrier, 0
.text:004A7597 jnz loc_4A77A5
.text:004A759D mov [rdi+slice.data], rax
A new slice (3 words) is allocated using runtime.newobject, and the return value of syscall.Environ is assigned to the newly created empty slice.
Next comes the call to append.
.text:004A75A0 lea rax, string_autogen_SNN2L2
.text:004A75A7 mov [rsp+90h+var_90], rax
.text:004A75AB call runtime_newobject
.text:004A75B0 mov rdi, [rsp+90h+var_88]
.text:004A75B5 mov rax, [rsp+90h+newSlice]
.text:004A75BA mov rcx, [rax+8] ; newSlice.len
.text:004A75BE mov rdx, [rax+10h] ; newSlice.cap
.text:004A75C2 mov rbx, [rax] ; newSlice.ptr
.text:004A75C5 lea rsi, [rcx+3]
.text:004A75C9 cmp rsi, rdx
.text:004A75CC ja need_more_space
It checks whether the capacity of newSlice is enough to accommodate 3 more elements. If the capacity is too small, a larger slice is allocated using growslice:
.text:004A775A lea rax, string_autogen_PMMZGP
.text:004A7761 mov [rsp+90h+var_90], rax
.text:004A7765 mov [rsp+90h+var_88], rbx
.text:004A776A mov [rsp+90h+var_80], rcx
.text:004A776F mov [rsp+90h+var_78], rdx
.text:004A7774 mov [rsp+90h+var_70], rsi
.text:004A7779 call runtime_growslice
.text:004A777E mov rbx, [rsp+90h+var_68] ; ptr
.text:004A7783 mov rax, [rsp+90h+var_60] ; len
.text:004A7788 mov rdx, [rsp+90h+var_58] ; cap
.text:004A778D lea rsi, [rax+3]
.text:004A7791 mov rax, [rsp+90h+newSlice]
.text:004A7796 mov rcx, [rsp+90h+var_40]
.text:004A779B mov rdi, [rsp+90h+var_38]
.text:004A77A0 jmp append_elements
From the arguments placed on the stack before the call we can deduce the parameters, so we can say that growslice has the signature
func growslice(tp *rtype, oldSlice slice, newCap int) slice
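The capacity check that guards the growslice call can also be observed from ordinary Go code. The following is a minimal sketch, not the runtime's logic: the helper names backing and reallocated are made up for illustration, and the sketch only detects whether append moved the backing array.

```go
package main

import "fmt"

// backing returns a pointer to the first slot of the backing array,
// or nil when the slice has no capacity at all. (Hypothetical helper.)
func backing(s []string) *string {
	if cap(s) == 0 {
		return nil
	}
	return &s[:cap(s)][0]
}

// reallocated reports whether appending extra to s moved the data to a
// new backing array, which is what happens on the growslice path.
func reallocated(s []string, extra ...string) bool {
	out := append(s, extra...)
	return backing(out) != backing(s)
}

func main() {
	roomy := make([]string, 2, 8) // len+3 <= cap: the fast path is taken
	full := make([]string, 2, 2)  // len+3 > cap: the slice has to grow

	fmt.Println(reallocated(roomy, "A", "B", "C")) // false
	fmt.Println(reallocated(full, "A", "B", "C"))  // true
}
```

When the capacity is sufficient, the three elements are written in place, which matches the fall-through branch in the disassembly.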
.text:004A75D2 shl rcx, 4
.text:004A75D6 mov qword ptr [rbx+rcx+8], 1 ; set length of new string
.text:004A75DF lea r8, [rbx+rcx]
.text:004A75E3 lea r9, [rbx+rcx]
.text:004A75E7 lea r9, [r9+10h]
.text:004A75EB lea r10, [rbx+rcx]
.text:004A75EF lea r10, [r10+20h]
.text:004A75F3 cmp cs:runtime_writeBarrier, 0
.text:004A75FA nop word ptr [rax+rax+00h]
.text:004A7600 jnz loc_4A7736
.text:004A7606 lea r8, a5Abclmnpsz+0Dh ; "ABCLMNPSZ[\\\n\t"
.text:004A760D mov [rbx+rcx], r8
rbx points to the slice’s ptr, and rcx contains the length of the slice. The shl rcx, 4 multiplies that length by 16, the size of one string header. This snippet sets the word at offset rcx*16+8 to 1 and the word at offset rcx*16+0 to the address of the string “ABCL..”.
From Part-1 we know that a string has two words: ptr and len. So here we are assigning a string of length 1 to the index stored in rcx. A string slice of n+1 elements looks something like this:
+00 ptr -> [ptr[0], len[0], ptr[1], len[1], ..., ptr[n], len[n]]
+08 len
+10 cap
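The word counts in this layout can be checked with unsafe.Sizeof. This is a small sketch; the helper names stringWords, sliceWords and elemOffset are invented for the example, and the results are expressed in machine words so that they do not depend on a 64-bit target.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one machine word on the current platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// stringWords reports how many machine words a string header occupies.
func stringWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / wordSize
}

// sliceWords reports how many machine words a slice header occupies.
func sliceWords() uintptr {
	var s []string
	return unsafe.Sizeof(s) / wordSize
}

// elemOffset is the byte offset of element i inside the backing array of
// a []string; on amd64 this is the i*16 seen in the disassembly.
func elemOffset(i int) uintptr {
	var s string
	return uintptr(i) * unsafe.Sizeof(s)
}

func main() {
	fmt.Println("string header words:", stringWords()) // ptr, len
	fmt.Println("slice header words: ", sliceWords())  // ptr, len, cap
	fmt.Println("offset of element 2:", elemOffset(2))
}
```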
.text:004A7611 mov qword ptr [rbx+rcx+18h], 1
.text:004A761A cmp cs:runtime_writeBarrier, 0
.text:004A7621 jnz loc_4A7719
.text:004A7627 lea r8, a5Abclmnpsz+0Eh ; "BCLMNPSZ[\\\n\t"
.text:004A762E mov [rbx+rcx+10h], r8
At index rcx+1, it assigns the string “B”.
.text:004A7633 mov qword ptr [rbx+rcx+28h], 1
.text:004A763C cmp cs:runtime_writeBarrier, 0
.text:004A7643 jnz loc_4A76FF
.text:004A7649 lea r8, a5Abclmnpsz+0Fh ; "CLMNPSZ[\\\n\t"
.text:004A7650 mov [rbx+rcx+20h], r8
At index rcx+2, it assigns the string “C”.
Summarizing,
newSlice = append(newSlice, oldSlice...)
is compiled to
oldLen := len(newSlice)
newLen := oldLen + len(oldSlice)
if newLen > cap(newSlice) {
	// allocate a larger slice
	newSlice = growslice(newSlice, newLen)
}
newSlice[oldLen] = oldSlice[0]
newSlice[oldLen+1] = oldSlice[1]
// ...
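The summary above can be turned into runnable Go. The sketch below is only an approximation of the shape of the compiled code: appendManual is a hypothetical name, and the doubling growth policy is an assumption made for the example, not the policy that runtime.growslice actually uses.

```go
package main

import "fmt"

// appendManual mimics the three steps seen in the disassembly:
// capacity check, optional reallocation, then element-by-element stores.
func appendManual(dst []string, src ...string) []string {
	oldLen := len(dst)
	newLen := oldLen + len(src)
	if newLen > cap(dst) {
		// Stand-in for growslice: allocate a larger array and copy.
		// Doubling is an assumed policy, chosen only for illustration.
		newCap := 2 * cap(dst)
		if newCap < newLen {
			newCap = newLen
		}
		grown := make([]string, oldLen, newCap)
		copy(grown, dst)
		dst = grown
	}
	dst = dst[:newLen]
	for i, s := range src {
		dst[oldLen+i] = s // one ptr store and one len store per string
	}
	return dst
}

func main() {
	t := []string{"HOME=/root"}
	p := appendManual(t, "A", "B", "C")
	fmt.Println(len(p), p[1], p[2], p[3]) // 4 A B C
}
```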
Case 2 - A special case of append
We know that make dispatches to a specialized implementation depending on the type it is used with. If we call make with a slice type, the compiler calls runtime.makeslice; for channels it uses runtime.makechan, and for maps runtime.makemap.
Let’s consider the following snippet
// ...
a = append(a, make([]string, 1024)...)
Does Go call makeslice and then growslice? No, it doesn’t. If you look at the implementation of makeslice, you will find that it calls mallocgc with the third parameter (zero out the allocated memory) set to true. So makeslice always returns a zeroed-out slice, which means that in this case we are appending zeros to a. The compiler knows this and removes the call to makeslice. Why? append calls growslice, which in turn calls mallocgc to allocate memory if the capacity is too small. So instead of two calls to mallocgc, at most one call is required.
.text:004A707C lea rax, string_autogen_I57PDL
.text:004A7083 mov [rsp+0A0h+var_A0], rax
.text:004A7087 call runtime_newobject
.text:004A708C mov rdi, [rsp+0A0h+var_98]
.text:004A7091 mov [rsp+0A0h+p], rdi
.text:004A7096 mov rax, [rsp+0A0h+t]
.text:004A709B mov rcx, [rax] ; ptr
.text:004A709E mov rdx, [rax+8] ; len
.text:004A70A2 mov rbx, [rax+10h] ; cap
.text:004A70A6 lea rsi, [rdx+1024] ; just increase the length
.text:004A70AD mov [rsp+0A0h+var_48], rsi
.text:004A70B2 cmp rsi, rbx
.text:004A70B5 ja need_more_space ; growslice
If growslice is not called, it zeroes out the part of the slice that needs to be appended: 1024*16 bytes starting from index len(a).
.text:004A70C0 cmp r8, rcx
.text:004A70C3 jz clear_memory
; ...
clear_memory:
.text:004A7189 shl rdx, 4
.text:004A718D lea rax, [rcx+rdx]
.text:004A7191 mov [rsp+0A0h+var_A0], rax
.text:004A7195 mov [rsp+0A0h+var_98], 4000h ; clear 16*1024 bytes
.text:004A719E xchg ax, ax
.text:004A71A0 call runtime_memclrHasPointers
From the docs,
memclrHasPointers clears n bytes of typed memory starting at ptr. The caller must ensure that the type of the object at ptr has pointers, usually by checking typ.ptrdata. However, ptr does not have to point to the start of the allocation.
This makes sense, since a string is composed of a pointer and its length.
Summarizing, we have
// ...
a = append(a, make([]string, N)...)
is implemented by
oldLen := len(a)
newLen := N + oldLen
ptr := &a[0]
if newLen > cap(a) {
	a = growslice(a, newLen)
}
if ptr == &a[0] {
	memset(a[oldLen:], N*sizeof(a[0]))
	// of course memset is not there in go
	// equivalent would be memclr family of functions
}
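A runnable approximation of this pattern is sketched below. extendZero is a hypothetical helper written for illustration; it shows why the in-place branch has to clear memory explicitly, since the spare capacity may still hold old values.

```go
package main

import "fmt"

// extendZero behaves like append(a, make([]string, n)...) without building
// the temporary slice.
func extendZero(a []string, n int) []string {
	oldLen := len(a)
	newLen := oldLen + n
	if newLen > cap(a) {
		// Growing path: freshly allocated memory is already zeroed.
		grown := make([]string, newLen)
		copy(grown, a)
		return grown
	}
	// In-place path: spare capacity may contain stale data, so it is
	// cleared explicitly (the memclrHasPointers call in the disassembly).
	a = a[:newLen]
	for i := oldLen; i < newLen; i++ {
		a[i] = ""
	}
	return a
}

func main() {
	a := make([]string, 1, 4)
	a[0] = "x"
	a[:2][1] = "stale" // leave old data beyond len(a) but within cap(a)

	a = extendZero(a, 2)
	fmt.Printf("%q\n", a) // ["x" "" ""]
}
```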
Strings, Bytes and Runes
Rune to String
a = string(rune(R))
is compiled to
a = intstring(nil, R)
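At the source level this conversion produces the UTF-8 encoding of the code point. A short sketch follows; runeToString is just a wrapper name chosen for the example, and the replacement-character behaviour for invalid values comes from the Go language specification.

```go
package main

import "fmt"

// runeToString performs the conversion that the compiler lowers to a
// call to runtime.intstring when the value is not a constant.
func runeToString(r rune) string {
	return string(r)
}

func main() {
	fmt.Println(runeToString(65))            // A
	fmt.Println(len(runeToString(0x4E16)))   // 3, the UTF-8 length in bytes
	fmt.Printf("%q\n", runeToString(-1))     // invalid code points become U+FFFD
}
```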
Array of Runes/Bytes to string
For runes,
var t []rune
// ...
a = string(t)
compiles to
a = slicerunetostring(tmpBufPtr, t)
For bytes,
var t []byte
// ...
a = string(t)
compiles to
a = slicebytetostring(tmpBufPtr, t)
Here, tmpBufPtr is a pointer to a 32-byte buffer on the stack if the resulting string does not escape to the heap. If the result escapes, tmpBufPtr is nil and the runtime allocates the storage itself.
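Both runtime helpers copy the data, which is why the resulting string stays unchanged when the source slice is modified afterwards. A minimal sketch, with wrapper names chosen only for this example:

```go
package main

import "fmt"

// bytesToString is lowered to runtime.slicebytetostring.
func bytesToString(b []byte) string { return string(b) }

// runesToString is lowered to runtime.slicerunetostring.
func runesToString(r []rune) string { return string(r) }

func main() {
	b := []byte("Go")
	s := bytesToString(b)
	b[0] = 'N' // mutate the source after the conversion

	fmt.Println(s, string(b))                    // Go No
	fmt.Println(runesToString([]rune{'H', 'i'})) // Hi
}
```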
String to Byte array/Rune array
func main() {
	fmt.Println([]byte("I love Go"))
	fmt.Println([]byte(os.Args[0]))
}
.text:004A57C8 lea rax, stru_4B2CE0 ; [9]byte
.text:004A57CF mov [rsp+68h+var_68], rax
.text:004A57D3 call runtime_newobject
.text:004A57D8 mov rax, [rsp+68h+var_60]
.text:004A57DD mov rcx, 'G evol I'
.text:004A57E7 mov [rax], rcx
.text:004A57EA mov byte ptr [rax+8], 'o'
.text:004A57EE mov [rsp+68h+var_68], rax
.text:004A57F2 mov [rsp+68h+var_60], 9
.text:004A57FB mov [rsp+68h+var_58], 9
.text:004A5804 call runtime_convTslice
So, as you can see, an array of type [9]byte is created using newobject and the string is copied into that array. I will explain the convTN family later.
.text:004A585F mov rcx, cs:os_Argc
.text:004A5866 mov rax, cs:os_Args ; []string
.text:004A586D test rcx, rcx
.text:004A5870 jbe loc_4A5916
.text:004A5876 mov rcx, [rax] ; os.Args[0]
.text:004A5879 mov rax, [rax+8] ; os.Args[0].len
.text:004A587D mov [rsp+68h+var_68], 0 ; tmpBufPtr
.text:004A5885 mov [rsp+68h+var_60], rcx ; str.ptr
.text:004A588A mov [rsp+68h+var_58], rax ; str.len
.text:004A588F call runtime_stringtoslicebyte
If the string is not a literal, stringtoslicebyte is called to get a byte slice. For runes, the corresponding function is stringtoslicerune. Here are the signatures of the functions:
func stringtoslicebyte(*[32]byte, string) []byte
func stringtoslicerune(*[32]rune, string) []rune
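The difference between the two conversions is easy to see with a string that contains a multi-byte character. This is a short sketch; toBytes and toRunes are wrapper names introduced for the example.

```go
package main

import "fmt"

// toBytes is lowered to runtime.stringtoslicebyte for non-constant strings.
func toBytes(s string) []byte { return []byte(s) }

// toRunes is lowered to runtime.stringtoslicerune.
func toRunes(s string) []rune { return []rune(s) }

func main() {
	s := "h\u00e9llo" // the second character needs two bytes in UTF-8

	b := toBytes(s)
	r := toRunes(s)
	fmt.Println(len(b), len(r)) // 6 5

	b[0] = 'H' // the slice is a copy, so s is unaffected
	fmt.Println(s[0] == 'h')    // true
}
```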
convTN family of functions
from primitive types
convT16, convT32, convT64, convTstring and convTslice allocate the respective values on the heap and return a pointer to them. convT16, convT32 and convT64 are used for 16, 32 and 64 bit types.
When are these functions used? They are used when we convert a primitive data type to an interface{}.
For example,
func main() {
	var tv, iv interface{}
	iv = os.Args[0]
	fmt.Println(iv)
	iv = 0xcafe
	fmt.Println(iv)
	iv = "I love Go!"
	fmt.Println(iv)
	iv = []byte("I love Rust!")
	fmt.Println(iv)
	tv = []byte("I love Go and Rust!")
	fmt.Println(iv)
	iv = tv
	fmt.Println(iv)
}
.text:004A57F4 mov [rsp+0C8h+var_C8], rax
.text:004A57F8 mov [rsp+0C8h+var_C0], rcx
.text:004A57FD nop dword ptr [rax]
.text:004A5800 call runtime_convTstring
.text:004A5805 mov rax, [rsp+0C8h+var_B8]
.text:004A580A xorps xmm0, xmm0
.text:004A580D movups [rsp+0C8h+var_18], xmm0
; make eface
.text:004A5815 lea rcx, string_autogen_CT9221 ; *string
.text:004A581C mov qword ptr [rsp+0C8h+var_18], rcx
.text:004A5824 mov qword ptr [rsp+0C8h+var_18+8], rax
convTstring is used to get a pointer to a heap copy of os.Args[0] and construct the interface{} value, whose type is *string (pointer to string).
For literals, the interface is directly constructed using the address of the object:
.text:004A5872 lea rax, int
.text:004A5879 mov qword ptr [rsp+0C8h+var_28], rax
.text:004A5881 lea rax, qword_4E9EC0 ; 0xcafe
.text:004A5888 mov qword ptr [rsp+0C8h+var_28+8], rax
; ...
.text:004A58D6 lea rax, string_autogen_CT9221
.text:004A58DD mov qword ptr [rsp+0C8h+var_38], rax
.text:004A58E5 lea rax, off_4EA330 ; *string
.text:004A58EC mov qword ptr [rsp+0C8h+var_38+8], rax
; ...
.rdata:004EA330 off_4EA330 dq offset aILoveGo ; "I love Go!"
.rdata:004EA338 dq 0Ah
For slices, an array is constructed using newobject and then convTslice is used to construct an interface:
.text:004A592F lea rax, stru_4B15C0 ; *[19]byte
.text:004A5936 mov [rsp+0C8h+var_C8], rax
.text:004A59E5 call runtime_newobject
.text:004A59EA mov rax, [rsp+0C8h+var_C0]
.text:004A59EF mov rcx, 'G evol I'
.text:004A59F9 mov [rax], rcx
.text:004A59FC mov rcx, 'a oG evo'
.text:004A5A06 mov [rax+3], rcx
.text:004A5A0A mov rcx, '!tsuR dn'
.text:004A5A14 mov [rax+0Bh], rcx
.text:004A5A18 mov [rsp+0C8h+var_C8], rax
.text:004A5A1C mov [rsp+0C8h+var_C0], 13h
.text:004A5A25 mov [rsp+0C8h+var_B8], 13h
.text:004A5A2E call runtime_convTslice
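The two-word interface value built in these listings can be inspected from Go itself. The sketch below assumes the gc toolchain's internal layout (type word followed by data pointer), which is not guaranteed by the language specification; the eface struct and the helper names are defined here purely for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface mirrors the assumed in-memory layout of an empty interface:
// a pointer to the type descriptor and a pointer to the value.
type eface struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// words reinterprets an interface{} as its two raw words.
func words(v interface{}) eface {
	return *(*eface)(unsafe.Pointer(&v))
}

// stringBehind reads the string that the data word points to.
func stringBehind(v interface{}) string {
	return *(*string)(words(v).data)
}

// intBehind reads the int that the data word points to.
func intBehind(v interface{}) int {
	return *(*int)(words(v).data)
}

func main() {
	fmt.Println(stringBehind("I love Go!"))    // I love Go!
	fmt.Printf("%#x\n", intBehind(0xcafe))     // 0xcafe
	fmt.Println(words("a").typ == words("b").typ) // same type descriptor
}
```

Because the data word is always a pointer, a value such as a string or an int first has to be placed somewhere addressable, which is the job of the convT functions or of static read-only data for constants.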
from non-primitive types
Let’s consider the following snippet
type I1 interface {
	Method1()
}

type I2 interface {
	Method1()
	Method2()
}

type S struct {
	x, y int64
}

func (S) Method1() {}
func (S) Method2() {}

func main() {
	var e interface{}
	var s S
	var i1 I1
	var i2 I2
	e = s // convT2E
	fmt.Println(e)
	i1 = s // convT2I
	e = i1 // no conversion
	fmt.Println(e)
	i2 = s // convT2I
	fmt.Println(i2)
	i1 = i2 // convI2I
	fmt.Println(e)
}
For e = s, we have the following:
.text:004A5976 xorps xmm0, xmm0
.text:004A5979 movups [rsp+0B8h+var_70], xmm0
.text:004A597E lea rax, main_S
.text:004A5985 mov [rsp+0B8h+var_B8], rax
.text:004A5989 lea rax, [rsp+0B8h+var_70]
.text:004A598E mov [rsp+0B8h+var_B0], rax
.text:004A5993 call runtime_convT2Enoptr
When we assign a value of type T (which is not a 64-bit word, a slice or a string) to an interface, convT2E or convT2Enoptr is used. If T contains pointers, convT2E is used; otherwise convT2Enoptr is used, as in the listing above, since S holds only two int64 fields. Both construct an eface (empty interface) instance from a value of type T.
func convT2E(t *_type, elem unsafe.Pointer) (e eface)
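One visible consequence of this conversion is that the interface holds its own copy of the struct. The sketch below illustrates that with a hypothetical helper, boxThenMutate; the struct S repeats the definition from the snippet above so that the example is self-contained.

```go
package main

import "fmt"

// S matches the struct used in the article's snippet.
type S struct {
	x, y int64
}

// boxThenMutate stores s in an empty interface, changes s afterwards,
// and returns both the boxed copy and the current value of s.
func boxThenMutate() (boxed S, current S) {
	s := S{x: 1, y: 2}
	var e interface{} = s // the value is copied when the eface is built
	s.x = 99
	return e.(S), s
}

func main() {
	boxed, current := boxThenMutate()
	fmt.Println(boxed.x, current.x) // 1 99
}
```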
Now what if the target of the assignment is an iface (a non-empty interface; eface is the empty interface)? Then convT2I is used.
Consider the statement i1 = s in the above code. We are assigning a struct instance to a non-empty interface, so convT2I (here its noptr variant, convT2Inoptr) is used:
.text:004A59F8 xorps xmm0, xmm0
.text:004A59FB movups [rsp+0B8h+var_70], xmm0
.text:004A5A00 lea rax, go_itab_main_S_main_I1
.text:004A5A07 mov [rsp+0B8h+var_B8], rax
.text:004A5A0B lea rax, [rsp+0B8h+var_70]
.text:004A5A10 mov [rsp+0B8h+var_B0], rax
.text:004A5A15 call runtime_convT2Inoptr
convT2I converts a value of type T to a non-empty interface (an interface with a non-empty method set).
type iface struct {
	tab  *itab
	data unsafe.Pointer
}

type eface struct {
	utype *_type
	data  unsafe.Pointer
}

func convT2E(t *_type, elem unsafe.Pointer) (e eface)
func convT2I(tab *itab, elem unsafe.Pointer) (i iface)
func convI2I(inter *interfacetype, i iface) (r iface)
The itab structure is discussed in part-1.
The statement i1 = i2 is allowed because the methods exposed by interface I1 are contained in interface I2. The Go compiler uses convI2I for this scenario:
.text:004A5B28 lea rax, main_I1 ; *interfaceType
.text:004A5B2F mov [rsp+0B8h+var_B8], rax
.text:004A5B33 mov rax, [rsp+0B8h+var_78] ; iface.tab
.text:004A5B38 mov [rsp+0B8h+var_B0], rax
.text:004A5B3D mov rax, [rsp+0B8h+var_60] ; iface.data
.text:004A5B42 mov [rsp+0B8h+var_A8], rax
.text:004A5B47 call runtime_convI2I
convI2I takes the target interfaceType and the interface (iface) we want to convert, and returns an iface pointing to the target type. How does convI2I do it?
func convI2I(inter *interfacetype, i iface) (r iface) {
	tab := i.tab
	if tab == nil {
		return
	}
	if tab.inter == inter {
		r.tab = tab
		r.data = i.data
		return
	}
	r.tab = getitab(inter, tab._type, false)
	r.data = i.data
	return
}
If we are assigning between interfaces whose types are the same, such as an instance of I1 to I1, then the itab is retained. Otherwise, getitab searches the global table of itabs (itabTable) for an entry whose interface type is the one we want to convert to (i1’s type) and whose underlying type is the one we are converting from (i2’s underlying type).
In this example, the underlying type of i2 is the struct S, and the interface type of i1 is the interfaceType I1. getitab searches for an itab with interface type I1 and underlying type S and returns a pointer to it. It must return itab_S_I1, since this is the itab that satisfies both conditions.
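From the language side, this conversion changes only the method table, never the stored value. The sketch below repeats the types from the snippet above so that it is self-contained; narrow is a hypothetical helper that forces an I2 to I1 conversion, and whether a given compiler version emits convI2I for it is an assumption.

```go
package main

import "fmt"

type I1 interface {
	Method1()
}

type I2 interface {
	Method1()
	Method2()
}

type S struct {
	x, y int64
}

func (S) Method1() {}
func (S) Method2() {}

// narrow converts an I2 to an I1. The dynamic type and data are kept;
// only the itab differs (I2/S before, I1/S after).
func narrow(i2 I2) I1 {
	return i2
}

func main() {
	i1 := narrow(S{x: 1, y: 2})
	v, ok := i1.(S)
	fmt.Println(ok, v.x, v.y) // true 1 2

	// A nil interface has no itab, so the result is a nil interface too,
	// matching the tab == nil early return in convI2I.
	fmt.Println(narrow(nil) == nil) // true
}
```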
What did we learn?
- append function
- type conversions