Interview Notes: Golang Slice and Map

1.Slice

The overall design of Slice mirrors Java's ArrayList. The difference is that slices can share an underlying array, behavior analogous to ArrayList's subList.

1.1. Underlying Data Structure

```go
type slice struct {
	array unsafe.Pointer // pointer to the underlying array
	len   int            // current length
	cap   int            // capacity of the underlying array
}
```
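The shared-backing-array behavior mentioned above can be seen directly: a subslice is just a view (pointer, len, cap) into the same array, so writes and in-capacity appends through the subslice are visible in the original. A minimal demonstration:

```go
package main

import "fmt"

func main() {
	base := []int{1, 2, 3, 4, 5}
	sub := base[1:3] // len 2, cap 4: shares base's backing array, like ArrayList.subList
	sub[0] = 99      // writes base[1]
	fmt.Println(base) // [1 99 3 4 5]

	sub = append(sub, 42) // len 2 < cap 4, so this overwrites base[3] instead of reallocating
	fmt.Println(base)     // [1 99 3 42 5]
}
```

Once an append exceeds cap, the slice is copied to a fresh array and the sharing stops, which is a common source of subtle bugs.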

1.2. Growth Mechanism

If the required capacity is more than double the current capacity, the required capacity is used directly.

Otherwise the growth policy applies; it was reworked in Go 1.18:
Before Go 1.18 (exclusive): if the current capacity is below the 1024 threshold, the capacity doubles; above the threshold, it grows by 25% per iteration until the new capacity satisfies the required capacity.
Go 1.18 and later (inclusive): the threshold drops to 256; above it, each iteration grows by 25% of the current capacity plus 192 elements (newcap += (newcap + 3*256) / 4), which smooths the transition from doubling to 25% growth.

Compare growslice in runtime/slice.go between Go 1.17 and Go 1.18.

Finally, the requested allocation is rounded up to a malloc size class for memory alignment; growslice takes specialized fast paths when the element size is 1 byte, the pointer size, or a power of two.
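The growth policy can be observed empirically by watching cap change during appends. The exact capacities printed depend on the Go version (pre-1.18 vs 1.18+ policy) and on size-class rounding, so this sketch only reports what the current runtime actually does:

```go
package main

import "fmt"

func main() {
	// Append one element at a time and print every capacity jump.
	s := make([]int, 0)
	prev := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if c := cap(s); c != prev {
			ratio := 0.0
			if prev > 0 {
				ratio = float64(c) / float64(prev)
			}
			fmt.Printf("len=%4d  cap %4d -> %4d  (x%.2f)\n", len(s), prev, c, ratio)
			prev = c
		}
	}
}
```

On Go 1.18+ you should see roughly x2.00 jumps for small capacities, easing toward x1.25-x1.3 past the 256 threshold.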

2.Map

Like Java's HashMap, Go's map resolves hash collisions with a chaining scheme (buckets plus overflow buckets); Python's dict uses open addressing instead.

2.1. Underlying Data Structure

```go
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map. Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
```

Data is stored in buckets, and each bucket holds at most 8 key/value pairs. The low bits of a key's hash select the bucket; the top byte of the hash (tophash) is stored in the bucket to tell entries apart cheaply. When more than 8 keys hash to one bucket, the extra pairs go into overflow buckets (tracked through the extra field).
Key/value pairs are never moved around within a bucket, which prevents entries from being skipped or visited twice while iterating over the map.

Bucket memory layout: the tophash array, then 8 keys, then 8 values (keys and values are grouped, not interleaved), then an overflow pointer.

When a map is created with B > 4 (more than 16 buckets), extra overflow buckets are pre-allocated along with the normal ones.
By default 2^(B-4) extra buckets are created, but the total allocation must be rounded up for memory alignment, and any additional buckets that fit in the rounded-up memory are also handed out as overflow buckets.
The round-up follows the size classes of Go's TCMalloc-derived memory allocator.

```go
// For small b, overflow buckets are unlikely.
// Avoid the overhead of the calculation.
if b >= 4 {
	// Add on the estimated number of overflow buckets
	// required to insert the median number of elements
	// used with this value of b.
	nbuckets += bucketShift(b - 4)
	sz := t.bucket.size * nbuckets
	up := roundupsize(sz)
	if up != sz {
		nbuckets = up / t.bucket.size
	}
}
```
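The calculation above can be sketched in a runnable form. Two assumptions here: roundupPow2 is a simplified stand-in for the runtime's roundupsize (which uses malloc size classes, not plain powers of two), and bucketSize is a made-up bucket size for some key/elem types:

```go
package main

import "fmt"

// bucketShift returns 1 << b, as in runtime/map.go.
func bucketShift(b uint8) uintptr { return uintptr(1) << b }

// roundupPow2 stands in for the runtime's roundupsize; rounding to the
// next power of two is a simplifying assumption for illustration.
func roundupPow2(sz uintptr) uintptr {
	up := uintptr(1)
	for up < sz {
		up <<= 1
	}
	return up
}

func main() {
	const bucketSize = 208 // hypothetical size of one bucket in bytes
	for _, b := range []uint8{3, 4, 5} {
		nbuckets := bucketShift(b)
		if b >= 4 {
			nbuckets += bucketShift(b - 4) // 2^(B-4) estimated overflow buckets
			sz := bucketSize * nbuckets
			if up := roundupPow2(sz); up != sz {
				nbuckets = up / bucketSize // leftover aligned memory becomes more buckets
			}
		}
		fmt.Printf("B=%d: allocate %d buckets (base 2^%d = %d)\n", b, nbuckets, b, bucketShift(b))
	}
}
```

For B=5, for example, the base 32 buckets plus 2 estimated overflow buckets round up to even more buckets once the allocation is padded to an aligned size.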

2.2. Growth and Copying

When the map grows, the bucket array doubles in size, and entries are copied incrementally (evacuated) from the old buckets into the new array.
The load factor is 6.5 (mean entries per 8-slot bucket); Java HashMap's load factor of 0.75 would correspond to 6 in these terms (0.75 * 8), so the two are quite close.

```go
// loadFactor  %overflow  bytes/entry  hitprobe  missprobe
//       4.00       2.13        20.77      3.00       4.00
//       4.50       4.05        17.30      3.25       4.50
//       5.00       6.85        14.77      3.50       5.00
//       5.50      10.55        12.94      3.75       5.50
//       6.00      15.27        11.67      4.00       6.00
//       6.50      20.90        10.79      4.25       6.50
//       7.00      27.14        10.15      4.50       7.00
//       7.50      34.03         9.73      4.75       7.50
//       8.00      41.10         9.40      5.00       8.00
```
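The 6.5 load factor is encoded in runtime/map.go as the fraction 13/2, and growth triggers once the element count exceeds loadFactor * 2^B (and there are more than 8 elements). A sketch of that check, mirroring the runtime's overLoadFactor:

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // 13/2 = 6.5, as in runtime/map.go
	loadFactorDen = 2
)

func bucketShift(b uint8) uint64 { return uint64(1) << b }

// overLoadFactor reports whether a map with 2^B buckets holding count
// elements has exceeded the 6.5 load factor and should grow.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

func main() {
	// With B=3 (8 buckets), growth triggers once count exceeds 6.5*8 = 52.
	for _, n := range []int{52, 53} {
		fmt.Printf("count=%d B=3 grow=%v\n", n, overLoadFactor(n, 3))
	}
}
```

Note the runtime also grows (at the same size) when there are too many overflow buckets, independently of this check.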

2.3. Lookup

  1. Compute the hash value from the key.
  2. If an old bucket array exists (the map is mid-growth), search the old table first. If the matching old bucket has already been evacuated, go to step 3; otherwise, return the corresponding value if found there.
  3. Search for the value in the new bucket array.
    Note that the top 8 bits of the hash are not used as an offset; they are stored (as tophash) to speed up key comparisons.

2.4. Insertion

  1. Compute the hash from the key and locate the corresponding bucket.
  2. If that bucket is still in the old array, rehash (evacuate) it into the new array, and migrate one additional old bucket as well.
  3. Within the bucket, if the key to insert already exists, update its value; otherwise search for a free slot.
  4. Based on the number of elements in the table, decide whether to grow the table.
  5. If the bucket (including its overflow chain) is already full, allocate a new bucket as an overflow bucket.
  6. Insert the key/value pair into the bucket.

If the map is in the middle of growing and the key's old bucket has not been evacuated yet, that old bucket is evacuated first before the write proceeds.

One slightly odd detail: mapassign first locates the slot for the key, and only then checks whether the table needs to grow; if a grow is triggered, the slot search starts over from scratch (the goto again in runtime/map.go), because growth invalidates the previously found position.
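The insert path above, including that restart, can be summarized as pseudocode (the helper names are hypothetical stand-ins for logic inside mapassign, not real runtime functions):

```
assign(m, key, val):
again:
    bucket = low bits of hash(key)
    if m is growing:
        evacuate key's old bucket, plus one more old bucket
    scan the bucket chain:
        if key found: update its value and return
        remember the first free slot seen
    if load factor exceeded or too many overflow buckets:
        start growing
        goto again            # growth invalidates the slot found above
    if no free slot in the chain:
        allocate a new overflow bucket
    write key/val into the free slot
```

Restarting is simpler than trying to translate the found slot into the new bucket array, since the key may now belong to a different bucket.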