Interview Notes: Golang Slice and Map

1.Slice

The overall design of Slice mirrors Java's ArrayList. The difference is that slices can share an underlying array, behavior analogous to ArrayList's subList.

1.1. Underlying Data Structure

```go
type slice struct {
	array unsafe.Pointer // pointer to the underlying array
	len   int            // current length
	cap   int            // capacity of the underlying array
}
```
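The shared-backing-array behavior mentioned above can be seen directly: a subslice is just a view (pointer, len, cap) into the same array, so writes and in-capacity appends through the subslice are visible in the original. A minimal demonstration:

```go
package main

import "fmt"

func main() {
	base := []int{1, 2, 3, 4, 5}
	sub := base[1:3] // len 2, cap 4: shares base's backing array, like ArrayList.subList
	sub[0] = 99      // writes base[1]
	fmt.Println(base) // [1 99 3 4 5]

	sub = append(sub, 42) // len 2 < cap 4, so this overwrites base[3] instead of reallocating
	fmt.Println(base)     // [1 99 3 42 5]
}
```

Once an append exceeds cap, the slice is copied to a fresh array and the sharing stops, which is a common source of subtle bugs.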

1.2. Growth Mechanism

If the required capacity is more than double the current capacity, the required capacity is used directly.

Otherwise the growth policy applies; it was reworked in Go 1.18:
Before Go 1.18 (exclusive): if the current capacity is below the 1024 threshold, the capacity doubles; above the threshold, it grows by 25% per iteration until the new capacity satisfies the required capacity.
Go 1.18 and later (inclusive): the threshold drops to 256; above it, each iteration grows by 25% of the current capacity plus 192 elements (newcap += (newcap + 3*256) / 4), which smooths the transition from doubling to 25% growth.

Compare growslice in runtime/slice.go between Go 1.17 and Go 1.18.

Finally, the requested allocation is rounded up to a malloc size class for memory alignment; growslice takes specialized fast paths when the element size is 1 byte, the pointer size, or a power of two.
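The growth policy can be observed empirically by watching cap change during appends. The exact capacities printed depend on the Go version (pre-1.18 vs 1.18+ policy) and on size-class rounding, so this sketch only reports what the current runtime actually does:

```go
package main

import "fmt"

func main() {
	// Append one element at a time and print every capacity jump.
	s := make([]int, 0)
	prev := cap(s)
	for i := 0; i < 2000; i++ {
		s = append(s, i)
		if c := cap(s); c != prev {
			ratio := 0.0
			if prev > 0 {
				ratio = float64(c) / float64(prev)
			}
			fmt.Printf("len=%4d  cap %4d -> %4d  (x%.2f)\n", len(s), prev, c, ratio)
			prev = c
		}
	}
}
```

On Go 1.18+ you should see roughly x2.00 jumps for small capacities, easing toward x1.25-x1.3 past the 256 threshold.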

2.Map

Like Java's HashMap, Go's map resolves hash collisions with a chaining scheme (buckets plus overflow buckets); Python's dict uses open addressing instead.

2.1. Underlying Data Structure

```go
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map. Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
```

Data is stored in buckets, and each bucket holds at most 8 key/value pairs. The low bits of a key's hash select the bucket; the top byte of the hash (tophash) is stored in the bucket to tell entries apart cheaply. When more than 8 keys hash to one bucket, the extra pairs go into overflow buckets (tracked through the extra field).
Key/value pairs are never moved around within a bucket, which prevents entries from being skipped or visited twice while iterating over the map.

Bucket memory layout: the tophash array, then 8 keys, then 8 values (keys and values are grouped, not interleaved), then an overflow pointer.

When a map is created with B > 4 (more than 16 buckets), extra overflow buckets are pre-allocated along with the normal ones.
By default 2^(B-4) extra buckets are created, but the total allocation must be rounded up for memory alignment, and any additional buckets that fit in the rounded-up memory are also handed out as overflow buckets.
The round-up follows the size classes of Go's TCMalloc-derived memory allocator.

```go
// For small b, overflow buckets are unlikely.
// Avoid the overhead of the calculation.
if b >= 4 {
	// Add on the estimated number of overflow buckets
	// required to insert the median number of elements
	// used with this value of b.
	nbuckets += bucketShift(b - 4)
	sz := t.bucket.size * nbuckets
	up := roundupsize(sz)
	if up != sz {
		nbuckets = up / t.bucket.size
	}
}
```
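The calculation above can be sketched in a runnable form. Two assumptions here: roundupPow2 is a simplified stand-in for the runtime's roundupsize (which uses malloc size classes, not plain powers of two), and bucketSize is a made-up bucket size for some key/elem types:

```go
package main

import "fmt"

// bucketShift returns 1 << b, as in runtime/map.go.
func bucketShift(b uint8) uintptr { return uintptr(1) << b }

// roundupPow2 stands in for the runtime's roundupsize; rounding to the
// next power of two is a simplifying assumption for illustration.
func roundupPow2(sz uintptr) uintptr {
	up := uintptr(1)
	for up < sz {
		up <<= 1
	}
	return up
}

func main() {
	const bucketSize = 208 // hypothetical size of one bucket in bytes
	for _, b := range []uint8{3, 4, 5} {
		nbuckets := bucketShift(b)
		if b >= 4 {
			nbuckets += bucketShift(b - 4) // 2^(B-4) estimated overflow buckets
			sz := bucketSize * nbuckets
			if up := roundupPow2(sz); up != sz {
				nbuckets = up / bucketSize // leftover aligned memory becomes more buckets
			}
		}
		fmt.Printf("B=%d: allocate %d buckets (base 2^%d = %d)\n", b, nbuckets, b, bucketShift(b))
	}
}
```

For B=5, for example, the base 32 buckets plus 2 estimated overflow buckets round up to even more buckets once the allocation is padded to an aligned size.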

2.2. Growth and Copying

When the map grows, the bucket array doubles in size, and entries are copied incrementally (evacuated) from the old buckets into the new array.
The load factor is 6.5 (mean entries per 8-slot bucket); Java HashMap's load factor of 0.75 would correspond to 6 in these terms (0.75 * 8), so the two are quite close.

```go
// loadFactor  %overflow  bytes/entry  hitprobe  missprobe
//       4.00       2.13        20.77      3.00       4.00
//       4.50       4.05        17.30      3.25       4.50
//       5.00       6.85        14.77      3.50       5.00
//       5.50      10.55        12.94      3.75       5.50
//       6.00      15.27        11.67      4.00       6.00
//       6.50      20.90        10.79      4.25       6.50
//       7.00      27.14        10.15      4.50       7.00
//       7.50      34.03         9.73      4.75       7.50
//       8.00      41.10         9.40      5.00       8.00
```
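The 6.5 load factor is encoded in runtime/map.go as the fraction 13/2, and growth triggers once the element count exceeds loadFactor * 2^B (and there are more than 8 elements). A sketch of that check, mirroring the runtime's overLoadFactor:

```go
package main

import "fmt"

const (
	bucketCnt     = 8  // slots per bucket
	loadFactorNum = 13 // 13/2 = 6.5, as in runtime/map.go
	loadFactorDen = 2
)

func bucketShift(b uint8) uint64 { return uint64(1) << b }

// overLoadFactor reports whether a map with 2^B buckets holding count
// elements has exceeded the 6.5 load factor and should grow.
func overLoadFactor(count int, B uint8) bool {
	return count > bucketCnt && uint64(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

func main() {
	// With B=3 (8 buckets), growth triggers once count exceeds 6.5*8 = 52.
	for _, n := range []int{52, 53} {
		fmt.Printf("count=%d B=3 grow=%v\n", n, overLoadFactor(n, 3))
	}
}
```

Note the runtime also grows (at the same size) when there are too many overflow buckets, independently of this check.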

2.3. Lookup

  1. Compute the hash value from the key.
  2. If an old bucket array exists (the map is mid-growth), search the old table first. If the matching old bucket has already been evacuated, go to step 3; otherwise, return the corresponding value if found there.
  3. Search for the value in the new bucket array.
    Note that the top 8 bits of the hash are not used as an offset; they are stored (as tophash) to speed up key comparisons.

2.4. Insertion

  1. Compute the hash from the key and locate the corresponding bucket.
  2. If that bucket is still in the old array, rehash (evacuate) it into the new array, and migrate one additional old bucket as well.
  3. Within the bucket, if the key to insert already exists, update its value; otherwise search for a free slot.
  4. Based on the number of elements in the table, decide whether to grow the table.
  5. If the bucket (including its overflow chain) is already full, allocate a new bucket as an overflow bucket.
  6. Insert the key/value pair into the bucket.

If the map is in the middle of growing and the key's old bucket has not been evacuated yet, that old bucket is evacuated first before the write proceeds.

One slightly odd detail: mapassign first locates the slot for the key, and only then checks whether the table needs to grow; if a grow is triggered, the slot search starts over from scratch (the goto again in runtime/map.go), because growth invalidates the previously found position.
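The insert path above, including that restart, can be summarized as pseudocode (the helper names are hypothetical stand-ins for logic inside mapassign, not real runtime functions):

```
assign(m, key, val):
again:
    bucket = low bits of hash(key)
    if m is growing:
        evacuate key's old bucket, plus one more old bucket
    scan the bucket chain:
        if key found: update its value and return
        remember the first free slot seen
    if load factor exceeded or too many overflow buckets:
        start growing
        goto again            # growth invalidates the slot found above
    if no free slot in the chain:
        allocate a new overflow bucket
    write key/val into the free slot
```

Restarting is simpler than trying to translate the found slot into the new bucket array, since the key may now belong to a different bucket.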