.heapsnapshot #67

xiaoxiaojx · 2023-10-26T16:53:31Z

如何获取一个快照文件

如果是 Node.js 环境参考官方文档 v8.writeHeapSnapshot([filename])

const { writeHeapSnapshot } = require('node:v8');
const filename = writeHeapSnapshot();

console.log(`main thread heapdump: ${filename}`);

如果是浏览器环境则点击 Chrome Devtools Memory 面板下的 Take snapshot 按钮

快照文件预览

当我们获取到 xxx.heapsnapshot 文件后通过编辑器打开预览其实是一个 JSON 对象

{
  "snapshot": {
    "meta": {
      "node_fields": ["type", "name", "id", "self_size", "edge_count", "trace_node_id", "detachedness"],
      "node_types": [
        ["hidden","array", "string", "object", "code", "closure", "regexp", "number", "native", "synthetic", "concatenated string", "sliced string", "symbol", "bigint", "object shape"],
        "string",
        "number",
        "number",
        "number",
        "number",
        "number"
      ],
      "edge_fields": ["type", "name_or_index", "to_node"],
      "edge_types": [
        ["context", "element", "property", "internal", "hidden", "shortcut", "weak"],
        "string_or_number",
        "node"
      ],
      "trace_function_info_fields": [],
      "trace_node_fields": [],
      "sample_fields": [],
      "location_fields": []
    },
    "node_count": 86352,
    "edge_count": 348827,
    "trace_function_count": 0
  },
  "nodes": [9, 1, 1, 0, 6, 0, 0
  , 9, 2, 3, 0, 25, 0, 0,
   9, 3, 5, 0, 0, 0, 0
   // 省略...
],
  "edges": [1, 1, 7
  , 5, 2916, 21742
  , 5, 2917, 21812
  , 5, 2918, 22008
  // 省略...
],
  "strings": ["<dummy>", "", "(GC roots)", "(Bootstrapper)", "(Builtins)" // 省略... ],
  "locations": [109032, 2, 0, 0
  , 136794, 3, 9, 35
  , 136808, 3, 9, 35
  // 省略...
]
}

快照文件解析

.heapsnapshot 文件主要记录的是当前页面/进程的内存快照信息，比如你的代码中有如下对象

const d = {}
const b = {
    d
}
const c = {
    b,
    d
}
const a = {
    b,
    c
}

它们之间的引用关系用 Graph 描述就类似下面这样

Object a, b, c, d 就相当于 Graph 的节点 Node，对象之间的引用关系就相当于 Graph 的边 Edge

确实我们在 .heapsnapshot 文件发现了 nodes 与 edges 数据，但数据却需要解密才能够理解

比如 nodes 数组的数据是下面这样，每一项都是数字

{
  "nodes": [9, 1, 1, 0, 6, 0, 0
  , 9, 2, 3, 0, 25, 0, 0,
   9, 3, 5, 0, 0, 0, 0
   // 省略...
]
}

SerializeNodes

此时我们需要阅读 v8 源码了解一个 Graph 的节点 Node 是如何进行的序列化。发现一个 Node 节点会依次记录它的 type、name、id、self_size、children_count、trace_node_id、detachedness 的值，每个 Node 节点的结束写入一个 \n\0 的换行。如上的 nodes 数组换行了 3 次故包含 3 个 Node 节点

// v8/src/profiler/heap-snapshot-generator.cc

void HeapSnapshotJSONSerializer::SerializeNode(const HeapEntry* entry) {
  // The buffer needs space for 5 unsigned ints, 1 size_t, 1 uint8_t, 7 commas,
  // \n and \0
  static const int kBufferSize =
      5 * MaxDecimalDigitsIn<sizeof(unsigned)>::kUnsigned +
      MaxDecimalDigitsIn<sizeof(size_t)>::kUnsigned +
      MaxDecimalDigitsIn<sizeof(uint8_t)>::kUnsigned + 7 + 1 + 1;
  base::EmbeddedVector<char, kBufferSize> buffer;
  int buffer_pos = 0;
  if (to_node_index(entry) != 0) {
    buffer[buffer_pos++] = ',';
  }
  buffer_pos = utoa(entry->type(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(GetStringId(entry->name()), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(entry->id(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(entry->self_size(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(entry->children_count(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(entry->trace_node_id(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(entry->detachedness(), buffer, buffer_pos);
  buffer[buffer_pos++] = '\n';
  buffer[buffer_pos++] = '\0';
  writer_->AddString(buffer.begin());
}

那么一个 Node 包含的数据用 JavaScript 表示类似如下

class Node {
    constructor(type, name, id, self_size, children_count, trace_node_id, detachedness) {
        this.type = type;
        this.name = name;
        this.id = id;
        this.self_size = self_size;
        this.children_count = children_count;
        this.trace_node_id = trace_node_id;
        this.detachedness = detachedness;
    }
}

上面的 nodes 数组解密后就类似如下

{
  "nodes": [new Node(9, 1, 1, 0, 6, 0, 0)
  , new Node(9, 2, 3, 0, 25, 0, 0)
  , new Node(9, 3, 5, 0, 0, 0, 0)
   // 省略...
]
}

此时 Node 的所有属性 type、name 都是数字可读性依然很差，那么需要继续进行解密。通过继续阅读源码可知

name: 1 这里的 name 通常是构造函数的 name，1 其实也是数组的索引，对应的值是 strings[1]，于是得出了第一个 Node 的 name 是 ""
id: id 只是序列化时一个自增长的数字
self_size: 该 Node 的自身所占内存大小，不包含它引用的对象
children_count: 该 Node 节点引用的对象的数量
...

SerializeEdges

上面我们已经解密了 nodes 字段，接下来看看 Graph 的边 Edge。发现一个 Edge 会依次记录它的 type、name_or_index、to 的值，每个 Edge 节点的结束也会写入一个 \n\0 的换行

// v8/src/profiler/heap-snapshot-generator.cc

void HeapSnapshotJSONSerializer::SerializeEdge(HeapGraphEdge* edge,
                                               bool first_edge) {
  // The buffer needs space for 3 unsigned ints, 3 commas, \n and \0
  static const int kBufferSize =
      MaxDecimalDigitsIn<sizeof(unsigned)>::kUnsigned * 3 + 3 + 2;
  base::EmbeddedVector<char, kBufferSize> buffer;
  int edge_name_or_index = edge->type() == HeapGraphEdge::kElement ||
                                   edge->type() == HeapGraphEdge::kHidden
                               ? edge->index()
                               : GetStringId(edge->name());
  int buffer_pos = 0;
  if (!first_edge) {
    buffer[buffer_pos++] = ',';
  }
  buffer_pos = utoa(edge->type(), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(edge_name_or_index, buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(to_node_index(edge->to()), buffer, buffer_pos);
  buffer[buffer_pos++] = '\n';
  buffer[buffer_pos++] = '\0';
  writer_->AddString(buffer.begin());
}

对于一个 Edge 其实是需要具备 from 与 to 的 Node 节点，比如例子中的 Object(a) 引用了 Object(b)，此时 from 是 a, to 是 b，但上面记录却只包含 to?

要正确找到 from 对应的 Node 节点，就需要借助于 Node 节点的 children_count 属性的值。比如上面解密的

第一个 Node 节点的 children_count 为 6，那么 edges 数组中 0 ～ 5 个 Edge 的 from 都是第一个 Node
第二个 Node 节点的 children_count 为 25，那么 edges 数组中 6 ～ 30 个 Edge 的 from 都是第二个 Node
第三个 Node 节点的 children_count 为 0，那么 edges 数组中没有 from 是该 Node 节点，直接跳过
...

用 JavaScript 表示类似如下

class Edge {
    constructor(type, name_or_index, to) {
        this.type = type;
        this.name_or_index = name_or_index;
        this.to = to;
    }
}

上面的 edges 数组解密后就类似如下

{
  "edges": [new Edge(1, 1, 7) // this.from = Node1
  , new Edge(5, 2916, 21742) // this.from = Node1
  , new Edge(5, 2917, 21812) // this.from = Node1
  , new Edge(5, 2918, 22008) // this.from = Node1
  // 省略...
]
}

type: 为数组索引，比如第一个 Edge 的 type 为 1 找到 snapshot.meta.edge_types[0][1] 的值是 element，所有的 type 包括
- context: 表示一个JavaScript对象（通常是函数或闭包）与其上下文变量之间的关系。上下文变量是在函数作用域内声明的变量
- element: 表示一个对象是数组，并且该边表示数组的元素。通常用于表示数组元素之间的关系
- property: 表示一个对象的普通属性（non-indexed property）。这是指通过对象的名称访问的属性，而不是通过数组索引访问的属性
- internal: 表示一个对象的内部（内置）属性。内部属性是对象的系统级属性，通常不可见或不可枚举
- shortcut: 表示一个对象的“快捷方式”引用。在堆图中，有时候可以使用快捷方式直接引用到对象，而不需要沿着一条正常的引用路径hidden: 表示一个对象的隐藏属性。隐藏属性是对象的附加属性，通常在JavaScript代码中不可见，但在内部使用
- weak: 表示一个弱引用关系。弱引用不会阻止对象被垃圾回收，当没有强引用指向对象时，对象可以被回收
edge_name_or_index:
- 比如第一个 Edge 的 type 为 1 是 element, element 表示 to 是 from 数组的其中一个元素，那么此时的 1 为 to 在 from 数组中的索引，即 from[1] = to
- 如果该 Edge 的 type 为 2 找到 snapshot.meta.edge_types[0][2] 的值是 property, property 表示 to 是 from 对象的一个属性，即 key = strings[edge_name_or_index]; from = {[key]: to}
to: Edge 中为 to 的 Node 节点的索引，比如第一个 Edge 的 to 为 7, 从 nodes 数组中找到索引为 7 的元素，因为一个 Node 节点在 nodes 数组中占据 7 位，那索引为 7 的元素表示的是第二个 Node 节点

from 的 children_count 与 edges 数组索引的有耦合的这层隐藏关系的代码实现见如下

// v8/src/profiler/heap-snapshot-generator.cc

void HeapSnapshot::FillChildren() {
  DCHECK(children().empty());
  int children_index = 0;
  for (HeapEntry& entry : entries()) {
    children_index = entry.set_children_index(children_index);
  }
  DCHECK_EQ(edges().size(), static_cast<size_t>(children_index));
  children().resize(edges().size());
  for (HeapGraphEdge& edge : edges()) {
    edge.from()->add_child(&edge);
  }
}

SerializeLocations

最后需要解密的是 locations 字段，它包含了 Node 节点对应的源码文件与行列，有利于我们快速定位问题代码。location 会依次记录它的 entry_index、scriptId、line、col 的值，每个 location 节点的结束也会写入一个 \n\0 的换行

// v8/src/profiler/heap-snapshot-generator.cc

void HeapSnapshotJSONSerializer::SerializeLocation(
    const SourceLocation& location) {
  // The buffer needs space for 4 unsigned ints, 3 commas, \n and \0
  static const int kBufferSize =
      MaxDecimalDigitsIn<sizeof(unsigned)>::kUnsigned * 4 + 3 + 2;
  base::EmbeddedVector<char, kBufferSize> buffer;
  int buffer_pos = 0;
  buffer_pos = utoa(to_node_index(location.entry_index), buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(location.scriptId, buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(location.line, buffer, buffer_pos);
  buffer[buffer_pos++] = ',';
  buffer_pos = utoa(location.col, buffer, buffer_pos);
  buffer[buffer_pos++] = '\n';
  buffer[buffer_pos++] = '\0';
  writer_->AddString(buffer.begin());
}

用 JavaScript 表示类似如下

class Location {
    constructor(entry_index, scriptId, line, col) {
        this.entry_index = entry_index;
        this.scriptId = scriptId;
        this.line = line;
        this.col = col;
    }
}

上面的 locations 数组解密后就类似如下

{
  "locations": [new Location(109032, 2, 0, 0)
  , new Location(136794, 3, 9, 35)
  , new Location(136808, 3, 9, 35)
// 省略...
]
}

entry_index: 对应的是 Node 节点的索引，注意一个 Node 节点在 nodes 数组中占用 7 位
scriptId: v8 生成的一个唯一 id，如果你在 Chrome Memory 中打了一个快照，通过这个 scriptId 就能找到当前正在运行的页面 v8 内存中的 script 信息，继续根据行列信息就能定位到源码
line: Node 节点的行
col: Node 节点的列

如果是从其他人电脑下载的 .heapsnapshot 文件，其携带的 scriptId 与当前页面内存的 scriptId 将无法对应，故不能定位到源码

快照文件分析工具

Chrome Devtools Memory

特点: 官方工具, 信息全面, 如果是本机下载的快照文件能够定位到源码
缺点: 信息过多难以聚焦
文档: Chrome DevTools Memory

devtoolx

特点: 缩减了大量的无用信息, 默认排序显示内存占用最大的对象, 可以认为是精细化的 Chrome Devtools Memory
缺点: 没有多个快照对比功能
文档: devtoolx

v8-profiler-rs

特点: 通过图形可视化的形式非常直观的展示内存信息, 也缩减了大量的无用信息, 能够很快找到内存占用最大的对象。并且具备多个快照对比的功能
文档: v8-profiler-rs

memlab

特点: 能够通过写测试脚本自动化抓取快照进行分析
文档: memlab

xiaoxiaojx added the C++ C++ label Oct 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.heapsnapshot #67

.heapsnapshot #67

xiaoxiaojx commented Oct 26, 2023 •

edited

Loading

.heapsnapshot #67

.heapsnapshot #67

Comments

xiaoxiaojx commented Oct 26, 2023 • edited Loading

如何获取一个快照文件

快照文件预览

快照文件解析

SerializeNodes

SerializeEdges

SerializeLocations

快照文件分析工具

Chrome Devtools Memory

devtoolx

v8-profiler-rs

memlab

xiaoxiaojx commented Oct 26, 2023 •

edited

Loading