Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node.js 进程偶现的 CPU 占用 100% 排查 (下) #65

Open
xiaoxiaojx opened this issue Sep 4, 2023 · 0 comments
Open

Node.js 进程偶现的 CPU 占用 100% 排查 (下) #65

xiaoxiaojx opened this issue Sep 4, 2023 · 0 comments
Labels
Node.js Node.js® is a JavaScript runtime built on Chrome's V8 JavaScript engine.

Comments

@xiaoxiaojx
Copy link
Owner

image

前情提要

几个月前 y 同学开发时遇到了 Node.js 进程偶现的 CPU 占用 100% 排查 的问题, 当时排查的结论是大概率为 Node.js http2 的 bug

时间回到现在 d 同学反馈近期 CPU 占用 100% 非常频繁, 虽然上次排查下来可以通过使用 http1 代替 http2 解决, 但对于 toB 的巨石应用动辄 500+ 的文件请求, 还是希望使用 http2 强大的并发能力来大幅减少 bundless 模式下大量的文件请求时长问题

好巧不巧 d 同学也是 M2, 最后排查下来与 M2 无关...

添加日志

于是下载了 d 同学当前 Node.js 16.19.1 版本的代码, 在 lldb 线程堆栈回溯的最后一个函数 OnStreamClose 中加入了下面的日志, 目的是希望知道最后代码卡死的准确位置来做下一步的分析

int Http2Session::OnStreamClose(nghttp2_session* handle,
                                int32_t id,
                                uint32_t code,
                                void* user_data) {
  Http2Session* session = static_cast<Http2Session*>(user_data);
  Environment* env = session->env();
  Isolate* isolate = env->isolate();
  HandleScope scope(isolate);
  Local<Context> context = env->context();
  Context::Scope context_scope(context);
  Debug(session, "stream %d closed with code: %d", id, code);
+ std::cout << ">>> 1" <<< std:endl;
  BaseObjectPtr<Http2Stream> stream = session->FindStream(id);
+ std::cout << ">>> 2" <<< std:endl;
  // Intentionally ignore the callback if the stream does not exist or has
  // already been destroyed
  if (!stream || stream->is_destroyed())
    return 0;
+ std::cout << ">>> 3" <<< std:endl;

  // Don't close synchronously in case there's pending data to be written. This
  // may happen when writing trailing headers.
  if (code == NGHTTP2_NO_ERROR && nghttp2_session_want_write(handle) &&
      !env->is_stopping()) {
+ 	std::cout << ">>> 4" <<< std:endl;
    env->SetImmediate([handle, id, code, user_data](Environment* env) {
+ 	  std::cout << ">>> 5" <<< std:endl;
      OnStreamClose(handle, id, code, user_data);
    });
+ 	std::cout << ">>> 6" <<< std:endl;

    return 0;
  }

  stream->Close(code);
+ std::cout << ">>> 7" <<< std:endl;

  // ...
  return 0;
}

接着把修改后的代码编译成功的二进制文件给 d 同学去复现。macOS 中编译 Node.js 运行下面两行命令即可。其他平台参考 Building Node.js

./configure
make -j4

复现成功

最后 d 同学使用刚编译的 Node.js 很快复现了问题, 与此同时启动 nopack 命令的 VSCode 控制台在快速不断循环 ♻️ 打印 1 -> 2 -> 3 -> 4 -> 6 -> 5 -> 1 的日志, 因为这个死循环的日志输出把 VSCode 的 Code Helper 进程的 CPU 也拉到了 100%

>>> 1
>>> 2
>>> 3
>>> 4
>>> 6
>>> 5
>>> 1
>>> 2
>>> 3
>>> 4
>>> 6
>>> 5

试图解决

因为上面复现时日志只能打印到 >>> 6, 仔细一看 >>> 6 上一行的代码 OnStreamClose 函数满足了 if(code == NGHTTP2_NO_ERROR && nghttp2_session_want_write(handle) && !env->is_stopping()) 条件后会再次调用自身, 故造成了死循环 ❌

于是简单修改了一下 OnStreamClose 函数, 使得即使满足了 if 条件, 也不能调用自身超过 3 次。最后编译好发现未能再复现, 问题得到解决 ✅

int Http2Session::OnStreamClose(nghttp2_session* handle,
                                int32_t id,
                                uint32_t code,
                                void* user_data,
+                               int32_t retry_num) {
  //...
  // Don't close synchronously in case there's pending data to be written. This
  // may happen when writing trailing headers.
  if (code == NGHTTP2_NO_ERROR && nghttp2_session_want_write(handle) &&
      !env->is_stopping()
+  && retry_num < 3) {
+   int32_t next_retry_num = retry_num + 1;
+   env->SetImmediate([handle, id, code, user_data, next_retry_num](Environment* env) {
+     OnStreamClose(handle, id, code, user_data, next_retry_num);
    });

    return 0;
  }
  // ...
}

最终结论

最后让 d 同学使用最新的 20.5.1 版本看能不能复现, 如果仍有问题, 就去提一个 Issue。结果是未能复现, 那么问题就锁定在了 16.19.1 ~ 20.5.1 版本之间。d 同学认真查阅了一下 Node.js Releases 日志 找到了该问题由 16.19.0 版本 node/pull/45153 改动引入, 在 16.20.0 版本 node/pull/46721 进行了回滚。故该问题为 Node.js 16.19.0、16.19.1 版本的 bug

int Http2Session::OnStreamClose(nghttp2_session* handle,
                                int32_t id,
                                uint32_t code,
                                void* user_data) {
  Http2Session* session = static_cast<Http2Session*>(user_data);
  Environment* env = session->env();
  Isolate* isolate = env->isolate();
  HandleScope scope(isolate);
  Local<Context> context = env->context();
  Context::Scope context_scope(context);
  Debug(session, "stream %d closed with code: %d", id, code);
  BaseObjectPtr<Http2Stream> stream = session->FindStream(id);
  // Intentionally ignore the callback if the stream does not exist or has
  // already been destroyed
  if (!stream || stream->is_destroyed())
    return 0;

-  // Don't close synchronously in case there's pending data to be written. This
-  // may happen when writing trailing headers.
-  if (code == NGHTTP2_NO_ERROR && nghttp2_session_want_write(handle) &&
-      !env->is_stopping()) {
-    env->SetImmediate([handle, id, code, user_data](Environment* env) {
-      OnStreamClose(handle, id, code, user_data);
-    });
-
-    return 0;
-  }

  stream->Close(code);

  // ...
  return 0;
}
@xiaoxiaojx xiaoxiaojx added the Node.js Node.js® is a JavaScript runtime built on Chrome's V8 JavaScript engine. label Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Node.js Node.js® is a JavaScript runtime built on Chrome's V8 JavaScript engine.
Projects
None yet
Development

No branches or pull requests

1 participant