-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YUNIKORN-1985] Fix possible log spew in application tryAllocate #663
Conversation
pkg/scheduler/objects/application.go
Outdated
@@ -990,7 +990,7 @@ func (sa *Application) tryAllocate(headRoom *resources.Resource, preemptionDelay | |||
// the iterator might not have the node we need as it could be reserved, or we have not added it yet | |||
node := getNodeFn(requiredNode) | |||
if node == nil { | |||
log.Log(log.SchedApplication).Warn("required node is not found (could be transient)", | |||
log.Log(log.SchedApplication).Debug("required node is not found (could be transient)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use a rate-limit approach? It's still an issue if someone decides to use Debug level.
I propose a natural single msg/second limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use a rate-limit approach? It's still an issue if someone decides to use Debug level.
I propose a natural single msg/second limit.
I've added a RateLimitedLogger in the latest commit, appreciated for the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think limiting the output to one message/sec is better. This is acceptable as a quick fix, but let's do this properly.
006be0c
to
6eceecf
Compare
pkg/log/rate_limited_logger.go
Outdated
} | ||
|
||
// visible for testing | ||
func TestingRateLimitedLog(logger *zap.Logger, every time.Duration) *rateLimitedLogger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
- rename to sth like
getTestRateLimitedLog()
- move it to
rate_limited_logger_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks, addressed comments in 0376571
pkg/log/rate_limited_logger_test.go
Outdated
// this won't last over one minute, assert there is only one record in log file | ||
for i := 0; i < 10000; i++ { | ||
logger.Info("YuniKorn") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest a different approach here, we don't know how much time it takes to run 10000 iterations.
For example, invoke logger.Info("Yunikorn")
in every 100ms for 2 seconds. That means 20 calls in total. Then you check the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy that !
pkg/log/rate_limited_logger_test.go
Outdated
logDir, err := os.MkdirTemp("", "log*") | ||
if err != nil { | ||
panic(err) | ||
} | ||
logFile := fmt.Sprintf("%s/log.stdout", logDir) | ||
outputPaths := []string{logFile} | ||
zapConfig := zap.NewProductionConfig() | ||
zapConfig.Encoding = "json" | ||
zapConfig.OutputPaths = outputPaths | ||
|
||
zapLogger, err := zapConfig.Build() | ||
if err != nil { | ||
panic(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some testing, I'd opt for in memory logging. Avoid filesystem operations if possible:
buf := bytes.Buffer{}
conf := zap.NewDevelopmentConfig()
encoderConfig := zapcore.EncoderConfig{
MessageKey: "msg",
CallerKey: "caller",
NameKey: "logger",
TimeKey: "ts",
EncodeTime: zapcore.EpochTimeEncoder,
EncodeDuration: zapcore.StringDurationEncoder,
EncodeCaller: zapcore.ShortCallerEncoder,
}
wr := bufio.NewWriter(&buf)
l := zap.New(
zapcore.NewCore(
zapcore.NewJSONEncoder(encoderConfig),
zapcore.AddSync(wr),
conf.Level,
),
)
l.Info("aaaa")
wr.Flush() // important
fmt.Println(buf.String())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that, already done logging to memory in 7d3daef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
@brandboat could you fix the lintern issue? While at it, change the timings from 2s/100ms to 500ms/10ms so it doesn't take that much time to execute. Then I'll merge. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lintern, timings.
ca11672
to
b1a3cd1
Compare
Already addressed the comments in the latest commit, many thanks ! 😄 |
What is this PR for?
Introduce rate limited logger in this pr, and applied in application#tryAllocate to avoid log spew
What type of PR is it?
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-1985
How should this be tested?
covered by unit test