Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-1985] Fix possible log spew in application tryAllocate #663

Closed
wants to merge 5 commits into from

Conversation

brandboat
Copy link
Member

@brandboat brandboat commented Sep 21, 2023

What is this PR for?

Introduce rate limited logger in this pr, and applied in application#tryAllocate to avoid log spew

What type of PR is it?

  • - Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-1985

How should this be tested?

covered by unit test

@@ -990,7 +990,7 @@ func (sa *Application) tryAllocate(headRoom *resources.Resource, preemptionDelay
// the iterator might not have the node we need as it could be reserved, or we have not added it yet
node := getNodeFn(requiredNode)
if node == nil {
log.Log(log.SchedApplication).Warn("required node is not found (could be transient)",
log.Log(log.SchedApplication).Debug("required node is not found (could be transient)",
Copy link
Contributor

@pbacsko pbacsko Oct 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a rate-limit approach? It's still an issue if someone decides to use Debug level.

I propose a natural single msg/second limit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use a rate-limit approach? It's still an issue if someone decides to use Debug level.

I propose a natural single msg/second limit.

I've added a RateLimitedLogger in the latest commit, appreciated for the comment.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think limiting the output to one message/sec is better. This is acceptable as a quick fix, but let's do this properly.

@brandboat brandboat force-pushed the YUNIKORN-1985 branch 2 times, most recently from 006be0c to 6eceecf Compare October 25, 2023 11:00
}

// visible for testing
func TestingRateLimitedLog(logger *zap.Logger, every time.Duration) *rateLimitedLogger {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

  • rename to sth like getTestRateLimitedLog()
  • move it to rate_limited_logger_test.go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks, addressed comments in 0376571

Comment on lines 43 to 46
// this won't last over one minute, assert there is only one record in log file
for i := 0; i < 10000; i++ {
logger.Info("YuniKorn")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a different approach here, we don't know how much time it takes to run 10000 iterations.

For example, invoke logger.Info("Yunikorn") in every 100ms for 2 seconds. That means 20 calls in total. Then you check the output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy that !

Comment on lines 68 to 64
logDir, err := os.MkdirTemp("", "log*")
if err != nil {
panic(err)
}
logFile := fmt.Sprintf("%s/log.stdout", logDir)
outputPaths := []string{logFile}
zapConfig := zap.NewProductionConfig()
zapConfig.Encoding = "json"
zapConfig.OutputPaths = outputPaths

zapLogger, err := zapConfig.Build()
if err != nil {
panic(err)
}
Copy link
Contributor

@pbacsko pbacsko Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some testing, I'd opt for in memory logging. Avoid filesystem operations if possible:

	buf := bytes.Buffer{}
	conf := zap.NewDevelopmentConfig()
	encoderConfig := zapcore.EncoderConfig{
		MessageKey:     "msg",
		CallerKey:      "caller",
		NameKey:        "logger",
		TimeKey:        "ts",
		EncodeTime:     zapcore.EpochTimeEncoder,
		EncodeDuration: zapcore.StringDurationEncoder,
		EncodeCaller:   zapcore.ShortCallerEncoder,
	}

	wr := bufio.NewWriter(&buf)
	l := zap.New(
		zapcore.NewCore(
			zapcore.NewJSONEncoder(encoderConfig),
			zapcore.AddSync(wr),
			conf.Level,
		),
	)
	l.Info("aaaa")
	wr.Flush()   // important
        fmt.Println(buf.String())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy that, already done logging to memory in 7d3daef

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@pbacsko
Copy link
Contributor

pbacsko commented Oct 26, 2023

@brandboat could you fix the lintern issue? While at it, change the timings from 2s/100ms to 500ms/10ms so it doesn't take that much time to execute. Then I'll merge. Thanks.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lintern, timings.

@brandboat
Copy link
Member Author

brandboat commented Oct 26, 2023

Already addressed the comments in the latest commit, many thanks ! 😄

@pbacsko pbacsko closed this in 187a7ce Oct 26, 2023
@brandboat brandboat deleted the YUNIKORN-1985 branch October 26, 2023 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants