Request Buffer #75

vermapratyush · 2017-09-26T15:12:01Z

Adds a buffer on requests before throwing ErrMaxConcurrency.
The Timeout is still honored: the timer starts when the request is submitted to hystrix-go, not when execution starts. This essentially implements MaxQueueSize as found in Netflix's Hystrix.

The default value of MaxQueueSize is 50 (5 * DefaultMaxConcurrency), although it can be overridden when initialising the circuit.

In addition to the request buffer, the PR includes a different way of solving #67. It uses channels instead of sync.Once, as this makes the Go() function simpler. It also includes some go-lint fixes.

We have been running this in production at GrabTaxi for a while, and there seem to be no side effects.
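
For readers unfamiliar with the idea, here is a minimal sketch of a bounded wait queue in front of a concurrency limit. It is not the code from this PR: the `pool` type, `newPool`, `run`, `demo`, and `ErrTimeout` are hypothetical names chosen for illustration, and only `ErrMaxConcurrency` borrows its name from hystrix-go. The sketch assumes the timer is started at submission, so time spent waiting in the queue counts against the timeout.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	// ErrMaxConcurrency borrows the hystrix-go name; everything else here is hypothetical.
	ErrMaxConcurrency = errors.New("max concurrency")
	ErrTimeout        = errors.New("timeout")
)

// pool bounds running work with tickets and bounds queued callers with waiting.
type pool struct {
	tickets chan struct{}
	waiting chan struct{}
}

func newPool(maxConcurrent, maxQueue int) *pool {
	return &pool{
		tickets: make(chan struct{}, maxConcurrent),
		waiting: make(chan struct{}, maxQueue),
	}
}

// run executes work, queueing the caller if no ticket is free.
// The timer starts at submission, not when work begins.
func (p *pool) run(timeout time.Duration, work func()) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case p.tickets <- struct{}{}: // a ticket is free, run immediately
	default:
		select {
		case p.waiting <- struct{}{}: // join the queue
		default:
			return ErrMaxConcurrency // queue is full, reject
		}
		select {
		case p.tickets <- struct{}{}: // a ticket became free
			<-p.waiting
		case <-timer.C: // timed out while still queued
			<-p.waiting
			return ErrTimeout
		}
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		defer func() { <-p.tickets }() // release the ticket once work returns
		work()
	}()

	select {
	case <-done:
		return nil
	case <-timer.C:
		return ErrTimeout
	}
}

// demo occupies the only ticket, queues a second caller, and submits a third.
func demo() (second, third error) {
	p := newPool(1, 1)
	started, release := make(chan struct{}), make(chan struct{})
	go p.run(time.Second, func() { close(started); <-release })
	<-started

	secondCh := make(chan error, 1)
	go func() { secondCh <- p.run(time.Second, func() {}) }()
	for len(p.waiting) == 0 { // wait until the second caller is queued
		time.Sleep(time.Millisecond)
	}
	third = p.run(time.Second, func() {})
	close(release)
	return <-secondCh, third
}

func main() {
	second, third := demo()
	fmt.Println("queued caller:", second, "overflow caller:", third)
}
```

With one ticket and a queue of one, the second caller waits and eventually runs, while the third is rejected right away because both the ticket and the queue slot are taken.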

tonyghita · 2017-10-10T04:00:04Z

hystrix/eventstream.go

 	})
 	if err != nil {
 		return err
 	}
-	err = sh.writeToRequests(eventBytes)
+	_ = sh.writeToRequests(eventBytes)


Should probably return sh.writeToRequests(eventBytes) instead of ignoring the error.

tonyghita

This looks really promising! I hope it gets merged soon.

tonyghita · 2017-10-10T04:07:56Z

.gitignore

@@ -1 +1,3 @@
-.vagrant
+*.iml


The IDE-specific ignore lines probably belong in your global .gitignore: https://help.github.com/articles/ignoring-files/#create-a-global-gitignore

tonyghita · 2017-10-10T04:09:02Z

README.md

+	Timeout:                     1000,
+	MaxConcurrentRequests:       100,
+	ErrorPercentThreshold:       25,
+	QueueSizeRejectionThreshold: 100,


How would you pick a good value for QueueSizeRejectionThreshold?

The default value of QueueSizeRejectionThreshold is currently equal to MaxConcurrentRequests. This should absorb a request spike of up to 2x the usual load.
2x seems to be a decent number, although I have seen 3x-4x in some use cases at my workplace.
Netflix also uses a number equal to MaxConcurrentRequests, although I am open to suggestions for a better default value.

tamccall · 2017-11-10T19:08:44Z

hystrix/metric_collector/metric_collector.go

@@ -45,6 +45,8 @@ func (m *metricCollectorRegistry) Register(initMetricCollector func(string) Metr
 type MetricCollector interface {
 	// IncrementAttempts increments the number of updates.
 	IncrementAttempts()
+	// IncrementQueueSize increments the number of elements in the queue.
+	IncrementQueueSize()


This would break any external implementations of this interface. We should probably start managing releases as suggested by #70 before merging this.
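
One common Go pattern for adding a method without breaking existing implementers is an optional interface checked with a type assertion. The sketch below is only an illustration of that pattern, not what this PR does: `QueueMetricCollector`, `reportQueued`, `legacyCollector`, and `queueCollector` are hypothetical names, and `MetricCollector` is trimmed to a single method.

```go
package main

import "fmt"

// MetricCollector is a trimmed copy of the interface shown in the diff above.
type MetricCollector interface {
	IncrementAttempts()
}

// QueueMetricCollector is a hypothetical optional extension; collectors
// that do not implement it keep compiling unchanged.
type QueueMetricCollector interface {
	IncrementQueueSize()
}

// reportQueued records a queued event only if the collector supports it,
// and reports whether it did.
func reportQueued(c MetricCollector) bool {
	if q, ok := c.(QueueMetricCollector); ok {
		q.IncrementQueueSize()
		return true
	}
	return false
}

// legacyCollector stands in for an external implementation written
// before the new method existed.
type legacyCollector struct{ attempts int }

func (l *legacyCollector) IncrementAttempts() { l.attempts++ }

// queueCollector opts in to the new metric.
type queueCollector struct {
	legacyCollector
	queued int
}

func (q *queueCollector) IncrementQueueSize() { q.queued++ }

func main() {
	fmt.Println(reportQueued(&legacyCollector{}))
	fmt.Println(reportQueued(&queueCollector{}))
}
```

The trade-off is that unsupported collectors silently drop the new metric, so versioned releases as suggested in #70 may still be the cleaner answer.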

afex

first let me thank you for this patch. it is a welcome change and i appreciate the time you've spent on it. i'm sorry i took so long to tackle this review.

before i can merge it, however, you need to address the regression presented by the addition of the queued event, as well as the default value for the queue size.

afex · 2017-12-19T23:22:48Z

hystrix/settings.go

@@ -80,12 +86,18 @@ func ConfigureCommand(name string, config CommandConfig) {
 		errorPercent = config.ErrorPercentThreshold
 	}

+	queueSizeRejectionThreshold := DefaultQueueSizeRejectionThreshold


your PR comment mentions that the default is the same as MaxConcurrentRequests, but in fact it is statically set to 50 here even if the user provides a different concurrency setting.

i like the idea of having the queue size (if unset) be equal to the concurrency setting, which would change the code here to remove DefaultQueueSizeRejectionThreshold and replace it with max
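
A minimal sketch of that suggestion, assuming a zero value means "unset" as it does for the other settings. `resolveLimits` is a hypothetical helper and `CommandConfig` is trimmed to the two relevant fields; the real ConfigureCommand handles more settings than this.

```go
package main

import "fmt"

// DefaultMaxConcurrent is assumed to be 10, consistent with the
// "50 (5 * DefaultMaxConcurrency)" figure in the PR description.
const DefaultMaxConcurrent = 10

// CommandConfig is trimmed to the two fields relevant here.
type CommandConfig struct {
	MaxConcurrentRequests       int
	QueueSizeRejectionThreshold int
}

// resolveLimits is a hypothetical helper: the queue size falls back to the
// resolved concurrency limit instead of a separate static default.
func resolveLimits(config CommandConfig) (maxConcurrent, queueSize int) {
	maxConcurrent = DefaultMaxConcurrent
	if config.MaxConcurrentRequests != 0 {
		maxConcurrent = config.MaxConcurrentRequests
	}

	queueSize = maxConcurrent
	if config.QueueSizeRejectionThreshold != 0 {
		queueSize = config.QueueSizeRejectionThreshold
	}
	return maxConcurrent, queueSize
}

func main() {
	fmt.Println(resolveLimits(CommandConfig{MaxConcurrentRequests: 100}))
}
```

One consequence of treating zero as unset is that a caller cannot ask for a queue size of zero explicitly.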

afex · 2017-12-19T23:23:42Z

hystrix/settings.go

@@ -16,14 +16,18 @@ var (
 	DefaultSleepWindow = 5000
 	// DefaultErrorPercentThreshold causes circuits to open once the rolling measure of errors exceeds this percent of requests
 	DefaultErrorPercentThreshold = 50
+	// DefaultQueueSizeRejectionThreshold reject requests when the queue size exceeds the given limit
+	DefaultQueueSizeRejectionThreshold = DefaultMaxConcurrent * 5


recommend removing this default based on other comment in this file

afex · 2017-12-19T23:25:25Z

hystrix/metrics.go

@@ -95,6 +95,9 @@ func (m *metricExchange) IncrementMetrics(wg *sync.WaitGroup, collector metricCo
 		collector.IncrementAttempts()
 		collector.IncrementErrors()
 	}
+	if update.Types[0] == "queued" {
+		collector.IncrementQueueSize()
+	}


as described in my comment in hystrix.go, this won't work to accurately track the event rate. a types list of [ queued, failure, fallback-success ] should apply all of:

collector.IncrementQueueSize()
collector.IncrementFailures()
collector.IncrementAttempts()
collector.IncrementErrors()
collector.IncrementFallbackSuccesses()
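
To illustrate the point, here is a rough sketch that walks every event type in the list instead of inspecting only the first element. The `counters` struct and `apply` function are hypothetical stand-ins for the collector and for metricExchange.IncrementMetrics, and only the three event types from the example above are handled.

```go
package main

import "fmt"

// counters is a hypothetical stand-in for a metric collector.
type counters struct {
	attempts, errors, failures, fallbackSuccesses, queued int
}

// apply updates the counters for every event type in the list, so a
// leading "queued" entry no longer hides the run result behind it.
func apply(types []string, c *counters) {
	for _, t := range types {
		switch t {
		case "queued":
			c.queued++
		case "failure":
			c.failures++
			c.attempts++
			c.errors++
		case "fallback-success":
			c.fallbackSuccesses++
		}
	}
}

func main() {
	var c counters
	apply([]string{"queued", "failure", "fallback-success"}, &c)
	fmt.Printf("%+v\n", c)
}
```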

afex · 2017-12-19T23:28:48Z

hystrix/metric_collector/metric_collector.go

@@ -45,6 +45,8 @@ func (m *metricCollectorRegistry) Register(initMetricCollector func(string) Metr
 type MetricCollector interface {
 	// IncrementAttempts increments the number of updates.
 	IncrementAttempts()
+	// IncrementQueueSize increments the number of elements in the queue.
+	IncrementQueueSize()


"queue size/length" does not seem like an accurate name since this is better stated as "rate at which executions were queued", or "number of queued events over a time window"

I have changed the name to IncrementQueuedItem. I tried to keep it in sync with other function names like IncrementAttempts. Let me know if you think some other function name would be more appropriate.

afex · 2017-12-19T23:36:56Z

hystrix/hystrix.go

-				returnTicket()
+			select {
+			case t := <-circuit.executorPool.WaitingTicket:
+				cmd.reportEvent("queued")


adding a new event type is problematic here. i agree that we should track the rate at which executions are queued, but there is a current assumption being made about the event list for an execution which no longer holds true here.

currently the events []string field of a command is assumed to contain data in a format of:

[ success|failure|rejected|short-circuit|timeout, fallback-success|fallback-failure ]

for example, an events slice containing [ failure, fallback-success ] indicates the execution failed but the fallback did not. changing this to [ queued, failure, fallback-success ] makes sense (execution was queued, then failed, then fell back successfully) but other parts of the code assume the first element in the list indicates the run function's result. this changes that and will break stats reporting as well as closing a circuit after a success.

in order to add this queued event, you'll need to change CircuitBreaker.ReportEvent and metricExchange.IncrementMetrics to account for this.

Nice catch, I have incorporated the change by using a map structure to pass around the events.
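
The exact structure used in the updated commits is not shown in this thread. As a rough idea of how a map can remove the dependence on element order, here is a sketch in which `eventSet`, `newEventSet`, and `runResult` are all hypothetical names; the run result names are the ones listed in the review comment above.

```go
package main

import "fmt"

// eventSet is a hypothetical set of event names reported for one execution.
type eventSet map[string]bool

func newEventSet(events ...string) eventSet {
	s := make(eventSet, len(events))
	for _, e := range events {
		s[e] = true
	}
	return s
}

// runResult returns the run function's outcome wherever it was recorded,
// instead of assuming it is the first element of a slice.
func (s eventSet) runResult() (string, bool) {
	for _, r := range []string{"success", "failure", "rejected", "short-circuit", "timeout"} {
		if s[r] {
			return r, true
		}
	}
	return "", false
}

func main() {
	events := newEventSet("queued", "failure", "fallback-success")
	fmt.Println(events.runResult())
}
```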

vermapratyush · 2018-01-14T09:38:48Z

@afex Thanks for the review. I have made the required changes in the PR.

To address the regression for the buffer queue, I have added a comment in loadtest/README.md, in addition to the unit test which asserts the queue length. The comment just mentions that increasing the concurrency in the bench tool should validate the buffer implementation.

Changes:

Refactor function name IncrementQueuedItem
Fix bug in events being reported.
Incorporate changes related to queue size.
Added comment to run regression for queue length.

vermapratyush added 7 commits September 26, 2017 22:50

Request buffer

f5da31c

Adds a buffer on requests before throwing ErrMaxConcurrency The Timeout is still adhered to, timer starts when the request is submitted, not when execution starts.

Change in import path to contribute to afex/go-hystrix

a2a7df0

Updated README

56c4816

Updated datadog_collector

e01e6d5

Update statsD and graphite implementation with updated metric_collector

968d05c

Remove flaky test

5ef977f

fix flaky test and goimport

fa3c586

tonyghita reviewed Oct 10, 2017

Code review changes

2410870

vermapratyush force-pushed the afex-contrib branch from 94ffb27 to 2410870 Compare October 17, 2017 09:05

tamccall reviewed Nov 10, 2017

tamccall mentioned this pull request Nov 18, 2017

Treat bad requests separately from other errors myteksi/hystrix-go#9

Closed

afex requested changes Dec 19, 2017

vermapratyush added 2 commits January 14, 2018 17:23

Incorporate code review comment

845cbad

Fix unit test

f3c8dca
