This repository has been archived by the owner on Jun 1, 2024. It is now read-only.

Periodic batches dropped because of Maximum timeout #383

Open

BramVader opened this issue Feb 12, 2021 · 4 comments

@BramVader

Does this issue relate to a new feature or an existing bug?

  • Bug
  • New Feature

What version of Serilog.Sinks.Elasticsearch is affected? Please list the related NuGet package.
Serilog.Sinks.Elasticsearch 8.4.1

What is the target framework and operating system? See target frameworks & net standard matrix.

  • netStandard 2.0
  • netCore 2.0
  • netCore 1.0
  • 4.7
  • 4.6.x
  • 4.5.x

Please describe the current behavior?
I'm testing the reliability of the Elasticsearch sink by emitting 1,000,000 items using durable buffering. Shipping takes about 18 minutes (about 950 items/second). I notice that during this process the following exception sometimes occurs (in the SelfLog):

2021-02-12T12:55:56.5332286Z Exception while emitting periodic batch from Serilog.Sinks.Elasticsearch.Durable.ElasticsearchLogClient: Elasticsearch.Net.ElasticsearchClientException: Maximum timeout reached while retrying request. Call: Status code unknown from: POST /_bulk
 ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Elasticsearch.Net.HttpConnection.RequestAsync[TResponse](RequestData requestData, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Elasticsearch.Net.Transport`1.HandleElasticsearchClientException(RequestData data, Exception clientException, IElasticsearchResponse response)
   at Elasticsearch.Net.Transport`1.FinalizeResponse[TResponse](RequestData requestData, IRequestPipeline pipeline, List`1 seenExceptions, TResponse response)
   at Elasticsearch.Net.Transport`1.RequestAsync[TResponse](HttpMethod method, String path, CancellationToken cancellationToken, PostData data, IRequestParameters requestParameters)
   at Serilog.Sinks.Elasticsearch.Durable.ElasticsearchLogClient.SendPayloadAsync(List`1 payload, Boolean first)

I already changed sinkOptions.ConnectionTimeout to a very long time (one day) instead of the default 5 seconds, but that does not seem to make a difference.

I'm wondering why this happens. Maybe Elasticsearch can't keep up with the indexing rate and starts throttling the connection. But what seems really strange to me is that this batch is then considered "lost" and durable buffering just moves on to the next batch. In my opinion, it should never skip data because of a failure; the only exception would be when the file buffer reaches its sinkOptions.BufferFileSizeLimitBytes (configured to 100 MB).
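
For reference, a minimal sketch of roughly the setup I'm testing with (the node URI and buffer file path are placeholders; the property names are from the 8.x ElasticsearchSinkOptions):

```csharp
using System;
using Serilog;
using Serilog.Debugging;
using Serilog.Sinks.Elasticsearch;

class Program
{
    static void Main()
    {
        // Surface sink errors (like the "Maximum timeout reached" exception above) on stderr.
        SelfLog.Enable(msg => Console.Error.WriteLine(msg));

        var options = new ElasticsearchSinkOptions(new Uri("http://localhost:9200")) // placeholder node
        {
            // Durable mode: events go to a local buffer file first and are shipped in periodic batches.
            BufferBaseFilename = "./logs/es-buffer",        // placeholder path
            BufferFileSizeLimitBytes = 100L * 1024 * 1024,  // 100 MB, as mentioned above
            ConnectionTimeout = TimeSpan.FromDays(1)        // raised from the 5-second default
        };

        Log.Logger = new LoggerConfiguration()
            .WriteTo.Elasticsearch(options)
            .CreateLogger();

        for (var i = 0; i < 1_000_000; i++)
            Log.Information("Test event {Counter}", i);

        Log.CloseAndFlush();
    }
}
```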

Please describe the expected behavior?
It should keep retrying to send the current batch before starting with the next batch.
Also, is there a way to configure this behavior? It seems that sinkOptions.ConnectionTimeout has no effect.
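
If the drop happens because the underlying Elasticsearch.Net client gives up once its retry timeout elapses, I would expect something like the following to be configurable. This is only a sketch of what I have in mind; it assumes the client's ConnectionConfiguration fluent API (RequestTimeout, MaxRetryTimeout, MaximumRetries) is reachable through ModifyConnectionSettings, and I have not verified that it prevents the batch from being skipped:

```csharp
using System;
using Serilog;
using Serilog.Sinks.Elasticsearch;

var options = new ElasticsearchSinkOptions(new Uri("http://localhost:9200")) // placeholder node
{
    BufferBaseFilename = "./logs/es-buffer", // placeholder path
    // ModifyConnectionSettings hands the Elasticsearch.Net connection configuration to the caller;
    // that configuration owns the retry budget behind "Maximum timeout reached while retrying request".
    ModifyConnectionSettings = c => c
        .RequestTimeout(TimeSpan.FromMinutes(2))    // per-attempt timeout (illustrative value)
        .MaxRetryTimeout(TimeSpan.FromMinutes(10))  // overall budget across retries (illustrative value)
        .MaximumRetries(10)                         // illustrative value
};

Log.Logger = new LoggerConfiguration()
    .WriteTo.Elasticsearch(options)
    .CreateLogger();
```

Even then, from the stack trace it looks like the batch is treated as permanently failed once the client throws, so a larger retry budget may only make drops rarer rather than impossible.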

@mivano
Contributor

mivano commented Feb 13, 2021

Thanks for reporting this. The durable part is, however, a difficult beast in this sink, and I wish it were a bit more reliable. I do not have an answer for why this happens, but it should not lead to data loss. For these kinds of scenarios, I'd advise that you are better off with a real queue implementation.
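
To illustrate what I mean by a real queue (just a rough sketch with hypothetical types, not something this sink provides): let the application only enqueue events, and have a separate consumer own delivery, retrying the same batch until it succeeds rather than dropping it:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical shipper abstraction; SendBatchAsync would do the actual _bulk POST.
public interface ILogShipper
{
    Task SendBatchAsync(IReadOnlyList<string> batch);
}

public class QueuedLogPump
{
    private readonly Channel<string> _queue = Channel.CreateBounded<string>(100_000);
    private readonly ILogShipper _shipper;

    public QueuedLogPump(ILogShipper shipper) => _shipper = shipper;

    // The application only enqueues; it never blocks on (or fails because of) Elasticsearch.
    public bool TryEnqueue(string logEvent) => _queue.Writer.TryWrite(logEvent);

    // A background consumer owns delivery and keeps retrying the *same* batch until it succeeds.
    public async Task PumpAsync()
    {
        var batch = new List<string>();
        await foreach (var item in _queue.Reader.ReadAllAsync())
        {
            batch.Add(item);
            if (batch.Count < 500) continue; // batch size is illustrative

            while (true)
            {
                try
                {
                    await _shipper.SendBatchAsync(batch);
                    batch.Clear();
                    break;
                }
                catch (Exception ex)
                {
                    Console.Error.WriteLine($"Shipping failed, will retry the batch: {ex.Message}");
                    await Task.Delay(TimeSpan.FromSeconds(5));
                }
            }
        }
    }
}
```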

@BramVader
Author

Thanks Michael for your quick (but rather unsatisfying) answer! I'm just wondering: am I the only one trying to use durable buffering, or are others experiencing the same problems? Is it because I'm running both Elasticsearch and my test code in Kubernetes?
In the meantime, I'm going to try to reproduce this problem locally, running my test code in Visual Studio with the source code of both Elasticsearch.NET and Serilog.Sinks.Elasticsearch checked out (using project references). Maybe I can fix it...

@mivano
Contributor

mivano commented Feb 15, 2021

The durable implementation is a copy of the Seq one. The code bases have since diverged, and the test coverage is not good enough to keep it up to date. It was supposed to be a nice feature for handling occasionally disconnected scenarios. However, I do see a lot of issues reported about this durable part.

I would rather remove this part from the sink, as it contains duplicated code as well and makes maintenance harder. But there is no better alternative at the moment. (#254 (comment))

This sink is great for getting started quickly with ingesting data into ES. It ships with a nice set of defaults and has extension points. However, it is not 100% reliable: high load, network issues, etc. can cause it to go down. This is in line with the Serilog principle that the logger should not impact the application (https://github.com/serilog/serilog/wiki/Reliability). It is also not an audit logger (documentation is sparse, but see serilog/serilog#1170).

I'm not a current user of the sink myself anymore, so I'm more than happy if someone can help out. If you get some ideas, please share!

@zhangzw218

This is a very old problem and it doesn't seem to be solved; I'm in the same situation right now.
