Failed to receive valid reponse after 3 retries #76

Open
gvanbavi opened this issue Dec 21, 2021 · 9 comments

Comments

@gvanbavi

gvanbavi commented Dec 21, 2021

Hi All,

Been using customerio-python for a fair bit in a production environment. From time to time we get the following error:
Cannot send email for xxxxx message id xx. Failed to receive valid response after 3 retries.
Check system status at http://status.customer.io.

The same payload/user goes through fine at another time.

Python 3.8

Has anyone encountered this before?

Would a simple remedy be adding a retry wrapper on top?
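
A minimal sketch of what such a retry wrapper could look like, assuming the failure surfaces as an exception from the send call. The name `call_with_retries` and its parameters are hypothetical and not part of customerio-python:

```python
import time


def call_with_retries(send, attempts=3, base_delay=0.5,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call send() up to `attempts` times with exponential backoff.

    `send` is any zero-argument callable. `retry_on` lists the exception
    types that should trigger another attempt; anything else propagates.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles after each failure: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK this might be used as `call_with_retries(lambda: client.send_email(request), retry_on=(CustomerIOException,))`. Retrying a send can in principle deliver the same email twice if an earlier attempt actually reached the server before the connection dropped.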

@PedroDiSanti

Hello, friend. I have the exact same problem that you described.

Since December 22, I have been receiving this same message from time to time. Were you able to fix the problem, or did it just stop?

Thanks for your time.

@keeth

keeth commented Oct 21, 2022

We're seeing this issue too! I've been asking Customer.io support about it.

@keeth

keeth commented Oct 21, 2022

I thought it might be a thread safety issue: we run a multithreaded environment, the CIO Python client shares a single instance of requests.Session, and I have seen concerns elsewhere about threading issues with this configuration. However, I subclassed the client to store the session in a thread-local variable and that did not help, so I suspect the problem is on the server.

Also I noticed that only transactional email has this problem - track, identify and add_device never experience connection issues for us (transactional email seems to be a separate API with a different hostname).
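
The subclass itself is not shown in this thread. As a rough illustration of the thread-local idea described above, here is a generic sketch; `ThreadLocalSessions` and `session_factory` are hypothetical names, and in practice the factory would presumably be `requests.Session`:

```python
import threading


class ThreadLocalSessions:
    """Hand each thread its own lazily created session object."""

    def __init__(self, session_factory):
        self._factory = session_factory
        self._local = threading.local()  # attributes are per-thread

    def get(self):
        session = getattr(self._local, "session", None)
        if session is None:
            # First use on this thread: build a session only this thread sees.
            session = self._factory()
            self._local.session = session
        return session
```

As reported above, isolating sessions per thread did not make the connection resets go away, which points away from thread safety as the cause.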

@Riuzaky77

Same here. This is the complete error message, at least for me:

Last caught exception -- <class 'requests.exceptions.ConnectionError'>: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
Check system status at http://status.customer.io.
customerio.client_base.CustomerIOException: Failed to receive valid reponse after 3 retries.
    raise CustomerIOException(message)
  File "/usr/local/lib/python3.9/site-packages/customerio/client_base.py", line 40, in send_request
    resp = self.send_request('POST', self.url + "/v1/send/email", request)
    ...
During handling of the above exception, another exception occurred:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
    raise ConnectionError(err, request=request)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 498, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    response = self.http.request(
  File "/usr/local/lib/python3.9/site-packages/customerio/client_base.py", line 31, in send_request
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1100, in read
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1242, in recv_into
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
  File "<string>", line 3, in raise_from
    six.raise_from(e, None)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
ConnectionResetError: [Errno 104] Connection reset by peer
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1100, in read
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.9/ssl.py", line 1242, in recv_into
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
  File "<string>", line 3, in raise_from
    six.raise_from(e, None)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen

@skion

skion commented Sep 21, 2023

We see this very regularly when using the Python SDK, to the point of it becoming unusable. This is what I believe is going on:

The APIClient defines retry parameters as follows:

def __init__(self, key, url=None, region=Regions.US, retries=3, timeout=10, backoff_factor=0.02, use_connection_pooling=True):

This is misleading since it gives the impression that API calls are being retried by default, and that these retries are configurable. However, they are not, since all calls in the client use HTTP POST methods, and the Retry() configuration by default only retries on idempotent methods. The result is that all calls to send_email() and send_push() are in fact only tried once (and not thrice as the default parameter value seems to suggest).

What's even more confusing is that the log message does print the (fake) number of retries:

Failed to receive valid response after 3 retries.

You can validate this by setting large values for retries and backoff_factor and observing that the calls do not take any longer than they used to.

Judging from the presence of the parameters in the API client I assume the authors' intentions were to retry send_email() requests despite the non-idempotent side effects. Therefore my proposed fix would be to set Retry(..., allowed_methods=None) to retry on any verb.
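
To make the reasoning above concrete, here is a simplified model of the decision, not urllib3's actual code. The method set is the one urllib3 exposes as `Retry.DEFAULT_ALLOWED_METHODS`; the function names are made up for illustration, and the model only covers a request that fails after it has been sent, like the connection reset in the traceback:

```python
# Methods urllib3 treats as idempotent (Retry.DEFAULT_ALLOWED_METHODS).
DEFAULT_ALLOWED_METHODS = frozenset(
    {"HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"}
)


def is_method_retryable(method, allowed_methods=DEFAULT_ALLOWED_METHODS):
    # None means "retry on any verb", mirroring Retry(allowed_methods=None).
    if allowed_methods is None:
        return True
    return method.upper() in allowed_methods


def attempts_for_failing_request(method, retries,
                                 allowed_methods=DEFAULT_ALLOWED_METHODS):
    """Count attempts made when every attempt fails with a read error."""
    attempts = 1  # the initial request always goes out
    if is_method_retryable(method, allowed_methods):
        attempts += retries  # each configured retry adds one more attempt
    return attempts
```

Under this model a failing POST with `retries=3` is attempted once, while the same call with `allowed_methods=None` is attempted four times.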

@skion

skion commented Sep 21, 2023

If you'd like to work around this while waiting for an upstream fix, see whether this works for you:

    import types

    from customerio import APIClient, Regions
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    client = APIClient(
        key=api_key,  # your own key, defined elsewhere
        region=Regions.EU,
        retries=5,
        backoff_factor=1.0,
    )

    def build_session_with_retries(self):
        session = super(APIClient, self)._build_session()
        session.headers["Authorization"] = "Bearer {key}".format(key=self.key)

        # Retry the request a number of times before raising an exception,
        # using backoff_factor to delay each retry. Retry even on
        # non-idempotent methods (allowed_methods needs urllib3 >= 1.26).
        session.mount(
            "https://",
            HTTPAdapter(
                max_retries=Retry(
                    total=self.retries,
                    backoff_factor=self.backoff_factor,
                    allowed_methods=None,
                )
            ),
        )
        return session

    # Bind the replacement to this client instance only.
    client._build_session = types.MethodType(build_session_with_retries, client)

It replaces the original _build_session method.

@keeth

keeth commented Sep 21, 2023

Yes, we came to the same conclusion about retrying POST requests. That cleared up the issue for us.

Unfortunately, emails are non-idempotent by nature, so retrying opens up the possibility of duplicate sends.

CIO support has not been very helpful here unfortunately, neither investigating the connection reset issue nor addressing the fact that retrying an API request that sends an email, without any kind of protection from duplicate sends, is a bad idea.

That said, we've never had a customer complaint about duplicate emails, so it could be just a theoretical concern.

@skion

skion commented Sep 21, 2023

I just came across the following notes in the README:

The Customer.io Python SDK depends on the Requests library which includes urllib3 as a transitive dependency. The Requests library leverages connection pooling defined in urllib3. urllib3 only attempts to retry invocations of HTTP methods which are understood to be idempotent (See: Retry.DEFAULT_ALLOWED_METHODS). Since the POST method is not considered to be idempotent, any invocations which require POST are not retried.

It is possible to have the Customer.io Python SDK effectively disable connection pooling by passing a named initialization parameter use_connection_pooling to either the APIClient class or CustomerIO class. Setting this parameter to False (default: True) causes the Session to be initialized and discarded after each request. If you are experiencing integration issues where the cause is reported as Connection Reset by Peer, this may correct the problem. It will, however, impose a slight performance penalty as the TCP connection set-up and tear-down will now occur for each request.

I conclude that they are aware of the issues, and I have a feeling they will not merge my PR in that case. If the issue exists due to a combination of connection pooling and lack of retries, and disabling connection pooling is indeed an alternative, then I would propose that the CustomerIO devs remove the retry parameters from the APIClient, since they literally do nothing there except confuse people, and only keep them for the CustomerIO client.

I wouldn't be surprised if this is due to a mismatch between urllib3's default client-side connection (pool) timeout and CustomerIO's server-side connection timeout, with the server disconnecting earlier. That would explain why we're seeing it only on low-volume apps that keep running, and never on e.g. batch jobs.
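
The hypothesis above can be illustrated with a toy simulation. It assumes the server silently closes connections that sit idle longer than some timeout and that the client does not retry; the function name and all timings are made up, since the real timeouts are not known from this thread:

```python
def simulate_pooled_requests(request_times, server_idle_timeout):
    """Return one outcome per request: "ok" or "reset".

    `request_times` are seconds since start, in increasing order.
    """
    outcomes = []
    last_used = None  # when the pooled connection was last used; None = none open
    for now in request_times:
        if last_used is not None and now - last_used > server_idle_timeout:
            # The server already closed the idle connection, so reusing it
            # fails, and with no retry the call surfaces the error.
            outcomes.append("reset")
            last_used = None  # the dead connection is dropped from the pool
        else:
            outcomes.append("ok")
            last_used = now
    return outcomes
```

In this model a batch job firing requests every second never hits a stale connection, while a long-running app that sends an email every few minutes does.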

@samiruppaluru

For anyone else experiencing this issue: we had it for a while and were able to resolve it by initializing the client with use_connection_pooling=False, the parameter (released in version 1.6) that @skion references in the comment above.
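
In the SDK this is a constructor argument, e.g. `APIClient(key, use_connection_pooling=False)`. What the flag changes, going by the README text quoted earlier, can be sketched with a simplified model. `SessionProvider` and `session_factory` are hypothetical names and this is not the SDK's actual implementation:

```python
class SessionProvider:
    """Reuse one session when pooling is on; build a fresh one per call when off."""

    def __init__(self, session_factory, use_connection_pooling=True):
        self._factory = session_factory
        self.use_connection_pooling = use_connection_pooling
        self._session = None

    def get_session(self):
        if self.use_connection_pooling and self._session is not None:
            # Pooling on: keep reusing the same session and its connections.
            return self._session
        if self._session is not None:
            # Pooling off: discard the previous session before the next call.
            self._session.close()
        self._session = self._factory()
        return self._session
```

The trade-off is the one the README names: every request pays for a new TCP connection, but no request reuses a connection the server may already have closed.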
