Mailcow runs into rate limit for every mail #5168

Closed
5 tasks done
cajus opened this issue Apr 5, 2023 · 51 comments

@cajus

cajus commented Apr 5, 2023

Contribution guidelines

I've found a bug and checked that ...

  • ... I understand that not following the below instructions will result in immediate closure and/or deletion of my issue.
  • ... I have understood that this bug report is dedicated for bugs, and not for support-related inquiries.
  • ... I have understood that answers are voluntary and community-driven, and not commercial support.
  • ... I have verified that my issue has not been already answered in the past. I also checked previous issues.

Description

After the last update, my mailserver does not accept any mails and sends thousands of "rate-limit" warnings to my secondary mail address.

I've tried to disable rate limiting. It's shown as "disabled" now, but it's still rejecting mails. What can I do to debug this?

Logs:

.

Steps to reproduce:

Nothing changed

Which branch are you using?

master

Operating System:

Fedora 37

Server/VM specifications:

8G, 4 CPUs

Is Apparmor, SELinux or similar active?

yes

Virtualization technology:

none

Docker version:

23.0.2

docker-compose version or docker compose version:

v2.9

mailcow version:

2023-04a

Reverse proxy:

nginx

@cajus cajus added the bug label Apr 5, 2023
@cajus
Author

cajus commented Apr 5, 2023

OK. I've now manually commented out everything in the rspamd rate limit configuration, and I'm getting mails again at least. Good enough for some "emergency recovery". Trying to find out more.

@fredol
Contributor

fredol commented Apr 5, 2023

Same problem here but only for specific addresses/domains.

@immanuelfodor

Yes, when I try to send an email from SOGo, I get a 4.7.1 Ratelimit "mailcow" exceeded error. It is impossible to use the email service after the update; this should be top priority to solve.

[Screenshot]

@chriscroome
Contributor

We have found an issue related to this as well.

@immanuelfodor

@cajus what exactly have you commented out, can you give us a diff? There are 176 occurrences of ratelimit in the repository 😀

~/mailcow# grep -r -I ratelimit . | wc -l
176

@cajus
Author

cajus commented Apr 5, 2023

@immanuelfodor I entered the rspamd container from my mailcow directory using

docker exec -it $(docker ps -qf name=rspamd) /bin/bash

and commented out everything in

/etc/rspamd/modules.d/ratelimit.conf
/etc/rspamd/override.d/ratelimit.conf

using nano. After a restart it worked again (without ratelimits of course). But as there's other work to do, I didn't search for reasons yet.

Edit: this is the wrong way to do it, and came from the "gnaaaaaargh how to fix this quickly" effect while searching for the reason inside the container. #5168 (comment) is the real way to do it - as long as there's no upstream fix for it.

@immanuelfodor

Thanks!

I commented everything out in just the following file, and after restarting rspamd, SOGo can now send emails again. Here is the diff:

~/mailcow# git diff
diff --git a/data/conf/rspamd/override.d/ratelimit.conf b/data/conf/rspamd/override.d/ratelimit.conf
index aec1c788..2dd733ef 100644
--- a/data/conf/rspamd/override.d/ratelimit.conf
+++ b/data/conf/rspamd/override.d/ratelimit.conf
@@ -1,12 +1,12 @@
-rates {
-    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
-    to = "100 / 1s";
-    to_ip = "100 / 1s";
-    to_ip_from = "100 / 1s";
-    bounce_to = "100 / 1h";
-    bounce_to_ip = "7 / 1m";
-}
-whitelisted_rcpts = "postmaster,mailer-daemon";
-max_rcpt = 25;
-custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
-info_symbol = "RATELIMITED";
+#rates {
+#    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
+#    to = "100 / 1s";
+#    to_ip = "100 / 1s";
+#    to_ip_from = "100 / 1s";
+#    bounce_to = "100 / 1h";
+#    bounce_to_ip = "7 / 1m";
+#}
+#whitelisted_rcpts = "postmaster,mailer-daemon";
+#max_rcpt = 25;
+#custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
+#info_symbol = "RATELIMITED";
~/mailcow# docker-compose restart rspamd-mailcow

@FreddleSpl0it
Collaborator

I have reviewed our commits but could not identify any changes that could have caused this issue. However, it seems that the problem may be related to the recent upgrade of the rspamd container to version 3.5, as they have made some modifications to the ratelimit feature.

As a possible solution, we could try to perform a hotfix by rolling back the rspamd image to a previous version.

Can someone with this problem test this fix?
Edit docker-compose.yml and set

    rspamd-mailcow:
      image: mailcow/rspamd:1.93

to

    rspamd-mailcow:
      image: mailcow/rspamd:1.92

@RafaelKr
Contributor

RafaelKr commented Apr 5, 2023

@FreddleSpl0it is rspamd "connected" to fail2ban? In this case the PR #5127 could be related to this.

Of course the rspamd ratelimit feature modifications are a viable explanation.

@FreddleSpl0it
Collaborator

Hmm, I cannot really see how this could be connected to the issue. Maybe I'm missing something.
It would be great if someone could give feedback on the rollback test I proposed above: edit docker-compose.yml, change the rspamd-mailcow image from mailcow/rspamd:1.93 to mailcow/rspamd:1.92, and report whether the problem persists.

@immanuelfodor

Reverted the change in the ratelimit file, restarted rspamd as before, and it still lets SOGo send emails to both internal and external addresses; receiving from both internal and external also works 🤷‍♂️ What's happening here?

@immanuelfodor

Did a docker-compose down && docker-compose up -d just to make sure there is nothing cached in rspamd, and it still works fine with the original file without anything commented out. This is very weird. Maybe it just happens after some time?

@immanuelfodor

These rate limits definitely happened earlier, it's in the Rspamd history, so we are not dreaming:

[Screenshot: Rspamd history]

But I can't reproduce it now no matter how hard I try :O

@cajus
Author

cajus commented Apr 5, 2023

Maybe you have to run into a ratelimit once, and it doesn't recover?

@immanuelfodor

It's still weird:

  • Never happened before for years
  • Updated to the latest version just before it happened
  • Came here to see if anybody else has the same problem, and it was the second open issue
  • Commenting out the file and/or restarting rspamd helped solving it
  • And still works fine after reverting the comment change

If it's just a coincidence, it's definitely a rare one 😀

@chriscroome
Contributor

Maybe you have to run into a ratelimit once, and it doesn't recover?

I suspect that this might be the case as well.

@math-98

math-98 commented Apr 5, 2023

It's still weird:

  • Never happened before for years
  • Updated to the latest version just before it happened
  • Came here to see if anybody else has the same problem, and it was the second open issue
  • Commenting out the file and/or restarting rspamd helped solving it
  • And still works fine after reverting the comment change

If it's just a coincidence, it's definitely a rare one 😀

Same here. We had used our server without ever realizing that such a limit existed. On Monday we upgraded our instance, and now this. It started working again after commenting out the ratelimit part in the conf.

@dannykorpan

Same problem here. No more incoming emails.

@DerLinkman
Member

I've repushed the image. Can someone try it again by using docker compose pull and then docker compose up -d?

@cajus
Author

cajus commented Apr 6, 2023

Just did it, and it goes instantly into the "ratelimit" state. No Mail sending possible. Commenting everything in the ratelimit configuration makes it work again. So - whatever it is - it's not yet resolved.

@FreddleSpl0it reverting rspamd to 1.92 seems to work at first glance.

@FreddleSpl0it
Collaborator

Thank you for the feedback, @cajus. Currently, @DerLinkman has republished the old image, so no one should experience any issues when updating. We'll investigate further.

@erichk4

erichk4 commented Apr 6, 2023

Hi,
we also received ratelimit warnings from the mailcow watchdog today (out of the blue), removing the "ratelimit hash" under "System -> Information -> Logs -> Ratelimits" fixed it (for now)...

@FreddleSpl0it
Collaborator

@erichk4 did you update today?

In my test environment, it only seems to affect mailboxes with a ratelimit set. Once the ratelimit was triggered, it doesn't reset. In Redis, the expiration of the hash you see in 'System -> Information -> Logs -> Ratelimits' was set to 1 day and 22 hours for a ratelimit of '1/1m'.

@FreddleSpl0it
Collaborator

Could someone tell me how the ratelimit was set for the problematic mailboxes and how many recipients were attempted to be sent to?

@chriscroome
Contributor

The server we have that was affected by this had a domain rate limited to 1 message per second. After it was triggered no email could be sent from that domain.

@FreddleSpl0it
Collaborator

FreddleSpl0it commented Apr 6, 2023

Thanks, now I know what the issue is. It was introduced with the latest Rspamd update and has already been fixed in the master branch. We will wait for the release. In the meantime, we have republished the old image.

A possible workaround would be to avoid using '1' as the rate limit value, such as '1/1m' or '1/10d'.

@chriscroome
Contributor

Out of interest could you link to the upstream issue that caused this?

@FreddleSpl0it
Collaborator

If I understand everything correctly, then this should be the fix:
rspamd/rspamd@092940e

@cajus
Author

cajus commented Apr 8, 2023

Hmm. I still have hundreds of rate-limit admin mails, and some mails are not delivered, even with the updated rspamd image. Deactivating it again.

@evultrole

This hit me pretty hard today. I had no rate limits set on any domains or mailboxes anywhere, but it suddenly enabled send and receive limits on 24 of my accounts for no reason, giving bogus messages about how the rate limit was set to "to" on the boxes. Other accounts were unaffected. Manually setting the system back to rspamd 1.92 and doing a down, pull, and up seems to have fixed it (hopefully). It even recognized its own bogus limit hashes and removed them. A little terrifying, because I didn't install the April update until Monday evening, so it seems the bad version is still being pushed out as of this week.

I don't think the double counting bug explains this, since I didn't have rate limits turned on.

@FreddleSpl0it
Collaborator

@evultrole it seems that the affected version of Rspamd, version 3.5, is still being shipped. I just cloned a fresh mailcow and logged into Rspamd, and it showed version 3.5. Did you happen to look at the symbols added to the rate-limited emails in the Rspamd history?

@evultrole

evultrole commented Apr 13, 2023

I'm not certain I'm looking at the right thing, but the log for the events is still there so I can check whatever you want if it will be helpful.

Is this what you're looking for?

Symbols for a rate limit on send

RATELIMITED (0) [to(RLtqzparnjyoujkrdy1ggen5re)]
DYN_RL_CHECK (0)

Symbols for a rate limit on receive

ASN (0) [asn:22606, ipnet:13.111.0.0/16, country:US]
RATELIMITED (0) [to(RLtqzparnjyoujkrdy1ggen5re)]

That's all there is on the listings, which is quite short compared to stuff that goes through.

@FreddleSpl0it
Collaborator

No DYN_RL symbol was added, which indicates that you have run into a global ratelimit.
The symbol RATELIMITED (0) [to(RLtqzparnjyoujkrdy1ggen5re)] shows that you ran into the to ratelimit.
Can you show your data/conf/rspamd/override.d/ratelimit.conf?
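
As a rough illustration of the reading above, symbol lines copied from the Rspamd history can be parsed to list which bucket was hit. This is only a sketch: it assumes the "NAME (score) [options]" layout pasted in this thread, treats DYN_RL as an exact symbol name (as the comment above does), and all helper names are made up for the example rather than taken from Rspamd or mailcow.

```python
import re

# Assumed layout of one history line, as pasted in this thread:
#   NAME (score) [options]      e.g. RATELIMITED (0) [to(RLabc)]
SYMBOL_RE = re.compile(
    r"^(?P<name>\w+) \((?P<score>-?[\d.]+)\)(?: \[(?P<options>.*)\])?$"
)
# Assumed layout of a RATELIMITED option: bucket(hash)
BUCKET_RE = re.compile(r"(?P<bucket>\w+)\((?P<hash>\w+)\)")


def parse_symbols(lines):
    """Return {symbol name: options string} for every line that matches."""
    symbols = {}
    for line in lines:
        match = SYMBOL_RE.match(line.strip())
        if match:
            symbols[match.group("name")] = match.group("options") or ""
    return symbols


def ratelimit_hits(lines):
    """List (bucket, hash) pairs from the RATELIMITED symbol, if present."""
    options = parse_symbols(lines).get("RATELIMITED", "")
    return [(m.group("bucket"), m.group("hash")) for m in BUCKET_RE.finditer(options)]


def looks_global(lines):
    """Heuristic from this thread: RATELIMITED without DYN_RL suggests a global limit."""
    symbols = parse_symbols(lines)
    return "RATELIMITED" in symbols and "DYN_RL" not in symbols


# Symbols reported above for the rate limit on send
history = [
    "RATELIMITED (0) [to(RLtqzparnjyoujkrdy1ggen5re)]",
    "DYN_RL_CHECK (0)",
]
print(ratelimit_hits(history))
print(looks_global(history))
```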

@Hindin81

Same situation here. Going back to:

    rspamd-mailcow:
      image: mailcow/rspamd:1.92

solved it for me. Maybe only until the next ratelimit occurs.

@BombusAlpinus

BombusAlpinus commented Apr 16, 2023

I got the same problem with ratelimits in the last few days running mailcow 2023-04a,
but only for one mailbox, which gets a lot of mail forwarded from a user's old GMX mailbox where a lot of spam also comes in. No ratelimits were set on the user's mailbox in mailcow.

I tried the provided solutions without success:

  • comment out the override file
  • pull the republished rspamd image (it still showed rspamd 1.93 afterwards)

Only the switch back to 1.92 in docker-compose.yml solved the issue for me.
I am currently running the rspamd 1.92 Docker image (Rspamd 3.4) and the ratelimit issue is gone.

@evultrole

@FreddleSpl0it My ratelimit.conf is 100% stock and has not been touched.

For more information: This is a very light use server, it sends less than 100 messages a day, mostly from copy machine scan-to-email functions, with no automated mailers interacting with it. It also only receives about 600 messages a day, including those rejected by rspamd. I can't imagine how any of these global limits could have possibly been triggered, even with each message being double counted.

rates {
    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
    to = "100 / 1s";
    to_ip = "100 / 1s";
    to_ip_from = "100 / 1s";
    bounce_to = "100 / 1h";
    bounce_to_ip = "7 / 1m";
}
whitelisted_rcpts = "postmaster,mailer-daemon";
max_rcpt = 25;
custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
info_symbol = "RATELIMITED";
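
To put the stock values next to the traffic figures above, the rate strings can be converted into a sustained per-day figure. A small sketch, assuming the "count / period" syntax from the config comment and only the s, m, h and d suffixes that appear in this thread; parse_rate and per_day are hypothetical helpers, and the calculation ignores bursts and any double counting.

```python
import re

# Assumed rate syntax, taken from the config comment: "<count> / <number><unit>"
RATE_RE = re.compile(r"^\s*(\d+)\s*/\s*(\d+)\s*([smhd])\s*$")
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate(text):
    """Turn a string such as "100 / 1s" into (count, period in seconds)."""
    match = RATE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised rate: {text!r}")
    count, number, unit = match.groups()
    return int(count), int(number) * UNIT_SECONDS[unit]


def per_day(text):
    """Sustained number of units per day that the rate would allow."""
    count, seconds = parse_rate(text)
    return count * 86400 / seconds


# Stock values from the ratelimit.conf shown above
stock = {
    "to": "100 / 1s",
    "to_ip": "100 / 1s",
    "to_ip_from": "100 / 1s",
    "bounce_to": "100 / 1h",
    "bounce_to_ip": "7 / 1m",
}
for name, rate in stock.items():
    print(f"{name}: about {per_day(rate):,.0f} per day")
```

Under these assumptions the to limit of "100 / 1s" corresponds to 8,640,000 per day sustained, far above the fewer than 100 sent and about 600 received messages a day described above.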

@RafaelKr
Contributor

I just noticed that max_rcpt doesn't exist anymore. Most probably it isn't causing this issue, but still I noticed it while seeing if I can find something odd in the config.
rspamd/rspamd.com#464

@FreddleSpl0it
Collaborator

@evultrole it's not that the messages get double counted. The recipients get double counted. If you send to 50 recipients, the ratelimit to = "100 / 1s" gets hit.

I'm not entirely sure about this comment:
# Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default.
If I run the command docker-compose exec rspamd-mailcow rspamadm configdump | grep -i rates -n10, I can see that these rates are loaded.
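
The recipient arithmetic described in this comment can be written out as a toy calculation. This is not Rspamd's code: it simply assumes each recipient is charged twice under the bug, and that a message is limited once the charged units reach the configured count. Both function names are invented for the example.

```python
def units_charged(recipients, double_counted):
    """Units charged against the limit for one message (toy model)."""
    return recipients * 2 if double_counted else recipients


def reaches_limit(recipients, limit, double_counted):
    """True if a single message alone reaches the configured count."""
    return units_charged(recipients, double_counted) >= limit


# One message to 50 recipients against to = "100 / 1s"
for double_counted in (False, True):
    print(double_counted, reaches_limit(50, 100, double_counted))
```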

@Lennix

Lennix commented Apr 17, 2023

This E-Mail triggered the rate limit for me this morning. I've updated to the latest version of mailcow yesterday (the last update before that was 14 days earlier), I've never had issues with rate limiting before.

I notice the "MISSING_TO"; unfortunately I can't see how many recipients the mail had. Since it's some kind of spam, I guess it's possible that there's a long list in CC.

This is the Rspamd info about the email at the time it was greylisted. On the next try it was rate limited, and after that the customer no longer received mail; it was all rate limited:
(over 9 minutes elapsed between the greylisted and the first rate limited mail)

BAD_REP_POLICIES (2)
MISSING_TO (2)
HAS_GOOGLE_REDIR (1)
URI_COUNT_ODD (1) [3]
FISHY_TLD (0.1) [morepleasantsolutions.site]
BAYES_HAM (-0.840597) [79.07%]
IP_REPUTATION_HAM (-0.123782) [asn: 51167(-0.12), country: DE(-0.01), ip: 62.171.181.125(0.00)]
MIME_GOOD (-0.1) [multipart/alternative, text/plain]
MX_GOOD (-0.01) []
HAS_REPLYTO (0) [[email protected]]]
ARC_NA (0)
RCVD_VIA_SMTP_AUTH (0)
ASN (0) [asn:51167, ipnet:62.171.180.0/23, country:DE]
MIME_TRACE (0) [0:+, 1:+, 2:~]
RCVD_TLS_ALL (0)
RECEIVED_SPAMHAUS_PBL (0) [188.162.43.67:received]
REPLYTO_DOM_NEQ_FROM_DOM (0)
DMARC_POLICY_ALLOW (0) [morepleasantsolutions.site, reject]
GREYLIST (0) [greylisted, Mon, 17 Apr 2023 06:51:51 GMT, new record]
REPLYTO_DN_EQ_FROM_DN (0)
MID_RHS_MATCH_FROM (0)
FROM_EQ_ENVFROM (0)
RCVD_COUNT_ONE (0) [1]
RCPT_MAILCOW_DOMAIN (0) [mycustomer.de]
ARC_SIGNED (0) [mycustomer.de:s=dkim:i=1]
R_SPF_ALLOW (0) [+a:morepleasantsolutions.site]
FROM_HAS_DN (0)
DKIM_TRACE (0) [morepleasantsolutions.site:+]
CLAM_VIRUS_FAIL (0) [failed to scan and retransmits exceed]
R_DKIM_ALLOW (0) [morepleasantsolutions.site:s=default]

@FreddleSpl0it
Collaborator

@Lennix Thanks for the info. Then maybe we should lower the burst. At the moment, there is no burst limit specified.
But I'm not 100% sure if it's the solution.

For example, the rate limit to = "100 / 1s"; has a leak rate of 100 emails per second, but there is no explicit burst limit set. Without a burst limit, a single sender could potentially send up to 100 emails at once, filling the rate limit.

If we set the burst limit to 50, then a single sender can only send up to 50 emails at once.
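
The effect of a burst value can be sketched with a generic leaky bucket. This is a textbook model under simplifying assumptions, not Rspamd's implementation; the LeakyBucket class and its field names are invented for the example, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    rate: float          # units leaked per second
    burst: float         # bucket capacity
    level: float = 0.0   # current fill level
    last: float = 0.0    # timestamp of the previous check, in seconds

    def allow(self, now, units=1):
        """Leak for the elapsed time, then accept the units if they still fit."""
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        if self.level + units > self.burst:
            return False
        self.level += units
        return True


# Leak rate of 100 per second, burst of 50: 60 attempts in the same instant
bucket = LeakyBucket(rate=100, burst=50)
accepted = sum(bucket.allow(now=0.0) for _ in range(60))
print(accepted)
```

In this model the burst caps how much can be accepted in one instant, while the leak rate decides how quickly capacity comes back.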

@shiz0
Member

shiz0 commented Apr 18, 2023

I also just encountered this (or something similar?).
I ran into a ratelimit when trying to send an email to two recipients at once.
Sending of the message failed. An error occurred while sending mail. The mail server responded: Ratelimit "mailcow" exceeded. Please check the message and try again.
Sending two separate copies of the mail right after one another worked fine.
Limit for that domain is set to 3 msg/minute.

@luchris

luchris commented Apr 19, 2023

Where do I execute git diff? In the container?

@immanuelfodor

Git diff is just showing the changes. Comment out everything in your mailcow dir in this file data/conf/rspamd/override.d/ratelimit.conf and then restart rspamd

@Tavren

Tavren commented Apr 21, 2023

Ran into a similar problem today. It came out of nowhere: we get some amount of spam on our alias vip, which is aliased to info. From about midnight we stopped getting any messages addressed to vip. The user info has no rate limit, and neither does the domain. In the logs I found lots of entries about to, to_ip_from and to_ip being ratelimited, and later saw the same in the Rspamd UI. So far, increasing the limits from 100 to 1000 in data/conf/rspamd/override.d/ratelimit.conf and restarting the container has solved the issue.

rates {
    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
    to = "1000 / 1s";
    to_ip = "1000 / 1s";
    to_ip_from = "1000 / 1s";
    bounce_to = "100 / 1h";
    bounce_to_ip = "7 / 1m";
}
whitelisted_rcpts = "postmaster,mailer-daemon";
max_rcpt = 25;
custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
info_symbol = "RATELIMITED";

Mailcow version 2023-04a

@MAGICCC
Member

MAGICCC commented Apr 21, 2023

Can you try to update to 2023-04b? The rspamd tag got reassigned to 1.92

@luchris

luchris commented Apr 21, 2023

Can you try to update to 2023-04b? The rspamd tag got reassigned to 1.92

Is it safe to roll back to 3.4?


@Hindin81

I have updated to 2023-04b.
In docker-compose.yml I still have the entry:

    rspamd-mailcow:
      image: mailcow/rspamd:1.92

All is working currently.

@Tavren

Tavren commented Apr 21, 2023

Can you try to update to 2023-04b? The rspamd tag got reassigned to 1.92

Will do at midnight. But the problem appeared about 9 days after the installation and the switch to mailcow. After the update I will revert to the default ratelimits and see whether the problem occurs again within two weeks.

@shiz0
Member

shiz0 commented Apr 21, 2023

Can you try to update to 2023-04b? The rspamd tag got reassigned to 1.92

Thanks! Seems that fixed it.

I have updated to 2023-04b. In docker-compose.yml I have still the entry: rspamd-mailcow: image: mailcow/rspamd:1.92 All is working currently.

It was downgraded from 1.93: 5c025bf

@roelofz

roelofz commented Apr 24, 2023

Hi,

I am remote, so I am not able to update mailcow; I am on 2023-04a.
A workaround in the web admin is to set the rate limits to disabled for the domain (a bit like running naked in the city, but better than not sending emails). This also spares you from commenting out configs in files.

I will try 2023-04b when I return.
Please keep the testing up to prevent these issues!
Fewer updates and more stability is always preferred!

@alfonsrv

Is it somehow possible to flush the rate limit for a certain email / domain? Or to reduce the notification mails from one every other minute to, say, one every hour?
