-
-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve general performance around short URLs for instances with a large number #2112
Comments
Thanks!
Yep, I think you are right there. I'll do some testing. |
I took the liberty of adding it, hope I didn't overlook any guidelines you have while doing so. If not, it's not the biggest change and the PR can maybe provide you some ready-to-copy-paste material ;-) |
Yep, I saw it, thanks a lot 🙂. I was definitely planning to use that migration to test this. |
I have tried inserting 1M short URLs, all with a I have tried with Postgres only, I will try at least with MySQL as well. |
I was super surprised by your message at first, but I think I have an explanation: The real test would be to have 1M links but only 1k expired, and then run it? It should be almost instand with an index but will probably take 20 seconds without the index? |
Hmm, yeah, that actually makes sense. I'll try that and come back with the results. I also want to experiment a bit with short urls which are "expired" due to having reached the max amount of visits. In that case, the query requires a sub query, so it'll require a different optimization approach. |
Well, the execution times are again very similar 😅 To be honest, with just 1k expired URLs and the rest having And then, the more expired short URLs there are, the less the index makes a difference. I then tried with 100k expired short URLs from the 1M total URLs, and it took ~2s in both cases. Again, I just tested with Postgres, and it could be this difference is bigger with MySQL, specially with >=8.0 versions. I feel tempted to merge the PR anyway, with a couple changes, as it definitely doesn't make things worst, but I feel like I should test other database engines first. In the meantime, feel free to do some tests yourself, as I might be making some mistakes. |
We were planning to do some performance testing so we can do that as well. To be honest, the PR also comes from "this should have an impact" rather than from experiencing issues. Unrelated, we also noticed there's no index on the long urls. Shouldn't that be there for people who enable the "findIfExists" feature? I can imagine that becoming slow when the number of urls starts to rise. Again, just what "I would expect", haven't done any testing. We just anticipate serving several millions of links over time and we're checking if there are any reasons NOT to use shlink :-) |
That's fine. These are sometimes "easy" to spot when you have experienced it a couple of times. Your original report did not strike as something crazy and impossible to happen.
Could be helpful as well, although I suspect that feature is not extensively used. However, I did notice a bit of slowness just by listing short URLs while testing this, so that's something I want to dig into and improve, and adding more indexes might be needed. In general, I have focused on improving performance around visits more than around short URLs, as for most users, the list of visits will be exponentially larger than the short URLs one.
I cannot guarantee there's absolutely no bottlenecks, as there's always code branches which are less frequently hit and could be skipped. What I can guarantee is that I will work on improving any rough edges as soon as they are found. |
That last part is what we were looking for. Also happy to contribute. I think we'll have a slightly different usecase (high volume links, each link only visited a few times), so it might be interesting for shlink to see how it holds up under that usecase ;-). Either way, thanks for the reactiveness, much appreciated. |
I'm going to re-purpose this issue with a more general context around short URLs. I'll list the different improvements as they have been applied. |
Shlink version
4.1.0
PHP version
8.x
How do you serve Shlink
Docker image
Database engine
MySQL
Database version
8.x
Current behavior
The recently introduced DeleteExpiredShortUrlsCommand to clean up expired links is great, as we foresee our database to grow in the millions over time. However, as there is no index on the expiry date column, this command will become increasingly heavy as it has to go over every single record.
Expected behavior
An index on the expiry date column will drastically increase the speed of the DeleteExpiredShortUrlsCommand
Minimum steps to reproduce
Fill up the database with a couple of thousand/million links.
Run the DeleteExpiredShortUrlsCommand
Now add the index and compare the performance of the DeleteExpiredShortUrlsCommand
The text was updated successfully, but these errors were encountered: