Find a workaround to the 5k users limit #32

Open
atomheartother opened this issue Oct 20, 2019 · 21 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@atomheartother
Owner

atomheartother commented Oct 20, 2019

This has been stressing me out for weeks and I've been considering options, but I can't find any way to fix it. In short, using Twitter Streams, I am limited to 5 000 user subscriptions per application - as I write this QTweet is at 4400. Here are the options:

  • Move away from a stream and use polling, but that seems like suicide requests-wise and it would also involve a rewrite of basically how the entire app functions :I
  • Use Labs filtered streams, but those are limited to 512 characters per rule and like 10 rules, so there's no way I can fit 5000 users in there.
  • Use Account Activity. Account Activity will cost me money per subscription though so I need to monetize QTweet.
  • Powertrack API is also a similar idea, it will cost me money D:
  • Put a hard limit on the number of subscriptions per server and only allow any higher behind a paywall. This only slows the problem down though.
  • Use webhooks: This seems like a good option, I am currently looking into it.

For now I will add an error message when someone tries to subscribe to new people when we've reached 5000 users. Yup, we're there soon.

@atomheartother atomheartother added enhancement New feature or request help wanted Extra attention is needed labels Oct 20, 2019
@atomheartother
Owner Author

Thoughts:

I feel like a vast majority of subscriptions on qtweet are for content, media, that kind of stuff, where it doesn't really matter if it's posted with a delay. So I could just forget about real time streaming and move to fetching the user's latest tweets every few hours, and do this for every user. This frees up my 5000 users limit, and instead I'm now limited to 100 000 requests per day - with my current 5000 users, that's 20 requests per user per day, so basically 1 request per user every hour or so (72 minutes to be exact), though as I get more users this will get slower.
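To make the request budget concrete, here is a minimal sketch of the arithmetic, assuming the daily budget is spread evenly across all followed users and every poll costs exactly one request. The function name and the extra user counts are only illustrative.

```python
def polling_interval_minutes(users: int, daily_request_budget: int = 100_000) -> float:
    """Minutes between two polls of the same user when the budget is shared evenly."""
    if users <= 0:
        raise ValueError("users must be positive")
    requests_per_user_per_day = daily_request_budget / users
    return 24 * 60 / requests_per_user_per_day


# The interval grows linearly with the number of followed users.
for users in (5_000, 10_000, 50_000):
    print(users, "users:", round(polling_interval_minutes(users)), "minutes between polls")
```

Under these assumptions the delay doubles every time the user count doubles, which is the "this will get slower" part.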

However some people still want some fast tweets, and so I'll allow people to use a flag called --realtime, off by default. That flag will put you on the old (current) realtime system. HOWEVER, you're limited to say 1 or 2 realtime streams per server. If you want more, you have to pay (not sure how much yet). Now what does this do:

  • For the vast majority of subscriptions, it doesn't matter since most people have like 2 or 3 accounts they really care about.
  • Some people are gonna stop using QTweet and go to alternatives, which is fine since my main problem is I have too many users lol
  • If enough people want fast things and care enough to pay, then I can have enough income to allow for more realtime tweets.

I can use an exponential backoff algorithm & not-too-dumb delays to minimize requests and to minimize the delay between a tweet being posted and qtweet posting it to discord...
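One way the backoff idea could look, as a rough sketch rather than anything QTweet actually does: each account keeps its own delay, which doubles whenever a poll finds nothing new and resets when a new tweet shows up. The floor and ceiling values and the `PollState` name are assumptions picked for illustration.

```python
from dataclasses import dataclass

MIN_DELAY_S = 15 * 60        # assumed floor: 15 minutes
MAX_DELAY_S = 6 * 60 * 60    # assumed ceiling: 6 hours


@dataclass
class PollState:
    """Per-account polling delay, in seconds."""
    delay_s: float = MIN_DELAY_S

    def record_poll(self, found_new_tweets: bool) -> float:
        """Update and return the delay to wait before polling this account again."""
        if found_new_tweets:
            # Active account: go back to the shortest delay.
            self.delay_s = MIN_DELAY_S
        else:
            # Quiet account: back off exponentially, up to the ceiling.
            self.delay_s = min(self.delay_s * 2, MAX_DELAY_S)
        return self.delay_s
```

Accounts that tweet often stay near the floor, while quiet accounts drift toward the ceiling and stop eating requests.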

Food for thought

@trodiz

trodiz commented Jan 11, 2020

Do forgive me if this is a dumb suggestion (it probably is), but would it be possible to cycle through subscriptions to work around this limitation? What I mean by that is as soon as you hit the upper limit (5000?) unsubscribe from one user and subscribe to the new one. Fetch their latest tweets and then move on to the next user. Keep cycling through all the users you have to serve... like a round-robin scheduler.

Does that make any sense?
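For illustration, the rotation could be sketched like this, assuming the full user list is known up front and the stream can follow at most `limit` users at a time. `rotation_windows` is a hypothetical helper and does not talk to Twitter.

```python
from typing import Iterator, List, Sequence


def rotation_windows(users: Sequence[str], limit: int) -> Iterator[List[str]]:
    """Yield successive groups of at most `limit` users, cycling forever."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not users:
        return
    start = 0
    while True:
        size = min(limit, len(users))
        # Wrap around the end of the list so every user gets a turn.
        yield [users[(start + i) % len(users)] for i in range(size)]
        start = (start + size) % len(users)


windows = rotation_windows(["a", "b", "c", "d", "e"], limit=2)
for _ in range(3):
    print(next(windows))
```

Each yielded group would be the set of users to subscribe to until the next rotation.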

PS: I just stumbled upon this project. I am trying to host this service myself like you recommended, so I applied for a Twitter dev account and am currently awaiting approval.

@franchesf

How much does it cost to move to another solution and have real-time posting + infinite users?

@atomheartother
Owner Author

Another solution... Than twitter?

@franchesf

Another solution... Than twitter?

No no! Solution for the 5k limit.

@atomheartother
Owner Author

No no! Solution for the 5k limit.

There are no other solutions except for the Enterprise API. The Enterprise API is far, far beyond our budget. Twitter won't even respond to my queries about it, but either way it's a thing you negotiate on a project-by-project basis, not a program you can just join.

@franchesf

No no! Solution for the 5k limit.

There are no other solutions except for the Enterprise API. The Enterprise API is far, far beyond our budget. Twitter won't even respond to my queries about it, but either way it's a thing you negotiate on a project-by-project basis, not a program you can just join.

Got it! That's a shame - I would've loved to become your patron if it was more affordable.
I was looking for something like this for years now as IFTTT is not that reliable.

Well good luck!

@atomheartother
Owner Author

atomheartother commented Jan 13, 2020

Do forgive me if this is a dumb suggestion (it probably is), but would it be possible to cycle through subscriptions to work around this limitation? What I mean by that is as soon as you hit the upper limit (5000?) unsubscribe from one user and subscribe to the new one. Fetch their latest tweets and then move on to the next user. Keep cycling through all the users you have to serve... like a round-robin scheduler.

I forgot to respond to this. This isn't such a bad idea (I actually hadn't thought of it!) but it would be a mess for multiple reasons. First of all, Twitter limits how often you can re-register a stream, so I couldn't rotate it too often. Second, tweets would be lost in the meantime: while I am not subscribed to tweets from an account, if that account tweets, I lose the tweet forever. This is a pain and not acceptable for the end users. Finally, this is still a temporary solution, as eventually we'd hit a point where I'd have like an hour between every window of realtime subscription, and again I'd be stuck no matter how much money or effort I could throw at Twitter.

@trodiz

trodiz commented Jan 13, 2020

while I am not subscribed to tweets from an account, if that account tweets, I lose the tweet forever.

Is it a requirement that the bot must be subscribed to a user all the time to get their latest tweets? You probably already know about this, but according to this documentation you can request the latest tweets with the since_id parameter. So even if you miss a set of live tweets, you can fetch them later when you come back to that user. There would be a delay of course, but I don't see how the bot would lose a tweet forever.
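The catch-up idea could look roughly like the sketch below. It assumes a `fetch` callable that wraps the timeline request and accepts a `since_id`. That callable, the `Fetcher` alias and the tweet shape (a dict with an integer `"id"`) are hypothetical stand-ins, not a real client library.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher: takes a user id and an optional since_id, and returns
# tweets as dicts with an integer "id". It stands in for the real API call.
Fetcher = Callable[[str, Optional[int]], List[dict]]


def catch_up(user_id: str, last_seen: Dict[str, int], fetch: Fetcher) -> List[dict]:
    """Return tweets posted since the last visit, oldest first, and advance the cursor."""
    tweets = fetch(user_id, last_seen.get(user_id))
    if tweets:
        # Remember the newest id so the next visit only asks for later tweets.
        last_seen[user_id] = max(t["id"] for t in tweets)
    return sorted(tweets, key=lambda t: t["id"])
```

Nothing is lost as long as the cursor is stored between visits. The cost is one request per user per visit, which is where the rate limit comes in.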

I know I'm overlooking something here... 🤔

@atomheartother
Owner Author

@trodiz Yes but I'm limited to 900 calls to that endpoint per 15min window. With 5k users that is simply not manageable :(

@atomheartother
Owner Author

Huh, I had however never seen the lists API, is this new?
It's limited to 5 000 users per list, I wonder if I could use this, I'll read into it.

@atomheartother
Owner Author

Ok, first day of experimentation with lists. Here are my results:

Lists are overall a GREAT use case for this problem. I can have 5000 users per list and 1000 lists per account, so that totals 5M subscriptions. I can check a list every 1s, so overall it's pretty damn fast. So I made a small branch to test things out.
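As a back-of-the-envelope check of those numbers, here is a small sketch. It assumes lists are filled to capacity before a new one is created, and that "every 1s" means one list request per second in total, so with several lists each one is refreshed once per full pass. Both are assumptions, and the names are only illustrative.

```python
USERS_PER_LIST = 5_000
LISTS_PER_ACCOUNT = 1_000
LIST_CHECKS_PER_SECOND = 1  # assumed: one list request per second overall


def max_subscriptions() -> int:
    """Upper bound on followed users for a single Twitter account."""
    return USERS_PER_LIST * LISTS_PER_ACCOUNT


def lists_needed(subscriptions: int) -> int:
    """Number of full-or-partial lists required (ceiling division)."""
    return -(-subscriptions // USERS_PER_LIST)


def refresh_interval_seconds(subscriptions: int) -> float:
    """Seconds between two checks of the same list when cycling through all of them."""
    return lists_needed(subscriptions) / LIST_CHECKS_PER_SECOND


print(max_subscriptions())
print(lists_needed(12_000), "lists,", refresh_interval_seconds(12_000), "s per full pass")
```

Even at the full 1000 lists, a complete pass would take under 17 minutes under these assumptions.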

The problem is state management. So far QTweet is perfectly stateless, you can deploy her anywhere and all the info required is stored on her side. Here, lists are stored by Twitter and that causes a lot of problems, not the least of which is that if I clear out my entire database locally, QTweet still has those lists registered on Twitter and the endpoints to manage lists seem pretty rate-limited.

I am still experimenting. However, despite the state management problem, lists seem to be a pretty good option.

@Globlonux01

I choose this option:
Put a hard limit on the number of subscriptions per server and only allow any higher behind a paywall.

Limit per server: 10 max.

@Furry

Furry commented Feb 23, 2020

@Globlonux01 mentioned a max of 10 per server, but even then that seems like far too much. I do have a few ideas though...

  1. An approach that may work, though it would require its own bit of recoding, would be to allow users to submit their own bearer token to the bot, then specify if they'd like that token to be used only for private or public use. It may have to spawn a new docker/process per token though, which is a major downside unless you can create a generator for each token/server.

  2. Multiple apps/accounts that you own, whose tokens feed into the bot, so if it caps out one token, it can move on to the next.

  3. You can have two separate categories. One for streaming posts, and one for polling. Only one streaming account would be allowed per server, and everything else would be on an hourly/daily poll. This seems like the best option, but would require you to migrate existing servers to polling, which might be a chore.
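The third idea, splitting each server's subscriptions into a streaming category and a polling category, could be sketched as follows. The quota of one streaming slot comes from the suggestion above. The function name, the tuple shape and the example account names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

STREAMING_SLOTS_PER_SERVER = 1  # assumed quota, taken from the suggestion above


def assign_modes(subscriptions: List[Tuple[str, bool]],
                 slots: int = STREAMING_SLOTS_PER_SERVER) -> Dict[str, str]:
    """Map each account of one server to "stream" or "poll".

    `subscriptions` is a list of (account, wants_realtime) pairs in the order
    they were created. Realtime requests beyond the quota fall back to polling.
    """
    modes: Dict[str, str] = {}
    used = 0
    for account, wants_realtime in subscriptions:
        if wants_realtime and used < slots:
            modes[account] = "stream"
            used += 1
        else:
            modes[account] = "poll"
    return modes


print(assign_modes([("nasa", True), ("bbc", True), ("xkcd", False)]))
```

Raising `slots` for a given server would be the natural hook for a paid tier, if that route is ever taken.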

But since i'm using this bot now, if you need help with any issues/want help working on this particular thing, just @ me in an issue and i'll help out.

@atomheartother
Owner Author

atomheartother commented Feb 23, 2020

An approach that may work, though it would require its own bit of recoding, would be to allow users to submit their own bearer token to the bot, then specify if they'd like that token to be used only for private or public use. It may have to spawn a new docker/process per token though, which is a major downside unless you can create a generator for each token/server.

I actually have been thinking of something somewhat similar to this, using docker swarm and running other instances of the bot as slaves to the master node, using separate tokens provided by users for each one.

That would indeed require a pretty deep rewrite and a bunch of code dedicated to managing the different nodes, but it could be done. The point being i'd need to separate QTweet into 2 programs, one that's a front-posting Discord bot and the other that's a swarm of bots that receive their orders from the other one, and all subscribe to different twitter users using user-provided tokens.

While that SOUNDS appealing, I'm not sure how well this complies with the Twitter TOS. Also, spawning new instances of a service on the fly is definitely not something I'm super familiar with, and this is getting into big boy devops territory. But also it would be pretty cool. I could even go for a microservices approach... Anyway I'm keeping it in mind but right now my bets are on the lists API thing.

@atomheartother
Owner Author

ALSO keep in mind Twitter is gonna EOL the Streaming API I use sometime soon, so I'm definitely not gonna rely on that if I'm doing a deep rewrite of my bot haha

@Furry

Furry commented Feb 23, 2020

Alright! Lists seem like the easiest and most Twitter-friendly approach anyway :)

@atomheartother
Owner Author

New approach.

I'll be implementing the !!list command (which allows you to follow a list) completely separately from !!start. This'll give me some time to debug list-related issues and also unblock the 5k limit for the time being.

@SteadEXE

SteadEXE commented Dec 3, 2020

Hello,

I just joined the thread. Do you think it's possible to ask bot users to create their own API key, so the bot manages the tweet subscriptions with the server's API key? Or maybe put all the user API keys in a pool, so unused queries can be used by other members.

it's just an idea, maybe a bad idea, but who knows...

@ebergstedt

ebergstedt commented Feb 24, 2021

Cycling API keys is probably a trap, I'm sure Twitter would correlate API key to IP origination and detect cycling. Twitter really dislikes bots abusing or going around their limitations. Each API key would need to have a dedicated IP origination per request, which means you'd need to set up an instance (in a cluster of machines) for every multiple of 5k, at about $5 each on DigitalOcean. You'd also need to use a gateway app to connect to your API instances.

I'm very happy with your instructions for self-host and docker, and I'm sure many are as well. The above project would require fundamental infrastructure changes which would probably break the ease of use of the self-hosted design you have now, so it'd have to be a separate project.

@GoosePlays20

@atomheartother send me your discord so i can dm you, i can get in touch with bot devs that can help

8 participants