Yahoo Finance Premium instituting recaptcha #254

Open
me1029134 opened this issue Dec 30, 2023 · 16 comments

Comments

@me1029134

me1029134 commented Dec 30, 2023

Describe the bug
I believe there is some kind of recaptcha problem. It doesn't happen on every request, but on maybe half of them. Below is my error.

DevTools listening on ws://127.0.0.1:63373/devtools/browser/661f6e71-8cf3-4067-bbfe-3966923a90ab
[1230/140638.380:ERROR:gl_utils.cc(412)] [.WebGL-00001D8400E82200]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[1230/140640.413:ERROR:gl_utils.cc(412)] [.WebGL-00001D84002B3F00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
Unable to login and/or retrieve the appropriate cookies. This is most likely due to Yahoo Finance instituting recaptcha, which this package does not support.

To Reproduce
Steps to reproduce the behavior:

  1. When pulling:
    query = yq.Ticker('ASGTF', username="UserEmail", password="PW")

I get:
{'ASGTF': 'User is not logged in'}

It happens about half the time, with different tickers or when pulling the same ticker multiple times.

Expected behavior
I'm expecting to get p_all_financial_data. I verified that I can see it when I am logged in.

Screenshots

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: Chrome
  • Version: 2.3.7


Additional context
A comment from dpguthrie describes the problem, and probably the solution, in a little more detail:
#251 (comment)
I thought I had it fixed, but it was not.

@Tharindu-Abay

Is this what's happening in your code: when you log into the Yahoo account from Selenium, you get a recaptcha and cannot continue?

@thelaycon

Attach a screenshot.

@samirgorai

Can you add some screenshots of when you get the error and when you get the normal, expected result?

@me1029134
Author

me1029134 commented Dec 31, 2023

Sure. For example, if I run this code:

import yahooquery as yq
password = 'PW'
userEmail = 'UserEmail'
symbol = 'AAPL'
while True:
    query = yq.Ticker(symbol, username=userEmail, password=password)
    p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
    print(p_all_financial_data_quarter)

On the first run I get this, which is working correctly:
[screenshot]

and on the second run I get this, which is not working:
[screenshot]

It seems to work about half the time, at random (there is no sequence or pattern).
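The loop above runs forever, which makes the "about half the time" estimate hard to pin down. A bounded variant can count failures instead. This is a minimal sketch, assuming (from the outputs quoted in this thread) that a failed call returns a dict mapping the symbol to an error string such as 'User is not logged in'; `fetch` is a hypothetical zero-argument callable standing in for the yahooquery call.

```python
from typing import Any, Callable


def is_error_response(result: Any) -> bool:
    """Treat a non-empty dict whose values are all strings as an error response.

    Assumption based on outputs like {'AAPL': 'User is not logged in'} in this
    thread; a successful call is assumed to return data instead.
    """
    return isinstance(result, dict) and bool(result) and all(
        isinstance(v, str) for v in result.values()
    )


def failure_rate(fetch: Callable[[], Any], attempts: int = 10) -> float:
    """Call fetch() a fixed number of times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if is_error_response(fetch()))
    return failures / attempts


if __name__ == "__main__":
    # Hypothetical fake that alternates failure/success, in place of the real call
    calls = {"n": 0}

    def fake_fetch():
        calls["n"] += 1
        if calls["n"] % 2:
            return {"AAPL": "User is not logged in"}
        return [{"asOfDate": "2023-09-30"}]

    print(failure_rate(fake_fetch, attempts=10))  # 0.5
```

With the real library, `fetch` could be `lambda: yq.Ticker(symbol, username=userEmail, password=password).p_all_financial_data(frequency='q')`. A single ratio is easier to compare before and after a change than a stream of printed results.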

And of course, when I'm logged in on Chrome, I see the correct data too:
[screenshot]

I've tried it on Windows 10 and 11, with Python 3.9 and 3.12.

Thanks in advance for your help!

@samirgorai

Some questions:
1) Were there recent changes that caused this error, or do previous versions of the library also show it?
2) In your example:
import yahooquery as yq
password = 'PW'
userEmail = 'UserEmail'
symbol = 'AAPL'
while True:
    query = yq.Ticker(symbol, username=userEmail, password=password)  # trying to log in with user credentials
    p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
    print(p_all_financial_data_quarter)

the login is done in base.py:

Yfinnce Base

I think this part of the code must be executed whenever a user logs in.
Can you confirm whether I am correct?

3) How can I debug this locally? Where can I get a username and password?

@samirgorai

Is it possible to get your email address so that I can message you directly?

samirgorai added a commit to samirgorai/yahooquery that referenced this issue Dec 31, 2023
The cause of the error might be that the username page contains multiple
elements with id='login-username'. Because of this, it was not able to log in,
which resulted in the error message:
"Unable to login and/or retrieve the appropriate cookies.  This is most likely due to Yahoo Finance instituting recaptcha, which this package does not support."
I changed the method of finding the element to By.XPATH.
Signed-off-by: Samir Gorai <[email protected]>
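The reasoning behind this change can be illustrated with the standard library: a lookup by id alone returns whichever element carries that id first, while the XPath //input[@id='login-username'] only matches input elements. The markup below is a hypothetical, simplified page, not Yahoo's actual login HTML, and ElementTree's limited XPath support stands in for Selenium's By.ID and By.XPATH lookups.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two elements share id="login-username",
# and the first one is not the text input.
PAGE = """
<html><body>
  <label id="login-username">Username</label>
  <form>
    <input id="login-username" name="username" type="text"/>
  </form>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Matching on the id alone: any element with that id, in document order
by_id = root.findall(".//*[@id='login-username']")

# Matching on tag and id, as the XPath in the commit does
by_xpath = root.findall(".//input[@id='login-username']")

print([el.tag for el in by_id])     # ['label', 'input']
print([el.tag for el in by_xpath])  # ['input']
```

If the real page had this shape, a first-match lookup by id would pick the label and send_keys would fail, while the input-specific XPath would still find the field.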
@samirgorai

Possible fix: can you look at #255?

@samirgorai

samirgorai commented Jan 2, 2024

Hello @dpguthrie @me1029134, I tested the code with my changes in #255:

import yahooquery as yq
password = 'XXXXXX'
userEmail = '[email protected]'
symbol = 'AAPL'
while True:
    query = yq.Ticker(symbol, username=userEmail, password=password)
    p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
    print(p_all_financial_data_quarter)

and the result was:

DevTools listening on ws://127.0.0.1:64734/devtools/browser/41a2456b-89ec-4df2-b6a5-d65774e7c308
[0102/091740.817:ERROR:command_buffer_proxy_impl.cc(127)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[0102/091745.755:ERROR:gl_utils.cc(412)] [.WebGL-0000438400E7D400]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[0102/091748.973:ERROR:gl_utils.cc(412)] [.WebGL-00004384002C0000]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
{'AAPL': 'User is not subscribed to Premium or has invalid cookies'}

Can you check this once on your setup with your ID?

@me1029134
Author

Unfortunately, after adding this line:
self.driver.find_element(By.XPATH, "//input[@id='login-username']").send_keys(self.username)
(and commenting out the other one)

I'm still getting the same problem:
[screenshot]

@samirgorai

@me1029134, how can I get the build with my changes, please?

@samirgorai

samirgorai commented Jan 2, 2024

I am able to log into login.yahoo.com using the following script:

"""
file to test login
"""
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC

while(1):
username="[email protected]"
pasword="XXXXXX"
driver_path='C:\Users\samir\Web Scraping14-12-2023\geckodriver.exe'
LOGIN_URL = "https://login.yahoo.com"
browser = webdriver.Firefox()
browser.get(LOGIN_URL)
print(browser.title)
browser.find_element(By.XPATH, "//input[@id='login-username']").send_keys(username)
browser.find_element(By.XPATH, "//input[@id='login-signin']").click()
password_element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "login-passwd")))
password_element.send_keys(pasword)
browser.find_element(By.XPATH, "//button[@id='login-signin']").click()

time.sleep(5)

I think the problem is with this part of the base.py code:
[screenshot]
The condition if instance.cookies: is evaluating to False.

I can also see that it was modified in the last commit.
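One possible reason the condition is false even though no exception is raised: Python cookie containers are falsy when empty, so if the headless login finishes without collecting any cookies, if instance.cookies: goes straight to the warning branch. A minimal stdlib illustration, assuming instance.cookies is a CookieJar, dict or list (its exact type isn't shown in this thread). http.cookiejar.CookieJar defines __len__ and no __bool__, so bool() falls back to the cookie count.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Build a minimal session cookie (hypothetical values, for illustration)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
# An empty jar has len() == 0, so `if jar:` takes the else branch
print(bool(jar))  # False

jar.set_cookie(make_cookie("example_cookie", "example-value", ".example.com"))
print(bool(jar), len(jar))  # True 1

# Empty dicts and lists behave the same way
print(bool({}), bool([]))  # False False
```

So a False result here most likely means "no cookies were collected", which points at the login step itself rather than at the check.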

@samirgorai

@me1029134 @dpguthrie, can you please check? I have made some changes and committed them.

Thank you

@me1029134
Author

me1029134 commented Jan 3, 2024

I believe what we are trying to do is pull the cookies from a Chrome login session and load them into the Selenium session, along the lines of these articles:
https://stackoverflow.com/questions/15058462/how-to-save-and-load-cookies-using-python-selenium-webdriver
https://medium.com/@ghulammustafapy/efficient-login-session-management-in-selenium-python-save-and-reuse-credentials-for-browser-7aa21b32df63
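The approach in those articles comes down to two Selenium WebDriver calls: driver.get_cookies() returns the cookies as a list of dicts, and driver.add_cookie(cookie_dict) adds one back. Below is a minimal sketch that stores them as JSON rather than pickle. The function names and file path are hypothetical. As far as I know, WebDriver only accepts a cookie for the domain the browser is currently on, so the caller should driver.get() a Yahoo page before loading.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Protocol


class CookieDriver(Protocol):
    """The two WebDriver methods this sketch relies on."""

    def get_cookies(self) -> List[Dict[str, Any]]: ...

    def add_cookie(self, cookie_dict: Dict[str, Any]) -> None: ...


def save_cookies(driver: CookieDriver, path: Path) -> int:
    """Write the driver's current cookies to a JSON file; return how many were saved."""
    cookies = driver.get_cookies()
    path.write_text(json.dumps(cookies, indent=2))
    return len(cookies)


def load_cookies(driver: CookieDriver, path: Path) -> int:
    """Add cookies from a JSON file to the driver; return how many were added.

    The driver must already be on the cookies' domain (e.g. after driver.get()).
    """
    cookies = json.loads(path.read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

With a real driver, you would call save_cookies(driver, Path('yahoo_cookies.json')) once after a manual login. Later, call driver.get('https://finance.yahoo.com'), then load_cookies(...), then driver.refresh(). JSON keeps the saved file readable and, unlike unpickling, loading it cannot execute code.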

@me1029134
Author

I have a prototype fix that seems to work for me. I noticed that if I put a 20-second wait after the login and before any of the pulls, it doesn't get hung up, for some reason. I added that, and I also added saving the session's cookies to disk after a successful login. It would be better if you could just pass in the cookies/session; that seems like the correct way to do it. Here is the fix that worked for me, at least:

    # Relies on module-level imports: os, pickle, time
    def login(self) -> None:
        if _has_selenium:
            # Cookies cached from a previous successful login
            session_instance = 'session_save_location/session_instance.pkl'
            if os.path.exists(session_instance):
                with open(session_instance, 'rb') as file:
                    self.session.cookies = pickle.load(file)
            else:
                instance = YahooFinanceHeadless(self.username, self.password)
                instance.login()
                # Empirically, waiting here avoids the intermittent failure
                time.sleep(20)
                if instance.cookies:
                    self.session.cookies = instance.cookies
                    # Create the cache directory so open() does not fail
                    os.makedirs(os.path.dirname(session_instance), exist_ok=True)
                    with open(session_instance, 'wb') as file:
                        pickle.dump(self.session.cookies, file)
                    return
                else:
                    logger.warning(
                        "Unable to login and/or retrieve the appropriate cookies.  This is "
                        "most likely due to Yahoo Finance instituting recaptcha, which "
                        "this package does not support."
                    )
        else:
            logger.warning(
                "You do not have the required libraries to use this feature.  Install "
                "with the following: `pip install yahooquery[premium]`"
            )
        self.session = setup_session(self.session, self._setup_url)
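The prototype stores a pickled cookie jar on disk. If the cookies were passed in directly instead, as suggested above, one option is to accept the list of dicts a logged-in browser exposes (the format WebDriver's get_cookies() returns) and convert it into a standard http.cookiejar.CookieJar, the base class of the jar requests uses. A minimal stdlib sketch: cookies_from_dicts is a hypothetical helper, not part of yahooquery, and how yahooquery would accept the result is an assumption.

```python
from http.cookiejar import Cookie, CookieJar
from typing import Any, Dict, Iterable


def cookies_from_dicts(cookie_dicts: Iterable[Dict[str, Any]]) -> CookieJar:
    """Convert WebDriver-style cookie dicts into a standard CookieJar.

    Uses the keys WebDriver's get_cookies() returns: 'name' and 'value'
    (required) plus optional 'domain', 'path', 'secure' and 'expiry'
    (seconds since the epoch).
    """
    jar = CookieJar()
    for d in cookie_dicts:
        domain = d.get("domain", "")
        expiry = d.get("expiry")
        jar.set_cookie(Cookie(
            version=0, name=d["name"], value=d["value"],
            port=None, port_specified=False,
            domain=domain, domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=d.get("path", "/"), path_specified=True,
            secure=bool(d.get("secure", False)),
            expires=int(expiry) if expiry is not None else None,
            # A cookie with no expiry is a session cookie
            discard=expiry is None,
            comment=None, comment_url=None, rest={},
        ))
    return jar
```

On the requests side, RequestsCookieJar.update() accepts another CookieJar, so something like self.session.cookies.update(cookies_from_dicts(saved)) should work. Unlike a pickle, a list of dicts can come from any source (a JSON file, an environment variable, the caller) without executing code on load.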

@samirgorai

@dpguthrie, do you have a high-level design document or diagram that would help me understand your library?

@dpguthrie
Owner

@samirgorai Nope, sorry.


5 participants