
Slowness in New Release 2.3.7 #256

Closed · vlaranjo opened this issue Jan 2, 2024 · 9 comments

vlaranjo commented Jan 2, 2024

I have been using yahooquery for a while now, across the different modules in the free Yahoo Finance domain. Recently it has been taking longer than before to obtain the same data: each iteration used to take around 1 second, and now it takes about 3 seconds. This is for historical price data.

Just wanted to know whether you are seeing the same.

I believe this may be related to the changes in the Yahoo API connection, but I wanted to make sure. I am located in Europe.

Here is a snippet of what I am doing (just in case):

```python
from fake_useragent import UserAgent
from yahooquery import Ticker

ua = UserAgent()
stock = Ticker(ticker, user_agent=ua.random)

df = stock.history(period='2y')
# Trailing returns over ~1 week, 1 month, 1 quarter, 1 semester, 1 year of trading days
W1_pct = round(df['close'].pct_change(5).iloc[-1] * 100, 2)
M1_pct = round(df['close'].pct_change(22).iloc[-1] * 100, 2)
Q1_pct = round(df['close'].pct_change(63).iloc[-1] * 100, 2)
S1_pct = round(df['close'].pct_change(126).iloc[-1] * 100, 2)
Y1_pct = round(df['close'].pct_change(252).iloc[-1] * 100, 2)
```

Desktop:
Operating System: Debian GNU/Linux 11 (bullseye)
Python Version: 3.10.2
Yahooquery version: 2.3.7

Screenshot: (image omitted)

samirgorai commented

@vlaranjo

1. To use yahooquery we have to log in. Are you able to log in successfully? Per #254, there is an issue with logging in.
2. Can you please confirm which location you are in?
3. Can you log in to login.yahoo.com manually in a browser and check whether you see an "I am not a robot" check while logging in?

Thanks in advance.

dpguthrie (Owner) commented

@samirgorai This is unrelated to login/premium data - please try to keep these threads focused on the topic.

@vlaranjo I think others in different threads have mentioned some slowness. My guess is that Yahoo can now throttle requests more effectively on their end, because similar requests carry more identifying information (cookies and a crumb). I don't notice any slowness on my end, but it's most likely location-dependent.

I ran the code below, and would recommend you do the same, to see whether removing the identifying info speeds up requests. As you can see from the results, I'm getting data back more quickly than you referenced, and at relatively similar speeds regardless of session setup.

```python
# third party
import pandas as pd
from requests.cookies import RequestsCookieJar

# first party
import yahooquery as yq

# Keyword arguments used across each ticker instance
kwargs = {'validate': True, 'asynchronous': True}

# Read in tickers
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]
symbols = df['Symbol'].tolist()

# Create multiple ticker instances
t1 = yq.Ticker(symbols, **kwargs)
t2 = yq.Ticker(symbols, **kwargs)

# Set progress to True to view timing info
t1.progress = True
t2.progress = True

# Remove identifying info from t2
t2.session.cookies = RequestsCookieJar()
t2.crumb = None

# Get dataframes for each ticker instance
df1 = t1.history(period='2y')
df2 = t2.history(period='2y')
```

(results screenshot omitted)

ms82494 commented Jan 7, 2024

I just copied the code from @dpguthrie and ran it. I got 86.91it/s for t1 and 91.78it/s for t2. Feels pretty quick to me, and the difference is probably not meaningful.

jonggwan commented Jan 8, 2024

@dpguthrie I have tested history and price. History speed is almost the same, but price differs in speed. I am not in the US and am using a US VPN.

(timing screenshot omitted)
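For anyone wanting to reproduce this comparison, here is a minimal sketch (not from the thread) that times the two endpoints; it assumes `symbols` is a list of tickers like the S&P 500 list built above:

```python
# Minimal sketch: compare elapsed time for the price and history endpoints.
# Assumes `symbols` is a list of tickers, e.g. the S&P 500 list built above.
import time

import yahooquery as yq

t = yq.Ticker(symbols, asynchronous=True)

start = time.perf_counter()
_ = t.price  # current quote data for every symbol
print(f"price:   {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
_ = t.history(period='2y')  # 2 years of daily OHLC data
print(f"history: {time.perf_counter() - start:.1f}s")
```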

chfiii commented Feb 2, 2024

The slowdown in the price entry has been going on since around August. Where it used to take me ~2 seconds to fetch 130 prices, it now takes more like 10 seconds. I'm pretty sure it's because Yahoo has blocked the fast endpoints that had been used.

vlaranjo (Author) commented Feb 3, 2024

@dpguthrie thank you for the reply.

I am not able to run your code because I get the following error:

```
---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    448                     # Otherwise it looks like a bug in the code.
--> 449                     six.raise_from(e, None)
    450         except (SocketTimeout, BaseSSLError, SocketError) as e:

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    443                 try:
--> 444                     httplib_response = conn.getresponse()
    445                 except BaseException as e:

~\Anaconda3\envs\Python_3.9\lib\http\client.py in getresponse(self)
   1376             try:
-> 1377                 response.begin()
   1378             except ConnectionError:

~\Anaconda3\envs\Python_3.9\lib\http\client.py in begin(self)
    319         while True:
--> 320             version, status, reason = self._read_status()
    321             if status != CONTINUE:

~\Anaconda3\envs\Python_3.9\lib\http\client.py in _read_status(self)
    280     def _read_status(self):
--> 281         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    282         if len(line) > _MAXLINE:

~\Anaconda3\envs\Python_3.9\lib\socket.py in readinto(self, b)
    703             try:
--> 704                 return self._sock.recv_into(b)
    705             except timeout:

~\Anaconda3\envs\Python_3.9\lib\ssl.py in recv_into(self, buffer, nbytes, flags)
   1241                   self.__class__)
-> 1242             return self.read(nbytes, buffer)
   1243         else:

~\Anaconda3\envs\Python_3.9\lib\ssl.py in read(self, len, buffer)
   1099             if buffer is not None:
-> 1100                 return self._sslobj.read(len, buffer)
   1101             else:

timeout: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702             # Make the request on the httplib connection object.
--> 703             httplib_response = self._make_request(
    704                 conn,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    450         except (SocketTimeout, BaseSSLError, SocketError) as e:
--> 451             self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
    452             raise

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in _raise_timeout(self, err, url, timeout_value)
    339         if isinstance(err, SocketTimeout):
--> 340             raise ReadTimeoutError(
    341                 self, url, "Read timed out. (read timeout=%s)" % timeout_value

ReadTimeoutError: HTTPSConnectionPool(host='fc.yahoo.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
~\Anaconda3\envs\Python_3.9\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    485         try:
--> 486             resp = conn.urlopen(
    487                 method=request.method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    814             )
--> 815             return self.urlopen(
    816                 method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    814             )
--> 815             return self.urlopen(
    816                 method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    814             )
--> 815             return self.urlopen(
    816                 method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    814             )
--> 815             return self.urlopen(
    816                 method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    814             )
--> 815             return self.urlopen(
    816                 method,

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    786 
--> 787             retries = retries.increment(
    788                 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~\Anaconda3\envs\Python_3.9\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    591         if new_retry.is_exhausted():
--> 592             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    593 

MaxRetryError: HTTPSConnectionPool(host='fc.yahoo.com', port=443): Max retries exceeded with url: / (Caused by ReadTimeoutError("HTTPSConnectionPool(host='fc.yahoo.com', port=443): Read timed out. (read timeout=5)"))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13136\432901738.py in <module>
      1 # Create multiple ticker instances
----> 2 t1 = yq.Ticker(symbols, **kwargs)
      3 t2 = yq.Ticker(symbols, **kwargs)
      4 
      5 # Reset progress to True to view timing info

~\AppData\Roaming\Python\Python39\site-packages\yahooquery\ticker.py in __init__(self, symbols, **kwargs)
     88 
     89     def __init__(self, symbols, **kwargs):
---> 90         super(Ticker, self).__init__(**kwargs)
     91         self.symbols = symbols
     92         self.invalid_symbols = None

~\AppData\Roaming\Python\Python39\site-packages\yahooquery\base.py in __init__(self, **kwargs)
    927         self.country = kwargs.get("country", "united states").lower()
    928         self.formatted = kwargs.pop("formatted", False)
--> 929         self.session, self.crumb = _init_session(kwargs.pop("session", None), **kwargs)
    930         self.progress = kwargs.pop("progress", False)
    931         username = os.getenv("YF_USERNAME") or kwargs.get("username")

~\AppData\Roaming\Python\Python39\site-packages\yahooquery\utils\__init__.py in _init_session(session, **kwargs)
    166             ),
    167         )
--> 168         session, crumb = setup_session_with_cookies_and_crumb(session)
    169     return session, crumb
    170 

~\AppData\Roaming\Python\Python39\site-packages\yahooquery\utils\__init__.py in setup_session_with_cookies_and_crumb(session)
    179     else:
    180         if isinstance(session, FuturesSession):
--> 181             response = response.result()
    182 
    183         session.cookies = response.cookies

~\Anaconda3\envs\Python_3.9\lib\concurrent\futures\_base.py in result(self, timeout)
    444                     raise CancelledError()
    445                 elif self._state == FINISHED:
--> 446                     return self.__get_result()
    447                 else:
    448                     raise TimeoutError()

~\Anaconda3\envs\Python_3.9\lib\concurrent\futures\_base.py in __get_result(self)
    389         if self._exception:
    390             try:
--> 391                 raise self._exception
    392             finally:
    393                 # Break a reference cycle with the exception in self._exception

~\Anaconda3\envs\Python_3.9\lib\concurrent\futures\thread.py in run(self)
     56 
     57         try:
---> 58             result = self.fn(*self.args, **self.kwargs)
     59         except BaseException as exc:
     60             self.future.set_exception(exc)

~\Anaconda3\envs\Python_3.9\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    587         }
    588         send_kwargs.update(settings)
--> 589         resp = self.send(prep, **send_kwargs)
    590 
    591         return resp

~\Anaconda3\envs\Python_3.9\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    701 
    702         # Send the request
--> 703         r = adapter.send(request, **kwargs)
    704 
    705         # Total elapsed time of the request (approximately)

~\AppData\Roaming\Python\Python39\site-packages\yahooquery\utils\__init__.py in send(self, request, **kwargs)
    141         if timeout is None:
    142             kwargs["timeout"] = self.timeout
--> 143         return super(TimeoutHTTPAdapter, self).send(request, **kwargs)
    144 
    145 

~\Anaconda3\envs\Python_3.9\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    517                 raise SSLError(e, request=request)
    518 
--> 519             raise ConnectionError(e, request=request)
    520 
    521         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='fc.yahoo.com', port=443): Max retries exceeded with url: / (Caused by ReadTimeoutError("HTTPSConnectionPool(host='fc.yahoo.com', port=443): Read timed out. (read timeout=5)"))
```

Perhaps this is again related to the country the request is being made from.
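The traceback shows the default 5-second read timeout expiring while the session fetches its cookie from fc.yahoo.com. One thing worth trying is raising that timeout via the keyword arguments yahooquery passes through to its session setup; a sketch follows, and treating `timeout` and `retry` as supported kwargs is an assumption to verify against your installed version:

```python
# Hedged sketch: build the Ticker with a longer read timeout and more retries.
# timeout/retry as session kwargs are an assumption to verify in your yahooquery version.
from yahooquery import Ticker

t = Ticker(
    symbols,             # list of tickers, as in the snippets above
    validate=True,
    asynchronous=False,
    timeout=30,          # traceback shows the default (5s) read timeout expiring
    retry=5,             # allow a few more retries for transient failures
)
df = t.history(period='2y')
```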

chfiii commented Feb 3, 2024 via email

vlaranjo (Author) commented Feb 8, 2024

Hi everyone,

OK, I understood what the issue was: looking at #249, `asynchronous` had to be set to False.

Now it runs fast when I run the code @dpguthrie shared.

But if I run the code I was using before with asynchronous=False, it is still too slow:

```python
import os

import pandas as pd
from fake_useragent import UserAgent
from tqdm import tqdm
from yahooquery import Ticker

for ticker in tqdm(tickers):
    try:
        # Get tables from Yahoo: historical prices and profile
        ua = UserAgent()

        # Keyword arguments used across each ticker instance
        kwargs = {'validate': True, 'asynchronous': False}

        stock = Ticker(ticker, **kwargs)

        df = stock.history(period='2y')
        W1_pct = round(df['close'].pct_change(5).iloc[-1] * 100, 2)
        M1_pct = round(df['close'].pct_change(22).iloc[-1] * 100, 2)
        Q1_pct = round(df['close'].pct_change(63).iloc[-1] * 100, 2)
        S1_pct = round(df['close'].pct_change(126).iloc[-1] * 100, 2)
        Y1_pct = round(df['close'].pct_change(252).iloc[-1] * 100, 2)

        industry = Index[Index.index == ticker]['Industry'].iloc[0]
        sector = Index[Index.index == ticker]['Sector'].iloc[0]
        name = Index[Index.index == ticker]['Name'].iloc[0]

        # Construct ticker table to print
        performance_data_table = pd.DataFrame(data={
            'Ticker': [ticker], 'Name': [name], 'Sector': [sector], 'Industry': [industry],
            'Return_1Y': [Y1_pct], 'Return_1S': [S1_pct], 'Return_1Q': [Q1_pct],
            'Return_1M': [M1_pct], 'Return_1W': [W1_pct]}).set_index('Ticker')

        # Save table
        performance_data_table.to_csv(os.path.join(performance_analysis_folder, ticker) + '_Table.csv')

        # time.sleep(2)
    except Exception:
        continue
```

Do you have any suggestions? Or is the best solution really to restructure my code into that format, instead of the current loop?

Thank you,
Vasco

vlaranjo (Author) commented

Hi everyone,

I turned it around based on the code @dpguthrie provided, and now it is running fast. Thank you for the help.

I am leaving the code snippet below in case it is useful for someone. I will close the issue.

```python
import os

import pandas as pd
from fake_useragent import UserAgent
from yahooquery import Ticker

# Define the Ticker object in yahooquery with all symbols at once
ua = UserAgent()
t1 = Ticker(tickers, user_agent=ua.random)
t1.progress = True

# Get one dataframe covering every ticker
df1 = t1.history(period='2y')
df1 = df1.reset_index()

for ticker in tickers:
    try:
        # Select this ticker's rows (note: filter on `ticker`, not tickers[0])
        df = df1[df1['symbol'] == ticker]

        W1_pct = round(df['close'].pct_change(5).iloc[-1] * 100, 2)
        M1_pct = round(df['close'].pct_change(22).iloc[-1] * 100, 2)
        Q1_pct = round(df['close'].pct_change(63).iloc[-1] * 100, 2)
        S1_pct = round(df['close'].pct_change(126).iloc[-1] * 100, 2)
        Y1_pct = round(df['close'].pct_change(252).iloc[-1] * 100, 2)

        industry = Index[Index.index == ticker]['Industry'].iloc[0]
        sector = Index[Index.index == ticker]['Sector'].iloc[0]
        name = Index[Index.index == ticker]['Name'].iloc[0]

        # Construct ticker table to print
        performance_data_table = pd.DataFrame(data={
            'Ticker': [ticker], 'Name': [name], 'Sector': [sector], 'Industry': [industry],
            'Return_1Y': [Y1_pct], 'Return_1S': [S1_pct], 'Return_1Q': [Q1_pct],
            'Return_1M': [M1_pct], 'Return_1W': [W1_pct]}).set_index('Ticker')

        # Save table
        performance_data_table.to_csv(os.path.join(performance_analysis_folder, ticker) + '_Table.csv')

    except Exception:
        continue
```
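As a possible refinement beyond what the thread settled on, the per-ticker loop over df1 could be collapsed into a single groupby; a sketch, assuming df1 is the reset-index history dataframe from the snippet above:

```python
# Hypothetical vectorized variant: trailing returns for every symbol at once.
# Assumes df1 is the reset-index dataframe returned by t1.history() above.
import pandas as pd

horizons = {'Return_1W': 5, 'Return_1M': 22, 'Return_1Q': 63,
            'Return_1S': 126, 'Return_1Y': 252}

returns = pd.DataFrame({
    col: df1.groupby('symbol')['close']
            .apply(lambda s, n=n: round(s.pct_change(n).iloc[-1] * 100, 2))
    for col, n in horizons.items()
})
# `returns` is indexed by symbol, one column per horizon
print(returns.head())
```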
