Bug trying to consolidate October Data #72

Open
radumas opened this issue Dec 14, 2019 · 4 comments

@radumas
Collaborator

radumas commented Dec 14, 2019

Extracting 2019-10-29.tar.gz
 98%|███████████████████████████████████▏| 40000/40983 [06:34<00:09, 100.08it/s]
Traceback (most recent call last):
  File "fetch_s3.py", line 212, in <module>
    fetch_s3()
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "fetch_s3.py", line 208, in fetch_s3
    _fetch_s3(aws_access_key_id, aws_secret_access_key, output_dir, start_date, end_date, bucket)
  File "fetch_s3.py", line 197, in _fetch_s3
    fetch_and_transform(to_download, output_dir)
  File "fetch_s3.py", line 80, in fetch_and_transform
    jsons_to_csv(tmpdir, output_dir)
  File "fetch_s3.py", line 117, in jsons_to_csv
    pd.DataFrame.from_records(requests, columns=requests[0]._fields).to_csv(
IndexError: list index out of range
@radumas radumas self-assigned this Dec 14, 2019
@radumas radumas added this to the 0-Data Pipeline milestone Dec 14, 2019
@radumas
Collaborator Author

radumas commented Dec 18, 2019

I think the cause is that the API wasn't working at all on Oct 5th, so there was no requests data within a chunk of 2000 files (i.e. minutes?). With nothing parsed, `requests` is an empty list and `requests[0]` raises the IndexError. I added a simple if statement to check whether the `requests` variable is empty, and consolidation worked. I'll upload the fix soon.
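For reference, a minimal sketch of the kind of guard described above. It is not the actual patch (which was never pushed): the `Request` record type, its field names, and the `write_records` helper are hypothetical, and it uses the standard-library `csv` module instead of the `pd.DataFrame.from_records(...)` call from the traceback so that it runs without dependencies. The same `if not records` check would presumably sit in front of the pandas call in `jsons_to_csv`.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type: the real fields used in fetch_s3.py are not
# shown in this issue, so these names are placeholders.
Request = namedtuple("Request", ["requestid", "stationid", "lineid"])


def write_records(records, out):
    """Write namedtuple records as CSV to the file-like object `out`.

    Returns the number of rows written. An empty list is skipped, because
    records[0] would raise "IndexError: list index out of range".
    """
    if not records:
        # Nothing was collected for this chunk (e.g. the API was down).
        return 0
    writer = csv.writer(out)
    writer.writerow(records[0]._fields)  # header row from the namedtuple
    writer.writerows(records)
    return len(records)


# An empty chunk is skipped instead of crashing.
empty_out = io.StringIO()
empty_count = write_records([], empty_out)

# A non-empty chunk is written as usual.
full_out = io.StringIO()
full_count = write_records([Request(1, 13, 1), Request(2, 14, 1)], full_out)
```

Returning a count rather than raising keeps the consolidation loop going across days with no data; whether an empty chunk should also be logged is a separate choice.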

@tloureiro

@radumas did you end up fixing this? I see the October consolidated file at https://spideroak.com/browse/share/raphaeld/ttc_subway_times/ttc_subway_times/serverless_data/ but I don't see the fix in fetch_s3.py.

@radumas
Collaborator Author

radumas commented Jan 9, 2020

@tloureiro thanks for catching this! I think I forgot to push the code that fixes it. I'll try to do so tonight...

@radumas
Collaborator Author

radumas commented Sep 20, 2021

Whoops, I never pushed a fix and this is still an issue, most recently with the 2021-06 data:

Traceback (most recent call last):
  File "fetch_s3.py", line 210, in <module>
    fetch_s3()
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rad/.local/share/virtualenvs/ttc_subway_times-ZmuzQ-JX/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "fetch_s3.py", line 206, in fetch_s3
    fetch_and_transform(to_download, output_dir)
  File "fetch_s3.py", line 80, in fetch_and_transform
    jsons_to_csv(tmpdir, output_dir)
  File "fetch_s3.py", line 117, in jsons_to_csv
    pd.DataFrame.from_records(requests, columns=requests[0]._fields).to_csv(
IndexError: list index out of range
