Added get_roster function that takes in team abbreviation and year to… #276

Open
wants to merge 2 commits into
base: v4
37 changes: 34 additions & 3 deletions basketball_reference_web_scraper/client.py
@@ -1,6 +1,6 @@
import requests

from basketball_reference_web_scraper.errors import InvalidSeason, InvalidDate, InvalidPlayerAndSeason
from basketball_reference_web_scraper.errors import InvalidSeason, InvalidDate, InvalidPlayerAndSeason, InvalidTeam
from basketball_reference_web_scraper.http_service import HTTPService
from basketball_reference_web_scraper.output.columns import BOX_SCORE_COLUMN_NAMES, SCHEDULE_COLUMN_NAMES, \
PLAYER_SEASON_TOTALS_COLUMN_NAMES, \
@@ -11,8 +11,8 @@
from basketball_reference_web_scraper.output.writers import CSVWriter, JSONWriter, FileOptions, OutputOptions, \
SearchCSVWriter
from basketball_reference_web_scraper.parser_service import ParserService


from datetime import datetime
from basketball_reference_web_scraper.data import TEAM_TO_TEAM_ABBREVIATION
def standings(season_end_year, output_type=None, output_file_path=None, output_write_option=None,
json_options=None):
try:
@@ -212,6 +212,35 @@ def team_box_scores(day, month, year, output_type=None, output_file_path=None, o
)
return output_service.output(data=values, options=options)

def get_roster(team, year=None, output_type=None, output_file_path=None, output_write_option=None, json_options=None):
Owner

Let's make year a required argument. I don't think any other method intentionally treats a year value as optional.

Owner

Also, can we rename year to season_end_year? That follows the naming convention used by the other season-related methods.
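
For illustration, a minimal sketch of the signature with both suggestions applied. It keeps the PR's get_roster name, makes season_end_year a required argument, and only builds the URL that would be fetched. BASE_URL is assumed from the basketball-reference.com link quoted in this review, and the body is a stand-in rather than the project's implementation.

```python
BASE_URL = "https://www.basketball-reference.com"  # assumed from the URL quoted in this review


def get_roster(team, season_end_year, output_type=None, output_file_path=None,
               output_write_option=None, json_options=None):
    """Hypothetical signature: season_end_year is required, named like other season methods."""
    # Sketch only: return the page URL instead of fetching and parsing it.
    return "{base}/teams/{team}/{season_end_year}.html".format(
        base=BASE_URL, team=team, season_end_year=season_end_year
    )


print(get_roster(team="BOS", season_end_year=2020))
```

With no default value, a call that omits season_end_year fails immediately with a TypeError, so the "guess the current season" branch is no longer needed.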

    try:
        http_service = HTTPService(parser=ParserService())
        if year == None:
            today = datetime.now()
            year = today.year
            if today.month >=7:
                year += 1
        if len(team) > 3:
            team=TEAM_TO_TEAM_ABBREVIATION[team.upper()]
        values=http_service.get_team_roster(team=team, year=year)
    except requests.exceptions.HTTPError as http_error:
        if http_error.response.status_code == requests.codes.not_found:
            raise InvalidTeam(team=team, year=year)
Owner

I think this error's name is slightly inaccurate: the team value could be valid while the season end year value is invalid, as in https://www.basketball-reference.com/teams/BOS/1020.html

I'd prefer to call this error InvalidTeamSeason (or something similar).

(Note that I've made similar naming mistakes in other methods, like InvalidDate, that need to be corrected in the future.)
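
A sketch of what the renamed error could look like, modeled on the InvalidPlayerAndSeason pattern visible in errors.py. The class name comes from the suggestion above; the message wording is an assumption.

```python
class InvalidTeamSeason(Exception):
    """Raised when a team / season end year combination has no page (name suggested in this review)."""

    def __init__(self, team, season_end_year):
        # Message wording is hypothetical; it mirrors the style of InvalidPlayerAndSeason.
        message = "Team \"{team}\" in season ending in {season_end_year} is invalid" \
            .format(team=team, season_end_year=season_end_year)
        super().__init__(message)


try:
    raise InvalidTeamSeason(team="BOS", season_end_year=1020)
except InvalidTeamSeason as error:
    print(error)
```

The name no longer implies that the team itself is wrong, which matches the BOS/1020 case where only the season is invalid.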

        else:
            raise http_error

    options = OutputOptions.of(
        file_options=FileOptions.of(path=output_file_path, mode=output_write_option),
        output_type=output_type,
        json_options=json_options,
        csv_options={"column_names": "Players"}
Owner

Here's the Roster table

[image: Roster table]

Happy to add more values in the future, but at a minimum the values should be

  • name
  • slug (or player_id)
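
To make the minimum output shape concrete, here is a small sketch that writes roster rows with name and slug columns using only the standard library. ROSTER_COLUMN_NAMES, roster_to_csv and the sample rows are all hypothetical; the project's own CSVWriter and column-name constants are not reproduced here.

```python
import csv
import io

# Hypothetical column names matching the minimum fields requested above.
ROSTER_COLUMN_NAMES = ["name", "slug"]

# Made-up example rows; real values would come from the parsed Roster table.
roster = [
    {"name": "Example Player", "slug": "playeex01"},
    {"name": "Another Player", "slug": "playean01"},
]


def roster_to_csv(rows):
    """Write roster rows to a CSV string with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ROSTER_COLUMN_NAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(roster_to_csv(roster))
```

Returning one dictionary per player, rather than a bare list of names under a single "Players" column, leaves room to add more fields later without changing the row shape.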

    )

    output_service = OutputService(
        json_writer=JSONWriter(value_formatter=BasketballReferenceJSONEncoder),
        csv_writer=CSVWriter(value_formatter=format_value)
    )
    return output_service.output(data=values, options=options)

def play_by_play(home_team, day, month, year, output_type=None, output_file_path=None, output_write_option=None,
json_options=None):
@@ -250,3 +279,5 @@ def search(term, output_type=None, output_file_path=None, output_write_option=No
csv_writer=SearchCSVWriter(value_formatter=format_value)
)
return output_service.output(data=values, options=options)


5 changes: 5 additions & 0 deletions basketball_reference_web_scraper/errors.py
@@ -20,3 +20,8 @@ def __init__(self, player_identifier, season_end_year):
message = "Player with identifier \"{player_identifier}\" in season ending in {season_end_year} is invalid" \
.format(player_identifier=player_identifier, season_end_year=season_end_year)
super().__init__(message)

class InvalidTeam(Exception):
    def __init__(self, team, year):
        message = "Team \"{team}\" in {year} is invalid".format(team=team, year=year)
        super().__init__(message)
11 changes: 11 additions & 0 deletions basketball_reference_web_scraper/html.py
@@ -870,6 +870,17 @@ def game_url_paths(self):
game_links = self.html.xpath(self.game_url_paths_query)
return [game_link.attrib['href'] for game_link in game_links]

class TeamRoster:
Owner

Can we create a SeasonTeamPage class that has a property called @team_roster_table?

Almost all of the client methods refer to a top-level Page class that may expose its underlying tables through properties or methods.
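
A rough sketch of that page/table layering. SeasonTeamPage and team_roster_table are the names suggested above; RosterTable, its rows property and the sample markup are assumptions. The sketch uses xml.etree.ElementTree from the standard library as a stand-in for the lxml tree the project passes around, so find/findall replace lxml's xpath calls.

```python
import xml.etree.ElementTree as ET


class RosterTable:
    """Hypothetical wrapper around the <table id="roster"> element."""

    def __init__(self, html):
        self.html = html

    @property
    def rows(self):
        # One <tr> per player in the table body.
        return self.html.findall("./tbody/tr")


class SeasonTeamPage:
    """Hypothetical top-level page for a team's season document."""

    def __init__(self, html):
        self.html = html

    @property
    def team_roster_table(self):
        table = self.html.find(".//table[@id='roster']")
        return None if table is None else RosterTable(html=table)


# Simplified, well-formed stand-in for the real page markup.
SAMPLE = """
<html><body>
  <table id="roster"><tbody>
    <tr><td data-stat="player">Example Player</td></tr>
    <tr><td data-stat="player">Another Player</td></tr>
  </tbody></table>
</body></html>
"""

page = SeasonTeamPage(html=ET.fromstring(SAMPLE.strip()))
print(len(page.team_roster_table.rows))
```

Keeping the table behind a page-level property means other tables on the same page could be exposed later without adding another top-level class.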

    def __init__(self, html):
        self.html = html

    @property
    def roster_query(self):
        return '//table[@id="roster"]//td[@data-stat="player"]'
    @property
Owner

Nit: let's add a blank line between lines 879 and 880.

    def team_roster(self):
Owner

Let's create Row classes to represent the underlying row content (to keep consistent with similar patterns in this file).
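
One possible shape for such a Row class, sketched with the standard library's xml.etree.ElementTree in place of lxml. RosterRow and its properties are hypothetical names, and the markup is an assumption: it presumes the player cell wraps the name in a link whose href ends in the player's slug.

```python
import xml.etree.ElementTree as ET


class RosterRow:
    """Hypothetical row wrapper exposing the fields of one Roster table row."""

    def __init__(self, html):
        self.html = html

    @property
    def player_link(self):
        # Assumes the player cell wraps the name in a link to the player's page.
        return self.html.find("./td[@data-stat='player']/a")

    @property
    def name(self):
        return self.player_link.text.strip()

    @property
    def slug(self):
        # Assumes hrefs shaped like /players/<letter>/<slug>.html
        href = self.player_link.attrib["href"]
        return href.rsplit("/", 1)[-1].replace(".html", "")


# Made-up row markup for illustration only.
ROW = '<tr><td data-stat="player"><a href="/players/e/examppl01.html">Example Player</a></td></tr>'
row = RosterRow(html=ET.fromstring(ROW))
print(row.name, row.slug)
```

A row object like this keeps the cell lookups in one place, so the page and table classes only deal in rows rather than raw cells.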

        players = self.html.xpath(self.roster_query)
        return [player.text_content() for player in players]

class SchedulePage:
def __init__(self, html):
17 changes: 16 additions & 1 deletion basketball_reference_web_scraper/http_service.py
@@ -1,11 +1,13 @@
from datetime import datetime, timezone

import requests
from lxml import html

from basketball_reference_web_scraper.data import TEAM_TO_TEAM_ABBREVIATION, TeamTotal, PlayerData
from basketball_reference_web_scraper.errors import InvalidDate, InvalidPlayerAndSeason
from basketball_reference_web_scraper.html import DailyLeadersPage, PlayerSeasonBoxScoresPage, PlayerSeasonTotalTable, \
PlayerAdvancedSeasonTotalsTable, PlayByPlayPage, SchedulePage, BoxScoresPage, DailyBoxScoresPage, SearchPage, \
PlayerPage, StandingsPage
PlayerPage, StandingsPage, TeamRoster


class HTTPService:
@@ -194,6 +196,17 @@ def team_box_scores(self, day, month, year):
for box_score in self.team_box_score(game_url_path=game_url_path)
]

    def get_team_roster(self, team, year):
        url = "{BASE_URL}/teams/{team}/{year}.html".format(BASE_URL=HTTPService.BASE_URL, team=team, year=year)

        response = requests.get(url=url)

        response.raise_for_status()

        page = TeamRoster(html=html.fromstring(response.content))
        return page.team_roster


def search(self, term):
response = requests.get(
url="{BASE_URL}/search/search.fcgi".format(BASE_URL=HTTPService.BASE_URL),
@@ -240,3 +253,5 @@ def search(self, term):
return {
"players": player_results
}


4 changes: 3 additions & 1 deletion bin/normalizer
@@ -1,4 +1,6 @@
#!/Users/jaebradley/projects/basketball_reference_web_scraper/bin/python3
#!/bin/sh
'''exec' "/Users/paramgattupalli/Documents/Fall 2024/CEN 3031/basketball_reference_web_scraper/bin/python" "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
4 changes: 3 additions & 1 deletion bin/pip
@@ -1,4 +1,6 @@
#!/Users/jaebradley/projects/basketball_reference_web_scraper/bin/python3
#!/bin/sh
'''exec' "/Users/paramgattupalli/Documents/Fall 2024/CEN 3031/basketball_reference_web_scraper/bin/python" "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
4 changes: 3 additions & 1 deletion bin/pip3
@@ -1,4 +1,6 @@
#!/Users/jaebradley/projects/basketball_reference_web_scraper/bin/python3
Owner

@PGatts these bin files should never have been committed - I removed them in 81dd1e0

Rebasing / merging the latest changes in v4 should resolve the merge conflicts caused by the bin directory.

#!/bin/sh
'''exec' "/Users/paramgattupalli/Documents/Fall 2024/CEN 3031/basketball_reference_web_scraper/bin/python" "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys