-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kunzite - Carline S. and Ann T. #49
base: main
Are you sure you want to change the base?
Conversation
refactoring - but learned complexity rises to O(n^2) w/o it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! 🎉It looks like you were trying some new things. I'd recommend trying to keep how complicated the code is relative to the scale of the functionality. For instance, the create_movie
method is rather small and focused in functionality, so try not to introduce too many complications. Keep an eye on the requirements and make sure they're being logically met (not passing the tests alone). I liked seeing the use of sets in a few places, but there's an opportunity for a few more places. Also, keep an eye out for places to reuse functions. The functions from wave 3 can be really helpful for waves 4 and 5. There were some tests in wave 5 that appeared to be modified, though I was unsure why. Your code passed them when I reverted back to the unchanged versions (and more importantly, the logic in your wave 5 functions was sound). So if you do make changes to a test while trying things out, be sure to return it to normal before submitting. But overall, well done!
assert updated_data["watched"][0]["title"]==MOVIE_TITLE_1 | ||
assert updated_data["watched"][0]["rating"]==RATING_1 | ||
assert updated_data["watched"][0]["genre"]==GENRE_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style-wise, prefer spaces around the ==
.
We can absolutely check for the expected values on the record at the expected position explicitly, as written, but we could also check all at once by building a new dict for comparison, or checking whether the dict lies in the list (without assuming a position).
assert updated_data["watched"][0] == { "title": MOVIE_TITLE_1, "rating": RATING_1, "genre": GENRE_1 }
or
assert { "title": MOVIE_TITLE_1, "rating": RATING_1, "genre": GENRE_1 } in updated_data["watched"]
assert updated_data["watched"][1]["title"]==movie_to_watch["title"] | ||
assert updated_data["watched"][1]["rating"]==movie_to_watch["rating"] | ||
assert updated_data["watched"][1]["genre"]==movie_to_watch["genre"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, here, I might prefer in
checks so as not to make assumptions about where in the new list the moved record ends up. We may also want to ensure that the other movies are where they're supposed to be.
assert FANTASY_1 in updated_data["watchlist"]
assert movie_to_watch in updated_data["watched"]
assert FANTASY_2 in updated_data["watched"]
assert {"title": "Zero Dark Python", "genre": "Intrigue", "rating": 3.0} in friends_unique_movies | ||
assert {"title": "The Lord of the Functions: The Two Parameters", "genre": "Fantasy", "rating": 4.0} not in friends_unique_movies | ||
assert {"title": "The JavaScript and the React", "genre": "Action", "rating": 2.2} not in friends_unique_movies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out the same checks from the previous test will work here as well (barring the clean data check). We're starting with the same originaal data as the prior test, but adding a duplicate of one of the friend's unique movies. This shouldn't change the unique list we're generating. We can't as easily check for amanda's clean data (since we modifiied the arranged data), so we could build another modified copy to compare, or assume that our clean data check from above is sufficient.
assert INTRIGUE_3 in friends_unique_movies
assert HORROR_1 in friends_unique_movies
assert FANTASY_4 in friends_unique_movies
|
||
raise Exception("Test needs to be completed.") | ||
#*Carline* Assert | ||
assert len(recommendations) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were application code (rather than test code) we would tend to check for an empty list as
assert not recommendations
But in a test, we might want to explicitly check more of the documented behavior. Since we know this should return an empty list, we may want to check directly against that. We may hold tests to stricter standards than regular code.
assert recommendations == []
def test_unique_from_empty_favorites(): | ||
# Arrange | ||
sonyas_data = { | ||
"watched": [], | ||
"favorites": [], | ||
"friends": [ | ||
{ | ||
"watched": [INTRIGUE_1b] | ||
"watched": ["INTRIGUE_1b"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The movie references here should not be turned into strings. We expect the friends' watched lists to contain dictionaries that represent movies rather than strings.
rec_movies.append(movie) | ||
|
||
return rec_movies | ||
#T= O(nm)depending on num friends, movies in each. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since each friend can have a variable number of movies, this is why it can be handy to choose n as the total number of friend movies. We don't really care that the friend movies are chunked up under the friends. We just care that we need to iterate over them. But mboth the of the not in
checks on 95 are checking lists of records. The user list could be at least as long as the friend movie list (leading to O(n^2) time), and the output list itself could grow to the same size as the friend movie list, providing another avenue to reach O(n^2) time.
We can alleviate this if we reuse our get_friends_unique_watched
with the set optimizations I mentioned, in which case we would be starting from a uniqued list of the friend movies that the user hasn't watched, reducing the problem to one of filtering that list to the hosts the user has subscribed to.
Now, we might be disinclined from doing thisif we're concerned that some movies might be accessible through some hosts based on which friend has watched it, while the wave 3 function would lose that detail. The way the data is set up for the tests, each of the movies is only on a single host (isn't media fragmentation great?!), allowing the function reuse. But we might need to think about storing things differently if hosting movies on different services was needed.
def get_new_rec_by_genre(user_data): | ||
genre_freq = get_most_watched_genre(user_data) | ||
watched = user_data["watched"] | ||
friends_watched = [movie for friend in user_data["friends"] for movie in friend["watched"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we want to only recommend movies to the user that the user hasn't watched, this would be another opportunity to use get_friends_unique_watched
, simplifying this to filtering the result of get_friends_unique_watched
to those movies in the user's favorite genre.
recommended_movies.append(movie) | ||
|
||
return recommended_movies | ||
#T - O(n^2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As currently written, this would be O(n^2) where n is the total number of friend movies. 112 iterates over the friend movies. Within that loop we do a not in
check against the user watched list (for simplicity, we might assume it is about the same size as the friend movie list). Though written as a nested condition, line 114 could be written as another condition of line 113, where it would also be doing another linear check against the output list, which could potentially be as long as the friend movie list itself, giving us two distinct ways of reaching O(n^2) complexity. The get_most_watched_genre
method itself is O(n^2) over the user movies as currently written, so there we have another way. So yes time is O(n^2), but try to specify why.
for movie in user_data["favorites"]: | ||
if movie not in watched_by_friends: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use our get_unique_watched
method from wave 3 to already start from the collection of user movies that the friends haven't watched. We could make a set of the titles from the user's favorite movies, and use that improve the lookups (swapping which part is the outer loop).
recommended_movies.append(movie) | ||
|
||
return recommended_movies | ||
#T - O(n^2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as written, due to the not in
check on line 130, we could be comparing each movie in the favorites list to each movie in the friends movies. If we assume they are about the same size (worst case) this would resemble O(n^2), or we could keep the two quantities distinct with two variables O(nm).
No description provided.