Fix: Youtube API to download english transcripts #1139

Open
wants to merge 9 commits into
base: main
from

@amirtds amirtds commented May 13, 2022

  • Bug fix (fixes an issue)

Background

https://appsembler.atlassian.net/browse/BLACK-2163
By the end of 2021, YouTube deprecated its Timedtext API. As a result, our customers can no longer automatically download transcripts for their YouTube videos in Open edX. The Open edX team is aware of this error (https://openedx.atlassian.net/browse/TNL-9460), but there has been no movement on solving it from the Open edX end.
We tried to solve this by replacing the API with a new YouTube API, but the new API requires OAuth authentication and only works with videos in the same channel. Even with the corresponding changes in the codebase, the transcripts didn't work.

Our solution

I developed a new API that responds with the SRT version of the transcript for a given YouTube video ID. Example: https://us-central1-appsembler-tahoe-0.cloudfunctions.net/youtube-transcript?video_id=AcZZlbWRyUM
You can try it with different videos: just replace what comes after video_id= with your YouTube video ID.
After receiving the transcript from our API, we call transcripts/upload to upload it for the given unit component location and store it, which makes it visible in Studio and the LMS.
The functionality should work as shown in the following video:

Filmage.2022-02-28_204527.mp4
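
For reviewers who want to poke at the endpoint outside Studio, here is a minimal sketch of the fetch step described above. It is not the code in this PR. The helper names (`build_transcript_url`, `fetch_srt`, `parse_srt`) are hypothetical, and it assumes the response body is raw SRT text encoded as UTF-8. The follow-up call to transcripts/upload is not shown because its parameters are not described here.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in the PR description.
TRANSCRIPT_ENDPOINT = (
    "https://us-central1-appsembler-tahoe-0.cloudfunctions.net/youtube-transcript"
)


def build_transcript_url(video_id):
    """Return the endpoint URL for one YouTube video ID (hypothetical helper)."""
    return "{}?{}".format(TRANSCRIPT_ENDPOINT, urlencode({"video_id": video_id}))


def fetch_srt(video_id, timeout=10):
    """Download the transcript; assumes the body is raw SRT text in UTF-8."""
    with urlopen(build_transcript_url(video_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_srt(srt_text):
    """Split SRT text into cues with index, start, end, and text fields."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # A well-formed cue has a numeric index, a timing line, and some text.
        if len(lines) < 3 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->", 1))
        cues.append(
            {
                "index": int(lines[0]),
                "start": start,
                "end": end,
                "text": "\n".join(lines[2:]),
            }
        )
    return cues
```

Calling `parse_srt(fetch_srt("AcZZlbWRyUM"))` would then give a list of cues to inspect before anything is uploaded. Parsing first is only a sanity check that the endpoint returned well-formed SRT.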

@amirtds amirtds self-assigned this May 13, 2022
@amirtds amirtds linked an issue May 13, 2022 that may be closed by this pull request

coveralls commented May 13, 2022

Pull Request Test Coverage Report for Build 2982211757

  • 68 of 76 (89.47%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.0009%) to 48.998%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| cms/djangoapps/contentstore/views/transcripts_ajax.py | 52 | 60 | 86.67% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| cms/djangoapps/contentstore/views/transcripts_ajax.py | 2 | 88.97% |

| Totals | |
| --- | --- |
| Change from base Build 2970894981 | -0.0009% |
| Covered Lines | 110814 |
| Relevant Lines | 226159 |

💛 - Coveralls

@OmarIthawi OmarIthawi left a comment

@amirtds Thanks a lot for working on this issue.

I have one concern though. This fix introduces a lot of new git conflicts (#1139 (comment)). It seems like a very costly addition to the platform. What do you think?

What's the customer need for this fix? Is it a blocker for most customers, or is it an important feature for a niche segment?

If it's the latter, I suggest keeping the solution out of the platform and seeking an upstream-first fix. Again, we're looking at 27 conflicts, which is more costly than a complete redo of the whole pull request.

If it's a critical fix for all customers, then we should look to invest in a more packaged feature, appsembler/transcripts, that integrates smoothly with the platform until we fix it upstream.

I don't think this should be merged as-is.


thraxil commented May 23, 2022

A lot of the conflicts appear to be pretty simple things, like ' -> " quote changes, that look like they may have been made by an automated tool. If we can revert those, the rest might be much easier to deal with.

@OmarIthawi

Good catch Anders. It's great that the conflicts are for that reason. @amirtds would you mind reverting style changes that are causing conflicts?

@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from d5c67cc to c12d2f2 Compare August 31, 2022 01:27
@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from c12d2f2 to 7f49484 Compare August 31, 2022 01:39
@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from 7f49484 to a1e0e00 Compare August 31, 2022 02:06
@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from a1e0e00 to 5296063 Compare August 31, 2022 02:30
@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from 5296063 to 6e7b792 Compare August 31, 2022 02:43
@amirtds amirtds force-pushed the amir/juniper-youtube-transcripts-api branch from 6e7b792 to a6a10b3 Compare September 2, 2022 22:30

github-actions bot commented Sep 2, 2022

Checking git merge conflicts against https://github.com/edx/edx-platform.git

Comparing with open-release/koa.master
  • Benchmark conflicts with main: 111
  • Current conflicts: 115
  • Summary: Adds 4 new conflicts. How can we do better?
  • New conflicting files with 'open-release/koa.master':
      cms/djangoapps/contentstore/views/transcripts_ajax.py
      cms/templates/js/video/transcripts/messages/transcripts-not-found.underscore

Comparing with open-release/lilac.master
  • Benchmark conflicts with main: 254
  • Current conflicts: 277
  • Summary: Adds 23 new conflicts. How can we do better?
  • New conflicting files with 'open-release/lilac.master':
      cms/djangoapps/contentstore/views/tests/test_transcripts.py
      cms/djangoapps/contentstore/views/transcripts_ajax.py
      cms/templates/js/video/transcripts/messages/transcripts-not-found.underscore

Comparing with open-release/maple.master
  • Benchmark conflicts with main: 284
  • Current conflicts: 307
  • Summary: Adds 23 new conflicts. How can we do better?
  • New conflicting files with 'open-release/maple.master':
      cms/djangoapps/contentstore/views/tests/test_transcripts.py
      cms/djangoapps/contentstore/views/transcripts_ajax.py
      cms/templates/js/video/transcripts/messages/transcripts-not-found.underscore

Comparing with open-release/nutmeg.master
  • Benchmark conflicts with main: 292
  • Current conflicts: 315
  • Summary: Adds 23 new conflicts. How can we do better?
  • New conflicting files with 'open-release/nutmeg.master':
      cms/djangoapps/contentstore/views/tests/test_transcripts.py
      cms/djangoapps/contentstore/views/transcripts_ajax.py
      cms/templates/js/video/transcripts/messages/transcripts-not-found.underscore

Comparing with master
  • Benchmark conflicts with main: 288
  • Current conflicts: 311
  • Summary: Adds 23 new conflicts. How can we do better?
  • New conflicting files with 'master':
      cms/djangoapps/contentstore/views/tests/test_transcripts.py
      cms/djangoapps/contentstore/views/transcripts_ajax.py
      cms/templates/js/video/transcripts/messages/transcripts-not-found.underscore
      common/lib/xmodule/xmodule/video_module/transcripts_utils.py

@melvinsoft

.

@melvinsoft melvinsoft left a comment

.

Development

Successfully merging this pull request may close these issues.

Contribute: YouTube video transcripts Import fix
5 participants