Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Archive and GitArchive output formats #173

Draft
wants to merge 1 commit into
base: 0.2.0
Choose a base branch
from

Conversation

AI-Mozi
Copy link
Member

@AI-Mozi AI-Mozi commented Aug 30, 2024

@AI-Mozi AI-Mozi force-pushed the add_archive_and_git_archive_output_formats branch from c086226 to 5d844e0 Compare August 30, 2024 14:00
@AI-Mozi AI-Mozi self-assigned this Aug 30, 2024
Copy link
Member

@postmodern postmodern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-classing Ronin::Web::Spider::Archive classes might come back to bite us, if we ever change their APIs or if we add a << method to them. Using composition and initializing ivars might be a better option.

#
# Represents a web archive directory.
#
class Archive < Ronin::Web::Spider::Archive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub-classing Ronin::Web::Spider::Archive might cause problems for us in the future if we ever change it's API. Might be safer to initialize it as an ivar in initialize.

# The path to the root directory.
#
def initialize(root)
super(root)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be safer to initialize Ronin::Web::Spider::GitArchive as an ivar.

Copy link
Member

@postmodern postmodern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion, we should name these new output formats web_archive and git_web_archive (or web_git_archive). This should indicate to the user that only the URLs will be saved into the archive.

@AI-Mozi AI-Mozi force-pushed the add_archive_and_git_archive_output_formats branch from 5d844e0 to 8d62799 Compare September 1, 2024 16:49
Copy link
Member

@postmodern postmodern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to rename the classes to match the output-format names.

register :png, '.png', PNG
register :pdf, '.pdf', PDF
register :web_archive, '', Archive
register :web_git_archive, '', GitArchive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename classes to WebArchive and GitArchive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file-exts can be omitted now that they are optional.

#
# Represents a web archive directory.
#
class Archive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename class to WebArchive.

#
# Represents a web archive directory that is backed by Git.
#
class GitArchive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename class to WebGitArchive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants