[WIP] Refactor to Introduce Backend Abstraction #2011

Open
wants to merge 28 commits into
base: main

Conversation

zhenglongjiepheonix
Contributor

@zhenglongjiepheonix zhenglongjiepheonix commented Sep 3, 2024

What does this PR do?

  • add a backend abstraction
  • refactor the original pipeline flow to accommodate the potential needs of different backends
  • modify the API so that more parameter-passing formats are supported

The NanotronBackend is still WIP and untested, but it would be nice to get some feedback first.
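For readers unfamiliar with the pattern, a backend abstraction of the kind this description refers to might look roughly like the sketch below. It is illustrative only: `Backend`, `DefaultBackend`, `create_linear`, `build_layers`, and the plain dicts standing in for modules are hypothetical and are not taken from the PR's code. Only the name `NanotronBackend` appears in the description.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface: each backend decides how a layer is built."""

    @abstractmethod
    def create_linear(self, in_features: int, out_features: int) -> dict:
        """Return a description of the layer (a stand-in for an nn.Module)."""


class DefaultBackend(Backend):
    def create_linear(self, in_features: int, out_features: int) -> dict:
        return {"impl": "default", "shape": (out_features, in_features)}


class NanotronBackend(Backend):
    def create_linear(self, in_features: int, out_features: int) -> dict:
        # In this sketch the backend only tags the layer differently.
        return {"impl": "nanotron", "shape": (out_features, in_features)}


def build_layers(backend: Backend, sizes: list) -> list:
    """The pipeline talks only to the Backend interface, never to a concrete class."""
    return [backend.create_linear(i, o) for i, o in sizes]


layers = build_layers(DefaultBackend(), [(8, 16), (16, 4)])
```

Under this assumption, swapping backends means passing a different `Backend` instance to the same pipeline function, which is the property the refactor appears to be after.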

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@michaelbenayoun michaelbenayoun left a comment

Very nice first draft.

I was wondering: do you think we should keep the 2 methods for row and column separated? Just opening the question, I do not have a strong opinion on that.
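To make the question concrete, the two API shapes under discussion could be sketched as follows. The method names (`create_column_parallel_linear`, `create_row_parallel_linear`, `create_parallel_linear`), the `Axis` enum, and the `shard_shape` helper are assumptions made for illustration; the PR's actual signatures are not shown in this thread.

```python
from enum import Enum


class Axis(Enum):
    COLUMN = "column"  # shard the output dimension of the weight
    ROW = "row"        # shard the input dimension of the weight


def shard_shape(out_features: int, in_features: int, axis: Axis, world_size: int) -> tuple:
    """Per-rank weight shape for a linear layer whose full weight is (out, in)."""
    if axis is Axis.COLUMN:
        return (out_features // world_size, in_features)
    return (out_features, in_features // world_size)


# Option A: two separate methods, one per parallel style.
class SeparateAPI:
    def create_column_parallel_linear(self, out_f: int, in_f: int, world_size: int) -> tuple:
        return shard_shape(out_f, in_f, Axis.COLUMN, world_size)

    def create_row_parallel_linear(self, out_f: int, in_f: int, world_size: int) -> tuple:
        return shard_shape(out_f, in_f, Axis.ROW, world_size)


# Option B: a single method parameterized by the axis.
class UnifiedAPI:
    def create_parallel_linear(self, out_f: int, in_f: int, world_size: int, axis: Axis) -> tuple:
        return shard_shape(out_f, in_f, axis, world_size)
```

Option A gives each backend an explicit hook per style at the cost of a wider interface; Option B keeps the interface small but pushes the branching into every implementation.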

optimum/fx/parallelization/api.py (outdated, resolved)
optimum/fx/parallelization/backend/base.py (outdated, resolved)
Comment on lines 69 to 70
Mark tie information right before we run passes because dynamo tracing will alter the parameter name while our
passes don't.
Member

Suggested change
- Mark tie information right before we run passes because dynamo tracing will alter the parameter name while our
- passes don't.
+ Mark information about tied parameters right before running passes because dynamo tracing alters the names of the parameters while our passes do not.

Member

Are you sure pre_process is the most suitable name, considering that it only marks tied weights?

Contributor Author

@zhenglongjiepheonix zhenglongjiepheonix Sep 20, 2024

I think it generally means anything that needs to be done before we run passes, and marking weight-tying information happens to be one of those things.
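A minimal sketch of what marking tie information before running passes can mean is given below. It assumes two parameters are tied when two names refer to the same object, and it uses a plain `Param` class and a dict in place of real modules. The `tied_to` attribute and `mark_tied_parameters` are hypothetical names, not the PR's.

```python
class Param:
    """Stand-in for a parameter tensor; `tied_to` is a hypothetical marker."""

    def __init__(self):
        self.tied_to = None


def mark_tied_parameters(named_params: dict) -> dict:
    """Map each duplicate name to the first name that shares the same object."""
    first_name_by_id = {}
    ties = {}
    for name, param in named_params.items():
        owner = first_name_by_id.setdefault(id(param), name)
        if owner != name:
            ties[name] = owner
            # The marker is stored on the object itself, not keyed by name.
            param.tied_to = owner
    return ties


embedding = Param()
params = {
    "embed.weight": embedding,
    "lm_head.weight": embedding,  # same object: tied to the embedding
    "mlp.weight": Param(),
}
ties = mark_tied_parameters(params)
```

Because the marker in this sketch lives on the shared object rather than in a name-keyed table, it would remain valid even if a later step renamed the parameters.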

    ) -> nn.Module:
        raise NotImplementedError

    def pre_process(self, graph_module: GraphModule, ctx: "ParallelExecutionCtx", config: "Config") -> GraphModule:
Member

What's a config?

Contributor Author

A Config is a data class that records the static configuration used throughout the whole process.
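As a rough illustration of a static configuration record, here is a sketch using Python's `dataclasses`. The field names and defaults are invented for the example, and making the class frozen is an assumption about what "static" implies; neither is confirmed by this thread.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Config:
    # Field names below are illustrative assumptions, not the PR's actual fields.
    enable_sequence_parallel: bool = False
    weight_init_std: float = 0.02


config = Config(enable_sequence_parallel=True)

try:
    # A frozen dataclass rejects assignment after construction.
    config.enable_sequence_parallel = False
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

With a frozen record, every pass sees the same values for the whole run, and an accidental write fails immediately rather than silently changing behavior.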

optimum/fx/parallelization/backend/base.py (resolved)
Comment on lines +70 to +72
raise ValueError(
    "`sequence_parallel` can not be activated when `tp_mode` is not set to `REDUCE_SCATTER` in nanotron backend"
)
Member

Is it best to fail like that or change the setting in Nanotron ourselves?
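The two alternatives raised in this question can be compared side by side. The sketch below is hedged: `TpMode` is a local stand-in enum rather than an import from nanotron, and `Settings`, `validate_strict`, and `validate_lenient` are hypothetical names. Only the error condition itself comes from the diff under review.

```python
import warnings
from dataclasses import dataclass, replace
from enum import Enum


class TpMode(Enum):  # local stand-in, not imported from nanotron
    ALL_REDUCE = "all_reduce"
    REDUCE_SCATTER = "reduce_scatter"


@dataclass(frozen=True)
class Settings:  # hypothetical container for the two options involved
    sequence_parallel: bool
    tp_mode: TpMode


def validate_strict(s: Settings) -> Settings:
    """Option 1: fail loudly, as the diff under review does."""
    if s.sequence_parallel and s.tp_mode is not TpMode.REDUCE_SCATTER:
        raise ValueError(
            "`sequence_parallel` can not be activated when `tp_mode` is not set to `REDUCE_SCATTER`"
        )
    return s


def validate_lenient(s: Settings) -> Settings:
    """Option 2: adjust the setting and warn instead of raising."""
    if s.sequence_parallel and s.tp_mode is not TpMode.REDUCE_SCATTER:
        warnings.warn("Switching `tp_mode` to `REDUCE_SCATTER` because `sequence_parallel` is enabled")
        return replace(s, tp_mode=TpMode.REDUCE_SCATTER)
    return s
```

Raising keeps the user's configuration authoritative and surfaces the conflict early; the lenient variant is more convenient but changes a setting the user chose explicitly, which is why it warns.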

@michaelbenayoun
Member

@zhenglongjiepheonix what's the status?

@zhenglongjiepheonix
Contributor Author

The nanotron backend is not tested. For the default backend, everything works fine; the PR contains the latest updates and addresses the review comments.

@michaelbenayoun
Member

Ok so ready for final review?

@zhenglongjiepheonix
Contributor Author

> Ok so ready for final review?

Basically, it's for reference. If someone does work on nanotron support, the correctness of NanotronBackend will need to be verified and additional tests added. In my opinion, though, this PR marks the boundary between optimum and nanotron: the rest of the work should be implemented inside nanotron, and optimum should just expose the parallelize_model API.


github-actions bot commented Jan 1, 2025

This PR has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Jan 1, 2025
3 participants