[WIP] Refactor to Introduce Backend Abstraction #2011

zhenglongjiepheonix · 2024-09-03T15:30:50Z

What does this PR do?

Adds a backend abstraction.
Refactors the original pipeline flow to accommodate the potential needs of different backends.
Modifies the API so that more parameter-passing formats are supported.

The NanotronBackend is still WIP and untested, but it would be nice to get some feedback first.
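For readers unfamiliar with the pattern, here is a minimal, stdlib-only sketch of what a backend abstraction of this kind could look like. Everything in it is an assumption made for illustration: `SketchBackend`, `UpperCaseBackend`, `create_layer`, `run_pipeline`, and the use of a list of strings as a stand-in for a traced graph. Only the hook name `pre_process` appears in this PR's diff, and a "post process" step is mentioned in the commit messages; this is not the PR's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Graph = List[str]  # toy stand-in for a traced graph module (assumption)
Pass = Callable[[Graph], Graph]


class SketchBackend(ABC):
    """Hypothetical backend interface: hooks around a shared pass pipeline."""

    def pre_process(self, graph: Graph) -> Graph:
        # Default: nothing to do before the passes run.
        return graph

    @abstractmethod
    def create_layer(self, node: str) -> str:
        """Return the backend-specific replacement for a single node."""

    def post_process(self, graph: Graph) -> Graph:
        # Default: nothing to do after the passes run.
        return graph


class UpperCaseBackend(SketchBackend):
    """Toy backend that 'parallelizes' a node by upper-casing its name."""

    def create_layer(self, node: str) -> str:
        return node.upper()


def run_pipeline(graph: Graph, backend: SketchBackend, passes: List[Pass]) -> Graph:
    # The pipeline itself is backend-agnostic; only the hooks differ.
    graph = backend.pre_process(graph)
    for run_pass in passes:
        graph = run_pass(graph)
    graph = [backend.create_layer(node) for node in graph]
    return backend.post_process(graph)


result = run_pipeline(
    ["linear", "embedding"],
    UpperCaseBackend(),
    passes=[lambda g: list(reversed(g))],
)
```

The point of the shape is that a second backend only has to subclass `SketchBackend`; `run_pipeline` stays untouched.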


HuggingFaceDocBuilderDev · 2024-09-03T15:49:58Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

michaelbenayoun

Very nice first draft.

I was wondering: do you think we should keep the 2 methods for row and column separated? Just opening the question, I do not have a strong opinion on that.

optimum/fx/parallelization/api.py

optimum/fx/parallelization/backend/base.py

michaelbenayoun · 2024-09-06T16:19:37Z

optimum/fx/parallelization/backend/base.py

+        Mark tie information right before we run passes because dynamo tracing will alter the parameter name while our
+        passes don't.


Suggested change

-        Mark tie information right before we run passes because dynamo tracing will alter the parameter name while our
-        passes don't.
+        Mark information about tied parameters right before running passes because dynamo tracing alters the names of the parameters while our passes do not.

Are you sure pre_process is the most appropriate name, considering it only marks tied weights?

zhenglongjiepheonix · 2024-09-20

I think it generally means anything that needs to be done before we run passes, and marking weight-tying information happens to be one of those things.
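To illustrate why the tie information is recorded up front: tied parameters are the same object reachable under several names, so identity, not name, is what survives a renaming step. The following is a minimal sketch under that assumption. `find_tied_groups` and the toy `Param` class are hypothetical and use plain Python objects rather than real `torch` parameters; the parameter names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class Param:
    """Toy stand-in for a parameter tensor (assumption for illustration)."""


def find_tied_groups(named_params: Iterable[Tuple[str, Param]]) -> List[List[str]]:
    """Group parameter names that refer to the very same object."""
    names_by_identity: Dict[int, List[str]] = defaultdict(list)
    for name, param in named_params:
        names_by_identity[id(param)].append(name)
    # Only groups with more than one name describe tied parameters.
    return [names for names in names_by_identity.values() if len(names) > 1]


shared = Param()
params = [
    ("embed_tokens.weight", shared),
    ("layers.0.weight", Param()),
    ("lm_head.weight", shared),  # tied to the embedding
]
tied = find_tied_groups(params)
```

Once such groups are recorded, a later stage can re-tie the weights even if the names it sees no longer match the original ones.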

michaelbenayoun · 2024-09-06T16:22:07Z

optimum/fx/parallelization/backend/base.py

+    ) -> nn.Module:
+        raise NotImplementedError
+
+    def pre_process(self, graph_module: GraphModule, ctx: "ParallelExecutionCtx", config: "Config") -> GraphModule:


What's a config?

zhenglongjiepheonix · 2024-09-20

A Config is a dataclass that records the static configuration used throughout the whole process.
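As a rough illustration of "a dataclass holding static configuration", here is a small sketch. `SketchConfig` and both of its fields are hypothetical names chosen for the example, and it is frozen only to emphasise the "static" part; the real `Config` class in this PR may have different fields and may not be frozen.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical static configuration; field names are illustrative only."""

    lint_and_recompile: bool = True
    clean_markers_after_all_passes: bool = True


config = SketchConfig(lint_and_recompile=False)

try:
    config.lint_and_recompile = True  # frozen dataclass: assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Passing one such object to every hook keeps the settings in a single place and makes accidental mid-run changes fail loudly.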

optimum/fx/parallelization/backend/base.py

optimum/fx/parallelization/backend/nanotron.py

michaelbenayoun · 2024-09-06T16:42:09Z

optimum/fx/parallelization/backend/nanotron.py

+            raise ValueError(
+                "`sequence_parallel` can not be activated when `tp_mode` is not set to `REDUCE_SCATTER` in nanotron backend"
+            )


Is it best to fail like that or change the setting in Nanotron ourselves?
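The two options raised in this question can be sketched side by side. This is a stdlib-only illustration, not nanotron's API: `TpMode` is a toy stand-in enum (only the name `REDUCE_SCATTER` comes from the diff above; `ALL_REDUCE` is an assumed alternative), and `resolve_tp_mode` with its `strict` flag is hypothetical.

```python
import warnings
from enum import Enum


class TpMode(Enum):
    """Toy stand-in for the backend's tensor-parallel mode (assumption)."""

    ALL_REDUCE = "all_reduce"
    REDUCE_SCATTER = "reduce_scatter"


def resolve_tp_mode(sequence_parallel: bool, tp_mode: TpMode, strict: bool = True) -> TpMode:
    """Return the mode to use, either failing or overriding on a conflict."""
    if not sequence_parallel or tp_mode is TpMode.REDUCE_SCATTER:
        return tp_mode  # no conflict, keep the caller's choice
    if strict:
        # Option 1: fail fast, as the diff above does.
        raise ValueError("`sequence_parallel` requires `tp_mode` to be `REDUCE_SCATTER`")
    # Option 2: change the setting ourselves, but tell the user.
    warnings.warn("overriding `tp_mode` to `REDUCE_SCATTER` because `sequence_parallel` is enabled")
    return TpMode.REDUCE_SCATTER
```

Failing keeps the user's configuration authoritative; overriding is more convenient but can hide a mistake, which is why the sketch at least emits a warning.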

michaelbenayoun · 2024-09-26T13:31:28Z

@zhenglongjiepheonix what's the status?

zhenglongjiepheonix · 2024-09-26T23:39:18Z

The nanotron backend is not tested. For the default backend, everything works fine: it contains the newest updates and addresses the review comments.

michaelbenayoun · 2024-09-27T16:10:31Z

Ok so ready for final review?

zhenglongjiepheonix · 2024-10-02T03:10:17Z

> Ok so ready for final review?

Basically, it's for reference. If someone does work on nanotron support, the correctness of NanotronBackend needs to be verified and additional tests are needed. In my opinion, though, this PR marks the boundary between optimum and nanotron: the rest of the work should be implemented inside nanotron, with optimum just exposing the parallelize_model API.

github-actions · 2025-01-01T02:07:51Z

This PR has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

zhenglongjiepheonix added 21 commits August 14, 2024 03:05

modify parallelization strategy

50fcfc0

only support model id in api now

4114d3b

more comments

c689402

more comments

252c3b7

Merge remote-tracking branch 'upstream/main' into longjie/generalize_parallelization_strategy

1be77ed

address comments

22d6766

remove idle runner

febac9b

fix

bf99175

format

4d9d036

more comments

44a87f4

generalize api & add backend abstraction

513d516

fix

8335a35

copyright

d051217

fix api

6b03855

Merge remote-tracking branch 'upstream/main' into longjie/add_backend_abstraction

6466ccc

move weights intialization inside post process

b4166ac

seperate meta update and parallel layer construction

576104c

move weight intialization & binding inside backend

8bbc2e9

add weights tying for nanotron backend

d68df89

fix

c752e29

resolve

82d1cf9

zhenglongjiepheonix requested a review from michaelbenayoun September 3, 2024 15:31

fix

3a1a195

ArthurZucker mentioned this pull request Sep 6, 2024

Enhancing Hugging Face Models with Tensor Parallelism for Large-Scale Model Support 🚀 huggingface/transformers#32470

Open

michaelbenayoun reviewed Sep 6, 2024


zhenglongjiepheonix added 4 commits September 20, 2024 18:33

fix conflict

b5b371f

address comments

0ff39bb

address comments

5137f68

fix

9dd77de

zhenglongjiepheonix added 2 commits September 20, 2024 19:33

fix

a375b6d

fix

40880a3

github-actions bot added the Stale label Jan 1, 2025



