Implement micro op fusion in decode stage. #113

Open
zxc12523 opened this issue Nov 7, 2023 · 6 comments · Fixed by #146

Comments

@zxc12523

zxc12523 commented Nov 7, 2023

In the decode stage, there are often adjacent pairs of uops that can be fused into a single operation to improve performance. Since this optimization is common in modern high-performance CPUs, adding it would let users model the performance gain.

@klingaard
Collaborator

Oh, absolutely!

The challenge here is -- can you build a small fusion framework in Olympia that allows a user of the model to experiment with configurable combinations? In other words, do not hard-code the pairings in the simulator; set up a framework that is runtime-programmable via YAML or JSON to identify pairings. That'd be really cool and very powerful.
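To make "runtime programmable" concrete, here is a minimal Python sketch of a pairing table loaded from JSON at startup. It is not Olympia or Mavis code: the JSON layout, the fused op names (`li32`, `zext`), and the helpers `load_pairings` and `lookup_fusion` are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical pairing spec; in practice this would be read from a JSON file.
# The layout and the fused op names are assumptions made for this sketch.
FUSION_SPEC = """
{
  "pairs": [
    {"first": "lui",  "second": "addi", "fused": "li32"},
    {"first": "slli", "second": "srli", "fused": "zext"}
  ]
}
"""


def load_pairings(text):
    """Return a dict mapping (first, second) mnemonics to a fused op name."""
    spec = json.loads(text)
    return {(p["first"], p["second"]): p["fused"] for p in spec["pairs"]}


def lookup_fusion(pairings, first, second):
    """Return the fused op name for an adjacent pair, or None if not fusible."""
    return pairings.get((first, second))


pairings = load_pairings(FUSION_SPEC)
```

Because the table is plain data, changing which pairs fuse means editing the spec rather than recompiling the simulator, which is the point of the request above.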

@danbone
Contributor

danbone commented Nov 8, 2023

@klingaard Is there any support for this in mavis? I saw a morph instruction function.

@zxc12523
Author

zxc12523 commented Nov 9, 2023

@klingaard Maybe we can add that configuration to small_core.yaml?

@klingaard
Collaborator

> Is there any support for this in mavis? I saw a morph instruction function.

Yes, and you're correct, it's related to the morph function call. I'm not a Mavis expert (@dbmurrell is the original author), but if you look at https://github.com/sparcians/mavis/blob/4f3fef891f9ddc5c371c27500d02596f21ea6fc8/test/main.cpp#L446 you can see an example of how you can morph an existing instruction into a fused one. I think the process is:

  1. Identify a pairing (within a decode group or across [that's tricky])
  2. Morph the first instruction into the fused "new" operation
  3. No-op the second (force it to go directly to the ROB)
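The three steps above can be sketched in Python as a single pass over a decode group. This is only an illustration under stated assumptions: the `Uop` class, the `can_fuse` rule, and the fused mnemonic `add_beq` are hypothetical, and the real flow would go through Mavis's morph call rather than rewriting a mnemonic string. Cross-group fusion (the tricky case) is not handled.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    """Hypothetical decoded micro-op; not an Olympia or Mavis type."""
    mnemonic: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    is_nop: bool = False  # set once this uop has been absorbed by a fusion


def can_fuse(first, second):
    # Placeholder rule (an assumption): an add whose destination register
    # is the rs2 source of the branch that immediately follows it.
    return (first.mnemonic == "add" and second.mnemonic == "beq"
            and first.rd == second.rs2)


def fuse_decode_group(group):
    """Fuse adjacent pairs in place within one decode group."""
    i = 0
    while i + 1 < len(group):
        first, second = group[i], group[i + 1]
        if can_fuse(first, second):        # step 1: identify a pairing
            first.mnemonic = "add_beq"     # step 2: morph the first uop
            second.mnemonic = "nop"        # step 3: no-op the second uop
            second.is_nop = True
            i += 2                         # skip past the consumed pair
        else:
            i += 1
    return group
```

A downstream stage could then check `is_nop` to send the second uop straight to the ROB instead of dispatching it.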

> maybe we can add those configure into small_core.yaml

I think that's reasonable, but you might run into limitations with YAML to properly identify pairings. Dunno until there's a design in place for how you want to do it. Suggestion: Might want to specify a different language (an XML derivative with a DOM) and reference that:

top.cpu.core0.extension.core_extensions:
    decode_fusions: "fusion_pairs.xml"
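As a rough idea of what such a file could contain and how it might be read through a DOM-style API, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The XML schema (`fusions`, `pair`, `first`, `second`, `constraint`) and the function `parse_fusion_pairs` are assumptions made for illustration; no such format is defined in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of fusion_pairs.xml; the schema is an assumption.
FUSION_PAIRS_XML = """
<fusions>
  <pair name="add_beq">
    <first mnemonic="add"/>
    <second mnemonic="beq"/>
    <constraint lhs="first.rd" rhs="second.rs2"/>
  </pair>
</fusions>
"""


def parse_fusion_pairs(xml_text):
    """Return one dict per <pair> element with its mnemonics and constraints."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for pair in root.findall("pair"):
        pairs.append({
            "name": pair.get("name"),
            "first": pair.find("first").get("mnemonic"),
            "second": pair.find("second").get("mnemonic"),
            # Each constraint says two operand fields must be equal.
            "constraints": [(c.get("lhs"), c.get("rhs"))
                            for c in pair.findall("constraint")],
        })
    return pairs


fusion_pairs = parse_fusion_pairs(FUSION_PAIRS_XML)
```

Nested elements make it easy to attach any number of constraints to a pair, which is the kind of structure that may be awkward to express in flat YAML parameters.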

My suggestion for this entire effort: move this to a discussion and create a design document. Start with a use case, specifically, which pairs will you initially be fusing? For those pairs, what are the constraints?

For example, the first instruction must be an add followed by a branch AND the add's RD field must be the same as the branch's RS2 field... etc.

From there, you can determine the "language" you want to build to specify the pairings -- and how a generic fuser will convert that into runtime code...
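One possible shape for "a generic fuser converts that into runtime code" is to compile a declarative rule into a predicate. The Python sketch below encodes the add-then-branch example, with the add's RD required to equal the branch's RS2. The rule layout, the `first.rd` / `second.rs2` path syntax, and `compile_rule` are hypothetical, and instructions are modeled as plain dicts rather than Mavis objects.

```python
def compile_rule(rule):
    """Turn a declarative pairing rule into a predicate over two instructions."""

    def resolve(path, first, second):
        # A path such as "first.rd" selects a field of one of the two instructions.
        which, field_name = path.split(".")
        inst = first if which == "first" else second
        return inst[field_name]

    def matches(first, second):
        if first["mnemonic"] != rule["first"]:
            return False
        if second["mnemonic"] != rule["second"]:
            return False
        # Every listed pair of fields must hold the same value.
        return all(resolve(lhs, first, second) == resolve(rhs, first, second)
                   for lhs, rhs in rule["equal"])

    return matches


# Hypothetical rule: add followed by beq, with add.rd == beq.rs2.
RULE = {"first": "add", "second": "beq", "equal": [("first.rd", "second.rs2")]}
add_then_branch = compile_rule(RULE)

add = {"mnemonic": "add", "rd": 5, "rs1": 1, "rs2": 2}
beq_match = {"mnemonic": "beq", "rs1": 6, "rs2": 5}
beq_miss = {"mnemonic": "beq", "rs1": 6, "rs2": 7}
```

Here `add_then_branch(add, beq_match)` accepts the pair and `add_then_branch(add, beq_miss)` rejects it. Richer constraints (immediate ranges, register classes, distance between the two instructions) would need a richer rule format, which is what the design document would have to settle.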

@ghost changed the title from "Implement micro op fussion in decode stage." to "Implement micro op fusion in decode stage." on Nov 27, 2023

@klingaard
Collaborator

So @jeffnye-gh has been looking at this. See discussion #121 as well as the first PR, #135.

On Feb 8, 2024, @klingaard linked a pull request that will close this issue.

@jeffnye-gh
Collaborator

I believe this can be closed now. Support for fusion is available through the FSL API and FusionDecoder.cpp.
