⚠️ Rename CPU tuning options / generics (#1125)
stnolting authored Dec 22, 2024
2 parents f3c21f9 + 286be32 commit fc0034f
Showing 12 changed files with 365 additions and 265 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 22.12.2024 | 1.10.7.8 | :warning: rename CPU tuning options / generics | [#1125](https://github.com/stnolting/neorv32/pull/1125) |
| 22.12.2024 | 1.10.7.7 | :warning: move clock gating switch from processor top to CPU clock; `CLOCK_GATING_EN` is now a CPU tuning option | [#1124](https://github.com/stnolting/neorv32/pull/1124) |
| 21.12.2024 | 1.10.7.6 | minor rtl cleanups and optimizations | [#1123](https://github.com/stnolting/neorv32/pull/1123) |
| 19.12.2024 | 1.10.7.5 | :test_tube: use time-multiplex PMP architecture (reducing area footprint) | [#1105](https://github.com/stnolting/neorv32/pull/1105) |
28 changes: 14 additions & 14 deletions docs/datasheet/cpu.adoc
@@ -259,7 +259,7 @@ Software can check for configured tuning options via specific flags in the <<_mx

{empty} +
[discrete]
===== **`CLOCK_GATING_EN`**
===== **`CPU_CLOCK_GATING_EN`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
@@ -274,7 +274,7 @@ Software can check for configured tuning options via specific flags in the <<_mx

{empty} +
[discrete]
===== **`FAST_MUL_EN`**
===== **`CPU_FAST_MUL_EN`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
@@ -293,7 +293,7 @@ time-independent of the provided operands.

{empty} +
[discrete]
===== **`FAST_SHIFT_EN`**
===== **`CPU_FAST_SHIFT_EN`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
@@ -314,7 +314,7 @@ approach requires only a few hardware resources and does not impact the critical

{empty} +
[discrete]
===== **`REGFILE_HW_RST`**
===== **`CPU_RF_HW_RST_EN`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
@@ -360,7 +360,7 @@ The single clock domain of the CPU core can be split into an always-on clock dom
The switchable clock domain can be deactivated to further reduce reduce dynamic power consumption. CPU-external modules
like timers, interfaces and memories are not affected by the clock gating.

The splitting into two clock domain is enabled by the `CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>> /
The splitting into two clock domain is enabled by the `CPU_CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>> /
<<_cpu_tuning_options>>). When enabled, a generic clock switching gate is added to decouple the switchable clock from
the always-on clock domain. Whenever the CPU enters <<_sleep_mode>> the switchable clock domain is shut down.

@@ -583,7 +583,7 @@ The "compressed" ISA extension provides 16-bit encodings of commonly used instru
|=======================
| Class | Instructions | Execution cycles
| ALU | `c.addi4spn` `c.nop` `c.add[i]` `c.li` `c.addi16sp` `c.lui` `c.and[i]` `c.sub` `c.xor` `c.or` `c.mv` | 2
| ALU | `c.srli` `c.srai` `c.slli` | 3 + 1..32; FAST_SHIFT: 4
| ALU | `c.srli` `c.srai` `c.slli` | 3 + 1..32; `CPU_FAST_SHIFT_EN`: 4
| Branches | `c.beqz` `c.bnez` | taken: 6; not taken: 3
| Jumps / calls | `c.jal[r]` `c.j` `c.jr` | 6
| Memory access | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4
@@ -612,7 +612,7 @@ The `I` ISA extensions is the base RISC-V integer ISA that is always enabled.
| Class | Instructions | Execution cycles
| ALU | `add[i]` `slt[i]` `slt[i]u` `xor[i]` `or[i]` `and[i]` `sub` `lui` `auipc` | 2
| No-operation | "`nop`" | 2
| ALU shifts | `sll[i]` `srl[i]` `sra[i]` | 3 + 1..32; FAST_SHIFT: 4
| ALU shifts | `sll[i]` `srl[i]` `sra[i]` | 3 + 1..32; `CPU_FAST_SHIFT_EN`: 4
| Branches | `beq` `bne` `blt` `bge` `bltu` `bgeu` | taken: 6; not taken: 3
| Jump/call | `jal[r]` | 6
| Load/store | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5
@@ -651,7 +651,7 @@ This ISA extension is implemented as multi-cycle ALU co-process (`rtl/core/neorv
[options="header", grid="rows"]
|=======================
| Class | Instructions | Execution cycles
| Multiplication | `mul` `mulh` `mulhsu` `mulhu` | 36; FAST_MUL: 4
| Multiplication | `mul` `mulh` `mulhsu` `mulhu` | 36; `CPU_FAST_MUL_EN`: 4
| Division | `div` `divu` `rem` `remu` | 36
|=======================

@@ -873,11 +873,11 @@ generic. This ISA extension is implemented as multi-cycle ALU co-processor (`rtl
|=======================
| Class | Instructions | Execution cycles
| Logic with negate | `andn` `orn` `xnor` | 4
| Count leading/trailing zeros | `clz` `ctz` | 6 + 1..32; FAST_SHIFT: 4
| Count population | `cpop` | 6 + 32; FAST_SHIFT: 4
| Count leading/trailing zeros | `clz` `ctz` | 6 + 1..32; `CPU_FAST_SHIFT_EN`: 4
| Count population | `cpop` | 6 + 32; `CPU_FAST_SHIFT_EN`: 4
| Integer maximum/minimum | `min[u]` `max[u]` | 4
| Sign/zero extension | `sext.b` `sext.h` `zext` | 4
| Bitwise rotation | `rol` `ror[i]` | 6 + _shift_amount_; FAST_SHIFT: 4
| Bitwise rotation | `rol` `ror[i]` | 6 + _shift_amount_; `CPU_FAST_SHIFT_EN`: 4
| OR-combine | `orc.b` | 4
| Byte-reverse | `rev8` | 4
|=======================
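
As a reading aid for the timing table above, the following minimal C sketch models the rotation and population-count latencies with and without `CPU_FAST_SHIFT_EN`. The function names are hypothetical and not part of the NEORV32 software framework. The sketch also assumes that `_shift_amount_` means the effective 5-bit shift amount of an RV32 rotation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cycle counts are copied from the timing table above.
 * Function names are illustrative only (not NEORV32 API). */

/* rol / ror[i]: "6 + shift_amount" cycles, or 4 with CPU_FAST_SHIFT_EN.
 * Assumption: shift_amount is the effective 5-bit amount on RV32. */
uint32_t sketch_rotate_cycles(bool fast_shift_en, uint32_t shift_amount) {
  if (fast_shift_en) {
    return 4u; /* barrel shifter: constant latency */
  }
  return 6u + (shift_amount & 31u); /* serial shifter: data-dependent */
}

/* cpop: "6 + 32" cycles, or 4 with CPU_FAST_SHIFT_EN. */
uint32_t sketch_cpop_cycles(bool fast_shift_en) {
  return fast_shift_en ? 4u : (6u + 32u);
}
```

In this model the serial rotation latency depends on the operand, which matches the `Zkt` table entry that marks rotations as data-independent only if `CPU_FAST_SHIFT_EN` is enabled.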
@@ -1078,13 +1078,13 @@ core does implement must adhere to the requirements of `Zkt`.
|=======================
| Parent extension | Instructions | Data independent execution time?
.2+<| `RVI` <| `lui` `auipc` `add[i]` `slt[i][u]` `xor[i]` `or[i]` `and[i]` `sub` <| yes
<| `sll[i]` `srl[i]` `sra[i]` <| yes if `FAST_SHIFT_EN` enabled
<| `sll[i]` `srl[i]` `sra[i]` <| yes if `CPU_FAST_SHIFT_EN` enabled
| `RVM` | `mul[h]` `mulh[s]u` | yes
.2+<| `RVC` <| `c.nop` `c.addi` `c.lui` `c.andi` `c.sub` `c.xor` `c.and` `c.mv` `c.add` <| yes
<| `c.srli` `c.srai` `c.slli` <| yes if `FAST_SHIFT_EN` enabled
<| `c.srli` `c.srai` `c.slli` <| yes if `CPU_FAST_SHIFT_EN` enabled
| `RVK` | `aes32ds[m]i` `aes32es[m]i` `sha256sig*` `sha512sig*` `sha512sum*` `sm3p0` `sm3p1` `sm4ed` `sm4ks` | yes
.2+<| `RVB` <| `xperm4` `xperm8` `andn` `orn` `xnor` `pack[h]` `brev8` `rev8` <| yes
<| `ror[i]` `rol` <| yes if `FAST_SHIFT_EN` enabled
<| `ror[i]` `rol` <| yes if `CPU_FAST_SHIFT_EN` enabled
|=======================


8 changes: 4 additions & 4 deletions docs/datasheet/cpu_csr.adoc
@@ -980,9 +980,9 @@ discover ISA sub-extensions and CPU configuration options
| 24 | `CSR_MXISA_ZBS` | r/- | <<_zbs_isa_extension>> available
| 25 | `CSR_MXISA_ZALRSC` | r/- | <<_zalrsc_isa_extension>> available
| 28:26 | - | r/- | _reserved_, hardwired to zero
| 27 | `CSR_MXISA_CLKGATE` | r/- | sleep-mode clock gating implemented when set (`CLOCK_GATING_EN`), see <<_cpu_tuning_options>
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`REGFILE_HW_RST`), see <<_cpu_tuning_options>>
| 29 | `CSR_MXISA_FASTMUL` | r/- | fast multiplication available when set (`FAST_MUL_EN`), see <<_cpu_tuning_options>
| 30 | `CSR_MXISA_FASTSHIFT` | r/- | fast shifts available when set (`FAST_SHIFT_EN`), see <<_cpu_tuning_options>
| 27 | `CSR_MXISA_CLKGATE` | r/- | sleep-mode clock gating implemented when set (`CPU_CLOCK_GATING_EN`), see <<_cpu_tuning_options>
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`CPU_RF_HW_RST_EN`), see <<_cpu_tuning_options>>
| 29 | `CSR_MXISA_FASTMUL` | r/- | fast multiplication available when set (`CPU_FAST_MUL_EN`), see <<_cpu_tuning_options>
| 30 | `CSR_MXISA_FASTSHIFT` | r/- | fast shifts available when set (`CPU_FAST_SHIFT_EN`), see <<_cpu_tuning_options>
| 31 | `CSR_MXISA_IS_SIM` | r/- | set if CPU is being **simulated** (⚠️ not guaranteed)
|=======================
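
Since the flag bits keep their positions and only the generic names change, software that inspects the tuning flags should not need changes. A minimal C sketch of decoding them from an already-read `mxisa` value could look as follows. The bit positions are taken from the table rows above; the `sketch_` names are hypothetical and not the identifiers used by the NEORV32 software framework. Reading the CSR itself is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the mxisa table above.
 * All sketch_* names are made up for this example. */
enum {
  SKETCH_MXISA_CLKGATE   = 27, /* set if CPU_CLOCK_GATING_EN is enabled */
  SKETCH_MXISA_RFHWRST   = 28, /* set if CPU_RF_HW_RST_EN is enabled */
  SKETCH_MXISA_FASTMUL   = 29, /* set if CPU_FAST_MUL_EN is enabled */
  SKETCH_MXISA_FASTSHIFT = 30  /* set if CPU_FAST_SHIFT_EN is enabled */
};

typedef struct {
  bool clock_gating;
  bool rf_hw_reset;
  bool fast_mul;
  bool fast_shift;
} sketch_tuning_t;

/* Return true if the given bit of value is set. */
static bool sketch_bit_set(uint32_t value, unsigned bit) {
  return ((value >> bit) & 1u) != 0u;
}

/* Decode the four CPU tuning flags from a raw mxisa value. */
sketch_tuning_t sketch_decode_tuning(uint32_t mxisa) {
  sketch_tuning_t t;
  t.clock_gating = sketch_bit_set(mxisa, SKETCH_MXISA_CLKGATE);
  t.rf_hw_reset  = sketch_bit_set(mxisa, SKETCH_MXISA_RFHWRST);
  t.fast_mul     = sketch_bit_set(mxisa, SKETCH_MXISA_FASTMUL);
  t.fast_shift   = sketch_bit_set(mxisa, SKETCH_MXISA_FASTSHIFT);
  return t;
}
```

Taking the raw value as a parameter keeps the decoder testable on a host machine; on the target, the argument would come from a CSR read.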
8 changes: 4 additions & 4 deletions docs/datasheet/soc.adoc
@@ -243,10 +243,10 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `RISCV_ISA_Zmmul` | boolean | false | Enable <<_zmmul_isa_extension>> (hardware-based integer multiplication).
| `RISCV_ISA_Zxcfu` | boolean | false | Enable NEORV32-specific <<_zxcfu_isa_extension>> (custom RISC-V instructions).
4+^| **<<_cpu_tuning_options>>**
| `CLOCK_GATING_EN` | boolean | false | Implement sleep-mode clock gating (see sections <<_sleep_mode>> and <<_processor_clocking>>).
| `FAST_MUL_EN` | boolean | false | Implement fast but large full-parallel multipliers (trying to infer DSP blocks); see section <<_cpu_arithmetic_logic_unit>>.
| `FAST_SHIFT_EN` | boolean | false | Implement fast but large full-parallel barrel shifters; see section <<_cpu_arithmetic_logic_unit>>.
| `REGFILE_HW_RST` | boolean | false | Implement full hardware reset for register file (use individual FFs instead of BRAM); see section <<_cpu_register_file>>.
| `CPU_CLOCK_GATING_EN` | boolean | false | Implement sleep-mode clock gating; see sections <<_sleep_mode>> and <<_cpu_clock_gating>>.
| `CPU_FAST_MUL_EN` | boolean | false | Implement fast but large full-parallel multipliers (trying to infer DSP blocks); see section <<_cpu_arithmetic_logic_unit>>.
| `CPU_FAST_SHIFT_EN` | boolean | false | Implement fast but large full-parallel barrel shifters; see section <<_cpu_arithmetic_logic_unit>>.
| `CPU_RF_HW_RST_EN` | boolean | false | Implement full hardware reset for register file (use individual FFs instead of BRAM); see section <<_cpu_register_file>>.
4+^| **Physical Memory Protection (<<_smpmp_isa_extension>>)**
| `PMP_NUM_REGIONS` | natural | 0 | Number of implemented PMP regions (0..16).
| `PMP_MIN_GRANULARITY` | natural | 4 | Minimal region granularity in bytes. Has to be a power of two, min 4.
163 changes: 120 additions & 43 deletions rtl/core/neorv32_bus.vhd
@@ -142,6 +142,89 @@ begin
end neorv32_bus_switch_rtl;


-- ================================================================================ --
-- NEORV32 SoC - Processor Bus Infrastructure: Bus Register Stage --
-- -------------------------------------------------------------------------------- --
-- The NEORV32 RISC-V Processor - https://github.com/stnolting/neorv32 --
-- Copyright (c) NEORV32 contributors. --
-- Copyright (c) 2020 - 2024 Stephan Nolting. All rights reserved. --
-- Licensed under the BSD-3-Clause license, see LICENSE for details. --
-- SPDX-License-Identifier: BSD-3-Clause --
-- ================================================================================ --

library ieee;
use ieee.std_logic_1164.all;

library neorv32;
use neorv32.neorv32_package.all;

entity neorv32_bus_reg is
generic (
REQ_REG_EN : boolean := false; -- enable request bus register stage
RSP_REG_EN : boolean := false -- enable response bus register stage
);
port (
-- global control --
clk_i : in std_ulogic; -- global clock, rising edge
rstn_i : in std_ulogic; -- global reset, low-active, async
-- bus ports --
host_req_i : in bus_req_t; -- host request
host_rsp_o : out bus_rsp_t; -- host response
device_req_o : out bus_req_t; -- device request
device_rsp_i : in bus_rsp_t -- device response
);
end neorv32_bus_reg;

architecture neorv32_bus_reg_rtl of neorv32_bus_reg is

begin

-- Request Register -----------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
request_reg_enabled:
if REQ_REG_EN generate
request_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
device_req_o <= req_terminate_c;
elsif rising_edge(clk_i) then
if (host_req_i.stb = '1') then -- reduce switching activity on device bus system
device_req_o <= host_req_i;
end if;
device_req_o.stb <= host_req_i.stb;
end if;
end process request_reg;
end generate;

request_reg_disabled:
if not REQ_REG_EN generate
device_req_o <= host_req_i;
end generate;


-- Response Register ----------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
response_reg_enabled:
if RSP_REG_EN generate
response_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
host_rsp_o <= rsp_terminate_c;
elsif rising_edge(clk_i) then
host_rsp_o <= device_rsp_i;
end if;
end process response_reg;
end generate;

response_reg_disabled:
if not RSP_REG_EN generate
host_rsp_o <= device_rsp_i;
end generate;


end neorv32_bus_reg_rtl;


-- ================================================================================ --
-- NEORV32 SoC - Processor Bus Infrastructure: Section Gateway --
-- -------------------------------------------------------------------------------- --
@@ -456,6 +539,24 @@ end neorv32_bus_io_switch;

architecture neorv32_bus_io_switch_rtl of neorv32_bus_io_switch is

-- bus register --
component neorv32_bus_reg
generic (
REQ_REG_EN : boolean := false;
RSP_REG_EN : boolean := false
);
port (
-- global control --
clk_i : in std_ulogic;
rstn_i : in std_ulogic;
-- bus ports --
host_req_i : in bus_req_t;
host_rsp_o : out bus_rsp_t;
device_req_o : out bus_req_t;
device_rsp_i : in bus_rsp_t
);
end component;

-- module configuration --
constant num_devs_c : natural := 32; -- number of device ports

@@ -493,6 +594,25 @@ architecture neorv32_bus_io_switch_rtl of neorv32_bus_io_switch is

begin

-- Register Stages ------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
neorv32_bus_reg_inst: neorv32_bus_reg
generic map (
REQ_REG_EN => INREG_EN,
RSP_REG_EN => OUTREG_EN
)
port map (
-- global control --
clk_i => clk_i,
rstn_i => rstn_i,
-- bus ports --
host_req_i => main_req_i,
host_rsp_o => main_rsp_o,
device_req_o => main_req,
device_rsp_i => main_rsp
);


-- Combine Device Ports -------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
dev_00_req_o <= dev_req(0); dev_rsp(0) <= dev_00_rsp_i;
@@ -529,29 +649,6 @@ begin
dev_31_req_o <= dev_req(31); dev_rsp(31) <= dev_31_rsp_i;


-- Optional Input Register ----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
input_reg_enabled:
if INREG_EN generate
request_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
main_req <= req_terminate_c;
elsif rising_edge(clk_i) then
if (main_req_i.stb = '1') then -- reduce switching activity on IO bus system
main_req <= main_req_i;
end if;
main_req.stb <= main_req_i.stb;
end if;
end process request_reg;
end generate;

input_reg_disabled:
if not INREG_EN generate
main_req <= main_req_i;
end generate;


-- Request --------------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
bus_request_gen:
@@ -595,26 +692,6 @@ begin
end process;


-- Optional Output Register ---------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
output_reg_enabled:
if OUTREG_EN generate
response_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
main_rsp_o <= rsp_terminate_c;
elsif rising_edge(clk_i) then
main_rsp_o <= main_rsp;
end if;
end process response_reg;
end generate;

output_reg_disabled:
if not OUTREG_EN generate
main_rsp_o <= main_rsp;
end generate;


end neorv32_bus_io_switch_rtl;

