Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ add optional CPU clock gating #775

Merged
merged 15 commits into from
Jan 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 24.01.2024 | 1.9.3.4 | :sparkles: add optional CPU clock gating (via new generic `CLOCK_GATING_EN`): shut down the CPU clock during sleep mode; :warning: add new HDL design file for the clock gate (`neorv32_clockgate.vhd`) | [#775](https://github.com/stnolting/neorv32/pull/775) |
| 23.01.2024 | 1.9.3.3 | :bug: remove compressed floating point load/store operations as they are **not** supported by `Zfinx` | [#](https://github.com/stnolting/neorv32/pull/771) |
| 20.01.2024 | 1.9.3.2 | optimize bus switch; minor RTL and comment edits | [#769](https://github.com/stnolting/neorv32/pull/769) |
| 14.01.2024 | 1.9.3.1 | minor rtl cleanups and optimizations | [#764](https://github.com/stnolting/neorv32/pull/764) |
Expand Down
31 changes: 22 additions & 9 deletions docs/datasheet/cpu.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -196,19 +196,31 @@ includes the <<_control_and_status_registers_csrs>> as well as the trap controll

==== Sleep Mode

The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing dynamic
power consumption. Sleep mode is entered by executing the `wfi` ("wait for interrupt") instruction.
The NEORV32 CPU provides a single sleep mode that can be entered to power-down the core reducing
dynamic power consumption. Sleep mode is entered by executing the `wfi` ("wait for interrupt") instruction.

[NOTE]
The `wfi` instruction will raise an illegal instruction exception when executed in user-mode
and `TW` in <<_mstatus>> is set. When executed in debug-mode or during single-stepping `wfi` will behave as
simple `nop`.
if `TW` in <<_mstatus>> is set. When executed in debug-mode or during single-stepping `wfi` will behave as
simple `nop` without entering sleep mode.

After executing the `wfi` instruction the CPU's `sleep_o` signal (<<_cpu_top_entity_signals>>) will become set
as soon as the CPU has fully halted ("CPU is sleeping"):

[start=1]
.The front-end (instruction fetch) is stopped. There is no pending instruction fetch bus access.
.The back-end (instruction execution) is stopped. There is no pending data bus access.
.There is not enabled interrupt pending.

In sleep mode all CPU-internal operations are stopped (execution, instruction fetch, counter increments, etc.).
CPU-external modules like memories, timers and peripheral interfaces are not affected by this. Furthermore, the CPU will
continue to buffer/enqueue incoming interrupt requests. The CPU will leave sleep mode as soon as any _enabled (via <<_mie>>)
interrupt source becomes _pending_ or if a debug session is started. As soon as the CPU has "parked" in a safe/resumable
state the `sleep_o` signal becomes high (see <<_cpu_top_entity_signals>>).
continue to buffer/enqueue incoming interrupt. The CPU will leave sleep mode as soon as any _enabled (via <<_mie>>)
interrupt source becomes _pending_ or if a debug session is started.

===== Power-Down Mode

Optionally, the sleep mode can also be used to shut down the CPU's main clock to further reduce power consumption
by halting the core's clock tree. This clock gating mode is enabled by the `CLOCK_GATING_EN` generic
(<<_processor_top_entity_generics>>). See section <<_processor_clocking>> for more information.


==== Full Virtualization
Expand Down Expand Up @@ -339,7 +351,8 @@ direction as seen from the CPU.
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge, this clock can be switched off during <<_sleep_mode>>
| `clk_aux_i` | 1 | in | Always-on clock, used to keep the the sleep control active when `clk_i` is switched off
| `rstn_i` | 1 | in | Global reset, low-active
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
Expand Down
3 changes: 2 additions & 1 deletion docs/datasheet/overview.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -174,14 +174,15 @@ of the entire processor including all the required configuration generics is `ne
.Compile Order
[IMPORTANT]
Most of the RTL sources use **entity instantiation**. Hence, the RTL compile order might be relevant.
The list below shows the hierarchical compile order srarting at the top.
The list below shows the hierarchical compile order starting at the top.

.VHDL Library
[IMPORTANT]
All core VHDL files from the list below have to be assigned to a **new library** named `neorv32`.

...................................
┌-neorv32_package.vhd - Processor/CPU main VHDL package file
├-neorv32_clockgate.vhd - Generic clock gating switch
├-neorv32_fifo.vhd - Generic FIFO component
│ ┌-neorv32_cpu_cp_bitmanip.vhd - Bit-manipulation co-processor (B ext.)
Expand Down
66 changes: 35 additions & 31 deletions docs/datasheet/soc.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -199,9 +199,10 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| Name | Type | Default | Description
4+^| **General**
| `CLOCK_FREQUENCY` | natural | - | The clock frequency of the processor's `clk_i` input port in Hertz (Hz).
| `CLOCK_GATING_EN` | boolean | false | Enable clock gating when CPU is in sleep mode (see sections <<_sleep_mode>> and <<_processor_clocking>>).
| `INT_BOOTLOADER_EN` | boolean | false | Implement the processor-internal <<_bootloader_rom_bootrom>>, pre-initialized with the default <<_bootloader>> image.
| `HART_ID` | suv(31:0) | 0x00000000 | The hart thread ID of the CPU (passed to <<_mhartid>> CSRs).
| `VENDOR_ID` | suv(31:0) | 0x00000000 | JEDEC ID (passed to <<_mvendorid>> CSRs).
| `HART_ID` | suv(31:0) | 0x00000000 | The hart thread ID of the CPU (passed to <<_mhartid>> CSR).
| `VENDOR_ID` | suv(31:0) | 0x00000000 | JEDEC ID (passed to <<_mvendorid>> CSR).
4+^| **<<_on_chip_debugger_ocd>>**
| `ON_CHIP_DEBUGGER_EN` | boolean | false | Implement the on-chip debugger and the CPU debug mode.
| `DM_LEGACY_MODE` | boolean | false | Debug module spec. version: `false` = v1.0, `true` = v0.13 (legacy mode).
Expand Down Expand Up @@ -298,19 +299,39 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt

The processor is implemented as fully-synchronous logic design using a single clock domain that is driven entirely by the
top's `clk_i` signal. This clock signal is used by all internal registers and memories, which trigger on the rising edge of
this clock signal. External "clocks" like the OCD's JTAG clock or the TWI's serial clock are synchronized into the
processor's clock domain before being further processed.
this clock signal - except for the <<_processor_reset>> and the clock switching gate that trigger on a falling edge.
External "clocks" like the OCD's JTAG clock or the SDI's serial clock are synchronized into the processor's clock domain
before being further processed.

==== Clock Gating

The single clock domain of the processor can be split into an always-on clock domain and a switchable clock domain.
The switchable clock domain is used to clock the CPU core. This domain can be deactivated at runtime to reduce power
consumption. The always-on clock domain is used to clock all other processor modules like peripherals, memories and IO
devices. Hence, these modules can continue operation (e.g. a timer keeps running) even if the CPU is shut down.

The splitting into two clock domain is enabled by the `CLOCK_GATING_EN` generic (<<_processor_top_entity_generics>>).
When enabled, a generic clock switching gate is added to decouple the switchable clock from the always-on clock domain
(VHDL file `neorv32_clockgate.vhd`). Whenever the CPU enters <<_sleep_mode>> the CPU clock domain ist shut down.

.Clock Switch Hardware
[NOTE]
Only the registers of the <<_processor_reset>> system trigger on a _falling_ clock edge.
By default, a generic clock gate is used (`rtl/core/neorv32_clockgate.vhd`) to shut down the CPU clock.
Especially for FPGA setups it is highly recommended to replace this default version by a technology-specific primitive
or macro wrapper to improve efficiency (clock skew, global clock tree usage, etc.).


Many processor modules like the UARTs or the timers require a programmable time base for operations. In order to simplify
the hardware, the processor implements a global "clock generator" that provides _clock enables_ for certain frequencies.
These clock enable signals are synchronous to the system's main clock and will be high for only a single cycle. Hence,
processor modules can use these enables for sub-main-clock operations while still having a single clock domain only.

==== Peripheral Clocks

Many processor modules like the UARTs or the timers provide a programmable time base for operations. In order to simplify
the hardware, the processor implements a global "clock generator" that provides _clock enables_ for certain frequencies that
are derived from the man clock. Hence, these clock enable signals are synchronous to the system's main clock and will be high
for only a single cycle. The processor modules can use these enables for sub-main-clock operations while still providing a single
clock domain only.

In total, 8 sub-main-clock signals are available. All processor modules, which feature a time-based configuration, provide a
programmable three-bit prescaler select in their according control register to select one of the 8 available clocks. The
programmable three-bit prescaler select in their control register to select one of the 8 available clocks. The
mapping of the prescaler select bits to the according clock source is shown in the table below. Here, _f_ represents the
processor main clock from the top entity's `clk_i` signal.

Expand All @@ -321,28 +342,11 @@ processor main clock from the top entity's `clk_i` signal.
| Resulting clock: | _f/2_ | _f/4_ | _f/8_ | _f/64_ | _f/128_ | _f/1024_| _f/2048_| _f/4096_
|=======================

The software framework provides pre-defined aliases for the prescaler select bits:

.Prescaler Aliases from `neorv32.h`
[source,c]
--------------------------
enum NEORV32_CLOCK_PRSC_enum {
CLK_PRSC_2 = 0, /**< CPU_CLK (from clk_i top signal) / 2 */
CLK_PRSC_4 = 1, /**< CPU_CLK (from clk_i top signal) / 4 */
CLK_PRSC_8 = 2, /**< CPU_CLK (from clk_i top signal) / 8 */
CLK_PRSC_64 = 3, /**< CPU_CLK (from clk_i top signal) / 64 */
CLK_PRSC_128 = 4, /**< CPU_CLK (from clk_i top signal) / 128 */
CLK_PRSC_1024 = 5, /**< CPU_CLK (from clk_i top signal) / 1024 */
CLK_PRSC_2048 = 6, /**< CPU_CLK (from clk_i top signal) / 2048 */
CLK_PRSC_4096 = 7 /**< CPU_CLK (from clk_i top signal) / 4096 */
};
--------------------------

.Power-Down Mode
.Power Saving
[TIP]
If no peripheral modules requires a clock signal from the internal generator (all available modules disabled by clearing the
enable bit in the according module's control register) the generator is automatically deactivated to reduce dynamic power consumption.

If no peripheral modules requires a clock signal from the internal clock generator (all according modules are disabled by
clearing the enable bit in the according module's control register) the generator is automatically deactivated to reduce
dynamic power consumption.


<<<
Expand Down
7 changes: 4 additions & 3 deletions docs/datasheet/soc_sysinfo.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ The SYSINFO allows the application software to determine the setting of most of
that are related to processor/SoC configuration. All registers of this unit are read-only.
This device is always implemented - regardless of the actual hardware configuration. The bootloader as well
as the NEORV32 software runtime environment require information from this device (like memory layout
and default clock speed) for correct operation.
and default clock frequency) for correct operation.


**Register Map**
Expand All @@ -29,7 +29,7 @@ and default clock speed) for correct operation.
[options="header",grid="all"]
|=======================
| Address | Name [C] | Function
| `0xfffffe00` | `CLK` | clock speed in Hz (via top's `CLOCK_FREQUENCY` generic)
| `0xfffffe00` | `CLK` | clock frequency in Hz (via top's `CLOCK_FREQUENCY` generic)
| `0xfffffe04` | `MEM[4]` | internal memory configuration (see <<_sysinfo_memory_configuration>>)
| `0xfffffe08` | `SOC` | specific SoC configuration (see <<_sysinfo_soc_configuration>>)
| `0xfffffe0c` | `CACHE` | cache configuration information (see <<_sysinfo_cache_configuration>>)
Expand Down Expand Up @@ -67,7 +67,8 @@ Bit fields in this register are set to all-zero if the according cache is not im
| `4` | `SYSINFO_SOC_MEM_EXT_ENDIAN` | set if external bus interface uses BIG-endian byte-order (via top's `MEM_EXT_BIG_ENDIAN` generic)
| `5` | `SYSINFO_SOC_ICACHE` | set if processor-internal instruction cache is implemented (via top's `ICACHE_EN` generic)
| `6` | `SYSINFO_SOC_DCACHE` | set if processor-internal data cache is implemented (via top's `DCACHE_EN` generic)
| `11:7` | - | _reserved_, read as zero
| `7` | `SYSINFO_SOC_CLOCK_GATING` | set if CPU clock gating is implemented (via top's `CLOCK_GATING_EN` generic)
| `11:8` | - | _reserved_, read as zero
| `12` | `SYSINFO_SOC_IO_CRC` | set if cyclic redundancy check unit is implemented (via top's `IO_CRC_EN` generic)
| `13` | `SYSINFO_SOC_IO_SLINK` | set if stream link interface is implemented (via top's `IO_SLINK_EN` generic)
| `14` | `SYSINFO_SOC_IO_DMA` | set if direct memory access controller is implemented (via top's `IO_DMA_EN` generic)
Expand Down
79 changes: 79 additions & 0 deletions rtl/core/neorv32_clockgate.vhd
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
-- #################################################################################################
-- # << NEORV32 - Generic Clock Gating Switch >> #
-- # ********************************************************************************************* #
-- # This is a generic clock switch that allows to shut down the clock of certain processor #
-- # modules in order to reduce power consumption. #
-- # #
-- # [NOTE] Especially for FPGA setups, it is highly recommended to replace this default module #
-- # by a technology-/platform-specific macro or primitive (e.g. a clock mux) wrapper. #
-- # ********************************************************************************************* #
-- # BSD 3-Clause License #
-- # #
-- # Copyright (c) 2024, Stephan Nolting. All rights reserved. #
-- # #
-- # Redistribution and use in source and binary forms, with or without modification, are #
-- # permitted provided that the following conditions are met: #
-- # #
-- # 1. Redistributions of source code must retain the above copyright notice, this list of #
-- # conditions and the following disclaimer. #
-- # #
-- # 2. Redistributions in binary form must reproduce the above copyright notice, this list of #
-- # conditions and the following disclaimer in the documentation and/or other materials #
-- # provided with the distribution. #
-- # #
-- # 3. Neither the name of the copyright holder nor the names of its contributors may be used to #
-- # endorse or promote products derived from this software without specific prior written #
-- # permission. #
-- # #
-- # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS #
-- # OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF #
-- # MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE #
-- # COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, #
-- # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE #
-- # GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED #
-- # AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING #
-- # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED #
-- # OF THE POSSIBILITY OF SUCH DAMAGE. #
-- # ********************************************************************************************* #
-- # The NEORV32 Processor - https://github.com/stnolting/neorv32 (c) Stephan Nolting #
-- #################################################################################################

library ieee;
use ieee.std_logic_1164.all;

entity neorv32_clockgate is
port (
clk_i : in std_ulogic; -- global clock line, always-on
rstn_i : in std_ulogic; -- global reset line, low-active, async
halt_i : in std_ulogic; -- shut down clock output when set
clk_o : out std_ulogic -- switched clock output
);
end neorv32_clockgate;

architecture neorv32_clockgate_rtl of neorv32_clockgate is

signal enable : std_ulogic;

begin

-- Warn about Clock Gating ----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
assert false report "[NEORV32] Clock gating enabled (using generic clock switch)." severity warning;


-- Clock Switch ---------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
clock_switch: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
enable <= '1';
elsif falling_edge(clk_i) then -- update on falling edge to avoid glitches on 'clk_o'
enable <= not halt_i;
end if;
end process clock_switch;

-- for FPGA designs better replace this by a technology-specific primitive or macro --
clk_o <= clk_i when (enable = '1') else '0';


end neorv32_clockgate_rtl;
8 changes: 5 additions & 3 deletions rtl/core/neorv32_cpu.vhd
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,8 @@ entity neorv32_cpu is
);
port (
-- global control --
clk_i : in std_ulogic; -- global clock, rising edge
clk_i : in std_ulogic; -- switchable global clock, rising edge
clk_aux_i : in std_ulogic; -- always-on clock, rising edge
rstn_i : in std_ulogic; -- global reset, low-active, async
sleep_o : out std_ulogic; -- cpu is in sleep mode when set
debug_o : out std_ulogic; -- cpu is in debug mode when set
Expand Down Expand Up @@ -162,8 +163,8 @@ begin
-- CPU tuning options --
assert false report "[NEORV32] CPU tuning options: " &
cond_sel_string_f(FAST_MUL_EN, "fast_mul ", "") &
cond_sel_string_f(FAST_SHIFT_EN, "fast_shift ", "" ) &
cond_sel_string_f(REGFILE_HW_RST, "rf_hw_rst", "" )
cond_sel_string_f(FAST_SHIFT_EN, "fast_shift ", "") &
cond_sel_string_f(REGFILE_HW_RST, "rf_hw_rst ", "")
severity note;

-- simulation notifier --
Expand Down Expand Up @@ -207,6 +208,7 @@ begin
port map (
-- global control --
clk_i => clk_i, -- global clock, rising edge
clk_aux_i => clk_aux_i, -- always-on clock, rising edge
rstn_i => rstn_i, -- global reset, low-active, async
ctrl_o => ctrl, -- main control bus
-- instruction fetch interface --
Expand Down
Loading
Loading