3 Sep 2023

Previous journal: 0140-2023-09-02.md
Next journal: 0142-2023-09-04.md

raybox-zero

Another quick update with a couple of tests...

test005: (COPY OF) Trying shared multiplier for trackInit

This is copied from 0140, to make comparison easier.

Code:

  • tt04-raybox-zero: 70f1a6c: harden_test: Minor update to change parameter order
    • Equivalent to: 7aae611: Wire up SPI for fixed pov
  • src/raybox-zero: 69f4dca: test005: Trying shared multiplier for trackInit

Summary:

  • test005 branches off test002 (Registered visualWallDist avoids post-subtraction) because this one seems to have the best performance so far (even if by a tiny margin).
  • Where we had this:
    wire `F2 trackInitX = stepDistX * partialX;
    wire `F2 trackInitY = stepDistY * partialY;
    ...it is now a single shared multiplier, driven by state, i.e. with multiplicand and multiplier inputs selected by a mux on state...
    wire `F mul_in_a = (state==TracePrepX) ? stepDistX : stepDistY;
    wire `F mul_in_b = (state==TracePrepX) ?  partialX :  partialY;
    wire `F2 mul_out = mul_in_a * mul_in_b;
    ...with the result written to trackDistX and trackDistY respectively:
    TracePrepX:
        ...
        trackDistX <= `FF(mul_out);
        ...
    TracePrepY:
        ...
        trackDistY <= `FF(mul_out);
        ...

Options used:

  STARTED: 2023-09-01 22:39:33
    STOPT: 0
  OUTFILE: stats-test005.md
   SELECT: :[1245]
    FORCE: 0
      TAG: test005: Trying shared multiplier for trackInit
 FINISHED: 2023-09-01 23:16:12

| | 4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| suggested_mhz | 24.39 | 25.00 | 47.62 | 50.00 | 0.00 | 0.00 | 45.52 | 44.80 | 25.00 | 25.00 |
| utilisation_% | 60.38 | 29.97 | 60.38 | 29.97 | 0.00 | 0.00 | 49.08 | 24.36 | 48.86 | 24.25 |
| wire_length_um | 0 | 258,488 | 0 | 260,097 | 0 | 0 | 181,794 | 190,478 | 180,108 | 174,155 |
| TotalCells | 11,037 | 36,469 | 11,037 | 36,491 | 0 | 0 | 20,672 | 35,751 | 20,509 | 35,604 |
| cells_pre_abc | 12,756 | 12,756 | 12,756 | 12,756 | 0 | 0 | 12,756 | 12,756 | 12,756 | 12,756 |
| synth_cells | 8,717 | 8,717 | 8,717 | 8,717 | 0 | 0 | 7,302 | 7,302 | 7,302 | 7,302 |
| logic_cells | 8,717 | 9,266 | 8,717 | 9,286 | 0 | 0 | 7,842 | 7,874 | 7,823 | 7,813 |
| pin_antennas | -1 | 2 | -1 | 4 | 0 | 0 | 4 | 3 | 2 | 3 |
| net_antennas | -1 | 2 | -1 | 3 | 0 | 0 | 3 | 3 | 2 | 3 |
| spef_wns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.97 | -2.32 | 0.00 | 0.00 |
| spef_tns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -107.26 | -128.96 | 0.00 | 0.00 |

Findings:

  • 4x2:4 fits, but doesn't reach 50MHz (it gets closer though, at 45.52MHz)
  • Speed isn't necessarily better, but area is much better at 48.86% for 4x2:5
  • Overall, the shared multiplier seems to be a winner, and makes a notable difference just by reducing from 2 multipliers to 1.

test006: shared multiplier with registered inputs instead of mux

Code:

  • tt04-raybox-zero: 70f1a6c: harden_test: Minor update to change parameter order
    • Equivalent to: 7aae611: Wire up SPI for fixed pov
  • src/raybox-zero: 96ecd25: test006: shared multiplier with registered inputs instead of mux

Summary:

  • test006 branches off test005 (Shared multiplier for trackInit)
  • Changes shared multiplier's input mux to use registers for each input instead.
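
To make the change concrete, here is a minimal sketch of what "registered inputs instead of mux" could look like. It is not the code from 96ecd25: the module name, the `W` parameter, and the `load_x`/`load_y` strobes are all hypothetical, and plain bit-vectors stand in for the `F/`F2 fixed-point macros. It assumes the FSM can capture the operands one state before the product is needed.

```verilog
// Hypothetical sketch only: names, widths, and load strobes are assumptions,
// not the code from commit 96ecd25.
module shared_mul_regin #(
    parameter W = 24                     // assumed fixed-point word width
) (
    input  wire           clk,
    input  wire           load_x,        // assumed: asserted one state before the X product is used
    input  wire           load_y,        // assumed: asserted one state before the Y product is used
    input  wire [W-1:0]   stepDistX,
    input  wire [W-1:0]   stepDistY,
    input  wire [W-1:0]   partialX,
    input  wire [W-1:0]   partialY,
    output wire [2*W-1:0] mul_out        // full-width product, like the `F2 wire in test005
);
    reg [W-1:0] mul_in_a;
    reg [W-1:0] mul_in_b;

    // Operands are captured ahead of time, so the multiplier is fed by flop
    // outputs rather than by a combinational mux on state.
    always @(posedge clk) begin
        if (load_x) begin
            mul_in_a <= stepDistX;
            mul_in_b <= partialX;
        end else if (load_y) begin
            mul_in_a <= stepDistY;
            mul_in_b <= partialY;
        end
    end

    // Still a single shared multiplier.
    assign mul_out = mul_in_a * mul_in_b;
endmodule
```

Note that the selection between X and Y operands has not disappeared in this sketch: it has moved in front of the two registers, which also cost extra flops compared with the mux-only version.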

Options used:

  STARTED: 2023-09-03 19:30:17
    STOPT: 0
  OUTFILE: stats-test006.md
   SELECT: :[1245]
    FORCE: 0
      TAG: test006: shared multiplier with registered inputs instead of mux
 FINISHED: 2023-09-03 20:08:50

| | 4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| suggested_mhz | 24.39 | 25.00 | 47.62 | 49.60 | 0.00 | 0.00 | 45.64 | 45.89 | 25.00 | 25.00 |
| utilisation_% | 61.75 | 30.65 | 61.75 | 30.65 | 0.00 | 0.00 | 50.76 | 25.19 | 50.64 | 25.13 |
| wire_length_um | 0 | 269,871 | 0 | 273,273 | 0 | 0 | 201,705 | 203,784 | 193,820 | 195,287 |
| TotalCells | 11,197 | 36,444 | 11,197 | 36,579 | 0 | 0 | 20,764 | 35,685 | 20,523 | 35,590 |
| cells_pre_abc | 12,739 | 12,739 | 12,739 | 12,739 | 0 | 0 | 12,739 | 12,739 | 12,739 | 12,739 |
| synth_cells | 8,877 | 8,877 | 8,877 | 8,877 | 0 | 0 | 7,447 | 7,447 | 7,447 | 7,447 |
| logic_cells | 8,877 | 9,403 | 8,877 | 9,545 | 0 | 0 | 8,028 | 8,043 | 7,975 | 7,983 |
| pin_antennas | -1 | 9 | -1 | 5 | 0 | 0 | 1 | 4 | 3 | 1 |
| net_antennas | -1 | 9 | -1 | 5 | 0 | 0 | 1 | 4 | 3 | 1 |
| spef_wns | 0.00 | 0.00 | 0.00 | -0.16 | 0.00 | 0.00 | -1.91 | -1.79 | 0.00 | 0.00 |
| spef_tns | 0.00 | 0.00 | 0.00 | -1.09 | 0.00 | 0.00 | -100.02 | -95.65 | 0.00 | 0.00 |

Findings:

  • Overall, this is worse than test005 (more area, and 8x2:2 no longer meets timing), except in 4x2:4 and 8x2:4 where it's slightly faster (45.64 vs 45.52MHz, and 45.89 vs 44.80MHz).

test006 but with extra optional reset logic in effect

Code:

  • tt04-raybox-zero: 70f1a6c: harden_test: Minor update to change parameter order
    • Equivalent to: 7aae611: Wire up SPI for fixed pov
  • src/raybox-zero: 96ecd25: test006 but modified with more registers being reset in the optional block.

Summary:

  • Same as test006 above, but with stepDist and mulin_a/b registers being reset to 0.

Options used:

  STARTED: 2023-09-03 19:39:52
    STOPT: 0
  OUTFILE: stats-test006-extrareset.md
   SELECT: :[1245]
    FORCE: 0
      TAG: test006 but with extra optional reset logic in effect
 FINISHED: 2023-09-03 20:23:28

| | 4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| suggested_mhz | 24.39 | 25.00 | 47.62 | 49.31 | 0.00 | 0.00 | 46.95 | 46.42 | 25.00 | 25.00 |
| utilisation_% | 62.66 | 31.10 | 62.66 | 31.10 | 0.00 | 0.00 | 51.41 | 25.51 | 51.29 | 25.46 |
| wire_length_um | 0 | 270,010 | 0 | 271,623 | 0 | 0 | 207,030 | 210,238 | 200,742 | 207,355 |
| TotalCells | 11,353 | 36,654 | 11,353 | 36,698 | 0 | 0 | 20,928 | 35,897 | 20,602 | 35,696 |
| cells_pre_abc | 12,750 | 12,750 | 12,750 | 12,750 | 0 | 0 | 12,750 | 12,750 | 12,750 | 12,750 |
| synth_cells | 9,033 | 9,033 | 9,033 | 9,033 | 0 | 0 | 7,555 | 7,555 | 7,555 | 7,555 |
| logic_cells | 9,033 | 9,556 | 9,033 | 9,680 | 0 | 0 | 8,171 | 8,242 | 8,129 | 8,112 |
| pin_antennas | -1 | 4 | -1 | 4 | 0 | 0 | 4 | 1 | 3 | 1 |
| net_antennas | -1 | 3 | -1 | 4 | 0 | 0 | 4 | 1 | 3 | 1 |
| spef_wns | 0.00 | 0.00 | 0.00 | -0.28 | 0.00 | 0.00 | -1.30 | -1.54 | 0.00 | 0.00 |
| spef_tns | 0.00 | 0.00 | 0.00 | -3.46 | 0.00 | 0.00 | -67.98 | -82.44 | 0.00 | 0.00 |

Findings:

  • Marginally worse again than test006 above, except for faster clock speeds in 4x2:4 and 8x2:4 -- more likely due to randomness??

test007: use registered rcp_in instead of mux

Code:

  • tt04-raybox-zero: 70f1a6c: harden_test: Minor update to change parameter order
    • Equivalent to: 7aae611: Wire up SPI for fixed pov
  • src/raybox-zero: 5cfe6e8: test007: use registered rcp_in instead of mux

Summary:

  • This goes back to test005 (shared multiplier) as its base.
  • Changes shared reciprocal input from 3-way mux to a register.
  • Not expected to change much: A register still needs to be muxed at its input?
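
The question in the last bullet can be illustrated with a small sketch. This is not the code from 5cfe6e8: the module name, the `sel` encoding, and the `src0`..`src2` names are hypothetical placeholders for whichever three values share the reciprocal, and `W` is an assumed word width.

```verilog
// Hypothetical sketch only: port names, widths, and select encoding are
// assumptions, not the code from commit 5cfe6e8.
module rcp_in_reg #(
    parameter W = 24                 // assumed fixed-point word width
) (
    input  wire         clk,
    input  wire [1:0]   sel,         // assumed: driven by the FSM one state early
    input  wire [W-1:0] src0,        // placeholders for the three shared sources
    input  wire [W-1:0] src1,
    input  wire [W-1:0] src2,
    output reg  [W-1:0] rcp_in       // registered input to the shared reciprocal
);
    // The 3-way selection still exists, but it now sits in front of a flop
    // instead of directly in the reciprocal's combinational path.
    always @(posedge clk) begin
        case (sel)
            2'd0:    rcp_in <= src0;
            2'd1:    rcp_in <= src1;
            2'd2:    rcp_in <= src2;
            default: rcp_in <= rcp_in;   // hold when no source is selected
        endcase
    end
endmodule
```

So the mux is indeed still there at the register's input; what changes is that its delay is paid in the cycle before the reciprocal starts, rather than at the head of the reciprocal's own logic.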

Options used:

  STARTED: 2023-09-03 23:17:22
    STOPT: 0
  OUTFILE: stats-test007.md
   SELECT: :[1245]
    FORCE: 0
      TAG: test007: use registered rcp_in instead of mux
 FINISHED: 2023-09-04 00:03:56

| | 4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| suggested_mhz | 25.00 | 25.00 | 50.00 | 50.00 | 0.00 | 0.00 | 50.00 | 50.00 | 25.00 | 25.00 |
| utilisation_% | 58.89 | 29.22 | 58.89 | 29.22 | 0.00 | 0.00 | 49.81 | 24.72 | 49.76 | 24.69 |
| wire_length_um | 252,696 | 252,558 | 246,455 | 254,935 | 0 | 0 | 198,647 | 199,845 | 191,574 | 191,679 |
| TotalCells | 21,330 | 36,337 | 21,261 | 36,321 | 0 | 0 | 20,684 | 35,694 | 20,428 | 35,577 |
| cells_pre_abc | 12,950 | 12,950 | 12,950 | 12,950 | 0 | 0 | 12,950 | 12,950 | 12,950 | 12,950 |
| synth_cells | 8,562 | 8,562 | 8,562 | 8,562 | 0 | 0 | 7,386 | 7,386 | 7,386 | 7,386 |
| logic_cells | 9,079 | 9,096 | 9,164 | 9,158 | 0 | 0 | 7,974 | 7,969 | 7,919 | 7,934 |
| pin_antennas | 1 | 4 | 1 | 5 | 0 | 0 | 3 | 4 | 1 | 2 |
| net_antennas | 1 | 4 | 1 | 5 | 0 | 0 | 3 | 3 | 1 | 2 |
| spef_wns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| spef_tns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Findings:

  • All hardens complete successfully.
  • Compared with test005, combos 1 and 2 are faster, use a little less area, and they all complete successfully.
  • Compared with test005, combos 4 and 5 are faster, and use marginally more area.
  • NOTE: In practice, this might not necessarily be faster: while it shortens the rcp's combinational chain, the rcp already has an extra wait state in the wall_tracer FSM, so it was probably fast enough anyway.
  • Combos 4 and 5 are still the best overall for general purposes.