Previous journal: | Next journal: |
---|---|
0140-2023-09-02.md | 0142-2023-09-04.md |
Another quick update with a couple of tests...
This is copied from 0140, to make comparison easier.
Click for details...
Code:
- tt04-raybox-zero:
70f1a6c
: harden_test: Minor update to change parameter order- Equivalent to:
7aae611
: Wire up SPI for fixed pov
- Equivalent to:
- src/raybox-zero:
69f4dca
: test005: Trying shared multiplier for trackInit
Summary:
- test005 branches off test002 (Registered visualWallDist avoids post-subtraction) because this one seems to have the best performance so far (even if by a tiny margin).
- Where we had this:
...it is now a single shared multiplier, driven by state, i.e. with mutiplicand and multiplier inputs selected by a mux on
wire `F2 trackInitX = stepDistX * partialX; wire `F2 trackInitY = stepDistY * partialY;
state
......and result being written towire `F mul_in_a = (state==TracePrepX) ? stepDistX : stepDistY; wire `F mul_in_b = (state==TracePrepX) ? partialX : partialY; wire `F2 mul_out = mul_in_a * mul_in_b;
trackDistX
andtrackDistY
respectively:TracePrepX: ... trackDistX <= `FF(mul_out); ... TracePrepY: ... trackDistY <= `FF(mul_out); ...
Options used:
STARTED: 2023-09-01 22:39:33
STOPT: 0
OUTFILE: stats-test005.md
SELECT: :[1245]
FORCE: 0
TAG: test005: Trying shared multiplier for trackInit
FINISHED: 2023-09-01 23:16:12
4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 | |
---|---|---|---|---|---|---|---|---|---|---|
suggested_mhz | 24.39 | 25.00 | 47.62 | 50.00 | 0.00 | 0.00 | 45.52 | 44.80 | 25.00 | 25.00 |
utilisation_% | 60.38 | 29.97 | 60.38 | 29.97 | 0.00 | 0.00 | 49.08 | 24.36 | 48.86 | 24.25 |
wire_length_um | 0 | 258,488 | 0 | 260,097 | 0 | 0 | 181,794 | 190,478 | 180,108 | 174,155 |
TotalCells | 11,037 | 36,469 | 11,037 | 36,491 | 0 | 0 | 20,672 | 35,751 | 20,509 | 35,604 |
cells_pre_abc | 12,756 | 12,756 | 12,756 | 12,756 | 0 | 0 | 12,756 | 12,756 | 12,756 | 12,756 |
synth_cells | 8,717 | 8,717 | 8,717 | 8,717 | 0 | 0 | 7,302 | 7,302 | 7,302 | 7,302 |
logic_cells | 8,717 | 9,266 | 8,717 | 9,286 | 0 | 0 | 7,842 | 7,874 | 7,823 | 7,813 |
pin_antennas | -1 | 2 | -1 | 4 | 0 | 0 | 4 | 3 | 2 | 3 |
net_antennas | -1 | 2 | -1 | 3 | 0 | 0 | 3 | 3 | 2 | 3 |
spef_wns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -1.97 | -2.32 | 0.00 | 0.00 |
spef_tns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -107.26 | -128.96 | 0.00 | 0.00 |
Findings:
- 4x2:4 fits, but doesn't reach 50MHz (gets closer though)
- Speed isn't necessarily better, but area is much better at 48.86% for 4x2:5
- Overall, shared multiplier seems to be a winner, and makes a notable difference for just reducing from 2 to 1.
Click for details...
Code:
- tt04-raybox-zero:
70f1a6c
: harden_test: Minor update to change parameter order- Equivalent to:
7aae611
: Wire up SPI for fixed pov
- Equivalent to:
- src/raybox-zero:
96ecd25
: test006: shared multiplier with registered inputs instead of mux
Summary:
- test006 branches off test005 (Shared multiplier for trackInit)
- Changes shared multiplier's input mux to use registers for each input instead.
Options used:
STARTED: 2023-09-03 19:30:17
STOPT: 0
OUTFILE: stats-test006.md
SELECT: :[1245]
FORCE: 0
TAG: test006: shared multiplier with registered inputs instead of mux
FINISHED: 2023-09-03 20:08:50
4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 | |
---|---|---|---|---|---|---|---|---|---|---|
suggested_mhz | 24.39 | 25.00 | 47.62 | 49.60 | 0.00 | 0.00 | 45.64 | 45.89 | 25.00 | 25.00 |
utilisation_% | 61.75 | 30.65 | 61.75 | 30.65 | 0.00 | 0.00 | 50.76 | 25.19 | 50.64 | 25.13 |
wire_length_um | 0 | 269,871 | 0 | 273,273 | 0 | 0 | 201,705 | 203,784 | 193,820 | 195,287 |
TotalCells | 11,197 | 36,444 | 11,197 | 36,579 | 0 | 0 | 20,764 | 35,685 | 20,523 | 35,590 |
cells_pre_abc | 12,739 | 12,739 | 12,739 | 12,739 | 0 | 0 | 12,739 | 12,739 | 12,739 | 12,739 |
synth_cells | 8,877 | 8,877 | 8,877 | 8,877 | 0 | 0 | 7,447 | 7,447 | 7,447 | 7,447 |
logic_cells | 8,877 | 9,403 | 8,877 | 9,545 | 0 | 0 | 8,028 | 8,043 | 7,975 | 7,983 |
pin_antennas | -1 | 9 | -1 | 5 | 0 | 0 | 1 | 4 | 3 | 1 |
net_antennas | -1 | 9 | -1 | 5 | 0 | 0 | 1 | 4 | 3 | 1 |
spef_wns | 0.00 | 0.00 | 0.00 | -0.16 | 0.00 | 0.00 | -1.91 | -1.79 | 0.00 | 0.00 |
spef_tns | 0.00 | 0.00 | 0.00 | -1.09 | 0.00 | 0.00 | -100.02 | -95.65 | 0.00 | 0.00 |
Findings:
- Overall, this is worse than test005, except in 8x2:4 where it's slightly faster.
Click for details...
Code:
- tt04-raybox-zero:
70f1a6c
: harden_test: Minor update to change parameter order- Equivalent to:
7aae611
: Wire up SPI for fixed pov
- Equivalent to:
- src/raybox-zero:
96ecd25
: test006 but modified with more registers being reset in the optional block.
Summary:
- Same as test006 above, but with stepDist and mulin_a/b registers being reset to 0.
Options used:
STARTED: 2023-09-03 19:39:52
STOPT: 0
OUTFILE: stats-test006-extrareset.md
SELECT: :[1245]
FORCE: 0
TAG: test006 but with extra optional reset logic in effect
FINISHED: 2023-09-03 20:23:28
4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 | |
---|---|---|---|---|---|---|---|---|---|---|
suggested_mhz | 24.39 | 25.00 | 47.62 | 49.31 | 0.00 | 0.00 | 46.95 | 46.42 | 25.00 | 25.00 |
utilisation_% | 62.66 | 31.10 | 62.66 | 31.10 | 0.00 | 0.00 | 51.41 | 25.51 | 51.29 | 25.46 |
wire_length_um | 0 | 270,010 | 0 | 271,623 | 0 | 0 | 207,030 | 210,238 | 200,742 | 207,355 |
TotalCells | 11,353 | 36,654 | 11,353 | 36,698 | 0 | 0 | 20,928 | 35,897 | 20,602 | 35,696 |
cells_pre_abc | 12,750 | 12,750 | 12,750 | 12,750 | 0 | 0 | 12,750 | 12,750 | 12,750 | 12,750 |
synth_cells | 9,033 | 9,033 | 9,033 | 9,033 | 0 | 0 | 7,555 | 7,555 | 7,555 | 7,555 |
logic_cells | 9,033 | 9,556 | 9,033 | 9,680 | 0 | 0 | 8,171 | 8,242 | 8,129 | 8,112 |
pin_antennas | -1 | 4 | -1 | 4 | 0 | 0 | 4 | 1 | 3 | 1 |
net_antennas | -1 | 3 | -1 | 4 | 0 | 0 | 4 | 1 | 3 | 1 |
spef_wns | 0.00 | 0.00 | 0.00 | -0.28 | 0.00 | 0.00 | -1.30 | -1.54 | 0.00 | 0.00 |
spef_tns | 0.00 | 0.00 | 0.00 | -3.46 | 0.00 | 0.00 | -67.98 | -82.44 | 0.00 | 0.00 |
Findings:
- Marginally worse again than test006 above, except for faster clock speed in 4x2:4 and 8x2:4 -- more due to randomness??
Click for details...
Code:
- tt04-raybox-zero:
70f1a6c
: harden_test: Minor update to change parameter order- Equivalent to:
7aae611
: Wire up SPI for fixed pov
- Equivalent to:
- src/raybox-zero:
5cfe6e8
: test007: use registered rcp_in instead of mux
Summary:
- This goes back to test005 (shared multiplier) as its base.
- Changes shared reciprocal input from 3-way mux to a register.
- Not expected to change much: A register still needs to be muxed at its input?
Options used:
STARTED: 2023-09-03 23:17:22
STOPT: 0
OUTFILE: stats-test007.md
SELECT: :[1245]
FORCE: 0
TAG: test007: use registered rcp_in instead of mux
FINISHED: 2023-09-04 00:03:56
4x2:1 | 8x2:1 | 4x2:2 | 8x2:2 | 4x2:3 | 8x2:3 | 4x2:4 | 8x2:4 | 4x2:5 | 8x2:5 | |
---|---|---|---|---|---|---|---|---|---|---|
suggested_mhz | 25.00 | 25.00 | 50.00 | 50.00 | 0.00 | 0.00 | 50.00 | 50.00 | 25.00 | 25.00 |
utilisation_% | 58.89 | 29.22 | 58.89 | 29.22 | 0.00 | 0.00 | 49.81 | 24.72 | 49.76 | 24.69 |
wire_length_um | 252,696 | 252,558 | 246,455 | 254,935 | 0 | 0 | 198,647 | 199,845 | 191,574 | 191,679 |
TotalCells | 21,330 | 36,337 | 21,261 | 36,321 | 0 | 0 | 20,684 | 35,694 | 20,428 | 35,577 |
cells_pre_abc | 12,950 | 12,950 | 12,950 | 12,950 | 0 | 0 | 12,950 | 12,950 | 12,950 | 12,950 |
synth_cells | 8,562 | 8,562 | 8,562 | 8,562 | 0 | 0 | 7,386 | 7,386 | 7,386 | 7,386 |
logic_cells | 9,079 | 9,096 | 9,164 | 9,158 | 0 | 0 | 7,974 | 7,969 | 7,919 | 7,934 |
pin_antennas | 1 | 4 | 1 | 5 | 0 | 0 | 3 | 4 | 1 | 2 |
net_antennas | 1 | 4 | 1 | 5 | 0 | 0 | 3 | 3 | 1 | 2 |
spef_wns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
spef_tns | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Findings:
- All hardens complete sucecssfully.
- Compared with test005, combos 1 and 2 are faster, use a little less area, and they all complete successfully.
- Compared with test005, combos 4 and 5 are faster, and use marginally more area.
- NOTE: In practice, these might not necessarily be faster, because while they reduce the rcp's combo chain length, the rcp already has an extra wait state in the wall_tracer FSM, so it's probably fast enough this way anyway.
- Combos 4 and 5 are still the best overall for general purposes.