Add files via upload

todxx · May 19, 2021 · 0133b3e · 0133b3e · birihtimal · Jun 14, 2021
1 parent 0739e77
commit 0133b3e

Showing 2 changed files with 313 additions and 0 deletions.
diff --git a/doc/AUTOLYKOS_TUNING.txt b/doc/AUTOLYKOS_TUNING.txt
@@ -0,0 +1,228 @@
+Team Red Miner Autolykos2 (ERGO) Mining
+=======================================
+This document provides some quick pointers on how to tune the
+autolykos2 algo used by ERGO.
+
+
+General background
+==================
+Autolykos2 is a memory-intensive low/medium power algo. However,
+because its memory accesses are small, it behaves more like verthash
+than like ethash. Performance is tied to the core clk, and for max
+speed (especially on Vegas) the core clk needs to be higher than for
+ethash in order to drive the mem controller on the gpu(s).
+
+This algo accesses mem in 32 byte chunks. This means that RDNA
+generation gpus (Navi, Big Navi) will not perform well. Their 128 byte
+cacheline size means that 128 bytes are read for every 32 byte
+request, effectively halving the available memory bandwidth compared
+to GCN (which uses 64 byte cachelines).
+
+
+Polaris Tuning
+==============
+Polaris gpus are simple for autolykos2. We have not spent a lot of
+time on tuning, so the examples below should be seen as a starting
+point. There might be better combinations of core clk, mem clk and
+mem straps to find.
+
+- Quality ethash timings work well.
+
+- Mem clk should be high, existing ethash config is fine.
+
+- Core clk is a big factor for the hashrate.
+
+- In our Polaris tests, a Nitro 470 4GB (Elpida), Nitro+ 570 8GB
+  (Samsung) and Nitro+ 580 8GB (Samsung) all displayed identical
+  hashrates for the same core clk as long as memory bandwidth was
+  sufficient. The 580 rebuilds the table slightly faster though,
+  and hence produces a slightly better avg hashrate over time.
+
+Polaris tuning examples
+-----------------------
+Note: sensor power reported, not accurate.
+
+Type           GPU CUs CoreMHz MemMHz TEdge  VDDC   Power
+Nitro+ 570 8GB   0 32  1200    2080   42C    875 mV  75 W
+Nitro+ 470 4GB   1 32  1235    2000   46C    875 mV  59 W
+Nitro+ 580 8GB   2 36  1275    2080   40C    900 mV  80 W
+----------------------- GPU Status -------------------------
+GPU 0 [42C, fan 44%]       autolykos2: 64.70Mh/s
+GPU 1 [46C, fan 44%]       autolykos2: 66.46Mh/s
+GPU 2 [40C, fan 44%]       autolykos2: 68.62Mh/s
+
+
+RX Vega 56/64 Tuning
+====================
+RX Vegas are great for autolykos2 and can reach 200 MH/s when
+stretched to the max, although at a > 200W power draw. Tuning them
+optimally is slightly more complex. Mining distros can help greatly
+here. We discuss three different tunings. Background info:
+
+- Mem timings should be used, ethash timings of some sort are good
+  choices. Other timings for Equihash, Cuckoo or CN can produce good
+  results as well.
+
+- Mem clock does not need to be high unless you're aiming for the
+  highest hashrates.
+
+- The most efficient setups make sure the soc clk stays at a lower
+  level and maximize the mem clk for that level, i.e. set it to the
+  soc clk frequency.
+
+- Your _effective_ core clk will decide your hashrate. Vegas are
+  notorious for not running at the configured frequency when AVFS
+  p-states are used.
+
+RX Vega Simple Tuning
+---------------------
+This is for people who don't care about soc clk level and just want to
+start hashing at a decent level around 165 MH/s.
+
+1. Set ethash mem timings (see our ethash guide for examples).
+2. Set core clk to 1225 MHz.
+3. Start with mem clk at 960 MHz (Vega 64) or 847 MHz (Vega 56).
+4. Set voltage to 875mV.
+5. Run the miner. Check the hashrate.
+6. Increase core clk until you hit 165 MH/s. If you hit a bottleneck
+   where increased core clk doesn't boost the hashrate, increase mem
+   clk a little more. Repeat from 5.
+7. If you crash, bump voltage a little more. Repeat from 5.
+8. If you run stable for a while, lower voltage.
+
+
+RX Vega Efficient Tuning
+------------------------
+This tuning targets 162-170 MH/s. For Vega 64, flashing a Vega 56 bios
+will be the best choice, but it isn't as critical as for ethash
+mining. The goal is to stay at soc clk 847 MHz for Vega 56 (or Vega 64
+with flashed 56 bios), and soc clk 960 MHz for Vega 64s. You might
+need to lock p-state levels using OverdriveNTool (Windows), mining
+distro helpers, or sysfs controls (Linux).
+
+Note 1: Vega 56 Hynix can follow the same guide below, but ended up
+slightly below 160 MH/s at 847 MHz soc/mem clk for us. You can then
+switch up to 960 MHz soc clk level, following the Vega 64 guide below
+instead. You can keep the mem clk lower than 960 MHz though, depending
+on what hashrate you'd like to target.
+
+Note 2: if none of the above makes sense to you, the critical
+piece of information here is that RX Vegas can't use a mem clk higher
+than the current soc clk. However, a higher soc clk means a more power
+hungry gpu, meaning we can't lower voltage as much as we'd like or the
+gpu will crash. Finding the sweet spot soc clk level, and maximizing
+the use of it by setting mem clk equal to soc clk is important when
+optimizing for efficiency.
+
+ 1. Configure ethash timings.
+ 2. Vega 56: Use core p-state 2: set to 1225 MHz.
+    Vega 64: Use core p-state 3: set to 1225 MHz.
+ 3. Vega 56: Use mem p-state 2: set to 847 MHz.
+    Vega 64: Use mem p-state 3: set to 960 MHz.
+ 4. Set voltage to 850mV as a start.
+ 5. Lock core and mem p-states.
+ 6. Run the miner. Press 's' and verify that the soc clk is at 847 MHz
+    (Vega 56) or 960 MHz (Vega 64).
+ 7. Hopefully you'll reach around 165 MH/s and we're done.
+ 8. If not, increase core clk slightly. Repeat from 6.
+ 9. If you crash, increase voltage.
+10. If you've run stable for a longer period, try lowering voltage.
+
+
+RX Vega Max Performance Tuning
+------------------------------
+This tuning targets 190-200 MH/s. Power draw will be around 200-210W
+at the wall. For Vega 64, flashing a Vega 56 bios will be the best
+choice here as well, but it isn't critical. For this tuning, we just
+go with the highest p-states.
+
+For Vega 56 with Samsung mem, if you have applied timings that can
+reach 53-54 MH/s on ethash, then keep them.
+
+Note: for Vega 56 Hynix, the guide below can still be followed, but
+the target hashrate for us had to be lowered to 185 MH/s.
+
+ 1. Configure ethash timings.
+ 2. Use core p-state 7: set to 1400 MHz.
+ 3. Vega 56: Use mem p-state 3: set to 990 MHz if you can run ethash at 52-54 MH/s.
+    Vega 56: Use mem p-state 3: set to 950 MHz if you can run ethash at 50 MH/s.
+    Vega 64: Use mem p-state 3: set to 1107 MHz.
+
+    NOTE: if your gpu can't take the high mem clk values suggested
+          above, set it to the level you can mine ethash at.
+
+ 4. Set voltage to 900mV as a start.
+ 5. Lock core and mem p-states.
+ 6. Run the miner. Check the hashrate.
+ 7. As long as you're underperforming the hashrate target, keep raising
+    the core clk. Under plain amdgpu-pro on linux, the scaling is
+    absurd and you might have to increase up to 1600 MHz before your
+    true effective clock is around 1400 MHz. Windows does not scale as
+    aggressively.
+ 8. If you crash, increase voltage.
+ 9. If you continue to crash even with 925mV or so, you need to give up
+    and settle for a lower hashrate target with a lowered mem clk.
+10. If you've run stable for a longer period, try lowering voltage.
+
+
+Radeon VII Tuning
+=================
+Radeon VIIs perform reasonably well, but are limited by core clock and
+the resulting high power usage. Typical VIIs can expect to hit
+210-240 MH/s on air cooling, and up to 270 MH/s on liquid cooling. To
+reach the highest hashrate and efficiency you will need to run in linux
+using the same setup procedure as for ethash C mode, described in
+ETHASH_TUNING_GUIDE.txt (linux kernel params + running as root).
+Using a mining distro that already supports the changes needed for TRM
+ethash C mode will be the easiest option. We briefly discuss some high
+level VII tuning concepts below:
+
+- Mem timings are NOT important.  VIIs will very much be bottlenecked
+  on core clock, and memory tuning does not need to be pushed.
+
+- Mem clock can be significantly lowered to save power and keep the 
+  HBM2 cool.  Even at high hashrates, memory clk can usually be dropped
+  to around 750MHz.
+
+- The limiting factor in hashrate will be core clk.  This in turn will
+  be limited by the cooling of the card.
+
+- The TRM 'VII Boost' enabled by the ethash C mode procedure described 
+  above will increase hashrate by around 10% at the same core clock.
+
+Radeon VII tuning examples
+--------------------------
+These are rough examples that should serve as a good starting point for
+tuning.
+Note: sensor power reported, not accurate.
+
+Setup   CoreMHz SocMHz MemMHz  VDDC    Power   Peak Hashrate
+Linux*  1500    971    801     850 mV  145 W     237.5Mh/s
+Linux*  1700    971    801     925 mV  183 W     268.2Mh/s
+Windows 1500    971    801     850 mV   -       ~210.0Mh/s
+* - Linux tests performed with kernel params set as described for
+      ethash C mode.
+
+Navi GPUs
+=========
+As stated above, Navis simply won't do that well on autolykos2 due to
+architectural changes that don't work well with the smaller mem
+accesses. Therefore, we don't expect RDNA gpus to run this algo.
+
+For tuning, you can use an existing configuration for ethash as a
+starting point, then lower the core clk by about 10%.
+
+Example tunings:
+Type    GPU CUs CoreMHz SocMHz MemMHz TEdge TMem  VDDC   Power
+5700XT  0   40  1100    1085   912    41C   70C   787 mV  84 W
+5600XT  1   36   950    1266   910    40C   70C   800 mV  93 W
+------------------------ GPU Status ---------------------------
+GPU 0 [41C, fan  0%]       autolykos2: 108.8Mh/s
+GPU 1 [40C, fan 49%]       autolykos2: 82.12Mh/s
+
+Type    GPU CUs CoreMHz SocMHz MemMHz TEdge TMem  VDDC   Power
+RX6800  0   60  1075    685    1049   52C   76C   787 mV 116 W (voltage not tuned) 
+------------------------ GPU Status ---------------------------
+GPU 0 [52C, fan 28%]       autolykos2: 118.8Mh/s
+
+
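Note on the "General background" section of AUTOLYKOS_TUNING.txt: the
cacheline argument can be checked with a little arithmetic. The
following is a minimal Python sketch, assuming every 32 byte access
lands in a different cacheline (random access, no reuse). The function
name useful_fraction is made up for illustration and is not part of
TRM.

```python
def useful_fraction(request_bytes: int, cacheline_bytes: int) -> float:
    """Fraction of fetched bytes that the algo actually uses.

    Assumption: each request pulls in whole cachelines and nothing
    fetched alongside the requested bytes is reused later.
    """
    # Number of whole cachelines needed to cover one request (ceil division).
    lines = -(-request_bytes // cacheline_bytes)
    return request_bytes / (lines * cacheline_bytes)


# Figures taken from the document: 32 byte accesses,
# 64 byte cachelines on GCN, 128 byte cachelines on RDNA.
gcn = useful_fraction(32, 64)
rdna = useful_fraction(32, 128)

print(f"GCN  useful fraction: {gcn:.2f}")
print(f"RDNA useful fraction: {rdna:.2f}")
print(f"RDNA relative to GCN: {rdna / gcn:.2f}")
```

Under that assumption GCN uses half of every fetched cacheline and
RDNA a quarter, which is where the "halving" in the text comes from.
Real hardware behaviour also depends on caches and access patterns, so
treat this as a rough model only.
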
diff --git a/doc/DUAL_ZIL_MINING.txt b/doc/DUAL_ZIL_MINING.txt
@@ -0,0 +1,85 @@
+Team Red Miner Dual ZIL Mining
+==============================
+This document describes how to enable dual ZIL mining in TRM together
+with a range of other primary algos. It describes both the new
+mechanism introduced in v0.8.3 and the older variant that can only be
+used when the primary algo is ethash (including etchash).
+
+
+New mechanism from v0.8.3
+-------------------------
+From v0.8.3, take any existing TRM command line configuration for
+ethash, kawpow, verthash or autolykos and add:
+
+--zil -o stratum+tcp://eu.ezil.me:5555 -u <eth wallet>.<zil wallet>.<worker> -p x --zil_end
+
+preferably at the end of the command line. The miner will pause the
+primary algo and switch to ZIL during the ZIL windows, then switch
+back to the primary algo afterwards. More primary algos might be added
+if there's enough interest. The intention is that the miner
+automatically configures the primary algo to work well with the dual
+ZIL mining. You can add more arguments between --zil and --zil_end
+(all arguments in USAGE.txt are available), but the default
+configuration chosen by the miner should be optimal for most setups.
+It applies the following settings:
+
+- Keeps a cached copy of the ZIL epoch 0 DAG.
+- Adjusts the memory allocated for the primary algo to fit the ZIL DAG.
+- Chooses ethash A-mode for the ZIL mining.
+- Uses the standard faster kernels for 4GB gpus since the DAG is max 1GB.
+
+
+Pool Support
+------------
+We have primarily tested on ezil.me, and confirmed that they are ok
+with having empty connections that only mine during ZIL windows. Other
+pools like rustpool.xyz and K1Pool should work ok as well.
+
+
+Potential issues with ethash B/C-mode
+-------------------------------------
+The TRM B- and C-modes want to use as much vram as possible. The value
+of the B/C-mode is diminished as the amount of allocated vram
+decreases. This typically means you need to increase the core clk for
+a preserved hashrate.
+
+When you add dual ZIL mining to an existing mining config where one or
+more gpus are running in B/C-mode, there's an obvious conflict of
+interest: to be able to immediately start mining ZIL, you want the DAG
+for epoch zero cached and ready in vram. This will consume around 1GB,
+and therefore steal vram from the B/C-modes.
+
+The auto config mode with --zil ... --zil_end will automatically
+reduce the vram allocated for B/C-modes to make room for the cached
+ZIL DAG. If you e.g. run a rig of 5700XTs running in B-mode mining ETH
+and then add ZIL, you will most probably see a reduced hashrate during
+the ETH mining and need to increase core clk.
+
+If you want to keep your current ETH tuning, the other way is to use
+the old way of running dual ZIL mining (see section below), and simply
+not cache the ZIL DAG but rebuild DAGs as you enter/exit the ZIL
+mining windows. This will steal some mining time for each ZIL window
+instead.
+
+
+Old mechanism up until v0.8.2.1
+-------------------------------
+Before v0.8.3, TRM only supported dual ZIL mining together with any
+other ethash coin, typically ETH or ETC. This configuration was more
+complex. You needed to:
+
+- Set the pool strategy to --pool_strategy=min_epoch.
+
+- Configure multiple pools, where the first pool must be the ETH/ETC
+  primary pool.
+
+- Instruct the miner to use a DAG cache and to prebuild epoch zero
+  using --eth_dag_cache=0. This is an optional step.
+
+- When using a DAG cache, avoid B/C-modes on any gpus by adding
+  e.g. --eth_config=A.
+
+This way of running dual ZIL mining is still fully supported, and the
+recommended way in some special cases (see the Potential issues
+section). See the start script bundled in the miner release for a
+working example.
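
Note on the "New mechanism from v0.8.3" section of DUAL_ZIL_MINING.txt:
the way the --zil ... --zil_end block is appended to an existing
command line can be sketched in Python as follows. This is only an
illustration. The helper name add_zil and the placeholder values are
assumptions; the flags and the pool URL are the ones shown in the
document.

```python
import shlex


def add_zil(primary_args, pool_url, eth_wallet, zil_wallet, worker, password="x"):
    """Return a new argument list with a --zil ... --zil_end block appended.

    primary_args is an existing TRM command line split into a list. It is
    left untouched, as the document says any existing configuration for
    ethash, kawpow, verthash or autolykos can be reused.
    """
    zil_user = f"{eth_wallet}.{zil_wallet}.{worker}"
    zil_block = ["--zil", "-o", pool_url, "-u", zil_user, "-p", password, "--zil_end"]
    # The document recommends placing the block at the end of the command line.
    return list(primary_args) + zil_block


# Placeholder primary command line; substitute your own existing arguments.
primary = ["teamredminer", "<existing primary args>"]

full = add_zil(
    primary,
    pool_url="stratum+tcp://eu.ezil.me:5555",
    eth_wallet="<eth wallet>",
    zil_wallet="<zil wallet>",
    worker="<worker>",
)

# shlex.join quotes the placeholders so the printed line is shell-safe.
print(shlex.join(full))
```

Extra arguments for the ZIL side would go between the --zil and
--zil_end entries of the returned list.
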