Benchmarking and Tuning
This guide explains how to tune rieMiner's parameters to get the best performance, and how to benchmark and compare results. We assume that you have already read one of the other rieMiner guides that explain how to mine or find a record.
Benchmarking
Comparing Riecoin mining performance is relatively difficult. Here is what you should know before comparing performance or tuning the settings. Some benchmark results are given further below; they give an idea of how a given computer should perform and illustrate the following remarks.
Metrics
In rieMiner, the performance is based on two metrics:
- The candidates/s c: how many candidates (numbers that could be the first member of a prime constellation) are generated and tested every second. Higher is better. The higher the Difficulty is, the lower the candidates/s will be for the same computing power.
- The ratio r: the ratio of candidates found to prime numbers. Lower is better, because it means that you will find more blocks for the same c during mining. The ratio is proportional to the Difficulty, so it grows as the Difficulty increases. It is independent of the computing power.
If you are looking for k-tuples, you can calculate the k-tuple find rate (tuples per second) as c/r^k. Multiplying this by 86400 gives the estimated average number of k-tuples found per day. This is the relevant metric for comparing performance. The inverse of the rate gives the average time to find a k-tuple, which is how rieMiner estimates the time to find a block.
That means, in general, do not just consider the candidates/s metric! If it is lower after changing a setting (in particular, the PrimeTableLimit), it does not always mean that the mining performance was reduced. You must look at the k-tuple rate or average time to find one instead. Similarly, a lower candidates/s with a higher Difficulty does not mean that the mining performance is lower.
There are some specific situations where it is enough to consider the candidates/s. This is the case if you can guarantee that the ratio and the Difficulty are always the same across the different benchmarks.
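The relation between c, r and the k-tuple rate can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical helpers, not part of rieMiner, and the sample values are taken from the Ryzen 3700X row of the benchmark results further below (7-tuples).

```python
SECONDS_PER_DAY = 86400

def tuple_rate(c, r, k):
    """k-tuples found per second, from candidates/s c and ratio r: c / r^k."""
    return c / r**k

def tuples_per_day(c, r, k):
    """Estimated average number of k-tuples found per day."""
    return SECONDS_PER_DAY * tuple_rate(c, r, k)

def seconds_per_tuple(c, r, k):
    """Average time in seconds to find one k-tuple (inverse of the rate)."""
    return 1.0 / tuple_rate(c, r, k)

# Sample values: 19385.4 candidates/s, ratio 18.546, 7-tuples.
c, r, k = 19385.4, 18.546, 7
print(f"{tuples_per_day(c, r, k):.3f} tuples/day")
print(f"{seconds_per_tuple(c, r, k) / 3600:.2f} hours per tuple")
```

With these inputs the first value comes out at about 2.22 tuples per day, in line with the b/d column of the results table.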
Convergence
The performance metrics take some time to converge, so do not draw conclusions about the performance too quickly! Test actual mining or use the Benchmark Mode for 10-20 minutes or more. Testing for a couple of minutes will in general not be enough. Note that if you test during mining and more blocks are found, the candidates/s will be reduced a bit, so take this into account when comparing the metrics.
Benchmark Mode
You should use this mode to compare the performance of different computers or settings. Measuring performance during mining is subject to random block occurrences, which, as said above, affect the performance. The Benchmark Mode does "dummy mining" under reproducible conditions, which makes comparisons easier.
Here is a template of the Benchmark Mode. This is for a benchmark at Difficulty 1024 during 16 minutes. Blocks will appear every 150 s.
Mode = Benchmark
Difficulty = 1024
BenchmarkBlockInterval = 150
BenchmarkTimeLimit = 960
BenchmarkPrimeCountLimit = 0
# PrimorialNumber = 70
You must reproduce the current mining conditions (put the current Difficulty). You should also use the same PrimorialNumber as the one used in mining (the guessed value is slightly different between the modes).
The Search Mode is an alternative for benchmarking, but it is less reproducible and does not provide dummy blocks. Conversely, do not use the Benchmark Mode to find new records!
Tuning
Relevant configuration options
The options that can affect the mining performance are PrimeTableLimit, SieveWorkers, SieveBits and SieveIterations. Threads can also be used to reduce the number of threads if wanted. Here is a template (to append to the templates from the other guides or the Benchmark template above).
Threads = 0
PrimeTableLimit = 0
SieveWorkers = 0
SieveBits = 0
SieveIterations = 0
You can learn what these settings actually mean by reading the mining algorithm explanation. 0 is a special value that makes rieMiner do an initial but rough guess. Start the miner once with the automatic settings and note the guessed values, shown at the beginning. Then, use these values as starting points, tune the parameters as explained below, and progressively fill the configuration file with manual values.
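As an illustration, a partially filled configuration might look like the fragment below. The values are only an example and not a recommendation: they are borrowed from the Ryzen 3700X benchmark further below (16 threads, Prime Table Limit 2^31, 4 Sieve Workers, 25 Sieve Bits), with SieveIterations left at its usual value of 16. Your own guessed values will differ.

```
Threads = 16
# 2^31
PrimeTableLimit = 2147483648
SieveWorkers = 4
SieveBits = 25
SieveIterations = 16
```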
PrimeTableLimit and SieveWorkers
They are the main parameters for rieMiner tuning. Generally,
- A higher PrimeTableLimit is better up to a certain point, though increasing it also increases the memory usage and may cause CPU Underuse. When increasing the PrimeTableLimit, the candidates/s metric will be lower, but so will the ratio. So, do not assume that the mining is slower due to a lower candidates/s: you must use the estimated time to find a block instead, as explained above.
- Fewer SieveWorkers is better, as more will increase the memory usage and reduce the candidates/s a bit. However, there is a required minimum, as not having enough SieveWorkers will cause CPU Underuse.
To tune them, first look at your CPU usage during mining. It should be maxed out most of the time. If not, then you are experiencing CPU Underuse. For example, the CPU usage graph of the Windows 10 Task Manager may look like this:

[Image: CPU usage graph in the Windows 10 Task Manager]

If there is no CPU Underuse, try both of the following, in no particular order (you can use your intuition after a few tries):
- If you have free memory available, increase the PrimeTableLimit until you get some CPU Underuse, run out of memory, or lose performance.
- Try to decrement the SieveWorkers until there is CPU Underuse.
If there is CPU Underuse, do the inverse operations.
Repeat the process until you feel that the settings are optimal. In all cases, it is trial and error and there is no precise amount by which to increase or decrease. Multiply or divide the PrimeTableLimit by something like 1.1, 1.5, 2, 3 or something else. However, you should vary the SieveWorkers only in steps of 1 or 2.
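One step of this trial-and-error process can be sketched as a small helper that lists the next settings to try. This is only an illustration: `next_settings`, its arguments and the default factor of 1.5 are hypothetical and not part of rieMiner, and judging whether there is CPU Underuse or a performance gain is still up to you (ideally with the Benchmark Mode).

```python
def next_settings(prime_table_limit, sieve_workers, cpu_underuse, factor=1.5):
    """Return a list of (PrimeTableLimit, SieveWorkers) pairs worth trying next.

    Hypothetical helper: it only encodes the rules of thumb from the text.
    """
    if cpu_underuse:
        # Inverse operations: lower the PrimeTableLimit or add a Sieve Worker.
        return [
            (int(prime_table_limit / factor), sieve_workers),
            (prime_table_limit, sieve_workers + 1),
        ]
    # No CPU Underuse: raise the PrimeTableLimit or remove a Sieve Worker.
    candidates = [(int(prime_table_limit * factor), sieve_workers)]
    if sieve_workers > 1:
        candidates.append((prime_table_limit, sieve_workers - 1))
    return candidates

# Example: starting from PrimeTableLimit 2^31 and 4 Sieve Workers, CPU fully used.
for limit, workers in next_settings(2**31, 4, cpu_underuse=False):
    print(f"PrimeTableLimit = {limit}, SieveWorkers = {workers}")
```

Each suggested pair should then be benchmarked, keeping the one with the best estimated time to find a block rather than the best candidates/s.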
Other parameters
- SieveBits: higher is better up to a certain point, but normally, 25 is already a good value. If you have a CPU with less than 8 MiB of L3 Cache, or have a lot of SieveWorkers (more than 4), you can try to decrement this. If you have a lot of L3 Cache (for example with a server CPU), you may also try 26.
- SieveIterations: normally, 16 is a good value and you should not have to touch this. It is unclear how this affects performance. You can try to change the value a bit and see if there is any improvement. Smaller values will reduce memory usage.
If you change these values, you should try to retune the PrimeTableLimit and SieveWorkers to see if you can still gain more performance.
Remarks for record attempters
The instructions above are also valid for those using the Search Mode or mining for records. Here are a few additional remarks for these cases:
- The longer the constellation pattern is, the lower the PrimeTableLimit should be. While it could be well over billions for 5-tuples and shorter, it should not exceed a few millions or tens of millions for 10-tuples (at Difficulty ~540) and 9-tuples (~725), for example.
- Longer tuples will also usually require a lot of SieveWorkers, so do not be surprised if you need to raise this number a lot. However, you cannot by default use more than 64 Sieve Workers. If you need more, you will have to manually add some PrimorialOffsets in the options, though in that case you should rather look for shorter tuples.
Benchmark Results
This section shows some rieMiner benchmark results in order to help comparing different processors, provide an idea on how to tune the parameters, or highlight some observations about current Riecoin mining.
Except when mentioned, an AMD Ryzen R7 3700X was used for the benchmarks, using all the 16 threads, and default settings were used; the constellation pattern is 0, 2, 4, 2, 4, 6, 2 (7-tuples). The benchmarks were done during a Debian 10 Live USB session. rieMiner was recompiled for the machine during the live session just before the benchmarks.
Ratios and blocks per day
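Before showing the actual results, it is worth recalling that, as mentioned above, the ratio is an essential metric of Riecoin mining; the candidates/s metric alone usually does not mean much. Due to how the mining algorithm is constructed, it is actually possible to compute the ratio using the formula

$$r = D \cdot \log(2) \cdot \prod_{p \le L} \left(1 - \frac{1}{p}\right)$$

where D is the Difficulty, L the PrimeTableLimit, and the product runs over the primes p up to L.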
It is not obvious in normal circumstances that the ratios between k and (k + 1)-tuples counts or rates are the same for any k, though the tendency may be observed after long mining sessions or if generating very large numbers of tuples in a benchmark. Here are some values of the product for various PrimeTableLimits.
L | Product |
---|---|
2^35 = 34359738368 | 0.0231432770 |
2^34 = 17179869184 | 0.0238239564 |
2^33 = 8589934592 | 0.0245458897 |
2^32 = 4294967296 | 0.0253129494 |
2^31 = 2147483648 | 0.0261294878 |
2^30 = 1073741824 | 0.0270004472 |
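The tabulated values are numerically consistent with the product of (1 - 1/p) over all primes p up to L, which Mertens' third theorem approximates by e^(-γ)/ln(L). The sketch below relies on that reading, which is an assumption based on the numbers rather than something stated explicitly here; the function names are hypothetical. The exact product is only practical for small limits, so the approximation is used for limits like 2^31.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def mertens_product(limit):
    """Approximate prod_{p <= limit} (1 - 1/p) using Mertens' third theorem."""
    return math.exp(-EULER_GAMMA) / math.log(limit)

def exact_product(limit):
    """Exact product using a simple sieve of Eratosthenes (small limits only)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    product = 1.0
    for n in range(2, limit + 1):
        if sieve[n]:
            product *= 1.0 - 1.0 / n
            # Mark the multiples of the prime n as composite.
            sieve[n * n :: n] = bytes(len(range(n * n, limit + 1, n)))
    return product

def calculated_ratio(difficulty, product):
    """Calculated ratio r* = D * log(2) * product."""
    return difficulty * math.log(2) * product

print(exact_product(10**6), mertens_product(10**6))  # sanity check at a small limit
print(mertens_product(2**31))                        # compare with the table entry for 2^31
print(calculated_ratio(1024, 0.0261294878))          # r* at Difficulty 1024
```

For L = 2^31 the approximation agrees with the table entry to within about 1e-7, and the last line reproduces the r* ≈ 18.546 used in the benchmarks below.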
Calculated ratios will be used in the benchmarks below. The blocks/day for k-tuples is then given by

$$\text{blocks/day} = \frac{86400 \cdot c}{r^k}$$
Results for different processors
Here are benchmarks with different CPUs.
- rieMiner 0.93 except if mentioned otherwise
- Difficulty 1024
- 150 s Block Interval, during 16 minutes
- Prime Table Limit 2^31. By default, 1 Sieve Worker and 25 Sieve Bits
- Using the calculated ratio r* ≈ 1024*log(2)*0.0261294878 ≈ 18.546259
The turbo/boost features were disabled and the CPU always ran at the mentioned frequency.
a is a normalized metric, and corresponds to the candidates/s without HT/SMT divided by the number of cores and the GHz, yielding a result that can be interpreted as the architecture performance (speed of a single core at 1 GHz for this benchmark). This number is useful to make Riecoin profitability calculators as various processors with the same architecture should have a similar a. The list is sorted by this metric.
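As a minimal sketch of how a is obtained and how it could feed a profitability estimate (the function names are hypothetical, and the scaling assumes that candidates/s grows linearly with cores and frequency, which is the premise of the metric):

```python
def architecture_metric(c_no_smt, cores, ghz):
    """Normalized metric a: candidates/s without HT/SMT per core and per GHz."""
    return c_no_smt / (cores * ghz)

def estimate_candidates(a, cores, ghz, smt_speedup=1.0):
    """Rough candidates/s estimate for another CPU of the same architecture."""
    return a * cores * ghz * smt_speedup

# Ryzen 3700X: 14897.8 c/s with 8 threads (no SMT), 8 cores at 4 GHz.
a_zen2 = architecture_metric(14897.8, cores=8, ghz=4.0)
print(f"a = {a_zen2:.1f}")

# Hypothetical 12-core CPU of the same architecture at 3.8 GHz, reusing the
# SMT speedup of about 1.301x measured on the 3700X.
print(f"{estimate_candidates(a_zen2, cores=12, ghz=3.8, smt_speedup=1.301):.0f} c/s (estimate)")
```

Such an estimate only holds under the same benchmark conditions (Difficulty, Prime Table Limit, and so on) as the reference measurement.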
Rows without an "Extrapolated" remark are benchmarks done with actual hardware. The others were extrapolated. Do not compare these values with the ones that you currently obtain while mining! To compare your CPU, you must run the Benchmark Mode in the same conditions as these benchmarks (see above)!
Processor (memory) | Architecture | c/s | r* | b/d | a | Remarks or specific parameters |
---|---|---|---|---|---|---|
AMD Ryzen R9 5950X @ 4 GHz (DDR4 3200 CL14) | Zen 3 | 46137.3 | 18.546 | 5.282 | 554.0 | Extrapolated from 3700X using 19% IPC improvement over Zen 2. 35456.6 c/s extrapolated without SMT (speedup 1.301x). |
Intel Core i9-10900K @ 4 GHz (DDR4 3200 CL14) | Skylake | 21162.5 | 18.546 | 2.422 | 472.4 | Extrapolated using old rieMiner benchmarks for 6700K. HT speedup assumed to be 1.12x (18895.1 c/s). |
AMD Ryzen R7 3700X @ 4 GHz (DDR4 3200 CL14) | Zen 2 | 19385.4 | 18.546 | 2.219 | 465.6 | rieMiner 0.92, 4 Sieve Workers. 14897.8 c/s for 8 Threads (3 Sieve Workers), meaning that the SMT speedup is about 1.301x. |
Intel Core i7-5775C @ 4 GHz (DDR3 1600 CL8) | Broadwell | 7614.8 | 18.546 | 0.872 | 427.5 | rieMiner 0.92, 2 Sieve Workers. 6839.5 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.113x. |
AMD Ryzen R7 2700X @ 4 GHz (DDR4 3200 CL14) | Zen+ | 16446.4 | 18.546 | 1.882 | 395.0 | Extrapolated from 3700X using old rieMiner benchmarks. 12639.2 c/s extrapolated without SMT (speedup 1.301x). |
AMD Ryzen R7 1800X @ 4 GHz (DDR4 3200 CL14) | Zen | 15663.2 | 18.546 | 1.793 | 376.2 | Extrapolated from 2700X assuming 5% IPC improvement over Zen. 12037.3 c/s extrapolated without SMT (speedup 1.301x). |
Intel Core i7-4790K @ 4 GHz (DDR3 1600 CL8) | Haswell | 6406.5 | 18.546 | 0.733 | 369.1 | rieMiner 0.92, 2 Sieve Workers. 5905.0 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.0849x. |
Intel Core i7-3770K @ 4 GHz (DDR3 1600 CL8) | Ivy Bridge | 5910.4 | 18.546 | 0.677 | 327.9 | rieMiner 0.92, 2 Sieve Workers. 5245.7 c/s for 4 Threads (1 Sieve Worker), meaning that the HyperThreading speedup is about 1.127x. |
Intel Core i7-2700K @ 4 GHz (DDR3 1600 CL8) | Sandy Bridge | 5628.9 | 18.546 | 0.644 | 312.2 | Extrapolated from 3770K assuming 5% IPC improvement over Sandy Bridge. 4995.9 c/s extrapolated without HT (speedup 1.127x). |
AMD Phenom II X6 1100T @ 13 x 0.3 = 3.9 GHz (DDR3 1600 CL8) | K10 | 6933.53 | 18.546 | 0.794 | 296.2 | 2 Sieve Workers, 24 Sieve Bits. |
Intel Core i7-875K @ 4 GHz (DDR3 1600 CL8) | Nehalem | 4690.8 | 18.546 | 0.537 | 261.8 | Extrapolated from 2700K assuming 20% IPC improvement over Nehalem. HT speedup assumed to be 1.12x (4188.2 c/s). |
AMD Athlon 64 X2 6400+ @ 3.2 GHz (DDR2 800 CL5) | K8 | 1498.2 | 18.546 | 0.172 | 234.1 | 23 Sieve Bits. |
Intel Core 2 Quad QX9650 @ 4 GHz (DDR3 1600 CL8) | Core 2 | 3707.1 | 18.546 | 0.424 | 231.7 | rieMiner 0.92 |
AMD FX-8350 @ 13.5 x 0.3 = 4.05 GHz (DDR3 1600 CL8) | Piledriver | 7308.9 | 18.546 | 0.837 | 225.7 | 2 Sieve Workers. |
Broadcom BCM2712 @ 2.4 GHz | Cortex-A76 | 1653.2 | 18.546 | 0.189 | 172.2 | Raspberry Pi 5, Raspberry Pi OS 64 bits, 24 Sieve Bits |
Broadcom BCM2711 @ 1.6 GHz | Cortex-A72 | 918.1 | 18.546 | 0.105 | 143.5 | rieMiner 0.92, Raspberry Pi 4, rieMinerL, Raspberry Pi OS 64 bits, 23 Sieve Bits, 24 Sieve Iterations |
Intel Pentium D 965 @ 4 GHz (DDR3 1067 CL6) | Netburst | 806.6 | 18.546 | 0.0923 | 65.4 | 24 Sieve Bits. 523.3 c/s for 2 Threads, meaning that the HyperThreading speedup is about 1.54x. |
Intel Atom D525 @ 1.8 GHz (DDR3 800 CL6) | Bonnell | 294.1 | 18.546 | 0.0336 | 40.1 | 24 Sieve Bits. 144.4 c/s for 2 Threads, meaning that the HyperThreading speedup is about 2x! |
Results for different memory speeds
We notice that memory speed does not matter much (despite rieMiner using a lot of memory): a much worse frequency and latency (DDR4 2400 CL18 vs 3200 CL14) is only about 3% slower.
- Difficulty 1024
- Prime Table Limit 2^31. 4 Sieve Workers, 150 s Block Interval, during 16 minutes
- Using the calculated ratio r* ≈ 18.546259
Memory Speed | c/s | r* | b/d |
---|---|---|---|
DDR4 3200 CL14 | 19385.4 | 18.546 | 2.219 |
DDR4 3200 CL18 | 19025.1 | 18.546 | 2.178 |
DDR4 2400 CL14 | 19011.2 | 18.546 | 2.176 |
DDR4 2400 CL18 | 18794.4 | 18.546 | 2.152 |
The prime table generation is more sensitive to memory performance (especially the frequency).
Memory Speed | Prime table generation time (s) |
---|---|
DDR4 3200 CL14 | 5.37404 |
DDR4 3200 CL18 | 5.63299 |
DDR4 2400 CL14 | 6.31868 |
DDR4 2400 CL18 | 6.55031 |
Results for Different Difficulties
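The notable observation is that the ratio is proportional to the Difficulty and follows the formula above. The results also give an idea of how the candidates/s metric depends on the Difficulty, though the exact relation is difficult to establish. It can be approximated by assuming that the candidates/s is proportional to about D^-2.2 to D^-2.6 (D^-2.3 is used in the Riecoin protocol). In the table, the inverse c/s factor is the candidates/s at Difficulty 1024 divided by the candidates/s at the given Difficulty, and the number in parentheses is the exponent x such that this factor equals (D/1024)^x.
Difficulty | c/s | r | r* | b/d | Inverse c/s factor (exponent) |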
---|---|---|---|---|---|
8192 | 100.2 | 156.58 | 148.370 | 0.00000000547 | 197.537 (2.542) |
6144 | 205.0 | 111.10 | 111.278 | 0.0000000839 | 96.541 (2.551) |
4096 | 561.4 | 74.04 | 74.185 | 0.00000392 | 35.260 (2.570) |
3072 | 1256.7 | 56.47 | 55.639 | 0.0000658 | 15.751 (2.509) |
2048 | 3703.0 | 37.10 | 37.093 | 0.00331 | 5.346 (2.418) |
1536 | 7909.3 | 27.75 | 27.819 | 0.0530 | 2.503 (2.263) |
1024 | 19795.2 | 18.54 | 18.546 | 2.266 | 1.000 |
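The parenthesized exponents can be recomputed from the c/s column, assuming they are defined as the x for which the inverse c/s factor equals (D/1024)^x, which matches the tabulated values. A minimal sketch with hypothetical names:

```python
import math

REFERENCE_DIFFICULTY = 1024
REFERENCE_CPS = 19795.2  # candidates/s measured at Difficulty 1024

# Measured candidates/s from the table above, keyed by Difficulty.
MEASURED_CPS = {8192: 100.2, 6144: 205.0, 4096: 561.4,
                3072: 1256.7, 2048: 3703.0, 1536: 7909.3}

def inverse_factor(cps):
    """How many times slower the candidates/s is than at the reference Difficulty."""
    return REFERENCE_CPS / cps

def exponent(difficulty, cps):
    """Exponent x such that inverse_factor = (D / 1024)^x."""
    return math.log(inverse_factor(cps)) / math.log(difficulty / REFERENCE_DIFFICULTY)

for difficulty, cps in MEASURED_CPS.items():
    print(f"D = {difficulty}: factor {inverse_factor(cps):.3f}, exponent {exponent(difficulty, cps):.3f}")
```

The exponent grows with the Difficulty in these measurements, which is why a single power law only approximates the relation.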
Results for Different Prime Table Limits
These benchmarks highlight the importance of the PrimeTableLimit parameter and that it is important to not just look at the candidates/s metric. They were run at Difficulty 2048 as there is no CPU Underuse with only 1 Sieve Worker in every case. The higher the PrimeTableLimit is, the lower is the ratio, but also the candidates per second.
- Difficulty 2048
- 1 Sieve Worker, no blocks, during 15 minutes
- r is the ratio, r* the calculated ratio, the latter is used to calculate the blocks/day
PrimeTableLimit | c/s | r | r* | b/d |
---|---|---|---|---|
2^34 = 17179869184 | 3334.6 | 33.95 | 33.820 | 0.005693 |
2^33 = 8589934592 | 3536.9 | 34.89 | 34.844 | 0.004900 |
2^32 = 4294967296 | 3641.8 | 35.96 | 35.933 | 0.004068 |
2^31 = 2147483648 | 3703.7 | 37.10 | 37.093 | 0.003312 |
2^30 = 1073741824 | 3738.7 | 38.38 | 38.329 | 0.002658 |
2^24 = 16777216 | 3806.8 | 47.81 | 47.911 | 0.000567 |
2^16 = 65536 | 3843.1 | 71.99 | 71.849 | 0.000033 |
Despite the candidates/s being lower at higher Prime Table Limits, the blocks/day are better.
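The b/d column of the table above can be recomputed from c/s and r* with the 86400·c/r^k relation (k = 7 here). The short sketch below, with hypothetical names, shows that the row with the lowest candidates/s is also the one with the most blocks per day.

```python
K = 7  # 7-tuples

# (PrimeTableLimit, candidates/s, calculated ratio r*) taken from the table above.
ROWS = [
    (2**34, 3334.6, 33.820),
    (2**33, 3536.9, 34.844),
    (2**32, 3641.8, 35.933),
    (2**31, 3703.7, 37.093),
    (2**30, 3738.7, 38.329),
    (2**24, 3806.8, 47.911),
    (2**16, 3843.1, 71.849),
]

def blocks_per_day(cps, ratio, k=K):
    """Estimated blocks per day: 86400 * c / r^k."""
    return 86400 * cps / ratio**k

for limit, cps, ratio in ROWS:
    print(f"{limit:>12} | {cps:7.1f} c/s | {blocks_per_day(cps, ratio):.6f} b/d")

best = max(ROWS, key=lambda row: blocks_per_day(row[1], row[2]))
slowest = min(ROWS, key=lambda row: row[1])
print("Best b/d at PrimeTableLimit", best[0], "; lowest c/s at", slowest[0])
```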
Results for Different Constellation Patterns
- Difficulty 2048
- No blocks, during 15 minutes
- Prime Table Limit 2^31. By default, 1 Sieve Worker, 25 Sieve Bits
- Using the calculated ratio r* ≈ 2048*log(2)*0.0261294878 ≈ 37.092517
Length | Pattern | c/s | r* | b/d | Remarks |
---|---|---|---|---|---|
5 | 0, 2, 6, 8, 12 | 3778.6 | 37.093 | 4.649 | |
6 | 0, 4, 6, 10, 12, 16 | 3767.9 | 37.093 | 0.125 | |
7 | 0, 2, 6, 8, 12, 18, 20 | 3703.7 | 37.093 | 0.00331 | |
8 | 0, 2, 6, 8, 12, 18, 20, 26 | 3534.3 | 37.093 | 0.0000852 | |
9 | 0, 2, 6, 8, 12, 18, 20, 26, 30 | 3002.0 | 37.093 | 0.00000195 | 3 Sieve Workers |