Sunday, July 17, 2016

Diving into the OpenCL deep end

Programs for mining on GPUs are usually written in OpenCL.  It's based on C, which I know well, so a few weeks ago I decided to try to improve some mining OpenCL code.  My intention was to both learn OpenCL and better understand mining algorithms.

I started with simple changes to the OpenCL code for Genoil's ethminer.  I then spent a lot of time reading GCN architecture and instruction set documents to understand how AMD GPUs run OpenCL code.  Since I recently started mining Sia, I took a look at the gominer kernel code, and thought I might be able to optimize the performance.  I tested with the AMD fglrx drivers under Ubuntu 14.04 (OpenGL version string: 4.5.13399) with an R9 290 card.

The first thing I tried was replacing the rotate code in the ror64 function to use amd_bitalign.  The bitalign instruction (v_alignbit_b32) can do a 32-bit rotate in a single cycle, much like the ARM barrel shifter.  I was surprised that the speed did not improve, which suggests the AMD OpenCL drivers are optimized to use the alignbit instruction.  What was worse was that the kernel would calculate incorrect hash values.  After double and triple-checking my code, I found a post indicating a bug with amd_bitalign when using values divisible by 8.  I then tried amd_bytealign, and that didn't work either.  I was able to confirm the bug when I found that a bitalign of 21 followed by 3 worked (albeit slower), while a single bitalign of 24 did not.

It would seem there is no reason to use amd_bitalign any more.  Relying on the driver to optimize the code makes it portable to other platforms.  I couldn't find any documentation from AMD saying the bitalign and other media ops are deprecated, but I did verify that the pragmas make no difference in the kernel:
#pragma OPENCL EXTENSION cl_amd_media_ops : enable
#pragma OPENCL EXTENSION cl_amd_media_ops : disable

After finding a post stating the rotate() function is optimized to use alignbit, I tried changing the "ror64(x, y)" calls to "rotate(x, 64-y)".  The code functioned properly but was actually slower.  By using AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps, I was able to view the assembler .isa files, and could tell that the calls to rotate with 64-bit values were using v_lshlrev_b32, v_lshrrev_b64, and v_or_b32 instead of a pair of v_alignbit_b32 instructions.  Besides using 1 additional instruction, the 64-bit shift instructions apparently take 2 or even 4 times longer to execute on some platforms.
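
To make the "pair of v_alignbit_b32 instructions" idea concrete, here is a rough Python model of how a 64-bit rotate right can be built from two 32-bit align operations.  It is only a sketch: bitalign() follows the documented amd_bitalign semantics (low 32 bits of the 64-bit value src0:src1 shifted right by src2 & 31), while the function names ror64_reference and ror64_alignbit are made up for this illustration and are not taken from the gominer kernel.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def bitalign(src0, src1, shift):
    """Model of amd_bitalign: low 32 bits of (src0:src1) >> (shift & 31)."""
    return (((src0 << 32) | src1) >> (shift & 31)) & MASK32

def ror64_reference(x, n):
    """Plain 64-bit rotate right, used to check the two-instruction version."""
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64

def ror64_alignbit(x, n):
    """64-bit rotate right using two 32-bit align operations (hypothetical helper)."""
    n &= 63
    hi, lo = x >> 32, x & MASK32
    if n >= 32:
        # Rotating by 32 just swaps the halves; the remainder is below 32.
        hi, lo = lo, hi
        n -= 32
    new_lo = bitalign(hi, lo, n)  # bits shifted out of hi land in lo
    new_hi = bitalign(lo, hi, n)  # bits shifted out of lo wrap around into hi
    return (new_hi << 32) | new_lo
```

In this model a rotate by 21 followed by a rotate by 3 gives the same result as a single rotate by 24, which is why the split rotate could be used to confirm the driver bug.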

In the end, I wasn't able to improve the kernel speed.  I think re-writing the kernel in GCN assembler is probably the best way to get the maximum hashing performance.

Monday, July 11, 2016

Mining Sia coin on Ubuntu

Sia is a hot crypto-currency for miners.  Just a week ago, the sia network hashrate was 6.5 Th/s, and the only way to mine was solo as there were no public pools.  In the last three days, two public pools started up and the network hashrate grew to 14.7 Th/s, with the two pools making up 80% of the total network hashrate.

Mining on Windows is relatively easy, with nanopool posting a binary build of siamining's gominer fork.  For Ubuntu, you need to build it from the source.  For that, you'll need to install go first.  If you type 'go' in Ubuntu 14.04, you'll get the following message:
The program 'go' is currently not installed. You can install it by typing:
apt-get install gccgo-go

I tried the similar package 'gccgo', which turned out to be a rabbit hole.  The version 1.4.2 referred to in the gominer readme is a version of the package 'golang'.  Neither gccgo-go nor gccgo has the latest libraries needed by gominer.  And the most recent version of golang in the standard Ubuntu repositories is 1.3.3.  However, the Ethereum foundation publishes a 1.5.1 build of golang in their ppa.

Even with the golang 1.5.1, building gominer wasn't as simple as "go get".  The reason is that the gominer modifications to support pooled mining are in the "poolmod3" branch, and there is no option to install directly from a branch.  So I made my own fork of the poolmod3 branch, and added detailed install instructions for Ubuntu:
sudo add-apt-repository -y ppa:ethereum/ethereum
sudo apt-get update
sudo apt-get install -y git ocl-icd-libopencl1 opencl-headers golang
go get

Once I got it running on a single GPU, I wanted to find out if it was worthwhile to switch my eth mining rigs to sia.  I couldn't find a good sia mining calculator, so I pieced together some information about mining rewards and used the Sia Pulse calculator.  I wanted to compare a single R9 290 clocked at 1050/1125, which gets about 29Mh/s mining eth, earning $2.17/day.  For Sia, the R9 290 gets about 1100Mh/s.  Putting that into the Sia Pulse calculator along with the current difficulty of 4740Th gives daily earnings of 6015 SC/day.  Multiplying by the current price of 62c/1000SC will give you a total of $3.73/d, but that will be wrong.  The Sia Pulse calculator defaults to a block reward of 300,000, but the reward goes down by 1 for each block.  So at block 59,900, the block reward is 240,100, and the actual earnings would be $2.99/d.
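
The arithmetic above can be sketched in a few lines of Python.  This is a hedged reconstruction, not the Sia Pulse calculator's code: it assumes the difficulty figure is the expected number of hashes per block, which happens to reproduce the 6015 SC/day figure, and the function names are invented for the example.

```python
SECONDS_PER_DAY = 86400

def sia_block_reward(height, initial_reward=300000):
    """Block reward in SC, dropping by 1 SC per block as described above."""
    return initial_reward - height

def sc_per_day(hashrate, difficulty, block_reward):
    """Expected SC/day, assuming difficulty = expected hashes per block."""
    blocks_per_day = hashrate * SECONDS_PER_DAY / difficulty
    return blocks_per_day * block_reward

hashrate = 1100e6       # R9 290, hashes per second
difficulty = 4740e12    # network difficulty quoted above
usd_per_sc = 0.62 / 1000

naive_sc = sc_per_day(hashrate, difficulty, 300000)
actual_sc = sc_per_day(hashrate, difficulty, sia_block_reward(59900))

print("calculator default: %.0f SC/day, $%.2f/day" % (naive_sc, naive_sc * usd_per_sc))
print("at block 59,900:    %.0f SC/day, $%.2f/day" % (actual_sc, actual_sc * usd_per_sc))
```

The second figure comes out a fraction of a cent away from the $2.99 above, depending on where the rounding is done.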

Since the earnings are almost 40% better than eth, I decided to switch my mining rigs from eth to sia.  I had to adjust the overclocking settings, as sia is a compute-intensive algorithm instead of a memory-intensive algorithm like ethereum.  After reducing the core clock of a couple cards from 1050 to 1025, the rigs were stable.  When trying out nanopool, I was getting a lot of "ERROR fetching work;" and "Error submitting solution - Share rejected" messages.  I think their servers may have been getting overloaded, as it worked fine when I switched to the other pool.  I also find it has more detailed stats, in particular % of rejected shares (all below 0.5% for me).

I may end up switching back to eth in the near future, since a doubling in network hashrate for sia will eventually mean a doubling of the difficulty, cutting the amount of sia mined in half.  In the process I'll at least have learned a bit about golang, and I can easily switch between eth and sia when one is more profitable than the other.

Friday, June 3, 2016

When does 18 = 26? When buying cheap cables.

I recently bought some cheap molex to PCI-e power adapters from a seller on AliExpress.  Although there are deals for quality goods on AliExpress, I was a bit suspicious when I ordered these given just how cheap they were.  PCI-e power connectors are supposed to be rated for 75W of power carried over 2 conductors at 12V, which means 3.1A per conductor.  In order to avoid a large voltage drop the wires used are usually 18AWG, although 20AWG wires (with 1.6x the resistance) would be reasonably safe.

When the package arrived, I inspected the adapter cables, which were labeled 18AWG.  Despite the label, they didn't feel like 18AWG wires, which have a conductor diameter of 1mm.  I decided to do a destructive test on one of the adapters by cutting and stripping one of the wires.  The conductor measured only 0.4mm in diameter, which is actually 26AWG.  The first photo above shows a real 18AWG wire taken from an old ATX PSU next to the fake 18AWG wire from the adapter cables.

When I opened a dispute through AliExpress, things got more amusing.  I provided the photo, as well as an explanation that real 18AWG wire should be 1mm in diameter.  The seller claimed "we never heard of this before", and after exchanging a couple more messages said, "you can't say it is fake just because it is thin".  At that point I realized I was dealing with one of those "you can't fix stupid" situations.

So what would happen if I actually tried to use the adapter cables on a video card that pulls 75W on the PCI-e power connector?  Well, you can find posts on overclocking sites about cables that melted and burst into flames.  If you have a cheap PSU without short-circuit protection, when the insulation melts and the wires short, your power supply could be destroyed.  And if that happened, I'm sure the AliExpress seller is not going to replace your power supply.  How much hotter the cables would get compared to genuine 18AWG cables is a function of the resistance.  Each gauge has 1.26 times more resistance than the previous, so 20AWG has 1.26^2 = 1.59 times the resistance of 18AWG.  The 26AWG wire used in these cheap adapter cables would have 1.26^8 or just over 6 times the resistance of 18AWG wire, and would have a temperature increase roughly 6 times greater than 18AWG for a given level of current.
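
The gauge numbers can be checked against the standard AWG diameter formula (diameter = 0.127mm * 92^((36 - gauge)/39)).  The short Python sketch below uses that formula; the function names are invented for the example, and resistance is taken as inversely proportional to the conductor's cross-sectional area.

```python
def awg_diameter_mm(gauge):
    """Conductor diameter in mm from the standard AWG formula."""
    return 0.127 * 92 ** ((36 - gauge) / 39)

def resistance_ratio(thin_gauge, thick_gauge):
    """How many times more resistance the thinner wire has per unit length."""
    return (awg_diameter_mm(thick_gauge) / awg_diameter_mm(thin_gauge)) ** 2

print("18AWG diameter: %.2f mm" % awg_diameter_mm(18))
print("26AWG diameter: %.2f mm" % awg_diameter_mm(26))
print("20AWG vs 18AWG resistance: %.2fx" % resistance_ratio(20, 18))
print("26AWG vs 18AWG resistance: %.2fx" % resistance_ratio(26, 18))
```

The diameters come out at about 1.0mm and 0.4mm, matching the measurements, and the per-gauge step works out to the 1.26 factor used above.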

It could make for a fun future project; create a resistive load of 75W, take an old ATX PSU, hook up the adapter cables, and see what happens.  People do seem to like pictures and videos of things bursting into flames posted on the internet...

Thursday, May 26, 2016

Installing Python 3.5.1 on Linux

Perl has been my go-to interpreted language for over 20 years now, but in the last few years I've been learning (and liking) python.  Python 2.7 is a standard part of Linux distributions, and while many recent distributions include Python 3.4, Python 3.5.1 is not so common.  I'm working on some code that will use the async and await primitives, which are new in Python 3.5.  I've searched Extra Packages for Enterprise Linux and other repositories for Python 3.5 binaries, but the latest I can find is 3.4.  That means I have to build it from source.

While the installation process isn't very complicated, it does require installing gcc and associated build tools first.  Since I'm installing it on a couple servers (devel and prod), I wrote a short (10-line) install script for rpm-based Linux distributions.  Download the script, then run "sh".  The python3.5 binary will be installed in /usr/local/bin/.

When installing pip packages for python3, use "pip3", while "pip" will install python2 packages.  And speaking of pip, you may want to update it to the latest version:
sudo /usr/local/bin/pip3 install --upgrade pip

Friday, April 22, 2016

More about mining

In my last post, I gave a basic introduction to ethereum mining.  Since there is not much information available about eth mining compared to bitcoin mining, and some of the information I have found is even wrong, I decided to go into more detail on eth mining.

Comparing the bitcoin protocol to ethereum, one of the significant differences is the concept of uncle blocks.  When two miners find a block at almost the same time, only one of them can be the next block in the chain, and the other will be an uncle.  They are equivalent to stale blocks in bitcoin, but unlike bitcoin where the stale blocks go unrewarded, uncle blocks are rewarded based on how "fresh" they are, with the highest reward being 4.375 eth.  An example of this can be found in block 1,378,035. Each additional generation that passes (i.e. each increment of the block count) before an uncle block gets included reduces the reward by .625 eth.  An example of an uncle that was 2 generations late getting included in the blockchain can be found in block 1,378,048.  The miner including the uncle in their block gets a bonus of .15625 eth on top of the normal 5 eth block reward.

Based on the current trend, I expect the uncle rate to be in the 6-7% range over the next few months.  With the average uncle reward being around 3.5 eth (most uncles are more than one generation old), uncles provide a bonus income to miners of about 4%.  Since uncles do not factor into ethereum's difficulty formula, when more uncles are mined the difficulty does not increase.  The mining calculators I've looked at don't factor in uncle rewards, so real-world returns from mining in an optimal setup should be slightly higher than the estimates of the mining calculators.

Another thing the calculators do not factor in is the .15625 eth uncle inclusion reward, but this is rather insignificant, and most pools do not share the uncle inclusion reward.  Assuming a 6% uncle rate, the uncle inclusion reward increases mining returns by less than 0.2%.  If your pool is down or otherwise unavailable for 3 minutes of the day, that would be a 0.21% loss in mining rewards.  So a stable pool with good network connections is more important than a pool that shares the uncle inclusion reward.  Transaction fees are another source of mining revenue, but most pools do not share them, and they amount to even less than the uncle inclusion reward in any case.
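
The uncle figures in this post can be reproduced with a few lines of Python.  This is a sketch built only from the numbers above (5 eth block reward, .625 eth lost per generation, 6% uncle rate, 3.5 eth average uncle reward); the function and variable names are made up for the example.

```python
BLOCK_REWARD = 5.0  # eth

def uncle_reward(generations):
    """Reward for an uncle included `generations` blocks late (1 = freshest)."""
    return (8 - generations) / 8 * BLOCK_REWARD

INCLUSION_REWARD = BLOCK_REWARD / 32  # bonus for the miner including an uncle

uncle_rate = 0.06        # uncles per block, assumed from the estimate above
avg_uncle_reward = 3.5   # eth, average quoted above

uncle_bonus = uncle_rate * avg_uncle_reward / BLOCK_REWARD
inclusion_bonus = uncle_rate * INCLUSION_REWARD / BLOCK_REWARD
downtime_loss = 3 / (24 * 60)  # 3 minutes out of a day

print("fresh uncle: %.3f eth, 2 generations: %.3f eth" % (uncle_reward(1), uncle_reward(2)))
print("uncle bonus: %.2f%%" % (uncle_bonus * 100))
print("inclusion bonus: %.3f%%" % (inclusion_bonus * 100))
print("3 minutes of downtime: %.3f%%" % (downtime_loss * 100))
```

Three minutes of downtime costs slightly more than the whole uncle inclusion reward is worth, which is the point of the comparison.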

Finding a good pool for ethereum mining has been much more difficult than for bitcoin, where it is pretty hard to beat Antpool.  For optimal mining returns, you need to use stratum mode, and there are two main variations of the stratum protocol for eth mining: dwarf and coinotron.  Coinotron's stratum protocol is directly supported by Genoil's ethminer, which avoids the need to run eth-proxy in addition to the miner.  Some pools support coinotron's stratum protocol, while nanopool, f2pool, and miningpoolhub support dwarf's protocol.  Miningpoolhub is able to support both on the same port since the json connection string is different.  Coinotron only has servers in Europe, and half the time I've tried to go to coinotron's web site it doesn't even load after 15 seconds.  Miningpoolhub has servers in the US, Europe, and Asia, and has had reasonable uptimes.  As well, the admin responds adequately to issues, and speaks functional English.  They have a status page that shows enough information to be able to confirm that your mining connection to the pool is working properly.  I have a concern over how the pool reports rejected shares, but the impact on mining returns does not appear to be material.  Rejected shares happen on other pools too, and since I am still investigating what is happening with rejected shares, there is not much useful information I can provide about them.

So for now that is my recommended pool.  My recommended mining program is v1.0.7 of Genoil's ethminer, which added support for stratum connection failover where it can connect to a secondary pool server if the first goes down.  The Ethereum Foundation is supporting the development of open-source mining pool software, so we may see an ideal eth mining pool in the near future, and maybe even improvements to the official ethminer supporting stratum protocol.

Saturday, April 16, 2016

Digging into ethereum mining

After bitcoin, ethereum (eth) has the highest market capitalization of any cryptocurrency.  Unlike bitcoin, there are no plug-and-play mining options for ethereum.  As was done in the early days of bitcoin, ethereum mining is done with GPUs (primarily AMD) that are typically used for video gaming.

The first ethereum mining I did was with a AMD R9 280x card using the ethereum foundation's ethminer program under Windows 7e/64.  The installer advised that I should use a previous version of AMD's Catalyst drivers, specifically 15.7.1.  Although the AMD catalyst utilities show some information about the installed graphics card, I like GPU-z as it provides more details.  After setting up the software and drivers, I started mining using dwarfpool since it was the largest ethereum mining pool.

As an "open" pool, dwarf does not require setting up an account in advance.  One potential problem with that is the eth wallet address used for mining does not get validated.  I found this out because I had accidentally used a bitcoin wallet address, and dwarfpool accepted it.  After fixing it, I emailed the admin and had the account balance transferred to my eth wallet.

Dwarf recommends the use of their eth-proxy program, which proxies between the get-work protocol used by ethminer, and the more efficient stratum protocol which is also supported by dwarfpool.  Even using eth-proxy, I wasn't earning as much ethereum as I expected.

The ethereum network is running the homestead release as of 2016/03/14, which replaced the beta release called frontier.  The biggest change in homestead was the reduction in the average block time from 17 seconds to 14.5 seconds, moving half way to the ultimate target of a 12-second block time.  I wasn't sure if the difference in the results I was getting from mining was due to the calculators not having been updated from frontier or some other reason.  After reading a comment in the ethereum mining forum, I realized returns can be calculated with a bit of basic math.

The block reward in ethereum is 5 eth, and with an average block generation time of 14.5 seconds, there is 86400/14.5 * 5 = 29,793 eth mined per day.  Ethereum blockchain statistics sites report the network hash rate, which is currently around 2,000 gigahashes per second.  An R9 280x card does about 20 megahashes per second, or 1/100,000th of the network hashrate, and therefore should earn about 29,793/100,000 or 0.298 eth per day.  The manual calculations are in line with my favorite eth mining calculator (although it can be a bit slow loading at times).  Due to the probabilistic nature of mining, returns will vary by 5-10% up or down each day, but in less than a week you can tell if your mining is working optimally.
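
The same calculation as a small Python sketch, using only the figures quoted above.  The function name is made up for the example, and uncle rewards and transaction fees are ignored.

```python
SECONDS_PER_DAY = 86400

def eth_per_day(my_hashrate, network_hashrate, block_time=14.5, block_reward=5.0):
    """Expected eth/day for a miner with a given share of the network hashrate."""
    network_eth_per_day = SECONDS_PER_DAY / block_time * block_reward
    return network_eth_per_day * my_hashrate / network_hashrate

network_eth = SECONDS_PER_DAY / 14.5 * 5.0
card_eth = eth_per_day(20e6, 2000e9)  # R9 280x at 20 Mh/s, network at 2,000 Gh/s

print("network: %.0f eth/day" % network_eth)
print("one R9 280x: %.3f eth/day" % card_eth)
```

Changing the block_time argument shows how much a calculator still using the 17-second frontier block time would underestimate returns.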

Using the regular ethminer, or even using eth-proxy, I was unable to get pool returns in line with the calculations.  However, using Genoil's ethminer, which natively supports the stratum protocol, I have been able to get the expected earnings.  Dwarf uses an unsupported variation of the stratum protocol, so I could not use Genoil's ethminer with it.  I briefly tried nanopool, but had periods where the pool stopped sending work for several minutes, even though the connection to the pool was still live.

Both the official ethminer and Genoil's version were built using MS Visual C++, so if your system doesn't already have it installed, you'll need MS Visual Studio redistributable files.  Getting the right version of the AMD Windows catalyst drivers for ethminer to work and work well can be problematic.  Version 15.12 works at almost the same speed as 15.7.1, however the crimson version 16 drivers perform about 20% slower.

For me, as a Linux user for over 20 years, the easiest setup for eth mining was with Linux/Ubuntu.  I plan to do another post about mining on Ubuntu.

Sunday, March 27, 2016

Hacking GPU PCIe power connections

Until recently, I never thought much about PCIe power connectors.  Three 12V power and three ground wires was all I thought there was to them.  I thought it was odd that the 8-pin connectors just added two more ground pins and not another power pin, but never bothered to look into it.  That all changed when I got a new GPU card with a single 8-pin connector.

My old card had two 6-pin connectors, which I had plugged a 1-2 18AWG splitter cable into.  That was connected to a 16AWG PCIe power cable, which is good for about 200W at a drop of under 0.1V.  My new card with the single 8-pin connector wouldn't power up with just a 6-pin plug installed.  Using my multi-meter to test for continuity between the pins, I realized that it's not just a row of 12V pins and a row of ground pins.  There was continuity between the three 12V pins, and between three of what I thought were five ground pins.  After searching for the PCIe power connector pinout, I found out why.
Diagram edited from

Apparently some 6-pin PCIe cables only have 2 12V wires, 2 ground, and a grounded sense wire (blue in the diagram above).  With just two 12V wires, a crap 18" 20AWG PCIe power cable would have a drop of over 0.1V at 75W.  Since the 8-pin connector has three 12V pins, it can provide 50% more power than a 6-pin cable with only two.  My 6-pin 16AWG PCIe cable would have a voltage drop of only 40mV at 75W, so I just needed to figure out a way to trick the GPU card into thinking I had an 8-pin connector plugged in.  The way to do that is to ground the 2nd sense pin (green in diagram above).
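
For a rough feel for these voltage drops, the wire resistance can be estimated from the AWG diameter formula and the resistivity of copper.  The Python sketch below is an estimate under stated assumptions, not a measurement: it counts both the 12V and ground wires, assumes the same number of each, uses a resistivity of 1.72e-8 ohm-metres, and ignores the contact resistance of the connector pins.  The function names are invented for the example.

```python
import math

COPPER_RESISTIVITY = 1.72e-8  # ohm-metres at about 20C (assumed value)

def awg_ohms_per_metre(gauge):
    """Resistance per metre of a copper wire of the given AWG size."""
    diameter_m = 0.127e-3 * 92 ** ((36 - gauge) / 39)
    area_m2 = math.pi * (diameter_m / 2) ** 2
    return COPPER_RESISTIVITY / area_m2

def cable_drop_volts(power_w, gauge, length_m, wire_pairs, volts=12.0):
    """Round-trip drop, with the current split evenly over `wire_pairs` wires."""
    current_per_wire = power_w / volts / wire_pairs
    return 2 * current_per_wire * awg_ohms_per_metre(gauge) * length_m

length_m = 18 * 0.0254  # 18 inches
print("20AWG, 2 pairs, 75W: %.0f mV" % (1000 * cable_drop_volts(75, 20, length_m, 2)))
print("16AWG, 3 pairs, 75W: %.0f mV" % (1000 * cable_drop_volts(75, 16, length_m, 3)))
```

The wire-only estimate for the two-wire 20AWG cable lands just under 0.1V, so the resistance of the connector contacts would push the real drop past it.  The 16AWG figure is for the same 18" length, which is an assumption; the length of the 16AWG cable isn't given above.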

I didn't want the modification to be permanent, so soldering a wire to the sense pin was out.  The PCIe power connectors use the same kind of pins as ATX power connectors, and I had an old ATX power connector I had cut from a dead PSU.  To get one of the female contacts out of the ATX connector, I used a hack saw to cut apart the ATX connector.  Not pretty, but I'm no maker, I'm a hacker. :-)  I stripped the end of the wire (red in the first photo), wrapping the bare part of the wire around the screw that holds the card bracket in the case.  I powered up the computer, and the video card worked perfectly.

Looking for a cleaner solution, I decided to make a jumper wire to go between the sense pin and the adjacent ground.  I also did some searching on better ways to remove the female contacts from the connectors.  For this, there is a good technique using staples.  When the staples aren't enough to get the contacts out, I found a finish nail counter-sink punch helps.

Here's the end result, using a marrette (wire nut) to make the jumper: