Wednesday, May 10, 2017

GDDR5 memory timing details

In my Advanced Tonga BIOS editing post, I discussed some basic memory timing information, but did not get into the details.  GDDR5 memory is much more complex than the asynchronous DRAM of 20 years ago.  There are many sources of information on SDRAM, while GDDR information is harder to come by.  Although a thorough description of GDDR5 can be found in the spec published by JEDEC, neither NVIDIA nor AMD shares information on how their memory controllers are programmed with memory timing information.  By analyzing the AMD video driver source, and with help from people contributing to a discussion on bitcointalk, I have come to understand most of the workings of AMD BIOS timing straps.

When a modern (R9 series and Rx series) AMD GPU card boots up, memory timing information (straps) is copied from the BIOS to registers in the memory controller.  Some timing information such as refresh frequency is not dependent on the memory speed and therefore is not contained in the memory strap table, but much of the important timing information is.  The memory controller registers are 32 bits wide, so the 48-byte memory straps map to 12 different memory controller registers.  The shift masks in the Linux driver source are non-functional, and can only be taken as hints as to the meaning of the individual bits.  Due to an apparently bureaucratic process for releasing open-source code, AMD engineers are generally reluctant to update such code.

Jumping right to the code, here's a C structure definition for the Rx memory straps:
uint32_t SEQ_MISC1;
uint32_t SEQ_MISC3;
uint32_t SEQ_MISC8;
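
As a rough sketch of how a 48-byte strap breaks down into 12 register-sized words, the helper below parses a strap written as 96 hex characters. The hex-string input form, the little-endian byte order, and the names strap_to_words and hex_digit are my assumptions for illustration, not something taken from the driver source, and the sketch does not say which word belongs to which register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRAP_BYTES 48
#define STRAP_WORDS (STRAP_BYTES / 4)   /* 12 registers of 32 bits each */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse a 96-character hex strap into 12 32-bit words.
 * ASSUMPTION: bytes are stored least-significant first (little-endian).
 * Returns 0 on success, -1 on a bad length or a non-hex character.
 */
static int strap_to_words(const char *hex, uint32_t words[STRAP_WORDS])
{
    if (strlen(hex) != 2 * STRAP_BYTES)
        return -1;
    memset(words, 0, STRAP_WORDS * sizeof words[0]);
    for (size_t i = 0; i < STRAP_BYTES; i++) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        uint32_t byte = (uint32_t)((hi << 4) | lo);
        words[i / 4] |= byte << (8 * (i % 4));   /* byte 0 lands in bits 7..0 */
    }
    return 0;
}
```

Under the little-endian assumption, a strap whose first four bytes are 14 05 14 20 yields 0x20140514 as its first word.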

The RAS timing register consists of 6 fields: RCDW, RCDWA, RCDR, RCDRA, RRD, and RC.  The full field definitions can be found in my fork of Kristy-Leigh's code.  Many of the "pad" fields are likely the high bits of the preceding field that are not currently used.  I have already tested a couple of pad fields (MISC RP_RDA & RP), confirming that the pad bits were actually the high bits of the fields.
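
To make the pad-bit idea concrete, here is a minimal sketch of reading a field with and without the pad bits that sit directly above it. The helper names and the example layout (a 5-bit field at bit 0 followed by 3 pad bits) are hypothetical and chosen only for illustration; the real shifts and widths are in the field definitions mentioned above.

```c
#include <stdint.h>

/* Extract `width` bits starting at bit `shift` (bit 0 = least significant).
 * Callers must keep shift below 32. */
static uint32_t get_field(uint32_t reg, unsigned shift, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> shift) & mask;
}

/* Read a field together with the pad bits above it as one wider value.
 * If the pad bits really are unused high bits, this equals the plain
 * field value whenever the pad bits are zero. */
static uint32_t get_field_with_pad(uint32_t reg, unsigned shift,
                                   unsigned width, unsigned pad_width)
{
    return get_field(reg, shift, width + pad_width);
}
```

With the hypothetical layout, a register value of 0x25 reads as 5 when only the 5-bit field is decoded and as 37 when the pad bits are included, which is the kind of difference a test on real hardware can expose.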

For GDDR5, some timing values have both Long and Short versions, which apply to accesses within a bank group and to different bank groups, respectively.  The RRD field of RAS timing is likely RRDL, because the values typically seen for this field are 5 and 6.  If RRDS were 5, at most one page could be opened every five cycles.  Since a 32-byte burst occupies the data bus for two cycles, that would limit 32-byte random read performance to 2/5 or 40% of the maximum interface speed.  From my work with Ethereum mining, I know that RRDS can be no more than 4.  In addition, performance tests with RRD timing reduced to 5 from 6 are consistent with it being RRDL.  The actual value of RRDS used by the memory controller does not seem to be contained in the timing strap.  The default 1750MHz strap for Samsung K4G4 memory has a value of 10 for FAW, and 4 * RRDS can be no more than FAW.  Therefore RRDS is most likely less than 4, and possibly as low as 2.
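
The arithmetic behind these bounds can be sketched in a few lines. This assumes that a 32-byte burst occupies the data bus for two cycles (which is what the 2/5 figure implies) and reads the FAW constraint as four activates spaced RRDS apart having to fit within the FAW window. The function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: one 32-byte burst occupies 2 command-clock cycles. */
#define BURST_CYCLES 2u

/* Upper bound on random-read bus utilization, in percent, when a new
 * page can be opened only once every `rrds` cycles. */
static unsigned max_random_read_percent(unsigned rrds)
{
    if (rrds <= BURST_CYCLES)
        return 100u;            /* activates are no longer the bottleneck */
    return 100u * BURST_CYCLES / rrds;
}

/* Largest whole RRDS for which four activates fit in a FAW window,
 * i.e. the largest value satisfying 4 * RRDS <= FAW. */
static unsigned max_rrds_for_faw(unsigned faw)
{
    return faw / 4u;
}
```

For example, max_random_read_percent(5) gives 40 and max_rrds_for_faw(10) gives 2, matching the 40% figure and the "as low as 2" estimate in the text.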

To simplify the process of modifying memory straps for improved performance, I wrote strapmod.  I also wrote a CGI wrapper for the program, which you can run from my server.  For example, this is the output with the 1750MHz strap for Samsung K4G4 memory:
Rx strap detected
Old, new RRD: 6 , 5
Old, new FAW: A , 0
Old, new 32AW: 7 , 0
Old, new ACTRD: 19 , 0x10


  1. Good work, Ralph! How much of a boost did you see using the customized timings vs copying the 1500Mhz timings?

    1. The benefit depends on the type and the base strap used (I wasn't using the 1500Mhz strap though). The biggest benefit is with Rx cards running high memory clocks. For R9 (i.e. Tonga) running Hynix memory at 1625 with the 1375 strap, it doesn't need much tuning since it already has tight values for RRD and FAW. Elpida 1375 isn't as tight, so my strapmod utility can help.

  2. Wow! It works great on Samsung memory of MSI Armor RX 470 series. 0.7 to 0.9 Mhash increase with your timings.

  3. Truly amazing: I went from 28.4Mh/s to 31.7Mh/s using it. Great job.

  4. Is there any disadvantage to customizing the straps with your tool? I have not found any, but I always OC first and then undervolt with stock straps. I have tried to understand how this affects my testing, with no luck (I am still reading). I was thinking of just customizing my straps with your tool and then testing undervolting and OC. It's incredible how much time I spend running experiments just to understand how it works. I already have very decent results, but I feel the need to keep testing. Thanks in advance.

    1. I suppose using the custom straps could cause stability problems, but I did a lot of testing on Tonga and Polaris to find the tweaks that improve performance without impacting stability. A couple times I came close to bricking a card. To play it safe, always test custom straps above the boot-up strap. So if your BIOS memory clock is 1750, just change the strap after 1750 like 2000, and the strap will only get used when you overclock the memory beyond 1750.

  5. By the way I am fan of your blog, It's very refreshing.

  6. nice, good job!
    btw the site is offline :(

    1. My virtualhost provider changed service terms on me and suspended my service. I'm working on getting it back online.

  7. Great job!
    Does it work only with Samsung memory?
    Not with Elpida, Hynix, etc.?

    1. I've tested strapmod with Hynix and Samsung memory on Rx cards. With R9 cards I've tested it with Hynix and Elpida memory.

    2. So, this technique works for each type of memory... hmm, will try right now. Have 3 cards with Elpida.

  8. Thanks so much for the detailed analysis and work around it!

    Does it work for 4gb cards as well? I noticed the reference values are different for the 4 and 8 gb versions.

  9. Do you have the Hynix Memory straps? I can't seem to get your files anymore..

  10. Hi Ralph,

    Your straps do increase the hashrate reported by the mining program, but the effective hashrate is much lower. I suspect these timings are too tight and as a result generate a high number of stale shares, which would explain why I'm seeing fewer shares reported by the pool. Am I right?

  11. This comment has been removed by the author.

  12. Hi, I'm really interested in this, but I have no idea what your discussion is about. I have an MSI RX580 8G Gaming X (Hynix); this GPU has 2 sets of timings (1: Samsung) & (2: Hynix). I edited the BIOS by copying from 1:1750 and pasting to 2:2000 & 2:2250. If you don't mind, can you check whether the details below are the best customization, or correct them for me so I get the highest hashrate the GPU is capable of? Thanks so much.

    Samsung :-

    TRCDW=14 TRCDWA=14 TRCDR=24 TRCDRA=24 TRRD=5 TRC=69 Pad0=0 TNOPW=0 TNOPR=0 TR2W=28 TCCDL=3 TR2R=5 TW2R=16 Pad0=0 TCL=22 Pad1=0 TRP_WRA=51 TRP_RDA=25 TRP=20 TRFC=157 Pad0=0 PA2RDATA=0 Pad0=0 PA2WDATA=0 Pad1=0 TFAW=0 TCRCRL=2 TCRCWL=7 TFAW32=0
    MC_SEQ_MISC1: 0x20140514
    MC_SEQ_MISC3: 0xA000897A
    MC_SEQ_MISC8: 0x00000003

    Hynix :-

    TNOPW=0 TNOPR=0 TR2W=25 TCCDL=2 TR2R=5 TW2R=17 Pad0=0 TCL=18 Pad1=0 TRP_WRA=48 TRP_RDA=22 TRP=19 TRFC=148 Pad0=0 PA2RDATA=0 Pad0=0 PA2WDATA=0 Pad1=0 TFAW=10 TCRCRL=2 TCRCWL=6 TFAW32=7
    MC_SEQ_MISC1: 0x20140174
    MC_SEQ_MISC3: 0xA000896A
    MC_SEQ_MISC8: 0x20310002

    1. Forgot to tell that i use SRBPolaris v3 on Windows 10. HeHe.