Wednesday, May 10, 2017

GDDR5 memory timing details



In my Advanced Tonga BIOS editing post, I discussed some basic memory timing information, but did not get into the details.  GDDR5 memory is much more complex than the asynchronous DRAM of 20 years ago.  There are many sources of information on SDRAM, while GDDR information is harder to come by.  Although a thorough description of GDDR5 can be found in the spec published by JEDEC, neither nVIDIA nor AMD share information on how their memory controllers are programmed with memory timing information.  By analyzing the AMD video driver source, and with help from people contributing to a discussion on bitcointalk, I have come to understand most of the workings of AMD BIOS timing straps.

When a modern (R9 series and Rx series) AMD GPU card boots up, memory timing information (straps) are copied from the BIOS to registers in the memory controller.  Some timing information such as refresh frequency is not dependent on the memory speed and therefore is not contained in the memory strap table, but much of the important timing information is.  The memory controller registers are 32-bits wide, and so the 48-byte memory straps map to 12 different memory controller registers.  The shift masks in the Linux driver source are therefore non-functional, and can only be taken as hints as to the meaning of the individual bits.  Due to an apparently bureaucratic process for releasing open-source code, AMD engineers are generally reluctant to update such code.

Jumping right to the code, here's a C structure definition for the Rx memory straps:
SEQ_WR_CTL_D1_FORMAT SEQ_WR_CTL_D1;
SEQ_WR_CTL_2_FORMAT SEQ_WR_CTL_2;
SEQ_PMG_TIMING_FORMAT SEQ_PMG_TIMING;
SEQ_RAS_TIMING_FORMAT SEQ_RAS_TIMING;
SEQ_CAS_TIMING_FORMAT SEQ_CAS_TIMING;
SEQ_MISC_TIMING_FORMAT SEQ_MISC_TIMING;
SEQ_MISC_TIMING2_FORMAT SEQ_MISC_TIMING2;
uint32_t SEQ_MISC1;
uint32_t SEQ_MISC3;
uint32_t SEQ_MISC8;
ARB_DRAM_TIMING_FORMAT ARB_DRAM_TIMING;
ARB_DRAM_TIMING2_FORMAT ARB_DRAM_TIMING2;

Looking at the RAS timing, it consists of 6 fields: RCDW, RCDWA, RCDR, RCDRA, RRD, and RC.  The full field definitions can be found in my fork of Kristy-Leigh's code.  Many of the "pad" fields are likely the high bits of the preceding field that are not currently used.  I tested a couple pad fields already (MISC RP_RDA & RP), confirming that the pad bits were actually the high bits of the fields.


For GDDR5, some timing values have both Long and Short versions that apply for access within a bank group or to different bank groups.  The RRD field of RAS timing is likely RRDL, because the values typically seen for this field are 5 and 6.  If RRDS was 5, this would mean at most one page could be opened every five cycles, limiting 32-byte random read performance to 2/5 or 40% of the maximum interface speed.  From my work with Ethereum mining, I know that RRDS can be no more than 4.  In addition, performance tests with RRD timing reduced to 5 from 6 are congruent with it being RRDL.  The actual value of RRDS used by the memory controller does not seem to be contained in the timing strap.  The default 1750Mhz strap for Samsung K4G4 memory has a value of 10 for FAW, which can be no more than 4 * RRDS.  Therefore RRDS is most likely less than 4, and possibly as low as 2.

To simplify the process of modifying memory straps for improved performance, I wrote strapmod.  I also wrote a cgi wrapper for the program, which you can run from my server http://45.62.227.192/cgi-bin/strapmod.  For example, this is the output with the 1750Mhz strap for Samsung K4G4 memory:
Rx strap detected
Old, new RRD: 6 , 5
Old, new FAW: A , 0
Old, new 32AW: 7 , 0
Old, new ACTRD: 19 , 0x10
777000000000000022CC1C0010626C49D0571016B50BD509004AE700140514207A8900A003000000191131399D2C3617
777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A003000000101131399D2C3617

4 comments:

  1. Good work, Ralph! How much of a boost did you see using the customized timings vs copying the 1500Mhz timings?

    ReplyDelete
    Replies
    1. The benefit depends on the type and the base strap used (I wasn't using the 1500Mhz strap though). The biggest benefit is with Rx cards running high memory clocks. For R9 (i.e. Tonga) running Hynix memory at 1625 with the 1375 strap, it doesn't need much tuning since it already has tight values for RRD and FAW. Elpida 1375 isn't as tight, so my strapmod utility can help.

      Delete
  2. Wow! It works great on Samsung memory of MSI Armor RX 470 series. 0.7 to 0.9 Mhash increase with your timings.

    ReplyDelete
  3. Truly amazing: I went from 28.4Mh/s to 31.7Mh/s using it. Great job.

    ReplyDelete