PDP-11/45: Diagnostics XIII - FP11 FPU, cont.

Thu 24 November 2016 by Fritz Mueller

Have been looking into the FP11 MOD problem in spare moments of the past few weeks, but haven't written up an account of the progress, so this will be a bit of a catch-up article.

Having now studied the design of this thing in more depth, there are a few things I find interesting:

  • The inner loops of the multiplication, division, and floating-point normalization algorithms on the FP11 are not implemented in microcode, but rather as "hardware subroutines". Microcode does all the setup of the various internal registers and counters, then pauses while the hardware runs the inner loop, then picks up again to mediate rounding, masking, exceptions, etc. afterward.

  • The multiplication implementation uses an interesting algorithm called "skipping over ones and zeros", described in section 5.3.1 of the FP11 maintenance manual. This reduces the number of time-consuming additions needed on average. It works along the lines of a familiar mental shortcut: suppose you had to multiply some number X by 999. Rather than multiply X by 9 three times and shift and add them all up, you would probably just take X * 1,000 and subtract off X * 1. The key observation is that you can do this for any contiguous string of 9s in the multiplier: subtract the multiplicand from the partial product at the place value where the string begins, then add the multiplicand at one past place value where the string ends. The FP11 implements the binary equivalent of this with a small state machine (comprised of flip-flops MR1, MR0, and STRG1) which identifies strings of contiguous 1s and invokes ALU subtractions and additions on the boundaries as the multiplier is shifted through.

  • Debugging techniques: a KM11 in single-clock-transition mode may be used to step within the hardware subroutines, as they are driven off the main FP11 clock. It can be a lot of switch presses to step through an entire multiply (120 or so clock transitions at least for a double-precision multiply, and typically more because each necessary intermediate add/subtract adds eight clock transitions!) and this gets to be pretty tedious and error-prone. A logic analyzer is very useful here to capture a visualization of an entire multiplication at one go, and enable counting off clock transitions needed to get to something you'd like to take a closer look at with a logic probe. Alternatively, if your FP11 is working well enough to run maintenance instructions, there are software techniques that can prematurely terminate the hardware subroutines and also give some useful visibility into the intermediate states.

I opted to try out the software techniques to see if I could get more information on the (mis)behavior in my FP11 order to focus my hardware troubleshooting. The following program came in handy. This is based off some example code in the FP11 maintenance manual, though I elaborated it slightly with a binary printout routine:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
        000000                          AC0=%0
        000001                          AC1=%1
        000002                          AC2=%2
        177560                          SERIAL=177560
        170006                          MRS=170006
000000                                  .ASECT
        001000                          .=1000
001000  170127  040220          START:  LDFPS   #40220          ;DISABLE INTS, SET DBL AND MAINT MODE
001004  172667  000316                  LDD     MLYR,AC2        ;LOAD MULTIPLIER IN AC2
001010  012703  000230                  MOV     #230,R3         ;R3 GETS OCTAL 230 (FRAC MUL MICROSTATE)
001014  170003                          LDUB                    ;LOAD R3 TO MBR
001016  012702  177564                  MOV     #SERIAL+4,R2    ;SERIAL XMIT BASE TO R2
001022  012762  000015  000002          MOV     #15,2(R2)       ;OUTPUT '\R'
001030  105712                          TSTB    (R2)            ;CHECK XMIT CLEAR
001032  100376                          BPL     .-2             ;LOOP UNTIL SO
001034  012762  000012  000002          MOV     #12,2(R2)       ;OUTPUT '\N'
001042  105712                          TSTB    (R2)            ;CHECK XMIT CLEAR
001044  100376                          BPL     .-2             ;LOOP UNTIL SO
001046  005004                          CLR     R4              ;R4 HOLDS SC VALUE
001050  005204                  NXTMUL: INC     R4              ;INCREMENT SC
001052  170004                          LDSC                    ;LOAD 1S COMPLEMENT OF R4 INTO SC
001054  012705  001356          LSTMUL: MOV     #QR+10,R5       ;SET R5 PAST END OF STORAGE TABLE
001060  172567  000232                  LDD     MCND,AC1        ;LOAD MULTIPLICAND INTO AC1
001064  171102                          MULD    AC2,AC1         ;DO PARTIAL MULTIPLY
001066  170007                          STQ0                    ;TRANSFER QR TO AC0
001070  174045                          STD     AC0,-(R5)       ;STORE QR IN TABLE
001072  042715  177600                  BIC     #177600,(R5)    ;CLEAR OFF SIGN AND EXPONENT
001076  170005                          STA0                    ;TRANSFER AR TO AC0
001100  174045                          STD     AC0,-(R5)       ;STORE AR IN TABLE
001102  042715  177600                  BIC     #177600,(R5)    ;CLEAR OFF SIGN AND EXPONENT
001106  170006                          MRS                     ;SHIFT AR AND QR RIGHT ONE PLACE
001110  170006                          MRS                     ;SHIFT AR AND QR RIGHT ONE PLACE
001112  170007                          STQ0                    ;TRANSFER QR TO AC0
001114  174067  000236                  STD     AC0,TEMP        ;STORE QR IN TEMP
001120  016703  000232                  MOV     TEMP,R3         ;FETCH MSW OF QR TO R3
001124  042703  177600                  BIC     #177600,R3      ;CLEAR OFF SIGN AND EXPONENT
001130  006303                          ASL     R3              ;SHIFT MSBS OF QR ONE PLACE LEFT
001132  006303                          ASL     R3              ;SHIFT MSBS OF QR ONE PLACE LEFT
001134  050365  000010                  BIS     R3,10(R5)       ;SET QR59 AND QR58 IN TABLE
001140  170005                          STA0                    ;TRANSFER AR TO AC0
001142  174067  000210                  STD     AC0,TEMP        ;STORE AR IN TEMP
001146  016703  000204                  MOV     TEMP,R3         ;FETCH MSW OF AR TO R3
001152  042703  177600                  BIC     #177600,R3      ;CLEAR OFF SIGN AND EXPONENT
001156  006303                          ASL     R3              ;SHIFT MSBS OF AR ONE PLACE LEFT
001160  006303                          ASL     R3              ;SHIFT MSBS OF AR ONE PLACE LEFT
001162  050315                          BIS     R3,(R5)         ;SET AR59 AND AR58 IN TABLE
001164  012705  001336                  MOV     #AR,R5          ;GET ADDRESS OF FIRST QUAD FOR PRINTING
001170  012700  000010                  MOV     #10,R0          ;R0 COUNTS 8 WORDS IN TWO QUADS
001174  012503                  LWORD:  MOV     (R5)+,R3        ;FETCH NEXT WORD OF QUAD
001176  012701  000020                  MOV     #20,R1          ;R1 COUNTS 16 BITS IN WORD
001202  006103                  LBIT:   ROL     R3              ;ROTATE, HIGH BIT GOES TO CARRY
001204  103405                          BCS     LBIT1           ;SKIP AHEAD IF CARRY SET
001206  012762  000056  000002          MOV     #56,2(R2)       ;OTHERWISE OUTPUT '.'
001214  000167  000006                  JMP     LBIT2           ;AND SKIP AHEAD
001220  012762  000061  000002  LBIT1:  MOV     #61,2(R2)       ;OUTPUT '1'
001226  105712                  LBIT2:  TSTB    (R2)            ;CHECK XMIT CLEAR
001230  100376                          BPL     .-2             ;LOOP UNTIL SO
001232  077115                          SOB     R1,LBIT         ;LOOP OVER BITS IN WORD
001234  012762  000040  000002          MOV     #40,2(R2)       ;OUTPUT ' ' TO SEPARATE WORDS
001242  105712                          TSTB    (R2)            ;CHECK XMIT CLEAR
001244  100376                          BPL     .-2             ;LOOP UNTIL SO
001246  077026                          SOB     R0,LWORD        ;LOOP OVER WORDS IN QUAD
001250  012762  000015  000002          MOV     #15,2(R2)       ;OUTPUT '\R'
001256  105712                          TSTB    (R2)            ;CHECK XMIT CLEAR
001260  100376                          BPL     .-2             ;LOOP UNTIL SO
001262  012762  000012  000002          MOV     #12,2(R2)       ;OUTPUT '\N'
001270  105712                          TSTB    (R2)            ;CHECK XMIT CLEAR
001272  100376                          BPL     .-2             ;LOOP UNTIL SO
001274  020427  000071                  CMP     R4,#71          ;CHECK PASSES AGAINST 57
001300  100663                          BMI     NXTMUL          ;LESS: DO NEXT PASS
001302  001402                          BEQ     LSTPAS          ;EQUAL: DO LAST PASS
001304  000167  171470                  JMP     173000          ;GREATER: RETURN TO M9301 MONITOR
001310  005204                  LSTPAS: INC     R4              ;INDICATE 58TH PASS
001312  000167  177536                  JMP     LSTMUL          ;DO LAST PASS WITHOUT LOADING SC
001316  040200  000000  000000  MCND:   .WORD   040200, 000000, 000000, 000000
001324  000000
001326  040300  000300  000300  MLYR:   .WORD   040300, 000300, 000300, 000300
001334  000300
001336  000000  000000  000000  AR:     .FLT4   0
001344  000000
001346  000000  000000  000000  QR:     .FLT4   0
001354  000000
001356  000000  000000  000000  TEMP:   .FLT4   0
001364  000000
        001000                          .END    START

The idea here is to use the LDUB (load micro-break) and LDSC (load step-counter) maintenance instructions to cause a multiplication to halt partway through. STA0 and STQ0 (store AR, store QR) instructions, in conjunction with the MRS (maintenance right shift) instruction, allow retrieval of the internal fraction registers which are then printed out to the serial console. This is done repetitively, stopping each time one step further on, so the progression of the internal states of AR and QR over the course of the entire multiply may be observed.

A quick aside here on tooling: since I don't currently have any storage or an OS running on my PDP-11, I load and execute diagnostics with PDP11GUI to an M9301 boot monitor over a serial connection. This requires program binaries in LDA (absolute loader) format. For non-trivial MACRO-11 programs I have found it most convenient to use the actual vintage toolchain under RT-11 in the simh simulator, because the assembler and linker provided with PDP11GUI have some limitations. I copy files in and out via the simulated paper tape reader/punch. This is also how I produce the MACRO-11 listings seen on this blog.

Okay, back to the program above, running this on my machine very clearly illustrates the malfunction. Here's what the output looks like:

................ ................ ................ ................ .........11..... .........11..... .........11..... .........11..... ................ ................ ................ ................ ..........11.... ..........11.... ..........11.... ..........11.... ................ ................ ................ ................ ...........11... ...........11... ...........11... ...........11... ................ ................ ................ ................ ............11.. ............11.. ............11.. ............11.. ................ ................ ................ ................ .............11. .............11. .............11. .............11. ................ ................ ................ ................ ..............11 ..............11 ..............11 ..............11 ................ ................ ................ ................ ...............1 1..............1 1..............1 1..............1 .......11.111111 1111111111111111 1111111111111111 1111111111111111 ................ 11.............. 11.............. 11.............. .......111.11111 1111111111111111 1111111111111111 1111111111111111 ................ .11............. .11............. .11............. ..........1.1111 1111111111111111 1111111111111111 1111111111111111 ................ ..11............ ..11............ ..11............ ...........1.111 1111111111111111 1111111111111111 1111111111111111 ................ ...11........... ...11........... ...11........... ............1.11 1111111111111111 1111111111111111 1111111111111111 ................ ....11.......... ....11.......... ....11.......... .............1.1 1111111111111111 1111111111111111 1111111111111111 ................ .....11......... .....11......... .....11......... ..............1. 1111111111111111 1111111111111111 1111111111111111 ................ ......11........ ......11........ ......11........ ...............1 .111111111111111 1111111111111111 1111111111111111 ................ .......11....... .......11....... .......11....... ................ 1.11111111111111 1111111111111111 1111111111111111 ................ ........11...... ........11...... ........11...... ................ .1.1111111111111 1111111111111111 1111111111111111 ................ .........11..... .........11..... .........11..... ................ ..1.111111111111 1111111111111111 1111111111111111 ................ ..........11.... ..........11.... ..........11.... ................ ...1.11111111111 1111111111111111 1111111111111111 ................ ...........11... ...........11... ...........11... ................ ....1.1111111111 1111111111111111 1111111111111111 ................ ............11.. ............11.. ............11.. ................ .....1.111111111 1111111111111111 1111111111111111 ................ .............11. .............11. .............11. ................ ......1.11111111 1111111111111111 1111111111111111 ................ ..............11 ..............11 ..............11 ................ .......1.1111111 1111111111111111 1111111111111111 ................ ...............1 1..............1 1..............1 .......111...... ........1.111111 1111111111111111 1111111111111111 ................ ................ 11.............. 11.............. .......1111..... .........1.11111 1111111111111111 1111111111111111 ................ ................ .11............. .11............. ..........11.... ..........1.1111 1111111111111111 1111111111111111 ................ ................ ..11............ ..11............ ...........11... ...........1.111 1111111111111111 1111111111111111 ................ ................ ...11........... ...11........... ............11.. ............1.11 1111111111111111 1111111111111111 ................ ................ ....11.......... ....11.......... .............11. .............1.1 1111111111111111 1111111111111111 ................ ................ .....11......... .....11......... ..............11 ..............1. 1111111111111111 1111111111111111 ................ ................ ......11........ ......11........ ...............1 1..............1 .111111111111111 1111111111111111 ................ ................ .......11....... .......11....... ................ 11.............. 1.11111111111111 1111111111111111 ................ ................ ........11...... ........11...... ................ .11............. .1.1111111111111 1111111111111111 ................ ................ .........11..... .........11..... ................ ..11............ ..1.111111111111 1111111111111111 ................ ................ ..........11.... ..........11.... ................ ...11........... ...1.11111111111 1111111111111111 ................ ................ ...........11... ...........11... ................ ....11.......... ....1.1111111111 1111111111111111 ................ ................ ............11.. ............11.. ................ .....11......... .....1.111111111 1111111111111111 ................ ................ .............11. .............11. ................ ......11........ ......1.11111111 1111111111111111 ................ ................ ..............11 ..............11 ................ .......11....... .......1.1111111 1111111111111111 ................ ................ ...............1 1..............1 .......111...... ........11...... ........1.111111 1111111111111111 ................ ................ ................ 11.............. .......1111..... .........11..... .........1.11111 1111111111111111 ................ ................ ................ .11............. ..........11.... ..........11.... ..........1.1111 1111111111111111 ................ ................ ................ ..11............ ...........11... ...........11... ...........1.111 1111111111111111 ................ ................ ................ ...11........... ............11.. ............11.. ............1.11 1111111111111111 ................ ................ ................ ....11.......... .............11. .............11. .............1.1 1111111111111111 ................ ................ ................ .....11......... ..............11 ..............11 ..............1. 1111111111111111 ................ ................ ................ ......11........ ...............1 1..............1 1..............1 .111111111111111 ................ ................ ................ .......11....... ................ 11.............. 11.............. 1.11111111111111 ................ ................ ................ ........11...... ................ .11............. .11............. .1.1111111111111 ................ ................ ................ .........11..... ................ ..11............ ..11............ ..1.111111111111 ................ ................ ................ ..........11.... ................ ...11........... ...11........... ...1.11111111111 ................ ................ ................ ...........11... ................ ....11.......... ....11.......... ....1.1111111111 ................ ................ ................ ............11.. ................ .....11......... .....11......... .....1.111111111 ................ ................ ................ .............11. ................ ......11........ ......11........ ......1.11111111 ................ ................ ................ ..............11 ................ .......11....... .......11....... .......1.1111111 ................ ................ ................ ...............1 .......111...... ........11...... ........11...... ........1.111111 ................ ................ ................ ................ .......1111..... .........11..... .........11..... .........1.11111 ................ ................ ................ ................ .........11..... .........11..... .........11..... .........1.11111 ................ ................ ................ ................

The left half of the output above shows the contents of AR throughout the progress of the multiply, and the right half shows the contents of QR. The most significant 57 bits of each are shown, right justified in a 64-bit field.

In the FP11, as the multiplication proceeds, the multiplicand is held constant, while the multiplier (in QR) and partial product (in AR) are successively right shifted. The bits of the multiplier involved in the skip-over-ones-and-zeros sate macheine are QR3 and QR2. QR3 is the rightmost bit shown above. QR2, to its right, is not retrievable by software and thus not shown.

Since the multiplicand in the sample code is 1.0, the result left in AR (bottom row of left half) should be identical with the initial value of the multiplier in QR (top row of right half), but clearly something is amiss with the least significant bits of the result. We can also see that things go awry as the first string off consecutive 1s starts through the state machine (adjusting the values in the test program shows that this is always the case). So this looks like an issue with the state machine or the FALU control signals that derive from it. Taking a look with the logic analyzer shows this:

This is a portion of the multiply dealing with the a string of two consecutive 1s on the multiplier. The clocking and state machine state bits look correct (note that AR clocks falling edges). A four-cycle pause is inserted in the AR clock whenever the state-machine dictates either an add or a subtract is to occur, in order to allow for propagation time through the ALUs. The AR and ALU function selects also look correct: AR 1 for shift, 3 for load, and ALU 6 for subtract, 9 for add. Marker X here should be clocking in a subtraction at the start of the string, followed by two shifts, then an add at marker O at the end of the string.

But the ALU CIN control signal looks incorrect -- it is held high throughout the multiply, but should be driven low for the subtraction at marker X. This means the ALU function actually being selected is A-B-1 instead of A-B, which would produce the results seen above (the first subtract borrows an extra 1 all the way across the partial product, then subsequent subtracts borrow from the resulting 1s on the right). So it looks like the logic that generates CIN needs a look:

Stepping through the multiply with the KM11 in single-clock-transition mode, arriving at the first subtract, FRMH MUL SUB L is asserted low to pin 3 of E21, but pin 6 does not go high. Looks like a failed gate; pulled the part, put in a socket, and put a replacement 74H10 on order. All for now!