PDP-11/45: Parity error handling

Mon 25 May 2020 by Fritz Mueller

[A catch-up article, documenting events of Jan/Feb 2019.]

At the end of the previous article, a bunch of repairs had been made to my MS11-L memory board. The associated MAINDEC diagnostic ZQMC was able to run cleanly but only with parity tests disabled. When parity tests were enabled, the parity fault LED on the MS11 would light (expected) and the machine would halt with ADRS ERR lit (unexpected...)

So the first step is to read and research how memory parity handling is implemented on the KB11-A CPU. Immediately here we run into some trouble:

The 1973 edition of the 11/45 Processor Handbook has a section 2.5.6, "Memory Parity", which states: "Parity errors cause the Central Processor to either trap through location 4 or to halt." There is also an Appendix E, "Memory Parity", which details CSRs for parity memory:

It is stated there that there are 16 of these, at addresses 772110-772146, each corresponding to an 8K word block of address space.
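
As a quick sanity check on that range, here is a small Python sketch that enumerates sixteen word-aligned CSR addresses and pairs each with an 8K-word block. The one-to-one, ascending pairing of register to block is my assumption for illustration; the handbook text quoted here only gives the count and the address range.

```python
# Enumerate the 16 parity CSRs described in the 1973 handbook's Appendix E.
CSR_BASE = 0o772110      # first CSR address (18-bit Unibus address)
CSR_COUNT = 16           # count given in the handbook
BLOCK_WORDS = 8 * 1024   # each CSR corresponds to an 8K-word block


def parity_csrs():
    """Return (csr_address, first_byte_addr, last_byte_addr) tuples.

    ASSUMPTION: CSR n covers the n-th 8K-word block, in ascending order.
    """
    table = []
    for n in range(CSR_COUNT):
        csr = CSR_BASE + 2 * n          # registers sit one word (2 bytes) apart
        start = n * BLOCK_WORDS * 2     # byte address where the block starts
        end = start + BLOCK_WORDS * 2 - 1
        table.append((csr, start, end))
    return table


if __name__ == "__main__":
    for csr, start, end in parity_csrs():
        print(f"{csr:06o}: {start:06o}-{end:06o}")
```

The last register comes out at 772146, matching the quoted range, and sixteen 8K-word blocks span the whole 18-bit address space.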

By the 1976 version of the processor handbook, however, all of this information had been expunged. The new Appendix A, "UNIBUS Addresses", lists range 772110-772136 simply as "UNIBUS memory parity". Here, trap 4 is listed as "CPU errors", and trap 114 is listed as "Memory system errors". All subsequent revisions of the handbook state unambiguously that parity errors generate a trap 114.

What do the KB11-A processor maintenance manuals have to offer? Paragraph 7.7.7 of the 1972 KB11-A maintenance manual states:

A Parity error on the Unibus A is indicated by BUSA PA L high and BUSA PB L low. The parity error causes UNI PERF (Unibus parity error flag) to be set when MSYN is cleared. UNI PERF (1) L asserts UBCB PARITY ERR SET L during the pause cycle, which sets the console (CONF) flag and halts the CPU.

The semiconductor memory control EHA and EHB (enable halt) flip-flops may be set under program control to assert SMCB PE HALT if a parity error is detected. This input also asserts UBCB PARITY ERR SET L, which sets the console flag and halts the CPU. Thus, if either a Unibus A parity error or SMCB PE HALT L is asserted, the processor will be vectored to trap when the CONT switch is pressed.

Note that this text addresses how the CPU handles detected parity errors in both Unibus (first paragraph) and fastbus (second paragraph) memory systems. Unibus parity errors are stated to set the CONF flag and halt the CPU, just as I am seeing on my system... Fastbus parity handling (halt first vs. direct trap) can further be mediated by EHA and EHB, called out here to drawing SMCB in the MS11-B/C fastbus semiconductor memory print set.

But here, too, by the time we get to the later revision 1976 KB11-A,D maintenance manual, this information is revised. The updated description makes no further mention of CONF, halting, or halt control, and seems to imply that all reported parity conditions trap directly through 114.

How about contemporaneous memory systems? The MS11-B/C solid state memory systems released with the 11/45 (note: not what I'm running; I have the much later MS11-L) consisted of either MOS or bipolar memory matrices with an associated controller card (the M8110). These supported both Unibus and fastbus interfaces. Here, in the 1972 schematics, we see the implementation of the EHA/EHB halt control bits, mentioned above, in the upper left of sheet SMCB:

We can see the bit assignments here match the CSR layout from the 1973 processor handbook, and the associated MS11 maintenance manual from 1973 also describes them in its table 3-12:

And once again, by the 1974 revision of the same maintenance manual, no surprise: descriptions of the halt control bits have been expunged from table 3-12. Okay, we're starting to get a consistent picture here...

I don't know much about the core memory systems that were configured with the early 11/45s. It would be interesting to know if anything other than the MS11-B/C ever supported this older CSR layout.

Let's have a look at the KB11-A engineering drawings themselves. The set I've been using during my restoration dates from 1974. The first, most obvious, place to look is trap vector generation; this is accomplished on the lower left of drawing DAPE:

This small combinational net feeds trap vector bits to the K1MX constant multiplexer. One non-obvious wrinkle noted elsewhere on the drawing: vectors generated for reserved instruction (004), EMT (014), and TRAP (016) are further left-shifted, downstream, by microcode (state RSD.10, drawing FLOWS 12) to result in 010, 030, and 034 respectively. That's not strictly relevant to the discussion at hand, but might be helpful if pondering the logic implemented in the diagram above.

This drawing is definitely from the "post 114" era. On a parity error, we'll have ~IOT and ~PIRQ and ~SEGT, together driving TV02 high; that's our traditional vector 004. But here we also see UBCB PE TRAP (1) L, active low, entering from the left. When driven low, we'll get TV03 and TV06 high as well, all together generating vector 114.
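
The arithmetic here is easy to check with a few lines of Python. This is only a sketch of the bit combining described above, not a model of the actual gates; it assumes that signal TVnn corresponds to bit nn of the vector address, and the function names are mine.

```python
# Toy model of the trap-vector bits described for drawing DAPE (1974 prints).

def trap_vector(tv_bits):
    """Combine the asserted TV bit numbers into a vector address.

    ASSUMPTION: TVnn drives bit nn of the vector.
    """
    vector = 0
    for bit in tv_bits:
        vector |= 1 << bit
    return vector


# Pre-"114" behaviour: only TV02 is driven high.
OLD_PARITY_VECTOR = trap_vector([2])

# With UBCB PE TRAP (1) L driven low, TV03 and TV06 go high as well.
NEW_PARITY_VECTOR = trap_vector([2, 3, 6])


def microcode_shift(vector):
    """Left shift applied downstream for reserved instruction, EMT and TRAP."""
    return vector << 1


if __name__ == "__main__":
    print(f"old: {OLD_PARITY_VECTOR:03o}  new: {NEW_PARITY_VECTOR:03o}")
    for v in (0o004, 0o014, 0o016):
        print(f"{v:03o} -> {microcode_shift(v):03o}")
```

Bits 2, 3 and 6 sum to 4 + 10 + 100 octal, which is the 114 seen in the later handbooks, and the one-bit shift turns 004, 014 and 016 into 010, 030 and 034.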

Here we can see some clues, too, of how the change to 114 might have been bodged in: as drawn, TV01, TV02, TV03, TV04 and TV05*07 proceed nicely in order from bottom to top. But TV06, needed by the change as the most-significant "1" in "114", looks like it was just wedged in out of order on the drawing... Presumably, it makes use of a previously unused section of hex inverter E11. The change to activate TV03 here as well would have been a cut/jump at the inputs of E7.

And sure enough, here we see differences with my actual hardware! Here's part of the layout of module DAP from the '74 engineering drawings, and a snap of the same corner of my DAP spare, which is the same as the one I'm currently running in the machine:

Note particularly that R17, a pullup for UBCB PE TRAP (1) L, is missing on my board. A little further work with the beeper shows that on my boards E7 pin 1 is connected directly to E7 pin 13, and is not connected to edge connector AP1. E11 pin 3 appears to be NC. Furthermore, examination of the backplane shows that there is no wire wrapped in place at DAP AP1 to deliver signal UBCB PE TRAP (1) from the UBC board. So, I think I can conclude we're not looking at a bug or component failure here; my 11/45 simply pre-dates the change from vector 4 to vector 114.

Okay for the vector, but what about the halt behavior? Here, the text quoted earlier from the 1972 KB11-A maintenance manual has our clue where to look. The parity derived signal eventually resulting in the halt on either Unibus or fastbus parity error is UBCB PARITY ERR SET L (note "SET" in the signal name here, don't confuse with UBCB PARITY ERR L...) The 1974 drawings imply that a fastbus parity error, but not a Unibus parity error, will halt the machine, in conflict with this text. But looking here, we see another bodge clue: the hookup at E68 pins 4 and 5 as drawn looks a little suspicious...

And indeed, on my hardware, E68 pins 4 and 5 are not connected together; rather, E68 pin 5 is connected to E79 (Unibus parity error flag) pin 5. So, Unibus parity errors will also halt this version of the 11/45 hardware, by design.
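
Boiled down to booleans, the difference between the two hardware revisions looks something like the sketch below. This is my reading of the drawings and of the 1972 manual text, not a gate-level model; `True` stands for "asserted" regardless of the active-low signal names, and the function and parameter names are made up for illustration.

```python
def parity_err_set(unibus_perf, smcb_pe_halt, early_hardware):
    """Return True if UBCB PARITY ERR SET would be asserted (CPU halts).

    unibus_perf    -- Unibus parity error flag (UNI PERF) is set
    smcb_pe_halt   -- fastbus memory requests a halt (EHA/EHB enabled it)
    early_hardware -- True for boards like mine, where E68 pin 5 is wired
                      to the Unibus parity error flag (ASSUMPTION: this is
                      the only relevant difference between the revisions)
    """
    if early_hardware:
        # 1972 manual text: either source sets CONF and halts the CPU.
        return unibus_perf or smcb_pe_halt
    # 1974 drawings as I read them: only the fastbus halt request halts.
    return smcb_pe_halt


if __name__ == "__main__":
    # A Unibus parity error from an MS11-L, on each revision:
    print(parity_err_set(True, False, early_hardware=True))
    print(parity_err_set(True, False, early_hardware=False))
```

With a Unibus parity error and no fastbus halt request, the early wiring halts and the later wiring does not, which matches what I see on my machine.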

Some other differences related to parity are also apparent looking at my version of the UBC board. E57, seen above generating UBC PE ABORT L, is not populated. This seems related to some further refinement of abort sequencing, but the circumstances surrounding the need for this aren't clear to me at this point. Also, jumper W1 and associated logic to entirely disable Unibus parity error detection are not present:

So, what does all this mean? Well, for one thing, there apparently isn't anything actually in need of repair here -- as far as I can tell, this version of the hardware is functioning per design, such as it is.

And as it turns out, with a now properly repaired MS11-L, actual parity errors are few and far between (I've yet to see any that weren't intentionally created by diagnostics.) According to Noel, stock Unix V6 doesn't do anything whatsoever with parity. RSTS/E V06C boot code seems to be properly probing and identifying the CSR on my MS11-L. And good old RT11 has seemed happy enough in the past. So I just may not need a totally up-to-date parity implementation on my machine.

There is still the issue of more broadly tracking down and implementing outstanding ECOs for this machine. I have so far had limited success in locating these (more on this next time!) I'm certainly equipped here to implement field cuts and jumps, but it might get tricky to track down newer versions of boards for any ECOs that involved total swaps to updated etches. In any case, in the absence of complete information on the ECOs I'm hesitant to cherry pick changes such as those identified here unless I am really blocked without them; better by far not to leave the machine in an undocumented "in-between" state.

Footnotes: a lot of the discovery documented here took place in the context of the enthusiast community on the cctalk mailing list, and also in private communications. Noel Chiappa and Paul Koning were both particularly generous with their time (thanks, guys!) Here are some interesting related bits that didn't fit directly in the narrative above, for completeness and for future reference:

On RSTS parity CSR sniffing, from Paul:

From: Paul Koning
To: cctalk
Subject: Re: PDP-11/45 RSTS/E boot problem

Fritz Mueller wrote:

There is a lot of inconsistent and incomplete information in the documentation about memory CSRs. They appear to come in different flavors depending on memory hardware; some of the earlier ones support setting a bit to determine whether parity errors will halt or trap the CPU, while some of the later ones (like my MS11-L) simply have "enable" and don't distinguish between halt and trap. I'm curious how OS init code sniffs out what memory CSRs there are, determines their specific flavors and, in a heterogeneous system, determines how much address space is under the auspice of each CSR? Maybe Paul and Noel can comment here wrt. RSTS and Unix respectively?

I quickly skimmed some RSTS INIT code (for V10.1). Two things observed:

1. At boot, INIT determines the memory layout. It does this by writing 0 then -2 into each location to see if it works. If it gets an NXM trap (trap to 4) or a parity trap (trap to 114) it calls that 1kW block of memory non-existent. For the case of a parity error, it tells you that it saw a parity error and is disabling that block for that reason.

2. In the DEFAULT option (curiously enough) there is a routine that looks for up to 16 parity CSRs starting at 172100. This happens on entry to the memory layout option. You can display what it finds by using the PARITY command in response to the "Table suboption" prompt.

It checks if the bits 007750 are active in the parity CSR, if so it takes that to be an address/ECC parity CSR. It figures out the CSR to memory association by going through memory in 1 kW increments, writing 3, 5 to the first 2 words, then setting "write wrong parity" in each CSR (007044), then doing BIC #3,.. BIC #5,... to those two test words, then reading them both back. This should set bad parity, and it scans all the CSRs to see which one reports an error (top bit in the CSR). If no CSR has that set, it concludes the particular block is no-parity memory.

I probably got some of the details wrong, the above is from a fast skim of the code, but hopefully it will get you started.
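
To make the CSR-to-memory association step in Paul's description a bit more concrete, here is a toy Python simulation of the idea: force bad parity into a block, read it back, and see which CSR raises its error flag. Everything here is a simplified stand-in of my own (class name, flags, block granularity); it does not use the real CSR bit layouts, and data values are not modeled at all.

```python
class ToyParityBoard:
    """Toy memory board: a run of 1 kW blocks guarded by one parity CSR."""

    def __init__(self, first_block, n_blocks):
        self.blocks = range(first_block, first_block + n_blocks)
        self.write_wrong_parity = False   # stands in for the CSR control bit
        self.error = False                # stands in for the CSR's top bit
        self.bad = set()                  # (block, word) pairs with bad parity

    def write(self, block, word):
        if block not in self.blocks:
            return                        # not our memory, ignore the cycle
        if self.write_wrong_parity:
            self.bad.add((block, word))
        else:
            self.bad.discard((block, word))

    def read(self, block, word):
        if block in self.blocks and (block, word) in self.bad:
            self.error = True


def associate(boards, total_blocks):
    """Map each 1 kW block to the index of the CSR that flags it, or None."""
    owner = {}
    for block in range(total_blocks):
        for b in boards:                  # arm "write wrong parity" everywhere
            b.error = False
            b.write_wrong_parity = True
        for b in boards:                  # rewrite the two test words
            for word in (0, 1):
                b.write(block, word)
        for b in boards:
            b.write_wrong_parity = False
        for b in boards:                  # read both words back
            for word in (0, 1):
                b.read(block, word)
        hits = [i for i, b in enumerate(boards) if b.error]
        owner[block] = hits[0] if hits else None   # None: no-parity memory
        for b in boards:                  # restore good parity, clear flags
            for word in (0, 1):
                b.write(block, word)
            b.error = False
    return owner


if __name__ == "__main__":
    boards = [ToyParityBoard(0, 4), ToyParityBoard(4, 4)]
    print(associate(boards, 10))
```

In this toy setup, blocks 0-3 map to the first CSR, blocks 4-7 to the second, and blocks 8-9 come back as `None`, which is the "no CSR reported an error" case Paul describes.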

My machine currently has one MS11-L, which has the newer CSR layout referred to by Paul above (different than the much older MS11-B/C CSR layout depicted at the top of this article; see MS11-L docs for further details). RSTS init defaults->memory->parity on my system reports (correctly):
```
 0K: 00000000 - 00757777 ( 124K) : 00
```
Presumably, RSTS carries out this identification activity with the CSR report enable bits off, and the CSR error bits still function correctly in these circumstances; otherwise, per above, my machine would summarily halt during this process!

Noel, in some of his research, found "deeper magic from before the dawn of time" regarding the evolution of the Unibus parity implementation, predating the era covered at the start of this article and bridging back to the KA11 (11/20) CPU. Quite interesting!

From: Noel Chiappa
Subject: Change in UNIBUS parity operation (Was: PDP-11/45 RSTS/E boot problem)
To: cctalk

Even better, it claims to be able to control whether the memory uses odd or even parity! (How, for UNIBUS memory, I don't know - there's no way to do this over the UNIBUS.)

So this really confused me, as the UNIBUS spec says parity is wholly within the slave device, and only an error signal is transferred over the bus. E.g. from the 'pdp11 peripherals handbook', 1975 edition (pg. 5-8): "PA and PB are generated by a slave ... [it] negates PA and asserts PB to indicate a parity error ... both negated indicates no parity error. [other combinations] are conditions reserved for future use."

The answer is that originally the UNIBUS parity operation was different, and that sometime around the introduction of the PDP-11/45, they changed it, which is apparently why Appendix E, about parity in the /45, says what it does!

I found the first clue in the MM11-F Core Memory Manual (DEC-11-HMFA-D - which is not online, in fact no MM11-F stuff is online, I'll have to scan it all and send it to Al); I was looking in that to see if the parity version had a CSR or not (to reply to Paul Koning), and on the subject of parity it said this: "The data bits on the bus are called BUS DPB0 and BUS DPB1." And there is nothing else on how the two parity bits are used - the clear implication is that the memory just stores them, and hands them to someone else (the master) over the bus, for actual use.

Looking further, I found proof in the "unibus interface manual" - and moreover, the details differ between the first (DEC-11-HIAA-D) and second (DEC-11-HIAB-D) editions (both of which differ from the above)!

In the first, Table 2-1 has these entries for PA and PB: "Parity Available - PA ... Indicates paritied data" and "Parity Bit - PB ... Transmits parity bit"; at the bottom of page 2-4 we find "PA indicates that the data being transferred is to use parity, and PB transmits the parity bit. Neither line is used by the KA11 processor."

(Which explains why, when, after reading about parity in the MM11-F manual, I went looking for parity stuff in the KA11 which would use it, I couldn't find it!)

In the second, Table 2-1 has these entries for PA and PB: "Parity Bit Low - PA ... Transmits parity bit, low byte" and "Parity Bit High - PB ... Transmits parity bit, high byte"; at the top of page 2-5 we find wholly different text from the above, including "These lines are used by the MP11 Parity Option in conjunction with parity memories such as the MM11-FP."

I looked online for more about the MP11, but could find nothing. I wonder if any were made?

This later version seems to agree with that Appendix E. I tried to find an early -11/45 system manual, to see if it originally shipped with MM11-F's, but couldn't locate one - does anyone have one? The ones online (e.g. EK-1145-OP-001) are much later.

It's also interesting to speculate about reasons why these changes were made; I can think of several! :-)
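
To pin down the later PA/PB encoding that Noel quotes from the 1975 peripherals handbook, here is a minimal Python sketch of the decode. `True` means "asserted" (the bus lines themselves are active low); the function name and the returned strings are just labels I picked.

```python
def decode_parity_lines(pa_asserted, pb_asserted):
    """Decode the slave-driven PA/PB lines per the 1975 handbook wording."""
    if not pa_asserted and not pb_asserted:
        return "no error"        # both negated
    if not pa_asserted and pb_asserted:
        return "parity error"    # PA negated, PB asserted
    return "reserved"            # remaining combinations: future use


if __name__ == "__main__":
    for pa in (False, True):
        for pb in (False, True):
            print(pa, pb, decode_parity_lines(pa, pb))
```

This lines up with the 1972 KB11-A manual text quoted earlier, where "BUSA PA L high and BUSA PB L low" (PA negated, PB asserted) indicates a parity error.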

All for now!

PDP-11/45: V6 Unix attempts & MS11-L repairs

Mon 21 January 2019 by Fritz Mueller

Following up on Noel's suggestion, I decided to give V6 Unix a try to see how it fared in comparison to the problems seen with RSTS/E V06C. I recently scored an additional RK05 pack from eBay, and decided to try and use that so I could keep my current RSTS/E pack intact.

Inspected the pack, and it looked in good shape, clean, with no apparent crashes on the media. Mounted it up and was able to do only a partial recovery. What I got looks like pretty generic RT-11/BASIC-11 stuff, so I'm not too concerned about attempting a complete recovery. Went ahead and reformatted the pack, after which I could read/write the entire pack with no bad sectors. So now I had two clean packs to work with.

Built a V6 Unix pack image from the Ken Wellsch tape under SIMH (using directions here). Transferred it over using PDP11GUI, and it did boot in single-user mode. However, it immediately dumped core on the first ls command... Tried a multi-user Unix boot (what's to lose?) and this actually fared a bit better; able to ls, but still dumped core when trying to run the C compiler or do anything else memory-intensive.

So, all of this taken together made me (and others collaborating on the troubleshooting on cctalk) think that I might have a memory issue in the machine. My machine has a 256KB MS11-L; I had previously spot-checked this from the front panel by manipulating the KT11-C mapping registers and trying some writes and reads within each bank. This was enough to identify and repair a few major problems (see this older blog post) and to get me this far. But I had never thoroughly and substantially beat this card up after things seemed to be working with RT-11. There was also still a nagging concern that none of the heavier-weight KT11, MS11, KB11 "exerciser" MAINDEC diagnostics had yet been run to completion on the restored machine either...
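
For reference, the relocation arithmetic behind that kind of spot check is simple enough to sketch in Python. This is a minimal model of 18-bit kernel-mode mapping only: it ignores the PDRs (length and access checks) and the separate I/D spaces, and the function name is mine. The PAR values are the ones used by the diagnostic listing later in this post.

```python
# Minimal model of KT11-C relocation: 8 pages of 4K words (20000 octal bytes).

def physical_address(pars, va):
    """Translate a 16-bit virtual address to an 18-bit physical address.

    pars -- list of eight page address register values (one per page)
    """
    page = (va >> 13) & 0o7        # top three VA bits select the PAR
    offset = va & 0o17777          # 13-bit byte offset within the page
    return ((pars[page] << 6) + offset) & 0o777777


if __name__ == "__main__":
    # Initial kernel PAR table from the diagnostic (IPARS).
    pars = [0o0000, 0o0200, 0o0400, 0o0600, 0o1000, 0, 0, 0o7600]
    print(f"{physical_address(pars, 0o177564):06o}")   # console XCSR
    pars[1] = 0o1000                                    # remap page 1
    print(f"{physical_address(pars, 0o020000):06o}")
```

With PAR7 set to 7600 the top page lands on the I/O page, and pointing PAR1 at 1000 makes VA 020000 refer to PA 100000, which is how each bank gets windowed in for testing.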

The recommended DEC diagnostic for the MS11-L is ZQMC, but it is complicated, takes a long time to download, and the available sources don't exactly match the binary. So, probably better to work up my own standalone diagnostic to catch and fix obvious things... Thus followed about a week of part-time work working up and successively refining the following test code, and repairing identified problems (failed DRAMs) on the MS11-L along the way. This code maps and tests every memory location on the MS11-L, using KT11 memory management. It relocates itself so it can test the lowest physical bank as well. Tests include all-ones, all-zeros, write address to location, and a "random" data test which just uses the program code itself as the test sequence:

        KIPDR0=172300
        KIPDR1=172302
        KIPDR2=172304
        KIPDR3=172306
        KIPDR4=172310
        KIPDR5=172312
        KIPDR6=172314
        KIPDR7=172316

        KIPAR0=172340
        KIPAR1=172342
        KIPAR2=172344
        KIPAR3=172346
        KIPAR4=172350
        KIPAR5=172352
        KIPAR6=172354
        KIPAR7=172356

        SR0=177572

        XCSR=177564
        XBUF=177566

        .ASECT
        .=1000
START:
        MOV     #700,SP         ;INIT STACK POINTER

        ;----- INSTALL TRAP CATCHERS

TRPS:   CLR     R0              ;CURRENT VECTOR
        MOV     #2,R1           ;VECTOR TARGET
        CLR     R2              ;HALT INSTR
        MOV     #100,R3         ;NUMBER OF VECTORS TO FILL
1$:     MOV     R1,(R0)+        ;STORE TARGET AND ADVANCE
        MOV     R2,(R0)+        ;STORE HALT AND ADVANCE
        ADD     #4,R1           ;UPDATE TARGET
        SOB     R3,1$           ;LOOP OVER VECTORS

        ;----- INIT AND ENABLE MEMORY MAPPING

INITM:  MOV     #IPDRS,R0       ;SRC PDR INIT TABLE
        MOV     #KIPDR0,R1      ;DST KIPDR0
        MOV     #10,R2          ;DO EIGHT PDRS
        MOV     (R0)+,(R1)+     ;COPY AND ADVANCE
        SOB     R2,.-2          ;LOOP OVER PDRS
        MOV     #IPARS,R0       ;SRC PAR INIT TABLE
        MOV     #KIPAR0,R1      ;DST KIPAR0
        MOV     #10,R2          ;DO EIGHT PARS
        MOV     (R0)+,(R1)+     ;COPY AND ADVANCE
        SOB     R2,.-2          ;LOOP OVER PARS
        MOV     #1,@#SR0        ;ENABLE MEMORY MGMT

        ;----- TEST 32K MS11 BANKS AT PA 100000 THRU 700000,
        ;      RELOCATE, THEN TEST BANK AT PA 000000

DOPASS: MOV     #1000,R0        ;PAR FOR PA 100000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #2000,R0        ;PAR FOR PA 200000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #3000,R0        ;PAR FOR PA 300000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #4000,R0        ;PAR FOR PA 400000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #5000,R0        ;PAR FOR PA 500000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #6000,R0        ;PAR FOR PA 600000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #7000,R0        ;PAR FOR PA 700000
        JSR     PC,DOBANK       ;TEST IT
        MOV     #1000,R5        ;RELOC TARGET PA:100000
        JSR     PC,RELOC        ;GO DO IT
        MOV     #0000,R0        ;PAR FOR PA 000000
        JSR     PC,DOBANK       ;TEST IT

        ;----- ALL DONE WITH PASS

        MOV     #0000,R5        ;RELOC TARGET PA:000000
        JSR     PC,RELOC        ;GO DO IT
        CLR     @#SR0           ;DISABLE MEMORY MGMT
        MOV     #PCOMPL,R5      ;GET PASS COMPLETE MSG
        JSR     PC,PRSTR        ;PRINT IT
        HALT                    ;ALL DONE

        ;----- MAP A SINGLE 32K BANK AT VA 20000

DOBANK: MOV     #KIPAR1,R1      ;WILL MAP USING KIPAR1 THRU KIPAR4
        MOV     #4,R3           ;FOUR KIPARS TO SET
        CMP     R0,#7000        ;UNLESS WE ARE IN PA 700000 BANK...      
        BNE     1$              ;IF NOT, SKIP AHEAD
        MOV     #3,R3           ;OTHERWISE, SCALE BACK TO 3 KIPARS
1$:     MOV     R0,(R1)+        ;SET A KIPAR AND ADVANCE
        ADD     #200,R0         ;INCREMENT VALUE FOR NEXT KIPAR
        SOB     R3,1$           ;LOOP OVER KIPARS

        ;----- CALCULATE END VA

        MOV     #120000,R1      ;MAPPED BANK END IS VA 120000
        CMP     @#KIPAR1,#7000  ;UNLESS WE ARE IN PA 700000 BANK...
        BNE     ZEROS           ;IF NOT, SKIP AHEAD
        MOV     #100000,R1      ;OTHERWISE, END IS VA 100000

        ;----- ZEROS TEST

ZEROS:  CLR     R2              ;EXPECTED VALUE IS 000000
        MOV     #20000,R0       ;START AT VA 20000
1$:     MOV     R2,(R0)+        ;CLEAR A WORD AND ADVANCE
        CMP     R0,R1           ;AT END?
        BNE     1$              ;IF NOT, LOOP
        MOV     #20000,R0       ;START AT VA 20000
2$:     TST     (R0)+           ;CHECK A WORD AND ADVANCE
        BEQ     3$              ;IF ZERO, SKIP AHEAD
        JSR     PC,PRERR        ;OTHERWISE, REPORT ERROR
3$:     CMP     R0,R1           ;AT END?
        BNE     2$              ;IF NOT, LOOP

        ;----- ONES TEST

ONES:   MOV     #177777,R2      ;EXPECTED VALUE IS 177777
        MOV     #20000,R0       ;START AT VA 20000
1$:     MOV     R2,(R0)+        ;WRITE A WORD AND ADVANCE
        CMP     R0,R1           ;AT END?
        BNE     1$              ;IF NOT, LOOP
        MOV     #20000,R0       ;START AT VA 20000
2$:     CMP     (R0)+,R2        ;CHECK A WORD AND ADVANCE
        BEQ     3$              ;IF EXPECTED VALUE, SKIP AHEAD
        JSR     PC,PRERR        ;OTHERWISE, REPORT ERROR
3$:     CMP     R0,R1           ;AT END?
        BNE     2$              ;IF NOT, LOOP

        ;----- WRITE LOCATION WITH ITS VA TEST

ADDRS:  MOV     #20000,R0       ;START AT VA 20000
1$:     MOV     R0,R2           ;USE VA AS TEST VALUE
        MOV     R2,(R0)+        ;WRITE A WORD AND ADVANCE
        CMP     R0,R1           ;AT END?
        BNE     1$              ;IF NOT, LOOP
        MOV     #20000,R0       ;START AT VA 20000
2$:     MOV     R0,R2           ;USE VA AS TEST VALUE
        CMP     (R0)+,R2        ;CHECK A WORD AND ADVANCE
        BEQ     3$              ;IF EXPECTED VALUE, SKIP AHEAD
        JSR     PC,PRERR        ;REPORT ERROR
3$:     CMP     R0,R1           ;AT END?
        BNE     2$

        ;----- "RANDOM" DATA TEST (PROGRAM AS TEST DATA)

RNDM:   MOV     #START,R2       ;SRC: START OF PROGRAM
        MOV     #20000,R0       ;DST: VA 20000
1$:     MOV     (R2)+,(R0)+     ;WRITE A WORD AND ADVANCE
        CMP     R0,R1           ;AT END?
        BEQ     2$              ;IF SO, SKIP AHEAD
        CMP     R2,#END         ;TIME TO RESET SRC?
        BLO     1$              ;IF NOT, GO DO ANOTHER
        MOV     #START,R2       ;OTHERWISE RESET SRC
        BR      1$              ;AND GO DO ANOTHER
2$:     MOV     #START,R2       ;SRC1: START OF PROGRAM
        MOV     #20000,R0       ;SRC2: VA 20000
3$:     CMP     (R2)+,(R0)+     ;COMPARE ONE WORD AND ADVANCE
        BEQ     4$              ;IF SAME, SKIP AHEAD
        MOV     R2,-(SP)        ;SAVE SRC1
        MOV     -2(R2),R2       ;FETCH EXPECTED VALUE
        JSR     PC,PRERR        ;REPORT ERROR
        MOV     (SP)+,R2        ;RESTORE SRC1
4$:     CMP     R0,R1           ;AT END?
        BEQ     5$              ;IF SO, SKIP AHEAD
        CMP     R2,#END         ;TIME TO RESET SRC1?
        BLO     3$              ;IF NOT, GO DO ANOTHER
        MOV     #START,R2       ;OTHERWISE RESET SRC1
        BR      3$              ;AND GO DO ANOTHER

5$:     RTS     PC              ;TESTS DONE, RETURN TO CALLER

        ;----- RELOCATE

RELOC:  MOV     R5,@#KIPAR1     ;MAP VA:020000 -> PA:(R5<<6) 
        CLR     R0              ;SRC VA:000000 
        MOV     #20000,R1       ;DST VA:020000
        MOV     #10000,R2       ;WORD COUNT: FULL PAGE (4K WORDS)
        MOV     (R0)+,(R1)+     ;COPY A WORD
        SOB     R2,.-2          ;LOOP UNTIL DONE
        MOV     R5,@#KIPAR0     ;MAP VA:000000 -> PA:(R5<<6)
        MOV     #RELSTR,R5      ;GET RELOCATED STRING
        JSR     PC,PRSTR        ;PRINT IT
        MOV     @#KIPAR1,R5     ;GET RELOCATION TARGET
        CLR     R4              ;CLEAR UPPER WORD BEFORE SHIFT
        ASHC    #6,R4           ;SHIFT OVER FOR PA IN R4:R5
        JSR     PC,PRW18        ;PRINT IT
        MOV     #CRLF,R5        ;GET CRLF
        JSR     PC,PRSTR        ;PRINT IT
        RTS     PC              ;RETURN TO CALLER

        ;----- REPORT AN ERROR

PRERR:  MOV     @#KIPAR1,R5     ;GET KIPAR FOR MAPPED BASE
        CLR     R4              ;CLEAR UPPER WORD BEFORE SHIFT
        ASHC    #6,R4           ;SHIFT OVER FOR PA IN R4:R5
        ADD     R0,R5           ;ADD IN ERROR VA
        ADC     R4              ;CARRY IF NECESSARY
        SUB     #20002,R5       ;SUB VA OFFSET AND BACK UP ONE
        SBC     R4              ;BORROW IF NECESSARY
        JSR     PC,PRW18        ;PRINT PHYSICAL ADDRESS
        MOV     #DELIM1,R5      ;GET DELIMITER
        JSR     PC,PRSTR        ;PRINT IT
        MOV     R2,R5           ;GET EXPECTED VALUE
        JSR     PC,PRW16        ;PRINT IT
        MOV     #DELIM2,R5      ;GET DELIMITER
        JSR     PC,PRSTR        ;PRINT IT
        MOV     R0,R4           ;GET ADDRESS AFTER ERROR
        MOV     -(R4),R5        ;BACK UP AND GET ERROR VALUE
        JSR     PC,PRW16        ;PRINT IT
        MOV     #CRLF,R5        ;GET CRLF
        JSR     PC,PRSTR        ;PRINT IT
        RTS     PC              ;RETURN TO CALLER

        ;----- PRINT SIX DIGIT OCTAL NUMBER

PRW16:  CLR     R4              ;CLEAR UPPER WORD
PRW18:  MOV     #6,R3           ;SIX DIGITS TO PRINT
        ASHC    #1,R4           ;SHIFT IN MSB OF LOW WORD
1$:     ADD     #60,R4          ;MAKE INTO ASCII DIGIT
        MOV     R4,@#XBUF       ;PRINT IT
        TSTB    @#XCSR          ;CHECK IF XMIT DONE
        BPL     .-4             ;LOOP UNTIL SO
        CLR     R4              ;RESET OUTPUT CHAR
        ASHC    #3,R4           ;SHIFT IN NEXT THREE BITS
        SOB     R3,1$           ;LOOP DIGITS
        RTS     PC              ;RETURN TO CALLER

        ;----- PRINT NULL-TERMINATED STRING

PRSTR:  MOVB    (R5)+,@#XBUF    ;PRINT ONE CHAR AND ADVANCE
        TSTB    @#XCSR          ;CHECK IF XMIT DONE
        BPL     .-4             ;LOOP UNTIL SO
        TSTB    @R5             ;CHECK IF END OF STRING
        BNE     PRSTR           ;LOOP IF NOT
        RTS     PC              ;ELSE RETURN TO CALLER

IPDRS:  .WORD   077406,077406,077406,077406
        .WORD   077406,000000,000000,077406

IPARS:  .WORD   000000,000200,000400,000600
        .WORD   001000,000000,000000,007600

DELIM1: .ASCIZ  /: /            ;POST-ADDRESS DELIMITER
DELIM2: .ASCIZ  / /             ;POST-EXPECTED-VALUE DELIMITER
CRLF:   .ASCIZ  <15><12>        ;LINE DELIMITER
RELSTR: .ASCIZ  /RELOCATED TO PA:/
PCOMPL: .ASCIZ  /PASS COMPLETED/<15><12><15><12>

END:    .END    START

The code above is the end result of quite a bit of successive refinement. Things learned along the way:

At first the tests consisted only of writing and checking all-ones and all-zeros to each location. This did uncover one more bank with a stuck bit at only some addresses, that my previous spot-checking had missed. Lesson: you really gotta check every byte. Removed, socketed, and replaced the implicated DRAM, and my tests passed.
Thinking I might have fixed it, I then invested the download time to try the DEC ZQMC diagnostic again. It ran better than I had seen before, proceeding through a few subtests, but soon started flagging a lot of errors that my diagnostic missed. Hmmm. Inspecting the DEC code, it seemed to be writing and checking random data at the time, not just all ones and zeros. Went ahead and implemented a "random" data test in my diagnostic, and it immediately started implicating the same chips. Lesson: all-ones, all-zeros isn't good enough...
While I was at it, I implemented an additional "write/check each word with its virtual address" test. Interestingly, this found most, but not all of the same chips as the random data test. Lesson: all-ones, all-zeros, and address in each word isn't good enough, either; you really do gotta have that "random" data test, too. At this point, went ahead and replaced three more implicated DRAMs, and my tests once again passed clean...
In the meantime, I did some more code reading on the DEC diagnostic, and found that various features could be enabled/disabled via the front panel switches. With some care, the diagnostic might also be restartable without having to wait for an entire re-download, if stopped carefully and in the right place. So I spent the time to re-download, and found with experimentation that the DEC diagnostic would now pass all banks of memory cleanly, as long as parity checking was disabled. Hmm...
Moving back to my diagnostic, I noticed that while it ran clean now on all banks, on a fresh power-up it would usually light the parity-error LED on the MS11-L on its first pass. Subsequent passes, after every location had been written at least once, were fine. Since the MS11-L doesn't have any fancy power-up init logic, it would make sense to see this if the program read locations without writing them first, but I didn't think my code did that. Enabling parity traps let me catch it in the act, and it was happening on a CLR instruction that I was using to init memory! Lesson: on an 11/45, CLR is implemented like other single-operand, modifying instructions, and actually does a DATIP bus cycle from the destination before writing back a zero! So use MOV instead of CLR to init memory if you are worried about tripping parity errors... Cleaned this up in my code, and my diagnostic now runs clean on my machine in all circumstances without ever tripping a parity error.
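
The CLR-versus-MOV lesson can be illustrated with a toy Python model of a parity memory that powers up with garbage. This is a deliberately simplified sketch: it assumes every never-written word reads back with bad parity (on real hardware that is hit-or-miss), and the class and function names are mine.

```python
class ParityMemory:
    """Toy parity memory with no power-up initialization."""

    def __init__(self, size):
        self.data = [0] * size
        # ASSUMPTION: every word has bad parity until it is first written.
        self.parity_ok = [False] * size
        self.parity_errors = 0

    def read(self, addr):
        """A read bus cycle (DATI or DATIP): parity is checked here."""
        if not self.parity_ok[addr]:
            self.parity_errors += 1
        return self.data[addr]

    def write(self, addr, value):
        """A write bus cycle (DATO): fresh parity is stored with the data."""
        self.data[addr] = value & 0o177777
        self.parity_ok[addr] = True


def clr(mem, addr):
    """CLR as observed on the 11/45: reads the destination, then writes zero."""
    mem.read(addr)
    mem.write(addr, 0)


def mov(mem, addr, value):
    """MOV to memory: write only, no read of the destination."""
    mem.write(addr, value)


if __name__ == "__main__":
    a, b = ParityMemory(8), ParityMemory(8)
    for addr in range(8):
        clr(a, addr)
        mov(b, addr, 0)
    print(a.parity_errors, b.parity_errors)
```

In this model the CLR pass trips an error on every untouched word while the MOV pass trips none, and a second CLR pass adds no new errors, which mirrors the "first pass only" behaviour described above.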

So, a lot of issues found and repaired on the MS11-L. Maybe still some issues with parity error handling, which seems to be halting the machine instead of taking a trap. Figured it might be worth a shot to try the operating systems again, so mounted the respective disks and tried both, and... exact same failures in both cases! Womp, womp...

Well, might as well continue to look into the parity error handling, since some things still seem fishy there. The DEC documentation is a bit murky here; various versions of the KB11-A maintenance manual and 11/45 processor handbook say different and somewhat contradictory things; some info in earlier editions is also removed from later ones. The available engineering drawings for the relevant parts of the KB11-A CPU look to have some significant differences from the actual boards I have on hand, and there are more than a few ECOs for these boards listed as relating specifically to parity handling, but for which no other information is available. And Noel has uncovered evidence that even the Unibus signaling related to parity may have been changed by DEC around the time of the early 11/45. Could be interesting...

PDP-11/45: RSTS/E V06C attempts

Mon 07 January 2019 by Fritz Mueller

Okay, back in action after replacing the failed nand at B26 in the RK11-C. MAINDEC ZRKK now passes reliably. Wish I'd been able to get to the bottom of this at the show, but it was really hard to effectively debug on the floor there while the show was in progress -- you naturally want to stop and chat with everybody who drops by to take a look, so it's hard to get into a good technical flow.

Since one of the RK11-C diagnostics I needed to use writes a pack destructively, I had to sacrifice my working RT-11 pack along the way. Rather than go back to the same old RT-11 image, I figured maybe time to try something different? RSTS/E would probably be more fun with multiple terminals and the DZ11 that I have anyway, and I've never actually played around with RSTS. So decided to give that a go...

Did some poking around looking at various available versions, and V06C seemed like a pretty good starting point: it's new enough to explicitly support all of my hardware (including the DZ11 and excepting the VT100), but old enough to still have relatively modest storage requirements so I can hope to run it with the single RK05 that I currently have working. There is also a complete distribution tape available at rsts.org, and a fairly complete set of documentation at bitsavers.

Spent a bit of time reading the sysgen manual, and managed to sysgen under simh and generate a bootable RK05 image for my hardware. I then transferred this image to my single working pack using PDP11GUI; this is frustratingly slow (~3 hours to write a 2.5 MB pack)!! I had forgotten how bad this is. I'm not quite sure why it is as slow as it is; it shouldn't take much more than 45 minutes to push that much data through a DL11 at 9600 baud, even without compression, and the PDP-11 disk subsystem can easily keep up with that. I'm not sure if PDP11GUI is spending a lot of time turning around the serial line, or has a bunch of per-character overhead, or...? In any case, I'm motivated to do something about it; more on this some other time soon.
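The 45-minute figure can be checked with a little arithmetic. The sketch below assumes the standard RK05 geometry (203 cylinders, 2 surfaces, 12 sectors of 512 bytes) and 10 bits on the wire per byte (8 data bits plus start and stop); it also assumes a raw binary transfer with no protocol overhead, which is probably not how PDP11GUI actually moves data.

```python
# Back-of-envelope transfer time for one RK05 pack over a serial line.
CYLINDERS = 203
SURFACES = 2
SECTORS_PER_TRACK = 12
BYTES_PER_SECTOR = 512

BAUD = 9600
BITS_PER_BYTE_ON_WIRE = 10  # assumption: 8 data bits + 1 start + 1 stop

pack_bytes = CYLINDERS * SURFACES * SECTORS_PER_TRACK * BYTES_PER_SECTOR
line_rate = BAUD / BITS_PER_BYTE_ON_WIRE      # bytes per second at full line rate
ideal_minutes = pack_bytes / line_rate / 60

observed_seconds = 3 * 60 * 60                # the ~3 hours seen in practice
effective_rate = pack_bytes / observed_seconds
utilization = effective_rate / line_rate

print(f"pack size:      {pack_bytes} bytes")
print(f"ideal transfer: {ideal_minutes:.1f} minutes")
print(f"observed rate:  {effective_rate:.0f} bytes/s ({utilization:.0%} of line rate)")
```

Under these assumptions the line is only carrying payload about a quarter of the time, so the overhead has to be somewhere other than the raw baud rate.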

So, unfortunately, the RSTS image which works under simh fails to completely boot on the real hardware. It runs through the initial "Option:" menu without trouble, and upon start the RSTS light chaser runs in the data lights on the front panel. Characters are echoed on the console terminal, but it never reaches code to print the banner or prompt for the initial control file. The system appears to be in a loop reading the same section of disk repetitively, and the display register shows a continuously increasing count.

Got a lot of help from folks over on the cctalk mailing list on this one, since I'm a newbie to RSTS. Paul K. provided some useful tips:

RSTS displays an error count in the display register, so that's why I see an increasing count there.
The "fancy" idle pattern that includes both the address and data lights apparently shows up in a later release of RSTS and requires a particular sysgen option, so it's not surprising that I only see the pattern in the data lights on my machine.
The ODT debugger may be loaded with RSTS for startup debugging by configuring it using an undocumented option in the change memory layout section of the "DEFAULT" command at the boot prompt. Enter "ODT" there, and provide a space for it in the memory map. After that, a ^P at the console will take you to the ODT prompt.
```
Memory allocation table:

  0K: 00000000 - 00123777 (  21K) : EXEC
 21K: 00124000 - 00213777 (  14K) : RTS (BASIC)
 35K: 00214000 - 00227777 (   3K) : ODT
 38K: 00230000 - 00757777 (  86K) : USER
124K: 00760000 - End              : NXM
```
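
When carving out space for ODT it is easy to slip on the octal arithmetic, so here is a small sanity check of the table above. It assumes the addresses are octal byte addresses and that "K" means 1024 sixteen-bit words; the region names and ranges are copied from the table.

```python
WORD_BYTES = 2
K = 1024

# (start, end) byte addresses copied from the table, as octal literals.
regions = {
    "EXEC": (0o00000000, 0o00123777),
    "RTS (BASIC)": (0o00124000, 0o00213777),
    "ODT": (0o00214000, 0o00227777),
    "USER": (0o00230000, 0o00757777),
}


def size_kwords(start, end):
    """Size of an inclusive byte-address range, in K words."""
    return (end - start + 1) // (WORD_BYTES * K)


sizes = {name: size_kwords(*span) for name, span in regions.items()}
starts = {name: span[0] // (WORD_BYTES * K) for name, span in regions.items()}

for name in regions:
    print(f"{starts[name]:3d}K: {name:12s} {sizes[name]:3d}K words")
```

The computed sizes and starting offsets agree with the 21K/14K/3K/86K figures RSTS prints.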

A handy way to query the RSTS symbol table is to use the "PATCH" command at the boot prompt (one can also look through the .MAP files generated during sysgen):

Option: PA
File to patch? 
Module name? 
Base address? ERL
Offset address? 
 Base   Offset  Old     New?
041314  000000  005267  ? ^Z
Offset address? ^Z
Base address?

Paul also provided this procedure for triggering a crash dump from an ODT breakpoint under RSTS:

1. Make sure crash dump is enabled (in the "default" option). Start the system. Let it run for at least one minute. (I'm not entirely sure about older versions, but I think that a crash within one minute of startup is handled differently and doesn't do all the usual dump and restart machinery.)

2. Set the data switches all UP. (In SIMH, enter "D SR 177777".)

3. Set a breakpoint.

4. When you hit the breakpoint, change the PC to 52, like this:

0B:055244
_$7/055244 52
_P

(you enter only "$7/" and "52" and "P", the rest is output from ODT.)

The system will write the crashdump and then automatically restart.

5. You should now have the crash dump in [0,1]CRASH.SYS

Further experiments coordinated by Paul then led to the conclusion that an error like this could reasonably be expected to be triggered by a corrupted INIT.BAC or BASIC.RTS file. This led me to wish to verify that the disk pack contents really matched the image file I was running successfully under simh. Some standalone code to dump a CRC of every sector on the pack seemed like it would be useful in this regard, so I coded up the following:

        RKDS=177400
        RKER=177402
        RKCS=177404
        RKWC=177406
        RKBA=177410
        RKDA=177412

        XCSR=177564
        XBUF=177566

        .ASECT
        .=1000
START:  
        MOV     #700,SP         ;INIT STACK POINTER

        ;----- INIT CRC LOOKUP TABLE

        MOV     #10041,R0       ;CRC POLYNOMIAL 
        MOV     #CRCTBL,R1      ;LOOKUP TABLE TO FILL
        ADD     #1000,R1        ;START FILLING FROM END (+256 WORDS)
        MOV     #377,R2         ;COUNT DOWN FROM INDEX 255
L0:     MOV     R2,R4           ;GET COPY OF INDEX
        SWAB    R4              ;MOVE TO UPPER BYTE
        MOV     #10,R3          ;LOOP OVER EIGHT BITS OF INDEX  
L1:     ASL     R4              ;SHIFT, MSB TO CARRY FLAG
        BCC     L2              ;IF MSB NOT SET SKIP AHEAD
        XOR     R0,R4           ;ELSE XOR IN POLYNOMIAL
L2:     SOB     R3,L1           ;LOOP OVER BITS
        MOV     R4,-(R1)        ;SAVE RESULT IN LOOKUP TABLE
        DEC     R2              ;COUNT DOWN
        BPL     L0              ;LOOP OVER TABLE ENTRIES

        CLR     R5              ;INIT SECTOR COUNTER

        ;----- PRINT START OF LINE

L3:     MOV     R5,R0           ;GET SECTOR COUNTER
        JSR     PC,PRNW         ;PRINT IT
        MOV     #DELIM1,R0      ;GET POST-SECTOR DELIMITER
        JSR     PC,PRNSTR       ;PRINT IT

        ;----- READ 8 SECTORS FROM DISK

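        ;RKDA IS NOT A LINEAR SECTOR NUMBER: BITS 0-3 HOLD THE
        ;SECTOR (0-13 OCTAL), BITS 4 AND UP THE SURFACE AND CYLINDER

L4:     CLR     R0              ;HIGH WORD OF DIVIDEND
        MOV     R5,R1           ;LOW WORD IS LINEAR SECTOR NUMBER
        DIV     #14,R0          ;R0 = TRACK, R1 = SECTOR IN TRACK
        ASH     #4,R0           ;MOVE TRACK UP TO RKDA BITS 4-12
        BIS     R1,R0           ;MERGE SECTOR INTO BITS 0-3
        MOV     R0,@#RKDA       ;SET START SECTOR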
        MOV     #DBUF,@#RKBA    ;SET TARGET ADDRESS
        MOV     #-4000,@#RKWC   ;READ 8 SECTORS (2K WORDS)
        MOV     #5,@#RKCS       ;READ + GO
        TSTB    @#RKCS          ;CHECK RKCS RDY BIT
        BPL     .-4             ;LOOP IF BUSY

        ;----- HANDLE ERROR IF ANY

        BIT     #100000,@#RKCS  ;CHECK FOR ERROR
        BEQ     L5              ;SKIP AHEAD IF NOT
        MOV     #ERRSTR,R0      ;POINT TO ERROR INDICATOR
        JSR     PC,PRNSTR       ;PRINT IT
        MOV     @#RKER,R0       ;GET ERROR REG
        JSR     PC,PRNW         ;PRINT IT
        BR      L8              ;MOVE ON TO NEXT 8 SECTORS

L5:     MOV     #DBUF,R4        ;POINT TO START OF DATA JUST READ

        ;----- RUN CRC FOR ONE SECTOR.  FOR EACH INPUT BYTE CH:
        ;       CRC = CRCTBL[((CRC >> 8) ^ CH) & 255] ^ (CRC << 8)

L6:     CLR     R0              ;RESET CRC
        MOV     #1000,R1        ;LOOP OVER ONE SECTOR (256 WORDS)
L7:     MOV     R0,R2           ;GET COPY OF CRC
        SWAB    R2              ;MOVE HIGH BYTE DOWN
        MOVB    (R4)+,R3        ;GET NEXT INPUT BYTE TO PROCESS
        XOR     R3,R2           ;XOR ONTO MUNGED CRC
        BIC     #177400,R2      ;MASK OFF HIGH BYTE
        ASL     R2              ;TIMES TWO INDEX INTO LOOKUP TABLE
        MOV     CRCTBL(R2),R3   ;LOOKUP VALUE
        SWAB    R0              ;MOVE LOW BYTE OF CRC UP
        CLRB    R0              ;MASK OFF THE BOTTOM
        XOR     R3,R0           ;XOR IN THE LOOKED UP VALUE
        SOB     R1,L7           ;LOOP OVER BYTES

        ;----- PRINT CRC, DELIMIT AND LOOP

        JSR     PC,PRNW         ;PRINT CRC, ALREADY IN R0
        CMP     R4,#DBUF+10000  ;END OF DISK BUFFER?
        BEQ     L8              ;IF SO, EXIT LOOP
        MOV     #DELIM2,R0      ;ELSE POINT TO POST-CRC DELIMITER
        JSR     PC,PRNSTR       ;PRINT IT
        BR      L6              ;GO DO ANOTHER SECTOR

        ;----- DELIMIT END OF LINE, LOOP

L8:     MOV     #CRLF,R0        ;POINT TO LINE DELIMITER
        JSR     PC,PRNSTR       ;PRINT IT
        ADD     #10,R5          ;MOVE AHEAD 8 SECTORS
        CMP     R5,#11410       ;AT END OF PACK?
        BLT     L3              ;IF NOT, GO DO THE NEXT 8

        HALT                    ;ALL DONE!

        ;----- PRINT A WORD IN OCTAL

PRNW:   MOV     #6,R2           ;SIX DIGITS TO PRINT
        MOV     R0,R1           ;MOVE OUTPUT WORD OVER TO R1
        CLR     R0              ;RESET OUTPUT CHAR
        ASHC    #1,R0           ;AND SHIFT IN MSB TO START
L9:     ADD     #60,R0          ;MAKE INTO ASCII DIGIT
        MOV     R0,@#XBUF       ;PRINT IT
        TSTB    @#XCSR          ;CHECK IF XMIT DONE
        BPL     .-4             ;LOOP UNTIL SO
        CLR     R0              ;RESET OUTPUT CHAR
        ASHC    #3,R0           ;SHIFT IN NEXT THREE BITS
        SOB     R2,L9           ;LOOP DIGITS
        RTS     PC              ;RETURN TO CALLER

        ;----- PRINT A NULL-TERMINATED STRING

PRNSTR: MOVB    (R0)+,@#XBUF    ;PRINT ONE CHAR AND ADVANCE
        TSTB    @#XCSR          ;CHECK IF XMIT DONE
        BPL     .-4             ;LOOP UNTIL SO
        TSTB    @R0             ;CHECK IF END OF STRING
        BNE     PRNSTR          ;LOOP IF NOT
        RTS     PC              ;ELSE RETURN TO CALLER

DELIM1: .ASCIZ  /: /            ;POST-SECTOR DELIMITER
DELIM2: .ASCIZ  / /             ;POST-CRC DELIMITER
CRLF:   .ASCIZ  <15><12>        ;LINE DELIMITER
ERRSTR: .ASCIZ  /ERROR: /       ;ERROR INDICATOR

CRCTBL: .BLKW   400             ;CRC LOOKUP TABLE
DBUF:   .BLKW   4000            ;DISK DATA BUFFER

        .END    START

Running this indicated the RSTS pack was in good shape, and not corrupt. So, maybe I have had a lurking hardware bug in my memory system (a 256KB MS11-L), which never tripped up RT-11 and so has to date gone undiagnosed?
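
For the other side of the comparison, the same per-sector CRC can be computed on the host directly from the image file. The following is a sketch, not the tool actually used here: it assumes the image is a flat file of 512-byte sectors in the same byte order the PDP-11 sees in memory, and the function names are made up. How its sector numbers pair up with a listing from the real machine depends on how that listing walks the disk addresses, so treat the pairing as an assumption to verify. The CRC itself follows the table-driven recurrence in the listing (polynomial 10041 octal, initial value zero).

```python
POLY = 0o10041          # same polynomial constant as the listing (0x1021)
SECTOR_BYTES = 512      # 256 words per sector
SECTORS_PER_LINE = 8


def make_table(poly=POLY):
    """Build the 256-entry lookup table, one shift/XOR step per index bit."""
    table = []
    for index in range(256):
        value = index << 8
        for _ in range(8):
            value <<= 1
            if value & 0x10000:       # a one was shifted out of the 16-bit word
                value ^= poly
            value &= 0xFFFF
        table.append(value)
    return table


CRC_TABLE = make_table()


def crc16(data, crc=0):
    """CRC = TABLE[((CRC >> 8) ^ CH) & 255] ^ (CRC << 8), per input byte."""
    for ch in data:
        crc = CRC_TABLE[((crc >> 8) ^ ch) & 0xFF] ^ ((crc << 8) & 0xFFFF)
    return crc


def crc_lines(image):
    """Format CRCs eight sectors per line, six octal digits each."""
    lines = []
    total = len(image) // SECTOR_BYTES
    for first in range(0, total, SECTORS_PER_LINE):
        crcs = []
        for sector in range(first, min(first + SECTORS_PER_LINE, total)):
            chunk = image[sector * SECTOR_BYTES:(sector + 1) * SECTOR_BYTES]
            crcs.append(f"{crc16(chunk):06o}")
        lines.append(f"{first:06o}: " + " ".join(crcs))
    return lines


def crc_listing_for_file(path):
    """Read a disk image (hypothetical path) and return its CRC listing."""
    with open(path, "rb") as f:
        return crc_lines(f.read())


# Quick self-check on a standard test string.
print(f"{crc16(b'123456789'):04x}")
```

Calling `crc_listing_for_file` on the image and diffing the result against the console output captured from the real machine then shows any sectors that differ.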

At this point, Noel suggested on cctalk that I give Release 6 Unix a try as well, and see if it suffers similarly. Worth a shot! Out of time for now, and back to the day gig tomorrow after holiday break. Happy new year, all!

PDP-11/45 Behaving Badly

Sun 09 December 2018 by Fritz Mueller

Wow, a year to the day since the previous post here! Not a lot of PDP-11 work this past year, with lots of other stuff like home improvements going on, but a few things worth catching up on here.

Mainly, I got brave this past year and decided to actually rent a van and take the 11/45 out of the basement to the VCF West show at the Computer History Museum in Mountain View. This was a lot more physical work than I had anticipated. Working on this thing a piece at a time, sitting in one place in the basement, you get kind of used to it and forget how much iron it actually is... But breaking it down, loading it into a van, unloading into the show, reassembling, then reversing the whole process at the end of the show is a stark reminder, both of the size of the machine and of my advancing age, ha! A huge thank-you to my workmate Brian, who selflessly gave up a weekend, a vacation day, and some mileage on his back to give me a hand. He has already informed me that "the answer for next year is 'no'." :-)

Reassembly on the VCF West show floor

Running diagnostics and consulting the RK11-C prints...

Pavl stops by with an RX02 and controller to help out with debug. Replacement KM11 debug board visible in the upper diagnostic port of the RK11-C.

With Pete Richert, an old friend from Digidesign days!

I suppose I should have expected it, but in the course of transportation to the show something shook loose resulting in a machine that wouldn't boot RT-11 when reassembled on the show floor (stupid bumpy rental van!) So my show became a two day live-troubleshooting exhibit. This was fine, and I think a lot of folks had fun jumping in and helping with troubleshooting (thanks, all!) There was a lot of interest and reminiscence about the machine and I met a lot of nice people. Still, a little disappointing, because I really had wanted people to be able to sit down and use the machine, and also because my head ended up in the machine the whole time I really didn't get to see the rest of the show or talk to other people about their exhibits! Ah well. In the end I did cajole a successful boot out of it, 15 mins. before the show closed, so at least a couple people got to sit down and play Adventure. Placed 3rd in the restoration category :-)

So, what went wrong? At the show I managed to isolate the problem to something intermittent related to interrupts from the RK11-C controller. I was still able to boot the RKDP diagnostic pack, since its bootstrap and monitor make very conservative use of processor and device interface features. Running through the diagnostics, managed to narrow down the problem to RK11-C completion polling after overlapped seeks. I guess RT-11 makes use of this feature.

I got the machine home and reassembled, and verified that the problem was still manifesting. Then many months passed, until I found some time to dig deeper into the problem just last night. The relevant failing diagnostic is ZRKK test 37, and the output is:

DRIVE 0

RK11 DIDN'T INTRUPT AFTER SK COMPLETED
  PC     RKCS    RKER    RKDS
014476  000310  000000  004713


SCP DIDN'T SET AFTER SEEK WAS DONE
  PC   RKCS
014526  000310


RK11 DIDN'T INTRUPT AFTER SK COMPLETED
  PC     RKCS    RKER    RKDS
014476  000310  000000  004712


SCP DIDN'T SET AFTER SEEK WAS DONE
  PC   RKCS
014526  000310


TIMOUT,PC=004536

And the relevant bit of the diagnostic listing:

014362  2$:     MOV     RKVEC,R1
014366          MOV     #3$,(R1)+               ;SET UP VECTOR ADRES FOR RK11 INTERUPT
014372          MOV     #340,(R1)               ;SET UP PSW ON INTERRUPT
014376          BIS     #40,@RKDA               ;ADRES CYLINDER #1
014404          MOV     #111,@R0                ;SEEK, GO WITH IDE SET
014410          WAT.INT ,300                    ;WAIT FOR THE DRIVE TO
                                                ;INTERRUPT AFTER ADRES WAS RECVD
                                                ;WAITING TIME= 1.4 MS FOR 11/20
                                                ;280 US FOR 11/45
                                                ;ERROR, IF INTERUPT DID NOT OCCUR
                                                ;BY NOW
014414          MOV     #BADINT,@RKVEC          ;RESTORE UNEXPECTED RK11 INTERRUPT
014422          MOV     @R0,$REG0               ;GET RKCS
014426          ERROR   75                      ;INTERRUPT DID NOT OCCUR AFTER
                                                ;SEEK WAS INITIATED WITH IDE SET
014430          BR      3$+4
014432  3$:     CMP     (SP)+,(SP)+             ;OK, IF RK11 INTERRUPTED TO THIS
                                                ;RESTORE STACK POINTER (FROM RK11 INTERRUPT)
014434          CMP     (SP)+,(SP)+             ;RESTORE STACK POINTER (FROM
                                                ;WAT.INT)
014436          MOV     #5$,@RKVEC              ;SET UP NEW VECTOR ADRES FOR RK11
014444          BIT     #20000,@R0              ;IS SCP CLEAR
014450          BEQ     4$                      ;YES, BRANCH
014452          MOV     @R0,$REG0               ;GET RKCS
014456          ERROR   76                      ;SCP SET BEFORE SEEK TO LAST
                                                ;CYLINDER WAS DONE
014460  4$:     WAT.INT ,56700                  ;WAIT FOR DRIVE TO INTERRUPT
                                                ;AFTER SEEK WAS COMPLETED
                                                ;WAITING TIME=180 MS FOR 11/20
                                                ;36 MS FOR 11/45
014464          MOV     #BADINT,@RKVEC          :IT'S AN ERROR IF BY THIS TIME
                                                ;INTERRUPT HAS NOT OCCURERED
014472          JSR     PC,GT3RG                ;GO GET RKCS, ER, DS
014476          ERROR   77                      ;RK11 DID NOT INTERRUPT AFTER SEEK (TO
                                                ;LAST CYLINDER) WAS DONE WITH IDE SET
014500          BR      5$+2
014502  5$:     CMP     (SP)+,(SP)+             ;OK, IF RK11 INTERUPTED TO THIS AFTER
                                                ;SEEK WAS COMPLETED. RESTORE
                                                ;STACK POINTER (FROM RK11 INTERRUPT)
014504          CMP     (SP)+,(SP)+             ;RESTORE STACK POINTER (FROM
                                                ;WAT.INT)
014506          MOV     #BADINT,@RKVEC          ;RESTORE RK11 INTERRUPT VECTOR ADRES
                                                ;FOR UNEXPECTED INTERUTS
014514          BIT     #20000,@R0              ;DID SCP BIT SET?
014520          BNE     6$                      ;YES, BRANCH
014522          MOV     @R0,$REG0               ;GET RKCS
014526          ERROR   53                      ;SCP DID NOT SET AFTER RK11 INTERRUPTED
                                                ;INDICATING SEEK WAS

So, we don't hit error 75 from 14426, and the previous test in this diagnostic (#36) is passing. Unlike with some previous issues, then, the RK11 here is able to generate interrupts and the 11/45 CPU is fielding them. The issue seems related to the seek completion polling circuitry on the RK11.
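
The RKCS values in the error printout are easier to read once broken into fields. Here is a small decoder sketch covering only a subset of the register: the GO, function, IDE, RDY, SCP and ERR positions match the constants used in the listings in these posts, while the HE bit and the remaining function names come from my reading of the RK11 register description. The helper name is made up.

```python
# Function codes held in RKCS bits 1-3.
RK_FUNCTIONS = [
    "control reset", "write", "read", "write check",
    "seek", "read check", "drive reset", "write lock",
]


def decode_rkcs(value):
    """Decode a subset of the RK11 control/status register fields."""
    return {
        "go": bool(value & 0o000001),
        "function": RK_FUNCTIONS[(value >> 1) & 0o7],
        "ide": bool(value & 0o000100),   # interrupt on done enable
        "rdy": bool(value & 0o000200),   # controller ready
        "scp": bool(value & 0o020000),   # search complete
        "he": bool(value & 0o040000),    # hard error
        "err": bool(value & 0o100000),   # error summary
    }


for label, value in (("written by test", 0o000111), ("reported on error", 0o000310)):
    print(label, f"{value:06o}", decode_rkcs(value))
```

Read this way, the reported 000310 is a ready controller with IDE set, a seek as the last function, no error bits, and SCP clear, which fits a controller that accepted the seek but never noticed its completion.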

This circuitry is described in section 3.3.2 of the RK11-C manual, and is detailed in engineering drawings D-BS-RK11-C-12, sheets 1 and 2. When a seek or reset is in progress for any drive and the IDE bit in the controller is set, the controller will poll all drives for completion when it is otherwise idle. When polling is active, a pulse train which drives a count through the polled drives should be visible at B27F2:

RK11-C polling clock

A quick look with the 'scope shows no joy here. This clock is initiated by signal POLL, which doesn't seem to be asserted. Checking the origin of that signal takes us to B26 and A26:

RK11-C polling clock enable

Hmmm, one of these gates (the inverter at A26) is one that had failed and that I had repaired sometime last year... Reseated the socketed replacement on A26, reloaded the diagnostic, but still no go. Well at least it wasn't my repair job! Went ahead and pulled A26 and B26 and bench tested the gates. The 8-input nand that outputs to B26J2 does look fishy. Pulled and socketed the piece, and put a replacement and some spares on order at Jameco where I can pick them up on my way in to work tomorrow. All for now!

PDP-11/45: LA30 repair IV

Sat 09 December 2017 by Fritz Mueller

Received replacement components for the blown G380 solenoid driver channel. After this repair, all pins are firing and printing correctly. Calibrated left margin. Checked pin drive signal, which was within specifications and required no adjustment.

Went to check carriage return pulse timing calibrations, but as it turns out the G396 clock accelerator card in this LA30 has not had ECO 2 applied and therefore has no timing calibration pots. Carriage return seems to be functioning correctly and reliably after the left margin adjustment in any case.

Inspected and cleaned the M7910 interface card; it appears to be in good shape. Rejumpered the base address and interrupt vector for console operation. Slotted it into the PDP-11 in place of the DL11 I had been using up until now, and cabled up to the LA30. Booted to the M9301 monitor and then on into RT-11, and everything seems to be working fine! Here's a short video of the RT-11 boot, followed by the start of a session of Adventure:

PDP-11/45: LA30 repair III

Sat 02 December 2017 by Fritz Mueller

Digging in on the flip-flops identified as potentially problematic in the previous post, found that E5 had failed. Pulled, socketed, and replaced; character generator now correctly clocks all five character columns:

After this repair, characters were printing full width, but two problems remained: about half of the characters printed in response to typing on the keyboard were the wrong character, and the top row was not printing at all on any character:

LA30 second print attempt -- incorrect characters and top row missing

Looking at the incorrect characters problem first, it was clear that bit 4 was not being received by the character generator correctly. I was a bit worried that the SMC KR2376-17 scanner/ROM on the keyboard assembly might be at fault, since Mattis had had some trouble with his. This is a pretty cool part; a combined scanner and code translator, with an internal oscillator, rollover logic, debounce delay, and flexible interfacing:

KR2376-17 keyboard scanner/encoder internal schematic

...not to mention the very cool vintage ceramic/gold packaging (see below.) Fortunately, inspection with an oscilloscope showed that the outputs from the scanner were just fine; chasing downstream, the problem was found to be just a loose pin (SS) on the keyboard cable Berg connector. With that sorted, we now have this:

LA30 third print attempt -- characters correct, but still missing top row

For the final issue with the top row not printing, verified that the problem followed a particular G380 solenoid driver card when swapping them around, and that with a functional G380 in the appropriate backplane slot pin 1 fires and prints correctly. Inspection of the problematic G380 revealed a failed power transistor and blown associated micro-fuse; replacement parts on order.

For the ribbon advance issue, I pulled the ribbon motors and disassembled their top-side reduction gear cases in order to gain access to the upper rotor bearings. Cleaning and lubrication of these bearings, plus a few more taps with a mallet after reassembly, achieved an improved bearing alignment. With the increased output torque, the ribbon now advances reliably.

Other minor items: Replacement vibration isolators arrived, and were installed. Threaded inserts in the fiberglass top shell that had pulled out were reattached with epoxy.

Have some more travel coming up for work, so may not be able to get back to this for a bit. Next steps will be repair of the failed solenoid driver channel, calibrations, then any debug necessary on the M7910 interface card for the PDP-11.

The SMC KR2376-17 keyboard scanner/encoder in the LA30

G380 solenoid driver card from the LA30, with failed parts pulled

PDP-11/45: LA30 repair II

Sun 26 November 2017 by Fritz Mueller

Okay, first thing to debug today is the ready light. This is lit by RDY LITE L, pin A16D2 (lower right of sheet M7721-0-1 in the LA30 engineering drawings). Logic probe showed this was correctly asserted low. Pulled the lamp and checked it with bench supply, and it checked fine. Verified +10 and ground at the lamp socket as well, so why isn't it lit? Turns out it is polarized, and the socket is soldered in backwards (?!). Corrected the socket, and ready light is working.

Noticed that ribbon is stalling occasionally. Ribbon tension seems to be overcoming the clutch on the take-up side. It seems tensioning drag on the inactive side is too high. Not sure what to do about this one yet; there is not much adjustable within the clutches, and the service manual only recommends replacement if they are out of spec (yeah, good luck finding one!)

Repeated the experiment with loopback jumper (A15R2 to A15C2). Turns out I had miscounted backplane pins the first time. With jumper placed correctly, I now get a printing response to the keyboard. Not quite right, but definite progress:

LA30 first print attempt

Here I had typed :;L, and then some other letters. You can see evidence of correct pins firing for the first three characters, though either head movement or pin timing is off. Letter spacing appears more or less correct for 80 characters per line.

Okay, hooked up the logic analyzer, and started to take a look at the character generator clocking to see if all the columns are being clocked out correctly. The logic analyzer shows a malfunction consistent with the print behavior: character column clocking resets after two columns rather than proceeding through all five:

This signaling is mediated by a ripple counter on the M7724 character generator card:

LA30 character generator clocking partial schematic.

So it looks like one or more of these 7474 dual flops has failed. I note on my chargen board that these are early 70's Nat Semi parts; Mattis had a very similar issue (search "7474" on this page) on his LA30 chargen with the same parts.

All I have time for this weekend; next time I'll get the chip clip on these for a closer look, then pull and replace the baddies.

PDP-11/45: LA30 repair

Sat 25 November 2017 by Fritz Mueller

Once again it's been a little while since I've had time to work on PDP-11 stuff or put any updates here; the day gig has been pretty intense lately.

Recent efforts have been focused on restoration of an LA30 printing terminal. This was really filthy (including a mouse nest, yuck) so in addition to the usual electronics work it had to be completely disassembled for proper cleaning and lubrication.

First off, the H735 power supply. This is a pretty straightforward supply, but has an oil cap in the ferro-resonant circuit that is listed as a PCB-containing component; replaced this with a modern equivalent. Also pulled and reformed all the large electrolytic caps on the bench per usual. No real trouble or surprises with this supply.

Logic assembly looks good; everything is there (mine is an LA30-P, the parallel interface version) with no obvious scorches or toast. Backplane intact and chip pin corrosion doesn't look too bad. Needed some compressed air to blow out all the dust bunnies.

Print head also looks to be in decent shape; all of the pins fire freely when activated momentarily with a 15 VDC bench supply.

Most of the work here was involved in disassembling the top section of the terminal, including the keyboard and carriage assemblies, where most of the filth had accumulated. There are a lot of parts and pieces, with castings, bearings, machined shafts, stainless and brass throughout. This thing was really well built!

The ribbon-like paper drag springs were all either torn, mangled, or cracked/cracking; I fashioned some replacements by cutting and drilling 1/2" x 3" strips of .002 steel shim stock. The rubber shock isolation mounts for the carriage assembly had also hardened and decayed. These look very close to the original; I put some on order. Replaced the bumpers on the carriage rails with some less expensive 3/8" chassis grommets. After cleaning, hit the slide rails with dry film silicone lubricant (Molykote 557) and pivot and carriage cam plates pins with a good lithium grease (Molykote BR2 Plus).

Ribbon drive motor bearings were very gummy, and one of the ribbon motors had seized. These motors are quite serviceable though; you can pull the bottom bearing cap and remove the rotor, clean the rotor shaft and bearings of old lubricants, apply fresh and reassemble. These are self-aligning bearings, so don't forget to give the assembly a few taps all around with a mallet after reassembly to shake them into true.

Consumables: compatible ribbons are still plentiful on eBay, so I ordered a few. Paper is an unusual width at 9-7/8". A few vendors on Amazon still seem to carry it, but it might be wise to lay in stock of a carton or two while it is still obtainable.

Fired it up after reassembly. No smoke (good!) and it feeps once reassuringly at power on. Ribbon motors and clutches seem to be working, and the ribbon advances. Activating the ribbon reverse switches manually reverses the ribbon movement per expectation.

If the carriage is closed with paper loaded, the print head will home left and then move right after about a second or so (per expectation; "last character visibility" feature) and ribbon advance halts. Local line feed from the front panel switch works. All of this indicates a good deal of the logic and the motor drive are already working correctly.

However: the front panel "ready" indicator does not light, and a quick loopback test (jumper A15R2 to A15C2 on the backplane) does not print any characters in response to the keyboard. Will pick up here with logic debug next time.

PCB-containing cap from ferro-resonant supply; to be replaced

LA30 H735 supply, pulled to bench for clean/refurb

LA30 internal controller card cage

LA30 print head

LA30 carriage, disassembled for clean, repairs, and lube. A mangled paper drag spring is visible on the print bar assembly.

LA30 cleaned and reassembled

PDP-11/45: VT52 Keyboard Repair

Sat 15 July 2017 by Fritz Mueller

The VT52 had a broken ESC key, and with RT-11 up and running I was motivated to dig in and fix it (you need that ESC key if you are going to run the K52 editor). Pulling the keycap and giving things a look over, the leaf contacts and the plastic plunger that activate the key looked fine. Need to get at the keyboard module itself to troubleshoot, and on a VT52 that means opening the thing all the way up and pulling the main boards. In we go...

Extracted the keyboard module, powered it from my bench supply, and used a breadboard and some jumpers to drive the key select decoders. Key closure on the back of the module was intermittent, but some flexure of the entire keyboard PCB seemed to be affecting it. Replaced/reflowed the solder on the back of the key switch and that seems to have fixed it.

Back together, working well now. Test drove it for a while under RT-11...

VT52 with logic boards and keyboard removed

VT52 keyboard under test, powered from bench supply

VT52 test drive with RT-11 Adventure

PDP-11/45: Data Recovery

Sun 09 July 2017 by Fritz Mueller

Okay, the system is working well enough now to start attempting recovery and archive of the dozen or so RK05 packs that I have on-hand. These were all obtained (along with the RK05 drives, controller and power supply) in a surplus auction downstream of Stanford's Hansen Experimental Physics Lab, sometime in the early '90s.

The packs date from the mid-70's to early-80's, and the labels indicate contents related to experiments and research projects taking place at the lab at that time. One particular pack seems associated with the Stanford Gravity Wave Project, which built early resonant mass detectors. Other packs labeled "FEL" would be related to early free-electron laser research. Many of the names on the packs are readily found on related scientific publications from the time.

The process for dealing with these packs involves opening them up for inspection and cleaning before mounting them, with hopes of avoiding beating up or destroying the drive heads and/or media with head crashes. Much has been written about pack cleaning on the classiccmp mailing lists and in the vcfed forums, but briefly the process involves some clean-room gloves, lint-free wipes, and anhydrous isopropyl alcohol. The outside of the pack is first cleaned of dust and grime, then the packs are opened and inspected and the disk surfaces given a scrub with the wipes and alcohol. If a pack seems in good enough shape to mount, it is spun up and run in a drive with head load disabled for about a half-hour. This gets a good air-flow over the disk to blow out remaining loose particulates and also lets the disk come up to thermal equilibrium. After that the heads are loaded, with a finger standing by on the unload switch in case there are any bad noises...

I've already been dealing with two of the packs extensively during the restoration: one is an RKDP diagnostics pack, and the other was a backup pack of same. I was able to capture a complete, error-free image of the RKDP pack using PDP11GUI. This seems to be an earlier version of this disk than what is already available on bitsavers; I've sent the bits to Al Kossow, but as I understand it his project has a big backlog at the moment so it may be a while before he can consider my submission. In the meantime, for those interested, the disk image is available here.
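
As a quick sanity check on a captured image, the file size can be compared against what the RK05 geometry implies. This is a minimal sketch in Python, assuming a standard 12-sector PDP-11 pack imaged in full (203 cylinders, 2 surfaces, 12 sectors per track, 256 16-bit words per sector). The function names are hypothetical and are not part of PDP11GUI.

```python
# Assumed geometry of a standard RK05 pack on an RK11 controller.
CYLINDERS = 203          # full pack, including the last few cylinders
SURFACES = 2
SECTORS_PER_TRACK = 12
BYTES_PER_SECTOR = 512   # 256 words of 16 bits each


def expected_image_size() -> int:
    """Size in bytes of a complete image under the assumed geometry."""
    return CYLINDERS * SURFACES * SECTORS_PER_TRACK * BYTES_PER_SECTOR


def check_image_size(actual_size: int) -> str:
    """Compare an image file size (hypothetical helper) to the expected size."""
    expected = expected_image_size()
    if actual_size == expected:
        return "ok"
    if actual_size < expected:
        return f"short by {expected - actual_size} bytes"
    return f"long by {actual_size - expected} bytes"
```

Under these assumptions a complete image is 2494464 bytes. A file covering only 200 cylinders, for instance, would be reported as short by 36864 bytes.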

The RKDP backup disk was used as a test pack during the RK11/RK05 restoration work, and thus was overwritten by the RK05 diagnostic read/write tests. It now contains a bootable RT-11 image, written via PDP11GUI. Mixed results on the other packs so far: some have had severe head crashes (see pic below) or are otherwise damaged to the point that I am hesitant to mount them. Some have been mysteriously unreadable. It looks like I can expect about 50% recovery. Results so far are tabulated here. I hope to be able to make other recovered images available soon, but since they contain original research materials I am trying to contact the authors for permission first.

Serial #	Label	OS	Notes
ZO 50511	MAINDEC-11-DZZAA-J-HB 9/21/74 M XXDP RKDP RK11 DIAGNOSTIC PACKAGE	DZQUD-A RKDP-RK11 MONITOR	[1974] MAINDEC diagnostics for PDP-11/40/45 CPU/MMU/FPU, MS11, DL11, DR11, RK11, LC11/LA30, KW11-L/P. Full recovery.
B1-75814	RKDB Backup	unknown	[unknown] Presumed to be backup of ZO 50511; used as test disk and overwritten.
B1-28320	Gravitational Radiation Experiment Boughn, Hollenhorst, Paik, Sears, Taber MSA	DOS/BATCH V09-20	[1976-77] Fortran and MACRO-11 codes, mostly calculations relating to resonant mass detector design. Full recovery.
AD-21279	BLAZQUEZ RT-11 AUG 83	RT-11FB (S) V04.00L	[1983] Fortran and MACRO-11 codes relating to image processing and display. Device driver code for DeAnza Systems ID-2000 display and Calcomp plotter. Names: Ken Dinwiddie (DeAnza codes), Art Vetter (Calcomp codes). Full recovery.
BAK 9069 A	W. COLSON	DOS/BATCH V9-20C	[1977-78] Full recovery.
AE 61745	FEL L.ELIAS	DOS/BATCH V9-20C	[1974-78] Head crash on read (ouch!) Partial recovery. Unrecovered data looks to be mostly OS files; may be patchable.
ZO 50399	TRANSPORT + DATA 1/18/80	DOS/BATCH V9-20C	[1980-83] Minor corrosion spot. Partial recovery. Data files and Fortran programs.
E172140	M. O'Halloran Ray Tracing R. RAND BBU PROGRAMS	TBD	[TBD]
B1-45441	RT 11	TBD	[TBD] Minor head crashes on media.
B1-24056	RDAS	DOS/BATCH V9-20C	[1970-78, 1983] Minor head crashes on media. Partial recovery. FEL related Fortran codes.
AE 20116	DEACON FEL	unknown	[unknown] Many corrosion spots on media; did not mount.
19177	Transport DOS/BATCH-9 V 20C	unknown	[unknown] Major head crash on media; did not mount.
B1-44898	RDAS9 - V20C	unknown	[unknown] Medium head crashes on media; did not mount.

[Photos: some of the RK05 packs obtained in a surplus auction downstream of Stanford's Hansen Experimental Physics Lab; an RK05 disk platter with obvious head crashes; RK05 pack spinning in drive with heads loaded]