PDP-11/45: Parity error handling
Mon 25 May 2020 by Fritz Mueller
[A catch-up article, documenting events of Jan/Feb 2019.]
At the end of the previous article, a bunch of repairs had been made to my MS11-L memory board. The associated MAINDEC diagnostic ZQMC was able to run cleanly but only with parity tests disabled. When parity tests were enabled, the parity fault LED on the MS11 would light (expected) and the machine would halt with ADRS ERR lit (unexpected...)
So the first step is to read and research how memory parity handling is implemented on the KB11-A CPU. Immediately here we run into some trouble:
-
The 1973 edition of the 11/45 Processor Handbook has a section 2.5.6, "Memory Parity", which states: "Parity errors cause the Central Processor to either trap through location 4 or to halt." There is also an Appendix E, "Memory Parity", which details CSRs for parity memory:
It is stated there that there are 16 of these, at addresses 772110-772146, each corresponding to an 8K word block of address space.
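As a rough illustration of that layout, here is a small Python sketch that enumerates the 16 CSR addresses and picks the CSR for a given 18-bit physical address. The assumption that CSR n covers the n-th 8K word block in ascending order is mine; the handbook text quoted above only gives the address range and the block size.

```python
CSR_BASE = 0o772110          # first parity CSR (1973 handbook, Appendix E)
CSR_COUNT = 16               # sixteen CSRs, one word (2 bytes) apart
BLOCK_BYTES = 8 * 1024 * 2   # 8K words of 16 bits = 16 KB per block


def csr_addresses():
    """Return the addresses of all 16 parity CSRs."""
    return [CSR_BASE + 2 * n for n in range(CSR_COUNT)]


def csr_for_physical_address(pa):
    """ASSUMPTION: CSR n covers block n, counting up from address 0."""
    block = pa // BLOCK_BYTES
    if not 0 <= block < CSR_COUNT:
        raise ValueError("address is outside the 16 blocks covered by CSRs")
    return CSR_BASE + 2 * block


if __name__ == "__main__":
    addrs = csr_addresses()
    print("first CSR:", oct(addrs[0]), "last CSR:", oct(addrs[-1]))
    print("CSR for PA 100000:", oct(csr_for_physical_address(0o100000)))
```

Sixteen blocks of 8K words add up to 128K words, which is the whole 18-bit physical address space, so the count of 16 is at least self-consistent.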
By the 1976 version of the processor handbook, however, all of this information had been expunged. The new Appendix A, "UNIBUS Addresses", lists range 772110-772136 simply as "UNIBUS memory parity". Here, trap 4 is listed as "CPU errors", and trap 114 is listed as "Memory system errors". All subsequent revisions of the handbook state unambiguously that parity errors generate a trap 114.
-
What do the KB11-A processor maintenance manuals have to offer? Paragraph 7.7.7 of the 1972 KB11-A maintenance manual states:
A Parity error on the Unibus A is indicated by BUSA PA L high and BUSA PB L low. The parity error causes UNI PERF (Unibus parity error flag) to be set when MSYN is cleared. UNI PERF (1) L asserts UBCB PARITY ERR SET L during the pause cycle, which sets the console (CONF) flag and halts the CPU.
The semiconductor memory control EHA and EHB (enable halt) flip-flops may be set under program control to assert SMCB PE HALT if a parity error is detected. This input also asserts UBCB PARITY ERR SET L, which sets the console flag and halts the CPU. Thus, if either a Unibus A parity error or SMCB PE HALT L is asserted, the processor will be vectored to trap when the CONT switch is pressed.
Note that this text addresses how the CPU handles detected parity errors in both Unibus (first paragraph) and fastbus (second paragraph) memory systems. Unibus parity errors are stated to set the CONF flag and halt the CPU, just as I am seeing on my system... Fastbus parity handling (halt first vs. direct trap) can further be mediated by EHA and EHB, called out here to drawing SMCB in the MS11-B/C fastbus semiconductor memory print set.
But here, too, by the time we get to the later revision 1976 KB11-A,D maintenance manual, this information is revised. The updated description makes no further mention of CONF, halting, or halt control, and seems to imply that all reported parity conditions trap directly through 114.
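To keep the signal polarities straight, here is a tiny Python sketch of the Unibus parity condition as worded in the 1972 manual text above. It assumes the usual DEC convention that a signal name ending in "L" is asserted when the line is low; the function name and arguments are mine, purely for illustration.

```python
def unibus_parity_error(pa_l_is_high, pb_l_is_high):
    """Decode the electrical levels on BUSA PA L and BUSA PB L.

    ASSUMPTION: the 'L' suffix means asserted-when-low, so a high level
    on BUSA PA L means PA is negated, and a low level on BUSA PB L means
    PB is asserted.
    """
    pa_asserted = not pa_l_is_high
    pb_asserted = not pb_l_is_high
    # Manual: error is "BUSA PA L high and BUSA PB L low",
    # that is, PA negated and PB asserted.
    return (not pa_asserted) and pb_asserted


if __name__ == "__main__":
    for pa_l in (True, False):
        for pb_l in (True, False):
            print(pa_l, pb_l, unibus_parity_error(pa_l, pb_l))
```

Read this way, the manual's wording agrees with the 1975 peripherals handbook description quoted in the footnotes (PA negated and PB asserted signals an error, both negated signals none).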
-
How about contemporaneous memory systems? The MS11-B/C solid state memory systems released with the 11/45 (note: not what I'm running; I have the much later MS11-L) consisted of either MOS or bipolar memory matrices with an associated controller card (the M8110). These supported both Unibus and fastbus interfaces. Here, in the 1972 schematics, we see the implementation of the EHA/EHB halt control bits, mentioned above, in the upper left of sheet SMCB:
We can see the bit assignments here match the CSR layout from the 1973 processor handbook, and the associated MS11 maintenance manual from 1973 also describes them in its table 3-12:
And once again, by the 1974 revision of the same maintenance manual, no surprise: descriptions of the halt control bits have been expunged from table 3-12. Okay, we're starting to get a consistent picture here...
I don't know much about the core memory systems that were configured with the early 11/45s. It would be interesting to know if anything other than the MS11-B/C ever supported this older CSR layout.
-
Let's have a look at the KB11-A engineering drawings themselves. The set I've been using during my restoration dates from 1974. The first, most obvious, place to look is trap vector generation; this is accomplished on the lower left of drawing DAPE:
This small combinational net feeds trap vector bits to the K1MX constant multiplexer. One non-obvious wrinkle noted elsewhere on the drawing: vectors generated for reserved instruction (004), EMT (014), and TRAP (016) are further left-shifted, downstream, by microcode (state RSD.10, drawing FLOWS 12) to result in 010, 030, and 034 respectively. That's not strictly relevant to the discussion at hand, but might be helpful if pondering the logic implemented in the diagram above.
This drawing is definitely from the "post 114" era. On a parity error, we'll have ~IOT and ~PIRQ and ~SEGT, together driving TV02 high; that's our traditional vector 004. But here we also see UBCB PE TRAP (1) L, active low, entering from the left. When driven low, we'll get TV03 and TV06 high as well, all together generating vector 114.
Here we can see some clues, too, of how the change to 114 might have been bodged in: as drawn, TV01, TV02, TV03, TV04 and TV05*07 proceed nicely in order from bottom to top. But TV06, needed by the change as the most-significant "1" in "114", looks like it was just wedged in out of order on the drawing... Presumably, it makes use of a previously unused section of hex inverter E11. The change to activate TV03 here as well would have been a cut/jump at the inputs of E7.
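The vector arithmetic is easy to check in a few lines of Python. This is only a sketch, and it assumes that each TVnn net corresponds to bit nn of the trap vector, which is how I read the drawing; the helper names are made up for the example.

```python
def vector_from_tv_bits(*bit_numbers):
    """Build a trap vector from the TVnn lines that are high.

    ASSUMPTION: TVnn drives bit nn of the vector.
    """
    vector = 0
    for bit in bit_numbers:
        vector |= 1 << bit
    return vector


def after_microcode_shift(vector):
    """Model the one-place left shift applied downstream in state RSD.10."""
    return vector << 1


if __name__ == "__main__":
    print("TV02 only:        %03o" % vector_from_tv_bits(2))
    print("TV02, TV03, TV06: %03o" % vector_from_tv_bits(2, 3, 6))
    for name, pre in (("reserved", 0o004), ("EMT", 0o014), ("TRAP", 0o016)):
        print("%-8s %03o -> %03o" % (name, pre, after_microcode_shift(pre)))
```

TV02 alone gives the traditional 004, and adding TV03 and TV06 gives 114, which matches the description above.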
And sure enough, here we see differences with my actual hardware! Here's part of the layout of module DAP from the '74 engineering drawings, and a snap of the same corner of my DAP spare, which is the same as the one I'm currently running in the machine:
Note particularly that R17, a pullup for UBCB PE TRAP (1) L, is missing on my board. A little further work with the beeper shows that on my boards E7 pin 1 is connected directly to E7 pin 13, and is not connected to edge connector AP1. E11 pin 3 appears to be NC. Furthermore, examination of the backplane shows that there is no wire wrapped in place at DAP AP1 to deliver signal UBCB PE TRAP (1) from the UBC board. So, I think I can conclude we're not looking at a bug or component failure here; my 11/45 simply pre-dates the change from vector 4 to vector 114.
-
Okay for the vector, but what about the halt behavior? Here, the text quoted earlier from the 1972 KB11-A maintenance manual has our clue where to look. The parity derived signal eventually resulting in the halt on either Unibus or fastbus parity error is UBCB PARITY ERR SET L (note "SET" in the signal name here, don't confuse with UBCB PARITY ERR L...) The 1974 drawings imply that a fastbus parity error, but not a Unibus parity error, will halt the machine, in conflict with this text. But looking here, we see another bodge clue: the hookup at E68 pins 4 and 5 as drawn looks a little suspicious...
And indeed, on my hardware, E68 pins 4 and 5 are not connected together; rather, E68 pin 5 is connected to E79 (Unibus parity error flag) pin 5. So, Unibus parity errors will also halt this version of the 11/45 hardware, by design.
Some other differences related to parity are also apparent looking at my version of the UBC board. E57, seen above generating UBC PE ABORT L, is not populated. This seems related to some further refinement of abort sequencing, but the circumstances surrounding the need for this aren't clear to me at this point. Also, jumper W1 and associated logic to entirely disable Unibus parity error detection are not present:
So, what does all this mean? Well, for one thing, there apparently isn't anything actually in need of repair here -- as far as I can tell, this version of the hardware is functioning per design, such as it is.
And as it turns out, with a now properly repaired MS11-L, actual parity errors are few and far between (I've yet to see any that weren't intentionally created by diagnostics.) According to Noel, stock Unix V6 doesn't do anything whatsoever with parity. RSTS/E V06C boot code seems to be properly probing and identifying the CSR on my MS11-L. And good old RT11 has seemed happy enough in the past. So I just may not need a totally up-to-date parity implementation on my machine.
There is still the issue of more broadly tracking down and implementing outstanding ECOs for this machine. I have so far had limited success in locating these (more on this next time!) I'm certainly equipped here to implement field cuts and jumps, but it might get tricky to track down newer versions of boards for any ECOs that involved total swaps to updated etches. In any case, in the absence of complete information on the ECOs I'm hesitant to cherry pick changes such as those identified here unless I am really blocked without them; better by far not to leave the machine in an undocumented "in-between" state.
Footnotes: a lot of the discovery documented here took place in the context of the enthusiast community on the cctalk mailing list, and also in private communications. Noel Chiappa and Paul Koning were both particularly generous with their time (thanks, guys!) Here are some interesting related bits that didn't fit directly in the narrative above, for completeness and for future reference:
-
On RSTS parity CSR sniffing, from Paul:
From: Paul Koning
To: cctalk
Subject: Re: PDP-11/45 RSTS/E boot problem
Fritz Mueller wrote:
There is a lot of inconsistent and incomplete information in the documentation about memory CSRs. They appear to come in different flavors depending on memory hardware; some of the earlier ones support setting a bit to determine whether parity errors will halt or trap the CPU, while some of the later ones (like my MS11-L) simply have "enable" and don't distinguish between halt and trap. I'm curious how OS init code sniffs out what memory CSRs there are, determines their specific flavors and, in a heterogeneous system, determines how much address space is under the auspice of each CSR? Maybe Paul and Noel can comment here wrt. RSTS and Unix respectively?
I quickly skimmed some RSTS INIT code (for V10.1). Two things observed:
1. At boot, INIT determines the memory layout. It does this by writing 0 then -2 into each location to see if it works. If it gets an NXM trap (trap to 4) or a parity trap (trap to 114) it calls that 1kW block of memory non-existent. For the case of a parity error, it tells you that it saw a parity error and is disabling that block for that reason.
2. In the DEFAULT option (curiously enough) there is a routine that looks for up to 16 parity CSRs starting at 172100. This happens on entry to the memory layout option. You can display what it finds by using the PARITY command in response to the "Table suboption" prompt.
It checks if the bits 007750 are active in the parity CSR, if so it takes that to be an address/ECC parity CSR. It figures out the CSR to memory association by going through memory in 1 kW increments, writing 3, 5 to the first 2 words, then setting "write wrong parity" in each CSR (007044), then doing BIC #3,.. BIC #5,... to those two test words, then reading them both back. This should set bad parity, and it scans all the CSRs to see which one reports an error (top bit in the CSR). If no CSR has that set, it concludes the particular block is no-parity memory.
I probably got some of the details wrong, the above is from a fast skim of the code, but hopefully it will get you started.
My machine currently has one MS11-L, which has the newer CSR layout referred to by Paul above (different than the much older MS11-B/C CSR layout depicted at the top of this article; see MS11-L docs for further details). RSTS init defaults->memory->parity on my system reports (correctly):
0K: 00000000 - 00757777 ( 124K) : 00
Presumably, RSTS carries out this identification activity with the CSR report enable bits off, and the CSR error bits still function correctly in these circumstances; otherwise, per above, my machine would summarily halt during this process!
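To make Paul's description of the CSR-to-memory association a bit more concrete, here is a toy Python simulation of the probing loop. Everything about the machine model (the `SimCsr` and `SimMachine` classes, and tracking only parity state rather than data) is a simplification of my own, not a rendering of the actual INIT code; only the 1 kW stepping, the read-modify-write with "write wrong parity" set, and the scan for the top CSR bit come from his summary.

```python
WORDS_PER_BLOCK = 1024   # INIT steps through memory in 1 kW increments
ERROR_BIT = 0o100000     # "top bit in the CSR"


class SimCsr:
    """Hypothetical parity CSR covering an inclusive range of 1 kW blocks."""
    def __init__(self, first_block, last_block):
        self.blocks = range(first_block, last_block + 1)
        self.write_wrong_parity = False
        self.value = 0


class SimMachine:
    """Toy memory that tracks only which words hold bad parity."""
    def __init__(self, total_blocks, csrs):
        self.total_blocks = total_blocks
        self.csrs = csrs
        self.bad_parity = set()

    def _csr_for(self, word):
        block = word // WORDS_PER_BLOCK
        return next((c for c in self.csrs if block in c.blocks), None)

    def write(self, word):
        csr = self._csr_for(word)
        if csr is not None and csr.write_wrong_parity:
            self.bad_parity.add(word)
        else:
            self.bad_parity.discard(word)

    def read(self, word):
        csr = self._csr_for(word)
        if csr is not None and word in self.bad_parity:
            csr.value |= ERROR_BIT


def associate(machine):
    """Map each block to the index of the CSR that reports it, or None."""
    result = {}
    for block in range(machine.total_blocks):
        base = block * WORDS_PER_BLOCK
        words = (base, base + 1)
        for w in words:
            machine.write(w)              # initial test patterns
        for c in machine.csrs:
            c.write_wrong_parity = True
        for w in words:                   # BIC is a read-modify-write
            machine.read(w)
            machine.write(w)
        for c in machine.csrs:
            c.write_wrong_parity = False
        for w in words:
            machine.read(w)               # should raise the error bit
        hits = [i for i, c in enumerate(machine.csrs) if c.value & ERROR_BIT]
        result[block] = hits[0] if hits else None
        for c in machine.csrs:
            c.value = 0                   # clear error for the next block
        for w in words:
            machine.write(w)              # leave good parity behind
    return result


if __name__ == "__main__":
    m = SimMachine(8, [SimCsr(0, 3)])
    print(associate(m))
```

In this model, blocks that no CSR claims come back as `None`, which corresponds to INIT concluding that a block is no-parity memory.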
-
Noel, in some of his research, found "deeper magic from before the dawn of time" regarding the evolution of the Unibus parity implementation in the period before the one covered at the start of this article, bridging back to the KA11 (11/20) CPU. Quite interesting!
From: Noel Chiappa
Subject: Change in UNIBUS parity operation (Was: PDP-11/45 RSTS/E boot problem)
To: cctalk
Even better, it claims to be able to control whether the memory uses odd or even parity! (How, for UNIBUS memory, I don't know - there's no way to do this over the UNIBUS.)
So this really confused me, as the UNIBUS spec says parity is wholly within the slave device, and only an error signal is transferred over the bus. E.g. from the 'pdp11 peripherals handbook', 1975 edition (pg. 5-8): "PA and PB are generated by a slave ... [it] negates PA and asserts PB to indicate a parity error ... both negated indicates no parity error. [other combinations] are conditions reserved for future use."
The answer is that originally the UNIBUS parity operation was different, and that sometime around the introduction of the PDP-11/45, they changed it, which is apparently why Appendix E, about parity in the /45, says what it does!
I found the first clue in the MM11-F Core Memory Manual (DEC-11-HMFA-D - which is not online, in fact no MM11-F stuff is online, I'll have to scan it all and send it to Al); I was looking in that to see if the parity version had a CSR or not (to reply to Paul Koning), and on the subject of parity it said this: "The data bits on the bus are called BUS DPB0 and BUS DPB1." And there is nothing else on how the two parity bits are used - the clear implication is that the memory just stores them, and hands them to someone else (the master) over the bus, for actual use.
Looking further, I found proof in the "unibus interface manual" - and moreover, the details differ between the first (DEC-11-HIAA-D) and second (DEC-11-HIAB-D) editions (both of which differ from the above)!
In the first, Table 2-1 has these entries for PA and PB: "Parity Available - PA ... Indicates paritied data" and "Parity Bit - PB ... Transmits parity bit"; at the bottom of page 2-4 we find "PA indicates that the data being transferred is to use parity, and PB transmits the parity bit. Neither line is used by the KA11 processor."
(Which explains why, when, after reading about parity in the MM11-F manual, I went looking for parity stuff in the KA11 which would use it, I couldn't find it!)
In the second, Table 2-1 has these entries for PA and PB: "Parity Bit Low - PA ... Transmits parity bit, low byte" and "Parity Bit High - PB ... Transmits parity bit, high byte"; at the top of page 2-5 we find wholly different text from the above, including "These lines are used by the MP11 Parity Option in conjunction with parity memories such as the MM11-FP."
I looked online for more about the MP11, but could find nothing. I wonder if any were made?
This later version seems to agree with that Appendix E. I tried to find an early -11/45 system manual, to see if it originally shipped with MM11-F's, but couldn't locate one - does anyone have one? The ones online (e.g. EK-1145-OP-001) are much later.
It's also interesting to speculate about reasons why these changes were made; I can think of several! :-)
All for now!
PDP-11/45: V6 Unix attempts & MS11-L repairs
Mon 21 January 2019 by Fritz Mueller
Following up on Noel's suggestion, I decided to give V6 Unix a try to see how it fared in comparison to the problems seen with RSTS/E V06C. I recently scored an additional RK05 pack from eBay, and decided to try and use that so I could keep my current RSTS/E pack intact.
Inspected the pack, and it looked in good shape, clean, with no apparent crashes on the media. Mounted it up and was able to do only a partial recovery. What I got looks like pretty generic RT-11/BASIC-11 stuff, so I'm not too concerned about attempting a complete recovery. Went ahead and reformatted the pack, after which I could read/write the entire pack with no bad sectors. So now I had two clean packs to work with.
Built a V6 Unix pack image from the Ken Wellsch tape under SIMH (using directions here). Transferred it over using PDP11GUI, and it did boot in single-user mode. However, it immediately dumped core on the first ls command... Tried a multi-user Unix boot (what's to lose?) and this actually fared a bit better; able to ls, but still dumped core when trying to run the C compiler or do anything else memory-intensive.
So, all of this taken together made me (and others collaborating on the troubleshooting on cctalk) think that I might have a memory issue in the machine. My machine has a 256KB MS11-L; I had previously spot-checked this from the front panel by manipulating the KT11-C mapping registers and trying some writes and reads within each bank. This was enough to identify and repair a few major problems (see this older blog post) and to get me this far. But I had never thoroughly and substantially beat this card up after things seemed to be working with RT-11. There was still also a nagging concern that none of the heavier-weight KT11, MS11, KB11 "exerciser" MAINDEC diagnostics had yet been run to completion on the restored machine either...
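For reference, the address arithmetic behind that kind of bank-by-bank poking can be sketched in a few lines of Python. This is a simplified model of KT11 relocation that ignores the PDR length and access checks; the PAR values are the initial table used by my test program further down, and the function name is just for this example.

```python
def virtual_to_physical(va, pars):
    """Relocate a 16-bit virtual address through eight page address registers.

    Simplified: PA = (PAR << 6) + offset within the 8 KB page, truncated
    to 18 bits. PDR length and access checks are not modeled.
    """
    page = (va >> 13) & 0o7      # top three bits select the page
    offset = va & 0o17777        # low 13 bits are the offset in the page
    return ((pars[page] << 6) + offset) & 0o777777


# Initial kernel PAR values: pages 0-4 identity mapped, page 7 on the I/O page.
INITIAL_PARS = [0o0000, 0o0200, 0o0400, 0o0600, 0o1000, 0, 0, 0o7600]

if __name__ == "__main__":
    pars = list(INITIAL_PARS)
    print("%06o" % virtual_to_physical(0o172340, pars))   # KIPAR0 itself
    pars[1] = 0o1000                                       # window onto PA 100000
    print("%06o" % virtual_to_physical(0o020000, pars))
```

Setting a PAR to 1000 places that page's window at physical address 100000, which is why the bank bases in the test program step by 1000.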
The recommended DEC diagnostic for the MS11-L is ZQMC, but it is complicated, takes a long time to download, and the available sources don't exactly match the binary. So, probably better to work up my own standalone diagnostic to catch and fix obvious things... Thus followed about a week of part-time work working up and successively refining the following test code, and repairing identified problems (failed DRAMs) on the MS11-L along the way. This code maps and tests every memory location on the MS11-L, using KT11 memory management. It relocates itself so it can test the lowest physical bank as well. Tests include all-ones, all-zeros, write address to location, and a "random" data test which just uses the program's own code as the test sequence:
KIPDR0=172300
KIPDR1=172302
KIPDR2=172304
KIPDR3=172306
KIPDR4=172310
KIPDR5=172312
KIPDR6=172314
KIPDR7=172316
KIPAR0=172340
KIPAR1=172342
KIPAR2=172344
KIPAR3=172346
KIPAR4=172350
KIPAR5=172352
KIPAR6=172354
KIPAR7=172356
SR0=177572
XCSR=177564
XBUF=177566
.ASECT
.=1000
START:
MOV #700,SP ;INIT STACK POINTER
;----- INSTALL TRAP CATCHERS
TRPS: CLR R0 ;CURRENT VECTOR
MOV #2,R1 ;VECTOR TARGET
CLR R2 ;HALT INSTR
MOV #100,R3 ;VECTOR COUNT (VECTORS 000-374)
1$: MOV R1,(R0)+ ;STORE TARGET AND ADVANCE
MOV R2,(R0)+ ;STORE HALT AND ADVANCE
ADD #4,R1 ;UPDATE TARGET
SOB R3,1$ ;LOOP OVER VECTORS
;----- INIT AND ENABLE MEMORY MAPPING
INITM: MOV #IPDRS,R0 ;SRC PDR INIT TABLE
MOV #KIPDR0,R1 ;DST KIPDR0
MOV #10,R2 ;DO EIGHT PDRS
MOV (R0)+,(R1)+ ;COPY AND ADVANCE
SOB R2,.-2 ;LOOP OVER PDRS
MOV #IPARS,R0 ;SRC PAR INIT TABLE
MOV #KIPAR0,R1 ;DST KIPAR0
MOV #10,R2 ;DO EIGHT PARS
MOV (R0)+,(R1)+ ;COPY AND ADVANCE
SOB R2,.-2 ;LOOP OVER PARS
MOV #1,@#SR0 ;ENABLE MEMORY MGMT
;----- TEST 32K MS11 BANKS AT PA 100000 THRU 700000,
; RELOCATE, THEN TEST BANK AT PA 000000
DOPASS: MOV #1000,R0 ;PAR FOR PA 100000
JSR PC,DOBANK ;TEST IT
MOV #2000,R0 ;PAR FOR PA 200000
JSR PC,DOBANK ;TEST IT
MOV #3000,R0 ;PAR FOR PA 300000
JSR PC,DOBANK ;TEST IT
MOV #4000,R0 ;PAR FOR PA 400000
JSR PC,DOBANK ;TEST IT
MOV #5000,R0 ;PAR FOR PA 500000
JSR PC,DOBANK ;TEST IT
MOV #6000,R0 ;PAR FOR PA 600000
JSR PC,DOBANK ;TEST IT
MOV #7000,R0 ;PAR FOR PA 700000
JSR PC,DOBANK ;TEST IT
MOV #1000,R5 ;RELOC TARGET PA:100000
JSR PC,RELOC ;GO DO IT
MOV #0000,R0 ;PAR FOR PA 000000
JSR PC,DOBANK ;TEST IT
;----- ALL DONE WITH PASS
MOV #0000,R5 ;RELOC TARGET PA:000000
JSR PC,RELOC ;GO DO IT
CLR @#SR0 ;DISABLE MEMORY MGMT
MOV #PCOMPL,R5 ;GET PASS COMPLETE MSG
JSR PC,PRSTR ;PRINT IT
HALT ;ALL DONE
;----- MAP A SINGLE 32K BANK AT VA 20000
DOBANK: MOV #KIPAR1,R1 ;WILL MAP USING KIPAR1 THRU KIPAR4
MOV #4,R3 ;FOUR KIPARS TO SET
CMP R0,#7000 ;UNLESS WE ARE IN PA 700000 BANK...
BNE 1$ ;IF NOT, SKIP AHEAD
MOV #3,R3 ;OTHERWISE, SCALE BACK TO 3 KIPARS
1$: MOV R0,(R1)+ ;SET A KIPAR AND ADVANCE
ADD #200,R0 ;INCREMENT VALUE FOR NEXT KIPAR
SOB R3,1$ ;LOOP OVER KIPARS
;----- CALCULATE END VA
MOV #120000,R1 ;MAPPED BANK END IS VA 120000
CMP @#KIPAR1,#7000 ;UNLESS WE ARE IN PA 700000 BANK...
BNE ZEROS ;IF NOT, SKIP AHEAD
MOV #100000,R1 ;OTHERWISE, END IS VA 100000
;----- ZEROS TEST
ZEROS: CLR R2 ;EXPECTED VALUE IS 000000
MOV #20000,R0 ;START AT VA 20000
1$: MOV R2,(R0)+ ;CLEAR A WORD AND ADVANCE
CMP R0,R1 ;AT END?
BNE 1$ ;IF NOT, LOOP
MOV #20000,R0 ;START AT VA 20000
2$: TST (R0)+ ;CHECK A WORD AND ADVANCE
BEQ 3$ ;IF ZERO, SKIP AHEAD
JSR PC,PRERR ;OTHERWISE, REPORT ERROR
3$: CMP R0,R1 ;AT END?
BNE 2$ ;IF NOT, LOOP
;----- ONES TEST
ONES: MOV #177777,R2 ;EXPECTED VALUE IS 177777
MOV #20000,R0 ;START AT VA 20000
1$: MOV R2,(R0)+ ;WRITE A WORD AND ADVANCE
CMP R0,R1 ;AT END?
BNE 1$ ;IF NOT, LOOP
MOV #20000,R0 ;START AT VA 20000
2$: CMP (R0)+,R2 ;CHECK A WORD AND ADVANCE
BEQ 3$ ;IF EXPECTED VALUE, SKIP AHEAD
JSR PC,PRERR ;OTHERWISE, REPORT ERROR
3$: CMP R0,R1 ;AT END?
BNE 2$ ;IF NOT, LOOP
;----- WRITE LOCATION WITH ITS VA TEST
ADDRS: MOV #20000,R0 ;START AT VA 20000
1$: MOV R0,R2 ;USE VA AS TEST VALUE
MOV R2,(R0)+ ;WRITE A WORD AND ADVANCE
CMP R0,R1 ;AT END?
BNE 1$ ;IF NOT, LOOP
MOV #20000,R0 ;START AT VA 20000
2$: MOV R0,R2 ;USE VA AS TEST VALUE
CMP (R0)+,R2 ;CHECK A WORD AND ADVANCE
BEQ 3$ ;IF EXPECTED VALUE, SKIP AHEAD
JSR PC,PRERR ;REPORT ERROR
3$: CMP R0,R1 ;AT END?
BNE 2$
;----- "RANDOM" DATA TEST (PROGRAM AS TEST DATA)
RNDM: MOV #START,R2 ;SRC: START OF PROGRAM
MOV #20000,R0 ;DST: VA 20000
1$: MOV (R2)+,(R0)+ ;WRITE A WORD AND ADVANCE
CMP R0,R1 ;AT END?
BEQ 2$ ;IF SO, SKIP AHEAD
CMP R2,#END ;TIME TO RESET SRC?
BLO 1$ ;IF NOT, GO DO ANOTHER
MOV #START,R2 ;OTHERWISE RESET SRC
BR 1$ ;AND GO DO ANOTHER
2$: MOV #START,R2 ;SRC1: START OF PROGRAM
MOV #20000,R0 ;SRC2: VA 20000
3$: CMP (R2)+,(R0)+ ;COMPARE ONE WORD AND ADVANCE
BEQ 4$ ;IF SAME, SKIP AHEAD
MOV R2,-(SP) ;SAVE SRC1
MOV -2(R2),R2 ;FETCH EXPECTED VALUE
JSR PC,PRERR ;REPORT ERROR
MOV (SP)+,R2 ;RESTORE SRC1
4$: CMP R0,R1 ;AT END?
BEQ 5$ ;IF SO, SKIP AHEAD
CMP R2,#END ;TIME TO RESET SRC1?
BLO 3$ ;IF NOT, GO DO ANOTHER
MOV #START,R2 ;OTHERWISE RESET SRC1
BR 3$ ;AND GO DO ANOTHER
5$: RTS PC ;TESTS DONE, RETURN TO CALLER
;----- RELOCATE
RELOC: MOV R5,@#KIPAR1 ;MAP VA:020000 -> PA:(R5<<6)
CLR R0 ;SRC VA:000000
MOV #20000,R1 ;DST VA:020000
MOV #10000,R2 ;FULL PAGE (4K WORDS)
MOV (R0)+,(R1)+ ;COPY A WORD
SOB R2,.-2 ;LOOP UNTIL DONE
MOV R5,@#KIPAR0 ;MAP VA:000000 -> PA:(R5<<6)
MOV #RELSTR,R5 ;GET RELOCATED STRING
JSR PC,PRSTR ;PRINT IT
MOV @#KIPAR1,R5 ;GET RELOCATION TARGET
ASHC #6,R4 ;SHIFT OVER FOR PA IN R4:R5
JSR PC,PRW18 ;PRINT IT
MOV #CRLF,R5 ;GET CRLF
JSR PC,PRSTR ;PRINT IT
RTS PC ;RETURN TO CALLER
;----- REPORT AN ERROR
PRERR: MOV @#KIPAR1,R5 ;GET KIPAR FOR MAPPED BASE
ASHC #6,R4 ;SHIFT OVER FOR PA IN R4:R5
ADD R0,R5 ;ADD IN ERROR VA
ADC R4 ;CARRY IF NECESSARY
SUB #20002,R5 ;SUB VA OFFSET AND BACK UP ONE
SBC R4 ;BORROW IF NECESSARY
JSR PC,PRW18 ;PRINT PHYSICAL ADDRESS
MOV #DELIM1,R5 ;GET DELIMITER
JSR PC,PRSTR ;PRINT IT
MOV R2,R5 ;GET EXPECTED VALUE
JSR PC,PRW16 ;PRINT IT
MOV #DELIM2,R5 ;GET DELIMITER
JSR PC,PRSTR ;PRINT IT
MOV R0,R4 ;GET ADDRESS AFTER ERROR
MOV -(R4),R5 ;BACK UP AND GET ERROR VALUE
JSR PC,PRW16 ;PRINT IT
MOV #CRLF,R5 ;GET CRLF
JSR PC,PRSTR ;PRINT IT
RTS PC ;RETURN TO CALLER
;----- PRINT SIX DIGIT OCTAL NUMBER
PRW16: CLR R4 ;CLEAR UPPER WORD
PRW18: MOV #6,R3 ;SIX DIGITS TO PRINT
ASHC #1,R4 ;SHIFT IN MSB OF LOW WORD
1$: ADD #60,R4 ;MAKE INTO ASCII DIGIT
MOV R4,@#XBUF ;PRINT IT
TSTB @#XCSR ;CHECK IF XMIT DONE
BPL .-4 ;LOOP UNTIL SO
CLR R4 ;RESET OUTPUT CHAR
ASHC #3,R4 ;SHIFT IN NEXT THREE BITS
SOB R3,1$ ;LOOP DIGITS
RTS PC ;RETURN TO CALLER
;----- PRINT NULL-TERMINATED STRING
PRSTR: MOVB (R5)+,@#XBUF ;PRINT ONE CHAR AND ADVANCE
TSTB @#XCSR ;CHECK IF XMIT DONE
BPL .-4 ;LOOP UNTIL SO
TSTB @R5 ;CHECK IF END OF STRING
BNE PRSTR ;LOOP IF NOT
RTS PC ;ELSE RETURN TO CALLER
IPDRS: .WORD 077406,077406,077406,077406
.WORD 077406,000000,000000,077406
IPARS: .WORD 000000,000200,000400,000600
.WORD 001000,000000,000000,007600
DELIM1: .ASCIZ /: / ;POST-ADDRESS DELIMITER
DELIM2: .ASCIZ / / ;POST-EXPECTED-VALUE DELIMITER
CRLF: .ASCIZ <15><12> ;LINE DELIMITER
RELSTR: .ASCIZ /RELOCATED TO PA:/
PCOMPL: .ASCIZ /PASS COMPLETED/<15><12><15><12>
END: .END START
The code above is the end result of quite a bit of successive refinement. Things learned along the way:
-
At first the tests consisted only of writing and checking all-ones and all-zeros to each location. This did uncover one more bank with a stuck bit at only some addresses, that my previous spot-checking had missed. Lesson: you really gotta check every byte. Removed, socketed, and replaced the implicated DRAM, and my tests passed.
-
Maybe I fixed it, so after this I invested the download time to try the DEC ZQMC diagnostic again. It ran better than I had seen before, proceeding through a few subtests, but soon started flagging a lot of errors that my diagnostic missed. Hmmm. Inspecting the DEC code, it seemed to be writing and checking random data at the time, not just all ones and zeros. Went ahead and implemented a "random" data test in my diagnostic, and it immediately started implicating the same chips. Lesson: all-ones, all-zeros isn't good enough...
-
While I was at it, I implemented an additional "write/check each word with its virtual address" test. Interestingly, this found most, but not all of the same chips as the random data test. Lesson: all-ones, all-zeros, and address in each word isn't good enough, either; you really do gotta have that "random" data test, too. At this point, went ahead and replaced three more implicated DRAMs, and my tests once again passed clean...
-
In the meantime, I did some more code reading on the DEC diagnostic, and found that various features could be enabled/disabled via the front panel switches. With some care, the diagnostic might also be restartable without having to wait for an entire re-download, if stopped carefully and in the right place. So I spent the time to re-download, and found with experimentation that the DEC diagnostic would now pass all banks of memory cleanly, as long as parity checking was disabled. Hmm...
-
Moving back to my diagnostic, I noticed that while it ran clean now on all banks, on a fresh power-up it would usually light the parity-error LED on the MS11-L on its first pass. Subsequent passes, after every location had been written at least once, were fine. Since the MS11-L doesn't have any fancy power-up init logic, it would make sense to see this if the program read locations without writing them first, but I didn't think my code did that. Enabling parity traps let me catch it in the act, and it was happening on a CLR instruction that I was using to init memory! Lesson: on an 11/45, CLR is implemented like other single-operand, modifying instructions, and actually does a DATIP bus cycle from the destination before writing back a zero! So use MOV instead of CLR to init memory if you are worried about tripping parity errors... Cleaned this up in my code, and my diagnostic now runs clean on my machine in all circumstances without ever tripping a parity error.
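The CLR versus MOV difference can be illustrated with a toy Python model. This is a sketch under a deliberately pessimistic assumption: every word that has never been written is treated as holding bad parity (on real hardware, power-up garbage will have valid parity by chance some of the time). The class and function names are invented for the example.

```python
class PowerUpMemory:
    """Toy memory whose words hold bad parity until first written.

    ASSUMPTION (pessimistic): every never-written word reads back with
    a parity error.
    """
    def __init__(self, words):
        self.written = [False] * words
        self.parity_errors = 0

    def datip(self, addr):
        """Read cycle that begins a read-modify-write."""
        if not self.written[addr]:
            self.parity_errors += 1

    def dato(self, addr):
        """Write cycle: stores data together with freshly generated parity."""
        self.written[addr] = True


def init_with_clr(mem):
    """CLR as described above: read the destination, then write zero."""
    for addr in range(len(mem.written)):
        mem.datip(addr)
        mem.dato(addr)


def init_with_mov(mem):
    """MOV from a register: write only, the destination is never read."""
    for addr in range(len(mem.written)):
        mem.dato(addr)


if __name__ == "__main__":
    a = PowerUpMemory(1024)
    init_with_clr(a)
    b = PowerUpMemory(1024)
    init_with_mov(b)
    print("CLR init errors:", a.parity_errors)
    print("MOV init errors:", b.parity_errors)
```

A second `init_with_clr` pass over the same memory adds no further errors in this model, matching the observation that only the first pass after power-up lit the LED.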
So, a lot of issues found and repaired on the MS11-L. Maybe still some issues with parity error handling, which seems to be halting the machine instead of taking a trap. Figured it might be worth a shot to try the operating systems again, so mounted the respective disks and tried both, and... exact same failures in both cases! Womp, womp...
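As an aside on the test-pattern lessons in the list above, here is a small Python sketch of one kind of fault that all-zeros and all-ones cannot see. The fault modeled here (an address bit that is ignored, so pairs of addresses share a cell) is a hypothetical example chosen for simplicity; it is not a claim about what was actually wrong with my DRAMs.

```python
class AliasedMemory:
    """Toy word memory; if dead_bit is given, that address bit is ignored."""
    def __init__(self, words, dead_bit=None):
        self.cells = [0] * words
        self.mask = ~(1 << dead_bit) if dead_bit is not None else -1

    def write(self, addr, value):
        self.cells[addr & self.mask] = value & 0o177777

    def read(self, addr):
        return self.cells[addr & self.mask]


def fill_and_check(mem, words, pattern):
    """Write pattern(addr) everywhere, then return the addresses that miscompare."""
    for addr in range(words):
        mem.write(addr, pattern(addr))
    return [addr for addr in range(words) if mem.read(addr) != pattern(addr)]


def zeros(addr):
    return 0


def ones(addr):
    return 0o177777


def own_address(addr):
    return addr


if __name__ == "__main__":
    faulty = AliasedMemory(64, dead_bit=3)
    print("zeros:  ", len(fill_and_check(faulty, 64, zeros)), "failures")
    print("ones:   ", len(fill_and_check(faulty, 64, ones)), "failures")
    print("address:", len(fill_and_check(faulty, 64, own_address)), "failures")
```

Constant patterns pass because every cell ends up holding the same value regardless of which address wrote it last; only a pattern that differs from address to address exposes the aliasing.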
Well, might as well continue to look into the parity error handling, since some things still seem fishy there. The DEC documentation is a bit murky here; various versions of the KB11-A maintenance manual and 11/45 processor handbook say different and somewhat contradictory things; some info in earlier editions is also removed from later ones. The available engineering drawings for the relevant parts of the KB11-A CPU look to have some significant differences from the actual boards I have on hand, and there are more than a few ECOs for these boards listed as relating specifically to parity handling, but for which no other information is available. And Noel has uncovered evidence that even the Unibus signaling related to parity may have been changed by DEC around the time of the early 11/45. Could be interesting...
PDP-11/45: RSTS/E V06C attempts
Mon 07 January 2019 by Fritz Mueller
Okay, back in action after replacing the failed NAND at B26 in the RK11-C. MAINDEC ZRKK now passes reliably. Wish I'd been able to get to the bottom of this at the show, but it was really hard to effectively debug on the floor there while the show was in progress -- you naturally want to stop and chat with everybody who drops by to take a look, so it's hard to get into a good technical flow.
Since one of the RK11-C diagnostics I needed to use writes a pack destructively, I had to sacrifice my working RT-11 pack along the way. Rather than go back to the same old RT-11 image, I figured maybe time to try something different? RSTS/E would probably be more fun with multiple terminals and the DZ11 that I have anyway, and I've never actually played around with RSTS. So decided to give that a go...
Did some poking around looking at various available versions, and V06C seemed like a pretty good starting point: it's new enough to explicitly support all of my hardware (including the DZ11 and excepting the VT100), but old enough to still have relatively modest storage requirements so I can hope to run it with the single RK05 that I currently have working. There is also a complete distribution tape available at rsts.org, and a fairly complete set of documentation at bitsavers.
Spent a bit of time reading the sysgen manual, and managed to sysgen under simh and generate a bootable RK05 image for my hardware. I then transferred this image to my single working pack using PDP11GUI; this is frustratingly slow (~3 hours to write a 2.5 MB pack)! I had forgotten how bad this is. I'm not quite sure why it is as slow as it is; it shouldn't take much more than 45 minutes to push that much data through a DL11 at 9600 baud, even without compression, and the PDP-11 disk subsystem can easily keep up with that. I'm not sure if PDP11GUI is spending a lot of time turning around the serial line, or has a bunch of per-character overhead, or...? In any case, I'm motivated to do something about it; more on this some other time soon.
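The 45 minute figure is just back-of-the-envelope arithmetic, sketched below in Python. It assumes 10 bits on the wire per byte (8 data bits plus start and stop, no parity) and no protocol overhead at all, which is optimistic; the RK05 capacity is computed from the standard geometry of 203 cylinders, 2 surfaces, and 12 sectors of 512 bytes.

```python
# RK05 pack: 203 cylinders x 2 surfaces x 12 sectors x 512 bytes per sector
RK05_BYTES = 203 * 2 * 12 * 512


def transfer_minutes(payload_bytes, baud, bits_per_char=10):
    """Ideal serial transfer time in minutes.

    ASSUMPTION: 10 bits per byte on the wire (8N1 framing) and no
    protocol or line-turnaround overhead.
    """
    return payload_bytes * bits_per_char / baud / 60


def effective_chars_per_second(payload_bytes, hours):
    """Average payload throughput actually achieved over a transfer."""
    return payload_bytes / (hours * 3600)


if __name__ == "__main__":
    print("RK05 capacity: %d bytes" % RK05_BYTES)
    print("ideal time at 9600 baud: %.1f min" % transfer_minutes(RK05_BYTES, 9600))
    print("observed rate over 3 h: %.0f bytes/s" % effective_chars_per_second(RK05_BYTES, 3))
```

The gap between the ideal 960 bytes per second and what a three hour transfer actually achieves is the overhead I'd like to track down.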
So, unfortunately, the RSTS image which works under simh fails to completely boot on the real hardware. It runs through the initial "Option:" menu without trouble, and upon start the RSTS light chaser runs in the data lights on the front panel. Characters are echoed on the console terminal, but it never reaches code to print the banner or prompt for the initial control file. The system appears to be in a loop reading the same section of disk repetitively, and the display register shows a continuously increasing count.
Got a lot of help from folks over on the cctalk mailing list on this one, since I'm a newbie to RSTS. Paul K. provided some useful tips:
-
RSTS displays an error count in the display register, so that's why I see an increasing count there.
-
The "fancy" idle pattern that includes both the address and data lights apparently shows up in a later release of RSTS and requires a particular sysgen option, so it's not surprising that I only see the pattern in the data lights on my machine.
-
The ODT debugger may be loaded with RSTS for startup debugging by configuring it using an undocumented option in the change memory layout section of the "DEFAULT" command at the boot prompt. Enter "ODT" there, and provide a space for it in the memory map. After that, a ^P at the console will take you to the ODT prompt.
Memory allocation table:
  0K: 00000000 - 00123777 ( 21K) : EXEC
 21K: 00124000 - 00213777 ( 14K) : RTS (BASIC)
 35K: 00214000 - 00227777 (  3K) : ODT
 38K: 00230000 - 00757777 ( 86K) : USER
124K: 00760000 - End             : NXM
-
A handy way to query the RSTS symbol table is to use the "PATCH" command at the boot prompt (one can also look through the .MAP files generated during sysgen):
Option: PA
File to patch?
Module name?
Base address? ERL
Offset address?
Base    Offset  Old     New?
041314  000000  005267  ? ^Z
Offset address? ^Z
Base address?
-
Paul also provided this procedure for triggering a crash dump from an ODT breakpoint under RSTS:
1. Make sure crash dump is enabled (in the "default" option). Start the system. Let it run for at least one minute. (I'm not entirely sure about older versions, but I think that a crash within one minute of startup is handled differently and doesn't do all the usual dump and restart machinery.)
2. Set the data switches all UP. (In SIMH, enter "D SR 177777".)
3. Set a breakpoint.
4. When you hit the breakpoint, change the PC to 52, like this:
0B:055244
_$7/055244 52
_P
(You enter only "$7/", "52" followed by a newline, and "P"; the rest is output from ODT.) The system will write the crash dump and then automatically restart.
5. You should now have the crash dump in [0,1]CRASH.SYS
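The memory allocation table in the ODT tip above can be sanity-checked with a little octal arithmetic. The sketch below assumes that "K" in the table means 1K words (2048 bytes), which is what makes the octal byte ranges and the stated sizes agree; the region list is transcribed from the table, and the helper name is made up for illustration.

```python
# Check that each region's octal byte range matches its stated size.
# Assumption: "K" means 1K words = 2048 bytes (0o4000).
K_BYTES = 0o4000

REGIONS = [  # (name, first byte, last byte, stated size in K)
    ("EXEC",        0o00000000, 0o00123777, 21),
    ("RTS (BASIC)", 0o00124000, 0o00213777, 14),
    ("ODT",         0o00214000, 0o00227777, 3),
    ("USER",        0o00230000, 0o00757777, 86),
]


def size_in_k(first, last):
    """Size of an inclusive byte range, in units of 1K words."""
    return (last - first + 1) // K_BYTES


if __name__ == "__main__":
    for name, first, last, stated in REGIONS:
        print(f"{name:12} starts at {first // K_BYTES:3}K, "
              f"size {size_in_k(first, last):2}K (table says {stated}K)")
```

The same arithmetic puts the start of the NXM region, 00760000, at 124K, matching the last row of the table.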
Further experiments coordinated by Paul then led to the conclusion that an error like this could reasonably be expected to be triggered by a corrupted INIT.BAC or BASIC.RTS file. I therefore wanted to verify that the disk pack contents really matched the image file I was running successfully under simh. Some standalone code to dump a CRC of every sector on the pack seemed like it would be useful in this regard, so I coded up the following:
RKDS=177400
RKER=177402
RKCS=177404
RKWC=177406
RKBA=177410
RKDA=177412
XCSR=177564
XBUF=177566
.ASECT
.=1000
START:
MOV #700,SP ;INIT STACK POINTER
;----- INIT CRC LOOKUP TABLE
MOV #10041,R0 ;CRC POLYNOMIAL
MOV #CRCTBL,R1 ;LOOKUP TABLE TO FILL
ADD #1000,R1 ;START FILLING FROM END (+256 WORDS)
MOV #377,R2 ;COUNT DOWN FROM INDEX 255
L0: MOV R2,R4 ;GET COPY OF INDEX
SWAB R4 ;MOVE TO UPPER BYTE
MOV #10,R3 ;LOOP OVER EIGHT BITS OF INDEX
L1: ASL R4 ;SHIFT, MSB TO CARRY FLAG
BCC L2 ;IF MSB NOT SET SKIP AHEAD
XOR R0,R4 ;ELSE XOR IN POLYNOMIAL
L2: SOB R3,L1 ;LOOP OVER BITS
MOV R4,-(R1) ;SAVE RESULT IN LOOKUP TABLE
DEC R2 ;COUNT DOWN
BPL L0 ;LOOP OVER TABLE ENTRIES
CLR R5 ;INIT SECTOR COUNTER
;----- PRINT START OF LINE
L3: MOV R5,R0 ;GET SECTOR COUNTER
JSR PC,PRNW ;PRINT IT
MOV #DELIM1,R0 ;GET POST-SECTOR DELIMITER
JSR PC,PRNSTR ;PRINT IT
;----- READ 8 SECTORS FROM DISK
L4: MOV R5,@#RKDA ;SET START SECTOR
MOV #DBUF,@#RKBA ;SET TARGET ADDRESS
MOV #-4000,@#RKWC ;READ 8 SECTORS (2K WORDS)
MOV #5,@#RKCS ;READ + GO
TSTB @#RKCS ;CHECK RKCS RDY BIT
BPL .-4 ;LOOP IF BUSY
;----- HANDLE ERROR IF ANY
BIT #100000,@#RKCS ;CHECK FOR ERROR
BEQ L5 ;SKIP AHEAD IF NOT
MOV #ERRSTR,R0 ;POINT TO ERROR INDICATOR
JSR PC,PRNSTR ;PRINT IT
MOV @#RKER,R0 ;GET ERROR REG
JSR PC,PRNW ;PRINT IT
BR L8 ;MOVE ON TO NEXT 8 SECTORS
L5: MOV #DBUF,R4 ;POINT TO START OF DATA JUST READ
;----- RUN CRC FOR ONE SECTOR. FOR EACH INPUT BYTE CH:
; CRC = CRCTBL[((CRC >> 8) ^ CH) & 255] ^ (CRC << 8)
L6: CLR R0 ;RESET CRC
MOV #1000,R1 ;LOOP OVER ONE SECTOR (256 WORDS)
L7: MOV R0,R2 ;GET COPY OF CRC
SWAB R2 ;MOVE HIGH BYTE DOWN
MOVB (R4)+,R3 ;GET NEXT INPUT BYTE TO PROCESS
XOR R3,R2 ;XOR ONTO MUNGED CRC
BIC #177400,R2 ;MASK OFF HIGH BYTE
ASL R2 ;TIMES TWO INDEX INTO LOOKUP TABLE
MOV CRCTBL(R2),R3 ;LOOKUP VALUE
SWAB R0 ;MOVE LOW BYTE OF CRC UP
CLRB R0 ;MASK OFF THE BOTTOM
XOR R3,R0 ;XOR IN THE LOOKED UP VALUE
SOB R1,L7 ;LOOP OVER BYTES
;----- PRINT CRC, DELIMIT AND LOOP
JSR PC,PRNW ;PRINT CRC, ALREADY IN R0
CMP R4,#DBUF+10000 ;END OF DISK BUFFER?
BEQ L8 ;IF SO, EXIT LOOP
MOV #DELIM2,R0 ;ELSE POINT TO POST-CRC DELIMITER
JSR PC,PRNSTR ;PRINT IT
BR L6 ;GO DO ANOTHER SECTOR
;----- DELIMIT END OF LINE, LOOP
L8: MOV #CRLF,R0 ;POINT TO LINE DELIMITER
JSR PC,PRNSTR ;PRINT IT
ADD #10,R5 ;MOVE AHEAD 8 SECTORS
CMP R5,#11410 ;AT END OF PACK?
BLT L3 ;IF NOT, GO DO THE NEXT 8
HALT ;ALL DONE!
;----- PRINT A WORD IN OCTAL
PRNW: MOV #6,R2 ;SIX DIGITS TO PRINT
MOV R0,R1 ;MOVE OUTPUT WORD OVER TO R1
CLR R0 ;RESET OUTPUT CHAR
ASHC #1,R0 ;AND SHIFT IN MSB TO START
L9: ADD #60,R0 ;MAKE INTO ASCII DIGIT
MOV R0,@#XBUF ;PRINT IT
TSTB @#XCSR ;CHECK IF XMIT DONE
BPL .-4 ;LOOP UNTIL SO
CLR R0 ;RESET OUTPUT CHAR
ASHC #3,R0 ;SHIFT IN NEXT THREE BITS
SOB R2,L9 ;LOOP DIGITS
RTS PC ;RETURN TO CALLER
;----- PRINT A NULL-TERMINATED STRING
PRNSTR: MOVB (R0)+,@#XBUF ;PRINT ONE CHAR AND ADVANCE
TSTB @#XCSR ;CHECK IF XMIT DONE
BPL .-4 ;LOOP UNTIL SO
TSTB @R0 ;CHECK IF END OF STRING
BNE PRNSTR ;LOOP IF NOT
RTS PC ;ELSE RETURN TO CALLER
DELIM1: .ASCIZ /: / ;POST-SECTOR DELIMITER
DELIM2: .ASCIZ / / ;POST-CRC DELIMITER
CRLF: .ASCIZ <15><12> ;LINE DELIMITER
ERRSTR: .ASCIZ /ERROR: / ;ERROR INDICATOR
CRCTBL: .BLKW 400 ;CRC LOOKUP TABLE
DBUF: .BLKW 4000 ;DISK DATA BUFFER
.END START
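For comparing against an image file on the host side, the same per-sector checksum can be modeled in a few lines of Python. This is a sketch rather than a tested companion to the program above: it assumes the image is a flat file of 512-byte sectors, and the function names are mine. The table construction and update step mirror the polynomial (octal 10041, i.e. 0x1021), the zero initial value, and the update formula given in the listing's comment.

```python
POLY = 0o10041          # 0x1021, same polynomial as the listing
SECTOR_BYTES = 512      # 256 words per sector


def make_table(poly=POLY):
    """Build the 256-entry lookup table, one entry per possible byte value."""
    table = []
    for index in range(256):
        value = index << 8              # index goes in the upper byte
        for _ in range(8):
            value <<= 1
            if value & 0x10000:         # MSB shifted out: XOR in the polynomial
                value ^= poly
            value &= 0xFFFF
        table.append(value)
    return table


CRCTBL = make_table()


def crc16(data, crc=0):
    """CRC = CRCTBL[((CRC >> 8) ^ CH) & 255] ^ (CRC << 8), for each byte CH."""
    for ch in data:
        crc = CRCTBL[((crc >> 8) ^ ch) & 0xFF] ^ ((crc << 8) & 0xFFFF)
    return crc


def sector_crcs(image):
    """Return one CRC per 512-byte sector of a flat image (bytes object)."""
    return [crc16(image[i:i + SECTOR_BYTES])
            for i in range(0, len(image), SECTOR_BYTES)]
```

Formatting each value with `format(crc, "06o")` gives six octal digits, the same width the PRNW routine prints, which makes a line-by-line comparison with the console output straightforward.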
Running this indicated the RSTS pack was in good shape, and not corrupt. So, maybe I have had a lurking hardware bug in my memory system (a 256KB MS11-L), which never tripped up RT-11 and so has to date gone undiagnosed?
At this point, Noel suggested on cctalk that I give Release 6 Unix a try as well, and see if it suffers similarly. Worth a shot! Out of time for now, and back to the day gig tomorrow after holiday break. Happy new year, all!
PDP-11/45 Behaving Badly
Sun 09 December 2018 by Fritz Mueller
Wow, a year to the day since the previous post here! Not a lot of PDP-11 work this past year, with lots of other stuff like home improvements going on, but a few things worth catching up on here.
Mainly, I got brave this past year and decided to actually rent a van and take the 11/45 out of the basement to the VCF West show at the Computer History Museum in Mountain View. This was a lot more physical work than I had anticipated. Working on this thing a piece at a time, sitting in one place in the basement, you get kind of used to it and forget how much iron it actually is... But breaking it down, loading it into a van, unloading into the show, reassembling, then reversing the whole process at the end of the show is a stark reminder, both of the size of the machine and of my advancing age, ha! A huge thank-you to my workmate Brian, who selflessly gave up a weekend, a vacation day, and some mileage on his back to give me a hand. He has already informed me that "the answer for next year is 'no'." :-)
I suppose I should have expected it, but in the course of transportation to the show something shook loose resulting in a machine that wouldn't boot RT-11 when reassembled on the show floor (stupid bumpy rental van!) So my show became a two day live-troubleshooting exhibit. This was fine, and I think a lot of folks had fun jumping in and helping with troubleshooting (thanks, all!) There was a lot of interest and reminiscence about the machine and I met a lot of nice people. Still, a little disappointing, because I really had wanted people to be able to sit down and use the machine, and also because my head ended up in the machine the whole time I really didn't get to see the rest of the show or talk to other people about their exhibits! Ah well. In the end I did cajole a successful boot out of it, 15 mins. before the show closed, so at least a couple people got to sit down and play Adventure. Placed 3rd in the restoration category :-)
So, what went wrong? At the show I managed to isolate the problem to something intermittent related to interrupts from the RK11-C controller. I was still able to boot the RKDP diagnostic pack, since its bootstrap and monitor make very conservative use of processor and device interface features. Running through the diagnostics, managed to narrow down the problem to RK11-C completion polling after overlapped seeks. I guess RT-11 makes use of this feature.
I got the machine home and reassembled, and verified that the problem was still manifesting. Then many months passed, until I found some time to dig deeper into the problem just last night. The relevant failing diagnostic is ZRKK test 37, and the output is:
DRIVE 0
RK11 DIDN'T INTRUPT AFTER SK COMPLETED
PC RKCS RKER RKDS
014476 000310 000000 004713
SCP DIDN'T SET AFTER SEEK WAS DONE
PC RKCS
014526 000310
RK11 DIDN'T INTRUPT AFTER SK COMPLETED
PC RKCS RKER RKDS
014476 000310 000000 004712
SCP DIDN'T SET AFTER SEEK WAS DONE
PC RKCS
014526 000310
TIMOUT,PC=004536
And the relevant bit of the diagnostic listing:
014362 2$: MOV RKVEC,R1
014366 MOV #3$,(R1)+ ;SET UP VECTOR ADRES FOR RK11 INTERUPT
014372 MOV #340,(R1) ;SET UP PSW ON INTERRUPT
014376 BIS #40,@RKDA ;ADRES CYLINDER #1
014404 MOV #111,@R0 ;SEEK, GO WITH IDE SET
014410 WAT.INT ,300 ;WAIT FOR THE DRIVE TO
;INTERRUPT AFTER ADRES WAS RECVD
;WAITING TIME= 1.4 MS FOR 11/20
;280 US FOR 11/45
;ERROR, IF INTERUPT DID NOT OCCUR
;BY NOW
014414 MOV #BADINT,@RKVEC ;RESTORE UNEXPECTED RK11 INTERRUPT
014422 MOV @R0,$REG0 ;GET RKCS
014426 ERROR 75 ;INTERRUPT DID NOT OCCUR AFTER
;SEEK WAS INITIATED WITH IDE SET
014430 BR 3$+4
014432 3$: CMP (SP)+,(SP)+ ;OK, IF RK11 INTERRUPTED TO THIS
;RESTORE STACK POINTER (FROM RK11 INTERRUPT)
014434 CMP (SP)+,(SP)+ ;RESTORE STACK POINTER (FROM
;WAT.INT)
014436 MOV #5$,@RKVEC ;SET UP NEW VECTOR ADRES FOR RK11
014444 BIT #20000,@R0 ;IS SCP CLEAR
014450 BEQ 4$ ;YES, BRANCH
014452 MOV @R0,$REG0 ;GET RKCS
014456 ERROR 76 ;SCP SET BEFORE SEEK TO LAST
;CYLINDER WAS DONE
014460 4$: WAT.INT ,56700 ;WAIT FOR DRIVE TO INTERRUPT
;AFTER SEEK WAS COMPLETED
;WAITING TIME=180 MS FOR 11/20
;36 MS FOR 11/45
014464 MOV #BADINT,@RKVEC :IT'S AN ERROR IF BY THIS TIME
;INTERRUPT HAS NOT OCCURERED
014472 JSR PC,GT3RG ;GO GET RKCS, ER, DS
014476 ERROR 77 ;RK11 DID NOT INTERRUPT AFTER SEEK (TO
;LAST CYLINDER) WAS DONE WITH IDE SET
014500 BR 5$+2
014502 5$: CMP (SP)+,(SP)+ ;OK, IF RK11 INTERUPTED TO THIS AFTER
;SEEK WAS COMPLETED. RESTORE
;STACK POINTER (FROM RK11 INTERRUPT)
014504 CMP (SP)+,(SP)+ ;RESTORE STACK POINTER (FROM
;WAT.INT)
014506 MOV #BADINT,@RKVEC ;RESTORE RK11 INTERRUPT VECTOR ADRES
;FOR UNEXPECTED INTERUTS
014514 BIT #20000,@R0 ;DID SCP BIT SET?
014520 BNE 6$ ;YES, BRANCH
014522 MOV @R0,$REG0 ;GET RKCS
014526 ERROR 53 ;SCP DID NOT SET AFTER RK11 INTERRUPTED
;INDICATING SEEK WAS
So, since we don't hit error 75 at 014426 (and since the previous test, #36, in this diagnostic is passing), the RK11 here, unlike in some previous issues, is able to generate interrupts, and the 11/45 CPU is fielding them. The issue seems related to the seek completion polling circuitry on the RK11.
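The RKCS values in the diagnostic output can be decoded by hand, but a small helper makes the reading explicit. The bit positions below are taken from the constants used in the listings on this page (GO in bit 0, the function code in bits 1-3 with 2 = read and 4 = seek, IDE = 100, RDY = 200, SCP = 20000, error = 100000, all octal); the helper itself and its field names are only an illustration.

```python
# RKCS bit masks, as used by the listings on this page (octal).
GO, IDE, RDY, SCP, ERR = 0o1, 0o100, 0o200, 0o20000, 0o100000
FUNCTIONS = {2: "read", 4: "seek"}      # only the codes seen in the listings


def decode_rkcs(value):
    """Split an RKCS value into the flags and function code discussed above."""
    func = (value >> 1) & 0o7
    return {
        "go": bool(value & GO),
        "function": FUNCTIONS.get(func, func),
        "ide": bool(value & IDE),
        "rdy": bool(value & RDY),
        "scp": bool(value & SCP),
        "error": bool(value & ERR),
    }


if __name__ == "__main__":
    print(decode_rkcs(0o000310))
```

For 000310 this reports the controller ready with interrupts enabled, the last function a seek, and SCP and error both clear, which is exactly the state the diagnostic is complaining about.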
This circuitry is described in section 3.3.2 of the RK11-C manual, and is detailed in engineering drawings D-BS-RK11-C-12, sheets 1 and 2. When a seek or reset is in progress for any drive and the IDE bit in the controller is set, the controller will poll all drives for completion when it is otherwise idle. When polling is active, a pulse train which drives a count through the polled drives should be visible at B27F2:
A quick look with the 'scope shows no joy here. This clock is initiated by signal POLL, which doesn't appear to be asserted. Checking the origin of that signal takes us to B26 and A26:
Hmmm, one of these gates (the inverter at A26) is one that had failed and that I had repaired sometime last year... Reseated the socketed replacement on A26, reloaded the diagnostic, but still no go. Well, at least it wasn't my repair job! Went ahead and pulled A26 and B26 and bench tested the gates. The 8-input NAND that outputs to B26J2 does look fishy. Pulled and socketed the piece, and put a replacement and some spares on order at Jameco where I can pick them up on my way in to work tomorrow. All for now!
PDP-11/45: LA30 repair IV
Sat 09 December 2017 by Fritz Mueller
Received replacement components for the blown G380 solenoid driver channel. After this repair, all pins are firing and printing correctly. Calibrated left margin. Checked pin drive signal, which was within specifications and required no adjustment.
Went to check carriage return pulse timing calibrations, but as it turns out the G396 clock accelerator card in this LA30 has not had ECO 2 applied and therefore has no timing calibration pots. Carriage return seems to be functioning correctly and reliably after the left margin adjustment in any case.
Inspected and cleaned the M7910 interface card; it appears to be in good shape. Rejumpered the base address and interrupt vector for console operation. Slotted it into the PDP-11 in place of the DL11 I had been using up until now, and cabled up to the LA30. Booted to the M9301 monitor and then on into RT-11, and everything seems to be working fine! Here's a short video of the RT-11 boot, followed by the start of a session of Adventure:
PDP-11/45: LA30 repair III
Sat 02 December 2017 by Fritz Mueller
Digging in on the flip-flops identified as potentially problematic in the previous post, found that E5 had failed. Pulled, socketed, and replaced; character generator now correctly clocks all five character columns:
After this repair, characters were printing full width, but two problems remained: about half of the characters printed in response to typing on the keyboard were the wrong character, and the top row was not printing at all on any character:
Looking at the incorrect characters problem first, it was clear that bit 4 was not being received by the character generator correctly. I was a bit worried that the SMC KR2376-17 scanner/ROM on the keyboard assembly might be at fault, since Mattis had had some trouble with his. This is a pretty cool part; a combined scanner and code translator, with an internal oscillator, rollover logic, debounce delay, and flexible interfacing:
...not to mention the very cool vintage ceramic/gold packaging (see below.) Fortunately, inspection with an oscilloscope showed that the outputs from the scanner were just fine; chasing downstream, the problem was found to be just a loose pin (SS) on the keyboard cable Berg connector. With that sorted, we now have this:
For the final issue with the top row not printing, verified that the problem followed a particular G380 solenoid driver card when swapping them around, and that with a functional G380 in the appropriate backplane slot pin 1 fires and prints correctly. Inspection of the problematic G380 revealed a failed power transistor and blown associated micro-fuse; replacement parts on order.
For the ribbon advance issue, I pulled the ribbon motors and disassembled their top-side reduction gear cases in order to gain access to the upper rotor bearings. Cleaning and lubrication of these bearings, plus a few more taps with a mallet after reassembly, achieved an improved bearing alignment. With the increased output torque, the ribbon now advances reliably.
Other minor items: Replacement vibration isolators arrived, and were installed. Threaded inserts in the fiberglass top shell that had pulled out were reattached with epoxy.
Have some more travel coming up for work, so may not be able to get back to this for a bit. Next steps will be repair of the failed solenoid driver channel, calibrations, then any debug necessary on the M7910 interface card for the PDP-11.
PDP-11/45: LA30 repair II
Sun 26 November 2017 by Fritz Mueller
Okay, first thing to debug today is the ready light. This is lit by RDY LITE L, pin A16D2 (lower right of sheet M7721-0-1 in the LA30 engineering drawings). Logic probe showed this was correctly asserted low. Pulled the lamp and checked it with bench supply, and it checked fine. Verified +10 and ground at the lamp socket as well, so why isn't it lit? Turns out it is polarized, and the socket is soldered in backwards (?!). Corrected the socket, and ready light is working.
Noticed that ribbon is stalling occasionally. Ribbon tension seems to be overcoming the clutch on the take-up side. It seems tensioning drag on the inactive side is too high. Not sure what to do about this one yet; there is not much adjustable within the clutches, and the service manual only recommends replacement if they are out of spec (yeah, good luck finding one!)
Repeated the experiment with loopback jumper (A15R2 to A15C2). Turns out I had miscounted backplane pins the first time. With jumper placed correctly, I now get a printing response to the keyboard. Not quite right, but definite progress:
Here I had typed :;L, and then some other letters. You can see evidence of correct pins firing for the first three characters, though either head movement or pin timing is off. Letter spacing appears more or less correct for 80 characters per line.
Okay, hooked up the logic analyzer, and started to take a look at the character generator clocking to see if all the columns are being clocked out correctly. The logic analyzer shows a malfunction consistent with the print behavior: character column clocking resets after two columns rather than proceeding through all five:
This signaling is mediated by a ripple counter on the M7724 character generator card:
So it looks like one or more of these 7474 dual flops has failed. I note on my chargen board that these are early 70's Nat Semi parts; Mattis had a very similar issue (search "7474" on this page) on his LA30 chargen with the same parts.
All I have time for this weekend; next time I'll get the chip clip on these for a closer look, then pull and replace the baddies.
PDP-11/45: LA30 repair
Sat 25 November 2017 by Fritz Mueller
Once again it's been a little while since I've had time to work on PDP-11 stuff or put any updates here; the day gig has been pretty intense lately.
Recent efforts have been focused on restoration of an LA30 printing terminal. This was really filthy (including a mouse nest, yuck) so in addition to the usual electronics work it had to be completely disassembled for proper cleaning and lubrication.
First off, the H735 power supply. This is a pretty straightforward supply, but has an oil cap in the ferroresonant circuit that is listed as a PCB-containing component; replaced this with a modern equivalent. Also pulled and reformed all the large electrolytic caps on the bench per usual. No real trouble or surprises with this supply.
Logic assembly looks good; everything is there (mine is an LA30-P, the parallel interface version) with no obvious scorches or toast. Backplane intact and chip pin corrosion doesn't look too bad. Needed some compressed air to blow out all the dust bunnies.
Print head also looks to be in decent shape; all of the pins fire freely when activated momentarily with a 15 VDC bench supply.
Most of the work here was involved in disassembling the top section of the terminal, including the keyboard and carriage assemblies, where most of the filth had accumulated. There are a lot of parts and pieces, with castings, bearings, machined shafts, stainless and brass throughout. This thing was really well built!
The ribbon-like paper drag springs were all either torn, mangled, or cracked/cracking; I fashioned some replacements by cutting and drilling 1/2" x 3" strips of .002 steel shim stock. The rubber shock isolation mounts for the carriage assembly had also hardened and decayed. These look very close to the original; I put some on order. Replaced the bumpers on the carriage rails with some less expensive 3/8" chassis grommets. After cleaning, hit the slide rails with dry film silicone lubricant (Molykote 557) and pivot and carriage cam plates pins with a good lithium grease (Molykote BR2 Plus).
Ribbon drive motor bearings were very gummy, and one of the ribbon motors had seized. These motors are quite serviceable though; you can pull the bottom bearing cap and remove the rotor, clean the rotor shaft and bearings of old lubricants, apply fresh and reassemble. These are self-aligning bearings, so don't forget to give the assembly a few taps all around with a mallet after reassembly to shake them into true.
Consumables: compatible ribbons are still plentiful on eBay, so I ordered a few. Paper is an unusual width at 9-7/8". A few vendors on Amazon still seem to carry it, but it might be wise to lay in stock of a carton or two while it is still obtainable.
Fired it up after reassembly. No smoke (good!) and it feeps once reassuringly at power on. Ribbon motors and clutches seem to be working, and the ribbon advances. Activating the ribbon reverse switches manually reverses the ribbon movement per expectation.
If the carriage is closed with paper loaded, the print head will home left and then move right after about a second or so (per expectation; "last character visibility" feature) and ribbon advance halts. Local line feed from the front panel switch works. All of this indicates a good deal of the logic and the motor drive are already working correctly.
However: the front panel "ready" indicator does not light, and a quick loopback test (jumper A15R2 to A15C2 on the backplane) does not print any characters in response to the keyboard. Will pick up here with logic debug next time.
PDP-11/45: VT52 Keyboard Repair
Sat 15 July 2017 by Fritz Mueller
The VT52 had a broken ESC key, and with RT-11 up and running I was motivated to dig in and fix it (you need that ESC key if you are going to run the K52 editor). Pulling the keycap and giving things a look over, the leaf contacts and the plastic plunger that activate the key looked fine. Need to get at the keyboard module itself to troubleshoot, and on a VT52 that means opening the thing all the way up and pulling the main boards. In we go...
Extracted the keyboard module, powered it from my bench supply, and used a breadboard and some jumpers to drive the key select decoders. Key closure on the back of the module was intermittent, but some flexure of the entire keyboard PCB seemed to be affecting it. Replaced/reflowed the solder on the back of the key switch and that seems to have fixed it.
Back together, working well now. Test drove it for a while under RT-11...
PDP-11/45: Data Recovery
Sun 09 July 2017 by Fritz Mueller
Okay, the system is working well enough now to start attempting recovery and archive of the dozen or so RK05 packs that I have on-hand. These were all obtained (along with the RK05 drives, controller and power supply) in a surplus auction downstream of Stanford's Hansen Experimental Physics Lab, sometime in the early '90s.
The packs date from the mid-70's to early-80's, and the labels indicate contents related to experiments and research projects taking place at the lab at that time. One particular pack seems associated with the Stanford Gravity Wave Project, which built early resonant mass detectors. Other packs labeled "FEL" would be related to early free-electron laser research. Many of the names on the packs are readily found on related scientific publications from the time.
The process for dealing with these packs involves opening them up for inspection and cleaning before mounting them, with hopes of avoiding beating up or destroying the drive heads and/or media with head crashes. Much has been written about pack cleaning on the classiccmp mailing lists and in the vcfed forums, but briefly the process involves some clean-room gloves, lint-free wipes, and anhydrous isopropyl alcohol. The outside of the pack is first cleaned of dust and grime, then the packs are opened and inspected and the disk surfaces given a scrub with the wipes and alcohol. If a pack seems in good enough shape to mount, it is spun up and run in a drive with head load disabled for about a half-hour. This gets a good air-flow over the disk to blow out remaining loose particulates and also lets the disk come up to thermal equilibrium. After that the heads are loaded, with a finger standing by on the unload switch in case there are any bad noises...
I've already been dealing with two of the packs extensively during the restoration: one is an RKDP diagnostics pack, and the other was a backup pack of same. I was able to capture a complete, error-free image of the RKDP pack using PDP11GUI. This seems to be an earlier version of this disk than what is already available on bitsavers; I've sent the bits to Al Kossow, but as I understand it his project has a big backlog at the moment so it may be a while before he can consider my submission. In the meantime, for those interested, the disk image is available here.
The RKDP backup disk was used as a test pack during the RK11/RK05 restoration work, and thus was overwritten by the RK05 diagnostic read/write tests. It now contains a bootable RT-11 image, written via PDP11GUI. Mixed results on the other packs so far: some have had severe head crashes (see pic below) or are otherwise damaged to the point that I am hesitant to mount them. Some have been mysteriously unreadable. It looks like I can expect about 50% recovery. Results so far are tabulated here. I hope to be able to make other recovered images available soon, but since they contain original research materials I am trying to contact the authors for permission first.
Serial # | Label | OS | Notes |
---|---|---|---|
ZO 50511 | MAINDEC-11-DZZAA-J-HB 9/21/74 M XXDP RKDP RK11 DIAGNOSTIC PACKAGE | DZQUD-A RKDP-RK11 MONITOR | [1974] MAINDEC diagnostics for PDP-11/40/45 CPU/MMU/FPU, MS11, DL11, DR11, RK11, LC11/LA30, KW11-L/P. Full recovery. |
B1-75814 | RKDB Backup | unknown | [unknown] Presumed to be backup of ZO 50511; used as test disk and overwritten. |
B1-28320 | Gravitational Radiation Experiment Boughn, Hollenhorst, Paik, Sears, Taber MSA | DOS/BATCH V09-20 | [1976-77] Fortran and MACRO-11 codes, mostly calculations relating to resonant mass detector design. Full recovery. |
AD-21279 | BLAZQUEZ RT-11 AUG 83 | RT-11FB (S) V04.00L | [1983] Fortran and MACRO-11 codes relating to image processing and display. Device driver code for DeAnza Systems ID-2000 display and Calcomp plotter. Names: Ken Dinwiddie (DeAnza codes), Art Vetter (Calcomp codes). Full recovery. |
BAK 9069 A | W. COLSON | DOS/BATCH V9-20C | [1977-78] Full recovery. |
AE 61745 | FEL L.ELIAS | DOS/BATCH V9-20C | [1974-78] Head crash on read (ouch!) Partial recovery. Unrecovered data looks to be mostly OS files; may be patchable. |
ZO 50399 | TRANSPORT + DATA 1/18/80 | DOS/BATCH V9-20C | [1980-83] Minor corrosion spot. Partial recovery. Data files and Fortran programs. |
E172140 | M. O'Halloran Ray Tracing R. RAND BBU PROGRAMS | TBD | [TBD] |
B1-45441 | RT 11 | TBD | [TBD] Minor head crashes on media. |
B1-24056 | RDAS | DOS/BATCH V9-20C | [1970-78, 1983] Minor head crashes on media. Partial recovery. FEL related Fortran codes. |
AE 20116 | DEACON FEL | unknown | [unknown] Many corrosion spots on media; did not mount. |
19177 | Transport DOS/BATCH-9 V 20C | unknown | [unknown] Major head crash on media; did not mount. |
B1-44898 | RDAS9 - V20C | unknown | [unknown] Medium head crashes on media; did not mount. |