PDP-11/45: Diagnostics VII - KT11 MMU

Sat 13 August 2016 by Fritz Mueller

Moving on to the KT11 MMU: running the first diagnostic in the CKT suite, I got error reports at 010340, 010560, and 011000. According to the diagnostic listings, these particular tests exercise D-space translations from kernel, supervisor, and user modes. The D-space logic is largely on the SSR module, so I swapped it out for a spare. After that, I was able to pass the full suite of basic MMU tests:

Diagnostic	BEL	Description	Status
CKTAB0.BIC	017412	KT11-C basic logic part 1	pass
CKTBC0.BIC	015674	KT11-C basic logic part 2	pass
CKTCA0.BIC	023304	KT11-C access keys	pass
CKTDA0.BIC	016360	KT11-C MTPD and MTPI	pass
CKTEB0.BIC	015310	KT11-C MFPD and MFPI	pass
CKTFD0.BIC	016422	KT11-C aborts	pass
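As background on what these tests exercise: in the 11/45's KT11-C, each of kernel, supervisor, and user mode has its own set of eight page address registers (PARs) for I-space and another eight for D-space. The Python sketch below covers only the relocation arithmetic, assuming 18-bit physical addresses. It ignores the page descriptor registers (length, expansion direction, access keys) entirely, and the helper names and PAR values are made up for illustration.

```python
# Sketch of KT11-style relocation (PAR arithmetic only; PDR checks omitted).
# Helper names and register values are hypothetical.

MODES = ("kernel", "supervisor", "user")
SPACES = ("I", "D")


def relocate(va: int, par: int) -> int:
    """Add the low 13 bits of a 16-bit virtual address to PAR << 6."""
    offset = va & 0o17777                       # block number + displacement
    return ((par << 6) + offset) & 0o777777     # wrap to 18 physical bits


def translate(va: int, mode: str, space: str, pars: dict) -> int:
    """Pick one of eight PARs by VA bits 15-13 (the APF), then relocate."""
    apf = (va >> 13) & 0o7
    return relocate(va, pars[(mode, space)][apf])


# Eight PARs per (mode, space) pair, all zero except one user D-space page.
pars = {(m, s): [0] * 8 for m in MODES for s in SPACES}
pars[("user", "D")][1] = 0o2000                 # page 1 -> physical 0o200000

print(oct(translate(0o020100, "user", "D", pars)))  # 0o200100
print(oct(translate(0o020100, "user", "I", pars)))  # 0o100
```

The same virtual address lands in different physical locations for user I-space and user D-space; it is the D-space side of this translation that the failing CKT tests were checking.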

Put the failing SSR module in the repair queue along with the other failed spares I've identified along the way; I'll return to troubleshoot/repair it later. For now, things are looking pretty good with the CPU! I still need to run and pass the more heavyweight diagnostics: the 11/45 instruction exerciser, the KT11 exerciser, and the MS11-L exerciser. All three of these still seem to halt, but they are quite complicated compared to the rest of the diagnostics, making use of additional peripherals, etc., so I'll need to study them a bit before I can be sure I am using them correctly. I have also skipped the power fail diagnostics for now, since I will need to restore some core memory for these to work correctly.

Next up will be to work on the FPU...

PDP-11/45: Diagnostics VI - GRA ALU PROM repair

Sun 07 August 2016 by Fritz Mueller

Data I/O Series 22 PROM programmer from eBay showed up, as well as some unprogrammed Signetics 82S123 PROMs. Punched in the subsidiary ALU control ROM contents from the listing on GRAK in the 11/45 engineering drawings and burnt a new PROM. Put a socket and the new PROM in place of the failed part on my original GRA, slotted it into the CPU, and success! Diagnostic CKBOA0 now passes. I will probably return to the other faulty GRA at a later point, as it is partially diagnosed and I like to have spares working and ready to go.

Next time I'll be moving on to the CKT series of tests for the KT11 memory management cards...

Data I/O Series 22 PROM programmer, arrived from eBay
GRA board with replaced ALU PROM

PDP-11/45: Diagnostics V - D0AA0-D0MA0, CKBOA0

Sun 31 July 2016 by Fritz Mueller

The day gig has been keeping me pretty busy for the last couple of weeks, but had some time to work on the PDP-11 again this weekend, so here's an update.

Looking a little deeper at the diagnostics database over on retrocmp.com, I realized that I had skipped the entire set of generic 11-family "D0" tests. Downloaded and ran these via PDP11GUI and they all passed. BEL character patch locations, as described previously, are summarized here for future reference:

Diagnostic	BEL	Description	Status
D0AA0.BIN	014212	Branch	pass
D0BA0.BIN	004336	Con branch	pass
D0CA0.BIN	005526	Unary	pass
D0DA0.BIN	016370	Binary	pass
D0EA0.BIN	010562	Rotate/shift	pass
D0FA0.BIN	017224	CMP equality	pass
D0GA0.BIN	013650	CMP non-equality	pass
D0HA0.BIN	013434	Move	pass
D0IA0.BIN	014126	Bit set clear test	pass
D0JA0.BIN	007472	Add	pass
D0KA0.BIN	007124	Subtract	pass
D0LA0.BIN	015722	Jump	pass
D0MA0.BIN	003250	JSR RTS RTI	pass

Of the "CKB" series of tests, CKBOA0 (11/45 states) is the only one that is not yet passing. Looking into this a little further, the first failing sub-test is T65:

010540 010701                  T65:    SCOPE                    ;
010542 012737  030000  177776          MOV     #PUM,@#PSW       ;KERNEL MODE, PREV USER MODE
010550 012706  000500                  MOV     #KPTR,KSP        ;SET KERNEL STACK POINTER
010554 012716  000700                  MOV     #UPTR,(KSP)
010560 106606                          MTPD    USP              ;SET USER STATCK POINTER
010562 005067  170110                  CLR     UPTR-2
010566 052737  140000  177776          BIS     #UM,@#PSW        ;USER MODE, PREV USER MODE
010574 106506                          MFPD    USP              ;PUSH USER STACK POINTER ONTO USER STACK
010576 042737  140000  177776          BIC     #UM,@#PSW        ;KERNEL MODE, PREV USER MODE
010604 106506                          MFPD    USP              ;PUSH USER STACK POINTER ONTO KERNEL STACK
010606 022716  000676                  CMP     #UPTR-2,(KSP)    ;CHECK THAT USER STACK POINTER WAS
010612 001401                          BEQ     .+4              ;PUSHED PROPERLY (ONCE)
010614 000000                          HLT                      ;ERROR!
010616 022767  000700  170052          CMP     #UPTR,UPTR-2     ;CHECK THAT USER STACK POINTER IS ON THE
010624 001401                          BEQ     .+4              ;USERS STACK
010626 000000                          HLT                      ;ERROR!
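The PSW constants in T65 are easier to follow once decoded. Below is a small Python helper (hypothetical, not part of the diagnostic) that extracts the current-mode field (PSW bits 15-14) and previous-mode field (bits 13-12), applied to the three PSW updates in the listing. The values used for PUM (030000) and UM (140000) are taken from the assembled words above.

```python
# Decode the PSW mode fields used by test T65 (PSW lives at 177776).
MODE_NAMES = {0b00: "kernel", 0b01: "supervisor", 0b11: "user"}  # 0b10 unused


def psw_modes(psw: int) -> tuple:
    """Return (current mode, previous mode) from PSW bits 15-14 and 13-12."""
    current = (psw >> 14) & 0b11
    previous = (psw >> 12) & 0b11
    return (MODE_NAMES.get(current, "illegal"), MODE_NAMES.get(previous, "illegal"))


psw = 0o030000                # MOV #PUM,@#PSW
print(psw_modes(psw))         # ('kernel', 'user')
psw |= 0o140000               # BIS #UM,@#PSW
print(psw_modes(psw))         # ('user', 'user')
psw &= ~0o140000 & 0o177777   # BIC #UM,@#PSW
print(psw_modes(psw))         # ('kernel', 'user')
```

So the first MFPD at 010574 runs with both current and previous mode set to user (pushing onto the user stack), while the second at 010604 runs in kernel mode with previous mode user (pushing onto the kernel stack).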

This runs amok on the MFPD instruction at 010574, which should push the user stack pointer onto the user stack. Instead, the user stack pointer is pushed to memory at an incorrect address: 010676 instead of 000676. This actually overwrites subsequent test code. Since the value pushed is 000700, a hard-coded loop is created that prevents the test from completing the pass even if resumed from halt.
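Why does a stray 000700 produce a loop? Decoded as an instruction, 000700 is an unconditional BR (opcode 000400) with an 8-bit offset of 300, which as a signed byte is -64 words. A quick Python check of where that branch goes when the word sits at 010676 (the helper names here are hypothetical):

```python
def to_signed8(byte: int) -> int:
    """Interpret an 8-bit value as two's complement."""
    return byte - 0o400 if byte & 0o200 else byte


def br_target(addr: int, word: int):
    """Return the destination of an unconditional BR at addr, else None."""
    if (word & 0o177400) != 0o000400:          # BR is 000400 + 8-bit offset
        return None
    offset = to_signed8(word & 0o377)          # offset counts words, not bytes
    return (addr + 2 + 2 * offset) & 0o177777  # relative to the updated PC


print(oct(br_target(0o010676, 0o000700)))  # 0o10500
```

So if execution reaches 010676, it is sent back to 010500, upstream of T65 at 010540, which is consistent with the loop described above.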

The relevant states in the microcode flow here are MFP.80, MFP.90, and MFP.10:

Stopping at T2 of MFP.10 using the KM11, I can see that the correct value 000700 was fetched to DR (as displayed by the console address lights), but the incorrect value 010676 is appearing at the output of the ALU/shifter (as displayed by the console data lights when set to data paths). Throwing the DAP card out on extenders and taking a look around with a logic probe revealed that the errant bit 12 is sourced from the ALU. At each slice of the ALU, function selectors S3-S0 are correct, CIN is correct, and the overall B-mux constant value "2" is correct. The errant bit is arriving at the ALU from the A-mux...
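The arithmetic that pins this down to a single bit is easy to check by hand, but here it is in Python for the record. This assumes the operation at this step is the stack push decrement (DR minus the B-mux constant 2), which matches the expected address of 000676; XORing the expected and observed values leaves exactly one differing bit.

```python
# Expected vs. observed ALU/shifter output at T2 of MFP.10 (values from above).
dr = 0o000700                       # user SP as fetched into DR
expected = (dr - 2) & 0o177777      # push: SP - 2, using the B-mux constant 2
observed = 0o010676                 # what the console data lights showed
bad = expected ^ observed           # bits that differ
print(oct(expected), oct(bad), bad.bit_length() - 1)  # 0o676 0o10000 12
```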

Chasing this upstream, A-mux selectors S1, S0 are correct, but the bad bit arrives at the mux input on GRAH SR12. Hmmm, maybe this is one of the things the "BAD" sticker on the GRA is referring to... Next step is to throw the GRA on the extender and chase the signal back towards SR and the register files. However, here I hit a snag: the M9301 monitor does not run correctly when the GRA is on the extender! That's pretty weird. Some investigation with the KM11 and some hand-toggled instructions revealed that at least the Z status bit is not set correctly/reliably when the card is on the extender. Some of the Z bit logic also lives on the GRA, so I can take a look at that, but I am now out of time for this weekend. Next time!

A few other miscellaneous notes in wrap-up:

I have been running with the spare GRA marked "BAD" because the first GRA I was using turned out to have a failed ALU subsidiary PROM. In the meantime I tracked down a PROM programmer and some compatible parts on eBay -- these should arrive sometime this week at which point I should be able to repair the original board and give it another try.
The uPB feature of my home-brew KM11 really doesn't work quite right. It often stops the machine at the requested micro-state but on the wrong instruction (seemingly skipping the first occurrence of the target state). This caused me a great deal of confusion today, as I was stepping through flows at a different program location than I had assumed, until I finally noticed the address lights on the console.
The ESC key on the VT52 is non-functional, making it impractical to use for RT-11. The key mechanism looks okay from the top (thanks to the vcfed forum for more helpful advice!). I think I'll need to pull the keyboard PCB and re-flow the solder on the affected mechanism as a next step.
Looking forward to checking out Vintage Computer Fest West sometime next weekend!

PDP-11/45: Diagnostics IV - CKBME0

Sun 17 July 2016 by Fritz Mueller

Some progress with the CKBME0 diagnostic mentioned previously. It seems the concern about how the test behaves wrt. preconditions of the serial interface was well-founded.

In order to debug more easily, I extracted the failing test and built a small loop around it, with a pass counter and display register update, etc. In the original test suite, a RESET instruction is executed immediately prior to the failing test, and it takes some time to come around the failing test on each pass, so I included a RESET and a delay loop in my test code as well. I then got failure modes and rates consistent with the original test suite.

The experiments previously described had indicated timing sensitivity (e.g. running on the RC maintenance clock at 50% clock speed changed the pass failure rate from ~50% to 100%), so I began to think more seriously about timing between the processor and the serial card, and about how the time taken to circulate the entire suite of tests could affect the precondition of the serial interface when entering the test on subsequent passes. A re-read of the DL11 documentation showed that the transmit data is also double-buffered: if the transmit shift register is empty, a character written to the output buffer will be latched into the transmit shift register, causing the output buffer to go ready again almost immediately.
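To make the double-buffering point concrete, here is a toy Python model of a double-buffered transmitter. It is not a model of the actual DL11 logic: the class name, the tick-based timing, and the instantaneous buffer-to-shifter transfer are all simplifying assumptions. It contrasts the two cases: with an idle shift register, a written character leaves the buffer at once, so READY returns immediately; with the shift register busy ("shift register full, buffer empty"), the next character sits in the buffer with READY clear until the current one finishes.

```python
class ToyDoubleBufferedTx:
    """Toy double-buffered serial transmitter; timing is in arbitrary ticks."""

    def __init__(self, char_time: int):
        self.char_time = char_time   # ticks to shift out one character (assumed)
        self.buffer = None           # output (holding) buffer
        self.shifter = None          # transmit shift register
        self.remaining = 0           # ticks left on the character being shifted

    @property
    def ready(self) -> bool:
        # In this model READY means only "the output buffer is empty".
        return self.buffer is None

    def write(self, ch: str) -> None:
        self.buffer = ch
        self._transfer()

    def _transfer(self) -> None:
        # An idle shift register takes the buffered character immediately.
        if self.shifter is None and self.buffer is not None:
            self.shifter, self.buffer = self.buffer, None
            self.remaining = self.char_time

    def tick(self) -> None:
        if self.shifter is not None:
            self.remaining -= 1
            if self.remaining == 0:          # character fully shifted out
                self.shifter = None
                self._transfer()


def ticks_until_ready(tx: ToyDoubleBufferedTx) -> int:
    ticks = 0
    while not tx.ready:
        tx.tick()
        ticks += 1
    return ticks


# Case 1: transmitter idle. A written character goes straight to the shifter.
idle = ToyDoubleBufferedTx(char_time=10)
idle.write(" ")
print(idle.ready)                 # True

# Case 2: shift register full, buffer empty. The next character must wait.
busy = ToyDoubleBufferedTx(char_time=10)
busy.write(" ")
ticks_until_ready(busy)           # returns 0; the buffer is already empty
busy.write(" ")                   # e.g. the space typed by TTY7A
print(busy.ready, ticks_until_ready(busy))  # False 10
```

In the model, the time until READY depends entirely on what the shift register was doing when the character was written, which is the kind of precondition dependence described above.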

I inserted the following code before the BIS/WAIT sequence in the original diagnostic (listed previously), which establishes consistent preconditions (shift register full, buffer empty) before the BIS. Success rate went to 100%:

        MOV     #40,@#177566    ;ENSURE XMIT SHIFT REGISTER HAS SOMETHING TO CHEW ON
L0:     TSTB    @#TTCSR         ;CHECK XMIT BUFFER
        BPL     L0              ;LOOP UNTIL READY, ENSURES INT IMMEDIATELY AFTER BIS

I then further verified that the unmodified original diagnostic suite passes 100% of the time if I turn the M7800 down to 4800 baud. Worth noting when trying to run these older diagnostics!

PDP-11/45: VT52 repair

Sat 16 July 2016 by Fritz Mueller

Replacement oscillator arrived for the VT52, so spent some time getting it going again. Things got much better with a stable timing chain, but some glitchiness remained -- tracked this down to the socketed microcode ROMs, which just required a reseat.

Here you can see the new oscillator fitted (silver rectangular can with tie-wrap). The microcode ROMs are the four socketed chips towards the right in the picture. Interestingly, the schematic I have calls for eight ROMs of half the size of the ones in here, and indeed you can see the unpopulated spaces for these on the board.

VT52 open for repairs with new clock oscillator installed

PDP-11/45: Diagnostics III - CKBME0

Sat 09 July 2016 by Fritz Mueller

Okay, digging into the CKBME0 traps diagnostic now in more detail. Here I've transcribed the source of the failing test from the older available diagnostic listing, then re-assembled it at the address matching the more modern binary. This makes it a little easier to follow along while debugging the newer binary:

013604 000230                          SPL     0               ;SET PRIORITY LEVEL 0
013606 012706  000500                  MOV     #STKPTR,%6      ;SET STACK PTR
013612 012737  013650  000064          MOV     #TTY7A,@#TPVEC  ;LOAD TTY INTERRUPT VECTOR
013620 012737  013644  000014          MOV     #TTY7B,@#BPTVEC ;LOAD 'T' BIT TRAP VECTOR
013626 005002                          CLR     %2              ;CLEAR INDICATOR
013630 052737  000100  177564          BIS     #100,@#TTCSR    ;ALLOW INTERRUPT INTERRUPT OCCURS AFTER
                                                               ;THIS INSTRUCTION & BEFORE NEXT
013636 000001                  WAIT1:  WAIT                    ;WAIT FOR AN INTERRUPT
013640 005202                          INC     %2              ;INCREMENT INDICATOR
013642 000000                          HLT                     ;ERROR! NO 'T' TRAP AFTER INTERRUPT
013644 000000                  TTY7B:  HLT                     ;ERROR! 'T' BIT TRAPPED OUT OF WAIT
013646 000424                          BR      TTY7EX          ;EXIT TEST
013650 012737  000040  177566  TTY7A:  MOV     #40,@#177566    ;TYPE SPACE CHAR
013656 012737  013674  000064          MOV     #TTY7C,@#TPVEC  ;REPOSITION TTY INT VECTOR
013664 012766  000020  000002          MOV     #20,2(6)        ;PUT 'T' BIT IN RETURN STATUS
013672 000006                          RTT                     ;RETURN TO WAIT WITH 'T' BIT SET
                                                               ;AND WAIT FOR TTY INTERRUPT WHEN NULL
                                                               ;CHARACTER IS TYPED
013674 012737  013716  000014  TTY7C:  MOV     #TTY7D,@#BPTVEC ;REPOINT 'T' BIT TRAP VECTOR AFTER
                                                               ;TTY HAS INTERRUPTED
013702 005037  177564                  CLR     @#TTCSR         ;DISABLE INTERRUPT ENABLE
013706 012737  000015  177566          MOV     #15,@#177566
013714 000006                          RTT                     ;RETURN TO INST FOLLOWING WAIT WITH 'T'
                                                               ;BIT SET
013716 000240                  TTY7D:  NOP
013720 012737  000016  000014  TTY7EX: MOV     #BPTVEC+2,@#BPTVEC;RESTORE VECTORS TO HALT AT
013726 012737  000066  000064          MOV     #66,@#TPVEC     ;VECTOR +2
013734 005302                          DEC     %2              ;CHECK INDICATOR
013736 001401                          BEQ     .+4
013740 000000                          HLT                     ;ERROR! DID NOT DO INC INST AFTER INTERRUPT

This test seems to be designed to return from an interrupt handler to a WAIT instruction, with the T bit set in the PSW and a serial xmit interrupt pending. It verifies that the WAIT still "waits" in this circumstance. It also verifies that a trace trap does occur after the immediately following INC instruction when the xmit interrupt subsequently terminates the WAIT.

One potential problem with this test concerns its apparent assumption that enabling the xmit interrupt will cause an immediate trap before the subsequent WAIT instruction. This is true if the serial transmitter is empty, but if the transmitter is ever full/busy when this code is entered, this assumption may not hold. Not sure yet if this will ever be a problem for this test given the surrounding code.

In any case, as currently written this routine fails about 50% of the time on my 11/45. The failure mode is that the processor sits at the WAIT instruction (address display 013640, PC+2). Intervention with the console halt switch (halt, then back to enabl) breaks the WAIT microcode loop; console cont then takes us to the halt at 013740 (address display 013742, PC+2).

The fact that the routine is tailing out through the halt at 013740 without hitting the halts at 013642 or 013644 is interesting; it implies that the second serial xmit interrupt, to TTY7C, has executed. This is verified by examining the 'T' bit trap vector (BPTVEC, location 000014) from the console after the test hangs up on the WAIT: in the failure case, it has already been repointed to TTY7D. So the failure mode seems to be that the return from the second xmit interrupt sometimes goes to the WAIT instead of the subsequent INC.

Here is the microcode flow around the WAIT instruction. The horizontal line across the top is the A fork:

Using the KM11, in the failure case I can see the T bit set and the microcode looping through states WAT.00, WAT.20, WAT.30, WAT.11, which seems expected. I have also verified that executing a WAIT without the T bit set loops through states WAT.00 and WAT.10.

Lastly, running on the RC maintenance clock at about half the usual clock frequency makes the failure case happen almost 100% of the time.

Next I'll need to learn more about the BRQ logic, and in particular the mechanism by which the second xmit interrupt nominally causes INTRF to be asserted. Understanding that should lead me to some things to check with the logic probe and analyzer...

PDP-11/45: Diagnostics II

Mon 04 July 2016 by Fritz Mueller

Investigated some of the halted diagnostics a bit today. CKBGB0 (SPL instruction test) was halting at 001404. Looking at the sources, the diagnostic was waiting at this point for a transmit interrupt from the DL11 that didn't seem to be arriving. Some troubleshooting turned up that the vector address on my DL11 was jumpered incorrectly. Fixed this, and the diagnostic now passes.

CKBME0 (11/45 traps test) is a bit more complicated. The halt address of 005320 here indicates that the floating point coprocessor is detected but not trapping per design. Pulled the floating point cards for now; the diagnostic now runs through several passes successfully but regularly hangs up at 013640. Hitting the halt switch when it is hung up displays 000342 in the address lights; then, after a couple of continues, it starts up again and runs a bunch more passes, but sooner or later hangs up at 013640 again with the same behavior. This behavior is a little more difficult to decode, because the diagnostic itself is more complicated and because the binary available from classicmp is a later revision than the available source code, so the addresses don't quite match up. So I'll need to spend a little more time reading the diagnostic sources and examining the disassembly in PDP11GUI to make sense of this one. And it looks like there will be some downstream work to debug the floating point unit as well; I haven't studied its design at all yet.

PDP-11/45: Diagnostics with PDP11GUI

Sun 03 July 2016 by Fritz Mueller

Okay, now that serial is straightened out, on to running diagnostic tests via PDP11GUI. PDP11GUI itself, as well as a thorough and useful database of DEC diagnostic programs, are available at http://www.retrocmp.com/tools.

Since my home computer is a MacBook, I had intended to run PDP11GUI under Wine. I ran into a problem with this where PDP11GUI under Wine could not generate its pre-processed machine description temp file, seemingly because of some incompatible behavior wrt. multiple backslashes in pathnames. Rather than fight with this too long, I just sprang for a Windows 10 license and installed a Windows 10 VM; it will come in handy for other Windows-only tools that have been tweaky to use under Wine as well.

With PDP11GUI up and running under Windows, results of the initial set of 11/45 CPU diagnostics were very encouraging! In summary:

Diagnostic	BEL	Description	Status
CKBAB0.BIC	002562	SXT instruction	pass
CKBBB0.BIC	003604	SOB instruction	pass
CKBCB0.BIC	007410	XOR instruction	pass
CKBDC0.BIC	007262	MARK instruction	pass
CKBEC0.BIC	002212	RTI/RTT instructions	pass
CKBFD0.BIC	002272	stack limit	pass
CKBGB0.BIC	001446	SPL instruction	halt 001404
CKBHB0.BIC	003762	11/45 registers	pass
CKBIB0.BIC	013746	ASH instruction	pass
CKBJA0.BIC	014722	ASHC instruction	pass
CKBKA0.BIC	014430	MUL instruction	pass
CKBLA0.BIC	011574	DIV instruction	pass
CKBME0.BIC	016000	11/45 traps	halt 005320
CKBNC0.BIC	004702	PIRQ instruction	pass
CKBOA0.BIC	013640	11/45 states	halt 000610

Note that these tests are written to output an ASCII BEL to the console on each successful pass. The terminal built into PDP11GUI doesn't sound when given a BEL, however, so it is convenient to patch the BEL literal in the programs to a visible character (e.g. 000052, ASCII '*') before running them. This may be done in the memory loader window in PDP11GUI after "Load" but before "Deposit all". The patch address I identified for each diagnostic is also listed in the table above for convenience.
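The patch itself is done by hand in PDP11GUI's memory loader window. Purely to illustrate what it amounts to, here is a Python sketch that treats a loaded image as a dict mapping word addresses to words (a hypothetical representation, not PDP11GUI's) and swaps the BEL literal for 000052 after checking that the word really is 000007. That check assumes the literal occupies a full word at the listed address; if a diagnostic stores BEL differently, the guard refuses to patch rather than overwrite the wrong word.

```python
BEL = 0o000007     # ASCII BEL
STAR = 0o000052    # ASCII '*', the visible replacement character


def patch_bel(image: dict, addr: int, replacement: int = STAR) -> None:
    """Overwrite the BEL literal at addr in a (hypothetical) memory image.

    Assumes the image maps word addresses to 16-bit words and that the BEL
    literal occupies a full word at the listed patch address.
    """
    word = image.get(addr)
    if word != BEL:
        raise ValueError(f"word at {addr:06o} is {word!r}, expected BEL (000007)")
    image[addr] = replacement


# A few patch addresses from the table above.
PATCH_ADDRS = {"CKBAB0.BIC": 0o002562, "CKBBB0.BIC": 0o003604, "CKBOA0.BIC": 0o013640}

image = {0o002562: BEL}          # stand-in for a loaded CKBAB0 image
patch_bel(image, PATCH_ADDRS["CKBAB0.BIC"])
print(oct(image[0o002562]), chr(image[0o002562]))  # 0o52 *
```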

So, out of this initial set of tests, only three halts to investigate. I'll be posting more information here as I look further into these.

Regarding the backplane SPC issue discussed in the previous post: Marty from the vcfed.org forum did some investigation of his 11/45, and reports that he does have +15V (actually +12V on his system) distributed to pin CU1 on slots 26-28. It seems to be wired over from slot 15 on his system, but it is not clear whether this was a factory wire, an ECO, or a user mod.

Marty's 11/45 also has no power distributed to CA1 on these slots, so it's really looking to me like the reference to that on page 5-10 of the 11/45 maintenance manual is a misprint. Thanks to Marty for checking all this out! I'd be curious to hear from any other 11/45 owners out there regarding wiring of CU1 and CA1 on slots 26-28 in their systems.

PDP11GUI hooked up and running; ready to start downloading and running diagnostics

PDP-11/45: Serial console and backplane SPC slots

Sun 26 June 2016 by Fritz Mueller

Hit a snag on the way to getting PDP11GUI hooked up: while the M9301 console emulator was working fine with the VT52, I could not get serial communication to my laptop (MacBook Pro + Keyspan USA-19HS USB serial) to work as expected. Some detective work showed that the voltages from the EIA output drivers on the DL11 were way out of whack (+3V for mark, which should have been a negative voltage). Somehow the VT52 was able to still make sense out of this signaling, though the laptop was not.

Some investigation of power to the DL11, which was sitting in one of the backplane SPC slots (26-28), showed that +15V was not distributed to pin CU1 of these slots, where the DL11 expects it. So that explained the bad driver output voltages. Moved the DL11 over to the DD11 expansion backplane, which does have +15V on that pin, and serial to/from the laptop started working fine.

So this raises a bit of a question about the SPC slots on the 11/45 backplane. Was EIA console serial from these slots ever supported? The configurations listed in the 11/45 engineering prints call out only DL11-A, the 20mA current loop version, which doesn't have EIA drivers and thus doesn't need the +15V supply, so maybe not. Was +15V distribution perhaps added to these slots in subsequent revisions or via an ECO? I'd like to track down a wire list for this or later revision 11/45 backplanes, and/or a comprehensive list of KB11-A ECOs, but so far haven't seen traces of either anywhere out there.

One other curiosity of these SPC slots that came up while investigating this: the power distribution table in the 11/45 maintenance manual, EK-11045-MM-007, page 5-10, implies that +15V should be distributed to the SPC slots on CA1. This is suspicious to me (maybe a typo?) because all other SPC pinouts that I have seen use this pin and CB1 as NPR in/out. And in checking my backplane, there is no power distribution to those pins. But slots 27 and 28 (Unibus B) do have their CA1 pins bridged to one another, and their CB1 pins bridged to one another, with what look like factory installed wire wraps. This also seems unusual for NPR/NPG. So, some mysteries remain about these slots...

In other news, the clock oscillator on the VT52 has given out, so that's down now until I can find a replacement. They are out of production and aren't easy to track down, but I do have one lead to follow so far.

Also, I pulled the suspected failed subsidiary ALU control ROM, tested it in isolation, and verified that it had indeed failed. This card is just a spare for me, but I'd like to go ahead and repair it since the fault is isolated. With some help from the classiccmp mailing list (thanks guys!) I have a recommendation for some vintage PROM programmers to stalk on eBay, and some compatible parts, that would allow me to blow a replacement and make the repair.

PDP-11/45: Running the M9301 console emulator

Thu 23 June 2016 by Fritz Mueller

Replacement DRAMs showed up. Pulled and replaced the two faulty ones on the MS11. Pic below -- you can see the replacements are socketed, and are the TI parts instead of the original ITT. Full address space is working now! Now that bank 0 is repaired, trap vectors can conceivably work.

Jumpered and configured a DL11-E serial card for use as console, slotted in an M9301-YB bootstrap terminator, connected up the VT52, powered up, and off it goes straight to the console emulator! That means the basic instruction set tests in the boot ROM are passing as well, which is great news.

Next step will be to hook up PDP11GUI and load some more in-depth diagnostics, in order to shake out any remaining bugs with the CPU and memory system. Will slot in the FPU at that point for testing and debug as well.

MS11 with two bad 4116 RAMs pulled, socketed, and replaced
VT52 attached, and running now in the M9301 bootstrap monitor!