Okay, digging into the CKBME0 traps diagnostic now in more detail. Here I've transcribed the source of the failing test from the older available diagnostic listing, then re-assembled it at the address matching the more modern binary. This makes it a little easier to follow along while debugging the newer binary:

013604 000230                          SPL     0               ;SET PRIORITY LEVEL 0
013606 012706  000500                  MOV     #STKPTR,%6      ;SET STACK PTR
013612 012737  013650  000064          MOV     #TTY7A,@#TPVEC  ;LOAD TTY INTERRUPT VECTOR
013620 012737  013644  000014          MOV     #TTY7B,@#BPTVEC ;LOAD 'T' BIT TRAP VECTOR
013626 005002                          CLR     %2              ;CLEAR INDICATOR
013630 052737  000100  177564          BIS     #100,@#TTCSR    ;ALLOW INTERRUPT INTERRUPT OCCURS AFTER
                                                               ;THIS INSTRUCTION & BEFORE NEXT
013636 000001                  WAIT1:  WAIT                    ;WAIT FOR AN INTERRUPT
013640 005202                          INC     %2              ;INCREMENT INDICATOR
013642 000000                          HLT                     ;ERROR! NO 'T' TRAP AFTER INTERRUPT
013644 000000                  TTY7B:  HLT                     ;ERROR! 'T' BIT TRAPPED OUT OF WAIT
013646 000424                          BR      TTY7EX          ;EXIT TEST
013650 012737  000040  177566  TTY7A:  MOV     #40,@#177566    ;TYPE SPACE CHAR
013656 012737  013674  000064          MOV     #TTY7C,@#TPVEC  ;REPOSITION TTY INT VECTOR
013664 012766  000020  000002          MOV     #20,2(6)        ;PUT 'T' BIT IN RETURN STATUS
013672 000006                          RTT                     ;RETURN TO WAIT WITH 'T' BIT SET
                                                               ;AND WAIT FOR TTY INTERRUPT WHEN NULL
                                                               ;CHARACTER IS TYPED
013674 012737  013716  000014  TTY7C:  MOV     #TTY7D,@#BPTVEC ;REPOINT 'T' BIT TRAP VECTOR AFTER
                                                               ;TTY HAS INTERRUPTED
013702 005037  177564                  CLR     @#TTCSR         ;DISABLE INTERRUPT ENABLE
013706 012737  000015  177566          MOV     #15,@#177566
013714 000006                          RTT                     ;RETURN TO INST FOLLOWING WAIT WITH 'T'
                                                               ;BIT SET
013716 000240                  TTY7D:  NOP
013720 012737  000016  000014  TTY7EX: MOV     #BPTVEC+2,@#BPTVEC;RESTORE VECTORS TO HALT AT
013726 012737  000066  000064          MOV     #66,@#TPVEC     ;VECTOR +2
013734 005302                          DEC     %2              ;CHECK INDICATOR
013736 001401                          BEQ     .+4
013740 000000                          HLT                     ;ERROR! DID NOT DO INC INST AFTER INTERRUPT

This test seems to be designed to return from an interrupt handler to a WAIT instruction, with the T bit set in the PSW and a serial xmit interrupt pending. It verifies that the WAIT still "waits" in this circumstance. It also verifies that a trace trap does occur after the immediately following INC instruction when the xmit interrupt subsequently terminates the WAIT.
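To make the MOV #20,2(6) step in the listing concrete, here is a small Python sketch of the PSW bit layout involved. The field positions follow the standard PDP-11 processor status word (priority in bits 7-5, T in bit 4, condition codes in bits 3-0); the function names and the tuple model of the stack frame are my own illustration, not anything taken from the diagnostic.

```python
# PDP-11 processor status word fields used by this test.
T_BIT = 0o20            # bit 4: trace trap enable
PRIORITY_MASK = 0o340   # bits 7-5: processor priority
CC_MASK = 0o17          # bits 3-0: N, Z, V, C


def decode_psw(psw):
    """Return (priority, t_bit_set, condition_codes) for a 16-bit PSW value."""
    priority = (psw & PRIORITY_MASK) >> 5
    t_bit_set = bool(psw & T_BIT)
    return priority, t_bit_set, psw & CC_MASK


def set_t_in_frame(frame):
    """Mimic MOV #20,2(6) on a hypothetical (saved_pc, saved_ps) frame.

    After an interrupt, 0(SP) holds the saved PC and 2(SP) the saved PS,
    so the handler overwrites the saved PS with 000020 (T set, priority 0).
    """
    saved_pc, _saved_ps = frame
    return (saved_pc, 0o20)


# Assumed example: the first interrupt arrives with the PC pointing at WAIT.
frame = set_t_in_frame((0o13636, 0o000000))
print(f"{frame[0]:06o} {frame[1]:06o}", decode_psw(frame[1]))
```

The RTT then pops this frame, so execution resumes at the saved PC with the T bit set and the processor back at priority 0.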

One potential problem with this test concerns the apparent assumption that enabling the xmit interrupt will cause an immediate interrupt before the subsequent WAIT instruction. This is true if the serial transmitter is empty (ready), but if the transmitter is still busy when this code is entered, the assumption may not hold. Not sure yet if this will ever be a problem for this test given the surrounding code.
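The condition being relied on can be written down as a tiny Python check. This is only a sketch of the DL11 transmitter status register bits (ready in bit 7, interrupt enable in bit 6); the function name is hypothetical.

```python
# DL11 transmitter status register (TTCSR at 177564 in the listing).
XMIT_READY = 0o200    # bit 7: transmitter buffer empty
XMIT_INT_ENB = 0o100  # bit 6: interrupt enable, set by BIS #100,@#TTCSR


def xmit_interrupt_requested(xcsr):
    """True when the transmitter is both ready and interrupt-enabled."""
    return bool(xcsr & XMIT_READY) and bool(xcsr & XMIT_INT_ENB)


# Transmitter idle when the BIS executes: interrupt is requested right away.
print(xmit_interrupt_requested(XMIT_READY | XMIT_INT_ENB))
# Transmitter still busy: no request until the character finishes shifting out.
print(xmit_interrupt_requested(XMIT_INT_ENB))
```

In the busy case the processor would reach the WAIT before the first interrupt arrives, which is the situation the test does not seem to anticipate.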

In any case, as currently written this routine fails about 50% of the time on my 11/45. The failure mode is that the processor sits at the WAIT instruction (address display 013640, PC+2). Intervention with the console halt switch (halt then back to enabl) breaks the WAIT microcode loop; console cont then takes us to the halt at 013740 (address display 013742, PC+2).

The fact that the routine is tailing out through the halt at 013740 without hitting the halts at 013642 or 013644 is interesting; this implies that the second serial xmit interrupt to TTY7C has executed. This is verified by examining the break trap vector from the console after the test hangs up on the WAIT -- in the failure case, it is already reset to point to TTY7D. So the failure mode seems to be that the return from the second xmit interrupt sometimes goes to the WAIT instead of the subsequent INC.

Here is the microcode flow around the WAIT instruction. The horizontal line across the top is the A fork:

Using the KM11, in the failure case I can see the T bit set and the microcode looping through states WAT.00, WAT.20, WAT.30, WAT.11, which seems expected. I have also verified that executing a WAIT without the T bit set loops through states WAT.00 and WAT.10.

Lastly, running on the RC maintenance clock at about half the usual clock frequency makes the failure case happen almost 100% of the time.

Next I'll be needing to learn more about the BRQ logic, and in particular the mechanism by which the second xmit interrupt nominally causes INTRF to be asserted. Understanding that should lead me to some things to check with the logic probe and analyzer...


Investigated some of the halted diagnostics a bit today. CKBGB0 (SPL instruction test) was halting at 001404. Looking at the sources, the diagnostic was waiting at this point for a transmit interrupt from the DL11 that didn't seem to be arriving. Some troubleshooting turned up that the vector address on my DL11 was jumpered incorrectly. Fixed this, and the diagnostic now passes.

CKBME0 (11/45 traps test) is a bit more complicated. The halt address of 005320 here indicates that the floating point coprocessor is detected but not trapping per design. Pulled the floating point cards for now; the diagnostic now runs through several passes successfully, but regularly hangs up at 013640. Hitting the halt switch when it is hung up displays 000342 in the address lights, then with a couple continues it will start up again, run a bunch more passes, but sooner or later hang up at 013640 again with the same behavior. This behavior is a little more difficult to decode because the diagnostic itself is more complicated, and also the binary available from classiccmp is a later revision than the available source code so the addresses don't quite match up. So I'll need to spend a little more time reading the diagnostic sources and examining the disassembly in PDP11GUI to make sense of this one. And it looks like there will be some downstream work to debug the floating point unit as well; I haven't studied its design yet at all.


Okay, now that serial is straightened out, on to running diagnostic tests via PDP11GUI. PDP11GUI itself, as well as a thorough and useful database of DEC diagnostic programs, are available at http://www.retrocmp.com/tools.

Since my home computer is a MacBook, I had intended to run PDP11GUI under Wine. I ran into a problem with this where PDP11GUI under Wine could not generate its pre-processed machine description temp file, seemingly because of some incompatible behavior wrt. multiple backslashes in pathnames. Rather than fight with this too long, I just sprang for a Windows 10 license and installed a Windows 10 VM; it will come in handy for other Windows-only tools that have been tweaky to use under Wine as well.

With PDP11GUI up and running under Windows, results of the initial set of 11/45 CPU diagnostics were very encouraging! In summary:

Diagnostic   BEL address   Description           Status
CKBAB0.BIC   002562        SXT instruction       pass
CKBBB0.BIC   003604        SOB instruction       pass
CKBCB0.BIC   007410        XOR instruction       pass
CKBDC0.BIC   007262        MARK instruction      pass
CKBEC0.BIC   002212        RTI/RTT instructions  pass
CKBFD0.BIC   002272        stack limit           pass
CKBGB0.BIC   001446        SPL instruction       halt 001404
CKBHB0.BIC   003762        11/45 registers       pass
CKBIB0.BIC   013746        ASH instruction       pass
CKBJA0.BIC   014722        ASHC instruction      pass
CKBKA0.BIC   014430        MUL instruction       pass
CKBLA0.BIC   011574        DIV instruction       pass
CKBME0.BIC   016000        11/45 traps           halt 005320
CKBNC0.BIC   004702        PIRQ instruction      pass
CKBOA0.BIC   013640        11/45 states          halt 000610

Note that these tests are written to output an ASCII BEL to the console on each successful pass. The terminal built in to PDP11GUI doesn't sound when given a BEL, however, so it is convenient to patch the BEL literal in the programs to a visible character (e.g. 000052, ASCII '*') before running them. This may be done in the memory loader window in PDP11GUI after "Load" but before "Deposit all". The patch address I identified for each diagnostic is listed in the table above as well for convenience.
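For anyone who would rather script the patch than edit it by hand, here is a minimal Python sketch of the same idea. The memory image is modeled as a plain dict from address to 16-bit word, which is purely my own assumption for illustration; it is not PDP11GUI's file format or API. The sketch also assumes the BEL literal sits in the low byte of the word at the listed address.

```python
BEL = 0o007    # ASCII BEL
STAR = 0o052   # ASCII '*'


def patch_bel(image, address, replacement=STAR):
    """Return a copy of image with the BEL literal at address replaced.

    image is a hypothetical {address: 16-bit word} dict. The low byte of
    the word is checked first so a wrong address is caught, not patched.
    """
    word = image[address]
    if word & 0o377 != BEL:
        raise ValueError(f"word {word:06o} at {address:06o} is not a BEL literal")
    patched = dict(image)
    patched[address] = (word & 0o177400) | replacement
    return patched


# Example using the patch address listed for CKBAB0 in the table.
image = {0o002562: 0o000007}
print(f"{patch_bel(image, 0o002562)[0o002562]:06o}")
```

The check on the old value is the useful part: the patch addresses differ per diagnostic and per revision, so refusing to patch anything that is not already a BEL avoids corrupting an instruction.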

So, out of this initial set of tests, only three halts to investigate. I'll be posting more information here as I look further into these.

Regarding the backplane SPC issue discussed in the previous post: Marty from the vcfed.org forum did some investigation of his 11/45, and reports that he does have +15V (actually +12V on his system) distributed to pin CU1 on slots 26-28. It seems to be wired over from slot 15 on his system, but it is not clear whether this was a factory wire, an ECO, or a user mod.

Marty's 11/45 also has no power distributed to CA1 on these slots, so it's really looking to me like the reference to that on page 5-10 of the 11/45 maintenance manual is a misprint. Thanks to Marty for checking all this out! I'd be curious to hear from any other 11/45 owners out there regarding wiring of CU1 and CA1 on slots 26-28 in their systems.


Hit a snag on the way to getting PDP11GUI hooked up: while the M9301 console emulator was working fine with the VT52, I could not get serial communication to my laptop (MacBook Pro + Keyspan USA-19HS USB serial) to work as expected. Some detective work showed that the voltages from the EIA output drivers on the DL11 were way out of whack (+3V for mark, which should have been a negative voltage). Somehow the VT52 was able to still make sense out of this signaling, though the laptop was not.

Some investigation of power to the DL11, which was sitting in one of the backplane SPC slots (26-28), showed that there was no distribution of +15V to pin CU1 of these slots where the DL11 was expecting it. So that explained the bad driver output voltages. Moved the DL11 over to the DD11 expansion backplane which does have +15V to that pin, and serial to/from the laptop started working fine.

So this raises a bit of a question about the SPC slots on the 11/45 backplane. Was EIA console serial from these slots ever supported? The configurations listed in the 11/45 engineering prints call out only DL11-A, the 20mA current loop version, which doesn't have EIA drivers and thus doesn't need the +15V supply, so maybe not. Was +15V distribution perhaps added to these slots in subsequent revisions or via an ECO? I'd like to track down a wire list for this or later revision 11/45 backplanes, and/or a comprehensive list of KB11-A ECOs, but so far haven't seen traces of either anywhere out there.

One other curiosity of these SPC slots that came up while investigating this: the power distribution table in the 11/45 maintenance manual, EK-11045-MM-007, page 5-10, implies that +15V should be distributed to the SPC slots on CA1. This is suspicious to me (maybe a typo?) because all other SPC pinouts that I have seen use this pin and CB1 as NPR in/out. And in checking my backplane, there is no power distribution to those pins. But slots 27 and 28 (Unibus B) do have their CA1 pins bridged to one another, and their CB1 pins bridged to one another, with what look like factory installed wire wraps. This also seems unusual for NPR/NPG. So, some mysteries remain about these slots...

In other news, the clock oscillator on the VT52 has given out, so that's down now until I can find a replacement. They are out of production and aren't easy to track down, but I do have one lead to follow so far.

Also, I pulled the suspected failed subsidiary ALU control ROM, tested it in isolation, and verified that it had indeed failed. This card is just a spare for me, but I'd like to go ahead and repair it since the fault is isolated. With some help from the classiccmp mailing list (thanks guys!) I have a recommendation for some vintage PROM programmers to stalk on eBay, and some compatible parts, that would allow me to blow a replacement and make the repair.


Replacement DRAMs showed up. Pulled and replaced the two faulty ones on the MS11. Pic below -- you can see the replacements are socketed, and are the TI parts instead of the original ITT. Full address space is working now! Now that bank 0 is repaired, trap vectors can conceivably work.

Jumpered and configured a DL11-E serial card for use as console, slotted in an M9301-YB bootstrap terminator, connected up the VT52, powered up, and off it goes straight to the console emulator! That means the basic instruction set tests in the boot ROM are passing as well, which is great news.

Next step will be to hook up PDP11GUI and load some more in-depth diagnostics, in order to shake out any remaining bugs with the CPU and memory system. Will slot in the FPU at that point for testing and debug as well.


Tracked down the source of the inverted result after register-to-register move problem on GRA: outputs of the subsidiary ALU control ROM E74 on pins 6 and 7 are floating. Will need some closer inspection to determine if this is a board fault or a chip fault. In the meantime, I have a spare GRA which I had been reluctant to try because it has a "bad" sticker on it... Decided to give it a try anyway, and it seems to be working much better than the one I have been debugging.

Now have enough of the CPU debugged to toggle in and run a simple light chaser program:

000000  013700  177570  L0:     MOV     @#177570, R0    ;LOAD COUNT FROM SWITCH REGISTER
000004  005300          L1:     DEC     R0              ;COUNT DOWN
000006  001376                  BNE     L1              ;LOOP UNTIL ZERO
000010  006301                  ASL     R1              ;SHIFT DISPLAY VALUE
000012  001002                  BNE     L2              ;SKIP AHEAD IF NOT SHIFTED OUT
000014  012701  000001          MOV     #1,R1           ;ELSE RELOAD
000020  010137  177570  L2:     MOV     R1,@#177570     ;STORE TO DISPLAY REGISTER
000024  000765                  BR      L0              ;REPEAT FROM THE TOP

Some notes on the program and video above since I've received some questions:

  • The listing here is shown assembled at location 000000, but the program is relocatable and can be toggled in at any convenient address (000000, on top of the trap vectors, probably isn't the best choice!)

  • Data display should be on "DISPLAY REGISTER" to see the chase.

  • The front panel toggles are loaded into a counter to control the speed of the chase. Without some of the most significant bits set the chase may go too fast to see, especially on older 11's with incandescent indicators. All toggles off is a special case: this will be the slowest chase, since as written the counter wraps around when decremented before being checked for zero. The video has toggles 15 and 14 up.

  • If you look at the address lights in the video, you can see that I ran this program from address 100000. This was because at the time I had a fault in the first 16KW of memory on my MS11-L so I couldn't execute any code at lower addresses.
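The notes above about chase speed and the all-toggles-off case can be checked with a short Python model of the two loops. This is only a sketch of the listing's arithmetic with 16-bit wraparound; the function names are mine, and R1 is assumed to start at zero, which the real program does not guarantee.

```python
def delay_iterations(switches):
    """Count DEC/BNE passes for a given switch register value.

    DEC R0 runs before the zero test, so a starting value of 0 wraps to
    177777 and gives the longest possible delay.
    """
    r0 = switches & 0xFFFF
    count = 0
    while True:
        r0 = (r0 - 1) & 0xFFFF   # DEC R0
        count += 1
        if r0 == 0:              # BNE L1 falls through
            return count


def chase(steps, r1=0):
    """Return the successive display register values for the chase."""
    values = []
    for _ in range(steps):
        r1 = (r1 << 1) & 0xFFFF  # ASL R1
        if r1 == 0:              # bit shifted out (or R1 started at zero)
            r1 = 1               # MOV #1,R1
        values.append(r1)
    return values


print(delay_iterations(0), delay_iterations(0o140000))
print([f"{v:06o}" for v in chase(4)])
```

With toggles 15 and 14 up (140000) the delay loop runs 49152 times per step, and with all toggles down it runs 65536 times, which matches the note that all-off is the slowest setting.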


After addressing the -15V problem on the MS11, most of the bad behaviors seem to have cleared up except a stuck (on) bit 6 in the first 16K words of address space (000000-077776). Hooked up the new logic analyzer, and it has been very useful in troubleshooting the board -- can easily capture and inspect the timing of complete memory cycles. Definitely worth the investment!

Using the analyzer, I was able to verify the refresh and chip select logic on the board, then track down the stuck bit to what seems to be a single failed DRAM chip (E96 on the MS11-L engineering drawings). I'd like to test the entire card before ordering replacement parts, but need to set up address translation to get beyond the first two banks from the console.

Here is the address translation register setup that I used for testing. This was followed by a deposit of 000001 to KT11 SR0 (777572) to enable translation. KT11 SR3 was left all zeros to keep D space disabled. This setup allows console access to physical addresses in banks 1 through 7 by appropriate settings of virtual address bits 13 through 15. I wanted to reserve PAR7 to map I/O space, so I left out bank 0 since it was one of the two already tested.

Kernel I PAR
772340  001000
772342  002000
772344  003000
772346  004000
772350  005000
772352  006000
772354  007000
772356  007600

Kernel I PDR
772300  077406
772302  077406
772304  077406
772306  077406
772310  077406
772312  077406
772314  077406
772316  077406

This worked as expected according to panel PROG PHY and the logic analyzer, so the KT11 option which I had not previously tested is at least working for kernel I space. Tested each bank on the MS11 from the front panel using this setup, and uncovered that bank 4 bit 10 also has a stuck on condition. Since bank 1 is working now, I can use that as work space for the time being in order to continue the CPU debug while awaiting some replacement DRAM chips in the mail.
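The mapping that the PAR values produce can be worked out with a few lines of Python. This sketch assumes 18-bit relocation, where the physical address is the PAR contents times 64 bytes plus the 13-bit offset within the page; the function name and the list holding the PAR values are my own.

```python
# Kernel I PAR contents from the table above (PAR0 through PAR7).
KERNEL_I_PAR = [0o1000, 0o2000, 0o3000, 0o4000, 0o5000, 0o6000, 0o7000, 0o7600]


def translate(va, par=KERNEL_I_PAR):
    """Map a 16-bit virtual address to an 18-bit physical address.

    Virtual address bits 15-13 select the PAR; the PAR holds the page base
    in units of 64 bytes, and bits 12-0 are the offset within the page.
    """
    page = (va >> 13) & 0o7
    offset = va & 0o17777
    return ((par[page] << 6) + offset) & 0o777777


for va in (0o000000, 0o020000, 0o140000, 0o177570):
    print(f"{va:06o} -> {translate(va):06o}")
```

Virtual 000000 lands at physical 100000 (the start of bank 1), virtual 140000 at 700000 (bank 7), and 177570 at 777570, so the switch register stays reachable through PAR7. Each page is only 4K words, so with these exact values each virtual page reaches the first 4K words of its bank; reaching the rest of a bank would mean adjusting the PAR.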

Pics here of the logic analyzer setup, and captured traces of a write and subsequent read to one of the misbehaving chips:


Made some progress on the inverted result after register-to-register move problem: with the help of the KM11, extender card, and a logic probe I was able to track down that signal ALUM L coming onto the DAP is not asserted when it should be for a MOV instruction. This means the ALU is operating in arithmetic mode instead of logic mode, hence the incorrect result.

I next moved the extender card over to GRA, where this signal originates from a subsidiary ROM, but unfortunately at that point the MS11-L memory behavior got even worse, putting an end to these experiments. So I'll have to tackle that first...

Moved the M792 ROM over to the expansion backplane where the MS11-L resides, and it works fine there. So it doesn't seem to be a bus wiring or jumper problem onto the expansion backplane. Checked the power input pins on the backplane behind the MS11-L. 5V was a little low there; trimmed this up. Probably need to clean or replace the Molex contacts on the power distribution board in the cabinet, as it seems a few mV are being shed there needlessly compared to the output of the same regulator on the main backplane, but things seem within stated tolerances for now.

The -15V input to the MS11-L was missing entirely. Removed the DD11 expansion backplane, and added jumpers between the battery backup supply inputs and the corresponding main supply inputs, per documentation. Now have -15 to the MS11-L, but still no joy.

Will need to go deeper into the MS11-L next time...


Received the boards and components for the KM11 replica; stuffed and soldered, and it appears to work! There are some photos below. I can easily single-step microcode, clock states, and bus cycles now, which should really help with the CPU debug.

Swapped DAP for a spare, and this has fixed the stuck PC issue. Memory issues remain, but by choosing a working memory range, I can start to toggle in and attempt to execute very simple programs.

The simplest possible program, unconditional branch to self, seems to execute correctly:

001000 000777         BR      .-0

A register to register data move test does not however:

001000 010203         MOV     R2,R3
001002 000776         BR      .-2

Control flow is as expected, but the value that ends up in R3 seems to be negated. Still, pretty good progress! Now that I can step machine states, the next step will be to put the DAP out on an extender card and start tracking down signals with a logic probe.
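When toggling in branches by hand it helps to double-check the offset byte. Here is a small Python sketch of the PDP-11 branch target calculation (sign-extended 8-bit word offset in the low byte, relative to the address of the following instruction); the function name is my own.

```python
def branch_target(address, instruction):
    """Return the target of a PDP-11 branch instruction at address.

    The low byte of the instruction is a signed offset in words, applied
    to the updated PC (address + 2).
    """
    offset = instruction & 0o377
    if offset & 0o200:           # sign-extend the 8-bit offset
        offset -= 0o400
    return (address + 2 + 2 * offset) & 0xFFFF


# The two test programs above: both branches go back to 001000.
print(f"{branch_target(0o001000, 0o000777):06o}")
print(f"{branch_target(0o001002, 0o000776):06o}")
```

So 000777 at 001000 is a branch to itself, and 000776 at 001002 branches back to the MOV at 001000, as the listings show.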

The HP1662A logic analyzer from eBay has also arrived; should come in handy in investigating the memory issue.


Received and installed the replacement lamps for the -15V regulators. Pic below shows what the power supply looks like with all the lamps functioning.

Verified backplane DC voltages and ripple currents again, and re-trimmed all the DC regulators. Verified AC LO and DC LO deasserted and free of glitches. Found some harness wiring mistakes to the DD11 expansion backplane; corrected these.

Tried some CPU board-swaps looking for a quick win, but broken console behavior didn't change significantly with different boards.

Investigated the timing generator board, and found that the crystal oscillator wasn't oscillating. Tracked this down to inductor L1 which looked as if it had been partially sheared away from the board at some point during installation/removal/storage. Repaired this. Success! Able to load addresses from the front console now. Switches are mirrored in the BR when halted in console.

Address bit 0 seems stuck. Swapped PDR from spare board back to the original. Can now examine and modify the light/switch register, and examine the contents of the M792 ROM.

Jumpered the DD11 expansion backplane back in, and slotted in the MS11-L memory. Limited success: can modify and examine memory for example near address 001000, but cannot modify low memory addresses. In some ranges, can only modify every other word. Also, PC seems stuck at 022000.

At this point, I could really use a KM11 maintenance board set. These are pretty hard to get a hold of, but a few folks on the web have built their own reproductions. I put in a PCB order to ExpressPCB with a KM11 layout by Tom Uban (described here), and also put parts on order to stuff it.

Also, figuring I'll need to be going deeper into the CPU debug, I found and bought an HP1662A logic analyzer on eBay, for about the same money as the KM11 PCB and parts!