PDP-11/45: First RT-11 Boot From Disk!

Thu 06 July 2017 by Fritz Mueller

Not much technical in this post, except the big ticket news: after all the recent repairs and generating a fresh RT-11 pack, RT-11 BOOTS FROM DISK!

It's been a very long road getting here, but the machine finally has a working, full-featured, operating system, capable of supporting self-hosted program development. Spent a few enjoyable hours editing, assembling, linking, and debugging some small MACRO-11 hacks directly on the machine, using the native RT-11 tools. This is probably a 40 year high-water-mark of functionality for these particular bits of hardware. Very satisfying!

Next up will be to clean and inspect the dozen or so RK05 packs that I have on hand, and retrieve and archive whatever content I can. That should let me clear off a few packs to create some space to experiment with some additional operating systems, including 6th Edition Unix.

At long last: RT-11 boots from an RK05 pack!

PDP-11/45: M9301 Troubles

Tue 04 July 2017 by Fritz Mueller

After a lot of recent progress with the RK11, suffered a setback: after trying once again to image a fresh RT-11 pack, the machine began to behave erratically at boot. Sometimes the boot monitor would run fine, sometimes it would run for a while and then take a machine exception or vector off to a bad address, and sometimes the machine would fail to boot at all, immediately taking exceptions of various sorts. Getting so close, but then lost a lot of ground; I guess that's sort of the way it is going to be with a machine that is this old...

So, back to the top without a working boot monitor. Microcode sequencing seemed to be working fine -- exam/deposit/step working reliably from the front panel. Checked execution of the first few instructions of the boot ROM in detail, a branch and some single operand instructions, and they seemed to execute correctly.

Assuming some sort of new failure in the CPU, I prepared to instrument the Unibus with the HP logic analyzer; it seemed that way I could trigger on machine exceptions then look back over the captured address traces for a pattern. To make sense of those, I'd need a listing of the boot ROM in hand. Noel Chiappa has a dump for this and a partial disassembly on his site here, but is it the same as mine? Better check...

Started to run through Noel's listing and compare to the ROM contents on my machine via front panel exam. Sure enough, there seemed to be some words different in a few places. Maybe the CPU is fine, and the ROM card is failing? Swapped out the M9301 for a simpler M792 diode matrix boot ROM, and sure enough -- was able to boot straight away off my RKDP pack, and from there reliably beat up the CPU with diagnostics. So, great news: just a failing M9301!

Alright, so now I want to capture a dump of the M9301 so I can systematically compare with Noel's listing to see if there's a pattern in failed/flaky bits to help guide my repair. For this I need a memory dump utility that I can toggle in from the front panel. Came up with this:

001000  012705  START:  MOV     #177564,R5      ;CONSOLE XCSR
        177564
001004  012700  L0:     MOV     #177570,R0      ;SWITCH REGISTER
        177570
001010  000000          HALT
001012  011004          MOV     @R0,R4          ;READ ADDR FROM SWITCHES
001014  000000          HALT
001016  011003          MOV     @R0,R3          ;READ COUNT FROM SWITCHES
001020  012401  L1:     MOV     (R4)+,R1        ;GET NEXT WORD TO DUMP
001022  012702          MOV     #6,R2           ;SIX DIGITS TO PRINT
        000006
001026  005000          CLR     R0              ;R0 GETS MSB R1
001030  073027          ASHC    #1,R0
        000001
001034  062700  L2:     ADD     #60,R0          ;MAKE INTO ASCII DIGIT
        000060
001040  010065          MOV     R0,2(R5)        ;OUTPUT
        000002
001044  105715          TSTB    (R5)
001046  100376          BPL     .-2
001050  005000          CLR     R0              ;R0 GETS 3 MSB R1
001052  073027          ASHC    #3,R0
        000003
001056  077212          SOB     R2,L2           ;LOOP DIGITS
001060  012765          MOV     #15,2(R5)       ;OUTPUT '\R'
        000015
        000002
001066  105715          TSTB    (R5)
001070  100376          BPL     .-2
001072  012765          MOV     #12,2(R5)       ;OUTPUT '\N'
        000012
        000002
001100  105715          TSTB    (R5)
001102  100376          BPL     .-2
001104  077333          SOB     R3,L1           ;LOOP WORDS
001106  000736          BR      L0              ;START OVER
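
For readers who would rather follow the digit loop on a modern host, here is a rough Python model of it. It is only a sketch: the function name dump_digits is made up for illustration, and it assumes the loop behaves as the listing's comments describe (one 1-bit digit from the ASHC #1, then five 3-bit digits from the ASHC #3).

```python
def dump_digits(word):
    """Mimic the toggle-in loop: one 1-bit digit, then five 3-bit digits."""
    r1 = word & 0xFFFF                  # R1 holds the word being printed
    digits = []
    shift = 1                           # first ASHC moves only bit 15 into R0
    for _ in range(6):                  # R2 counts six digits
        r0 = r1 >> (16 - shift)         # bits shifted out of R1 land in R0
        r1 = (r1 << shift) & 0xFFFF
        digits.append(chr(0o60 + r0))   # ADD #60,R0 makes an ASCII digit
        shift = 3                       # every later ASHC moves three bits
    return "".join(digits)


if __name__ == "__main__":
    print(dump_digits(0o177564))
```

The result should match ordinary zero-padded six-digit octal formatting, which is what makes the printed dump easy to compare against a listing by eye.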

Executed this a few times and got slightly different results, then things settled into a pattern where the lowest nybble of every word was consistently zeroed but everything else was fine. Smoking gun pointing to a single PROM on the M9301. Pulled and reseated that chip, and did the same for the other three while I was at it, and ... everything 100% after that. Wow, really should have just tried that first...
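
The comparison against a reference listing can also be done mechanically once the dump has been captured from the console. The sketch below is one way to do that, not the procedure actually used here: the function names are hypothetical, and the sample words are made-up stand-ins rather than real M9301 contents.

```python
def parse_dump(text):
    """Parse captured console output: one six-digit octal word per line."""
    return [int(token, 8) for token in text.split()]


def bit_report(dumped, reference):
    """Return (ever_wrong, stuck_zero) as 16-bit masks.

    ever_wrong: bits that mismatch the reference in at least one word.
    stuck_zero: the subset of those bits that read as 0 in every dumped word.
    """
    ever_wrong = 0
    seen_set = 0
    for got, want in zip(dumped, reference):
        ever_wrong |= got ^ want
        seen_set |= got
    return ever_wrong, ever_wrong & ~seen_set & 0xFFFF


if __name__ == "__main__":
    # Made-up reference words; the "dump" simulates a dead low nybble.
    reference = [0o012705, 0o177564, 0o105715, 0o000002]
    captured = "\n".join(format(w & 0o177760, "06o") for w in reference)
    wrong, stuck = bit_report(parse_dump(captured), reference)
    print(format(wrong, "06o"), format(stuck, "06o"))
```

A mask confined to one group of four bits points at a single 4-bit-wide PROM, which is the pattern described above.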

Well, at least I'm up and running again! The memory dump program might come in useful again at some other time, and as a byproduct after I fixed my M9301 I got 100% agreement with Noel's listing. So I think that listing can be considered authoritative now; good enough to generate replacement PROMs should anybody ever need to do so.

M9301-YB bootstrap ROM

M792-YB diode-matrix bootstrap ROM

PDP-11/45: RK11 VII

Sun 25 June 2017 by Fritz Mueller

Okay, back from travel and picked up the thread on the RK11 interrupt problem this weekend. Put the KM11 in the first slot on the RK11, which allows monitoring of the interrupt request. An interrupt can be very easily generated from the front panel by setting bit 6 (Interrupt on Done Enable) in the RKCS register at 777404. Did this, and noticed that the interrupt request logic on the RK11 went active, but never cleared.

Checked bus request and grant continuity all the way through to the CPU backplane and back and that looked fine (the RK11 in its default configuration uses BR5).

Chased bus request with a logic probe all the way to the CPU backplane, and it was being asserted correctly. Looking at BG5, however, I noticed it was always asserted, even if BR5 was inactive. Disconnected all peripherals and terminated the Unibus directly on the CPU backplane, and this was still the case. So there was a problem with BG5 in the CPU itself.

Threw the CPU UBC card out on extenders, and took a look at BG5 logic with a logic probe. The 8881 driver for this signal (E42 on sheet UBCD of the engineering drawings) had failed -- pins 8 and 9 were high, but pin 10, BG5, was not being driven low. Pulled this chip, put in a socket and a replacement, and the BG was then working properly. That's three repairs total to this poor old UBC card so far! Was able to then verify from the front panel that the processor fielded interrupt 220 in response to setting the IDE bit in RKCS. Progress!

Back to the MAINDEC ZRKK -- interrupt test now passes, and the diagnostic continues. BUT... error output now on test octal 57, which is working out the "hardware pole" feature of the controller. Thought this might be due to the RK11 being configured for two RK05 drives, but I have restored/connected only the first. Rejumpered the G740 flipchip for a single drive, but this didn't seem to help. Hmmm, will need to read the diagnostic source to see what it is trying to do...

G740 Disk Selection flipchip, jumpered for two drives

PDP-11/45: RK11 VI - "Pole" and Interrupt Issues

Thu 08 June 2017 by Fritz Mueller

Replaced the failed 7401 in the RKDA to RKDB data path, and verified that the RKDB stuck bit 11 problem was fixed. Ran the system for another couple hours to rewrite a fresh RT-11 pack, then another problem developed: read operations would successfully obtain the bus, but would never complete.

Some investigation with a KM11 on the RK11 controller showed that POLE was now incorrectly asserted continuously (did they mean "poll"?). Tracked this down to a failed 7420 (E1 on the M141 at A26, sheet RK11-C-12). Replaced, and reads were working again. I guess with a machine this age there is going to be a lot of this sort of thing where marginal parts give up after a few hours of use. Hopefully after some prolonged operation things will settle down a little bit...

In any case, still unable to boot any of my RT-11 packs. I decided to step back and run MAINDEC diagnostic ZRKK. This is the RK11 dynamic diagnostic that destructively modifies a pack, and I had not previously run it -- I had just optimistically jumped to trying to boot existing packs after the drive restore and calibration. The diagnostic ran successfully for a while, through various format and read and write tests over the whole pack (encouraging!), then hung up consistently on test 35 (octal), which tests whether the RK11 interrupts the processor correctly when IDE is set.

So it could be the case that my RT-11 boots hang up when they first start trying to make use of interrupts. Verified that I can run MAINDEC ZKWA, which tests interrupts from the KW11-L line clock, so interrupt fielding in the CPU looks good.

Re-ran MAINDEC ZRKK, and noted that the CPU is waiting in micro-state BRK.00 (154) on page FLOWS 12 of the engineering drawings. This state has a wait for Unibus INTR to be asserted, so it looks like a problem with interrupt signaling on the RK11 side.

I have a bit of travel coming up, so I probably won't get back to this for a week or two. But the next sensible step I think will be to work with a logic analyzer on the backplane of the RK11, slot A6 where the M7820 interrupt control flipchip goes, and see what is/isn't happening with the interrupt signaling on that end. I suspect there will probably be a failed IC on the M7820.

PDP-11/45: RK11 V - Checksum issues

Sun 21 May 2017 by Fritz Mueller

Decided to use the disk utilities provided in PDP11GUI to write a fresh RT-11 pack. PDP11GUI successfully assembled and downloaded a driver, then took a little under 2 hours to download the pack image over the console serial line and write the pack without indicating any errors.

The resulting pack fared no better at boot than any of my existing legacy packs, however. I attempted to verify the pack via PDP11GUI, and noticed that the controller was indicating checksum errors for every sector on cylinders 64-127 and 192-202 (that is, whenever bit 11 of the RKDA was set). Tried this on several other packs, including the RKDP pack which had previously booted, and found that these errors were returned for these cylinders on all packs, so definitely a controller or drive issue, and not specific to any particular pack.
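
The link between those cylinder ranges and RKDA bit 11 follows from the disk address layout. A small Python sketch of the arithmetic, assuming the usual RK11 packing (sector in bits 0-3, surface in bit 4, cylinder in bits 5-12, drive select in bits 13-15) and 203 cylinders per pack; the helper name rkda is invented for illustration:

```python
def rkda(cylinder, surface=0, sector=0, drive=0):
    """Pack a disk address the way the RKDA register is assumed to be laid out."""
    return (drive << 13) | (cylinder << 5) | (surface << 4) | sector


# Cylinders whose disk address has bit 11 set (cylinder bit 6, value 64).
affected = [c for c in range(203) if rkda(c) & (1 << 11)]

if __name__ == "__main__":
    print(affected[0], affected[63], affected[64], affected[-1])
```

Under those assumptions the affected list comes out as exactly cylinders 64-127 and 192-202, matching the ranges that reported checksum errors.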

Verified that bit 11 of the RKDA could be read and written normally, and that when so addressed the RK05 drive would mechanically seek to the correct cylinders. Programmed some format-mode reads (these just return sector headers) manually via the front panel, and verified that the sector headers were being returned correctly from disk for the affected cylinders.

Tried a few flipchip swaps to see if the affected bit would move: B18 and B19 (adder; sheet RK11-C-14; no change), A21 and B21 (RKDB; sheet RK11-C-10; no change).

Programmed some all-mode reads (these return preamble, header, data, and postamble) manually via the front panel. These showed that during reads of affected cylinders, bit 11 was stuck always on. When reading unaffected cylinders, bit 11 turned on and off normally. So it seemed bit 11 was "leaking" from RKDA to RKDB.

Seeing this, tried a few more swaps: A15 and B15 (internal bus, sheet RK11-C-20, no change), A23 and B23 (RKDB data path; sheet RK11-C-21; bingo!). Stuck bit went away on this last swap. So it looks like a failed 7401, E2 on the M149 in slot A23. Pulled, socketed, and put some 7401 on order at Jameco where I can pick them up tomorrow on my way in to work. Getting closer!

PDP-11/45: CPU debug VI - RK11 NPRs and first disk boot!

Sun 14 May 2017 by Fritz Mueller

Took a little time to sort through BC11 cables to find a good one for drive interfacing, but in the end I found one that worked okay and got the seek tester code from the previous post working reliably. At this point I mounted one of the recently cleaned RK05 packs, but found that the M9301 bootstrap would hang the bus on the first read operation.

A little scoping around on the RK11-C showed that it was asserting NPR to the processor, but never receiving NPG. A quick check of NPG continuity on the backplane showed a missing jumper on one slot of my DD11-D. Wrapped this on, verified NPG continuity all the way out to the RK11, but still no joy. Turns out the CPU is not asserting NPG at all.

Threw the CPU UBC card out on extenders and had a go at chasing through the NPG logic with a logic probe. Turns out to be a failed 8881 bus driver for NPG at the end of the line (E55 on KB11-A drawing UBCD). Pulled, socketed, replaced.

After this, the CPU was asserting NPG, but the signaling still looked a little squirrelly. Turns out that there are jumpers (W1-W5) on the M9301 bootstrap terminator that need to be installed to provide grant pull-ups when they are not otherwise provided internally by the processor (and the 11/45 is one such case). After installing the jumpers, NPG signaling looked solid.

Tried mounting and booting a few packs. Packs marked as having RT-11 would run for a short while and then hang. But an RKDP pack successfully booted! Wow, that feels pretty good after about two years of working seriously on this restoration. :-)

Going to stop here on a high note, and pick up trying to get a good RT-11 boot next time.

First boot off disk -- RKDP diagnostics pack!

PDP-11/45: RK05 II - Head Load and Servo Calibration

Sun 09 April 2017 by Fritz Mueller

Okay, disassembled and cleaned a few RK05 cartridges, following advice from the vcfed forum and cctalk mailing list (cleanroom gloves and wipes, 99% anhydrous isoprop). Was surprised to find foam inside the hub on the disks (see pic below) but folks on vcfed advise that it is high-density polyurethane and not subject to decay to the same extent as the other DEC foams, so I left it be.

Mounted one of the cleaned packs, and let it spin in the drive for a few hours with head load disabled in order to get a good flush on the air filtration system, let the various bearings on the drive loosen up, and make sure the replacement head retract batteries got a good charge. Drive ran quiet and balanced.

After that, took a deep breath and let the heads load -- no crash! Proceeded to work through the dynamic off-line calibration procedure for the head positioning servo system. This involves jumpering the control electronics on the drive to strobe simulated cylinder addresses from the sector counter. That provides a convenient source of oscillating seeks that can be used to calibrate the servos. Video here shows head load, a four cylinder oscillating seek, and a scope trace of the resulting sine position output of the electro-optical carriage position sensor:

Surprisingly, after about thirty years of non-operation, all of the servo calibration was within specified error bars, so no adjustments were necessary! At this point I decided to go for broke, cabled the drive to the RK11-C controller and attempted a boot. Some encouraging front panel indicator activity, but soon halted with a seek error flagged in RKER. Not too surprising.

Okay, on to debugging the drive online with the controller, then. Worked up the following test code, inspired by something in one of the RK05 SPI workbooks. This reads two cylinder addresses from the high and low bytes of the front panel switches, and instructs the controller to instruct the drive to seek alternately between them:

        177570                          SW=177570
        177400                          RKDS=177400
        177404                          RKCS=177404
        177412                          RKDA=177412
000000                                  .ASECT
        001000                          .=1000
001000  012706  000700          START:  MOV     #700,SP         ;INIT STACK POINTER
001004  013700  177570          L0:     MOV     @#SW,R0         ;RETRIEVE SWITCHES
001010  000300                          SWAB    R0              ;LOWER SWITCHES TO UPPER
001012  004767  000012                  JSR     PC,SEEK         ;DO THE SEEK
001016  013700  177570                  MOV     @#SW,R0         ;RETRIEVE SWITCHES
001022  004767  000002                  JSR     PC,SEEK         ;DO THE SEEK
001026  000766                          BR      L0              ;START OVER
001030  042700  000377          SEEK:   BIC     #377,R0         ;MASK OFF LOWER BYTE
001034  072027  177775                  ASH     #-3,R0          ;SHIFT OVER TO CYL ADDRESS
001040  105737  177404          L1:     TSTB    @#RKCS          ;CHECK RKCS RDY BIT
001044  100375                          BPL     L1              ;LOOP IF BUSY
001046  032737  000100  177400  L2:     BIT     #100,@#RKDS     ;CHECK RKDS ARDY BIT
001054  001774                          BEQ     L2              ;LOOP IF BUSY
001056  010037  177412                  MOV     R0,@#RKDA       ;WRITE SEEK TARGET TO RKDA
001062  012737  000011  177404          MOV     #11,@#RKCS      ;WRITE SEEK CMD + GO TO RKCS
001070  000207                          RTS     PC              ;RETURN TO CALLER
        001000                          .END    START
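
As a cross-check on the address arithmetic in the SEEK routine, here is a rough host-side Python model of how a switch register value turns into the two RKDA values. The function names are made up, and the model assumes ASH #-3 is an arithmetic shift that replicates the sign bit.

```python
def ash_right3(value):
    """16-bit arithmetic shift right by 3 (sign bit replicated), like ASH #-3."""
    value &= 0xFFFF
    if value & 0x8000:
        value -= 0x10000            # reinterpret as a negative 16-bit number
    return (value >> 3) & 0xFFFF


def seek_targets(switches):
    """Return the two RKDA values written for a given switch setting."""
    switches &= 0xFFFF
    low_to_high = (switches << 8) & 0xFF00      # SWAB, then BIC #377
    first = ash_right3(low_to_high)             # cylinder from the low byte
    second = ash_right3(switches & 0xFF00)      # cylinder from the high byte
    return first, second


if __name__ == "__main__":
    # High byte = cylinder 2, low byte = cylinder 4.
    print([format(v, "06o") for v in seek_targets(0x0204)])
```

If that assumption about ASH holds, a switch byte of 200 (octal) or more would also set the drive-select bits in RKDA, so this model suggests the test is best kept to cylinders below 128.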

At first this code was generating no seek activity on the drive. Decided to try swapping out the BC11-A drive cable, and that produced some limited success -- drive seeks, but some bits of the cylinder address are still apparently not making it across the cable.

The BC11-A cables are problematic. They seem flaky and fragile, and many of my spares seem bad. Any given cable may beep out fine on the bench, and yet fail consistently in use... It looks like what's up next is a voyage through my box of spares, swapping in cables looking for one that works reliably. Failing that, I'll need to work on some sort of modern replacement, since original BC11-A cables in good shape are getting hard to find. It will be sad if at the end of this journey I can't boot the machine for mere lack of a good cable between the drives and controller!

Foam inside the hub of an RK05 pack -- could be bad news if this is decaying like other DEC foam...

RK05 running offline from controller; pack spinning and heads loaded

A box of BC11-A cables to be sorted through...

PDP-11/45: RK11 IV

Sat 01 April 2017 by Fritz Mueller

A quick note: rejumpered a spare M105 address decoder, and swapped it in for the one that was in the RK11. SSYN waveform is greatly improved (see before/after shots below), so looks like a bad driver on the original M105.

PDP-11/45: RK05

Sun 19 March 2017 by Fritz Mueller

Started going through the two RK05 drives. Lots of work to remove and clean up all the decaying foam. Replaced the emergency head retract NiCd battery packs on both units. They were both slightly leaky, but luckily neither had made a big mess.

Air filter elbows are intact on both units, and still slightly flexible, though they do have a white powdery coating where the material is degenerating. Cleaned up okay with some warm soapy water and a toothbrush. I'm sure these will continue to decompose/decay, however, and in the long term having loose particulates develop inside the elbow seems a certain recipe for a head crash. I may explore the possibility of 3D printing some sort of modern replacement for these.

Heads on the first unit look to be in decent shape, but show some oxide buildup. In the second unit, the carriage was not parked, so the heads were in contact with each other. They don't look damaged from a preliminary inspection, but this head pair has considerably more oxide. Before and after cleaning shots of the upper and lower heads on the first unit below.

Pulled the H743 power supplies and reformed the larger electrolytics. After this, both power supplies powered up fine, though the -15V regulator on the first unit was trimmed very hot (-23 or so). Trimmed this down, put the supply back in the first unit and powered up. Under load, the -15V regulator drooped to -8, and a pico fuse on the +15V supply blew immediately.

Swapped in the -15V regulator from the second supply, which was not trimmed hot, and replaced the blown pico fuse. Now under load the supply held without drooping, and the +15V pico did not reblow. So looks like a bad -15V regulator. Put a few 723 regulator ICs on order in advance of debugging this.

After getting the power supply in the first unit up and going, blower powers on, power indicator lights, and after about three seconds the door safety relay clunks and load indicator lights. Write protect indicator toggles with panel switch presses per expectation. If the cartridge-on switch is depressed manually and the load toggle is hit, the spindle motor spins up and runs smoothly.

There are some significant abrasions on the lower panel of the first unit under both the spindle and spindle motor axles. It looks like a spacer button which is intended to hold off the lower panel has decayed; will need to improvise some sort of replacement.

RK05 drive internals

H743 power supply pulled and on bench

First RK05 lower head before cleaning

First RK05 lower head after cleaning

First RK05 upper head before cleaning

First RK05 upper head after cleaning

Abrasions on first RK05 lower cover, from contact with spindle motor shafts

PDP-11/45: RK11 III

Sun 26 February 2017 by Fritz Mueller

Okay, the M9202 bus jumper arrived, and like the 2-foot BC11 cable, the occasional timeouts go away when this is installed. Hantek digital scope also arrived, so I decided to throw it on the backplane for a closer look at the SSYN and timeout signals. The results were pretty interesting. Here's a capture of an RKCS access triggering a timeout glitch with the M902. The yellow trace is BUS A SSYN L (taken from C12J1 on the 11/45 backplane), and the blue trace is UBCB TIMEOUT (1) H (taken from D12U1):

What's interesting is that with the M9202 in place, the SSYN waveform shape on RKCS accesses is not significantly different -- and the timeout glitch still occurs from time to time, but at a reduced amplitude:

If the problem had been one solely of lumped loads on the bus, I would have expected the fix to manifest as a waveform difference, and for the glitches to have disappeared. These observations steered me back toward my original (less plausible?) supposition -- that the 74123 one-shot in the Unibus timeout logic in the CPU was flaky, and particularly sensitive for some reason to SSYN pulses of 568ns. Adding some extra bus length via a BC11 or the M9202 moves the timing by a nanosecond or two off the troublesome period, and reduces the magnitude of the glitches.

So I went ahead and clipped out the suspect 74123, and put in a socket and a fresh part. Bingo! Timeout glitching was eliminated entirely. Here's a trace after the 74123 was replaced. This trace looks different because with the timeout glitch fixed, I could no longer use it to trigger the scope -- instead I had to trigger on the trailing edge of SSYN, so we see both RKCS and non-RKCS bus cycles. In any case, the timeout glitching is now gone:

So that's a nice result -- I think the new scope is going to be pretty useful. The rather extreme sawtooth on the falling edge of SSYN on RKCS accesses still looks pretty bad to me, even though it is no longer triggering timeouts. I might try swapping out the M105 address decoder on the RK11, which generates this signal, and see if the integrity here is improved. All for now!