FPGA Support

For the past year or so, I was relatively convinced that it wasn’t possible to put a fully working AGC into an FPGA. In fact, I even talked about it in Simulation, and the Binary Clock Divider. The main reasons I thought it wouldn’t work were as follows:

  • The AGC is built on asynchronous logic. FPGAs very strongly prefer synchronous logic.
  • It’s not easy (or possible?) to implement everything with just NOR gates in an FPGA. FPGA tools get freaked out about combinational loops, and the AGC is, from the point of view of an FPGA, one big combinational loop. You might try to implement cross-coupled NOR gates as a register, but then what do you do about this thing?
    logisim_divider
  • Several circuits in the AGC rely on propagation delays for timing. Remember this guy?
    prop_delays
    Delays inside an FPGA are going to be much smaller than with discrete components, and can vary depending on how the fitter decides to lay things out.

A couple of weeks ago I realized that I had somehow managed to think about the problem at too high of a level and too low of a level at the same time. The real answer was to build a clock-driven NOR gate.

nor_clock
???

The basic idea is that this clocked NOR gate will sample its inputs on the falling edge of its clock, and apply the result to its output on the rising edge. The entire AGC receives the same clock, and so the entire state of the system is propagated in lockstep. The sampling and output application are on opposite clock edges to prevent race conditions. All gates in the system change their outputs at the same time, and then a delay is given before sampling for the next cycle, to ensure all transitions have occurred and everything is settled. The propagation delays for the gates can even be tuned by setting the frequency of the clock!

The Verilog model for this is pretty easy:

module nor_2(y, a, b, rst, clk);
    parameter iv = 1'b0;
...
    input wire a, b, rst, clk;

`ifdef TARGET_FPGA
    output reg y = iv;
    reg next_val = iv;

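    // Rising edge: publish the value sampled on the previous falling edge.
    // Every gate shares this clock, so all outputs change in lockstep.
    always @(posedge clk)
    begin
        y <= next_val;
    end

    // Falling edge: sample NOR of the (settled) inputs for the next cycle.
    always @(negedge clk)
    begin
        next_val <= ~(a|b);
    end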
`else
...
`endif
endmodule
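To make the two-edge rule concrete, here’s a minimal Python sketch of the same idea. It is not part of the real toolchain; the netlist format, `clock_cycle`, and `run` are hypothetical. Every gate samples first, then every gate updates. A cross-coupled NOR latch, which would be a combinational loop in an FPGA, simply settles over a few clock cycles:

```python
# Hypothetical sketch: a tiny netlist made only of clocked NOR gates, stepped with
# the same two-phase rule as nor_2 (sample on one edge, publish on the other).

def clock_cycle(netlist, outputs, inputs):
    """Advance every gate by one clock cycle.

    Falling edge: each gate samples NOR(its inputs), using only the outputs that
    existed before this cycle. Rising edge: all gates publish at once.
    """
    values = {**outputs, **inputs}
    return {gate: int(not any(values[net] for net in fan_in))
            for gate, fan_in in netlist.items()}


# Cross-coupled NOR latch: q = NOR(r, qn), qn = NOR(s, q)
LATCH = {"q": ["r", "qn"], "qn": ["s", "q"]}


def run(outputs, s, r, cycles=4):
    for _ in range(cycles):
        outputs = clock_cycle(LATCH, outputs, {"s": s, "r": r})
    return outputs


state = {"q": 0, "qn": 1}          # initial condition, like nor_2's iv parameter
state = run(state, s=1, r=0)       # set
print(state)                       # {'q': 1, 'qn': 0}
state = run(state, s=0, r=0)       # hold
print(state)                       # {'q': 1, 'qn': 0}
state = run(state, s=0, r=1)       # reset
print(state)                       # {'q': 0, 'qn': 1}
```

Each gate costs exactly one clock period of delay in this model. That’s why the clock frequency, not the fitter’s placement, sets the modeled propagation delay.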


This model solves all of the above problems — the design becomes fully synchronous, we’re using clocked registers for everything, and propagation delays are accurately captured at the gate level! The only downside is that it takes a whopping 2 registers per NOR gate, which leads to a rather large final product that needs a kinda beefy FPGA to fit.


After verifying that the computer still worked with this new NOR gate model, I set about creating a project in Quartus Prime for the FPGA dev board I have, the DE0-Nano.

I set up a PLL that generates both 51.2MHz and 2.048MHz clocks from the 50MHz crystal on the board. The 2.048MHz clock is fed directly into the timer module as the CLOCK input. The 51.2MHz clock serves as the system propagation clock. I chose 51.2MHz because it’s a multiple of 2.048MHz (25 times) whose period is close to the propagation delay of the original gates. The original delays were 20ns, and a 51.2MHz clock gives our gates a delay of 19.53ns. Close enough! I wanted the system clock to be a multiple of the AGC clock so I wouldn’t get any weird phasing issues.
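For reference, here’s the arithmetic for the multiples of 2.048MHz near 51.2MHz, treating one period of the propagation clock as one gate delay. The constant and function names are illustrative. 24 times (49.152MHz) lands at about 20.35ns and 25 times at 19.53ns. 25 is the smallest multiple whose modeled gates are no slower than the 20ns originals:

```python
AGC_CLOCK_MHZ = 2.048      # timer CLOCK input
ORIGINAL_DELAY_NS = 20.0   # average gate delay of the original AGC logic


def gate_delay_ns(multiple):
    """Modeled gate delay = one period of a (multiple * 2.048 MHz) clock."""
    return 1000.0 / (AGC_CLOCK_MHZ * multiple)


for m in range(23, 27):
    print(f"{m:2d} x 2.048 MHz = {AGC_CLOCK_MHZ * m:7.3f} MHz -> {gate_delay_ns(m):6.2f} ns")

# Smallest multiple whose modeled gates are no slower than the originals.
fastest_ok = next(m for m in range(1, 100) if gate_delay_ns(m) <= ORIGINAL_DELAY_NS)
print(fastest_ok)   # 25
```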

With everything ready to go, I pulled in all of the generated Verilog files and hit “Start Compilation” — and very quickly received an error message from synthesis:

Verilog HDL unsupported feature error: can’t synthesize pullup or pulldown primitive

Craaaaap. This makes total sense, of course, and I should have expected it. FPGAs don’t have internal tri-stating, open-drains, pullups, or anything of that sort. This type of thing only exists for the GPIOs. But as I’ve talked about before, the open-drain nature of the AGC gates is extremely important — both for fan-in expansion and cross-module buses.

It seems that the proper way to do this type of thing in an FPGA is to use the Verilog wand (wired-and) data type. A wand is just like a wire, but it can be assigned to multiple times, and its value is the logical AND of all expressions assigned to it. So in theory, all I had to do was tweak the open-drain buffer model to be a simple passthrough, rather than switching to high-impedance mode when its input is high:

module od_buf(y, a);
    parameter delay = 2;
    input wire a;
    output wire y;

`ifdef TARGET_FPGA
    assign y = a;
`else
    assign #delay y = (a == 1'b1) ? 1'bZ : 1'b0;
`endif
endmodule
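Why is a passthrough plus a wand a faithful substitute? The check below compares the two models exhaustively for small numbers of drivers. The first is the simulation model: each buffer drives 0 or high-Z, and a pullup wins when nobody pulls low. The second is the FPGA model: each buffer is a passthrough, and the net is the AND of all drivers. This is a sketch that assumes every driver on the net is an open-drain buffer; the function names are made up:

```python
from itertools import product


def resolve_open_drain(drivers):
    """Simulation model: od_buf outputs 0 for a 0 input, else high-Z ('Z').
    A pullup resolves the net to 1 when nobody pulls it low."""
    outs = [0 if a == 0 else "Z" for a in drivers]
    return 0 if 0 in outs else 1


def resolve_wand(drivers):
    """FPGA model: od_buf is a passthrough and the net is a wired-AND."""
    return int(all(drivers))


for n in range(1, 5):
    for inputs in product((0, 1), repeat=n):
        assert resolve_open_drain(inputs) == resolve_wand(inputs)
print("equivalent for 1 to 4 drivers")
```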

I also had to edit the verilog generator to leave codegen hints for the backplane generator. If any of a net’s connections are an open-drain pin, it leaves a comment that looks like this:

inout wire RL01_n; //FPGA#wand

The backplane generator looks for things like this, and changes the data type of the line from wire to wand when emitting code for the FPGA.
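The hint handling amounts to a small string rewrite. Here is a hypothetical sketch (`apply_wand_hint` is an illustrative name, not the real script’s API) that swaps the first `wire` keyword for `wand` only on hinted lines, and only when targeting the FPGA:

```python
import re

HINT = "//FPGA#wand"


def apply_wand_hint(line, target_fpga=True):
    """Turn a hinted 'wire' declaration into a 'wand' for FPGA output.
    Unhinted lines (and all lines for simulation output) pass through unchanged."""
    if target_fpga and line.rstrip().endswith(HINT):
        return re.sub(r"\bwire\b", "wand", line, count=1)
    return line


print(apply_wand_hint("inout wire RL01_n; //FPGA#wand"))
# inout wand RL01_n; //FPGA#wand
```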

Problem solved, right?

Error (12014): Net “RL01_n”, which fans out to “RL01_n”, cannot be assigned more than one value

What?! That’s the whole point of wands…


After some experimentation, I found that Quartus only supports intramodule wands. As soon as a wand crosses a module boundary, all bets are off. So in order to get this to work, I was going to need to put everything into the same module — the same file.

I decided to draw the line at the component level. I really didn’t want to have the implementations for the components repeated everywhere, and besides, each component (74HC02, 74HC04, …) is sort of an atomic thing from the point of view of this simulation. So, I devised a way to get wands to cooperate across the component module boundaries. I call it “proxy wires”.

Here’s the idea: since the components can be considered atomic, we can have regular wires cross their module boundaries, rather than the main wands. We then assign each of these wires to the wand. So instead of driving the wand directly, the module drives wires which are then combined into the wand.

To accomplish this, the verilog generator needed more tweaks. When emitting a component declaration, it looks to see if any of the component’s pins are open-drain. If they are, it emits a codegen hint to the backplane generator that looks like this:

U74LVC07 U8007(__A08_NET_191, __A08_2___CO_IN, __A08_NET_185, RL01_n, __A08_NET_198, L01_n, GND, __A08_1___Z1_n, __A08_NET_221, RL01_n, __A08_NET_222, RL01_n, __A08_NET_220, VCC, SIM_RST, SIM_CLK); //FPGA#OD:2,4,6,8,10,12

This says that pins 2, 4, 6, 8, 10, and 12 are open-drain outputs, and should be handled specially.

I made the backplane generator combine all of the AGC modules into one big module (“fpga_agc”) when it’s generating code for an FPGA, rather than just a small module that ties modules together. If, when processing module contents, the backplane generator sees a comment like the one above, it splits up the declaration and determines which nets are affected. For each of these nets, it spits out some proxy wire declarations:

 wire RL01_n_U8007_10;
 wire RL01_n_U8007_12;
 wire RL01_n_U8007_4;
...

Each proxy wire takes the name of its original net, and is suffixed with the component number and pin number that it will be attached to. A series of assignments is then emitted to assign the proxy wires to their parent net:

 assign RL01_n = RL01_n_U8007_4;
 assign RL01_n = RL01_n_U8007_10;
 assign RL01_n = RL01_n_U8007_12;
...

And finally, the component declaration is reconstructed, with the original nets subbed out for their proxy wire counterparts:

U74LVC07 U8007(__A08_NET_191, __A08_2___CO_IN_U8007_2, __A08_NET_185, RL01_n_U8007_4, __A08_NET_198, L01_n_U8007_6, GND, __A08_1___Z1_n_U8007_8, __A08_NET_221, RL01_n_U8007_10, __A08_NET_222, RL01_n_U8007_12, __A08_NET_220, VCC, SIM_RST, SIM_CLK);
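The three steps above (declare proxies, assign them to the parent net, rebuild the instance) can be sketched in a few lines of Python. This is an illustration, not the actual backplane generator. It assumes positional port lists where pin N is the Nth connection. A full generator would also merge the declarations across all components and mark the parent nets as wands:

```python
import re

# Instance line with an open-drain hint, as emitted by the Verilog generator.
OD_HINT = re.compile(r"^\s*(\w+)\s+(\w+)\((.*)\);\s*//FPGA#OD:([\d,]+)\s*$")


def expand_proxy_wires(line):
    """Return (declarations, assigns, rebuilt_instance), or None if no OD hint."""
    m = OD_HINT.match(line)
    if m is None:
        return None
    part, ref, args, pins = m.groups()
    nets = [a.strip() for a in args.split(",")]
    decls, assigns = [], []
    for pin in (int(p) for p in pins.split(",")):
        net = nets[pin - 1]                      # pin N is the Nth connection
        proxy = f"{net}_{ref}_{pin}"             # e.g. RL01_n_U8007_4
        decls.append(f" wire {proxy};")
        assigns.append(f" assign {net} = {proxy};")
        nets[pin - 1] = proxy                    # sub the proxy into the instance
    return decls, assigns, f"{part} {ref}({', '.join(nets)});"


line = ("U74LVC07 U8007(__A08_NET_191, __A08_2___CO_IN, __A08_NET_185, RL01_n, "
        "__A08_NET_198, L01_n, GND, __A08_1___Z1_n, __A08_NET_221, RL01_n, "
        "__A08_NET_222, RL01_n, __A08_NET_220, VCC, SIM_RST, SIM_CLK); "
        "//FPGA#OD:2,4,6,8,10,12")
decls, assigns, inst = expand_proxy_wires(line)
print("\n".join(decls + assigns))
print(inst)
```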

For the strong of heart, the final product can be found here: almost 5,000 lines of Verilog fun!


Once everything was in one big happy file, with proxy wires to protect the wands, synthesis went off more or less without a hitch!

fpga_first_life
IT LIVES

And voila! Timepulses 1 and 2 showing up on my oscilloscope! Further testing showed that the circuit I was most worried about, the edge detecting pulse generator, produced exactly the expected width (100ns) based on the simulation. And finally, here’s what the layout in the FPGA looks like:

fpga_agc
It’s beautiful!

As it happens, this was destined to not be my last battle trying to get things to work on the FPGA — but that’s a story for another post!

Start/Stop Logic

The start/stop logic, the central feature of computer startup, restart, and debugging, lives on page 2 of the Timer module.

startstop

Let’s look at restart logic first. The five inputs on the top-left — SBY, ALGA, MSTRTP, STRT1, and STRT2 — are all possible causes for the computer to reset. Broadly speaking, these are either alarm conditions or developer-requested. I haven’t spent much time looking at the alarm logic yet, but rough definitions for them are as follows:

  • SBY – Standby — set when the computer enters standby mode (this never happened during any flights)
  • ALGA – Algorithm Alarm (?) — set when the running software causes a reboot-worthy exception
  • MSTRTP – Monitor-requested Start — set when the Monitor (the AGC’s electrical ground support equipment, or EGSE) injects a reset
  • STRT1 – Start 1 — set if the digital Alarms module sees a voltage failure transient.
  • STRT2 – Start 2 — set by the analog alarms module, which we don’t have schematics for. It probably indicates a more serious hardware failure, but is also likely used for powerup sequencing.

Any one of these events happening sets the initial flip-flop that I’ve marked “GOJAM in progress”. “GOJAM” is the name of the AGC’s hardware restart sequence (resets JAM a GO condition into the computer). This flip-flop will remain set until the GOJ1 signal becomes set. I’ll talk about that in just a bit.

The GOJAM-in-progress register directly controls the state of the downstream register “Reset Condition Detected”, but it first goes through a pair of gates in the column I’ve marked as “Register Gating”. Register gating is a very common pattern in the AGC, and is key to the operation of its registers and write bus. The inputs to a register that are so gated are prevented from affecting that register unless other conditions are also true. Here, both T12DC/ and EVNSET/ must be low at the same time as our input in order for the set to happen.

What does this mean in practice? Well, as discussed in The Timer Module, the combination of these two signals is equivalent to the timepulse 12 signal, T12. In other words, reset conditions are prevented from reaching this register until time 12 of the MCT (memory cycle time) during which they occur. There are a couple of possible reasons why T12DC/ NOR EVNSET/ is used rather than T12 directly: it reduces the load on the T12 line, and (probably more importantly) it means that there’s one less gate propagation delay between the start of T12 and the latching of a reset.
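NOR of active-low signals is just an AND of the active-high ones. Here is that gating as a truth table, under a simplifying assumption: the reset request arrives as an active-low signal (like b[0] in the trace below) and is NORed together with T12DC/ and EVNSET/. The function name is hypothetical, and the real gate arrangement on the schematic may be split differently:

```python
from itertools import product


def nor(*inputs):
    return int(not any(inputs))


def reset_latch_enable(request_n, t12dc_n, evnset_n):
    """Hypothetical gating model: the active-low reset request only gets through
    when T12DC/ and EVNSET/ are also low, i.e. during T12."""
    return nor(request_n, t12dc_n, evnset_n)


for request_n, t12dc_n, evnset_n in product((0, 1), repeat=3):
    print(request_n, t12dc_n, evnset_n, "->",
          reset_latch_enable(request_n, t12dc_n, evnset_n))
```

Only the row where all three are low produces a 1.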

t12gating

Here you can see STRT1 being injected into the computer. The line I have highlighted, labeled b[0], represents the output of the “GOJAM in progress” register. It goes low immediately upon detection of STRT1, but the reset condition detected register, whose output is STOPA, doesn’t respond until the beginning of T12.

Once STOPA has been set, both main outputs of this logic are asserted: GOJAM and STOP. STOP is used on page 1 of the Timer module, to inhibit ODDSET/:

timer_clockdiv2

ODDSET/ is what makes the timepulse generator progress from T12 to T01, so as long as STOP is asserted we’re stuck in timepulse 12.

GOJAM is the main signal that prepares the computer for execution. It resets lots of conditions and registers throughout the computer, but its primary goal is to prepare the GOJ1 sequence for execution. GOJ1 is a simple unprogrammable instruction sequence that prepares the central registers for a jump to the program entry point. The ‘1’ in its name means that it’s a “stage 1” instruction — i.e., the stage 1 register needs to be set for it to run. GOJ1 is somewhat unique in that most other sequences begin during stage 0. GOJ1 starts at stage 1 because it shares the same opcode as the transfer control instruction TC. TC only has a single stage, so stage 1 of TC would otherwise be meaningless.

In order to set up GOJ1 for execution, the sequence register SQ must be cleared, the extend bit must be cleared, the stage 1 flip flop must be set, and all other stage registers must be unset.

SQ register and extend bit handling is done on page 1 of the SQ Register and Decoder module.

sq_gojam

I’ve highlighted the path through which GOJAM clears the extend bit SQEXT and the SQ register. It’s fairly circuitous, but it resets SQEXT somewhat directly, and, during the clear time of T12, sets the CSQG (clear SQ, gated) line, which takes care of setting all of the rest of the registers in the module (SQ, the quarter-code registers, and the instruction bit 10 register) to 0.

stage_gojam

STG1 and STG2 are respectively set and cleared directly by GOJAM. STG3 is cleared indirectly via the STRTFC signal, which is set as a result of GOJAM back in the SQ register module (it’s one of the lines that I highlighted).

The results of all of these changes lead into the instruction decoder half of the SQ Register and Decoder module.

goj1

This simple bit of logic says that if SQ is 0, we’re in stage 1, and SQEXT is not set, then the current instruction sequence is GOJ1.
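Putting the last few paragraphs together, here’s a behavioral sketch in Python. It captures what GOJAM leaves behind and the decoder term that then selects GOJ1. Timing (T12 clear time, STRTFC clearing STG3) is not modeled, and the class and field names are illustrative:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SequenceState:
    sq: int       # contents of the SQ register
    sqext: bool   # extend bit
    stg1: bool
    stg2: bool
    stg3: bool


def apply_gojam(state):
    """Net effect of GOJAM: SQ and SQEXT cleared, STG1 set, STG2/STG3 cleared."""
    return replace(state, sq=0, sqext=False, stg1=True, stg2=False, stg3=False)


def is_goj1(state):
    """Decoder term: SQ is 0, we're in stage 1 (only STG1 set), SQEXT clear."""
    in_stage1 = state.stg1 and not state.stg2 and not state.stg3
    return state.sq == 0 and in_stage1 and not state.sqext


before = SequenceState(sq=0o5, sqext=True, stg1=False, stg2=True, stg3=False)
print(is_goj1(before), is_goj1(apply_gojam(before)))   # False True
```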

Once GOJ1 has been selected as the sequence to execute, the “GOJAM in progress” register is cleared, GOJAM and STOP are deasserted, and the computer progresses out of T12 into T01 to start executing GOJ1.

As I mentioned earlier, GOJ1 is a very simple sequence. Here is its control pulse definition, taken from Hugh Blair-Smith’s AGC4 Memo #9:

goj1_pulses

Only two of the twelve timepulses generate any control pulses, and the control pulses in T02 are not actually used here. They simply don’t hurt anything, and letting GOJ1 share crosspoints with other sequences saved gates.

T08 is where the magic happens. The control pulse RSTRT puts octal 04000, the software entry point, onto the central write bus. WS and WB write this value into the S and B registers, respectively. Putting the address into the S register kicks off the core rope memory cycle to read the instruction stored there. The B register holds the next instruction to be executed, so when T12 arrives and GOJ1 has been completed, the 04000 will be loaded as the next instruction to execute.

This highlights a neat little design feature of the AGC instruction set — any literal address, when executed, jumps to itself. This happens because the opcode for TC is 000. So the whole point of GOJAM and GOJ1 is simply to set up a TC 4000 — a simple jump to the restart vector!
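A quick way to see this: split a 15-bit word into the 3-bit opcode field and the 12-bit address field. Any value below octal 10000 has 000 in the opcode field, so it decodes as TC to itself. This is a simplified sketch; it ignores extracodes, quarter codes, and any special-cased addresses:

```python
def decode_basic(word):
    """Split a 15-bit AGC word into (opcode, 12-bit address).
    Opcode 0 (with the extend bit clear) is TC."""
    opcode = (word >> 12) & 0o7
    address = word & 0o7777
    return opcode, address


op, addr = decode_basic(0o4000)
print(oct(op), oct(addr))   # 0o0 0o4000
```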

There are a couple of other things worth talking about in the main start/stop logic in the timer module. I’ll reproduce the image of it here:

startstop

First of note is the STRT2 input above the two registers in the middle. STRT2, being an alarm from the analog module, is apparently considered very high severity. Whereas all other alarms and reset conditions allow the current MCT to finish, STRT2 asserts GOJAM immediately, which causes the MCT to halt in its tracks and immediately start processing T12, regardless of which timepulse it was on.

The bottom half of this logic, which controls the Monitor Stop Requested logic, is never used in flight. It provides a means for the Monitor equipment to halt execution of the computer without inducing a reset, via the MSTP signal. When MSTP is asserted, the current MCT will complete up until its T12. STOP will then be asserted, preventing the computer from processing any more instructions until MSTP is deasserted.

Project Update and Build System Overview

It’s been a few months since I’ve posted, but I’ve been making steady progress. I’ve mostly been heads down capturing schematics and testing/debugging. I’m up to 14 modules integrated (1-12 and 14-15; module 13 is Alarms, and it would probably be quite unhappy that half of the modules are still missing!). While this is technically only about halfway done, it represents a pretty major milestone: all of the central logic is completed! All remaining modules are I/O focused. This means that in theory the computer should be capable of executing most programs, so long as they don’t rely on any I/O or interrupts.

In addition to the above, I’ve got a few other exciting developments that I think deserve their own posts. There are a couple of things I want to cover before I get there, so stay tuned!

For now, I want to give a brief overview of the build system I’ve arrived at. It’s far from perfect, but it accomplishes its goals reasonably well.

Each module in the AGC receives its own KiCAD project. The top-level schematic of each module looks roughly the same:

schematic_layout

Each page of the module gets its own dedicated hierarchical sheet. The contents of these sheets match pretty closely with the original sheets; for the most part they only differ if I’ve opted to move related logic to the same place (logic for a single function was often split up among multiple modules in order to not exceed chip or backplane pin limits for any given module). Pin I/O for these sheets is also very close to the originals. Intramodule nets that run between sheets are drawn explicitly, while intermodule nets are given global names.

The top-level sheet also holds the backplane connector. I’ve opted to keep the backplane as clean as possible; new modules are initially introduced into the system with only input pins on their connectors. I then go through the existing modules and, for any net that drives an input on the new module, assign it an output pin on the backplane connector of the module that produces it. Similarly, any previously undriven inputs on older modules that the new module produces are added to the new module’s connector. This minimizes both the number of pins used on each module’s connector and the number of nets running across the backplane, which is going to be populated enough as it is with just the nets that are actually used.

The hierarchical sheet contents also all look mostly the same:

sheet_overview

Components are numbered with the scheme ((Module number x 1000) + part number). Each part can also be tagged with code generation fields; the most useful and by far the most common of these is the initial condition flag.
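The numbering scheme can be undone with a single `divmod`. Here is a hypothetical helper, assuming no module ever has 1000 or more parts of one kind:

```python
import re


def split_refdes(refdes):
    """'U12004' -> ('U', 12, 4) under the scheme number = module * 1000 + part."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", refdes)
    if m is None:
        raise ValueError(f"unexpected reference designator: {refdes}")
    module, part = divmod(int(m.group(2)), 1000)
    return m.group(1), module, part


print(split_refdes("U8007"), split_refdes("R12003"))   # ('U', 8, 7) ('R', 12, 3)
```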

The component selection has surprisingly remained unchanged since I started out, and is as follows:

  • 74HC04 — 1-input NOR gates (NOT gates)
  • 74HC02 — 2-input NOR gates
  • 74HC27 — 3-input NOR gates
  • 74HC4002 — 4-input NOR gates
  • 74LVC07 — Open drain buffers

And that’s pretty much it! Aside from passives and whatnot. The open-drain buffers allow me to create NOR gates with a higher fan-in than 4, as well as buses that are driven by multiple modules. While they’re not technically NOR gates, they’re not inaccurate; as I mentioned in Interpreting the Schematics, the NOR gates used by MIT were effectively all open-collector. There aren’t a whole lot of open-drain NOR gate chips out there, so it is much easier to just use regular NOR gates and slap open-drain buffers on when necessary.
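The fan-in trick is De Morgan’s law at work. Feeding several NOR outputs through open-drain buffers onto one pulled-up net ANDs them together. NOT(a) AND NOT(b) is NOT(a OR b), so two 4-input NORs become one 8-input NOR. Here’s an exhaustive check in Python; the helper names are illustrative:

```python
from itertools import product


def nor(*xs):
    return int(not any(xs))


def wired_and(*outputs):
    # Open-drain buffers on one pulled-up net: the net is low if any buffer input is low.
    return int(all(outputs))


def nor8_from_two_nor4(x):
    """Eight-input NOR from two 4-input NORs (e.g. both halves of a 74HC4002),
    each buffered by an open-drain buffer onto a shared pulled-up net."""
    return wired_and(nor(*x[:4]), nor(*x[4:]))


assert all(nor8_from_two_nor4(x) == nor(*x) for x in product((0, 1), repeat=8))
print("8-input NOR verified")
```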

Conversion to Verilog for simulation starts in the KiCAD netlist exporter window. KiCAD allows you to add new tabs for custom exporters, and so I’ve done just that:

codegen_dialog

When “Generate” is clicked with the AGC Verilog tab selected, KiCAD exports a generic XML netlist, and then invokes the netlist command (in this case, the Verilog generator). Upon completion, a little popup indicates success.

codegen_complete

The generated Verilog looks a bit like this:

// Module declaration: the module I/O is everything on the backplane
// connector, plus VCC, GND, and SIM_RST (simulation reset)
module parity_s_register(VCC, GND, SIM_RST, GOJAM, PHS4_n, T02_n, T07_n, T12A, TPARG_n, TSUDO_n, FUTEXT, CGG, CSG, WEDOPG_n, WSG_n, G01, G02, G03, G04, G05, G06, G07, G08, G09, G10, G11, G12, G13, G14, G15, G16, WL01_n, WL02_n, WL03_n, WL04_n, WL05_n, WL06_n, WL07_n, WL08_n, WL09_n, WL10_n, WL11_n, WL12_n, WL13_n, WL14_n, RAD, SAP, SCAD, OCTAD2, n8XP5, MONPAR, XB0_n, XB1_n, XB2_n, XB3_n, CYL_n, CYR_n, EDOP_n, GINH, SR_n, EXTPLS, INHPLS, RELPLS, G01ED, G02ED, G03ED, G04ED, G05ED, G06ED, G07ED, GEQZRO_n, RADRG, RADRZ, S11, S12);

// I/O wire declarations
 input wire SIM_RST;
 input wire CGG;
 input wire CSG;
 output wire CYL_n;
 output wire CYR_n;
 output wire EDOP_n;
 output wire EXTPLS;

// Internal wire declarations
 wire NET_106;
 wire NET_107;
 wire NET_108;
 wire NET_109;
 wire NET_110;
 wire NET_111;
 wire NET_112;

// Component list
 pullup R12001(__A12_1__GNZRO);
 pullup R12002(RELPLS);
 pullup R12003(INHPLS);
 pullup R12004(__A12_1__PALE);
 U74HC04 U12001(G01, __A12_1__G01A_n, G02, __A12_1__G02_n, G03, __A12_1__G03_n, GND, __A12_1__PA03_n, __A12_1__PA03, NET_196, G04, NET_190, G05, VCC, SIM_RST);
 U74HC27 U12002(G01, G02, G01, __A12_1__G02_n, __A12_1__G03_n, NET_182, GND, NET_181, __A12_1__G01A_n, G02, __A12_1__G03_n, NET_177, G03, VCC, SIM_RST);
 U74HC27 U12003(__A12_1__G01A_n, __A12_1__G02_n, G04, G05, G06, NET_183, GND, NET_186, G04, NET_190, NET_195, NET_180, G03, VCC, SIM_RST);
 U74HC4002 U12004(__A12_1__PA03, NET_177, NET_182, NET_181, NET_180, NET_179, GND, NET_178, NET_183, NET_186, NET_185, NET_184, __A12_1__PA06, VCC, SIM_RST);
 U74HC04 U12005(G06, NET_195, NET_183, NET_133, __A12_1__PA06, __A12_1__PA06_n, GND, NET_189, G07, NET_127, G08, NET_174, G09, VCC, SIM_RST);

Above this is a top-level module in agc.v that instantiates each of these modules and forms a sort of virtual backplane, connecting all of the modules together via intermodule nets.

Initially the backplane file was written by hand, but somewhere around the fourth module, I realized that keeping track of what signals went where was untenable, and wrote a second python script to do it automatically for me: the backplane generator. This script reads in the verilog sources for all modules in a directory, and processes all of their I/O nets. Nets that are inputs for one module and outputs of another are marked as ‘internal’, while nets that have no source module or no destination module are considered ‘external’. The required I/O for the backplane is thus determined, and a new module implementing the determined backplane, and instantiating each of the modules, is generated.
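The net classification at the heart of the backplane generator boils down to a few set operations. Here’s a minimal sketch of that idea. It is not the actual script: the port lists are toy examples rather than the real module I/O, and inout (wand) nets are ignored:

```python
def classify_nets(modules):
    """modules: {name: {"inputs": set_of_nets, "outputs": set_of_nets}}.

    Internal: driven by some module and consumed by another.
    External: no driving module, or no consuming module; these become I/O of the
    generated top-level backplane module."""
    driven = set().union(*(ports["outputs"] for ports in modules.values()))
    consumed = set().union(*(ports["inputs"] for ports in modules.values()))
    internal = driven & consumed
    external_in = consumed - driven     # nothing in the system drives these
    external_out = driven - consumed    # nothing in the system uses these
    return internal, external_in, external_out


# Toy port lists, only to show the classification.
modules = {
    "timer": {"inputs": {"CLOCK", "STRT2"}, "outputs": {"GOJAM", "T12"}},
    "sq_register": {"inputs": {"GOJAM", "T12"}, "outputs": {"STRTFC"}},
}
internal, external_in, external_out = classify_nets(modules)
print(sorted(internal), sorted(external_in), sorted(external_out))
# ['GOJAM', 'T12'] ['CLOCK', 'STRT2'] ['STRTFC']
```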

With these two scripts, the entirety of the AGC is automatically generated. The only things that need manual editing are the test scripts that instantiate it, as its I/O changes — but it is very easy to keep track of I/O the backplane needs, since it’s automatically determined and presented in a nice, sorted list. The list is also waaaay shorter than the list of all nets in the system.

The last piece of the build system that I haven’t talked about is my Makefile. KiCAD unfortunately doesn’t have any command-line arguments, and not only would they be hard to add for netlist generation, but the KiCAD developers have rejected patches to do exactly that in the past. Because generating code for over a dozen modules is extremely tedious, I used xdotool to script it all. Xdotool is kind of like AutoHotkey for X Windows; the Makefile steps through a list of modules for which to generate code, opens KiCAD, and uses keyboard shortcuts to navigate to and begin code generation.

codegen_makefile

It’s super hacky, but it works. Put it all together, and an executable Icarus Verilog simulation binary can be produced from scratch simply by typing make.

Next time I’ll go back to looking at hardware design, with an overview of the start/stop logic!

The Timer Module

It’s finally time for the first in-depth analysis of a module! Logic module A2, the Timer, is responsible for almost all of the timing and synchronization in the computer. It’s broken down into three pages. Here are the unmodified scans of the originals:

a02-1
a02-2
a02-3


As I mentioned in the Timing Overview, R-700 has a pretty good overview of what the Timer and Scaler modules do:

r700_fig3-3

And for reference, here’s what the clock divider at the heart of the Pulse Forming Divider Logic does:

logisim_divider

I’ll use the phase names from this example for this discussion.

The 2.048MHz clock generated by the computer’s oscillator (the CLOCK signal) enters this module and runs directly into a clock divider.

timer_clockdiv1

Above I’ve circled in red almost everything covered by the Pulse Forming Divider Logic block in fig. 3-3. Immediately of note is the fact that only two of the four phase control pulses are actually created here: PHS2 at the bottom, and PHS4 at the top. PHS3 is actually created in Module A24, INOUT VII.

timer_phs3

Since PHS3/ is the inverse of CT, we can treat CT and PHS3 as effectively the same thing.

That’s great and all, but PHS1 is absolutely nowhere to be found. Mapping the phase letters to phase control pulse names, we get:

PHS2 = C (blue wire)

PHS3 = B (olive wire)

PHS4 = D (magenta wire, with some extra sauce)

(Recall that the ordering of phases is A->C->B->D, so the apparent mismatch of PHS2 and PHS3 makes sense.)

The PHS4 logic is a bit more complicated than PHS2 or PHS3; rather than taking the D phase directly, it first inverts it and then NORs it with the A phase (green and magenta). The resulting signal is very similar to D, except that it can’t overlap with A.
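Ignoring propagation delay, inverting D and NORing with A reduces to "D and not A". Here is that identity checked over all inputs; the function names are illustrative:

```python
from itertools import product


def nor(*xs):
    return int(not any(xs))


def phs4(a, d):
    # Invert D, then NOR with A: NOR(NOT d, a) == d AND NOT a
    return nor(nor(d), a)


for a, d in product((0, 1), repeat=2):
    assert phs4(a, d) == int(d and not a)
    print(f"A={a} D={d} -> PHS4={phs4(a, d)}")
```

The timing side effects the simulator shows (PHS4 ending a little earlier, no overlap with CT) come from the extra gate delays, which this static check deliberately leaves out.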

timer_phase_control_pulses

The simulator confirms that this brings down PHS4 a bit earlier than it otherwise would have been. The additional propagation delays incurred by this logic also keeps PHS3 (CT) and PHS4 from overlapping, which may or may not be important.

Anyways, we can pretty safely expect that PHS1 is the A phase of the clock divider (highlighted in green above). However, other than inhibiting the tail end of PHS4, the only thing the A phase does is run through an inverter to create RT. This creates a signal with 75% duty cycle… whose only inactive period is during PHS1.

The Clear, Write, and Read control pulses CT (Clear Time), WT (Write Time), and RT (Read Time) produced by this clock divider are the very same as those talked about in the section of R-700 I reproduced in the post on registers. Happily, it explicitly mentions what these signals should look like:

The CLEAR 2 pulse, that always occurs during the first half of WRITE 2, would have forced the flip-flop to the 0 state…. Thus the simultaneous occurrence of READ 1, WRITE 2, and the short CLEAR 2 pulses transfers the content of REG 1 to REG 2.

Sure enough, this is exactly what the simulator shows, which is pretty cool!

timer_rwc_pulses

It does leave open the question of what happens during PHS1, however.

timer_phs1_gap

At this point all I can say for certain is that PHS1 is the only time during which register transactions are not happening.

Down below all of this is a very interesting little bit of logic.

ovfstb_pulser

I talked about this briefly in my post about open-sourcing everything, since it was the primary motivator for my addition of propagation delays to Logisim. Basically, this is a rising-edge-detector that emits a pulse on the rising edge of its input, which in this case is CT/. In other words, this circuit creates a short pulse as soon as Clear Time is over. Here’s what that looks like in Logisim:

prop_delays

The two rightmost NOT gates are used to control the duration of the pulse. Here, because there are 5 gates in the loop driving OVFSTB/, the width of the pulse is 5 gate propagation delays. In the original AGC, the average gate delay was 20ns, so OVFSTB/ was low for somewhere around 100ns. There aren’t that many places that I’ve found in the computer so far where the gate propagation delay truly matters for the logic (thankfully), but this is one of the big ones. It’s somewhat likely that I’ll have to add in another couple of NOTs to widen the pulse in my replica.
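The principle that pulse width equals gate count times gate delay can be seen in a unit-delay simulation of a generic NOR edge detector. This is a hypothetical topology, not a transcription of the A2 circuit. CT/ and CT are taken as ideal complementary inputs. CT/ also runs through an even chain of inverters, and the output is NOR(CT, delayed CT/). The output stays high only while the delayed copy still shows the old value, so the width equals the chain length in gate delays. (An even chain is needed for this toy version, so it doesn’t reproduce the 5-gate count.)

```python
def edge_pulse_width(chain_len, edge_at=5, steps=40):
    """Unit-delay simulation: each gate output at step t is computed from its
    inputs at step t - 1. Returns the output pulse width in gate delays."""
    if chain_len % 2:
        raise ValueError("this toy detector needs an even inverter chain")
    ct_n = [1 if t >= edge_at else 0 for t in range(steps)]   # CT/ rises at edge_at
    ct = [1 - v for v in ct_n]                                # ideal complement
    # Steady state before the edge (CT/ = 0): inverter outputs alternate 1, 0, 1, ...
    chain = [[1 - i % 2] + [0] * (steps - 1) for i in range(chain_len)]
    y = [0] * steps                                           # NOR(CT=1, d=0) = 0
    for t in range(1, steps):
        chain[0][t] = 1 - ct_n[t - 1]
        for i in range(1, chain_len):
            chain[i][t] = 1 - chain[i - 1][t - 1]
        y[t] = 1 - (ct[t - 1] | chain[-1][t - 1])             # y = NOR(CT, delayed CT/)
    return sum(y)


GATE_DELAY_NS = 20   # average original gate delay quoted above
for n in (2, 4, 6):
    width = edge_pulse_width(n)
    print(f"chain of {n}: {width} gate delays = {width * GATE_DELAY_NS} ns")
```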

Anyways, the underflow and overflow checks of the write bus happen while OVFSTB/ is low. This logic is pretty straightforward, and lives at the top of page 3 of the Timer module:

timer_overflow_checks

The Virtual AGC Assembly Language Manual section on the A register has a good little discussion about how this check works. Basically, it’s the main ‘modified’ feature of the Modified One’s Complement number system used in the AGC. Bits 16 and 15 are both sign bits, but during an overflow or underflow, only bit 15 will change. Bit 16 then is used as the sign for the result, meaning that the answer for any addition will at the very least have the correct sign. Furthermore, overflow and underflow can easily be detected by checking to see when bits 16 and 15 are not the same, which is exactly what is happening above.
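Here’s a small Python sketch of that check, assuming 16-bit ones’ complement words with end-around carry, and bits numbered 1 to 16 as in the AGC documentation. The helper names are illustrative. "overflow" and "underflow" here mean positive and negative overflow:

```python
MASK16 = 0xFFFF


def ones_add16(a, b):
    """16-bit ones' complement addition with end-around carry."""
    s = a + b
    if s > MASK16:
        s = (s & MASK16) + 1
    return s


def to16(value):
    """Encode a 15-bit signed value as 16 bits, with bit 16 duplicating the sign."""
    return value & MASK16 if value >= 0 else ~(-value) & MASK16


def overflow_state(word):
    bit16 = (word >> 15) & 1   # true sign of the result
    bit15 = (word >> 14) & 1
    if bit16 == bit15:
        return None
    return "overflow" if bit16 == 0 else "underflow"


print(overflow_state(ones_add16(to16(0o37777), to16(1))))     # overflow
print(overflow_state(ones_add16(to16(-0o37777), to16(-1))))   # underflow
print(overflow_state(ones_add16(to16(5), to16(-3))))          # None
```

In both overflow cases bit 16 still holds the correct sign of the mathematical result, which is the point of the modified representation.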

Next up in the Timer is another clock divider circuit.

timer_clockdiv2

It’s driven by the inverse of phase D from the first divider, which is interesting in that it’s got a 75% duty cycle. Up until now I’ve only talked about what happens with 50% duty cycle square waves with these clock dividers; it hadn’t occurred to me to try anything else until I started simulating this module.

timer_cdiv_duty_cycle

Phases A and B are both long, while phases C and D are both short. And interestingly, FS and FS/ still have 50% duty cycle, although that feature isn’t used here.

In this clock divider, the shorter phases C and D are used to create two pairs of signals, RINGA/ and RINGB/, and ODDSET/ and EVNSET/. Even though each one of the four signals is 512kHz, the pairs taken together constitute the two 1.024MHz lines drawn from the Pulse Forming Divider Logic and the Ring Counter in figure 3-3. The only difference between the two pairs is that ODDSET/ is inhibited when the computer is in STOP mode, while RINGA/ and RINGB/ both continue ticking away happily.

RINGA/ and RINGB/ are used to drive the Ring Counter, which occupies the top half of the second page of the Timer.

timer_ring_counter

The operation of this circuit is pretty neat. Here’s what it looks like in Logisim:

timer_ring_counter

It pulls off the division by 10 by first setting each of 5 flip flops, and then starting over from the first, resetting each. The outputs of the ring counter are used for various lower rate timing things, but I haven’t dug into the details much yet.
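The set-each-then-reset-each pattern is the behavior of a twisted-ring (Johnson) counter. Assuming that idealization fits (it’s my modeling choice, not something R-700 states), each of 5 stages passes its value along and the last stage feeds its complement back to the first, giving a 10-state cycle:

```python
def johnson_step(state):
    """Shift right, feeding the complement of the last stage into the first."""
    return (1 - state[-1],) + state[:-1]


def cycle_from(state):
    """Every state of this counter lies on a cycle, so this always terminates."""
    seen = [state]
    while True:
        state = johnson_step(state)
        if state == seen[0]:
            return seen
        seen.append(state)


main = cycle_from((0, 0, 0, 0, 0))
print(len(main))   # 10
for s in main:
    print("".join(map(str, s)))
```

In this idealized model, a power-up state such as 01010 just toggles with period 2 and never joins the main sequence. That is a property of the textbook circuit; the real A2 logic may behave differently.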

In the timer module, the outputs of the ring counter are combined in Strobe Pulse Generator gates to produce strobe signals. According to R-700, these are used “for phase and pulse length control of various interface timing pulses, roughly 3 μsec duration, at different positions within the basic 9.76 μsec interval of the 102.4kHz signal”.

timer_strobe_pulse_generator

Instead of SB0, SB1, SB2, and SB3 as indicated by figure 3-3, we’ve got SB0, SB1, SB2, and SB4, plus an additional signal called EDSET. Here’s the simulator output for the ring counter and these signals:

timer_ring_counter_signals

Oddly, EDSET always stays low. Looking at only the signals that contribute to it, it appears to be impossible to reach during nominal operation.

timer_edset

The gap isn’t anywhere near small enough to blame on my shorter modeled propagation delays; there is always at least one input solidly high. My best guess is that it’s there to somewhat streamline initialization of the ring counter. During powerup, each of the five flip-flops in the ring counter will randomly hold a 1 or 0. They’ll eventually get synchronized as in the above GIF, but module A5, Cross Point Generator NQI, folds EDSET back into P03:

timer_edset_p03

Presumably, this catches an undesirable startup condition in the ring counter and forces P03 low, although I haven’t spent much time analyzing this.

The final bit of timing circuitry in the Timer module is the Time Pulse Generator. It takes up the entirety of the third page, and is pretty complex. I attempted to untangle it a little bit in my KiCad copy, which makes it very slightly cleaner:

timer_time_pulse_generator

It’s implemented as a 12-bit shift register, and is driven forward by ODDSET/ and EVNSET/. Each of the twelve signals T01..T12 denotes one of the twelve time pulses that make up a Memory Cycle Time (MCT). Instructions are encoded as a series of sequences, and each sequence the computer can execute takes up one MCT. So in a sense, these pulses control which phase of microinstructions the computer is on out of each larger instruction.
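As a rough behavioral picture of that 12-bit shift register, the sketch below circulates a single 1 through twelve stages, one per time pulse. It assumes a simple one-hot rotation and ignores how ODDSET/ and EVNSET/ actually advance and gate the stages. The function name is hypothetical.

```python
# Sketch: the twelve time pulses as a single 1 rotating through a 12-stage
# shift register. Assumption: a plain one-hot rotation; the real generator
# is advanced and gated by ODDSET/ and EVNSET/, which is not modeled here.

TIME_PULSES = [f"T{n:02d}" for n in range(1, 13)]

def time_pulse_sequence(mcts):
    """Yield the active time pulse name for every step of `mcts` MCTs."""
    register = [1] + [0] * 11                      # T01 active first
    for _ in range(12 * mcts):
        yield TIME_PULSES[register.index(1)]
        register = register[-1:] + register[:-1]   # shift every stage along by one
```

Each MCT visits T01 through T12 exactly once, in order, with exactly one pulse active at any step.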

Here’s what the signals look like:

timer_time_pulses

Note that even stages are NORed with EVNSET/ and odd stages are NORed with ODDSET, which results in non-overlapping time pulses. This is desirable to prevent sequence steps from interfering with one another.

Interestingly, the start of each time pulse is coincident with PHS1, with PHS4 filling the gaps between time pulses.

timer_time_pulse_phs1

Of course, this means that the time spent writing to registers each time pulse is quite short compared to the read and clear times:

timer_time_pulse_rcw

The write signal lasts throughout the PHS4 downtime, though. Presumably there’s enough time for everything to be safely written, but it’s surprisingly tight.

The other signals created by the time pulse generator are the T01DC/..T12DC/ signals. These are the overlapping versions of the time pulses (T11DC/ doesn’t exist):

timer_time_pulse_dc

And that’s about it for the timer module! The only things I didn’t cover are the start/stop logic, which will get its own post, and the first stage of the Scaler, which I’ll talk about with the Scaler module.

Open Source!

Things have finally settled down in tool-hacking land, and I’ve moved on to spending most of my time drawing up schematics. And all of the work I’ve been doing is now up on GitHub! Here are links to all of the repositories, plus descriptions of what they are:

My Fork of Logisim Evolution

Logisim is a great tool for visualizing logic circuits, but has two shortcomings that prevent it from being useful in its unmodified form: it doesn’t allow you to set initial conditions, and it doesn’t simulate gate propagation delay. I had addressed the former before my blog post on simulation, but propagation delays do matter in the AGC, which is why I said it had some features that prevented it from being more useful at the time.

The particular circuit that inspired me to go ahead and hack propagation delays into Logisim is this thing from page 1 of the Timer module:

ovfstb_pulser

My Verilog simulations showed that it produces a pulse whose width is exactly 5 propagation delays on the rising edge of its input (pin B of 37148 above). Logisim disagreed:

no_prop_delays
I don’t think that’s right…

Now that I’ve got propagation delays added in as a configurable property of logic gates, the Logisim simulation looks like this:

prop_delays
Much better!
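Why delays matter here can be shown with a deliberately simple circuit. It is hypothetical and is not the OVFSTB pulser's real wiring:

- An output NOR gate sees the input through one inverter on one leg.
- It sees the same input through a chain of six inverters on the other leg.

The two legs only disagree for a short time after a rising edge, so the pulse width is set entirely by gate delays. With unit delays, the sketch produces a 5-delay pulse. Evaluated as ideal zero-delay logic, it produces nothing at all, much like the no-delay Logisim result above.

```python
# Sketch: why a rising-edge pulse generator depends on gate delays.
# Hypothetical circuit, NOT the AGC's OVFSTB pulser: the output NOR gate
# sees the input through one inverter on one leg and through CHAIN
# inverters on the other, so it only fires while the two legs disagree.

CHAIN = 6  # even, so the long leg ends up non-inverting

def nor(*inputs):
    return int(not any(inputs))

def simulate(a_wave, unit_delay=True):
    """Return the output for each input sample in `a_wave`.

    unit_delay=True: every gate takes one time step, and all gates update
    together from the previous step's values.
    unit_delay=False: ideal zero-delay logic, both legs settle instantly.
    """
    inv = 1                                        # steady state for A = 0
    chain = [(k + 1) % 2 for k in range(CHAIN)]    # 1, 0, 1, 0, ...
    out = 0
    result = []
    for a in a_wave:
        if unit_delay:
            new_chain = [nor(a)] + [nor(x) for x in chain[:-1]]
            inv, out, chain = nor(a), nor(inv, chain[-1]), new_chain
        else:
            out = nor(nor(a), a)                   # NOT A and A are never both 0
        result.append(out)
    return result

if __name__ == "__main__":
    wave = [0] * 3 + [1] * 12 + [0] * 12
    print("unit delay:", simulate(wave))
    print("zero delay:", simulate(wave, unit_delay=False))
```

The pulse width is the delay of the long leg minus the delay of the short leg: 6 - 1 = 5 gate delays, and only on the rising edge.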

My Fork of KiCAD

Unlike Logisim, my changes to KiCAD are fairly minimal; they simply flesh out the XML format for the generic netlist generator a bit. KiCAD allows you to set user-defined attributes per part. I’m using this feature to feed part parameters like initial conditions into the Verilog generator. However, for multipart components, only one (seemingly random) part’s attributes actually get written to the XML file. My fork addresses this and writes out attributes for all parts of multipart components.

It’s worth noting that unlike my Logisim stuff, the official version of KiCAD will work totally fine for opening and editing the schematics and boards; my changes are only needed for proper functioning of the Verilog generator. I also might be able to get these changes pushed upstream.

AGC Hardware KiCAD Projects

This repository contains the real meat of this project: all of the KiCAD projects, schematics, board layouts, and components for the physical AGC. It’s a bit sparse at the moment, since I’ve been spending a lot of time figuring out how to group modules and how to best use KiCAD’s hierarchical sheets, but it should grow quickly.

AGC Simulation

This repository holds all of the simulation harnesses and tests for the Verilog models of the stuff in the AGC Hardware repository, plus any supporting tooling I develop. The Verilog generator that I spend so much time talking about is in here.

It also houses the autogenerated Verilog, at least for now. I’m normally not one for committing generated products to repositories, but the Verilog generation process is quite manual at the moment and I’m not aware of an easy way to script it. I’ll probably end up addressing that issue later when I’ve got multiple interfacing modules to deal with.

A Brief Tour through the Schematics Part III: The Registers and ALU

The registers in the AGC are all constructed using classic cross-coupled NOR gates. They’re divided into two classes, the “central” registers A, L, Q, and Z, and the “special” registers, which includes all of the rest of them. The central registers, as well as the banking registers, are also mapped over the lower area of Erasable memory; i.e., memory location 0 is not actually part of the Erasable memory core RAM, but rather is redirected to the A register. These registers are shown in R-700 in Table 2-II:

r700_table2-2

Outside of these, there’s a handful of non-addressable special registers:

Non-Addressable Special Registers

REGISTER  PURPOSE
B         Contains the next instruction to be executed
G         Memory buffer register; contains data read from or data to be written to erasable or fixed memory
X         One of two ALU input registers
Y         One of two ALU input registers
SQ        Upper 6 bits of current instruction; used for instruction decoding
S         Lower 12 bits of current instruction; used as address argument for opcode

There are also a couple of “fake” registers that get thrown around a lot but don’t exist as physical flip-flops:

Fake Registers

REGISTER  PURPOSE
C         The complement of the B register
U         The output of the ALU

Before I start talking about the real schematics, I want to mention a small section from R-700 that’s particularly helpful here. It’s so good, in fact, that I’m going to just reproduce the text and accompanying figure verbatim:

An illustrative example of the NOR logic in the computer is provided by the operation of the flip-flop registers in the central processor. Digits are transferred from one register to another through a common set of lines called the write buses. The central register flip-flops are selected by read and write pulses applied to gates that either set or interrogate the flip-flop of the corresponding register. Figure 3-2 shows a hypothetical set of three flip-flops similar to those in one bit column of the computer’s central register section.

Information transfers between the registers are controlled by three clocked action pulses: read, write, and clear. Thus the WRITE BUS is normally in the 1 state, and changes to 0, while transferring a 1. Suppose REG 1 contains a 1, i.e., the top gate of its flip-flop has an output of 0. At the time that the READ 1 signal goes to 0 from its normal 1 state, the output of the read gate, CONTENT 1, becomes a 1. This propagates through a read bus fan-in and an inverter and fan-out amplifier to make WRITE BUS become 0. Suppose that WRITE 2 is made 0 concurrently with READ 1. Then the coincidence of 0’s at the write gate of REG 2 generates a 1 at the SET 2 input, thus setting the bit to 1.

r700_fig3-2

If REG 1 had contained a 0, the write bus would have remained at 1, and no setting input would have appeared at the upper gate of REG 2. The CLEAR 2 pulse, that always occurs during the first half of WRITE 2, would have forced the flip-flop to the 0 state, where it would remain; whereas when a 1 is transferred, the SET 2 signal persists after the CLEAR 2, and thus forces the register back to the 1 state. Thus the simultaneous occurrence of READ 1, WRITE 2, and the short CLEAR 2 pulses transfers the content of REG 1 to REG 2. Only the content of REG 2 may be altered in the process. REG 1 and REG 3 retain their original contents. An instance of gates being used to increase fan-in is shown where several CONTENT signals are mixed together to form the signal READ BUS. An increase in fan-out is achieved by the two gates connected in parallel to form the signal WRITE BUS.
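The quoted mechanism can be checked with a small Python model of the three-register example. The model rests on these assumptions:

- Gates are ideal NORs, evaluated in a fixed order until they settle, with no propagation delays.
- READ and WRITE are active low, and CLEAR is active high, as the quoted text describes.
- "The first half of WRITE 2" is modeled as a separate step.

Class and function names are hypothetical.

```python
# Sketch of R-700 Figure 3-2: three one-bit registers sharing a read bus and
# a write bus, built only from NOR gates. READ and WRITE are active low
# (0 = asserted), CLEAR is active high. Assumptions: ideal gates evaluated
# in a fixed order until they settle, no propagation delays; names are
# illustrative, not AGC signal names.

def nor(*inputs):
    return int(not any(inputs))

class BitRegister:
    """Cross-coupled NOR flip-flop; the upper gate outputs 0 when holding a 1."""

    def __init__(self, value):
        self.upper = 1 - value
        self.lower = value

    @property
    def value(self):
        return 1 - self.upper

    def settle(self, set_in, clear_in):
        for _ in range(4):  # a few passes are enough for this latch to settle
            self.upper = nor(set_in, self.lower)
            self.lower = nor(clear_in, self.upper)

def bus_cycle(regs, read, write, clear):
    """Drive the read path, then let every register settle on the write bus."""
    contents = [nor(r, reg.upper) for r, reg in zip(read, regs)]  # read gates
    read_bus = nor(*contents)                                       # read-bus fan-in
    write_bus = nor(nor(read_bus))          # inverter, then fan-out amplifier
    for w, c, reg in zip(write, clear, regs):
        reg.settle(nor(w, write_bus), c)    # write gate produces SET

def transfer(regs, src, dst):
    """READ src with WRITE dst; CLEAR dst only during the first half."""
    n = len(regs)
    read = [int(i != src) for i in range(n)]
    write = [int(i != dst) for i in range(n)]
    bus_cycle(regs, read, write, [int(i == dst) for i in range(n)])  # first half
    bus_cycle(regs, read, write, [0] * n)                             # second half
    bus_cycle(regs, [1] * n, [1] * n, [0] * n)                        # pulses end
```

A transfer from register 0 to register 1 copies the bit and leaves registers 0 and 2 untouched, for every combination of starting values.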

This should provide some good context for upcoming discussions. Plus, we’ve already seen the generation of the Read and Write pulses for each register in the Instructions portion of this tour (the Clear pulses are a bit weird; I’ll get to them later). As a quick refresher, many of the control pulses coming out of the Cross Point Generator modules have names like “WA” (write A register) or “RB” (read B register).

Without further ado, here’s what the real deal looks like:

a8-1-top

This is the first page of the first of four 4 Bit Modules. All four of the 4 Bit Modules are very nearly identical, and consist of two very nearly identical pages. Furthermore, as you can kind of see above, each page is split into two almost identical “one bit columns”.

These one bit columns consist of flip-flops for the X, Y, A, L, Q, Z, B, and G registers, along with the necessary circuitry to read and write each. The X and Y registers at the top feed into one bit of the ALU, which is, quite simply, just an adder. I’ve also marked the one bit column’s “read bus” and “write bus” that correspond to R-700’s example above.

On closer inspection, though, the signals controlling these registers aren’t quite what we might be expecting based on the output of the Cross Point Generators.

a_reg_signals

Instead of WA/ (write A not), CA (clear A), and RA/ (read A not), we’ve got WAG/, CAG, and RAG/. This is because there’s one more stop for the control pulses before they reach the central registers: the Service Gates module.

a7-1-top

This module combines the base time pulses from the Timer module with the control pulses created by the Cross Point Generators. Zooming in a bit, here’s where the WAG/ and CAG signals get created:

a_reg_gated_signals

Also note that this is the birthplace of the register Clear signals. The Cross Point Generators don’t generate any sort of clear control pulses; instead, they’re created here as the intersection between a register Write control pulse and the CT/ timing control pulse.

At this point, the only registers we haven’t looked at are the banking registers EB and FB, and the instruction argument register S.

The banking registers are (for some reason) located in the Rupt Service Module, whose main purpose is to handle interrupts. I’ll go over the other functionalities of this module later, but for now, here’s EB and FB:

a15-1-top

The BB register is also only “real” in the sense that it’s accessible in the AGC’s address space; only the EB and FB registers exist as physical flip-flops. Accesses to BB simply read or write both of these registers at once.
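Here is a sketch of how BB can appear in the address space without flip-flops of its own. It assumes the commonly documented Block II bit layout:

- The erasable bank sits in EB bits 11-9.
- The fixed bank sits in FB bits 15-11.
- BB holds the FB bits in bits 15-11 and the EB bits in bits 3-1.

That layout is not stated in this post, so check it against R-700. Class and constant names are hypothetical.

```python
# Sketch: BB presented as a view onto the two physical bank registers.
# Assumption: the commonly documented Block II layout (EB bank in bits 11-9,
# FB bank in bits 15-11, BB = FB bits 15-11 plus EB bank in bits 3-1);
# bit numbers are 1-indexed from the least significant end.

EB_SHIFT, FB_SHIFT = 8, 10        # 0-indexed positions of bits 9 and 11
EB_MASK, FB_MASK = 0b111, 0b11111

class BankRegisters:
    def __init__(self):
        self.eb = 0               # physical erasable-bank flip-flops
        self.fb = 0               # physical fixed-bank flip-flops

    def read_bb(self):
        """BB has no flip-flops of its own: assemble it from FB and EB."""
        fbank = (self.fb >> FB_SHIFT) & FB_MASK
        ebank = (self.eb >> EB_SHIFT) & EB_MASK
        return (fbank << FB_SHIFT) | ebank

    def write_bb(self, word):
        """A write to BB updates EB and FB at the same time."""
        self.fb = ((word >> FB_SHIFT) & FB_MASK) << FB_SHIFT
        self.eb = (word & EB_MASK) << EB_SHIFT
```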

Lastly, we have the S register, which is in the Parity and S Register Module. Here is the second page of that module:

a12-2-top

It’s mostly the S register, plus a bit of logic about editing operations which I’ll talk about in the Memory part of the tour. But while we’re here, I’ll talk about what’s on the first page of this module too:

a12-1-top

This structure computes the parity (i.e., whether the number of 1’s is even or odd) of the contents of the G register. It’s a very basic check used to verify that the contents of memory are self-consistent. If for some reason a 1 unexpectedly becomes a 0 or vice-versa in memory, this circuit will catch and flag the problem before the data leaves the G register.
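A check like this is simple to express in software. The sketch below assumes odd parity over a 15-bit word plus one parity bit, meaning the total number of 1s, including the parity bit, is odd. That is how the AGC's parity convention is commonly described, but this post doesn't show it. Function names are hypothetical.

```python
# Sketch: odd parity over a 15-bit word plus one parity bit.
# Assumption: odd parity (total number of 1s, including the parity bit, is odd).

WORD_BITS = 15

def parity_bit(word):
    """Parity bit that makes the 16-bit total contain an odd number of 1s."""
    ones = bin(word & ((1 << WORD_BITS) - 1)).count("1")
    return 0 if ones % 2 else 1

def parity_ok(word, stored_parity):
    """True if the word and its stored parity bit still agree."""
    return parity_bit(word) == stored_parity
```

A single flipped bit changes the count of 1s and is caught. Two flipped bits cancel out, so a parity check alone can't see them.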

That’s all for now! I’ll be getting into more detailed circuit analyses before continuing this schematic overview tour. All that’s left to go, though, is Memory, Interrupts, and I/O!

Simulation, and the Binary Clock Divider

Simulation is going to be extremely important for this project to be successful. The most obvious reason is that PCB errors are going to be expensive and painful to debug, but there are a handful of reasons specific to the AGC:

  • The available schematics are incomplete. All of the Tray A modules are there, which is the really important bit. But we’re missing almost all of Tray B. That means all of the memory-interfacing circuitry needs to be recreated to work well with the core logic. (To be fair, I was probably going to do this anyways; building ferrite core memory is currently out of scope for this project).
  • The AGC uses chains of NOTs to create delays in some places. The image below, an example from the Timer module, shows the CT signal passing through three inversion stages before it drives the main line that everything using the signal is connected to.

chain_delay

The propagation delay for AGC NOR gates was 20 ns nominal and 30 ns worst case. Both 74HC and 74LS NOR gate components are roughly twice as fast as this, so the timing surrounding such circuits in my computer will be quite different, and very well may require modification.

  • The scans of the schematics aren’t perfect.
smudge
Signal TP…something…RG/
cutoff
And signal T7PH…something
  • There are errors in the schematics. This is a list of errors compiled from a signal list made by Jim Lawton. Most of the things in this list aren’t concerning. Signals with no sinks have no negative impact whatsoever; it just means a signal got put out on a pin and never ended up being used. Even most of the seemingly more egregious “signals with multiple sources” can safely be ignored; almost all of the entries here are actually instances of fan-out expansion gates or open-collector buses being incorrectly marked as errors. The really concerning ones are the signals with no source. Those I’ll need to figure out.
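To put numbers on the three-inverter CT chain from the list above, the sketch below multiplies stage count by per-gate delay. It uses the AGC figures of 20 ns nominal and 30 ns worst case. For 74HC/74LS parts it assumes roughly 10 ns per gate, based on the "twice as fast" estimate; real datasheet delays depend on supply voltage and load.

```python
# Worked delay numbers for a chain of inverting stages.
# Assumption: about 10 ns per gate for 74HC/74LS parts, from the "roughly
# twice as fast" estimate; real delays depend on supply voltage and load.

STAGES = 3
PER_GATE_NS = {
    "AGC nominal": 20,
    "AGC worst case": 30,
    "74HC/74LS (assumed)": 10,
}

def chain_delay(stages, per_gate_ns):
    """Total delay through `stages` gates in series."""
    return stages * per_gate_ns

for label, per_gate in PER_GATE_NS.items():
    print(f"{label}: {chain_delay(STAGES, per_gate)} ns")
```

The chain shrinks from 60 ns (90 ns worst case) to something like 30 ns. A shift of that size is what may make modification necessary.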

My initial reaction to simulating a processor was to put it into an FPGA, but I learned quite quickly that this is a Bad Idea. The AGC is built mostly on asynchronous logic, which doesn’t translate well at all to the Look-Up Tables of FPGAs (and makes replication of timing quite difficult, even outside of an FPGA).

With all that said, an attempt to do exactly that has already been made by Dave Roberts. However, it doesn’t appear he ever got it fully working (though he got pretty close), and comments in the files he provides allude to issues with timing. Still, the fact that he got as far as he did is both very cool, and very promising!

Anyways, with the FPGA idea gone, I started testing out some digital simulation programs. I very quickly learned that doing any sort of digital simulation for the AGC is going to be an interesting challenge.

To demonstrate, I’m going to focus on a specific circuit, whose properties have been driving my selection of an approach to simulation.

Primary single-stage clock divider
Module A1, Primary Single-Stage Clock Divider

The entry point of the 2.048MHz oscillator into the Tray A logic modules is Module A2, the Timer. It immediately runs into a rather interesting structure of 6 NOR gates (ignore the one in the upper right). This circuit accomplishes something pretty remarkable. And as it turns out, it’s a pain to simulate.

For the purposes of this discussion, I’ll focus on another instance of this circuit, also in the Timer module. It’s exactly the same, but all of the nodes have labels, which makes it easier to show what’s going on.

A Stand-Alone Clock Divider Circuit

I was pretty confused by this thing when I first started looking at it. Normally you only see NOR gates cross coupled in pairs to build SR latches, but here we have four gates that are all cross-coupled, feeding another, more common cross-coupled pair (which then feed back into the group of four).

So, let’s throw it into a simulator to see what it does. My first attempt was to implement it in Logisim, which some people have used to implement fully working processors. Unfortunately, this didn’t really work out too well.

logisim_error

It thinks the entire circuit is an error! Circuitmaker 2000, which John Pultorak used to simulate his AGC, gives a bit more insight into why Logisim is so upset about this:

circuitmaker_flashing

The entire circuit flashing on and off nonstop! Circuitmaker 2000 initializes all nodes to 0 during the first time step of the simulation. Because everything here is a NOR gate, all of them will put out a 1 on the next time step. However, this makes all inputs 1, so on the following time step all of their outputs will go to 0, and so on and so forth. Logisim fails for a similar reason: it tries to derive the initial state of the circuit through propagation, and quite simply can’t.

Another interesting thing about this circuit: watch what happens as the input signal (P01/ in the sample circuit) changes. The two gates it feeds into are prevented from oscillating, but only while P01/ is high. In other words, there’s no input you can give this circuit to make it settle on a state, like you can with an SR latch (which will also oscillate forever in a simulator like this given bad inputs).

The core of the problem is that simulators like this implement ideal gates. In the real world, small variations in the construction of the components will cause the circuit to settle into a stable state quite quickly, preventing oscillations like this from occurring for too long. Analog simulators like SPICE, which also model ideal components, typically let you set initial conditions on the circuit to force the circuit into a known state at the beginning of a simulation (otherwise, they’re subject to the same oscillations). But unfortunately, it seems that this is not a common thing for digital engines, or at least the free ones I was able to get my hands on. Even Verilog doesn’t let you set initial values on wires.
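The same failure, and the same fix, show up in the simplest cross-coupled circuit. The sketch below simulates an ideal SR latch (not the clock divider) with both inputs held low. It assumes every gate updates at the same instant from the previous step's values, roughly like the flashing Circuitmaker run. Starting from all zeros, it flips between (0, 0) and (1, 1) forever. Given an initial condition of Q = 1 and Q/ = 0, it is stable from the first step.

```python
# Sketch: an ideal cross-coupled NOR pair in a lockstep (unit-delay) simulator.
# Assumption: every gate updates at the same instant from the previous
# step's values.

def nor(a, b):
    return int(not (a or b))

def simulate_latch(q, q_bar, steps, s=0, r=0):
    """Return the (Q, Q/) history of an SR latch with S and R held constant."""
    history = [(q, q_bar)]
    for _ in range(steps):
        q, q_bar = nor(r, q_bar), nor(s, q)
        history.append((q, q_bar))
    return history

if __name__ == "__main__":
    print("from all zeros:   ", simulate_latch(0, 0, 4))
    print("initial condition:", simulate_latch(1, 0, 4))
```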

So what to do? It’s possible to force the clock divider into a known state by increasing the number of inputs on the NOR gates and adding additional circuitry:

circuitmaker_reset

This approach doesn’t sit well with me, though, because it means the circuit being simulated isn’t the same as the one being built. Moreover, I’d have to sketch out two separate versions of the schematic, one for PCB layout and one for simulation, and that’s just asking for errors.

After doing a lot of research, I’ve decided to take a three-pronged approach to simulation. First, I’ll be using LTSpice for analog simulation, in small-to-medium-sized doses. That’ll be important both for signal integrity, and to be sure the propagation delays I put into the digital simulation are at least somewhat accurate.

Second, I patched Logisim to accept initial conditions for gates.

logisim_initial_output
Hooray! No errors!

Logisim will be primarily used for simple circuits like this, just to figure out what’s going on. It’s got a few features that prevent it from being more useful beyond that without further modification, but for simple stuff, it’s great.

The bulk of the work will be done in KiCAD, an open-source PCB design program that’s roughly comparable to Eagle. KiCAD doesn’t have any simulation built in, aside from very basic SPICE integration that can call out to LTSpice. Instead, I’ll be using Icarus Verilog as the primary simulation engine. I’ve been working on a Verilog exporter to generate Verilog from KiCAD-exported netlists. This approach has a lot of advantages: there’ll be one source of truth for both PCB layout and simulation (what is simulated is what will be built), and I can insert the additional circuitry shown above automatically into the generated Verilog without having any of it in the schematic or PCB. Verilog is also more or less an industry standard for digital design and simulation. It’s really fast compared to the graphical tools, and I’ll be able to do a lot with it. (Incidentally, it’s the simulation-only features of Verilog that allow me to simulate the AGC at all; the generated Verilog won’t be synthesizable, so unfortunately it still can’t be put into real hardware).

KiCAD’s generic netlist exporter also requires some patches to fully support everything we need.

kicad_example

The modified netlist exporter makes sure the 1’s circled on U1C and U2B in the above image make it into the netlist file. I’ve placed these as markers for gates that have their “ResetValue” attribute set to 1. This attribute tells the Verilog generator what to use as the reset value for each gate.
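The Verilog generator itself isn't shown in this post, so the following is only a hypothetical sketch of the idea. It reads per-gate ResetValue fields out of a netlist using Python's standard xml.etree.ElementTree. It then turns each value into a parameter override for a NOR gate model like the nor_2 module, with its iv parameter, from the FPGA Support post. The XML shape is simplified and made up; it is not KiCAD's exact generic netlist format or the patched exporter's output.

```python
# Sketch: pull per-gate ResetValue attributes out of a netlist and turn them
# into Verilog parameter overrides. Assumptions: the XML below is a
# simplified, hypothetical shape (not KiCAD's exact generic netlist format),
# and the emitted instance header targets a nor_2 module with an `iv` parameter.
import xml.etree.ElementTree as ET

NETLIST = """
<export>
  <components>
    <comp ref="U1C"><fields><field name="ResetValue">1</field></fields></comp>
    <comp ref="U2A"><fields/></comp>
    <comp ref="U2B"><fields><field name="ResetValue">1</field></fields></comp>
  </components>
</export>
"""

def reset_values(xml_text):
    """Map each gate reference to its reset value (0 if the field is absent)."""
    root = ET.fromstring(xml_text.strip())
    values = {}
    for comp in root.iter("comp"):
        field = comp.find("fields/field[@name='ResetValue']")
        values[comp.get("ref")] = int(field.text) if field is not None else 0
    return values

def instance_header(ref, iv):
    """Start of a nor_2 instantiation with its initial value overridden."""
    return f"nor_2 #(.iv(1'b{iv})) {ref}"

if __name__ == "__main__":
    for ref, iv in reset_values(NETLIST).items():
        print(instance_header(ref, iv))
```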

I’ll go into more detail on the Verilog generator in a later post. For now, let’s wrap this one up by using the patched Logisim and Icarus Verilog to analyze the clock divider circuit.

logisim_divider
Logisim running the clock divider

As expected, FS01 is a square wave with half the frequency of P01/. But more interesting are F01A, F01B, F01C, and F01D — they constitute a four-phase clock of the same frequency as FS01. And as described in R-393, “signals FxA, FxB, FxC, FxD represent pulses 90° (electrical) out of phase. A leads C (not B) by 90°; C leads B by 90°; B leads D by 90°”. This is actually a really neat circuit — only 6 gates, and it produces 6 useful clocking signals. No wonder it’s used so much throughout the AGC!
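The phase relationships in the R-393 quote can be reproduced with a behavioral stand-in; this sketch is not the six-gate wiring. It assumes the four phases behave like a 2-stage twisted-ring counter that steps on every edge (rising and falling) of the input, so each output runs at half the input frequency. One input edge is then a quarter of an output period, or 90°. In order, A rises first, then C, then B, then D, one edge apart. The function name is hypothetical.

```python
# Behavioral sketch of the divider's four-phase outputs, NOT its gate-level
# wiring. Assumption: a 2-stage twisted-ring counter stepped on every input
# edge, so each output period spans four input edges (two input periods).

def divider_outputs(input_edges):
    """Return per-edge samples of (FxA, FxB, FxC, FxD)."""
    q0, q1 = 0, 0
    samples = []
    for _ in range(input_edges):
        q0, q1 = 1 - q1, q0                          # twisted-ring step
        samples.append((q0, 1 - q0, q1, 1 - q1))     # A, B, C, D
    return samples
```

B is simply the complement of A and D the complement of C, which matches A leading C by 90°, C leading B by 90°, and B leading D by 90°.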

Finally, here’s a waveform output from the same circuit, simulated in Icarus Verilog:

iverilog_divider
GTKWave displaying Icarus Verilog’s results

Exactly as expected! Unlike Logisim, my Verilog models are taking the propagation delays of the individual NOR gates into account. You can see the results of this as overlapping signals — most conspicuously in the above image, F01B overlapping with F01C.

That’s all for simulation for now! The Verilog generator should be done shortly, after which the analysis can really take off. Until then!

A Brief Tour through the Schematics Part II: Instructions

Instruction processing in the AGC is quite complicated. It had a 15-bit word length, with 3 bits used for the opcode and 12 bits used for the address. This gives us 2³ = 8 instructions, and allows us to address 2¹² = 4,096 words of memory, right?

Nope. In reality, the computer supported 34 instructions, and had 2,048 words of erasable memory and 36,864 words of fixed memory. The memory addressing was handled by a somewhat complex banking scheme, which I’ll cover in a later post. But what about the instructions?

R-700 boils it all down into one somewhat confusing diagram in figure 3-6:

r700_fig3-6

Bits 13, 14, and 15 make up the 3 base opcode bits. An extra internal bit could be set by executing the EXTEND instruction. The extend bit acts essentially as just a fourth opcode bit — when it’s set, an entirely different set of instructions is selected from. These are called the “extracode instructions”. At the end of every instruction (except INDEX), the extend bit is reset back to zero. (As a result of this behavior, interrupts are inhibited while the extend bit is set).

EXTEND itself is actually not one of the 8 base instructions either; it’s in a different class known as “implied address instructions”. The premise behind implied address codes is that not all addresses make sense with all opcodes. The first 7 addresses in erasable memory aren’t actually used for RAM; the AGC maps internal registers to these addresses. I’ll go into more detail in the next installment, but for now I’ll just say that not all of them are 15 bits long. The computer normally has no issue executing instructions directly out of registers in the memory map (to the instruction loader, they appear to just be in memory, after all), but this really only works for registers that are the same size as an instruction word. EXTEND, therefore, is actually assembled as “TC 00006”, transfer control to memory location 6, which contains one of these stubby registers. Knowing that this request can’t be right, the computer instead executes EXTEND.

There are other instruction/memory combinations that don’t make sense. Recall that the AGC has two types of memory: erasable (which can be modified) and fixed (which can’t). Instructions that modify memory, therefore, don’t make any sense when supplied with fixed memory addresses. Look at how the address for an instruction is encoded in figure 3-6: erasable memory is accessed only when bits 11 and 12 of the word are 0. Again, we can use this to our advantage: if bits 13-15 (and the extend bit) indicate an erasable instruction, we know bits 11 and 12 won’t be in use for the address, so we can use them as more instruction bits. In this situation, these bits are called the “quarter code”, because they select from 4 possible instructions for each erasable “general” opcode.

We can take this concept even further when dealing with I/O instructions. I/O is done through “channels” that don’t exist in the main address space (again, more on this later). There are only 63 of these, which leaves plenty more room in the instruction word for encoding the instruction. In practice, only bit 10 of the instruction was used for this. Figure 3-6 calls this a “channel” instruction.

Now, onto decoding.

While an instruction is executing, the instruction to be executed next is loaded into the internal B register. Upon instruction completion, the new instruction is transferred out of the B register and split into two separate registers — the S register, which receives bits 1 through 12, and the SQ register, which receives bits 10 through 15, which is enough to decode the instruction using all of the above rules.
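As a sketch of that transfer, the Python below splits a 15-bit word into S (bits 1-12) and SQ (bits 10-15), numbering bits from 1 at the least significant end. It treats the extend bit as a fourth opcode bit above bits 13-15. Function and key names are hypothetical. For example, EXTEND's encoding as TC 00006 decodes to opcode 0 with S = 6.

```python
# Sketch: splitting an instruction word the way the B -> S/SQ transfer is
# described above. Bits are numbered 1..15 from the least significant end.
# Function and key names are hypothetical.

def bits(word, high, low):
    """Extract bits high..low (1-indexed, inclusive) of `word`."""
    width = high - low + 1
    return (word >> (low - 1)) & ((1 << width) - 1)

def split_instruction(b_register, extend_bit=0):
    """Fields available for decoding once B has been copied into S and SQ."""
    return {
        "S": bits(b_register, 12, 1),                               # bits 1-12
        "SQ": bits(b_register, 15, 10),                             # bits 10-15
        "opcode": (extend_bit << 3) | bits(b_register, 15, 13),     # extend = 4th bit
        "quarter_code": bits(b_register, 12, 11),
        "bit10": bits(b_register, 10, 10),
    }
```

Bits 10-12 land in both registers, which is what lets SQ carry the quarter code and channel bit while S still holds a full 12-bit address.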

With all of that background, here’s a top-level look at Module A3, the SQ Register and Decoder.

a3-1-top

Page 1 contains the SQ register itself, the extend bit, and first-stage decoding logic that maps the three base opcode bits into eight signals representing the eight possible combinations, and similar logic for the two quarter code bits. It also has a bit of interrupt inhibition logic that interacts with the extend bit stuff.

a3-2-top

Page 2 contains the next step of instruction decoding. This is where instruction selection actually happens. Here the eight opcode signals, four quarter code signals, extend bit, and SQ bit 10 are combined to create many more signals, essentially one for each instruction. Output signals from here are named after the instructions themselves.

The next step in the instruction processing chain is Module A4, the Stage Branch Decoding Module. The first page contains the stage and branch registers themselves:

a4-1-top

The stage registers STG1, STG2, and STG3 are used to select which phase of an instruction is being executed for multi-MCT instructions. The branch registers are used to handle conditions for branching instructions.

Page 2 of the Stage Branch Decoding module appears to combine the outputs of these registers with the instruction selection logic of the SQ Register and Decoder module in a manner similar to the cross point generators.

a4-2-top

When all is said and done, we’ve now selected the subsequence of control pulses to be executed during the current MCT.

The last step is to combine our instruction selection with the time pulse signals T01,T02,…,T12 from the Timer module to generate the actual control pulses. These are basic microinstructions that mostly interact with the core registers — things like “WA” (Write the value on the write bus to the A register) or “NISQ” (load a New Instruction into the SQ register). This work is done in Module A5, Cross Point Generator NQI, and Module A6, Cross Point Generator II. These modules implement a “control pulse matrix”, which outputs the logical product of the time pulses and subsequence selectors.
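In logic terms, each control pulse is a sum of products. It fires if any of its crosspoints has both its subsequence selected and its time pulse active. The sketch below shows that structure. The subsequence names (SEQ_X, SEQ_Y) and crosspoint assignments are invented for illustration; only the pulse names WA, RB, and RA are borrowed from the text.

```python
# Sketch of a control pulse matrix: every control pulse is an OR of AND
# terms, each term being (subsequence selected) AND (time pulse active).
# Hypothetical crosspoints: the subsequence names and assignments below are
# invented; only the pulse names are borrowed from the text.

CROSSPOINTS = {
    "RB": [("SEQ_X", 1)],
    "WA": [("SEQ_X", 1), ("SEQ_Y", 3)],
    "RA": [("SEQ_X", 2), ("SEQ_Y", 1)],
}

def control_pulses(selected, time_pulse):
    """Control pulses emitted for the selected subsequence at a given time pulse."""
    return sorted(
        pulse
        for pulse, terms in CROSSPOINTS.items()
        if any(seq == selected and t == time_pulse for seq, t in terms)
    )

def run_mct(selected):
    """Control pulses for each of T01..T12 during one MCT."""
    return [control_pulses(selected, t) for t in range(1, 13)]
```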

R-393 Figure 2-3 contains a pretty good block diagram of all of this:

r393_figure2-3

Note that this is for the Block I AGC, so some details are not quite the same as what I discussed above, but it’s pretty close.

That’s it for now. Next up in the tour are the central registers!

A Brief Tour through the Schematics, Part I: Timing

Before I start digging into the guts of the logic, I’m going to give a top-level overview of the architecture, paired with where things are in the schematics as I currently understand them. In this first part, I’ll go over timing.

Figure 3-3 from R-700 contains the basic clock architecture of the Block II AGC:

r700_fig3-3

The source for all timing in the AGC was a 2.048MHz crystal oscillator, contained in the aptly named Module B7 – Clock Oscillator. The output of this module, the CLOCK signal, enters into the “Pulse Forming Divider Logic”, which makes up the majority of Module A2, page 1:

a2-1-top

This logic puts out four phases of 1.024MHz, used as the main clock for the AGC, as well as the fundamental control pulses RT (read time), CT (clear time), and WT (write time).

a2-2-top

Page 2 of the Timer Module contains the ring counter shown in Figure 3-3, plus start/stop logic gating the time pulse generator.

a2-3-top

Page 3 contains a 12-step ring counter, generating the main time pulse signals of the computer, T01 through T12. One iteration through the 12 time pulses is called a “Memory Cycle Time”, or MCT. During an MCT, a single Fixed or Erasable memory access could be performed. The breakdown of memory timing throughout an MCT is shown in R-393, figure 3-1:

mct_memory_timing
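For a sense of scale, the sketch below works out the length of an MCT. It assumes each time pulse lasts one cycle of the 1.024 MHz main clock. That assumption isn't stated above, but the result agrees with the commonly quoted AGC memory cycle time of about 11.72 μs.

```python
# Worked MCT timing. Assumption: one time pulse per cycle of the 1.024 MHz
# main clock; exact fractions avoid floating-point rounding.
from fractions import Fraction

MAIN_CLOCK_HZ = 1_024_000
PULSES_PER_MCT = 12

time_pulse_us = Fraction(1_000_000, MAIN_CLOCK_HZ)   # one time pulse, in μs
mct_us = PULSES_PER_MCT * time_pulse_us              # one MCT, in μs

print(f"time pulse: {float(time_pulse_us)} us")      # 0.9765625
print(f"MCT: {float(mct_us)} us")                    # 11.71875
```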

Furthermore, instructions are defined as subsequences of control pulses that execute on each time pulse. For example, here is the time pulse sequence of the TC (Transfer Control) instruction, from AGC4 Memo #9:

tc_pulse_sequence

It triggers control pulses during time steps 1, 2, 3, 6, and 8. Unlike TC, most instructions take two or three MCTs to fully execute.

The output of the 5-step ring counter (or rather, its subsequent divide-by-2 circuit) flows into Module A1, the Scaler Module.

a1-1-top

This is page 1, but page 2 looks pretty much identical. It generates all of the power-of-two sub-frequencies of its 102.4kHz input (51.2kHz, 25.6kHz, etc…) down to 0.390625Hz. These are used in the I/O interfaces, and during operation in standby mode. Some of them drive counters in the INOUT modules that operate as timers for the software to use.
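The number of divide-by-2 stages follows from the two endpoints given above. This sketch treats each stage as an ideal halving and counts how many it takes to get from 102.4 kHz down to 0.390625 Hz. It uses exact fractions so the comparison isn't thrown off by floating-point rounding. The result is 18 halvings (102.4 kHz / 2¹⁸ = 0.390625 Hz). That count is arithmetic, not a count taken from the schematic.

```python
# Sketch: output frequencies of a chain of ideal divide-by-2 stages.
# Assumption: every Scaler stage halves its input exactly.
from fractions import Fraction

def scaler_frequencies(input_hz=Fraction(102_400), lowest_hz=Fraction(390_625, 1_000_000)):
    """Halve input_hz repeatedly until reaching lowest_hz; return each stage's output."""
    freqs = []
    f = input_hz
    while f > lowest_hz:
        f /= 2
        freqs.append(f)
    return freqs

if __name__ == "__main__":
    stages = scaler_frequencies()
    print(len(stages), "stages:", ", ".join(f"{float(f):g} Hz" for f in stages))
```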

Last in our timing discussion is Module A14, Memory Timing and Addressing.

a14-1-top

Page 1 of the Memory Timing and Addressing module contains the “timing” portion. I’m guessing it effectively implements the timing considerations shown in R-393 figure 3-1, but I haven’t spent much time looking at this one yet.

Next up: instruction processing!

Interpreting the Schematics

On first look, the schematics for the AGC are pretty hard to follow. Aside from some pullup and pulldown resistors in the restart monitor, the available Tray B modules like the oscillator and power supply, and the Tray A interface modules, they consist essentially entirely of NOR gates.

There are a few peculiarities with the NOR gates used. Two types of NOR gates appear on the schematics. The first type is just a standard NOR gate:

nor_gate

The second type is almost exactly the same as the first; it’s called a “fan in gate”, and the only difference between it and the standard NOR gate is that the power pin on chips used for fan in gates is left disconnected. It is represented as a NOR gate with a black nose:

fan_in_gate

Fan in gates are used in three different ways throughout the schematics. The first, and by far the most common, is the function for which they are named: to increase the fan-in (i.e., the number of inputs) of a NOR gate. This is done simply by connecting the output of a fan in gate to the output of a regular NOR gate. Here’s a simple demonstrative example from the RUPT Service module:

fan_in_example

Here, a six-input NOR gate is constructed using a NOR gate and a fan in gate. The principles behind this behavior are quite straightforward, given the design of the integrated circuits.

nor_internals

As one of the first integrated circuits ever produced, the Block II NOR gate is quite simple internally, composed of only 8 resistors and 6 bipolar junction transistors per IC (this type of circuit is known as RTL, Resistor-Transistor Logic). The above image shows the layout of one of the two NOR gates contained within each chip. Here, pin 10 connects to the power supply; pin 5 connects to ground; pins 6, 7, and 8 are the inputs of the NOR gate; and pin 9 is the output. This configuration is known as open collector: the upper (“collector”) pins of the three transistors are tied together and connected directly to the output pin of the IC. And in this case, the resistor connecting the power pin to the rest of the circuit can be considered an internal pull-up resistor, so the external pull-up normally seen with open collector circuits isn’t (usually!) needed.

Given all of that, it’s easy to see how fan in gates work. Here’s an internal view of the above 6-input NOR example:

fan_in_internals

As you can see, connecting a fan in gate to a NOR gate effectively just adds three more transistor-resistor input circuits to the NOR gate! The fan in gate’s power pin is left disconnected because otherwise its pull-up would sit in parallel with the NOR gate’s, halving the pull-up resistance. Neat, huh?
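The cost of that halving is plain parallel-resistor arithmetic: two equal pull-ups in parallel give half the resistance, so twice the current flows whenever an input transistor pulls the output low. A minimal sketch, where the resistor value `R` and supply voltage `V` are hypothetical placeholders rather than values from the schematics, and the saturated transistor is idealized as 0 V:

```python
def parallel(*resistances: float) -> float:
    """Equivalent resistance (ohms) of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def low_output_current(supply_v: float, pullup_ohms: float) -> float:
    """Current (A) through the pull-up while an input transistor holds the
    output low, idealizing the saturated transistor as 0 V."""
    return supply_v / pullup_ohms


R = 1000.0  # hypothetical pull-up value, NOT taken from the schematics
V = 4.0     # hypothetical supply voltage, for illustration only

one_gate = low_output_current(V, R)                   # fan in gate left unpowered
both_powered = low_output_current(V, parallel(R, R))  # if its power pin were also wired

print(f"{parallel(R, R):.1f} ohms")                   # 500.0 ohms
print(f"{both_powered / one_gate:.1f}x the current")  # 2.0x the current
```

The ratio is independent of the placeholder values: any two equal pull-ups in parallel double the low-state current.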

The second function performed by fan in gates is providing common module outputs. It’s very closely related to the first; conceptually, it’s the same thing. The basic idea is that nothing forces a fan in gate to be close to its connected NOR gate. Indeed, there’s no particular reason why the two can’t be on different boards. This means that two completely separate modules can both drive the same signal. An example of this is the treatment of the read bus in the RUPT Service module:

rupt_rl14

Here a fan in gate is being used to drive read bus line 14, with no connected NOR gate anywhere in the module. The main source of this signal is actually in four-bit module A11, here:

rl14_source

Note that RL14/ is facing right like an input here.
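Since the connection is nothing more than a shared wire, anything that models these schematics (a netlister or simulator, say) has to gather every gate driving a given signal name, across all modules, before evaluating it. A hypothetical Python sketch; the netlist entries and input levels below are placeholders, not the real inputs to RL14/:

```python
from collections import defaultdict

# Hypothetical netlist fragment: (module, output signal, input levels).
# The input levels are placeholders, NOT the real RL14/ inputs.
gates = [
    ("A11", "RL14/", (False, False, False)),          # main source of the signal
    ("RUPT Service", "RL14/", (False, True, False)),  # fan in gate on another board
    ("A11", "OTHER/", (False, False)),                # unrelated placeholder net
]


def merge_nets(gate_list):
    """Group the input sets of every gate driving each signal name,
    regardless of which module the gate lives in."""
    nets = defaultdict(list)
    for _module, signal, inputs in gate_list:
        nets[signal].append(inputs)
    return nets


def evaluate(nets):
    """Wired-NOR each net: low if any input of any attached gate is high."""
    return {sig: not any(any(g) for g in groups) for sig, groups in nets.items()}


levels = evaluate(merge_nets(gates))
print(levels)  # {'RL14/': False, 'OTHER/': True}
```

Grouping by name before evaluation is what lets a high input on one board pull down a signal whose main driver sits on another.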

The third and final application of “fan in” gates isn’t actually for fan in expansion at all; it’s for interfacing with electrical ground support equipment (EGSE). Throughout the computer are gates like this one in the timer module:

monitor_gate

Unlike RL14/ above, MT01 isn’t actually used anywhere in the computer proper. Since there’s no connected NOR gate to power it, this gate normally remains completely unpowered. Instead, this signal and those like it are routed to the “test connector” on the front of the computer. You can see it in this great image of an opened-up AGC by Autopilot on Wikipedia:

Opened Apollo Guidance Computer
By Autopilot (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

The main piece of EGSE that was hooked up to the test connector was “the Monitor”. The Monitor was more or less a JTAG debugger for the AGC. It allowed for displaying the contents of the central registers, stepping through instructions, setting breakpoints, displaying or modifying arbitrary memory locations, and displaying the contents of the write bus.

Anyway, most of the unpowered signals that run out to the test connector begin with the letter “M”, like the MT01 example above; presumably it’s short for “Monitor”.

E-1880: A Case History of the AGC Integrated Logic Circuits shows how these signals can be used in equipment like the Monitor:

interface_circuit

The main feature here is the pull-up resistor to +V on the interface side. Note that this resistor is effectively no different from the one connected to pin 10 inside the chip; the only difference is who’s supplying the power. So these gates remain completely unpowered during flight, drawing almost no current, but as soon as you plug an EGSE device into the test connector, they’re provided power and start producing useful signals. This is an extremely useful characteristic, since power was so precious: it allowed lots of debug circuitry to be included without incurring major power penalties.
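The difference between the flight and test configurations can be captured by counting pull-ups on a node: with none, an open-collector output simply floats and carries no signal. A minimal sketch under that idealized assumption; `resolve_node` is a hypothetical helper, and the MT01 input levels are placeholders:

```python
from typing import Optional


def resolve_node(inputs, internal_pullups: int, external_pullups: int = 0) -> Optional[bool]:
    """Logic level on an open-collector output node (idealized).

    Returns None if the node has no pull-up at all (unpowered, no signal),
    False if any input transistor conducts and pulls the node low, and
    True if a pull-up (inside a powered gate or on the EGSE side) holds it high.
    """
    if internal_pullups + external_pullups == 0:
        return None
    return not any(inputs)


mt01_inputs = (False, False, False)  # placeholder input levels, not real data

# In flight: a fan in gate with no powered partner and nothing on the connector.
print(resolve_node(mt01_inputs, internal_pullups=0))                      # None
# With EGSE attached: the equipment's pull-up to +V powers the gate.
print(resolve_node(mt01_inputs, internal_pullups=0, external_pullups=1))  # True
```

Returning `None` rather than a logic level makes the "no useful signal until something supplies the pull-up" behavior explicit.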

Somewhat late in the program (after the flights of Apollo 11 and Apollo 12), a new special module was designed to help with diagnostics during flight. Information for debugging on orbit was limited; if any one of a number of alarms occurred, the computer would simply indicate “RESTART”, without any insight into which alarm had caused it. The new module, called the “Restart Monitor”, was a small module that stayed plugged into the test connector for the duration of the flight. It allowed software to read the specific alarm that had caused a restart from I/O channel 77 (more on that later). And luckily, we have the schematics for it! It just so happens that the MT01 signal shown above is one of the ones used by the Restart Monitor:

mt01_used

As expected from the E-1880 diagram, there’s a 3.3k pull-up resistor on the line before it’s used in any logic.
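As a rough illustration of the power argument, the current through that pull-up while a gate holds its output low is just V/R. The 3.3k value comes from the Restart Monitor schematic; the 4 V supply voltage is an assumption made for this example only, and the saturated transistor is idealized as a short to ground:

```python
SUPPLY_V = 4.0        # ASSUMED supply voltage for illustration, not from the source
PULLUP_OHMS = 3300.0  # the 3.3k pull-up on the Restart Monitor's MT01 line


def low_state_current_ma(supply_v: float, pullup_ohms: float) -> float:
    """Current (mA) through a pull-up while the gate holds its output low,
    idealizing the saturated transistor as a short to ground."""
    return supply_v / pullup_ohms * 1000.0


def line_current_ma(egse_attached: bool) -> float:
    """Low-state current for one monitor line: zero when nothing supplies
    the pull-up, since the fan in gate has no power of its own."""
    return low_state_current_ma(SUPPLY_V, PULLUP_OHMS) if egse_attached else 0.0


print(f"{line_current_ma(egse_attached=True):.2f} mA")   # 1.21 mA
print(f"{line_current_ma(egse_attached=False):.2f} mA")  # 0.00 mA
```

Multiplied across every monitor line in the computer, even a milliamp or so per line would add up, which is why leaving them unpowered in flight matters.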

There’s one last peculiarity that’s common in the schematics. In addition to increasing the fan-in of a NOR gate by attaching fan in gates, it’s also possible to increase the fan-out of a gate by attaching powered NOR gates in parallel. Any given NOR gate output can drive 5 connected inputs, and this number increases by 5 for each parallel gate. Here’s a fan-out expansion for T02/ in the timer module:

fan_out

From this, we can safely assume that T02/ is driving more than 5 inputs. But we need to be careful: just as with fan in gates, there’s no reason fan out expansion gates can’t be in other modules. With board real estate being limited, it appears that these fan out expansion gates were indeed sometimes moved to less populated modules. Module A-24, INOUT VII, also includes some fan out gates for T02/:

t02_inout

So in reality T02/ is probably driving more than 15 inputs.
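The fan-out rule lends itself to a quick calculation: driving L inputs takes ceil(L/5) gates, and seeing n paralleled gates suggests a load of more than 5(n − 1) inputs, assuming no expansion gate was added unnecessarily. A short sketch (the function names are hypothetical):

```python
import math

INPUTS_PER_GATE = 5  # fan-out of a single NOR gate output, per the text


def gates_needed(load: int) -> int:
    """Minimum number of paralleled NOR gates needed to drive `load` inputs."""
    return max(1, math.ceil(load / INPUTS_PER_GATE))


def implied_min_load(gate_count: int) -> int:
    """Smallest load that would justify `gate_count` paralleled gates,
    assuming the designers never added an unnecessary expansion gate."""
    return INPUTS_PER_GATE * (gate_count - 1) + 1


print(gates_needed(12))     # 3
print(implied_min_load(2))  # 6 (two gates imply more than 5 inputs)
print(implied_min_load(4))  # 16 (four gates imply more than 15 inputs)
```

Under this rule, concluding that a signal drives more than 15 inputs corresponds to finding at least four paralleled gates in total, counted across every module that carries one.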

Armed with all of this knowledge, we can now start taking more in-depth looks at the design of each of the logic modules. But first, I’m going to give a higher-level functional breakdown of larger chunks of the schematics. Stay tuned!