FPGA Support

For the past year or so, I was relatively convinced that it wasn’t possible to put a fully working AGC into an FPGA. In fact, I even talked about it in Simulation, and the Binary Clock Divider. The main reasons I thought it wouldn’t work were as follows:

  • The AGC is built on asynchronous logic. FPGAs very strongly prefer synchronous logic.
  • It’s not easy (or possible?) to implement everything with just NOR gates. FPGA tools get freaked out about combinational loops, and the AGC is, from the point of view of an FPGA, one big combinational loop. You might try to implement cross-coupled NOR gates as a register, but then what do you do about this thing? [image: logisim_divider]
  • Several circuits in the AGC rely on propagation delays for timing. Remember this guy? [image: prop_delays] Delays inside an FPGA are going to be much smaller than with discrete components, and can vary depending on how the fitter decides to lay things out.

A couple of weeks ago I realized that I had somehow managed to think about the problem at too high of a level and too low of a level at the same time. The real answer was to build a clock-driven NOR gate.

nor_clock
???

The basic idea is that this clocked NOR gate will sample its inputs on the falling edge of its clock, and apply the result to its output on the rising edge. The entire AGC receives the same clock, and so the entire state of the system is propagated in lockstep. The sampling and output application are on opposite clock edges to prevent race conditions. All gates in the system change their outputs at the same time, and then a delay is given before sampling for the next cycle, to ensure all transitions have occurred and everything is settled. The propagation delays for the gates can even be tuned by setting the frequency of the clock!

The Verilog model for this is pretty easy:

module nor_2(y, a, b, rst, clk);
    parameter iv = 1'b0;
...
    input wire a, b, rst, clk;

`ifdef TARGET_FPGA
    output reg y = iv;
    reg next_val = iv;

    // Drive the output on the rising edge.
    always @(posedge clk)
    begin
        y <= next_val;
    end

    // Sample the inputs on the falling edge.
    always @(negedge clk)
    begin
        next_val <= ~(a|b);
    end
`else
...
`endif
endmodule

 

This model solves all of the above problems — the design becomes fully synchronous, we’re using clocked registers for everything, and propagation delays are accurately captured at the gate level! The only downside is that it takes a whopping 2 registers per NOR gate, which leads to a rather large final product that needs a kinda beefy FPGA to fit.
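
To make the lockstep behavior concrete, here is a minimal Python sketch of the same idea. It is only a behavioral stand-in, not the Verilog above: the class and function names are made up for illustration, and the example circuit is a plain cross-coupled NOR latch rather than anything taken from the AGC schematics.

```python
def nor(*inputs):
    """Plain NOR: 1 only when every input is 0."""
    return 0 if any(inputs) else 1


class ClockedNor:
    """Behavioral stand-in for the two-register gate (names are illustrative)."""

    def __init__(self, initial=0):
        self.y = initial          # visible output register
        self.next_val = initial   # sampled value waiting for the rising edge

    def sample(self, *inputs):
        """Falling edge: capture the NOR of the current inputs."""
        self.next_val = nor(*inputs)

    def update(self):
        """Rising edge: publish the captured value."""
        self.y = self.next_val


def clock_cycle(gates, wiring, external):
    """Run one full clock period over every gate in lockstep."""
    signals = dict(external)
    signals.update({name: gate.y for name, gate in gates.items()})
    for name, gate in gates.items():       # falling edge for all gates
        gate.sample(*(signals[src] for src in wiring[name]))
    for gate in gates.values():            # rising edge for all gates
        gate.update()
    return {name: gate.y for name, gate in gates.items()}


def latch_demo():
    """Cross-coupled NOR latch: q = NOR(r, qn), qn = NOR(s, q)."""
    gates = {"q": ClockedNor(initial=0), "qn": ClockedNor(initial=1)}
    wiring = {"q": ("r", "qn"), "qn": ("s", "q")}
    history = []
    for s in (1, 1, 0, 0):                 # hold "set" for two gate delays
        out = clock_cycle(gates, wiring, {"s": s, "r": 0})
        history.append((out["q"], out["qn"]))
    return history
```

In this sketch the latch takes two clock periods to flip, one per gate in the feedback path, which is the sense in which each clock period acts as one gate propagation delay.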


 

After verifying that the computer still worked with this new NOR gate model, I set about setting up a project in Quartus Prime for the FPGA dev board I have, the DE0-Nano.

I set up a PLL that generates both 51.2MHz and 2.048MHz clocks from the 50MHz crystal on the board. The 2.048MHz clock is fed directly into the timer module as the CLOCK input. The 51.2MHz clock serves as the system propagation clock. I chose 51.2MHz because it’s a multiple of 2.048MHz (25 times) that lands close to the propagation delays of the original gates. The original delays were 20ns, and a 51.2MHz clock gives our gates a delay of 19.53ns. Close enough! I want the system clock to be a multiple of the AGC clock so I don’t get any weird phasing issues.
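
The clock numbers can be checked with a few lines of Python. This is just arithmetic on the three frequencies quoted in the text. The constant and function names are illustrative, and `overall_ratio` is only the net multiply/divide factor from the crystal, not a claim about how the PLL was actually configured.

```python
from fractions import Fraction

CRYSTAL_HZ = 50_000_000      # DE0-Nano crystal, per the text
PROP_CLOCK_HZ = 51_200_000   # gate propagation clock
AGC_CLOCK_HZ = 2_048_000     # AGC CLOCK input


def period_ns(freq_hz):
    """Clock period in nanoseconds, kept exact as a fraction."""
    return Fraction(1_000_000_000, freq_hz)


def clocks_per_agc_cycle():
    """How many propagation-clock periods fit in one AGC clock period."""
    return Fraction(PROP_CLOCK_HZ, AGC_CLOCK_HZ)


def overall_ratio(out_hz):
    """Net ratio between an output clock and the crystal."""
    return Fraction(out_hz, CRYSTAL_HZ)


if __name__ == "__main__":
    print(float(period_ns(PROP_CLOCK_HZ)))   # modeled gate delay in ns
    print(clocks_per_agc_cycle())            # whole number means no phase drift
    print(overall_ratio(PROP_CLOCK_HZ), overall_ratio(AGC_CLOCK_HZ))
```

The ratio between the two generated clocks comes out as a whole number, which is what keeps them from drifting in phase relative to each other.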

With everything ready to go, I pulled in all of the generated Verilog files and hit “Start Compilation”, and very quickly received an error message from synthesis:

Verilog HDL unsupported feature error: can’t synthesize pullup or pulldown primitive

Craaaaap. This makes total sense, of course, and I should have expected it. FPGAs don’t have internal tri-stating, open-drains, pullups, or anything of that sort. This type of thing only exists for the GPIOs. But as I’ve talked about before, the open-drain nature of the AGC gates is extremely important — both for fan-in expansion and cross-module buses.

It seems that the proper way to do this type of thing in an FPGA is to use the Verilog wand (wired-and) data type. A wand is just like a wire, but it can be assigned to multiple times, and its value is the logical AND of all expressions assigned to it. So in theory, all I had to do was tweak the open-drain buffer model to be a simple passthrough, rather than switching to high-impedance mode when its input is high:

module od_buf(y, a);
    parameter delay = 2;
    input wire a;
    output wire y;

`ifdef TARGET_FPGA
    assign y = a;
`else
    assign #delay y = (a == 1'b1) ? 1'bZ : 1'b0;
`endif
endmodule
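
The reason the passthrough is a fair substitute can be checked exhaustively. The Python sketch below compares the two views of a shared net: open-drain drivers with a pullup, and plain drivers ANDed together the way a wand resolves them. The function names are invented for illustration, and the model assumes every net has a pullup, as the simulation side does.

```python
from itertools import product


def open_drain_bus(inputs):
    """Simulation view: a buffer pulls the net low when its input is 0,
    otherwise it floats ('Z') and the pullup wins."""
    drivers = ["Z" if a == 1 else 0 for a in inputs]
    return 0 if 0 in drivers else 1


def wand_bus(inputs):
    """FPGA view: passthrough buffers assigned onto a wand, so the net
    value is the AND of every driver."""
    return int(all(inputs))


def models_agree(n_drivers):
    """Compare both views over every input combination."""
    return all(
        open_drain_bus(bits) == wand_bus(bits)
        for bits in product((0, 1), repeat=n_drivers)
    )
```

Calling `models_agree(3)` walks all eight input combinations for a three-driver net such as RL01_n.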

I also had to edit the Verilog generator to leave codegen hints for the backplane generator. If any of a net’s connections is an open-drain pin, it leaves a comment that looks like this:

inout wire RL01_n; //FPGA#wand

The backplane generator looks for things like this, and changes the data type of the line from wire to wand when emitting code for the FPGA.
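
As a rough sketch of that substitution, assuming the hint is always a trailing comment on a single-line declaration: the function name and the choice to drop the hint from the output are assumptions, and the real generator may do this differently.

```python
import re

WAND_HINT = "//FPGA#wand"   # hint text as shown in the declaration above


def apply_wand_hint(line, target_fpga):
    """Hypothetical helper: swap 'wire' for 'wand' on hinted declarations."""
    if not target_fpga or WAND_HINT not in line:
        return line                       # simulation build, or no hint
    code = line.split(WAND_HINT)[0].rstrip()
    return re.sub(r"\bwire\b", "wand", code, count=1)
```

For example, `apply_wand_hint("inout wire RL01_n; //FPGA#wand", True)` gives back a declaration with `wand` in place of `wire`, while the same call with `False` leaves the line untouched.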

Problem solved, right?

Error (12014): Net “RL01_n”, which fans out to “RL01_n”, cannot be assigned more than one value

What?! That’s the whole point of wands…


 

After some experimentation, I found that Quartus only supports intramodule wands. As soon as a wand crosses a module boundary, all bets are off. So in order to get this to work, I was going to need to put everything into the same module — the same file.

I decided to draw the line at the component level. I really didn’t want to have the implementations for the components repeated everywhere, and besides, each component (74HC02, 74HC04, …) is sort of an atomic thing from the point of view of this simulation. So, I devised a way to get wands to cooperate across the component module boundaries. I call it “proxy wires”.

Here’s the idea: since the components can be considered atomic, we can have regular wires cross their module boundaries, rather than the main wands. We then assign each of these wires to the wand. So instead of driving the wand directly, the module drives wires which are then combined into the wand.

To accomplish this, the Verilog generator needed more tweaks. When emitting a component declaration, it looks to see if any of the component’s pins are open-drain. If they are, it emits a codegen hint to the backplane generator that looks like this:

U74LVC07 U8007(__A08_NET_191, __A08_2___CO_IN, __A08_NET_185, RL01_n, __A08_NET_198, L01_n, GND, __A08_1___Z1_n, __A08_NET_221, RL01_n, __A08_NET_222, RL01_n, __A08_NET_220, VCC, SIM_RST, SIM_CLK); //FPGA#OD:2,4,6,8,10,12

This says that pins 2, 4, 6, 8, 10, and 12 are open-drain outputs, and should be handled specially.

I made the backplane generator combine all of the AGC modules into one big module (“fpga_agc”) when it’s generating code for an FPGA, rather than just a small module that ties modules together. If, when processing module contents, the backplane generator sees a comment like the one above, it splits up the declaration and determines which nets are affected. For each of these nets, it spits out some proxy wire declarations:

 wire RL01_n_U8007_10;
 wire RL01_n_U8007_12;
 wire RL01_n_U8007_4;
...

Each proxy wire takes the name of its original net, and is suffixed with the component number and pin number that it will be attached to. A series of assignments is then emitted to assign the proxy wires to their parent net:

 assign RL01_n = RL01_n_U8007_4;
 assign RL01_n = RL01_n_U8007_10;
 assign RL01_n = RL01_n_U8007_12;
...

And finally, the component declaration is reconstructed, with the original nets subbed out for their proxy wire counterparts:

U74LVC07 U8007(__A08_NET_191, __A08_2___CO_IN_U8007_2, __A08_NET_185, RL01_n_U8007_4, __A08_NET_198, L01_n_U8007_6, GND, __A08_1___Z1_n_U8007_8, __A08_NET_221, RL01_n_U8007_10, __A08_NET_222, RL01_n_U8007_12, __A08_NET_220, VCC, SIM_RST, SIM_CLK);
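
The three steps above (declare proxies, assign them to the parent net, rebuild the instantiation) can be sketched in Python as follows. This is not the actual backplane generator: the function name is hypothetical, and it assumes one instantiation per line with positional ports where pin N is the Nth argument, which is what the U8007 example suggests.

```python
import re

OD_HINT = "//FPGA#OD:"


def make_proxy_wires(line):
    """Hypothetical sketch: split a hinted instantiation into proxy wire
    declarations, wand assignments, and the rewritten instantiation."""
    code, _, hint = line.partition(OD_HINT)
    od_pins = [int(pin) for pin in hint.strip().split(",")]

    match = re.match(r"(\w+)\s+(\w+)\((.*)\);", code.strip())
    comp_type, refdes, args = match.groups()
    nets = [net.strip() for net in args.split(",")]

    decls, assigns = [], []
    for pin in od_pins:
        net = nets[pin - 1]                      # pins are 1-indexed
        proxy = f"{net}_{refdes}_{pin}"
        decls.append(f"wire {proxy};")
        assigns.append(f"assign {net} = {proxy};")
        nets[pin - 1] = proxy                    # component now drives the proxy

    instance = f"{comp_type} {refdes}({', '.join(nets)});"
    return decls, assigns, instance
```

Feeding it the hinted U8007 line reproduces the rewritten instantiation shown above, along with one declaration and one assignment per open-drain pin. Non-open-drain pins, GND, VCC, and the simulation reset and clock pass through unchanged.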

For the strong of heart, the final product can be found here: almost 5,000 lines of Verilog fun!


 

Once everything was in one big happy file, with proxy wires to protect the wands, synthesis went off more or less without a hitch!

fpga_first_life
IT LIVES

And voila! Timepulses 1 and 2 showing up on my oscilloscope! Further testing showed that the circuit I was most worried about, the edge detecting pulse generator, produced exactly the expected width (100ns) based on the simulation. And finally, here’s what the layout in the FPGA looks like:

fpga_agc
It’s beautiful!

As it happens, this was destined to not be my last battle trying to get things to work on the FPGA — but that’s a story for another post!

Start/Stop Logic

The start/stop logic, the central feature of computer startup, restart, and debugging, lives on page 2 of the Timer module.

startstop

Let’s look at restart logic first. The five inputs on the top-left — SBY, ALGA, MSTRTP, STRT1, and STRT2 — are all possible causes for the computer to reset. Broadly speaking, these are either alarm conditions or developer-requested. I haven’t spent much time looking at the alarm logic yet, but rough definitions for them are as follows:

  • SBY – Standby — set when the computer enters standby mode (this never happened during any flights)
  • ALGA – Algorithm Alarm (?) — set when the running software causes a reboot-worthy exception
  • MSTRTP – Monitor-requested Start — set when the Monitor (the AGC’s electrical ground support equipment, or EGSE) injects a reset
  • STRT1 – Start 1 — Set if the digital Alarms module sees a voltage failure transient.
  • STRT2 – Start 2 — Set by the analog alarms module, which we don’t have schematics for. It probably indicates a more serious hardware failure, but is also likely used for powerup sequencing.

Any one of these events happening sets the initial flip-flop that I’ve marked “GOJAM in progress”. “GOJAM” is the name of the AGC’s hardware restart sequence (resets JAM a GO condition into the computer). This flip-flop will remain set until the GOJ1 signal becomes set. I’ll talk about that in just a bit.

The GOJAM-in-progress register directly controls the state of the downstream register “Reset Condition Detected”, but it first goes through a pair of gates in the column I’ve marked as “Register Gating”. Register gating is a very common pattern in the AGC, and is key to the operation of its registers and write bus. The inputs to a register that are so gated are prevented from affecting that register unless other conditions are also true. Here, both T12DC/ and EVNSET/ must be low at the same time as our input in order for the set to happen.

What does this mean in practice? Well, as discussed in The Timer Module, the combination of these two signals is equivalent to the timepulse 12 signal, T12. In other words, reset conditions are prevented from reaching this register until time 12 of the MCT (memory cycle time) during which they occur. There are a couple of possible reasons why T12DC/ NOR EVNSET/ is used rather than T12 directly: it reduces the load on the T12 line, and (probably more importantly) it means that there’s one less gate propagation delay between the start of T12 and the latching of a reset.
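
Here is a small truth-table check of that gating, as a Python sketch. It models the gate as a three-input NOR of the active-low register output and the two timing signals, which is my reading of the description above. The argument names (`input_n`, `t12dc_n`, `evnset_n`) are illustrative stand-ins for the schematic signals.

```python
from itertools import product


def nor(*inputs):
    """Plain NOR: 1 only when every input is 0."""
    return 0 if any(inputs) else 1


def gated_set(input_n, t12dc_n, evnset_n):
    """Set pulse for the downstream register: all three must be low."""
    return nor(input_n, t12dc_n, evnset_n)


def t12(t12dc_n, evnset_n):
    """T12 expressed as the NOR of the two timing signals."""
    return nor(t12dc_n, evnset_n)


def gating_matches_t12():
    """The gate fires exactly when the input is low during T12."""
    return all(
        gated_set(a, b, c) == int(a == 0 and t12(b, c) == 1)
        for a, b, c in product((0, 1), repeat=3)
    )
```

So folding the two timing signals into the gate gives the same result as gating on T12 itself.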

t12gating

Here you can see STRT1 being injected into the computer. The line I have highlighted, labeled b[0], represents the output of the “GOJAM in progress” register. It goes low immediately upon detection of STRT1, but the reset condition detected register, whose output is STOPA, doesn’t respond until the beginning of T12.

Once STOPA has been set, both main outputs of this logic are asserted: GOJAM and STOP. STOP is used on page 1 of the Timer module, to inhibit ODDSET/:

timer_clockdiv2

ODDSET/ is what makes the timepulse generator progress from T12 to T01, so as long as STOP is asserted we’re stuck in timepulse 12.
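
At a purely behavioral level, and ignoring the actual gates involved, the effect of STOP on the timepulse generator can be sketched like this. The function name is made up for illustration. It only covers the hold-at-T12 behavior described here and does not model any signal that forces the generator to T12 early.

```python
def next_timepulse(current, stop):
    """Behavioral sketch: timepulses run 1 through 12, and STOP only
    blocks the wraparound from T12 back to T01."""
    if current == 12:
        return 12 if stop else 1
    return current + 1


def run_timepulses(start, stop_flags):
    """Step once per entry in stop_flags and record each timepulse."""
    trace, current = [], start
    for stop in stop_flags:
        current = next_timepulse(current, stop)
        trace.append(current)
    return trace
```

With STOP asserted from T10 onward, `run_timepulses(10, [True] * 4)` counts up to 12 and then stays there.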

GOJAM is the main signal that prepares the computer for execution. It resets lots of conditions and registers throughout the computer, but its primary goal is to prepare the GOJ1 sequence for execution. GOJ1 is a simple unprogrammable instruction sequence that prepares the central registers for a jump to the program entry point. The ‘1’ in its name means that it’s a “stage 1” instruction, i.e., the stage 1 register needs to be set for it to run. GOJ1 is somewhat unique in that most other sequences begin during stage 0. GOJ1 starts at stage 1 because it shares the same opcode as the transfer control instruction TC. TC only has a single stage, so stage 1 of TC would otherwise be meaningless.

In order to set up GOJ1 for execution, the sequence register SQ must be cleared, the extend bit must be cleared, the stage 1 flip flop must be set, and all other stage registers must be unset.

SQ register and extend bit handling is done on page 1 of the SQ Register and Decoder module.

sq_gojam

I’ve highlighted the path through which GOJAM clears the extend bit SQEXT and the SQ register. It’s fairly circuitous, but it resets SQEXT somewhat directly, and, during the clear time of T12, sets the CSQG (clear SQ, gated) line, which takes care of setting all of the rest of the registers in the module (SQ, the quarter-code registers, and the instruction bit 10 register) to 0.

stage_gojam

STG1 and STG2 are respectively set and cleared directly by GOJAM. STG3 is cleared indirectly via the STRTFC signal, which is set as a result of GOJAM back in the SQ register module (it’s one of the lines that I highlighted).

The results of all of these changes lead into the instruction decoder half of the SQ Register and Decoder module.

goj1

This simple bit of logic says that if SQ is 0, we’re in stage 1, and SQEXT is not set, then the current instruction sequence is GOJ1.
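
Putting the GOJAM setup and this decode together, a minimal Python sketch looks like the following. The `SequencerState` class and both function names are illustrative only. The state holds just the handful of bits discussed in this post, not the full contents of the SQ module.

```python
from dataclasses import dataclass


@dataclass
class SequencerState:
    """Just the bits discussed here (names are illustrative)."""
    sq: int        # sequence register contents
    sqext: bool    # extend bit
    stg1: bool     # stage flip-flops
    stg2: bool
    stg3: bool


def after_gojam():
    """State GOJAM leaves behind: SQ and SQEXT cleared, only STG1 set."""
    return SequencerState(sq=0, sqext=False, stg1=True, stg2=False, stg3=False)


def is_goj1(state):
    """SQ is 0, we are in stage 1, and SQEXT is not set."""
    stage_1 = state.stg1 and not state.stg2 and not state.stg3
    return state.sq == 0 and stage_1 and not state.sqext
```

The same SQ value with no stage bits set would not decode as GOJ1, which matches the point that stage 0 of opcode 0 belongs to TC.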

Once GOJ1 has been selected as the sequence to execute, the “GOJAM in progress” register is cleared, GOJAM and STOP are deasserted, and the computer progresses out of T12 into T01 to start executing GOJ1.

As I mentioned earlier, GOJ1 is a very simple sequence. Here is its control pulse definition, taken from Hugh Blair-Smith’s AGC4 Memo #9:

goj1_pulses

Only two of the twelve timepulses generate any control pulses, and the control pulses in T02 are not actually used here. They simply don’t hurt anything, and letting GOJ1 share crosspoints with other sequences saved gates.

T08 is where the magic happens. The control pulse RSTRT puts octal 04000, the software entry point, onto the central write bus. WS and WB write this value into the S and B registers, respectively. Putting the address into the S register kicks off the core rope memory cycle to read the instruction stored there. The B register holds the next instruction to be executed, so when T12 arrives and GOJ1 has been completed, the 04000 will be loaded as the next instruction to execute.

This highlights a neat little design feature of the AGC instruction set — any literal address, when executed, jumps to itself. This happens because the opcode for TC is 000. So the whole point of GOJAM and GOJ1 is simply to set up a TC 4000 — a simple jump to the restart vector!
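
The trick is easy to see by pulling a word apart. The Python sketch below assumes the basic instruction layout of a 3-bit opcode above a 12-bit address in the 15-bit word, and it ignores the extend bit and any addresses the hardware treats specially. The function names are illustrative.

```python
OPCODE_TC = 0o0   # transfer control, per the text


def decode_basic(word):
    """Split a 15-bit basic instruction word into (opcode, address).
    Field layout is an assumption stated in the lead-in."""
    opcode = (word >> 12) & 0o7
    address = word & 0o7777
    return opcode, address


def is_jump_to_self(address):
    """True when the bare address, read as an instruction, is a TC to itself."""
    opcode, target = decode_basic(address)
    return opcode == OPCODE_TC and target == address


if __name__ == "__main__":
    opcode, address = decode_basic(0o4000)
    print(oct(opcode), oct(address))
```

Because a 12-bit address leaves the top three bits of the word at zero, the value GOJ1 places in the B register decodes as a TC to that same address.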

There are a couple of other things worth talking about in the main start/stop logic in the timer module. I’ll reproduce the image of it here:

startstop

First of note is the STRT2 input above the two registers in the middle. STRT2, being an alarm from the analog module, is apparently considered very high severity. Whereas all other alarms and reset conditions allow the current MCT to finish, STRT2 asserts GOJAM immediately, which causes the MCT to halt in its tracks and immediately start processing T12, regardless of which timepulse it was on.

The bottom half of this logic, which controls the Monitor Stop Requested logic, is never used in flight. It provides a means for the Monitor equipment to halt execution of the computer without inducing a reset, via the MSTP signal. When MSTP is asserted, the current MCT will complete up until its T12. STOP will then be asserted, preventing the computer from processing any more instructions until MSTP is deasserted.