Synthesis Without a Clock

Week 3: How we use Yosys to synthesize a fully asynchronous processor. C-elements from standard cells, delay chains that survive optimization, and latches that actually work.

4-phase bundled-data async pipeline: C-elements controlling latch enables, matched delay elements on the request path, combinational logic between stages.
Diagram from Jens Sparsø, Introduction to Asynchronous Circuit Design

The Problem

Yosys is a synthesis tool. It reads your Verilog, optimizes it, and maps it onto a target standard cell library. It does this very well for synchronous designs, where flip-flops sit on the same clock and combinational logic fills the space between them. Later in the flow, clock tree synthesis balances the clock distribution so every flip-flop fires at the right time. The tools handle it. That is the model the entire EDA industry is built around.

Asynchronous design throws all of that away.

There is no central clock. Everything runs on self-timed loops. Instead of firing a clock and relying on the tools to verify that everything settles in time, you build the control systems by hand. There is no CTS step because there is no clock tree. The timing relationships between stages are managed by delay elements that you design, instantiate, and constrain yourself.

This means synthesis has to be handled carefully. Yosys does not know it is building an async design. It sees Verilog, it optimizes, and if you are not careful, it will optimize away exactly the things you need to keep.

First Principles: The Muller C-Element

Before we talk about synthesis tricks, we need to talk about the foundational building block of asynchronous computing: the Muller C-element.

A C-element is a state-holding gate. Its behavior is captured in a simple truth table:

| A | B | OUT | Behavior |
|---|---|-----|----------|
| 0 | 0 | 0 | Both low: output low |
| 0 | 1 | OUT (previous) | Disagree: hold previous state |
| 1 | 0 | OUT (previous) | Disagree: hold previous state |
| 1 | 1 | 1 | Both high: output high |

When both inputs agree, the output follows. When they disagree, the output holds.
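The truth table can be captured in a few lines of behavioral Verilog. This is only a simulation sketch for building intuition, not the cell that gets synthesized; the module name `c_element_model` and the power-up value of 0 are assumptions made for illustration.

```verilog
`timescale 1ns / 1ps
`default_nettype none

// Simulation-only behavioral model of a 2-input Muller C-element.
// Hypothetical name; this is not the synthesizable version.
module c_element_model
  (
    output reg  out,
    input  wire a,
    input  wire b
  );

  // Assumed power-up state. Real hardware needs a reset or a known
  // starting condition instead.
  initial out = 1'b0;

  // Inputs agree: output follows them. Inputs disagree: no assignment
  // happens, so the output holds its previous value.
  always @(a or b)
    if (a == b) out = a;

endmodule

`default_nettype wire
```

Driving `a` and `b` through 00, 01, 11, 10, 00 in a testbench should show `out` changing only on the 11 and 00 steps.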

4-Phase Bundled-Data Protocol

If you step back and look at an async pipeline, it actually looks quite similar to a synchronous clocked system. You have sequential elements capturing data with combinational logic clouds between them computing the next values. The difference is how the sequential elements are controlled. Instead of a master clock coordinating everything, the stages coordinate only amongst themselves.

In a 4-phase bundled-data protocol, each pipeline stage has a latch, a block of combinational logic, and a matched delay element. The C-elements act as latch controllers, opening and closing the latches based on handshake signals. A request signal propagates forward to the next stage, telling it that new data is available. An acknowledge signal comes back, telling the previous stage that the data has been captured and it is safe to proceed.

The matched delay element is critical. Since there is no clock edge to guarantee timing, the delay must be long enough to prevent the request signal from arriving at the next stage before the combinational logic has finished computing. The delay matches (or exceeds) the worst-case propagation through the combinational cloud, ensuring the data is stable before it gets latched.
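As a rough sketch of how one stage's control could look, the module below models a single Muller-pipeline controller: a C-element whose inputs are the (already delayed) incoming request and the inverted acknowledge from the next stage. All names here are hypothetical, the C-element is behavioral, and the latch polarity (transparent while idle, opaque once the request has been accepted) is an assumption; real latch controllers come in several variants.

```verilog
`timescale 1ns / 1ps
`default_nettype none

// Hypothetical single-stage 4-phase controller (simulation sketch only).
module stage_ctrl_sketch
  (
    output wire req_out,   // request to next stage, also ack to previous
    output wire latch_en,  // high = latch transparent (assumed polarity)
    input  wire req_in,    // request from previous stage, after matched delay
    input  wire ack_in     // acknowledge from next stage
  );

  reg c_out;

  // Assumed idle state at power-up
  initial c_out = 1'b0;

  // C-element on (req_in, NOT ack_in): accept a new request only when
  // the next stage is idle, release only when it has acknowledged.
  always @(req_in or ack_in)
    if (req_in == !ack_in) c_out = req_in;

  assign req_out  = c_out;
  assign latch_en = ~c_out;  // close the latch once the request is accepted

endmodule

`default_nettype wire
```

Chaining several of these, with each stage's `req_out` feeding the next stage's `req_in` through a delay and the previous stage's `ack_in`, gives the basic handshake loop described above.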

If you are serious about learning async design principles, get a copy of Jens Sparsø's Introduction to Asynchronous Circuit Design. It has been my constant companion throughout this project and is loaded with excellent information. The protocol description above barely scratches the surface.

The SKY130 Problem

The SKY130 standard cell library does not include a C-element. It is not a standard logic gate, so it is not in the library. You have two options.

Option A: draw the transistor-level layout yourself in Magic, characterize it with ngspice across PVT corners, generate a Liberty timing model, and add it to the cell library. This produces a better cell (faster, smaller, lower power) and some people genuinely enjoy this work.

I am not one of those people.

I did go down that road using Magic and xschem, and I will cover that in a future blog post. But I quickly realized it was too much work at this stage. Drawing layout, running DRC, extracting parasitics, running SPICE across corners, generating Liberty models, all of that for every custom cell, multiplied by every variant I needed. For a one-person team trying to hit a tapeout deadline, it was not practical.

So I went with Option B: build the C-element from existing standard cells.

Gate Primitives

The gate-level representation of a C-element is well known. You can construct one from AND and OR gates with a feedback path from the output: OUT = (A AND B) OR (OUT AND (A OR B)). The logic is straightforward. The challenge is keeping synthesis from touching it.

Synthesis tools love to optimize. That is their job. Yosys will happily flatten your carefully constructed C-element, rearrange the gates, and produce something that is logically equivalent but structurally wrong. For combinational logic, this is fine. For a C-element, where the physical structure and feedback path matter, it is a disaster.

The solution: wrap every gate in its own module.

`timescale 1ns / 1ps
`default_nettype none
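// Thin wrapper around a single 2-input AND. The module boundary, not the
// logic inside it, is what keeps synthesis from restructuring the gate.
module primitive_and2
  (
    output wire X,
    input  wire A,
    input  wire B
  );

  assign X = A & B;

endmodule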

`default_nettype wire

Notice we do not hand-instantiate an AND2 gate here. We leave the functional description and let Yosys pick up the appropriate AND2 cell from the library for us. The key is the module boundary, not the contents.

Then in LibreLane, set hierarchy preservation:

SYNTH_HIERARCHY_MODE: keep

Yosys will not optimize across module boundaries when hierarchy preservation is enabled. Each gate stays exactly as you placed it. The C-element keeps its structure, the feedback path stays intact, and synthesis does not interfere.

You build up every async-specific cell this way. C-elements, mutex arbiters, whatever you need. It is not as elegant as a hand-drawn custom cell, and it will be larger and slower. But it works, it is correct, and it gets you to tapeout.
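To make that concrete, here is a minimal sketch of a C-element assembled from wrapped primitives, using the textbook form OUT = (A AND B) OR (OUT AND (A OR B)). It reuses `primitive_and2` from above; `primitive_or2` and `c_element_gates` are hypothetical names introduced for this example, and the decomposition has not been checked for hazards against real cell delays.

```verilog
`timescale 1ns / 1ps
`default_nettype none

// Hypothetical OR2 wrapper, same pattern as primitive_and2
module primitive_or2
  (
    output wire X,
    input  wire A,
    input  wire B
  );

  assign X = A | B;

endmodule

// Hypothetical C-element built only from wrapped primitives.
// out = (a & b) | (out & (a | b))
module c_element_gates
  (
    output wire out,
    input  wire a,
    input  wire b
  );

  wire both_high;    // a & b: sets the output
  wire either_high;  // a | b: allows the output to keep holding
  wire hold;         // out & (a | b): feedback term

  primitive_and2 u_and_set  (.X(both_high),   .A(a),         .B(b));
  primitive_or2  u_or_any   (.X(either_high), .A(a),         .B(b));
  primitive_and2 u_and_hold (.X(hold),        .A(out),       .B(either_high));
  primitive_or2  u_or_out   (.X(out),         .A(both_high), .B(hold));

endmodule

`default_nettype wire
```

There is no reset in this sketch, so in simulation `out` stays unknown until both inputs agree for the first time. The feedback net is a deliberate combinational loop, which tools may warn about.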

Delay Chains

The next challenge: delay elements.

In a bundled-data async design, delay elements provide the timing margin that guarantees data is stable before the receiving latch captures it. They do the job that setup-time checks do in a clocked design. Get the delay wrong and the design fails. Remove the delay and the design fails.

To Yosys, a delay element looks like a buffer. And buffers that do not drive anything useful look like candidates for removal. Even if Yosys leaves them alone, OpenROAD may try to resize or reroute them during optimization. You have to protect these cells at every stage of the flow.

Here is what I did:

First, build the delay module as a hierarchy. At the input and output, place a primitive AND2 gate (using the same wrapped-module approach as above). Between them, hand-instantiate a chain of SKY130 delay cells. Give every delay cell instance a name suffix of _dont_touch.
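A sketch of what such a module could look like is below. The specific delay cell (`sky130_fd_sc_hd__dlygate4sd3_1`), the number of stages, the module and instance names, and the choice to tie both AND2 inputs together so the gate acts as a buffer are all assumptions made for illustration; only the overall shape (wrapped primitive, chain of delay cells named with `_dont_touch`, wrapped primitive) comes from the description above.

```verilog
`timescale 1ns / 1ps
`default_nettype none

// Hypothetical matched-delay module: anchor gate, delay chain, anchor gate.
module delay_chain_sketch
  #(
    parameter STAGES = 4   // assumed chain length; size it to the logic
   )
  (
    output wire d_out,
    input  wire d_in
  );

  wire [STAGES:0] tap;

  // Input anchor: wrapped AND2 with both inputs tied together, so it
  // behaves as a buffer (assumed wiring).
  primitive_and2 u_in_anchor (.X(tap[0]), .A(d_in), .B(d_in));

  // Hand-instantiated delay cells. The instance name carries the
  // _dont_touch suffix so the flow can be told to leave it alone.
  genvar i;
  generate
    for (i = 0; i < STAGES; i = i + 1) begin : g_dly
      sky130_fd_sc_hd__dlygate4sd3_1 u_dly_dont_touch (
        .A(tap[i]),
        .X(tap[i+1])
      );
    end
  endgenerate

  // Output anchor, same idea as the input anchor
  primitive_and2 u_out_anchor (.X(d_out), .A(tap[STAGES]), .B(tap[STAGES]));

endmodule

`default_nettype wire
```

Inside the generate loop the full instance names come out as `g_dly[0].u_dly_dont_touch`, `g_dly[1].u_dly_dont_touch`, and so on, so every delay cell still contains the `dont_touch` string.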

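In LibreLane, set the don't-touch regular expression so that it matches those instance names:

RSZ_DONT_TOUCH_RX: "dont_touch"

The value is a regular expression, so it matches any name that contains dont_touch, not only names that end with it. As the RSZ_ prefix suggests, this setting targets the OpenROAD resizer and repair steps: no resizing, no buffering, no removal. Yosys, for its part, has no reason to restructure hand-instantiated library cells inside a preserved hierarchy.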

The input and output primitives are critical. The wrapped gates at each end of the chain act as buffers and serve as anchor points that isolate the delay chain from the rest of the design. Without them, OpenROAD will occasionally try to buffer or upsize the nets connected to the dont_touch cells, and when it does, it disconnects them. OpenROAD does not gracefully handle disconnecting a dont_touch cell. It treats the disconnection as a fatal error and the entire flow crashes.

You also need to set a false path through the delay module in your timing constraints. If you do not, the timing engine will try to analyze the delay chain as a real timing path, complain about the delay, and attempt to "fix" it by adding buffers.

Three things, all required:

  1. Input and output buffer primitives on the delay module
  2. dont_touch attributes on the delay cell instances
  3. False path constraints through the delay module

Do all three and you will never have a problem. Skip any one of them and the flow will eventually break in a way that is difficult to debug.

Latches

The best part about async design: you can use latches instead of flip-flops.

Latches are smaller, faster, and lower power than flip-flops. A flip-flop is essentially two latches back-to-back, so replacing flip-flops with latches saves roughly half the area and power for your storage elements. In an async design where there is no clock edge to synchronize against, latches are the natural storage element.

The problem: the version of Yosys bundled with LibreLane does not handle latches with asynchronous resets properly. No matter how you configure the latch mapping file, Yosys does not recognize them as latches. It either infers them incorrectly or throws errors.

The workaround: do not ask Yosys to infer latches. Hand-instantiate them. But you can make it practical with a parameterized module that wraps the instantiation in a generate block:

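// Latch: behavioral model for simulation, hand-instantiated cells for synthesis
`ifndef SYNTHESIS
always_latch
  if (!reset_n)      data_out <= {DATA_SIZE{1'b0}};
  else if (latch_en) data_out <= data_in;
`else
genvar latch_i;
generate
  // One SKY130 latch per data bit. The named block gives each instance a
  // stable hierarchical name (g_latch[i].bit_).
  for (latch_i = 0; latch_i < DATA_SIZE; latch_i = latch_i + 1) begin : g_latch
    sky130_fd_sc_hd__dlrtp_2 bit_ (
      .RESET_B(reset_n),       // active-low reset
      .D(data_in[latch_i]),
      .Q(data_out[latch_i]),
      .GATE(latch_en)          // transparent while high
    );
  end
endgenerate
`endif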

The SYNTHESIS define is the key here. For simulation, always_latch runs faster than hand-instantiated gate primitives, so you get the behavioral description during simulation and the physical cells during synthesis. You set this in your LibreLane configuration:

VERILOG_DEFINES:
  - "SYNTHESIS"

With this approach you get a parameterized latch module that takes a bus of data signals. It is not as convenient as always_latch everywhere, but it is just a module instantiation.
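From the user's side, that instantiation might look like the sketch below. The wrapper name `async_latch`, its port list, and the surrounding `stage_regs` module are hypothetical, inferred from the signal names in the fragment above. The wrapper body shown here is a plain-Verilog behavioral stand-in so the example is self-contained; the real module would contain the SYNTHESIS-switched fragment instead.

```verilog
`timescale 1ns / 1ps
`default_nettype none

// Hypothetical wrapper with a behavioral stand-in body.
module async_latch
  #(
    parameter DATA_SIZE = 8
   )
  (
    output reg  [DATA_SIZE-1:0] data_out,
    input  wire [DATA_SIZE-1:0] data_in,
    input  wire                 latch_en,
    input  wire                 reset_n
  );

  // Level-sensitive storage: reset wins, otherwise transparent while
  // latch_en is high, otherwise hold.
  always @*
    if (!reset_n)      data_out = {DATA_SIZE{1'b0}};
    else if (latch_en) data_out = data_in;

endmodule

// Hypothetical pipeline stage using the wrapper for a 32-bit operand
module stage_regs
  (
    output wire [31:0] operand_q,
    input  wire [31:0] operand_d,
    input  wire        latch_en,
    input  wire        reset_n
  );

  async_latch #(.DATA_SIZE(32)) u_operand_latch (
    .data_out(operand_q),
    .data_in(operand_d),
    .latch_en(latch_en),
    .reset_n(reset_n)
  );

endmodule

`default_nettype wire
```

The width is the only thing that changes from one use to the next, which is what keeps the hand-instantiation approach manageable across a whole datapath.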

Once the latches are in the design as direct instantiations, both Yosys and OpenROAD treat them as fixed cells. No inference issues, no mapping problems. It just works.

The Pattern

If you step back, the pattern for all three of these problems is the same: do not let the synthesis tool make decisions about your async control logic. Wrap everything in preserved hierarchy. Hand-instantiate what needs to be exact. Protect it with attributes and constraints. Let Yosys optimize your datapath logic, where it excels, but keep it away from the control structures that make async design work.

This is not how Yosys was designed to be used. It is a synchronous synthesis tool being asked to do something it was never built for. But with the right guardrails, it does the job.

Next week: timing constraints for async designs. How you set up SDC constraints when there is no clock, and how to keep OpenROAD from "helping" in ways that break everything.