The Paradox
SDC, the Synopsys Design Constraints format, is the industry standard for specifying timing constraints. Setup time. Hold time. Clock uncertainty. Clock groups. Clock tree synthesis targets. Every concept, every command, every analysis mode assumes a central timing reference: the clock.
Our design has no clock. No clock pin, no clock tree, no clock period that means anything. But the timing engine will not run without one. OpenSTA, the timing analyzer inside OpenROAD, requires at least one clock to be defined or it simply has nothing to analyze. The entire flow stalls.
So the first thing you do in an async SDC file is lie to the tools.
The Virtual Clock
In the LibreLane configuration, you start here:
CLOCK_PORT: null
CLOCK_PERIOD: 50
Setting CLOCK_PORT to null tells LibreLane there is no physical clock pin on the design. But you still define a period, because the timing engine needs a reference value. Then in the SDC file:
create_clock -name vclk -period 50
This creates a virtual clock that is not connected to anything. It does not drive any logic. The period of 50ns is arbitrary. It exists purely to satisfy the tool. Without a clock definition, the LibreLane flow scripts fail during setup, and none of the downstream steps that depend on timing analysis runs: design rule checks, transition time analysis, capacitance validation.
This is the first lesson of async constraints: the tools were not built for you, so you give them what they need to function while making sure they do not interfere with what you have built.
Every Handshake Is a Clock Domain
In a synchronous design, you typically have one clock, maybe a handful. All flip-flops on the same clock are analyzed together for setup and hold. The clock tree gets balanced so every flip-flop sees the same clock edge at the same time.
In our async design, each pipeline stage has its own handshake, and each handshake has its own enable signal that opens and closes a group of latches. From the timing engine's perspective, each of these enable signals is effectively an independent clock.
So we define one. For every handshake:
create_clock -name decode_latch_clk -period 50 [get_pins exec/decode_latch/hndshk/clock_start/*/Y]
create_clock -name pc_clk -period 50 [get_pins exec/pc_is_exec_mem/hndshk/clock_start/*/Y]
create_clock -name mem_wr_clk -period 50 [get_pins mem/hndshk/clock_start/*/Y]
# ... and so on for each handshake group
Each clock is defined at its handshake's enable signal output. The period is still the same arbitrary 50ns. What matters is not the period but the fact that the timing engine now knows these are separate clock domains.
Then you declare them all asynchronous to each other:
set_clock_groups -asynchronous \
-group [get_clocks vclk] \
-group [get_clocks decode_latch_clk] \
-group [get_clocks pc_clk] \
-group [get_clocks mem_wr_clk] \
# ... one group per handshake
This tells the timing engine: do not analyze timing paths between these clock domains. They are independent. There is no relationship between them. Only analyze paths within each group.
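Because SDC files are plain Tcl, the per-handshake clocks and the clock groups do not have to be written out by hand. The following is an illustrative sketch, not the project's actual SDC; the handshake names and pin paths are the hypothetical examples from above, and it assumes a Tcl 8.5+ interpreter (which OpenSTA provides) for the `{*}` expansion:

```tcl
# Hypothetical handshake list: {clock_name  enable_pin_pattern}
set handshakes {
    {decode_latch_clk  exec/decode_latch/hndshk/clock_start/*/Y}
    {pc_clk            exec/pc_is_exec_mem/hndshk/clock_start/*/Y}
    {mem_wr_clk        mem/hndshk/clock_start/*/Y}
}

# Start the group list with the virtual clock, then add one clock
# and one group per handshake.
set groups [list -group [get_clocks vclk]]
foreach hs $handshakes {
    lassign $hs name pin_pattern
    create_clock -name $name -period 50 [get_pins $pin_pattern]
    lappend groups -group [get_clocks $name]
}

# Declare every handshake clock asynchronous to all the others.
set_clock_groups -asynchronous {*}$groups
```

Adding a handshake then means adding one line to the list, which is another payoff of the consistent hierarchy.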
Why Bother with Per-Handshake Clocks?
If all the clocks are asynchronous and the periods are meaningless, why define them at all? The answer is skew balancing.
For most latches in the design, skew does not matter much. A latch opens, data flows in, the latch closes. If one bit of a 16-bit word latches a fraction of a nanosecond before another, it is not a problem. The handshake protocol guarantees the data is stable before the enable arrives. The data is already there, waiting.
But when you have a feedback path, skew becomes critical. Consider the program counter: its output feeds back through combinational logic to its own input. If the bits of the PC do not all close at the same time, you can read a corrupted value. Not half old, half new. More likely a single bit that is wrong, and not every time. An intermittent single-bit corruption that depends on physical placement and routing. That is one of the hardest bugs to find in any design.
By defining each handshake enable as a clock, the tool treats the enable signal like a clock net and balances the buffer tree to the latch enable pins. This is the same skew balancing you would get in a synchronous design, just applied locally to each handshake group instead of globally.
The good news: for most handshake groups, this balancing is essentially free. When you have one enable signal driving a handful of latches, the path is already naturally balanced. There is no additional cell area or power cost. The tool confirms what the layout already provides.
Where it matters is when you have a handshake driving dozens of latches. Without the clock definition, the buffer tree for that enable signal will not be balanced. The tool will buffer it for drive strength, but it will not care about matching arrival times at each latch. That unbalanced buffering eats into your timing margin. The data will almost certainly be correct, but it is never good to erode your margin. And in the worst case, the latch furthest downstream on the buffer tree could see a clipped enable pulse, which is exactly the kind of problem that works fine in simulation and fails intermittently on silicon.
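Once the enables are defined as clocks, you can also ask the timing engine to report the skew it achieved across each enable tree. A hedged sketch, assuming OpenSTA's `report_clock_skew` command and the example clock names from above:

```tcl
# Report the worst spread in arrival times at the latch enable pins
# for a per-handshake clock. Clock names are the hypothetical
# examples used earlier in this post.
report_clock_skew -setup -clock [get_clocks pc_clk]
report_clock_skew -hold  -clock [get_clocks pc_clk]
```

For a feedback path like the program counter, this is a quick sanity check that the balanced tree you asked for is the tree you got.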
Stop Clock Propagation
There is a subtlety when using latches in any design. A latch is transparent when enabled, so from the timing engine's perspective, a clock signal can propagate through the latch Q output into whatever combinational logic follows. This creates thousands of false timing paths.
The fix:
set_sense -type clock -stop_propagation [get_pins -hierarchical *bit_/Q]
This tells the timing engine: the clock stops at the latch output. Do not propagate it into the datapath. The clock is only relevant on the enable pin, not on Q.
False Paths on Delay Chains
We covered this conceptually in last week's post about synthesis. Here is what it looks like in the SDC:
# Function block delay chains
set_false_path -through [get_pins -hierarchical */dly/*/A]
set_false_path -through [get_pins -hierarchical */dly/*/B]
set_false_path -through [get_pins -hierarchical */dly/*/X]
Every delay chain module lives under a dly hierarchy. A single wildcard pattern catches all of them. The timing engine ignores these paths entirely: no upsizing, no buffer insertion, no "fixing" the delay that you intentionally put there.
False Paths on Feedback Loops
C-elements, mutexes, and arbiters all contain intentional feedback. The output feeds back to the input. That is how they hold state. But the timing engine sees a combinational loop and flags it as an error.
The solution comes back to the naming conventions we discussed last week. Every feedback gate in every async cell is named with a _fb_gate suffix. Then a single pair of wildcard constraints handles all of them:
set_false_path -through [get_pins -hierarchical *_fb_gate/*/A]
set_false_path -through [get_pins -hierarchical *_fb_gate/*/A1]
This covers C-elements, mutexes, and arbiters in all their variants. The false path removes these feedback arcs from setup and hold analysis, but the cells are still subject to design rule checks: max capacitance, max transition time, max fanout. The tool still optimizes them; it just does not try to analyze timing through the feedback loop.
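Those design rule checks come from standard SDC limits that apply design-wide, false paths or not. A sketch of what they look like; the specific values here are illustrative, not the project's:

```tcl
# Design-rule limits still constrain false-pathed cells.
# Values are illustrative placeholders, in library units.
set_max_transition  1.5 [current_design]   ;# max slew on any net
set_max_capacitance 0.2 [current_design]   ;# max load on any driver
set_max_fanout      10  [current_design]   ;# max fanout per driver
```

This is why the feedback gates still get sized and buffered sensibly even though their timing arcs are ignored.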
Design for Constrainability
If there is a theme across all of this, it is that the constraints are only clean because the RTL was designed with constraints in mind. The _fb_gate naming convention, the hndshk/clock_start hierarchy, the dly module structure: all of that was chosen so that a handful of wildcard patterns could catch every instance. If the naming were inconsistent, or the hierarchy were flat, this SDC file would be hundreds of lines of individual pin paths instead of a few clean patterns.
Design your Verilog so your constraints can be simple. It saves enormous time when you inevitably have to iterate on the physical design.
More broadly, hardware design requires a holistic viewpoint. Everything you do upstream has effects downstream. Everything is tightly coupled. We do not live in a software world where you can isolate different parts of the system from each other. Your RTL affects your synthesis. Your synthesis affects your constraints. Your constraints affect your place and route. Your place and route affects your timing. And your timing sends you back to your RTL.
You have to be aware of what is happening multiple levels above and below you at all times. I have seen this break down on big teams where walls go up between disciplines, to the point that no one understands what is happening in the other groups. You end up with a system that works, but is not optimal, and no one understands how to make it better because no one truly understands how the whole system interacts with itself. This is the price of complexity, and there may not be a way around it. But simply keeping the full system in mind is something every engineer should strive for.
The Real Timing Verification
Everything in this post is about keeping the tools from breaking your design during place and route. These constraints prevent the optimizer from upsizing your delay chains, removing your feedback loops, or creating unbalanced enable trees. They are defensive constraints.
The actual timing verification for an async bundled-data design, confirming that each delay chain is longer than its corresponding data path, happens separately. That is a different problem with different tools, and a topic for a future post.
Next week: a detour into custom cell design with Magic. What happens when the standard cell library does not have the cells you need, and what we learned from trying to draw our own.