Timing

Asynchronous Design

We have to decide whether to time synchronously or asynchronously.

Synchronous design uses a clock, and all flip-flops are clocked on the same edge (usually the positive edge). All activity is in lock-step with the clock.

Asynchronous design does not use a clock. Instead, events in the system initiate operations, whose completion initiates further operations, and so on. The activity is spread over time and is therefore harder to predict; as a result, asynchronous systems are difficult to design, get working, monitor and change. Tool support is also much better for synchronous than for asynchronous design because it is easier to formulate rules. Industry almost universally adopts synchronous design as a safer route to working silicon.

Within synchronous design, the simplest scheme is a single clock, which is used in most systems. In a complex system, however, timing constraints must be rechecked each time the system is modified, and the master and slave clocks in D-type flip-flops may overlap.

Two-Phase Non-Overlapping Clock

In complex systems, we can opt for a two-phase non-overlapping clock, with the two phases applied to alternate master/slave (transparent latch) stages. It is important that when one clock is on, the other is off. Furthermore, when one clock turns off, there is a gap before the other turns on, which significantly reduces the likelihood of both clocks being on at the same time. This is much safer than a single clock.

Making two-phase non-overlapping clocks from a single clock input pin requires just two NOR gates and one inverter.
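The text does not give the wiring, so the following is only a sketch that assumes the common cross-coupled arrangement: phi1 = NOR(clk, phi2) and phi2 = NOR(NOT clk, phi1). The names `nor`, `simulate`, `phi1` and `phi2` are illustrative, and every gate is given an idealised delay of one time step. Under those assumptions the simulation shows that the two phases are never high together and that a gap separates them.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))


def simulate(clk_samples):
    """Unit-delay simulation: each gate output updates one step after its inputs."""
    # Settled state for clk = 0 under the assumed wiring.
    inv, phi1, phi2 = 1, 1, 0
    trace = []
    for clk in clk_samples:
        # The right-hand side is evaluated with the previous step's values,
        # which models one gate delay per gate.
        inv, phi1, phi2 = int(not clk), nor(clk, phi2), nor(inv, phi1)
        trace.append((clk, phi1, phi2))
    return trace


# Three clock periods, six time steps low then six high (arbitrary choice).
clk_samples = ([0] * 6 + [1] * 6) * 3
trace = simulate(clk_samples)

for clk, phi1, phi2 in trace[:14]:
    print(clk, phi1, phi2)
```

In a real circuit the width of the gap is set by the actual gate delays, and extra delay elements are often inserted in the feedback paths to widen it.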

Clock Buffering

The number of inputs (the load) in a complex system is very large.

  • We can’t drive this from a single gate – clocks need buffering.
  • How can we organise buffering so as to minimise time through buffers?

To know the time for one gate to drive another, we need to know the size of the driving gate, and the size of the receiving gate.

In the stump, we have 7x16 + 4 + 16 + 16 + 2

Timing Delays

\tau is a fundamental parameter of a fabrication line and is quoted by the manufacturer (30ps for 180nm).

Other gate delays can be expressed in terms of \tau.

For large loads, single enormous transistors are not viable, so use a power-up tree combined with geometric buffering.
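As a rough illustration of geometric buffering, the sketch below assumes a very simple delay model that is not stated in the text: each buffer stage takes \tau multiplied by the ratio of the capacitance it drives to its own input capacitance, and parasitic self-loading is ignored. The load ratio of 1000 and the function names are hypothetical; only \tau = 30ps comes from the figure quoted above.

```python
TAU_PS = 30.0  # tau for a 180nm process, as quoted in the text


def chain_delay_ps(load_ratio, n_stages, tau_ps=TAU_PS):
    """Delay of n equally scaled stages driving load_ratio times the input load.

    Assumed model: each stage drives a load f times its own input
    capacitance and takes f * tau, where f = load_ratio ** (1 / n_stages).
    """
    f = load_ratio ** (1.0 / n_stages)
    return n_stages * f * tau_ps


def best_stage_count(load_ratio, max_stages=20):
    """Stage count with the smallest total delay under the assumed model."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay_ps(load_ratio, n))


load_ratio = 1000  # hypothetical load, relative to one minimum-size gate input
n = best_stage_count(load_ratio)
print("single stage:", chain_delay_ps(load_ratio, 1), "ps")
print("best stage count:", n, "delay:", round(chain_delay_ps(load_ratio, n)), "ps")
```

Under this model the best stage ratio comes out close to e, which is why the optimum number of stages is roughly the natural logarithm of the load ratio. Practical designs often use a larger ratio (fewer, bigger stages) to save area and power.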

Clock Skew

Variations in load and differences in line lengths result in the clock arriving at different points at different times. This skew can form a significant part of the clock period. To minimise it:

  • Ensure that all clock drivers see the same load.
  • Use metal for all clock connections.
  • Use the same length of clock line for all parts of the chip. We can do this by dividing the whole chip into a 2D grid and using an iterated H-clock tree.
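The equal-length property of an iterated H-tree can be checked with a small sketch. The function `h_tree_leaves` and its parameters are illustrative rather than taken from the text: it places an H at the centre, recurses into the four corners with half the arm length, and records the Manhattan wire length from the root to every leaf.

```python
def h_tree_leaves(x, y, half, levels, length=0.0):
    """Return (x, y, wire_length) for every leaf of an H-tree centred at (x, y).

    half   -- half-width of the current H (arm length in each direction)
    levels -- number of H levels still to place
    length -- wire length accumulated from the root so far
    """
    if levels == 0:
        return [(x, y, length)]
    leaves = []
    for dx in (-half, half):      # along the crossbar of the H
        for dy in (-half, half):  # then up or down a vertical stroke
            leaves += h_tree_leaves(x + dx, y + dy, half / 2, levels - 1,
                                    length + abs(dx) + abs(dy))
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half=4.0, levels=3)
print(len(leaves), "leaves")
print("distinct wire lengths:", {wire for _, _, wire in leaves})
```

Every leaf sees the same wire length because each level adds the same horizontal and vertical distance regardless of which corner is chosen. Equal wire length alone does not remove skew; the loads at the leaves must still match.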

Effect of Scaling on Timing

If all dimensions and voltages are scaled down by a factor s, our chip can hold s^2 times as many transistors. The current per transistor is reduced by s, so the overall chip current goes up by s; because the voltage has gone down by s, the chip power stays the same (but current density increases by s).

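C_{in} \propto \frac{\text{transistor area}}{\text{transistor thickness}}, so C_{in} scales down by s. The output resistance of a gate is proportional to \frac{V}{I} (voltage over current), both of which scale down by s, so it remains the same. Gate delay, which is proportional to the output resistance times C_{in}, therefore scales down by s.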

\text{gate power} = \text{voltage} * \text{current}, so scales down by s^2

Speed-power product improves by s^3.
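The per-gate scaling results can be collected into one short derivation. This is a sketch using notation that is not in the text: primes mark scaled quantities, A is the transistor area, t_{ox} stands for the transistor thickness, t_d is the gate delay and P the gate power.

```latex
\begin{align*}
C_{in}' &\propto \frac{A/s^2}{t_{ox}/s} = \frac{1}{s}\,\frac{A}{t_{ox}}
  && \text{input capacitance down by } s \\
R' &\propto \frac{V/s}{I/s} = \frac{V}{I}
  && \text{output resistance unchanged} \\
t_d' &\propto R'\,C_{in}' = \frac{1}{s}\,R\,C_{in}
  && \text{gate delay down by } s \\
P' &= \frac{V}{s}\cdot\frac{I}{s} = \frac{1}{s^2}\,V I
  && \text{gate power down by } s^2 \\
P'\,t_d' &= \frac{1}{s^3}\,P\,t_d
  && \text{speed-power product improves by } s^3
\end{align*}
```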

This is all good news!

Actually, no.

Non-local interconnects are a problem because line length l remains the same as before.

C_l \propto \frac{l*w'}{d'}, where w' is the scaled line width and d' is the scaled layer separation. Both shrink by s, so the capacitance of the line remains the same.

R_l \propto \frac{l}{w' * t'}, where t' is the scaled depth of the metal. Both w' and t' shrink by s, so the resistance scales up by s^2.

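The delay along a non-local line, which is proportional to R_l C_l, therefore goes up by s^2. Local interconnection delay remains the same as before, because local line lengths shrink by s along with everything else.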
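The two interconnect cases can be checked numerically. The sketch below assumes the proportionalities given above (resistance as length over width times depth, capacitance as length times width over layer separation) and takes the line delay to be proportional to the RC product. The function name `wire_rc_factor` and the choice s = 2 are illustrative.

```python
def wire_rc_factor(s, length_scales):
    """Factor by which a line's RC product changes when the process scales by s.

    All unscaled dimensions are taken as 1. Width, metal depth and layer
    separation always shrink by s; the length shrinks only for local lines.
    """
    length = 1.0 / s if length_scales else 1.0
    width = depth = separation = 1.0 / s
    resistance = length / (width * depth)       # R_l proportional to l / (w' t')
    capacitance = length * width / separation   # C_l proportional to l w' / d'
    return resistance * capacitance


s = 2.0
print("non-local line:", wire_rc_factor(s, length_scales=False))  # s**2
print("local line:    ", wire_rc_factor(s, length_scales=True))   # unchanged
```

Since gate delay falls by s while non-local line delay rises by s^2, the gap between the two widens by s^3 with each scaling step.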

As chips scale down further, interconnection delay is becoming more significant than gate delay. The trend is that chip timing will be determined by interconnection times rather than gate delays, so signals may take several clock cycles to travel from one side of the chip to the other.
