
A New Approach to Vector Multiplication and Matrix Operations using Temporal Computing 🔗

Introduction 🔗

The technology we discuss in this concept paper represents a radical rethink of both computer architecture and the method of arithmetic used for calculation. Our target operation is the vector dot product $\sum_{i=0}^{I} w_i x_i$, which is the building block of matrix multiplication, which is in turn the building block of all AI. This note’s thesis is that to build a radically more efficient operator we need to fundamentally rethink both the data representation and the underlying mode of storage and operation. Temporal processing fits both requirements: it utilises unary codes, and it stores and manipulates this representation using time delays. So far we have built this in CMOS (using positions in arrays, which we call “index space”) and require funding to move it to a “fully” temporal instantiation.
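For reference, the target operation in its conventional form is a single multiply-accumulate loop (a plain software baseline, not part of the proposed architecture):

```python
# Conventional dot product: the operation this paper sets out to refactor.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

print(dot([4, 4], [10, 4]))  # 4*10 + 4*4 = 56
```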

Representation 🔗

Position-based number systems (e.g. binary and base-10) are the go-to representation for all forms of data processing, since their compression property gives storage capacity that scales logarithmically. In contrast, unary codes (see Figure 1 below) were amongst the earliest number systems and have operations that are considerably simpler than their position-based equivalents: addition is simple concatenation and multiplication is repeated replication.


Figure 1. Unary and unary codes. Here the “x” means an empty cell.
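The two unary operations described above can be sketched in a few lines, representing a value n as n tally marks (the string encoding here is purely illustrative):

```python
# Unary sketch: a value n is simply n repeated marks.
def to_unary(n):
    return "1" * n

def unary_add(a, b):
    return a + b      # addition is concatenation

def unary_mul(a, n):
    return a * n      # multiplication is repeated replication

five = to_unary(5)
print(unary_add(five, to_unary(3)))  # eight marks: 5 + 3
print(unary_mul(five, 3))            # fifteen marks: 5 x 3
```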

Unary has always been seen as an inefficient storage medium, but that is only because the static data structure does not scale logarithmically. However, under a scheme that extends unary-encoded data “geometrically” (Figure 2), a natural algorithm for the dot product emerges, which we call the MADD algorithm (PAT: GB1812867.8A) (Figure 3). This has the following benefits:

  • Storage of the data is compressed: one set of operands becomes the index values (at no cost) and the other benefits from additive compression ($a x_1 + a x_2 \equiv a(x_1 + x_2)$).
  • The algorithm scales with the magnitude of the operands, not the number of operands.
  • Everything is in-memory, so there is no data movement on and off the element.
  • Smaller numbers are naturally processed more quickly.
  • Zeros are naturally not processed.


Figure 2. “Geometric” code representation of 5×3. Note that the number of 1/0 squares equals the product.

To date we have built (taped out) this in CMOS (135nm, OpenLane) using the above index-space representation, and are evaluating it over the summer for speed increases and energy savings.

Excitingly, there is a “sweet spot” when unary is combined with the alternative storage medium of time delays. Time is naturally an interval, which makes it equivalent to unary, so a synergy exists. The benefits of using time delays as storage are as follows:

  • They are naturally low power: the encoding is sparse and the passage of time itself is a free resource.
  • They are easily parallelised.
  • At the same time, since they are clock dependent (and clocks easily tick in the terahertz range), they can actually accelerate processing.
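As an illustration only (a software loop standing in for hardware delay lines and a real clock), a discrete-time simulation of a temporal dot product might look like this: channel i stays active for x_i ticks, all channels run in parallel, and each tick accumulates the weights of the channels still firing.

```python
# Hypothetical discrete-time sketch of a temporal dot product.
# In hardware the "ticks" would be delay-line stages, not a Python loop.
def temporal_dot(weights, xs):
    total = 0
    for tick in range(max(xs, default=0)):  # runtime scales with magnitude
        for w, x in zip(weights, xs):       # channels operate in parallel
            if x > tick:                    # channel i still firing?
                total += w
    return total

print(temporal_dot([4, 4], [10, 4]))  # 56, same result as a plain dot product
```

Note the properties claimed above fall out naturally: a zero operand never fires, and smaller operands finish earlier.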

Our aim is to recast the above problem into this domain, initially in CMOS, but, because temporal encoding is very flexible, we can also explore photonic and analog implementations. Since this is a “moonshot” it is difficult to assess the impact of this approach; however, because we have refactored the dot-product operation at a fundamental level, we hope that these many angles will each combine to provide the designated savings.


Figure 3. Effectively finding the areas of the rectangles 4 × 10 and 4 × 4 by progressively sweeping back across the indexed array in one pass, counting first the 6 × 4 region and then the remaining 8 × 4 region.
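One reading of the sweep in Figure 3 (a sketch of our understanding; the patented MADD algorithm may differ in detail): record each weight at the index given by its operand, then make a single pass back across the indexed array, accumulating a running height.

```python
# Index-space sketch of the one-pass sweep in Figure 3.
def madd_dot(weights, xs):
    n = max(xs, default=0)
    bumps = [0] * (n + 1)
    for w, x in zip(weights, xs):
        if x:                      # zeros are naturally not processed
            bumps[x] += w          # additive compression at shared indices
    total = running = 0
    for j in range(n, 0, -1):      # one pass back across the indexed array
        running += bumps[j]        # rectangle heights stack as we sweep
        total += running           # count the running height at column j
    return total

print(madd_dot([4, 4], [10, 4]))  # 6*4 + 8*4 = 56, as in Figure 3
```

With the operands 4·10 + 4·4 from the figure, the sweep counts the outer 6 × 4 region first and then the stacked 8 × 4 region, and the pass length scales with the magnitude of the largest operand, not the number of operands.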