PandA-2024.02
Register allocation

Register allocation problem defition

The storage value insertion phase inserts additional nodes in the scheduled data flow graph. Each edge that crosses a cycle step boundary represents a value that has to be stored somewhere. The storage allocation function can therefore be defined as the following transformation:

Given a scheduled data flow graph $G(V_o,E,C)$, the storage value insertion is a transformation $G(V_o,E,C)\rightarrow (V_o\cup V_s,E',C)$, which adds storage value $v\in V_s$ to the graph such that all edges $e\in E$ which cross a cycle step boundary are connected to a storage value.

The register allocation problem can be formulated as the allocation of a storage module $m\in M_s$ for each storage value $v\in V_s$:

Register allocation: the register allocation function $\psi : V_s\rightarrow \Pi(M_s)$, identifies the storage module holding a value from the set $V_s$

The binding information is needed for evaluating and/or performing the register optimization. Therefore, the accurate estimation of the number of registers requires both scheduling and binding.

Compatibility graph definition

The main task to provide good solutions to the register allocation problem is the procedure used to recognize the overlapping of the life time intervals. Since a register is needed for the values alive between two control steps, the analysis can be easily performed on state transition graph (see Finite State Machine). In fact, a vertex in this graph represent all operations executed in a control step. The value that will be further needed will be alive across the edges outcoming from this vertex. Besides, an edge represents the changing from a control step to the next one, so the values alive between two control steps are the values alive on this edge. The dataflow analysis, presented by Appel, allows to compute, for each edge, which are the variables alive. These variables will be the vertices of a conflict graph, that is a graph so defined:

The resulting conflict graph is minimal with respect to number of conflicting variables (i.e. the number of edges in the graph). In a such way, the solution to register allocation problem will use the minimum number of registers. The problem can be formulated as a clique-covering problem (search of largest cliques into the compatibility graph, complementary to the conflict one) and it can be easily resolved with an heuristic vertex coloring on conflict graph. An heuristic coloring assigns a different color to source and target of each edge. So variables alive in the same moment will be connected all together by conflict edges, so they will be differently colored. Since each color used will represent a different register in the final design, the variables will be assigned, as requested, to different registers. When variables are not alive together, there are no conflict edges so it can happen that the algorithm assigns to the variables the same color. It means that they could share the same register, in fact the values are not alive in the same moment.

Introduction

Into data flow graph, each edge represents a data value. If the value crosses a cycle step boundaries, it has to be stored into a register.
For example, in the data flow graph below, the edges from n1 to n3, from n2 to n4 and from n3 to n4, cross the low boundary of control step 1 and control step 2. Each of this edge represents a storage value and so, it has to be assigned to a register.

dot_inline_dotgraph_22.png

This can be considered the first and simpler approach to register allocation problem. This section has the goal to exploit some properties of the problem and try to reduce the number of storage modules. It is organized as follow. The subsection Principles of register allocation explains and defines the problem in other to get a more reasonable approach. Subsection Traditional approach to register allocation problem overviews the classical approach to register allocation and PandA's approach to register allocation defines the approach adopted and implemented into PandA framework. Finally, subsection Algorithms overview shortly defines algorithms implemented into registerAllocation class.

Principles of register allocation

The register allocation problem is defined as finding a solution to the register allocation function $ \psi : V_s \rightarrow \Pi( M_s ) $, that identifies the storage module (like registers and register files) holding a value from the set $ V_s $ of values that have to be stored. A storage module has to be assigned to each storage value. If the register allocation problem is considered in isolation, the goal is to minimize the number of storage modules. Every storage value is labeled with the variable it belongs to using a function $ var: V_s \to LV $ where $LV$ is the set of variables from the language.
The function $ \omega : V_s \to C $ determines for each storage value the cycle step in which it is written.
The function $ \rho : V_s \to \Pi(C) $ determines the set of cycle steps in which the storage value is read. The function $ P: V_s \to C $, determines the last cycle step in which the storage value $ v \in V_s$ is read. Thus, $ P(v) = max(\rho(v)) $.
A storage value v that is written in state $ \omega(v) $ and read for the last time in the state $ P(v) $ is called life in the interval $ lt(v) = [\omega(v), P(v)] $. The interval will be called lifetime (interval) of a storage value v. The requirement to keep storage values from one variable together is modeled by a set of lifetimes assigned to each variable $ u \in LV $. The function $ LT(u) $ describes this set of lifetimes: $ LT(u) = \{lt(v_s)|var(v_s) = u, u \in LV\} $

The compatibility and conflict graphs

Usually, from an examination of the control/data flow graph, it can be derived a conflict graph $ G_s(V_s,W) $ (also called interference graph), where nodes $ V_s $ are the storage value to be stored, and edges W are pairs of values that cannot be assigned to the same storage module because they are alive at the same time. This means that the storage values that are adjacent in $ G_s(V_s,W) $ cannot be stored in the same register because their interval life overlap. Concluding, two vertices are joined by an edge if they are in conflict. The complement of a conflict graph will be called compatibility graph.

Edges $\bar{W}$ of the compatibility graph $ \bar{G}_s(V_s,\bar{W}) $ (e.g., the complement of the conflict graph $ G_s(V_s,W) $ described above) are defined as follow:

$ \bar{W}=\lbrace (v_{i},v_{j}) \mid \ll \omega(v_{i}),P(v_{i}) \gg \parallel \ll \omega(v_{j}),P(v_{j}) \gg = false\rbrace $

and $V_s$ is the set of the storage values; $\omega(v)$ and $P(v)$ are the write and last read cycle steps and the operator $||$ returns true if two intervals overlap and return false otherwise:

$\ll x_1, y_1 \gg || \ll x_2, y_2 \gg =$ $ \\false,\ if\ y_1<x_2\vee y_2<x_1\\true,\ otherwise$

This means that storage values that are adjacent in $G_s(V_s,W)$ can be stored in the same register without overwriting each others values.

Traditional approach to register allocation problem

A clique covering algorithm, applied to compatibility graph, groups the values in such a way that all values that belong to the same clique can be stored in the same register.
A heuristic algorithm for clique covering of compatibility graph is the Tseng's algorithm. The clique covering is done using a heuristic that first combines those nodes having the most neighbors in common. To break a tie also the number of excluded edges from the combination is calculated. One edge is selected from this set $W''$. A new vertex $v_a$ represents the union of the cliques of $v_1$ and $v_2$. All excluded edges are removed from the graph and $v_a$ is connected to the remaining edges of $v_1$. Finally, $v_1$ and $v_2$ are removed from the graph.
Solving the general clique problem will attempt to minimize the number of registers needed to store all the variables. Method registerAlg::clique_cover() implements the Tseng's clique cover algorithm.

However, there are special cases in which more efficient algorithms may be used, as explained into Algorithms overview.
It's known that finding a clique covering in a graph $G$ is the dual problem of finding a coloring in the complement graph $\overline{G}$,so method registerAlg::coloring() provides a way to solve register allocation problem by conflict graph coloring as these graph are one the complement of each other by definition and they are both provided from compatibilityGraph objects.

PandA's approach to register allocation

Use of Reducted SDG Graph

What said in the previous section could be considered as a conservative approach. In fact, with a further and careful analysis, the conflict graph can be pruned by unuseful constraints (e.g. unuseful interference edges) due to false conflicts among variables. So, storage values that were considered conflicting before, now can become compatible and so the number of storage modules can be further reduced. \ In particular, the analysis focuses on the fact that more than one mutual exclusive control flow can exist in the behavioural description: this means that a pair $ (v_i,v_j) $ of storage values can holds the same register (in spite of their interval life overlap or not) if $ v_i,v_j $ belong to paths that are in mutual exclusion each other. \ So, in order to perform this analysis, it can be useful to use a different flow graph representation, the Reduced System Dependence Graph with information about Feedback edges (the RFSDG Graph), that contains all the information you may need afterwards about data and control dependences. This graph represents the transitive reduction of the usual System Dependence Graph (the SDG graph); it's used at this step because it shows a complete and minimum representation of all control/data flows.

Dataflow Analysis

Dataflow analysis, based on Appel\'s approach, has been quite modified to obtain better result in a multi-flow problem, such as a usual DFG is.
First step is usual dataflow analysis, where live variables are computed for each vertex, to get information about liveness. Liveness information (live-in and live-out) can be calculated from use and def as the following dataflow equations shows:

$ in[n] = use[n] \bigcup (out[n] - def[n]) $
$ out[n] = \bigcup_{s \in succ[n]} in[s] $

These dataflow equations for liveness analysis mean that:

Conflict graph creation

After dataflow analysis, conflic graph can be created. In graph creation, each edge into the control flow graph is taken into account. If source vertex and target one are scheduled into different control step, it means that a register is needed for each variable living out from the source vertex to keep value alive until target one will use them. So a conflict edge can be set between each pair of variable in a such situation: they cannot use the same register module. However, it is a conservative approach. Now, some improvements implemented into the proposed solution have been explained.
This algorithm is able to detect $ alias $ variables. In fact, theorically, each vertex can execute only one operation and so store only one result. Multi-definitions are allowed only by using statements like as:

$ a = b = c + d $

In this way, variable a and variable b are different ones but they contain the same value, so they can share the same register. In proposed solution, they can be detected because definitions have been forwarded. So, during the analysis of an edge, if two variables are both alive and definition vertex is the same, they can be considered alias and then there is not conflict between them.

There is no conflict also when the defining vertices are in mutual exclusion. In fact, at run-time, if a variable is defined by a vertex, the other one will not be defined, so register could be shared between them.
Constants and read input variables doesn't need any register. pair of variables living togheter and without any of above properties are in conflict. So an edge between them must be added into the conflict graph.

Choose of the Best Algorithm

As shown in Figure \ ref{fig:flowchart}, there are four main register allocation algorithms. The best one is chosen on the basis of a precise analysis of the topology of the resulting conflict graph. In fact there are special cases in which efficient algorithms may be used.
Before describing precisely each algorithm, some definition can be useful:

Algorithms overview

Left Edge Algorithm

The lifetimes of all values are represented by intervals. The register allocation problem can be viewed as the problem of assigning the intervals to registers along a horizontal line, such that no two intervals in the same register overlap. These intervals can be seen as wires which have to be assigned to tracks (the registers), which will make this problem analogous to the channel routing problem without vertical constraints. A left edge algorithm can be used to solve this problem.

Left edge algorithm $G_s (V_s , W)$

$\\.sort\_values(v\in V_s,\omega (v)); \ \ \ M_s=\emptyset; \\ .\mathbf{foreach} \ v \in \ V_s \ \mathbf{do} \\ .\ \ \ \mathbf{foreach}\ r \in \ M_s\ \mathbf{do} \\ .\ \ \ \ \ \ \ \mathbf{if} \ \omega (v) \ > P(r) \ \mathbf{then} \\ .\ \ \ \ \ \ \ \ \ \ \ \psi(v)=r; \ \mathbf{then} \\ .\ \ \ \ \ \ \ \ \ \ \ P(r)=P(v);\\ .\ \ \ \ \ \ \ \mathbf{endif} \\ .\ \ \ \mathbf{endfor} \\ .\ \ \ \mathbf{if}\ \psi(v)=0 \ \mathbf{then} \\ .\ \ \ \ \ \ \ M_s = M_s \cup \{ r \} ; \ \ \textrm{ / / add new register } \\ .\ \ \ \ \ \ \ \psi(v)=r;\\ .\ \ \ \ \ \ \ P(r)=P(v);\\ .\ \ \ \mathbf{endif} \\ .\mathbf{endfor} $
Since the list of registers is pre-sorted the check if a new value overlaps with the values in the registers can simply be done by comparing the write time of the value with the last cycle step in which the register is occupied. This cycle step is called the last read step of a register: P(r).
Sorting is the most complex step in this algorithm and the left-edge algorithm can be performed with complexity $O(nlogn)$ where n is the number of values to be stored. Method registerAlg::left_edge() implements previously described algorithm.

Chordal Graph Coloring Algorithm

The left edge algorithm will only guarantee optimality if the life times of all variables can be ordered in a linear fashion. If the data flow graph contains conditionals this may not be the case. When variables are alive across the beginning or ends of conditionals these situations can arise. The lifetimes cannot be represented along a single line but form a tree-like structure. The confict register allocation graph is not an interval graph anymore. It can be proved that a graph G is an interval graph, if and only if it does not contain a subgraph which is a so called asteroidal triple. See [1] pag. 16-17.
Despite the fact that the asteroidal triples cause the conflict graph to be non-interval graph, they are still triangular (chordal) graphs. Algorithms exist to efficiently color these graphs. Similar to a left-edge algorithm an ordering in which to color the vertices has to be build first. A lexicographic breath first search provides such an ordering. The algorithm returns a sequence of vertices $ \sigma $ which can be used in coloring the graph. Vertices will be picked in order of their maximal labels. Each time a vertex is picked the labels of all its neighbor vertices are updated. The sequence are compared lexicographically. The node sequence returned in $\sigma$ will be a so-called perfect vertex elimination scheme.

$\\Lexicographic\ BFS\ G(V,E) \\ \\ .\mathbf{foreach}\ v \in V\ \mathbf{do} \\ .\ \ \ \ label(v)=\emptyset ;\\ .\mathbf{endfor} \\ .\mathbf{for} \ i=|V|\ to\ 1\ \mathbf{do} \\ .\ \ \ \ \mathbf{let}\ v_x \in V\ \textrm{such\ that\ label}(v_x)\ \textrm{is\ maximal};\\ .\ \ \ \ \sigma(i)=v_x;\\ .\ \ \ \ \mathbf{foreach}\ v\in \{Adj(v_x)|\sigma^{-1}(v)=0\}\\ .\ \ \ \ \ \ \ label(v)=label(v)+i;\\ .\ \ \ \ \mathbf{endforeach}\\ .\mathbf{endfor}$


Once such an ordering is obtained a simple modification to the left edge algorithm can be made. All values have to added to the registers in the order prescribed by $\sigma$. The test if a value can be added to a register becomes more complicated than a simple left edge algorithm as one cannot simply test for overlap between the read cycle step of the last value with the new value to be assigned. All values previously assigned to the register have to be checked for overlap. The following algorithm produces the minimal number of registers in case of conflict register allocation graph is a triangular graph. The complexity is O(V+W).
$\\ Chordal\ graph\ coloring\ G(V_s,W)\\ \\ .M_s = \emptyset\\ .\mathbf{for}\ k=|V_s|\ to\ 1\ \mathbf{do}\\ .\ \ \ \ v=\sigma(k);\\ .\ \ \ \ \mathbf{foreach}\ r \in M_s\ \mathbf{do}\\ .\ \ \ \ \ \ \ \ \mathbf{if} \not \exists v_s \in r\ \textrm{such that } lt(v_s)||lt(v)=true\ \mathbf{then} \\ .\ \ \ \ \ \ \ \ \ \ \ \ \psi(v)=r;\\ .\ \ \ \ \ \ \ \ \mathbf{endif}\\ .\ \ \ \ \mathbf{endfor}\\ .\ \ \ \ \mathbf{if}\ \psi(v)=0\ \mathbf{then}\\ .\ \ \ \ \ \ \ \ M_s=M_s \cup \{r\};\ \ \textrm{ / / new register}\\ .\ \ \ \ \ \ \ \ \psi(v)=r;\\ .\ \ \ \ \mathbf{endif}\\ .\mathbf{endfor} $


Method registerAlg::chordalColoring() implements the previus chordal graph coloring preceded by the lexicographic breath first search to get the $\sigma$ node order.

Cyclic Register Allocation Algorithm

Cyclic Register Allocation Algorithm is the best algorithm to apply when the conflict graph is a cyclic graph, but only when there is almost one cycle of length strictly greater than three. This is due to the fact that if the conflict graph is a cyclic graph but every cycle of length strictly greater than three possesses a chord, this means that the graph is a chordal graph and so one of the two previous algorithm are applied.

References

[1] L. Stok, "Data path synthesis", Integration, VLSI journal, Vol.18, pp.1-71, June 1994.

[2] Y.L. Lin, "Recent Developments in High-Level Synthesis", ACM Trans. Design Automation of Electronics Systems, vol. 2, pp. 2-21, January 1997.

[3] A. W. Appel, "Modern Compiler Implementation in Java", Cambridge University Press, pp. 1-548, 1998.

[4] R. Shamir, "Advanced Topics in Graph Algorithms", Notes of Course "\e Advanced \e Topics \e in \e Graphs \e Algorithms", Tel Aviv University, Spring 1994.


Generated on Mon Feb 12 2024 13:03:42 for PandA-2024.02 by doxygen 1.8.13