PandA-2024.02
|
Decompose some complex gimple statements into a set of simpler operations. More...
#include "IR_lowering.hpp"
#include "config_HAVE_ASSERTS.hpp"
#include "Parameter.hpp"
#include "application_manager.hpp"
#include "custom_map.hpp"
#include "dbgPrintHelper.hpp"
#include "design_flow_graph.hpp"
#include "design_flow_manager.hpp"
#include "design_flow_step_factory.hpp"
#include "exceptions.hpp"
#include "graph.hpp"
#include "hash_helper.hpp"
#include "hls_device.hpp"
#include "hls_manager.hpp"
#include "math_function.hpp"
#include "string_manipulation.hpp"
#include "technology_flow_step.hpp"
#include "technology_flow_step_factory.hpp"
#include "technology_manager.hpp"
#include "technology_node.hpp"
#include "time_info.hpp"
#include "tree_basic_block.hpp"
#include "tree_common.hpp"
#include "tree_helper.hpp"
#include "tree_manager.hpp"
#include "tree_manipulation.hpp"
#include "tree_node.hpp"
#include "tree_reindex.hpp"
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>
Go to the source code of this file.
Data Structures | |
struct | mult_cost |
This structure holds the "cost" of a multiply sequence. More... | |
struct | algorithm |
This structure records a sequence of operations. More... | |
Macros | |
#define | MULT_COST_LESS(X, Y) ((X).cost < (Y) || ((X).cost == (Y) && (X).latency < (Y))) |
This macro is used to compare a pointer to a mult_cost against an single integer "cost" value. More... | |
#define | CHEAPER_MULT_COST(X, Y) ((X).cost < (Y).cost || ((X).cost == (Y).cost && (X).latency < (Y).latency)) |
This macro is used to compare two mult_costs against each other. More... | |
Enumerations | |
enum | alg_code { alg_unknown, alg_zero, alg_m, alg_shift, alg_add_t_m2, alg_sub_t_m2, alg_add_factor, alg_sub_factor, alg_add_t2_m, alg_sub_t2_m, alg_impossible } |
the code for lowering of div, mult and rem comes from GCC sources (expmed.c) More... | |
Functions | |
static void | synth_mult (struct algorithm &alg_out, unsigned long long t, const struct mult_cost &cost_limit, unsigned long long data_bitsize, tree_managerRef &TM) |
Compute and return the best algorithm for multiplying by T. More... | |
static bool | choose_mult_variant (unsigned long long data_bitsize, long long int val, struct algorithm &alg, enum mult_variant &variant, short int Mult_cost, tree_managerRef &TM) |
Find the cheapest way of multiplying a value of mode MODE by VAL. More... | |
Variables | |
static CustomUnorderedMap< std::pair< unsigned int, unsigned long long >, std::pair< enum alg_code, struct mult_cost > > | alg_hash |
Decompose some complex gimple statements into a set of simpler operations.
Matteo M. Fusi and Matteo Locatelli modifies the division by an integer constant integrating the work done in this paper: Florent de Dinechin, “Multiplication by Rational Constants” - IEEE Transactions on Circuit Systems-II: Express Briefs, Vol. 59, NO 2, february 2012
Definition in file IR_lowering.cpp.
#define CHEAPER_MULT_COST | ( | X, | |
Y | |||
) | ((X).cost < (Y).cost || ((X).cost == (Y).cost && (X).latency < (Y).latency)) |
This macro is used to compare two mult_costs against each other.
The macro returns true if X is cheaper than Y. Currently, the cheaper of two mult_costs is the one with the lower "cost". If "cost"s are tied, the lower latency is cheaper.
Definition at line 236 of file IR_lowering.cpp.
Referenced by choose_mult_variant(), and synth_mult().
#define MULT_COST_LESS | ( | X, | |
Y | |||
) | ((X).cost < (Y) || ((X).cost == (Y) && (X).latency < (Y))) |
This macro is used to compare a pointer to a mult_cost against an single integer "cost" value.
This is equivalent to the macro CHEAPER_MULT_COST(X,Z) where Z = {Y,Y}.
Definition at line 230 of file IR_lowering.cpp.
Referenced by choose_mult_variant().
enum alg_code |
the code for lowering of div, mult and rem comes from GCC sources (expmed.c)
Enumerator | |
---|---|
alg_unknown | |
alg_zero | |
alg_m | |
alg_shift | |
alg_add_t_m2 | |
alg_sub_t_m2 | |
alg_add_factor | |
alg_sub_factor | |
alg_add_t2_m | |
alg_sub_t2_m | |
alg_impossible |
Definition at line 165 of file IR_lowering.cpp.
|
static |
Find the cheapest way of multiplying a value of mode MODE by VAL.
Try three variations:
Return true if the cheapest of these cost less than MULT_COST, describing the algorithm in *ALG and final fixup in *VARIANT.
Fail quickly for impossible bounds.
Ensure that Mult_cost provides a reasonable upper bound. Any constant multiplication can be performed with less than 2 * bits additions.
Definition at line 706 of file IR_lowering.cpp.
References add_variant, basic_variant, CHEAPER_MULT_COST, mult_cost::cost, algorithm::cost, mult_cost::latency, MULT_COST_LESS, negate_variant, and synth_mult().
Referenced by IR_lowering::expand_MC().
|
static |
Compute and return the best algorithm for multiplying by T.
The algorithm must cost less than cost_limit If retval.cost >= COST_LIMIT, no algorithm was found and all other field of the returned struct are undefined. MODE is the machine mode of the multiplication.
Indicate that no algorithm is yet found. If no algorithm is found, this value will be returned and indicate failure.
t == 1 can be done in zero cost.
t == 0 sometimes has a cost. If it does and it exceeds our limit, fail now.
The cache tells us that it's impossible to synthesize multiplication by T within alg_hash[hash_index].cost.
COST_LIMIT is at least as restrictive as the one recorded in the hash table, in which case we have no hope of synthesizing a multiplication. Just return.
If we get here, COST_LIMIT is less restrictive than the one recorded in the hash table, so we may be able to synthesize a multiplication. Proceed as if we didn't have the cache entry.
The cached algorithm shows that this multiplication requires more cost than COST_LIMIT. Just return. This way, we don't clobber this cache entry with alg_impossible but retain useful information.
If we have a group of zero bits at the low-order part of T, try multiplying by the remaining bits and then doing a shift.
See if treating ORIG_T as a signed number yields a better sequence. Try this sequence only for a negative ORIG_T as it would be useless for a non-negative ORIG_T.
Shift ORIG_T as follows because a right shift of a
negative-valued signed type is implementation defined.
The function expand_shift will choose between a shift
and a sequence of additions, so the observed cost is given as MIN (m * add_cost[speed][mode], shift_cost[speed][mode][m]).
If we have an odd number, add or subtract one.
If T was -1, then W will be zero after the loop. This is another
case where T ends with ...111. Handling this with (T + 1) and subtract 1 produces slightly better code and results in algorithm selection much faster than treating it like the ...0111 case below.
Reject the case where t is 3. Thus we prefer addition in that case.
T ends with ...111. Multiply by (T + 1) and subtract 1.
T ends with ...01 or ...011. Multiply by (T - 1) and add 1.
We may be able to calculate a * -7, a * -15, a * -31, etc
quickly with a - a * n for some appropriate constant n.
Look for factors of t of the form t = q(2**m +- 1), 2 <= m <= floor(log2(t - 1)). If we find such a factor, we can multiply by t using an algorithm that multiplies by q, shift the result by m and add/subtract it to itself.
We search for large factors first and loop down, even if large factors are less probable than small; if we find a large factor we will find a good sequence quickly, and therefore be able to prune (by decreasing COST_LIMIT) the search.
If the target has a cheap shift-and-add instruction use that in preference to a shift insn followed by an add insn. Assume that the shift-and-add is "atomic" with a latency equal to its cost, otherwise assume that on superscalar hardware the shift may be executed concurrently with the earlier steps in the algorithm.
Other factors will have been taken care of in the recursion.
If the target has a cheap shift-and-subtract insn use that in preference to a shift insn followed by a sub insn. Assume that the shift-and-sub is "atomic" with a latency equal to it's cost, otherwise assume that on superscalar hardware the shift may be executed concurrently with the earlier steps in the algorithm.
Try shift-and-add (load effective address) instructions, i.e. do a*3, a*5, a*9.
If best_cost has not decreased, we have not found any algorithm.
We failed to find an algorithm. Record alg_impossible for
this case (that is, <T, MODE, COST_LIMIT>) so that next time we are asked to find an algorithm for T within the same or lower COST_LIMIT, we can immediately return to the caller.
Cache the result.
If we are getting a too long sequence for `struct algorithm' to record, make this search fail.
Copy the algorithm from temporary space to the space at alg_out. We avoid using structure assignment because the majority of best_alg is normally undefined, and this is a critical function.
Definition at line 246 of file IR_lowering.cpp.
References alg_add_factor, alg_add_t2_m, alg_add_t_m2, alg_hash, alg_impossible, alg_m, alg_shift, alg_sub_factor, alg_sub_t2_m, alg_sub_t_m2, alg_unknown, alg_zero, CHEAPER_MULT_COST, mult_cost::cost, algorithm::cost, exact_log2(), floor_log2(), mult_cost::latency, algorithm::log, max, min, algorithm::op, algorithm::ops, and THROW_ERROR.
Referenced by choose_mult_variant().
|
static |
Definition at line 239 of file IR_lowering.cpp.
Referenced by synth_mult().