Decompose some complex gimple statements into a set of simpler operations. More...

#include "IR_lowering.hpp"
#include "config_HAVE_ASSERTS.hpp"
#include "Parameter.hpp"
#include "application_manager.hpp"
#include "custom_map.hpp"
#include "dbgPrintHelper.hpp"
#include "design_flow_graph.hpp"
#include "design_flow_manager.hpp"
#include "design_flow_step_factory.hpp"
#include "exceptions.hpp"
#include "graph.hpp"
#include "hash_helper.hpp"
#include "hls_device.hpp"
#include "hls_manager.hpp"
#include "math_function.hpp"
#include "string_manipulation.hpp"
#include "technology_flow_step.hpp"
#include "technology_flow_step_factory.hpp"
#include "technology_manager.hpp"
#include "technology_node.hpp"
#include "time_info.hpp"
#include "tree_basic_block.hpp"
#include "tree_common.hpp"
#include "tree_helper.hpp"
#include "tree_manager.hpp"
#include "tree_manipulation.hpp"
#include "tree_node.hpp"
#include "tree_reindex.hpp"
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

Include dependency graph for IR_lowering.cpp:

Go to the source code of this file.

Data Structures
struct	mult_cost
	This structure holds the "cost" of a multiply sequence. More...

struct	algorithm
	This structure records a sequence of operations. More...

Macros
#define	MULT_COST_LESS(X, Y) ((X).cost < (Y) \|\| ((X).cost == (Y) && (X).latency < (Y)))
	This macro is used to compare a pointer to a mult_cost against an single integer "cost" value. More...

#define	CHEAPER_MULT_COST(X, Y) ((X).cost < (Y).cost \|\| ((X).cost == (Y).cost && (X).latency < (Y).latency))
	This macro is used to compare two mult_costs against each other. More...

Enumerations
enum	alg_code { alg_unknown, alg_zero, alg_m, alg_shift, alg_add_t_m2, alg_sub_t_m2, alg_add_factor, alg_sub_factor, alg_add_t2_m, alg_sub_t2_m, alg_impossible }
	the code for lowering of div, mult and rem comes from GCC sources (expmed.c) More...

Functions
static void	synth_mult (struct algorithm &alg_out, unsigned long long t, const struct mult_cost &cost_limit, unsigned long long data_bitsize, tree_managerRef &TM)
	Compute and return the best algorithm for multiplying by T. More...

static bool	choose_mult_variant (unsigned long long data_bitsize, long long int val, struct algorithm &alg, enum mult_variant &variant, short int Mult_cost, tree_managerRef &TM)
	Find the cheapest way of multiplying a value of mode MODE by VAL. More...

Variables
static CustomUnorderedMap< std::pair< unsigned int, unsigned long long >, std::pair< enum alg_code, struct mult_cost > >	alg_hash

Detailed Description

Decompose some complex gimple statements into a set of simpler operations.

Matteo M. Fusi and Matteo Locatelli modifies the division by an integer constant integrating the work done in this paper: Florent de Dinechin, “Multiplication by Rational Constants” - IEEE Transactions on Circuit Systems-II: Express Briefs, Vol. 59, NO 2, february 2012

Author: Fabrizio Ferrandi fabri.nosp@m.zio..nosp@m.ferra.nosp@m.ndi@.nosp@m.polim.nosp@m.i.it $Revision$ $Date$ Last modified by $Author$

Definition in file IR_lowering.cpp.

Macro Definition Documentation

◆ CHEAPER_MULT_COST

#define CHEAPER_MULT_COST	(	X,
		Y
	)	((X).cost < (Y).cost \|\| ((X).cost == (Y).cost && (X).latency < (Y).latency))

This macro is used to compare two mult_costs against each other.

The macro returns true if X is cheaper than Y. Currently, the cheaper of two mult_costs is the one with the lower "cost". If "cost"s are tied, the lower latency is cheaper.

Definition at line 236 of file IR_lowering.cpp.

Referenced by choose_mult_variant(), and synth_mult().

◆ MULT_COST_LESS

#define MULT_COST_LESS	(	X,
		Y
	)	((X).cost < (Y) \|\| ((X).cost == (Y) && (X).latency < (Y)))

This macro is used to compare a pointer to a mult_cost against an single integer "cost" value.

This is equivalent to the macro CHEAPER_MULT_COST(X,Z) where Z = {Y,Y}.

Definition at line 230 of file IR_lowering.cpp.

Referenced by choose_mult_variant().

Enumeration Type Documentation

◆ alg_code

enum alg_code

the code for lowering of div, mult and rem comes from GCC sources (expmed.c)

Enumerator
alg_unknown
alg_zero
alg_m
alg_shift
alg_add_t_m2
alg_sub_t_m2
alg_add_factor
alg_sub_factor
alg_add_t2_m
alg_sub_t2_m
alg_impossible

Definition at line 165 of file IR_lowering.cpp.

Function Documentation

◆ choose_mult_variant()

static bool choose_mult_variant	(	unsigned long long	data_bitsize,
		long long int	val,
		struct algorithm &	alg,
		enum mult_variant &	variant,
		short int	Mult_cost,
		tree_managerRef &	TM
	)

static

Find the cheapest way of multiplying a value of mode MODE by VAL.

Try three variations:

a shift/add sequence based on VAL itself
a shift/add sequence based on -VAL, followed by a negation
a shift/add sequence based on VAL - 1, followed by an addition.

Return true if the cheapest of these cost less than MULT_COST, describing the algorithm in *ALG and final fixup in *VARIANT.

Fail quickly for impossible bounds.

Ensure that Mult_cost provides a reasonable upper bound. Any constant multiplication can be performed with less than 2 * bits additions.

Definition at line 706 of file IR_lowering.cpp.

References add_variant, basic_variant, CHEAPER_MULT_COST, mult_cost::cost, algorithm::cost, mult_cost::latency, MULT_COST_LESS, negate_variant, and synth_mult().

Referenced by IR_lowering::expand_MC().

Here is the call graph for this function:

Here is the caller graph for this function:

◆ synth_mult()

static void synth_mult	(	struct algorithm &	alg_out,
		unsigned long long	t,
		const struct mult_cost &	cost_limit,
		unsigned long long	data_bitsize,
		tree_managerRef &	TM
	)

static

Compute and return the best algorithm for multiplying by T.

The algorithm must cost less than cost_limit If retval.cost >= COST_LIMIT, no algorithm was found and all other field of the returned struct are undefined. MODE is the machine mode of the multiplication.

Indicate that no algorithm is yet found. If no algorithm is found, this value will be returned and indicate failure.

t == 1 can be done in zero cost.

t == 0 sometimes has a cost. If it does and it exceeds our limit, fail now.

The cache tells us that it's impossible to synthesize multiplication by T within alg_hash[hash_index].cost.

COST_LIMIT is at least as restrictive as the one recorded in the hash table, in which case we have no hope of synthesizing a multiplication. Just return.

If we get here, COST_LIMIT is less restrictive than the one recorded in the hash table, so we may be able to synthesize a multiplication. Proceed as if we didn't have the cache entry.

The cached algorithm shows that this multiplication requires more cost than COST_LIMIT. Just return. This way, we don't clobber this cache entry with alg_impossible but retain useful information.

If we have a group of zero bits at the low-order part of T, try multiplying by the remaining bits and then doing a shift.

See if treating ORIG_T as a signed number yields a better sequence. Try this sequence only for a negative ORIG_T as it would be useless for a non-negative ORIG_T.

  Shift ORIG_T as follows because a right shift of a

negative-valued signed type is implementation defined.

  The function expand_shift will choose between a shift

and a sequence of additions, so the observed cost is given as MIN (m * add_cost[speed][mode], shift_cost[speed][mode][m]).

If we have an odd number, add or subtract one.

If T was -1, then W will be zero after the loop.  This is another

case where T ends with ...111. Handling this with (T + 1) and subtract 1 produces slightly better code and results in algorithm selection much faster than treating it like the ...0111 case below.

Reject the case where t is 3. Thus we prefer addition in that case.

T ends with ...111. Multiply by (T + 1) and subtract 1.

T ends with ...01 or ...011. Multiply by (T - 1) and add 1.

We may be able to calculate a * -7, a * -15, a * -31, etc

quickly with a - a * n for some appropriate constant n.

Look for factors of t of the form t = q(2**m +- 1), 2 <= m <= floor(log2(t - 1)). If we find such a factor, we can multiply by t using an algorithm that multiplies by q, shift the result by m and add/subtract it to itself.

We search for large factors first and loop down, even if large factors are less probable than small; if we find a large factor we will find a good sequence quickly, and therefore be able to prune (by decreasing COST_LIMIT) the search.

If the target has a cheap shift-and-add instruction use that in preference to a shift insn followed by an add insn. Assume that the shift-and-add is "atomic" with a latency equal to its cost, otherwise assume that on superscalar hardware the shift may be executed concurrently with the earlier steps in the algorithm.

Other factors will have been taken care of in the recursion.

If the target has a cheap shift-and-subtract insn use that in preference to a shift insn followed by a sub insn. Assume that the shift-and-sub is "atomic" with a latency equal to it's cost, otherwise assume that on superscalar hardware the shift may be executed concurrently with the earlier steps in the algorithm.

Try shift-and-add (load effective address) instructions, i.e. do a*3, a*5, a*9.

If best_cost has not decreased, we have not found any algorithm.

We failed to find an algorithm.  Record alg_impossible for

this case (that is, <T, MODE, COST_LIMIT>) so that next time we are asked to find an algorithm for T within the same or lower COST_LIMIT, we can immediately return to the caller.

Cache the result.

If we are getting a too long sequence for `struct algorithm' to record, make this search fail.

Copy the algorithm from temporary space to the space at alg_out. We avoid using structure assignment because the majority of best_alg is normally undefined, and this is a critical function.

Definition at line 246 of file IR_lowering.cpp.

References alg_add_factor, alg_add_t2_m, alg_add_t_m2, alg_hash, alg_impossible, alg_m, alg_shift, alg_sub_factor, alg_sub_t2_m, alg_sub_t_m2, alg_unknown, alg_zero, CHEAPER_MULT_COST, mult_cost::cost, algorithm::cost, exact_log2(), floor_log2(), mult_cost::latency, algorithm::log, max, min, algorithm::op, algorithm::ops, and THROW_ERROR.