Backport commit from upstream LLVM:

r219009
[ISel] Keep matching state consistent when folding during X86 address match

from Brad (maintainer)
This commit is contained in:
ajacoutot 2015-08-24 07:45:56 +00:00
parent 086b280429
commit 4413e0aa9e
4 changed files with 228 additions and 2 deletions

View File

@ -1,4 +1,4 @@
# $OpenBSD: Makefile,v 1.107 2015/06/15 06:20:48 ajacoutot Exp $
# $OpenBSD: Makefile,v 1.108 2015/08/24 07:45:56 ajacoutot Exp $
# XXX: Remember to bump MODCLANG_VERSION in lang/clang/clang.port.mk when
# updating this port.
@ -10,7 +10,7 @@ COMMENT = modular, fast C/C++/ObjC compiler, static analyzer and tools
LLVM_V = 3.5
DISTNAME = llvm-${LLVM_V}.20140228
REVISION = 34
REVISION = 35
CATEGORIES = devel
MASTER_SITES = http://comstyle.com/source/
EXTRACT_SUFX = .tar.xz

View File

@ -0,0 +1,59 @@
$OpenBSD: patch-include_llvm_CodeGen_SelectionDAGISel_h,v 1.1 2015/08/24 07:45:56 ajacoutot Exp $
r219009
[ISel] Keep matching state consistent when folding during X86 address match
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
--- include/llvm/CodeGen/SelectionDAGISel.h.orig Tue Aug 4 22:44:44 2015
+++ include/llvm/CodeGen/SelectionDAGISel.h Tue Aug 4 22:46:22 2015
@@ -238,6 +238,12 @@ class SelectionDAGISel : public MachineFunctionPass {
const unsigned char *MatcherTable,
unsigned TableSize);
+ /// \brief Return true if complex patterns for this target can mutate the
+ /// DAG.
+ virtual bool ComplexPatternFuncMutatesDAG() const {
+ return false;
+ }
+
private:
// Calls to these functions are generated by tblgen.

View File

@ -0,0 +1,107 @@
$OpenBSD: patch-lib_CodeGen_SelectionDAG_SelectionDAGISel_cpp,v 1.1 2015/08/24 07:45:56 ajacoutot Exp $
r219009
[ISel] Keep matching state consistent when folding during X86 address match
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
--- lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp.orig Tue Aug 4 22:47:10 2015
+++ lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp Tue Aug 4 22:52:46 2015
@@ -2363,6 +2363,45 @@ struct MatchScope {
bool HasChainNodesMatched, HasGlueResultNodesMatched;
};
+/// \\brief A DAG update listener to keep the matching state
+/// (i.e. RecordedNodes and MatchScope) uptodate if the target is allowed to
+/// change the DAG while matching. X86 addressing mode matcher is an example
+/// for this.
+class MatchStateUpdater : public SelectionDAG::DAGUpdateListener
+{
+ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RecordedNodes;
+ SmallVectorImpl<MatchScope> &MatchScopes;
+public:
+ MatchStateUpdater(SelectionDAG &DAG,
+ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RN,
+ SmallVectorImpl<MatchScope> &MS) :
+ SelectionDAG::DAGUpdateListener(DAG),
+ RecordedNodes(RN), MatchScopes(MS) { }
+
+ void NodeDeleted(SDNode *N, SDNode *E) {
+ // Some early-returns here to avoid the search if we deleted the node or
+ // if the update comes from MorphNodeTo (MorphNodeTo is the last thing we
+ // do, so it's unnecessary to update matching state at that point).
+ // Neither of these can occur currently because we only install this
+ // update listener during matching a complex patterns.
+ if (!E || E->isMachineOpcode())
+ return;
+ // Performing linear search here does not matter because we almost never
+ // run this code. You'd have to have a CSE during complex pattern
+ // matching.
+ for (SmallVectorImpl<std::pair<SDValue, SDNode*> >::iterator I =
+ RecordedNodes.begin(), IE = RecordedNodes.end(); I != IE; ++I)
+ if (I->first.getNode() == N)
+ I->first.setNode(E);
+
+ for (SmallVectorImpl<MatchScope>::iterator I = MatchScopes.begin(),
+ IE = MatchScopes.end(); I != IE; ++I)
+ for (SmallVector<SDValue, 4>::iterator J = I->NodeStack.begin(),
+ JE = I->NodeStack.end(); J != JE; ++J)
+ if (J->getNode() == N)
+ J->setNode(E);
+ }
+};
}
SDNode *SelectionDAGISel::
@@ -2617,6 +2656,14 @@ SelectCodeCommon(SDNode *NodeToMatch, const unsigned c
unsigned CPNum = MatcherTable[MatcherIndex++];
unsigned RecNo = MatcherTable[MatcherIndex++];
assert(RecNo < RecordedNodes.size() && "Invalid CheckComplexPat");
+
+ // If target can modify DAG during matching, keep the matching state
+ // consistent.
+ OwningPtr<MatchStateUpdater> MSU;
+ if (ComplexPatternFuncMutatesDAG())
+ MSU.reset(new MatchStateUpdater(*CurDAG, RecordedNodes,
+ MatchScopes));
+
if (!CheckComplexPattern(NodeToMatch, RecordedNodes[RecNo].second,
RecordedNodes[RecNo].first, CPNum,
RecordedNodes))

View File

@ -0,0 +1,60 @@
$OpenBSD: patch-lib_Target_X86_X86ISelDAGToDAG_cpp,v 1.1 2015/08/24 07:45:56 ajacoutot Exp $
r219009
[ISel] Keep matching state consistent when folding during X86 address match
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
--- lib/Target/X86/X86ISelDAGToDAG.cpp.orig Tue Aug 4 22:53:05 2015
+++ lib/Target/X86/X86ISelDAGToDAG.cpp Tue Aug 4 22:53:59 2015
@@ -290,6 +290,13 @@ namespace {
const X86InstrInfo *getInstrInfo() const {
return getTargetMachine().getInstrInfo();
}
+
+ /// \brief Address-mode matching performs shift-of-and to and-of-shift
+ /// reassociation in order to expose more scaled addressing
+ /// opportunities.
+ bool ComplexPatternFuncMutatesDAG() const {
+ return true;
+ }
};
}