Extend the unknowns-are-same-as-known-inputs type resolution heuristic.

For a very long time, one of the parser's heuristics for resolving ambiguous operator calls has been to assume that unknown-type literals are of the same type as the other input (if it's known). However, this was only used in the first step of quickly checking for an exact-types match, and thus did not help in resolving matches that require coercion, such as matches to polymorphic operators. As we add more polymorphic operators, this becomes more of a problem. This patch adds another use of the same heuristic as a last-ditch check before failing to resolve an ambiguous operator or function call. In particular this will let us define the range inclusion operator in a less limited way (to come in a follow-on patch).
2011-11-17 18:28:41 -05:00 · 2011-11-17 18:28:41 -05:00 · 1a8b9fb549
commit 1a8b9fb549
parent bf4f96b5e2
2 changed files with 147 additions and 26 deletions
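The commit message says that matches requiring coercion were previously out of reach of this heuristic. A small standalone model makes that concrete. The C sketch below is hypothetical and is not parser code. ToyType, ToyCandidate, can_coerce_implicitly() and select_by_known_type() are illustrative stand-ins for the type catalogs, can_coerce_type() and func_select_candidate(). The only non-identity cast it assumes is int4 to int8. It models only the new last-ditch rule, not the earlier resolution steps.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the "unknowns are the same as the known inputs"
 * rule.  The types and the cast table are illustrative, not PostgreSQL's.
 */
typedef enum
{
	T_UNKNOWN,
	T_INT4,
	T_INT8,
	T_TEXT
} ToyType;

#define TOY_MAX_ARGS 4

typedef struct
{
	const char *name;
	int			nargs;
	ToyType		args[TOY_MAX_ARGS];
} ToyCandidate;

/* Assumed implicit casts: identity, plus int4 -> int8. */
static bool
can_coerce_implicitly(ToyType from, ToyType to)
{
	return from == to || (from == T_INT4 && to == T_INT8);
}

/*
 * Return the type shared by all known inputs, or T_UNKNOWN if there is no
 * known input or the known inputs disagree.
 */
static ToyType
common_known_type(int nargs, const ToyType *inputs)
{
	ToyType		known = T_UNKNOWN;

	for (int i = 0; i < nargs; i++)
	{
		if (inputs[i] == T_UNKNOWN)
			continue;
		if (known == T_UNKNOWN)
			known = inputs[i];	/* first known input */
		else if (known != inputs[i])
			return T_UNKNOWN;	/* not all known inputs match */
	}
	return known;
}

/*
 * Last-ditch check: if there are both unknown and known inputs and all the
 * known inputs share one type, treat every input as that type and return
 * the index of the only candidate that accepts it.  Return -1 if the rule
 * does not apply or if zero or several candidates pass.
 */
static int
select_by_known_type(int nargs, const ToyType *inputs,
					 int ncands, const ToyCandidate *cands)
{
	int			nunknowns = 0;
	int			match = -1;
	ToyType		known;

	assert(nargs <= TOY_MAX_ARGS);
	for (int i = 0; i < nargs; i++)
		if (inputs[i] == T_UNKNOWN)
			nunknowns++;
	known = common_known_type(nargs, inputs);
	if (nunknowns == 0 || known == T_UNKNOWN)
		return -1;				/* heuristic does not apply */

	for (int c = 0; c < ncands; c++)
	{
		bool		ok = (cands[c].nargs == nargs);

		for (int i = 0; ok && i < nargs; i++)
			ok = can_coerce_implicitly(known, cands[c].args[i]);
		if (!ok)
			continue;
		if (match >= 0)
			return -1;			/* not unique, give up */
		match = c;
	}
	return match;
}
```

Take the candidates f(int8, int8) and f(text, text) with inputs (int4, unknown). There is no exact f(int4, int4), so a check that looks only for exact-types matches finds nothing. Assuming the unknown input is int4 leaves f(int8, int8) as the only candidate reachable by implicit casts, so select_by_known_type() returns its index, 0.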
--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@@ -304,13 +304,18 @@ without more clues.  Now discard
 candidates that do not accept the selected type category.  Furthermore,
 if any candidate accepts a preferred type in that category,
 discard candidates that accept non-preferred types for that argument.
+Keep all candidates if none survive these tests.
+If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one candidate remains, use it.  If no candidate or more than one
-candidate remains,
-then fail.
+If there are both <type>unknown</type> and known-type arguments, and all
+the known-type arguments have the same type, assume that the
+<type>unknown</type> arguments are also of that type, and check which
+candidates can accept that type at the <type>unknown</type>-argument
+positions.  If exactly one candidate passes this test, use it.
+Otherwise, fail.
 </para>
 </step>
 </substeps>
@@ -376,7 +381,7 @@ be interpreted as type <type>text</type>.
 </para>
 <para>
-Here is a concatenation on unspecified types:
+Here is a concatenation of two values of unspecified types:
 <screen>
 SELECT 'abc' || 'def' AS "unspecified";
@@ -394,7 +399,7 @@ and finds that there are candidates accepting both string-category and
 bit-string-category inputs.  Since string category is preferred when available,
 that category is selected, and then the
 preferred type for strings, <type>text</type>, is used as the specific
-type to resolve the unknown literals as.
+type to resolve the unknown-type literals as.
 </para>
 </example>
@@ -450,6 +455,36 @@ SELECT ~ CAST('20' AS int8) AS "negation";
 </para>
 </example>
+<example>
+<title>Array Inclusion Operator Type Resolution</title>
+<para>
+Here is another example of resolving an operator with one known and one
+unknown input:
+<screen>
+SELECT array[1,2] &lt;@ '{1,2,3}' as "is subset";
+ is subset
+-----------
+ t
+(1 row)
+</screen>
+The <productname>PostgreSQL</productname> operator catalog has several
+entries for the infix operator <literal>&lt;@</>, but the only two that
+could possibly accept an integer array on the left-hand side are
+array inclusion (<type>anyarray</> <literal>&lt;@</> <type>anyarray</>)
+and range inclusion (<type>anyelement</> <literal>&lt;@</> <type>anyrange</>).
+Since none of these polymorphic pseudo-types (see <xref
+linkend="datatype-pseudo">) are considered preferred, the parser cannot
+resolve the ambiguity on that basis.  However, the last resolution rule tells
+it to assume that the unknown-type literal is of the same type as the other
+input, that is, integer array.  Now only one of the two operators can match,
+so array inclusion is selected.  (Had range inclusion been selected, we would
+have gotten an error, because the string does not have the right format to be
+a range literal.)
+</para>
+</example>
 </sect1>
 <sect1 id="typeconv-func">
@@ -594,13 +629,18 @@ the correct choice cannot be deduced without more clues.
 Now discard candidates that do not accept the selected type category.
 Furthermore, if any candidate accepts a preferred type in that category,
 discard candidates that accept non-preferred types for that argument.
+Keep all candidates if none survive these tests.
+If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one candidate remains, use it.  If no candidate or more than one
-candidate remains,
-then fail.
+If there are both <type>unknown</type> and known-type arguments, and all
+the known-type arguments have the same type, assume that the
+<type>unknown</type> arguments are also of that type, and check which
+candidates can accept that type at the <type>unknown</type>-argument
+positions.  If exactly one candidate passes this test, use it.
+Otherwise, fail.
 </para>
 </step>
 </substeps>
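The Array Inclusion Operator Type Resolution example added to typeconv.sgml above can be replayed in a small standalone model. This C sketch is hypothetical and heavily simplified. The ToyType values, element_of(), candidate_accepts() and count_matches() are not PostgreSQL code. int4 is the only element type. Polymorphic matching is reduced to one requirement: every known input at a polymorphic position must imply the same element type.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy type system: int4 is the only element type, so there is
 * one array type and one range type.  The P_ values are polymorphic
 * pseudo-types and appear only as declared argument types.
 */
typedef enum
{
	T_UNKNOWN,
	T_INT4,
	T_INT4_ARRAY,
	T_INT4_RANGE,
	P_ANYELEMENT,
	P_ANYARRAY,
	P_ANYRANGE
} ToyType;

/* Element type of an array or range type; T_UNKNOWN for anything else. */
static ToyType
element_of(ToyType t)
{
	return (t == T_INT4_ARRAY || t == T_INT4_RANGE) ? T_INT4 : T_UNKNOWN;
}

/*
 * Simplified polymorphic matching: an unknown input fits any declared type
 * and binds nothing; every known input at a polymorphic position must imply
 * the same element type.
 */
static bool
candidate_accepts(int nargs, const ToyType *actual, const ToyType *declared)
{
	ToyType		elem = T_UNKNOWN;	/* element type bound so far */

	for (int i = 0; i < nargs; i++)
	{
		ToyType		this_elem = T_UNKNOWN;

		assert(actual[i] < P_ANYELEMENT);	/* inputs are never pseudo-types */
		if (actual[i] == T_UNKNOWN)
			continue;
		switch (declared[i])
		{
			case P_ANYELEMENT:
				this_elem = actual[i];
				break;
			case P_ANYARRAY:
				if (actual[i] != T_INT4_ARRAY)
					return false;
				this_elem = element_of(actual[i]);
				break;
			case P_ANYRANGE:
				if (actual[i] != T_INT4_RANGE)
					return false;
				this_elem = element_of(actual[i]);
				break;
			default:
				if (actual[i] != declared[i])
					return false;
				continue;		/* concrete types bind nothing */
		}
		if (elem == T_UNKNOWN)
			elem = this_elem;
		else if (elem != this_elem)
			return false;		/* inconsistent polymorphic binding */
	}
	return true;
}

/* The two <@ candidates from the example: (left, right) declared types. */
static const ToyType contained_by[2][2] = {
	{P_ANYARRAY, P_ANYARRAY},	/* array inclusion */
	{P_ANYELEMENT, P_ANYRANGE},	/* range inclusion */
};

/* Count the <@ candidates accepting (left, right); *which gets the last one. */
static int
count_matches(ToyType left, ToyType right, int *which)
{
	ToyType		actual[2] = {left, right};
	int			n = 0;

	for (int c = 0; c < 2; c++)
	{
		if (candidate_accepts(2, actual, contained_by[c]))
		{
			n++;
			*which = c;
		}
	}
	return n;
}
```

As written, int4[] <@ unknown is accepted by both candidates (count_matches() returns 2), because an unknown input fits anywhere. Once the unknown input is assumed to be int4[], range inclusion drops out because int4[] is not a range type, and only array inclusion (index 0) remains.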
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -618,14 +618,16 @@ func_select_candidate(int nargs,
 					  Oid *input_typeids,
 					  FuncCandidateList candidates)
 {
-	FuncCandidateList current_candidate;
-	FuncCandidateList last_candidate;
+	FuncCandidateList current_candidate,
+				first_candidate,
+				last_candidate;
 	Oid		   *current_typeids;
 	Oid			current_type;
 	int			i;
 	int			ncandidates;
 	int			nbestMatch,
-				nmatch;
+				nmatch,
+				nunknowns;
 	Oid			input_base_typeids[FUNC_MAX_ARGS];
 	TYPCATEGORY slot_category[FUNC_MAX_ARGS],
 				current_category;
@@ -651,9 +653,22 @@ func_select_candidate(int nargs,
 	 * take a domain as an input datatype.	Such a function will be selected
 	 * over the base-type function only if it is an exact match at all
 	 * argument positions, and so was already chosen by our caller.
+	 *
+	 * While we're at it, count the number of unknown-type arguments for use
+	 * later.
 	 */
+	nunknowns = 0;
 	for (i = 0; i < nargs; i++)
+	{
+		if (input_typeids[i] != UNKNOWNOID)
 			input_base_typeids[i] = getBaseType(input_typeids[i]);
+		else
+		{
+			/* no need to call getBaseType on UNKNOWNOID */
+			input_base_typeids[i] = UNKNOWNOID;
+			nunknowns++;
+		}
+	}
 	/*
 	 * Run through all candidates and keep those with the most matches on
@@ -749,14 +764,16 @@ func_select_candidate(int nargs,
 		return candidates;
 	/*
-	 * Still too many candidates? Try assigning types for the unknown columns.
+	 * Still too many candidates?  Try assigning types for the unknown inputs.
 	 *
-	 * NOTE: for a binary operator with one unknown and one non-unknown input,
-	 * we already tried the heuristic of looking for a candidate with the
-	 * known input type on both sides (see binary_oper_exact()). That's
-	 * essentially a special case of the general algorithm we try next.
-	 *
-	 * We do this by examining each unknown argument position to see if we can
+	 * If there are no unknown inputs, we have no more heuristics that apply,
+	 * and must fail.
+	 */
+	if (nunknowns == 0)
+		return NULL;			/* failed to select a best candidate */
+
+	/*
+	 * The next step examines each unknown argument position to see if we can
 	 * determine a "type category" for it.	If any candidate has an input
 	 * datatype of STRING category, use STRING category (this bias towards
 	 * STRING is appropriate since unknown-type literals look like strings).
@@ -770,9 +787,9 @@ func_select_candidate(int nargs,
 	 * Having completed this examination, remove candidates that accept the
 	 * wrong category at any unknown position.	Also, if at least one
 	 * candidate accepted a preferred type at a position, remove candidates
-	 * that accept non-preferred types.
-	 *
-	 * If we are down to one candidate at the end, we win.
+	 * that accept non-preferred types.  If just one candidate remains,
+	 * return that one.  However, if this rule turns out to reject all
+	 * candidates, keep them all instead.
 	 */
 	resolved_unknowns = false;
 	for (i = 0; i < nargs; i++)
@@ -835,6 +852,7 @@ func_select_candidate(int nargs,
 	{
 		/* Strip non-matching candidates */
 		ncandidates = 0;
+		first_candidate = candidates;
 		last_candidate = NULL;
 		for (current_candidate = candidates;
 			 current_candidate != NULL;
@@ -874,15 +892,78 @@ func_select_candidate(int nargs,
 				if (last_candidate)
 					last_candidate->next = current_candidate->next;
 				else
-					candidates = current_candidate->next;
+					first_candidate = current_candidate->next;
 			}
 		}
-		if (last_candidate)		/* terminate rebuilt list */
+
+		/* if we found any matches, restrict our attention to those */
+		if (last_candidate)
+		{
+			candidates = first_candidate;
+			/* terminate rebuilt list */
 			last_candidate->next = NULL;
+		}
 		if (ncandidates == 1)
 			return candidates;
 	}
+	/*
+	 * Last gasp: if there are both known- and unknown-type inputs, and all
+	 * the known types are the same, assume the unknown inputs are also that
+	 * type, and see if that gives us a unique match.  If so, use that match.
+	 *
+	 * NOTE: for a binary operator with one unknown and one non-unknown input,
+	 * we already tried this heuristic in binary_oper_exact().  However, that
+	 * code only finds exact matches, whereas here we will handle matches that
+	 * involve coercion, polymorphic type resolution, etc.
+	 */
+	if (nunknowns < nargs)
+	{
+		Oid			known_type = UNKNOWNOID;
+		for (i = 0; i < nargs; i++)
+		{
+			if (input_base_typeids[i] == UNKNOWNOID)
+				continue;
+			if (known_type == UNKNOWNOID)		/* first known arg? */
+				known_type = input_base_typeids[i];
+			else if (known_type != input_base_typeids[i])
+			{
+				/* oops, not all match */
+				known_type = UNKNOWNOID;
+				break;
+			}
+		}
+		if (known_type != UNKNOWNOID)
+		{
+			/* okay, just one known type, apply the heuristic */
+			for (i = 0; i < nargs; i++)
+				input_base_typeids[i] = known_type;
+			ncandidates = 0;
+			last_candidate = NULL;
+			for (current_candidate = candidates;
+				 current_candidate != NULL;
+				 current_candidate = current_candidate->next)
+			{
+				current_typeids = current_candidate->args;
+				if (can_coerce_type(nargs, input_base_typeids, current_typeids,
+									COERCION_IMPLICIT))
+				{
+					if (++ncandidates > 1)
+						break;	/* not unique, give up */
+					last_candidate = current_candidate;
+				}
+			}
+			if (ncandidates == 1)
+			{
+				/* successfully identified a unique match */
+				last_candidate->next = NULL;
+				return last_candidate;
+			}
+		}
+	}
 	return NULL;				/* failed to select a best candidate */
 }	/* func_select_candidate() */
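The candidate-stripping loop in parse_func.c also changed, and that change is easy to miss. Previously, rejecting the candidate at the head of the list overwrote candidates itself, so a category test that rejected every candidate also emptied the list. The new code builds the survivor list from first_candidate and adopts it only when last_candidate is non-NULL, keeping all candidates otherwise. The sketch below reproduces that pattern on a hypothetical list. Cand, its keep flag and strip_nonmatching() are illustrative stand-ins for FuncCandidateList and the category tests, not PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a FuncCandidateList node. */
typedef struct Cand
{
	int			id;
	bool		keep;			/* passed the category tests? */
	struct Cand *next;
} Cand;

/*
 * Unlink the candidates whose keep flag is false and return the survivors,
 * storing their count in *ncandidates.  If no candidate survives, the
 * original list is returned untouched, i.e. all candidates are kept.
 */
static Cand *
strip_nonmatching(Cand *candidates, int *ncandidates)
{
	Cand	   *first_candidate = candidates;
	Cand	   *last_candidate = NULL;

	assert(ncandidates != NULL);
	*ncandidates = 0;
	for (Cand *cur = candidates; cur != NULL; cur = cur->next)
	{
		if (cur->keep)
		{
			(*ncandidates)++;
			last_candidate = cur;
		}
		else if (last_candidate)
			last_candidate->next = cur->next;	/* unlink cur */
		else
			first_candidate = cur->next;	/* skip a rejected head */
	}

	/* no survivors: no node was modified, so keep the whole list */
	if (last_candidate == NULL)
		return candidates;
	last_candidate->next = NULL;	/* terminate rebuilt list */
	return first_candidate;
}
```

Because nodes are only rewritten through last_candidate, rejecting every node leaves the caller's list intact. In that case strip_nonmatching() returns the original head and reports 0 survivors, which matches the "keep them all instead" behavior described in the new comment.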