Extend the unknowns-are-same-as-known-inputs type resolution heuristic.

For a very long time, one of the parser's heuristics for resolving ambiguous operator calls has been to assume that unknown-type literals are of the same type as the other input (if it's known). However, this was only used in the first step of quickly checking for an exact-types match, and thus did not help in resolving matches that require coercion, such as matches to polymorphic operators. As we add more polymorphic operators, this becomes more of a problem. This patch adds another use of the same heuristic as a last-ditch check before failing to resolve an ambiguous operator or function call. In particular, this will let us define the range inclusion operator in a less limited way (to come in a follow-on patch).
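
As a rough illustration of the last-ditch check described above, here is a minimal C sketch. It is not the code from this patch. The TypeId enum, the Candidate struct, toy_can_accept() and last_gasp_select() are hypothetical stand-ins for Oid, FuncCandidateList, can_coerce_type() and the new block in func_select_candidate(). The toy coercion rule knows about a single array type and checks each argument independently, which the real parser does not do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 4

/* Hypothetical type identifiers; the real parser works with Oids. */
typedef enum
{
    T_UNKNOWN = 0,
    T_INT4,
    T_INT4_ARRAY,
    T_ANYELEMENT,
    T_ANYARRAY,
    T_ANYRANGE
} TypeId;

/* Hypothetical, simplified candidate list entry. */
typedef struct Candidate
{
    const char *name;
    TypeId      args[MAX_ARGS];
    struct Candidate *next;
} Candidate;

/* Toy replacement for implicit-coercion checking (an assumption). */
static int
toy_can_accept(TypeId input, TypeId declared)
{
    if (input == declared)
        return 1;
    if (declared == T_ANYELEMENT)
        return 1;
    if (declared == T_ANYARRAY)
        return input == T_INT4_ARRAY;
    return 0;               /* nothing in this toy matches T_ANYRANGE */
}

/*
 * If the inputs mix known and unknown types and all known types agree,
 * pretend every input has that type and return the candidate that accepts
 * it, provided exactly one does.  Otherwise return NULL.
 */
Candidate *
last_gasp_select(int nargs, const TypeId *input_types, Candidate *candidates)
{
    TypeId      known = T_UNKNOWN;
    int         nunknowns = 0;
    int         nmatches = 0;
    Candidate  *match = NULL;

    assert(nargs > 0 && nargs <= MAX_ARGS);

    for (int i = 0; i < nargs; i++)
    {
        if (input_types[i] == T_UNKNOWN)
        {
            nunknowns++;
            continue;
        }
        if (known == T_UNKNOWN)
            known = input_types[i];     /* first known argument */
        else if (known != input_types[i])
            return NULL;                /* known types disagree */
    }

    /* The heuristic needs at least one unknown and one known input. */
    if (nunknowns == 0 || known == T_UNKNOWN)
        return NULL;

    for (Candidate *c = candidates; c != NULL; c = c->next)
    {
        int         ok = 1;

        for (int i = 0; i < nargs; i++)
        {
            if (!toy_can_accept(known, c->args[i]))
            {
                ok = 0;
                break;
            }
        }
        if (ok)
        {
            if (++nmatches > 1)
                return NULL;            /* not unique, give up */
            match = c;
        }
    }
    return match;                       /* NULL when nothing matched */
}
```

Given an integer-array input on the left and an unknown-type literal on the right, with candidates for array inclusion and range inclusion, this sketch returns the array inclusion candidate, because the toy rule lets no input match the range argument. Unlike the patch, the sketch leaves the candidate list unmodified.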
2011-11-17 18:28:41 -05:00
commit 1a8b9fb549
parent bf4f96b5e2
2 changed files with 147 additions and 26 deletions
--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@@ -304,13 +304,18 @@ without more clues.  Now discard
 candidates that do not accept the selected type category.  Furthermore,
 if any candidate accepts a preferred type in that category,
 discard candidates that accept non-preferred types for that argument.
+Keep all candidates if none survive these tests.
+If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one candidate remains, use it.  If no candidate or more than one
-candidate remains,
-then fail.
+If there are both <type>unknown</type> and known-type arguments, and all
+the known-type arguments have the same type, assume that the
+<type>unknown</type> arguments are also of that type, and check which
+candidates can accept that type at the <type>unknown</type>-argument
+positions.  If exactly one candidate passes this test, use it.
+Otherwise, fail.
 </para>
 </step>
 </substeps>
@@ -376,7 +381,7 @@ be interpreted as type <type>text</type>.
 </para>

 <para>
-Here is a concatenation on unspecified types:
+Here is a concatenation of two values of unspecified types:
 <screen>
 SELECT 'abc' || 'def' AS "unspecified";

@@ -394,7 +399,7 @@ and finds that there are candidates accepting both string-category and
 bit-string-category inputs.  Since string category is preferred when available,
 that category is selected, and then the
 preferred type for strings, <type>text</type>, is used as the specific
-type to resolve the unknown literals as.
+type to resolve the unknown-type literals as.
 </para>
 </example>

@@ -450,6 +455,36 @@ SELECT ~ CAST('20' AS int8) AS "negation";
 </para>
 </example>

+<example>
+<title>Array Inclusion Operator Type Resolution</title>
+
+<para>
+Here is another example of resolving an operator with one known and one
+unknown input:
+<screen>
+SELECT array[1,2] &lt;@ '{1,2,3}' as "is subset";
+
+ is subset
+-----------
+ t
+(1 row)
+</screen>
+The <productname>PostgreSQL</productname> operator catalog has several
+entries for the infix operator <literal>&lt;@</>, but the only two that
+could possibly accept an integer array on the left-hand side are
+array inclusion (<type>anyarray</> <literal>&lt;@</> <type>anyarray</>)
+and range inclusion (<type>anyelement</> <literal>&lt;@</> <type>anyrange</>).
+Since none of these polymorphic pseudo-types (see <xref
+linkend="datatype-pseudo">) are considered preferred, the parser cannot
+resolve the ambiguity on that basis.  However, the last resolution rule tells
+it to assume that the unknown-type literal is of the same type as the other
+input, that is, integer array.  Now only one of the two operators can match,
+so array inclusion is selected.  (Had range inclusion been selected, we would
+have gotten an error, because the string does not have the right format to be
+a range literal.)
+</para>
+</example>
+
 </sect1>

 <sect1 id="typeconv-func">
@@ -594,13 +629,18 @@ the correct choice cannot be deduced without more clues.
 Now discard candidates that do not accept the selected type category.
 Furthermore, if any candidate accepts a preferred type in that category,
 discard candidates that accept non-preferred types for that argument.
+Keep all candidates if none survive these tests.
+If only one candidate remains, use it; else continue to the next step.
 </para>
 </step>
 <step performance="required">
 <para>
-If only one candidate remains, use it.  If no candidate or more than one
-candidate remains,
-then fail.
+If there are both <type>unknown</type> and known-type arguments, and all
+the known-type arguments have the same type, assume that the
+<type>unknown</type> arguments are also of that type, and check which
+candidates can accept that type at the <type>unknown</type>-argument
+positions.  If exactly one candidate passes this test, use it.
+Otherwise, fail.
 </para>
 </step>
 </substeps>
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -618,14 +618,16 @@ func_select_candidate(int nargs,
 					  Oid *input_typeids,
 					  FuncCandidateList candidates)
 {
-	FuncCandidateList current_candidate;
-	FuncCandidateList last_candidate;
+	FuncCandidateList current_candidate,
+				first_candidate,
+				last_candidate;
 	Oid		   *current_typeids;
 	Oid			current_type;
 	int			i;
 	int			ncandidates;
 	int			nbestMatch,
-				nmatch;
+				nmatch,
+				nunknowns;
 	Oid			input_base_typeids[FUNC_MAX_ARGS];
 	TYPCATEGORY slot_category[FUNC_MAX_ARGS],
 				current_category;
@@ -651,9 +653,22 @@ func_select_candidate(int nargs,
 	 * take a domain as an input datatype.	Such a function will be selected
 	 * over the base-type function only if it is an exact match at all
 	 * argument positions, and so was already chosen by our caller.
+	 *
+	 * While we're at it, count the number of unknown-type arguments for use
+	 * later.
 	 */
+	nunknowns = 0;
 	for (i = 0; i < nargs; i++)
+	{
+		if (input_typeids[i] != UNKNOWNOID)
 			input_base_typeids[i] = getBaseType(input_typeids[i]);
+		else
+		{
+			/* no need to call getBaseType on UNKNOWNOID */
+			input_base_typeids[i] = UNKNOWNOID;
+			nunknowns++;
+		}
+	}

 	/*
 	 * Run through all candidates and keep those with the most matches on
@@ -749,14 +764,16 @@ func_select_candidate(int nargs,
 		return candidates;

 	/*
-	 * Still too many candidates? Try assigning types for the unknown columns.
+	 * Still too many candidates?  Try assigning types for the unknown inputs.
 	 *
-	 * NOTE: for a binary operator with one unknown and one non-unknown input,
-	 * we already tried the heuristic of looking for a candidate with the
-	 * known input type on both sides (see binary_oper_exact()). That's
-	 * essentially a special case of the general algorithm we try next.
-	 *
-	 * We do this by examining each unknown argument position to see if we can
+	 * If there are no unknown inputs, we have no more heuristics that apply,
+	 * and must fail.
+	 */
+	if (nunknowns == 0)
+		return NULL;			/* failed to select a best candidate */
+
+	/*
+	 * The next step examines each unknown argument position to see if we can
 	 * determine a "type category" for it.	If any candidate has an input
 	 * datatype of STRING category, use STRING category (this bias towards
 	 * STRING is appropriate since unknown-type literals look like strings).
@@ -770,9 +787,9 @@ func_select_candidate(int nargs,
 	 * Having completed this examination, remove candidates that accept the
 	 * wrong category at any unknown position.	Also, if at least one
 	 * candidate accepted a preferred type at a position, remove candidates
-	 * that accept non-preferred types.
-	 *
-	 * If we are down to one candidate at the end, we win.
+	 * that accept non-preferred types.  If just one candidate remains,
+	 * return that one.  However, if this rule turns out to reject all
+	 * candidates, keep them all instead.
 	 */
 	resolved_unknowns = false;
 	for (i = 0; i < nargs; i++)
@@ -835,6 +852,7 @@ func_select_candidate(int nargs,
 	{
 		/* Strip non-matching candidates */
 		ncandidates = 0;
+		first_candidate = candidates;
 		last_candidate = NULL;
 		for (current_candidate = candidates;
 			 current_candidate != NULL;
@@ -874,15 +892,78 @@ func_select_candidate(int nargs,
 				if (last_candidate)
 					last_candidate->next = current_candidate->next;
 				else
-					candidates = current_candidate->next;
+					first_candidate = current_candidate->next;
 			}
 		}
-		if (last_candidate)		/* terminate rebuilt list */
+
+		/* if we found any matches, restrict our attention to those */
+		if (last_candidate)
+		{
+			candidates = first_candidate;
+			/* terminate rebuilt list */
 			last_candidate->next = NULL;
 		}

 		if (ncandidates == 1)
 			return candidates;
+	}
+
+	/*
+	 * Last gasp: if there are both known- and unknown-type inputs, and all
+	 * the known types are the same, assume the unknown inputs are also that
+	 * type, and see if that gives us a unique match.  If so, use that match.
+	 *
+	 * NOTE: for a binary operator with one unknown and one non-unknown input,
+	 * we already tried this heuristic in binary_oper_exact().  However, that
+	 * code only finds exact matches, whereas here we will handle matches that
+	 * involve coercion, polymorphic type resolution, etc.
+	 */
+	if (nunknowns < nargs)
+	{
+		Oid			known_type = UNKNOWNOID;
+
+		for (i = 0; i < nargs; i++)
+		{
+			if (input_base_typeids[i] == UNKNOWNOID)
+				continue;
+			if (known_type == UNKNOWNOID)		/* first known arg? */
+				known_type = input_base_typeids[i];
+			else if (known_type != input_base_typeids[i])
+			{
+				/* oops, not all match */
+				known_type = UNKNOWNOID;
+				break;
+			}
+		}
+
+		if (known_type != UNKNOWNOID)
+		{
+			/* okay, just one known type, apply the heuristic */
+			for (i = 0; i < nargs; i++)
+				input_base_typeids[i] = known_type;
+			ncandidates = 0;
+			last_candidate = NULL;
+			for (current_candidate = candidates;
+				 current_candidate != NULL;
+				 current_candidate = current_candidate->next)
+			{
+				current_typeids = current_candidate->args;
+				if (can_coerce_type(nargs, input_base_typeids, current_typeids,
+									COERCION_IMPLICIT))
+				{
+					if (++ncandidates > 1)
+						break;	/* not unique, give up */
+					last_candidate = current_candidate;
+				}
+			}
+			if (ncandidates == 1)
+			{
+				/* successfully identified a unique match */
+				last_candidate->next = NULL;
+				return last_candidate;
+			}
+		}
+	}

 	return NULL;				/* failed to select a best candidate */
 }	/* func_select_candidate() */
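
The category-filtering step in func_select_candidate() now keeps the whole candidate list when its filter would reject every entry. The following is a minimal C sketch of that list-handling pattern, under simplifying assumptions. The Node struct and filter_or_keep_all() are hypothetical names, and a plain integer comparison stands in for the category and preferred-type tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry; "category" stands in for the real type tests. */
typedef struct Node
{
    int          category;
    struct Node *next;
} Node;

/*
 * Keep only the nodes whose category equals "wanted".  If no node matches,
 * return the original list untouched.  *nsurvivors receives the number of
 * matching nodes (0 means the filter was ignored).
 */
Node *
filter_or_keep_all(Node *candidates, int wanted, int *nsurvivors)
{
    Node       *first = candidates;    /* head of the filtered list */
    Node       *last = NULL;           /* last surviving node so far */
    int         n = 0;

    for (Node *cur = candidates; cur != NULL; cur = cur->next)
    {
        if (cur->category == wanted)
        {
            last = cur;
            n++;
        }
        else if (last)
        {
            /* unlink cur; safe because at least one node survives */
            last->next = cur->next;
        }
        else
        {
            /* only move the provisional head, the list itself is intact */
            first = cur->next;
        }
    }

    *nsurvivors = n;
    if (last)
    {
        /* found matches: restrict attention to them */
        last->next = NULL;
        return first;
    }
    return candidates;                  /* nothing matched: keep them all */
}
```

The point of tracking a separate head pointer is that the original list is only rewritten once at least one survivor is known, so the "keep all" fallback needs no copy of the list.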