From 8c49454caa636a02aa37e10b8941b7e67b6954bb Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 30 Mar 2020 11:14:58 -0400
Subject: [PATCH] Be more careful about extracting encoding from locale strings
 on Windows.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GetLocaleInfoEx() can fail on strings that setlocale() was perfectly
happy with.  A common way for that to happen is if the locale string
is actually a Unix-style string, say "et_EE.UTF-8".  In that case,
what's after the dot is an encoding name, not a Windows codepage number;
blindly treating it as a codepage number led to failure, with a fairly
silly error message.  Hence, check to see if what's after the dot is
all digits, and if not, treat it as a literal encoding name rather than
a codepage number.  This will do the right thing with many Unix-style
locale strings, and produce a more sensible error message otherwise.

Somewhat independently of that, treat a zero (CP_ACP) result from
GetLocaleInfoEx() as meaning that we must use UTF-8 encoding.

Back-patch to all supported branches.

Juan José Santamaría Flecha

Discussion: https://postgr.es/m/24905.1585445371@sss.pgh.pa.us
---
 src/port/chklocale.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index c9c680f0b3..9e3c6db785 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -239,25 +239,44 @@ win32_langinfo(const char *ctype)
 	{
 		r = malloc(16);			/* excess */
 		if (r != NULL)
-			sprintf(r, "CP%u", cp);
+		{
+			/*
+			 * If the return value is CP_ACP that means no ANSI code page is
+			 * available, so only Unicode can be used for the locale.
+			 */
+			if (cp == CP_ACP)
+				strcpy(r, "utf8");
+			else
+				sprintf(r, "CP%u", cp);
+		}
 	}
 	else
 #endif
 	{
 		/*
-		 * Locale format on Win32 is <Language>_<Country>.<CodePage> . For
-		 * example, English_United States.1252.
+		 * Locale format on Win32 is <Language>_<Country>.<CodePage>.  For
+		 * example, English_United States.1252.  If we see digits after the
+		 * last dot, assume it's a codepage number.  Otherwise, we might be
+		 * dealing with a Unix-style locale string; Windows' setlocale() will
+		 * take those even though GetLocaleInfoEx() won't, so we end up here.
+		 * In that case, just return what's after the last dot and hope we can
+		 * find it in our table.
 		 */
 		codepage = strrchr(ctype, '.');
 		if (codepage != NULL)
 		{
-			int			ln;
+			size_t		ln;
 
 			codepage++;
 			ln = strlen(codepage);
 			r = malloc(ln + 3);
 			if (r != NULL)
-				sprintf(r, "CP%s", codepage);
+			{
+				if (strspn(codepage, "0123456789") == ln)
+					sprintf(r, "CP%s", codepage);
+				else
+					strcpy(r, codepage);
+			}
 		}
 
 	}