d8783c512e
metaphone() in a contrib. There seem to be a fair number of different approaches to both of these algorithms. I used the simplest case for levenshtein which has a cost of 1 for any character insertion, deletion, or substitution. For metaphone, I adapted the same code from CPAN that the PHP folks did. A couple of questions: 1. Does it make sense to fold the soundex contrib together with this one? 2. I was debating trying to add multibyte support to levenshtein (it would make no sense at all for metaphone), but a quick search through the contrib directory found no hits on the word MULTIBYTE. Should worry about adding multibyte support to levenshtein()? Joe Conway
122 lines
3.4 KiB
Plaintext
122 lines
3.4 KiB
Plaintext
/*
|
|
* fuzzystrmatch.c
|
|
*
|
|
* Functions for "fuzzy" comparison of strings
|
|
*
|
|
* Copyright (c) Joseph Conway <joseph.conway@home.com>, 2001;
|
|
*
|
|
* levenshtein()
|
|
* -------------
|
|
* Written based on a description of the algorithm by Michael Gilleland
|
|
* found at http://www.merriampark.com/ld.htm
|
|
* Also looked at levenshtein.c in the PHP 4.0.6 distribution for
|
|
* inspiration.
|
|
*
|
|
* metaphone()
|
|
* -----------
|
|
* Modified for PostgreSQL by Joe Conway.
|
|
* Based on CPAN's "Text-Metaphone-1.96" by Michael G Schwern <schwern@pobox.com>
|
|
* Code slightly modified for use as PostgreSQL function (palloc, elog, etc).
|
|
* Metaphone was originally created by Lawrence Philips and presented in article
|
|
* in "Computer Language" December 1990 issue.
|
|
*
|
|
* Permission to use, copy, modify, and distribute this software and its
|
|
* documentation for any purpose, without fee, and without a written agreement
|
|
* is hereby granted, provided that the above copyright notice and this
|
|
* paragraph and the following two paragraphs appear in all copies.
|
|
*
|
|
* IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
|
|
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
|
|
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
|
|
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
|
|
* POSSIBILITY OF SUCH DAMAGE.
|
|
*
|
|
* THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
|
|
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
|
|
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
|
|
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
|
|
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
|
|
*
|
|
*/
|
|
|
|
|
|
Version 0.1 (3 August, 2001):
|
|
Functions to calculate the degree to which two strings match in a "fuzzy" way
|
|
Tested under Linux (Red Hat 6.2 and 7.0) and PostgreSQL 7.2devel
|
|
|
|
Release Notes:
|
|
|
|
Version 0.1
|
|
- initial release
|
|
|
|
Installation:
|
|
Place these files in a directory called 'fuzzystrmatch' under 'contrib' in the PostgreSQL source tree. Then run:
|
|
|
|
make
|
|
make install
|
|
|
|
You can use fuzzystrmatch.sql to create the functions in your database of choice, e.g.
|
|
|
|
psql -U postgres template1 < fuzzystrmatch.sql
|
|
|
|
installs following functions into database template1:
|
|
|
|
levenshtein() - calculates the levenshtein distance between two strings
|
|
metaphone() - calculates the metaphone code of an input string
|
|
|
|
Documentation
|
|
==================================================================
|
|
Name
|
|
|
|
levenshtein -- calculates the levenshtein distance between two strings
|
|
|
|
Synopsis
|
|
|
|
levenshtein(text source, text target)
|
|
|
|
Inputs
|
|
|
|
source
|
|
any text string, 255 characters max, NOT NULL
|
|
|
|
target
|
|
any text string, 255 characters max, NOT NULL
|
|
|
|
Outputs
|
|
|
|
Returns int
|
|
|
|
Example usage
|
|
|
|
select levenshtein('GUMBO','GAMBOL');
|
|
|
|
==================================================================
|
|
Name
|
|
|
|
metaphone -- calculates the metaphone code of an input string
|
|
|
|
Synopsis
|
|
|
|
metaphone(text source, int max_output_length)
|
|
|
|
Inputs
|
|
|
|
source
|
|
any text string, 255 characters max, NOT NULL
|
|
|
|
max_output_length
|
|
maximum length of the output metaphone code; if longer, the output
|
|
is truncated to this length
|
|
|
|
Outputs
|
|
|
|
Returns text
|
|
|
|
Example usage
|
|
|
|
select metaphone('GUMBO',4);
|
|
|
|
==================================================================
|
|
-- Joe Conway
|
|
|