Sanitizes a string for comparison by removing accents and stripping special characters.
Parameters
input: string
The raw string to be cleaned.
Returns string
A sanitized version of the string suitable for fuzzy matching. If sanitizing leaves nothing
(e.g., the input contains only punctuation), the trimmed original string is returned instead.
Remarks
Deburrs: Converts Latin-1 Supplement and Latin Extended-A characters to basic
Latin letters (e.g., "Mötley" -> "Motley").
Unicode Filter: Retains all international letters (\p{L}) and numbers (\p{N})
from any script (Chinese, Japanese, Cyrillic, etc.) while removing punctuation
and symbols (e.g., "!", "#", "-").
Whitespace: Collapses multiple spaces into one and trims the result.
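Example
The three steps above, plus the empty-result fallback, can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name sanitizeForComparison is hypothetical, and the deburr step is approximated with Unicode NFD decomposition instead of a Latin-1 / Latin Extended-A lookup table. That approximation also strips accents from non-Latin letters (e.g., Cyrillic "й" becomes "и") and does not convert letters that have no decomposition, such as "ø" or "ß". It also assumes removed characters are dropped outright and not replaced with a space.

```typescript
// Hypothetical sketch of the documented behavior; the name and the
// NFD-based accent stripping are assumptions, not the real implementation.
function sanitizeForComparison(input: string): string {
  // Built with the RegExp constructor so the Unicode property escapes
  // do not depend on the compiler's regex-literal target checks.
  const notLetterDigitOrSpace = new RegExp("[^\\p{L}\\p{N}\\s]", "gu");

  const sanitized = input
    // Deburr (approximation): decompose, drop combining diacritical
    // marks (U+0300 to U+036F), then recompose whatever is left.
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .normalize("NFC")
    // Unicode filter: keep letters and numbers from any script, plus
    // whitespace; remove punctuation and symbols.
    .replace(notLetterDigitOrSpace, "")
    // Whitespace: collapse runs into a single space and trim the ends.
    .replace(/\s+/g, " ")
    .trim();

  // Fall back to the trimmed original when nothing survives.
  return sanitized.length > 0 ? sanitized : input.trim();
}

console.log(sanitizeForComparison("Mötley"));             // "Motley"
console.log(sanitizeForComparison("  Hello,   World! ")); // "Hello World"
console.log(sanitizeForComparison("東京-2024"));           // "東京2024"
console.log(sanitizeForComparison(" #- "));               // "#-"
```

Because removed characters are dropped in this sketch, "rock-n-roll" becomes "rocknroll". If word boundaries matter for matching, replacing punctuation with a space before the whitespace step would be the alternative.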