From 6f85cec4fbc1d1261c67a339a11792dd8f2efd14 Mon Sep 17 00:00:00 2001
From: Zvezdan Petkovic
Date: Sun, 13 Jul 2025 08:23:24 +0200
Subject: [PATCH] runtime(python): update rendering of Unicode named literals in syntax script

This change:
* enforces that the alias starts with a letter
* allows the other words in an alias to be separated by either a space or a hyphen, but not both or double separators
* allows only a letter after space, possibly followed by letters or digits
* allows both letters and digits after a hyphen

Tested with:

    a = '\N{Cyrillic Small Letter Zhe} is pronounced as zh in pleasure'
    b = '\N{NO-BREAK SPACE} is needed here'
    # ... other tests here
    r = '\N{HENTAIGANA LETTER E-1} is a Japanese hiragana letter archaic ye'
    s = '\N{CUNEIFORM SIGN NU11 TENU} is a correction alias'
    t = '\N{RECYCLING SYMBOL FOR TYPE-1 PLASTICS} base shape is a triangle'
    print(a)
    print(b)
    print(r)
    print(s)
    print(t)

The tests confirm the behavior and are selected from real Unicode
tables/aliases to check these combinations based on the specification.

fixes: #17323
closes: #17735

Signed-off-by: Christian Brabandt
---
 runtime/syntax/python.vim | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/runtime/syntax/python.vim b/runtime/syntax/python.vim
index 7aa82f1b98..68036f5905 100644
--- a/runtime/syntax/python.vim
+++ b/runtime/syntax/python.vim
@@ -160,7 +160,8 @@ syn match pythonEscape "\\\o\{1,3}" contained
 syn match pythonEscape "\\x\x\{2}" contained
 syn match pythonEscape "\%(\\u\x\{4}\|\\U\x\{8}\)" contained
 " Python allows case-insensitive Unicode IDs: http://www.unicode.org/charts/
-syn match pythonEscape "\\N{\a\+\%(\s\a\+\)*}" contained
+" The specification: https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/#G135165
+syn match pythonEscape "\\N{\a\+\%(\%(\s\a\+[[:alnum:]]*\)\|\%(-[[:alnum:]]\+\)\)*}" contained
 syn match pythonEscape "\\$"
 
 " It is very important to understand all details before changing the