thac0/vim - vim - SDF GIT Society

Author	SHA1	Message	Date
Christian Brabandt	22e8e12d9f	patch 9.1.0645: regex: wrong match when searching multi-byte char case-insensitive Problem: regex: wrong match when searching multi-byte char case-insensitive (diffsetter) Solution: Apply proper case-folding for characters and search-string This patch does the following 4 things: 1) When the regexp engine compares two utf-8 codepoints case-insensitively, it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This however is not necessarily true because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same. However in that case, it should only step over the single byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: * we try to match the correct length for the pattern and the text * in case of a match, we step over it correctly There is one tricky thing for the backtracking engine. We also need to calculate correctly the number of bytes to compare the 2 different utf-8 strings s1 and s2. So we will count the number of characters in s1 that the specified byte length covers. Then we count the number of bytes to step over the same number of characters in string s2 and then we can correctly compare the 2 utf-8 strings. 2) A similar thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. 3) A related issue turned out, when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. 4) When comparing characters using collections, we must also apply case folding to each character in the collection and not just to the current character from the search string. This doesn't apply to the NFA engine, because internally it converts collections to branches [abc] -> a\|b\|c fixes: #14294 closes: #14756 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-07-30 20:39:18 +02:00
zeertzjq	a59e031aa0	patch 9.1.0334: No test for highlight behavior with 'ambiwidth' Problem: No test for highlight behavior with 'ambiwidth'. Solution: Add a screendump test for 'ambiwidth' with 'cursorline'. (zeertzjq) closes: #14554 Signed-off-by: zeertzjq <zeertzjq@outlook.com> Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-15 19:14:38 +02:00
Christian Brabandt	c97f4d61cd	patch 9.1.0297: Patch 9.1.0296 causes too many issues Problem: Patch 9.1.0296 causes too many issues (Tony Mechelynck, @chdiza, CI) Solution: Back out the change for now Revert "patch 9.1.0296: regexp: engines do not handle case-folding well" This reverts commit `7a27c108e0`, which causes issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It needs more work. fixes: #14487 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-10 16:22:17 +02:00
Christian Brabandt	7a27c108e0	patch 9.1.0296: regexp: engines do not handle case-folding well Problem: Regex engines do not handle case-folding well Solution: Correctly calculate byte length of characters to skip When the regexp engine compares two utf-8 codepoints case insensitively it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This however is not necessarily true because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the amount of bytes of the pattern), assuming that since it matches, the length of both strings is the same. However in that case, it should only step over the single byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: - we try to match the correct length for the pattern and the text - in case of a match, we step over it correctly The same thing can happen for the NFA engine, when skipping to the next character to test for a match. We are skipping over the regstart pointer, however we do not consider the case that because of case-folding we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. A related issue turned out, when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. fixes: #14294 closes: #14433 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-09 22:53:19 +02:00
Christian Brabandt	d2cc51f9a1	patch 9.1.0011: regexp cannot match combining chars in collection Problem: regexp cannot match combining chars in collection Solution: Check for combining characters in regex collections for the NFA and BT Regex Engine Also, while at it, make debug mode work again. fixes #10286 closes: #12871 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-01-04 22:54:08 +01:00
Christian Brabandt	be07caa071	patch 9.0.1777: patch 9.0.1771 causes problems Problem: patch 9.0.1771 causes problems Solution: revert it Revert "patch 9.0.1771: regex: combining chars in collections not handled" This reverts commit `ca22fc36a4`. Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 22:28:28 +02:00
Christian Brabandt	ca22fc36a4	patch 9.0.1771: regex: combining chars in collections not handled Problem: regex: combining chars in collections not handled Solution: Check for following combining characters for NFA and BT engine closes: #10459 closes: #10286 Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 20:38:56 +02:00
Martin Tournoij	25f3a146a0	patch 9.0.0700: there is no real need for a "big" build Problem: There is no real need for a "big" build. Solution: Move common features to "normal" build, less often used features to the "huge" build. (Martin Tournoij, closes #11283)	2022-10-08 19:26:41 +01:00
Bram Moolenaar	db77cb3c08	patch 9.0.0669: too many delete() calls in tests Problem: Too many delete() calls in tests. Solution: Use deferred delete where possible.	2022-10-05 21:45:30 +01:00
Bram Moolenaar	cb36c2a3cd	patch 9.0.0106: illegal byte regexp test doesn't fail when fix is reversed Problem: Illegal byte regexp test doesn't fail when fix is reversed. Solution: Make sure illegal bytes end up in sourced script file.	2022-07-29 18:32:20 +01:00
Bram Moolenaar	f50940531d	patch 9.0.0105: illegal memory access when pattern starts with illegal byte Problem: Illegal memory access when pattern starts with illegal byte. Solution: Do not match a character with an illegal byte.	2022-07-29 16:22:25 +01:00
Bram Moolenaar	2457b2bbc2	patch 8.2.4443: regexp pattern test fails on Mac Problem: Regexp pattern test fails on Mac. Solution: Do not use a swapfile for the buffer.	2022-02-22 16:19:37 +00:00
Bram Moolenaar	6456fae9ba	patch 8.2.4440: crash with specific regexp pattern and string Problem: Crash with specific regexp pattern and string. Solution: Stop at the start of the string.	2022-02-22 13:37:31 +00:00
Bram Moolenaar	424bcae1fb	patch 8.2.4273: the EBCDIC support is outdated Problem: The EBCDIC support is outdated. Solution: Remove the EBCDIC support.	2022-01-31 14:59:41 +00:00
Bram Moolenaar	65b6056659	patch 8.2.3409: reading beyond end of line with invalid utf-8 character Problem: Reading beyond end of line with invalid utf-8 character. Solution: Check for NUL when advancing.	2021-09-07 19:26:53 +02:00
Bram Moolenaar	0b94e297af	patch 8.2.2716: the equivalent class regexp is missing some characters Problem: The equivalent class regexp is missing some characters. Solution: Update the list of equivalent characters. (Dominique Pellé, closes #8029)	2021-04-05 13:59:53 +02:00
Bram Moolenaar	66c50c5653	patch 8.2.2278: falling back to old regexp engine can some patterns Problem: Falling back to old regexp engine can some patterns. Solution: Do not fall back once [[:lower:]] or [[:upper:]] is used. (Christian Brabandt, closes #7572)	2021-01-02 17:43:49 +01:00
Bram Moolenaar	ef2dff52de	patch 8.2.2177: pattern "^" does not match if first character is combining Problem: Pattern "^" does not match if the first character in the line is combining. (Rene Kita) Solution: Do accept a match at the start of the line. (closes #6963)	2020-12-21 14:54:32 +01:00
Bram Moolenaar	8a9bc95eae	patch 8.2.1786: various Normal mode commands not fully tested Problem: Various Normal mode commands not fully tested. Solution: Add more tests. (Yegappan Lakshmanan, closes #7059)	2020-10-02 18:48:07 +02:00
Bram Moolenaar	7d40b8a532	patch 8.2.1295: tests 44 and 99 are old style Problem: Tests 44 and 99 are old style. Solution: Convert to new style tests. (Yegappan Lakshmanan, closes #6536)	2020-07-26 12:52:59 +02:00
Bram Moolenaar	470adb827f	patch 8.2.1254: MS-Windows: regexp test may fail if 'iskeyword' set wrongly Problem: MS-Windows: regexp test may fail if 'iskeyword' set wrongly. Solution: Override the 'iskeyword' value. (Taro Muraoka, closes #6502)	2020-07-20 21:21:30 +02:00
Bram Moolenaar	59de417b90	patch 8.2.0938: NFA regexp uses tolower() to compare ignore-case Problem: NFA regexp uses tolower() to compare ignore-case. (Thayne McCombs) Solution: Use utf_fold() when possible. (ref. neovim #12456)	2020-06-09 19:34:54 +02:00
Bram Moolenaar	afc13bd827	patch 8.2.0014: test69 and test95 are old style Problem: Test69 and test95 are old style. Solution: Convert to new style tests. (Yegappan Lakshmanan, closes #5365)	2019-12-16 22:43:31 +01:00
Bram Moolenaar	2a5b52758b	patch 8.1.1720: crash with very long %[] pattern Problem: Crash with very long %[] pattern. (Reza Mirzazade farkhani) Solution: Check for reg_toolong. (closes #4703)	2019-07-20 18:56:06 +02:00
Bram Moolenaar	221cd9f4dd	patch 8.1.0862: no verbose version of character classes Problem: No verbose version of character classes. Solution: Add [:ident:], [:keyword:] and [:fname:]. (Ozaki Kiichi, closes #1373)	2019-01-31 15:34:40 +01:00
Bram Moolenaar	30276f2beb	patch 8.1.0811: too many #ifdefs Problem: Too many #ifdefs. Solution: Graduate FEAT_MBYTE, the final chapter.	2019-01-24 17:59:39 +01:00
Bram Moolenaar	966e58e413	patch 8.0.0623: error for invalid regexp is not very informative Problem: The message "Invalid range" is used for multiple errors. Solution: Add two more specific error messages. (Itchyny, Ken Hamada)	2017-06-05 16:54:08 +02:00
Bram Moolenaar	13489b9c41	patch 8.0.0529: line in test commented out Problem: Line in test commented out. Solution: Uncomment the lines for character classes that were failing before 8.0.0519. (Dominique Pelle, closes #1599)	2017-03-30 22:20:29 +02:00
Bram Moolenaar	0c078fc7db	patch 8.0.0519: character classes are not well tested Problem: Character classes are not well tested. They can differ between platforms. Solution: Add tests. In the documentation make clear which classes depend on what library function. Only use :cntrl: and :graph: for ASCII. (Kazunobu Kuriyama, Dominique Pelle, closes #1560) Update the documentation.	2017-03-29 15:31:20 +02:00
Bram Moolenaar	d3c907b5d2	patch 7.4.2223 Problem: Buffer overflow when using latin1 character with feedkeys(). Solution: Check for an illegal character. Add a test.	2016-08-17 21:32:09 +02:00
Bram Moolenaar	6bff02eb53	patch 7.4.2222 Problem: Sourcing a script where a character has 0x80 as a second byte does not work. (Filipe L B Correia) Solution: Turn 0x80 into K_SPECIAL KS_SPECIAL KE_FILLER. (Christian Brabandt, closes #728) Add a test case.	2016-08-16 22:50:55 +02:00
Bram Moolenaar	ac105ed3c4	patch 7.4.2086 Problem: Using the system default encoding makes tests unpredictable. Solution: Always use utf-8 or latin1 in the new style tests. Remove setting encoding and scriptencoding where it is not needed.	2016-07-21 20:33:32 +02:00
Bram Moolenaar	490465bda6	patch 7.4.1785 Problem: Regexp test fails on windows. Solution: set 'isprint' to the right value for testing.	2016-04-24 15:11:02 +02:00
Bram Moolenaar	af98a49dd0	patch 7.4.1783 Problem: The old regexp engine doesn't handle character classes correctly. (Manuel Ortega) Solution: Use regmbc() instead of regc(). Add a test.	2016-04-24 14:40:12 +02:00
Bram Moolenaar	22e421549d	patch 7.4.1700 Problem: Equivalence classes are not properly tested. Solution: Add tests for multi-byte and latin1. Fix an error. (Owen Leibman)	2016-04-03 14:02:02 +02:00

35 Commits
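
The step-over problem described in patches 9.1.0296 and 9.1.0645 can be illustrated with a small sketch. This is not Vim's implementation: the helper names (`utf8_char_len`, `count_chars`, `bytes_for_chars`, `match_ignore_case`) are hypothetical, and Python's `str.casefold()` stands in for Vim's own case folding, which may differ for some characters. The idea follows the commit message: count how many characters the pattern's byte length covers, find how many bytes the same number of characters occupies in the text, compare those two spans, and step over the text by its own byte length rather than the pattern's.

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # invalid lead byte: treat it as a single byte


def count_chars(s: bytes, byte_len: int) -> int:
    """Number of characters that start within the first byte_len bytes of s."""
    i = n = 0
    while i < byte_len and i < len(s):
        i += utf8_char_len(s[i])
        n += 1
    return n


def bytes_for_chars(s: bytes, nchars: int) -> int:
    """Number of bytes needed to step over nchars characters of s."""
    i = 0
    for _ in range(nchars):
        if i >= len(s):
            break
        i += utf8_char_len(s[i])
    return min(i, len(s))


def match_ignore_case(pattern: bytes, text: bytes, pos: int):
    """Return the number of text bytes matched at pos, or None if no match.

    The return value is the amount to step over in the text, which can
    differ from len(pattern) when folding equates characters of
    different byte lengths.
    """
    nchars = count_chars(pattern, len(pattern))
    text_len = bytes_for_chars(text[pos:], nchars)
    candidate = text[pos:pos + text_len].decode("utf-8", errors="replace")
    wanted = pattern.decode("utf-8", errors="replace")
    if candidate.casefold() == wanted.casefold():
        return text_len
    return None


pattern = "ſ".encode("utf-8")  # U+017F, two bytes in UTF-8
text = "sx".encode("utf-8")    # 's' is a single byte
step = match_ignore_case(pattern, text, 0)
print(len(pattern), step)
```

With the pattern 'ſ' and the text 'sx', the match covers one byte of text, so matching resumes at 'x'. Stepping by the pattern's two bytes would skip 'x' entirely, which is the adjacent-character error the commit message describes.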