0
0
mirror of https://github.com/vim/vim.git synced 2025-08-25 19:53:53 -04:00

260 Commits

Author SHA1 Message Date
Áron Hárnási
c43a0614d4
patch 9.1.1611: possible undefined behaviour in mb_decompose()
Problem:  possible undefined behaviour in mb_decompose(), when using the
          same pointer as argument several times
Solution: use separate assignments to avoid reading and writing the same
          object at the same time (Áron Hárnási)

closes: #17953

Signed-off-by: Áron Hárnási <aron.harnasi@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2025-08-09 23:43:13 +02:00
Yegappan Lakshmanan
e89aef3f65
patch 9.1.1390: style: more wrong indentation
Problem:  style: more wrong indentation
Solution: reformat a few more places
          (Yegappan Lakshmanan)

closes: #17309

Signed-off-by: Yegappan Lakshmanan <yegappan@yahoo.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2025-05-14 20:31:55 +02:00
Christian Brabandt
f2b16986a1
patch 9.1.1258: regexp: max \U and \%U value is limited by INT_MAX
Problem:  regexp: max \U and \%U value is limited by INT_MAX but gives a
          confusing error message (related: v8.1.0985).
Solution: give a better error message when the value reaches INT_MAX

When searching Vim allows to get up to 8 hex characters using the /\V
and /\%V regex atoms.  However, when using "/\UFFFFFFFF" the code point is
already above what an integer variable can hold, which is 2,147,483,647.

Since patch v8.1.0985, Vim already limited the max codepoint to INT_MAX
(otherwise it caused a crash in the nfa regex engine), but instead of
error'ing out it silently fell back to parse the number as a backslash
value and not as a codepoint value and as such this "/[\UFFFFFFFF]" will
happily find a "\" or an literal "F".  And this "/[\d127-\UFFFFFFFF]"
will error out as "reverse range in character class).

Interestingly, the max Unicode codepoint value is U+10FFFF which still
fits into an ordinary integer value,  which means, that we don't even
need to parse 8 hex characters, but 6 should have been enough.

However, let's not limit Vim to search for only max 6 hex characters
(which would be a backward incompatible change), but instead allow all 8
characters and only if the codepoint reaches INT_MAX, give a more
precise error message (about what the max unicode codepoint value is).
This allows to search for "[\U7FFFFFFE]" (will likely return "E486
Pattern not found") and "[/\U7FFFFFF]" now errors "E1517: Value too
large, max Unicode codepoint is U+10FFFF".

While this change is straight forward on architectures where long is 8
bytes, this is not so simple on Windows or 32bit architectures where long
is 4 bytes (and therefore the test fails there).  To account for that,
let's make use of the vimlong_T number type and make a few corresponding
changes in the regex engine code and cast the value to the expected data
type. This however may not work correctly on systems that doesn't have
the long long datatype (e.g. OpenVMS) and probably the test will fail
there.

fixes: #16949
closes: #16994

Signed-off-by: Christian Brabandt <cb@256bit.org>
2025-03-29 09:08:58 +01:00
John Marriott
8d4477ef22
patch 9.1.0828: string_T struct could be used more often
Problem:  string_T struct could be used more often
Solution: Refactor code and make use of string_T struct
          for key-value pairs, reformat overlong lines
          (John Marriott)

closes: #15975

Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-11-02 16:11:58 +01:00
zeertzjq
e8feaa354e
patch 9.1.0650: Coverity warning in cstrncmp()
Problem:  Coverity warning in cstrncmp()
          (after v9.1.0645)
Solution: Change the type of n2 to int.
          (zeertzjq)

________________________________________________________________________________________________________
*** CID 1615684:  Integer handling issues  (INTEGER_OVERFLOW)
/src/regexp.c: 1757 in cstrncmp()
1751                 n1 -= mb_ptr2len(s1);
1752                 MB_PTR_ADV(p);
1753                 n2++;
1754             }
1755             // count the number of bytes to advance the same number of chars for s2
1756             p = s2;
>>>     CID 1615684:  Integer handling issues  (INTEGER_OVERFLOW)
>>>     Expression "n2--", which is equal to 18446744073709551615, where "n2" is known to be equal to 0, underflows the type that receives it, an unsigned integer 64 bits wide.
1757             while (n2-- > 0 && *p != NUL)
1758                 MB_PTR_ADV(p);
1759
1760             n2 = p - s2;
1761
1762             result = MB_STRNICMP2(s1, s2, *n, n2);

closes: #15409

Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-08-01 22:48:53 +02:00
Christian Brabandt
22e8e12d9f
patch 9.1.0645: regex: wrong match when searching multi-byte char case-insensitive
Problem:  regex: wrong match when searching multi-byte char
          case-insensitive (diffsetter)
Solution: Apply proper case-folding for characters and search-string

This patch does the following 4 things:

1) When the regexp engine compares two utf-8 codepoints case
   insensitive it may match an adjacent character, because it assumes
   it can step over as many bytes as the pattern contains.

   This however is not necessarily true because of case-folding, a
   multi-byte UTF-8 character can be considered equal to some
   single-byte value.

   Let's consider the pattern 'ſ' and the string 's'. When comparing and
   ignoring case, the single character 's' matches, and since it matches
   Vim will try to step over the match (by the amount of bytes of the
   pattern), assuming that since it matches, the length of both strings is
   the same.

   However in that case, it should only step over the single byte value
   's' by 1 byte and try to start matching after it again. So for the
   backtracking engine we need to ensure:
   * we try to match the correct length for the pattern and the text
   * in case of a match, we step over it correctly

   There is one tricky thing for the backtracing engine. We also need to
   calculate correctly the number of bytes to compare the 2 different
   utf-8 strings s1 and s2. So we will count the number of characters in
   s1 that the byte len specified. Then we count the number of bytes to
   step over the same number of characters in string s2 and then we can
   correctly compare the 2 utf-8 strings.

2) A similar thing can happen for the NFA engine, when skipping to the
   next character to test for a match. We are skipping over the regstart
   pointer, however we do not consider the case that because of
   case-folding we may need to adjust the number of bytes to skip over.
   So this needs to be adjusted in find_match_text() as well.

3) A related issue turned out, when prog->match_text is actually empty.
   In that case we should try to find the next match and skip this
   condition.

4) When comparing characters using collections, we must also apply case
   folding to each character in the collection and not just to the
   current character from the search string.  This doesn't apply to the
   NFA engine, because internally it converts collections to branches
   [abc] -> a\|b\|c

fixes: #14294
closes: #14756

Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-07-30 20:39:18 +02:00
zeertzjq
3074137542
patch 9.1.0438: Wrong Ex command executed when :g uses '?' as delimiter
Problem:  Wrong Ex command executed when :g uses '?' as delimiter and
          pattern contains escaped '?'.
Solution: Don't use "*newp" when it's not allocated (zeertzjq).

closes: #14837

Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-05-24 07:37:36 +02:00
zeertzjq
789679cfc4
patch 9.1.0436: Crash when using '?' as separator for :s
Problem:  Crash when using '?' as separator for :s and pattern contains
          escaped '?'s (after 9.1.0409).
Solution: Always compute startplen. (zeertzjq).

related: neovim/neovim#28935
closes: 14832

Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-05-23 17:41:26 +02:00
John Marriott
d01e699348
patch 9.1.0410: warning about uninitialized variable
Problem:  warning about uninitialized variable
          (Tony Mechelynck, after 9.1.0409)
Solution: initialize variable (John Marriott)

closes: #14754

Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-05-12 09:01:38 +02:00
John Marriott
82792db631
patch 9.1.0409: too many strlen() calls in the regexp engine
Problem:  too many strlen() calls in the regexp engine
Solution: refactor code to retrieve strlen differently, make use
          of bsearch() for getting the character class
          (John Marriott)

closes: #14648

Signed-off-by: John Marriott <basilisk@internode.on.net>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-05-12 00:07:17 +02:00
Christian Brabandt
c97f4d61cd
patch 9.1.0297: Patch 9.1.0296 causes too many issues
Problem:  Patch 9.1.0296 causes too many issues
          (Tony Mechelynck, @chdiza, CI)
Solution: Back out the change for now

Revert "patch 9.1.0296: regexp: engines do not handle case-folding well"

This reverts commit 7a27c108e0509f3255ebdcb6558e896c223e4d23 it causes
issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It
needs more work.

fixes: #14487

Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-04-10 16:22:17 +02:00
Christian Brabandt
7a27c108e0
patch 9.1.0296: regexp: engines do not handle case-folding well
Problem:  Regex engines do not handle case-folding well
Solution: Correctly calculate byte length of characters to skip

When the regexp engine compares two utf-8 codepoints case insensitively
it may match an adjacent character, because it assumes it can step over
as many bytes as the pattern contains.

This however is not necessarily true because of case-folding, a
multi-byte UTF-8 character can be considered equal to some single-byte
value.

Let's consider the pattern 'ſ' and the string 's'. When comparing and
ignoring case, the single character 's' matches, and since it matches
Vim will try to step over the match (by the amount of bytes of the
pattern), assuming that since it matches, the length of both strings is
the same.

However in that case, it should only step over the single byte
value 's' so by 1 byte and try to start matching after it again. So for the
backtracking engine we need to ensure:
- we try to match the correct length for the pattern and the text
- in case of a match, we step over it correctly

The same thing can happen for the NFA engine, when skipping to the next
character to test for a match. We are skipping over the regstart
pointer, however we do not consider the case that because of
case-folding we may need to adjust the number of bytes to skip over. So
this needs to be adjusted in find_match_text() as well.

A related issue turned out, when prog->match_text is actually empty. In
that case we should try to find the next match and skip this condition.

fixes: #14294
closes: #14433

Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-04-09 22:53:19 +02:00
zeertzjq
e71022082d
patch 9.1.0105: Style: typos found
Problem:  Style: typos found
Solution: correct them
          (zeertzjq)

closes: #14023

Signed-off-by: zeertzjq <zeertzjq@outlook.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-02-13 20:32:04 +01:00
Christian Brabandt
7c71db3a58
patch 9.1.0043: ml_get: invalid lnum when :s replaces visual selection
Problem:  ml_get: invalid lnum when :s replaces visual selection
          (@ropery)
Solution: substitute may decrement the number of lines in a buffer,
          so validate, that the bottom lines of the visual selection
          stays within the max buffer line

fixes: #13890
closes: #13892

Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-01-22 20:12:34 +01:00
Christian Brabandt
d2cc51f9a1
patch 9.1.0011: regexp cannot match combining chars in collection
Problem:  regexp cannot match combining chars in collection
Solution: Check for combining characters in regex collections for the
          NFA and BT Regex Engine

Also, while at it, make debug mode work again.

fixes #10286
closes: #12871

Signed-off-by: Christian Brabandt <cb@256bit.org>
2024-01-04 22:54:08 +01:00
Yee Cheng Chin
d25021cf03
patch 9.0.1908: undefined behaviour upper/lower function ptrs
Problem:  undefined behaviour upper/lower function ptrs
Solution: Fix UBSAN error in regexp and simplify upper/lowercase
          modifier code

The implementation of \u / \U / \l / \L modifiers in the substitute
command relies on remembering the state by setting function pointers on
func_all/func_one in the code. The code signature of `fptr_T` is
supposed to return void* (due to C function signatures not being able to
return itself due to type recursion), and the definition of the
functions (e.g. to_Upper) didn't follow this rule, and so the code tries
to cast functions of different signatures, resulting in undefined
behavior error under UBSAN in Clang 17. See #12745.

We could just fix `do_Upper`/etc to just return void*, which would fix
the problem. However, these functions actually do not need to return
anything at all. It used to be the case that there was only one pointer
"func" to store the pointer, which is why the function needs to either
return itself or NULL to indicate whether it's a one time or ongoing
modification. However, c2c355df6f094cdb9e599fd395a78c14486ec697
(7.3.873) already made that obsolete by introducing `func_one` and
`func_all` to store one-time and ongoing operations separately, so these
functions don't actually need to return anything anymore because it's
implicit whether it's a one-time or ongoing operation. Simplify the code
to reflect that.

closes: #13117

Signed-off-by: Christian Brabandt <cb@256bit.org>
Co-authored-by: Yee Cheng Chin <ychin.git@gmail.com>
2023-09-18 19:51:56 +02:00
Christian Brabandt
15cbaae313
patch 9.0.1853: CI error on different signedness in regexp.c
Problem:  CI error on different signedness in regexp.c
          (after patch 9.0.1848)
Solution: Cast strlen() call to int

Signed-off-by: Christian Brabandt <cb@256bit.org>
2023-09-02 22:08:43 +02:00
Christian Brabandt
ced2c7394a
patch 9.0.1848: [security] buffer-overflow in vim_regsub_both()
Problem:  buffer-overflow in vim_regsub_both()
Solution: Check remaining space

Signed-off-by: Christian Brabandt <cb@256bit.org>
2023-09-02 21:37:04 +02:00
RestorerZ
68ebcee023 patch 9.0.1594: some internal error messages are translated
Problem:    Some internal error messages are translated.
Solution:   Consistently do not translate internal error messages.
            (closes #12459)
2023-05-31 17:12:14 +01:00
Bram Moolenaar
ab9a2d884b patch 9.0.1532: crash when expanding "~" in substitute causes very long text
Problem:    Crash when expanding "~" in substitute causes very long text.
Solution:   Limit the text length to MAXCOL.
2023-05-09 21:15:30 +01:00
Dominique Pelle
e764d1b421 patch 9.0.1403: unused variables and functions
Problem:    Unused variables and functions.
Solution:   Delete items and adjust #ifdefs. (Dominique Pellé, closes #12145)
2023-03-12 21:20:59 +00:00
Yegappan Lakshmanan
f97a295cca patch 9.0.1221: code is indented more than necessary
Problem:    Code is indented more than necessary.
Solution:   Use an early return where it makes sense. (Yegappan Lakshmanan,
            closes #11833)
2023-01-18 18:17:48 +00:00
Bram Moolenaar
01105b37a1 patch 9.0.0951: trying every character position for a match is inefficient
Problem:    Trying every character position for a match is inefficient.
Solution:   Use the start position of the match ignoring "\zs".
2022-11-26 11:47:10 +00:00
Bram Moolenaar
c96311b5be patch 9.0.0950: the pattern "\_s\zs" matches at EOL
Problem:    The pattern "\_s\zs" matches at EOL.
Solution:   Make the pattern "\_s\zs" match at the start of the next line.
            (closes #11617)
2022-11-25 21:13:47 +00:00
dundargoc
c57b5bcd22 patch 9.0.0828: various typos
Problem:    Various typos.
Solution:   Correct typos. (closes #11432)
2022-11-02 13:30:51 +00:00
Bram Moolenaar
a4e0b9785e patch 9.0.0634: evaluating "expr" options has more overhead than needed
Problem:    Evaluating "expr" options has more overhead than needed.
Solution:   Use call_simple_func() for 'foldtext', 'includeexpr', 'printexpr',
            "expr" of 'spellsuggest', 'diffexpr', 'patchexpr', 'balloonexpr',
            'formatexpr', 'indentexpr' and 'charconvert'.
2022-10-01 19:43:52 +01:00
Bram Moolenaar
9781d9c005 patch 9.0.0513: may not be able to use a pattern ad the debug prompt
Problem:    May not be able to use a pattern ad the debug prompt.
Solution:   Temporarily disable the timeout. (closes #11164)
2022-09-20 13:51:25 +01:00
zeertzjq
abd58d8aee patch 9.0.0480: cannot use a :def varargs function with substitute()
Problem:    Cannot use a :def varargs function with substitute().
Solution:   Use has_varargs(). (closes #11146)
2022-09-16 16:06:32 +01:00
zeertzjq
48db5dafec patch 9.0.0476: varargs does not work for replacement function of substitute()
Problem:    Varargs does not work for replacement function of substitute().
Solution:   Check the varargs flag of the function. (closes #11142)
2022-09-16 12:10:03 +01:00
Bram Moolenaar
0f61838636 patch 9.0.0282: a nested timout stops the previous timeout
Problem:    A nested timout stops the previous timeout.
Solution:   Ignore any nested timeout.
2022-08-26 21:33:04 +01:00
Bram Moolenaar
f50940531d patch 9.0.0105: illegal memory access when pattern starts with illegal byte
Problem:    Illegal memory access when pattern starts with illegal byte.
Solution:   Do not match a character with an illegal byte.
2022-07-29 16:22:25 +01:00
Bram Moolenaar
7f9969c559 patch 9.0.0067: cannot show virtual text
Problem:    Cannot show virtual text.
Solution:   Initial changes for virtual text support, using text properties.
2022-07-25 18:13:54 +01:00
Bram Moolenaar
32acf1f1a7 patch 9.0.0047: using freed memory with recursive substitute
Problem:    Using freed memory with recursive substitute.
Solution:   Always make a copy for reg_prev_sub.
2022-07-07 22:20:31 +01:00
Bram Moolenaar
abd56da30b patch 8.2.5154: still mentioning version8, some cosmetic issues
Problem:    Still mentioning version8, some cosmetic issues.
Solution:   Prefer mentioning version9, cosmetic improvements.
2022-06-23 20:46:27 +01:00
Bram Moolenaar
44ddf19ec0 patch 8.2.5146: memory leak when substitute expression nests
Problem:    Memory leak when substitute expression nests.
Solution:   Use an array of expression results.
2022-06-21 22:15:25 +01:00
Bram Moolenaar
155f2d1451 patch 8.2.5141: using "volatile int" in a signal handler might be wrong
Problem:    Using "volatile int" in a signal handler might be wrong.
Solution:   Use "volatile sig_atomic_t".
2022-06-20 13:38:33 +01:00
Bram Moolenaar
1f30caff8b patch 8.2.5129: timeout handling is not optimal
Problem:    Timeout handling is not optimal.
Solution:   Avoid setting timeout_flag twice.  Adjust the pointer when
            stopping the regexp timeout.  Adjust variable name.
2022-06-19 14:36:35 +01:00
Bram Moolenaar
616592e081 patch 8.2.5115: search timeout is overrun with some patterns
Problem:    Search timeout is overrun with some patterns.
Solution:   Check for timeout in more places.  Make the flag volatile and
            atomic.  Use assert_inrange() to see what happened.
2022-06-17 15:17:10 +01:00
Paul Ollis
6574577cac patch 8.2.5057: using gettimeofday() for timeout is very inefficient
Problem:    Using gettimeofday() for timeout is very inefficient.
Solution:   Set a platform dependent timer. (Paul Ollis, closes #10505)
2022-06-05 16:55:54 +01:00
Bram Moolenaar
4aaf3e7f4d patch 8.2.5046: vim_regsub() can overwrite the destination
Problem:    vim_regsub() can overwrite the destination.
Solution:   Pass the destination length, give an error when it doesn't fit.
2022-05-30 20:58:55 +01:00
LemonBoy
f3b4895f27 patch 8.2.4870: Vim9: expression in :substitute is not compiled
Problem:    Vim9: expression in :substitute is not compiled.
Solution:   Use an INSTR instruction if possible. (closes #10334)
2022-05-05 13:53:03 +01:00
Bram Moolenaar
56dba60216 patch 8.2.4810: missing changes in one file
Problem:    Missing changes in one file.
Solution:   Also change the struct initializers.
2022-04-23 11:03:58 +01:00
Bram Moolenaar
e8a4c0d91f patch 8.2.4687: "vimgrep /\%v/ *" may cause a crash
Problem:    "vimgrep /\%v/ *" may cause a crash.
Solution:   When compiling the pattern with the old engine fails, restore the
            regprog of the new engine instead of leaving it NULL.
            (closes #10079)
2022-04-04 18:14:34 +01:00
kylo252
ae6f1d8b14 patch 8.2.4402: missing parenthesis may cause unexpected problems
Problem:    Missing parenthesis may cause unexpected problems.
Solution:   Add more parenthesis is macros. (closes #9788)
2022-02-16 19:24:07 +00:00
Bram Moolenaar
424bcae1fb patch 8.2.4273: the EBCDIC support is outdated
Problem:    The EBCDIC support is outdated.
Solution:   Remove the EBCDIC support.
2022-01-31 14:59:41 +00:00
Bram Moolenaar
44a4d947bb patch 8.2.4262: some search tests fail
Problem:    Some search tests fail.
Solution:   Use a better way to reject searching for the Visual area.
2022-01-30 17:17:41 +00:00
Bram Moolenaar
679d66c2d2 patch 8.2.4261: accessing invalid memory in a regular expression
Problem:    Accessing invalid memory when a regular expression checks the
            Visual area while matching in a string.
Solution:   Do not try matching the Visual area in a string.
2022-01-30 16:42:56 +00:00
Bram Moolenaar
d82a47dd04 patch 8.2.4012: error messages are spread out
Problem:    Error messages are spread out.
Solution:   Move the last error messages to errors.h.
2022-01-05 20:24:39 +00:00
Bram Moolenaar
9d00e4a814 patch 8.2.4010: error messages are spread out
Problem:    Error messages are spread out.
Solution:   Move more error messages to errors.h.
2022-01-05 17:49:15 +00:00
Bram Moolenaar
677658ae49 patch 8.2.4008: error messages are spread out
Problem:    Error messages are spread out.
Solution:   Move more error messages to errors.h.
2022-01-05 16:09:06 +00:00