midfavila/awk - awk - SDF GIT Society

Author	SHA1	Message	Date
ozan yigit	3913329120	Merge branch 'fix-readrec' of https://github.com/mpinjr/awk into staging	2021-07-24 15:10:25 -04:00
ozan yigit	30fb6ef0da	Merge branch 'fix-RS' of https://github.com/mpinjr/awk into staging	2021-07-24 15:06:21 -04:00
Miguel Pineiro Jr	92f9e8a9be	Fix readrec's definition of a record I botched readrec's definition of a record, when I implemented RS regular expression support. This is the relevant hunk from the old diff: ``` - return c == EOF && rr == buf ? 0 : 1; + isrec = buf || !feof(inf); + dprintf( ("readrec saw <%s>, returns %d\n", buf, isrec) ); + return isrec; ``` Problem #1: Unlike testing with EOF, `buf || !feof(inf)` is blind to stdio errors. This can cause an infinite loop whose each iteration fabricates an empty record. The following demonstration uses standard terminal access control policy to produce a persistent error condition. Note that the "i/o error" message does not come from readrec(). It's produced much later by closeall() at shutdown. ``` $ trap '' SIGTTIN && awk 'END {print NR}' & [1] 33517 $ # After fg, type ^D $ fg trap '' SIGTTIN && awk 'END {print NR}' 13847376 awk: i/o error occurred on /dev/stdin input record number 13847376, file source line number 1 ``` Each time awk tries to read the terminal from the background, while ignoring SIGTTIN, the read fails with EIO, getc returns EOF, the stream's end-of-file indicator remains clear, and `!feof` erroneously promotes the empty buffer to an empty record. So long as the error persists, the stream's position does not advance and end-of-file is never set. Problem #2: When RS is a regex, `buf || !feof(inf)` can't see an empty record's terminator at the end of a stream. ``` $ echo a | awk 1 RS='a\n' $ ``` That pipeline should have found one empty record and printed a blank line, but `buf || !feof(inf)` considers reaching the end of the stream the conclusion of a fruitless search. That's only correct when the terminator is a single character, because a regex RS search can set the end-of-file marker even when it succeeds. The Fix `isrec` must be 0 iff no record is found. 
The correct definition of "no record" is a failure to find a record terminator and a failure to find any data (possibly from a final, unterminated record). Conceptually, for any RS: ``` isrec = (noTERM && noDATA) ? 0 : 1 ``` noDATA is an expression that's true if `buf` is empty, false otherwise. When RS is null or a single character, noTERM is an expression that is true when the sought after character is not found, false otherwise. Since the search for a single character can only end with that character or EOF, noTERM is `c == EOF`. ``` isrec = (c == EOF && rr == buf) ? 0 : 1 ``` When RS is a regular expression: noTERM is an expression that is true if a match for RS is not found, false otherwise. This is simply the inverse of the result of the function that conducts the search, `!found`. ``` isrec = (found == 0 && *buf == '\0') ? 0 : 1 ```	2021-04-23 20:08:58 -04:00
Miguel Pineiro Jr	feaf62d159	Fix regular expression RS ^-anchoring RS ^-anchoring needs to know if it's reading the first record of a file. Unfortunately, innew, the flag that the main i/o loop uses to track this, didn't make it from NetBSD unscathed. This commit restores the last of the wayward lines. Without this fix, when reading the first record of an input file named on the command line, the regular expression machinery will be misconfigured, precluding a successful match. Relevant commits: 1. `643a5a3dad` (Initial import) 2. `ffee7780fe` (Restoring innew)	2021-04-16 20:31:36 -04:00
Todd C. Miller	d54b703cae	Fix size computation in replace_repeat() for special_case REPEAT_WITH_Q. This resulted in the NUL terminator being written to the end of the buffer which was not the same as the end of the string. That in turn caused garbage bytes from malloc() to be processed. Also change the NUL termination to be less error prone by writing the NUL immediately after the last byte copied. Reproducible with the following under valgrind: echo '#!/usr/bin/awk' | awk \ '/^#! ?\/.\/[a-z]{0,2}awk/ {sub(/^#! ?\/.\/[a-z]{0,2}awk/,"#! awk"); print}'	2021-03-02 12:58:50 -07:00
Arnold D. Robbins	c0f4e97e45	Fix compiling with g++.	2021-02-15 20:33:15 +02:00
ozan s. yigit	178f660b5a	Change T.errmsg print to file fail test. We cannot have a test that destroys eg. /etc/passwd if someone runs it as root.	2021-01-10 15:24:37 -05:00
ozan s. yigit	1fd5fa38cc	Fix a decision bug with trailing stuff in lib.c:is_valid_number after dec 18 changes. updated FIXES, adjusted version date.	2021-01-06 18:37:48 -05:00
ozan s. yigit	7d1848cfa6	Merge branch 'staging' for README.md	2020-12-25 16:55:02 -05:00
ozan s. yigit	fdc0388333	updated: new maintainer	2020-12-25 16:53:55 -05:00
Arnold D. Robbins	8909e00b57	Inf and NaN values fixed and printing improved. "This time for sure!"	2020-12-18 11:57:48 +02:00
Arnold D. Robbins	982a574e32	Update FIXES and version.	2020-12-15 14:49:18 +02:00
Michael Forney	38e525fb7b	Include <strings.h> for strcasecmp (#99) Though some implementations include this header indirectly through string.h by default, the POSIX header that declares strcasecmp is strings.h[0]. [0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/strcasecmp.html	2020-12-15 14:46:30 +02:00
Arnold D. Robbins	6535bd6c35	Update FIXES and version in main.c.	2020-12-08 09:20:58 +02:00
Arnold Robbins	cc9e9b68d1	Rework floating point conversions. (#98)	2020-12-08 08:05:22 +02:00
Arnold D. Robbins	e508d2861c	Update version and FIXES.	2020-12-03 19:33:11 +02:00
Todd C. Miller	feb247a852	Don't print extra newlines on error before awk starts parsing. (#97) If awk prints an error message while compile_time is still set to ERROR_PRINTING, don't try to print the context since there is none. This can happen due to a problem with, e.g., unknown command line options.	2020-12-03 19:30:36 +02:00
Arnold D. Robbins	a2a41a8e35	Add .TF macro to man page. Closes Issue #96.	2020-11-24 19:14:26 +02:00
Arnold D. Robbins	3b42cfaf73	Make it compile with g++.	2020-10-13 20:52:43 +03:00
Arnold D. Robbins	9804285af0	Additional fixes for DJGPP.	2020-08-16 18:48:05 +03:00
Arnold D. Robbins	9c63cb6ccd	Update FIXES and version in main.c.	2020-08-07 13:15:17 +03:00
Chris	b785141019	printf: The argument p shall be a pointer to void. (#93)	2020-08-07 13:10:20 +03:00
Arnold D. Robbins	1b3984634f	Fix Issue #92; see FIXES.	2020-08-04 10:02:26 +03:00
Arnold D. Robbins	9b80a7c137	Update version and FIXES.	2020-07-30 17:15:58 +03:00
Arnold D. Robbins	07f0438423	Move exclusively to bison as parser generator.	2020-07-30 17:12:45 +03:00
Todd C. Miller	453ce8642b	Avoid accessing pfile[] out of bounds on syntax error at EOF. (#90) When awk reaches EOF parsing the program file, curpfile is incremented. However, cursource() uses curpfile without checking it against npfile which can cause an out of bounds access of pfile[] if there is a syntax error at the end of the program file.	2020-07-29 21:31:29 +03:00
Tim van der Molen	e22bb7c625	Fix the T.errmsg test (#91) Co-authored-by: Tim van der Molen <tim@kariliq.nl>	2020-07-29 21:29:46 +03:00
Todd C. Miller	22ee26b925	Cast to uschar when storing a char in an int that will be used as an index (#88) * Cast to uschar when storing a char in an int that will be used as an index. Fixes a heap underflow when the input char has the high bit set and FS is a regex. * Add regress test for underflow when RS is a regex and input is 8-bit.	2020-07-29 21:27:45 +03:00
Todd C. Miller	b82b649aa6	Avoid using stdio streams after they have been closed. (#89) * In closeall(), skip stdin and flush std{err,out} instead of closing. Otherwise awk could fclose(stdin) twice (it may appear more than once) and closing stderr means awk cannot report errors with other streams. For example, "awk 'BEGIN { getline < "-" }' < /dev/null" will call fclose(stdin) twice, with undefined results. * If closefile() is called on std{in,out,err}, freopen() /dev/null instead. Otherwise, awk will continue trying to perform I/O on a closed stdio stream, the behavior of which is undefined.	2020-07-27 10:03:58 +03:00
Arnold D. Robbins	2a4146ec30	Add a note about low-level maintenance.	2020-07-02 21:39:56 +03:00
Arnold D. Robbins	b2554a9e3d	Add regression script for bugs-fixed directory.	2020-07-02 21:35:06 +03:00
Tim van der Molen	ee5b49bb33	Fix regression with changed SUBSEP in subscript (#86) Commit `0d8778bbbb` reintroduced a regression that was fixed in commit `97a4b7ed21`. The length of SUBSEP needs to be rechecked after calling execute(), in case SUBSEP itself has been changed. Co-authored-by: Tim van der Molen <tim@kariliq.nl>	2020-07-02 21:22:15 +03:00
Tim van der Molen	cc19af1308	Fix concatenation regression (#85) The optimization in commit `1d6ddfd9c0` reintroduced the regression that was fixed in commit `e26237434f`. Co-authored-by: Tim van der Molen <tim@kariliq.nl>	2020-07-02 21:21:10 +03:00
Arnold D. Robbins	f232de85f6	Update FIXES and date in main.c.	2020-06-25 21:36:24 +03:00
Arnold D. Robbins	0f25df0619	Merge branch 'staging'	2020-06-25 21:34:50 +03:00
awkfan77	e5a89e63fe	Fix onetrueawk#83 (#84)	2020-06-25 21:33:52 +03:00
Todd C. Miller	292d39f7b7	Rename dprintf to DPRINTF and use C99 cpp variadic arguments. (#82) POSIX specifies a dprintf function that operates on an fd instead of a stdio stream. Using upper case for macros is more idiomatic too. We no longer need to use an extra set of parentheses for debugging printf statements.	2020-06-25 21:32:34 +03:00
Arnold D. Robbins	cef5180110	Fix Issue 78 and apply PR 80.	2020-06-12 14:30:03 +03:00
Todd C. Miller	b2de1c4ee7	Clear errno before using errcheck() to avoid spurious errors. (#80) The errcheck() function treats an errno of ERANGE or EDOM as something to report, so make sure errno is set to zero before invoking a function to check so that a previous such errno value won't result in a false positive. This could happen simply due to input line fields that looked enough like floating-point input to trigger ERANGE. Reported by Jordan Geoghegan, fix from Philip Guenther.	2020-06-12 14:16:12 +03:00
Arnold D. Robbins	754cf93645	In fldbld(), check that inputFS is set.	2020-06-05 12:25:15 +03:00
Arnold D. Robbins	1107437dce	Fix test for use of noreturn.	2020-05-15 15:12:15 +03:00
Arnold D. Robbins	93e5dd87a1	Fix noreturn for old compilers.	2020-04-16 20:56:49 +03:00
Arnold D. Robbins	c3d8f9c500	Update FIXES and version date.	2020-04-05 21:14:46 +03:00
awkfan77	bb538fe67e	Replace __attribute__((__noreturn__)) with _Noreturn. (#77) * Replace __attribute__((__noreturn__)) with _Noreturn. * Change _Noreturn to noreturn and #include <stdnoreturn.h>	2020-04-05 21:10:52 +03:00
Arnold D. Robbins	2017c2e6ea	Fixes from Christos Zoulas.	2020-02-28 13:47:42 +02:00
Arnold D. Robbins	92b775b3ec	Merge branch 'master' into staging	2020-02-28 13:24:19 +02:00
zoulasc	ffee7780fe	3 more fixes (#75) * LC_NUMERIC radix issue. According to https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html The period character is the character recognized in processing awk programs. Make it so that during output we also print the period character, since this is what other awk implementations do, and it makes sense from an interoperability point of view. * print "T.builtin" in the error message * Fix backslash continuation line handling. * Keep track of RS processing so we apply the regex properly only once per record.	2020-02-28 13:23:54 +02:00
enh-google	7b245a0266	Fix hwasan global overflow. (#76 ) * Fix hwasan global overflow. Crash found with https://source.android.com/devices/tech/debug/hwasan but also detectable by regular ASan. Here's an ASan crash: ==215690==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55d90f8da140 at pc 0x55d90f8b7503 bp 0x7ffd3dae6100 sp 0x7ffd3dae60f8 READ of size 4 at 0x55d90f8da140 thread T0 #0 0x55d90f8b7502 in word /tmp/awk/lex.c:496 #1 0x55d90f8b939f in yylex /tmp/awk/lex.c:191 #2 0x55d90f894ab9 in yyparse /tmp/awk/awkgram.tab.c:2366 #3 0x55d90f89edc2 in main /tmp/awk/main.c:216 #4 0x7ff263a78bba in __libc_start_main ../csu/libc-start.c:308 #5 0x55d90f8945a9 in _start (/tmp/awk/a.out+0x115a9) 0x55d90f8da141 is located 0 bytes to the right of global variable 'infunc' defined in 'awkgram.y:35:6' (0x55d90f8da140) of size 1 SUMMARY: AddressSanitizer: global-buffer-overflow /tmp/awk/lex.c:496 in word Shadow bytes around the buggy address: 0x0abba1f133d0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0abba1f133e0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0abba1f133f0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0abba1f13400: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00 0x0abba1f13410: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 =>0x0abba1f13420: 04 f9 f9 f9 f9 f9 f9 f9[01]f9 f9 f9 f9 f9 f9 f9 0x0abba1f13430: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 0x0abba1f13440: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x0abba1f13450: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x0abba1f13460: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 0x0abba1f13470: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 And here's the stack trace from hwasan: Stack Trace: RELADDR FUNCTION FILE:LINE 00000000000168d4 word external/one-true-awk/lex.c:496:18 000000000002d1ec yyparse y.tab.c:2460:16 000000000001c82c main external/one-true-awk/main.c:179:2 00000000000b41a0 __libc_init bionic/libc/bionic/libc_init_dynamic.cpp:151:8 As it says, we're doing a 
4-byte read from a 1-byte global. `infunc` is declared as an int but defined as a bool. Signed-off-by: Evgenii Stepanov <eugenis@google.com> * Add ASan cflags to makefile. They're not used by default, but this way they're easy to hand next time they're wanted.	2020-02-28 13:18:29 +02:00
Arnold D. Robbins	91eaf7f701	Small fix to the man page.	2020-02-20 19:53:39 +02:00
Arnold D. Robbins	e92c8e4d0e	Update FIXES, version.	2020-02-19 20:47:40 +02:00
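
The record definition given in commit 92f9e8a9be for a single-character RS, `isrec = (c == EOF && rr == buf) ? 0 : 1`, can be illustrated with a small C sketch. This is not the readrec() from the awk sources: the function name `read_record`, its parameters, and the fixed-size buffer handling are assumptions made for illustration. It assumes `size` is at least 1 and that the buffer is large enough for the record; bytes that do not fit are dropped.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not the real readrec): read one record ending
 * in the single character `sep` from `inf` into `buf`.
 * Returns 1 if a record was found, 0 otherwise.
 */
int read_record(FILE *inf, int sep, char *buf, size_t size)
{
    char *rr = buf;   /* next free position in buf */
    int c;

    /* Stop at the terminator, at end-of-file, or on a read error. */
    while ((c = getc(inf)) != EOF && c != sep) {
        if ((size_t)(rr - buf) + 1 < size)   /* keep room for the NUL */
            *rr++ = (char)c;
    }
    *rr = '\0';

    /*
     * noTERM is (c == EOF): the terminator was not found.
     * noDATA is (rr == buf): nothing was stored.
     * Only when both hold is there no record.
     */
    return (c == EOF && rr == buf) ? 0 : 1;
}
```

Because getc() returns EOF on a read error as well as at end-of-file, a persistent error with an empty buffer yields 0 here instead of an endless run of empty records, which is the behavior the commit message describes for the restored test.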


202 Commits