midfavila/awk - awk - SDF GIT Society

Author	SHA1	Message	Date
ozan yigit	aa8731ea81	PR #112, #116, #117	2021-07-25 14:37:03 -04:00
ozan yigit	3913329120	Merge branch 'fix-readrec' of https://github.com/mpinjr/awk into staging	2021-07-24 15:10:25 -04:00
Miguel Pineiro Jr	92f9e8a9be	Fix readrec's definition of a record I botched readrec's definition of a record when I implemented RS regular expression support. This is the relevant hunk from the old diff: ``` - return c == EOF && rr == buf ? 0 : 1; + isrec = *buf || !feof(inf); + dprintf( ("readrec saw <%s>, returns %d\n", buf, isrec) ); + return isrec; ``` Problem #1: Unlike testing with EOF, `*buf || !feof(inf)` is blind to stdio errors. This can cause an infinite loop in which each iteration fabricates an empty record. The following demonstration uses standard terminal access control policy to produce a persistent error condition. Note that the "i/o error" message does not come from readrec(). It's produced much later by closeall() at shutdown. ``` $ trap '' SIGTTIN && awk 'END {print NR}' & [1] 33517 $ # After fg, type ^D $ fg trap '' SIGTTIN && awk 'END {print NR}' 13847376 awk: i/o error occurred on /dev/stdin input record number 13847376, file source line number 1 ``` Each time awk tries to read the terminal from the background, while ignoring SIGTTIN, the read fails with EIO, getc returns EOF, the stream's end-of-file indicator remains clear, and `!feof` erroneously promotes the empty buffer to an empty record. So long as the error persists, the stream's position does not advance and end-of-file is never set. Problem #2: When RS is a regex, `*buf || !feof(inf)` can't see an empty record's terminator at the end of a stream. ``` $ echo a | awk 1 RS='a\n' $ ``` That pipeline should have found one empty record and printed a blank line, but `*buf || !feof(inf)` considers reaching the end of the stream the conclusion of a fruitless search. That's only correct when the terminator is a single character, because a regex RS search can set the end-of-file marker even when it succeeds. The Fix: `isrec` must be 0 iff no record is found. The correct definition of "no record" is a failure to find a record terminator and a failure to find any data (possibly from a final, unterminated record). Conceptually, for any RS: ``` isrec = (noTERM && noDATA) ? 0 : 1 ``` noDATA is an expression that's true if `buf` is empty, false otherwise. When RS is null or a single character, noTERM is an expression that is true when the sought-after character is not found, false otherwise. Since the search for a single character can only end with that character or EOF, noTERM is `c == EOF`. ``` isrec = (c == EOF && rr == buf) ? 0 : 1 ``` When RS is a regular expression, noTERM is an expression that is true if a match for RS is not found, false otherwise. This is simply the inverse of the result of the function that conducts the search, `!found`. ``` isrec = (found == 0 && *buf == '\0') ? 0 : 1 ```	2021-04-23 20:08:58 -04:00
Miguel Pineiro Jr	feaf62d159	Fix regular expression RS ^-anchoring RS ^-anchoring needs to know if it's reading the first record of a file. Unfortunately, innew, the flag that the main i/o loop uses to track this, didn't make it from NetBSD unscathed. This commit restores the last of the wayward lines. Without this fix, when reading the first record of an input file named on the command line, the regular expression machinery will be misconfigured, precluding a successful match. Relevant commits: 1. `643a5a3dad` (Initial import) 2. `ffee7780fe` (Restoring innew)	2021-04-16 20:31:36 -04:00
ozan s. yigit	1fd5fa38cc	Fix a decision bug with trailing stuff in lib.c:is_valid_number after dec 18 changes. updated FIXES, adjusted version date.	2021-01-06 18:37:48 -05:00
Arnold D. Robbins	8909e00b57	Inf and NaN values fixed and printing improved. "This time for sure!"	2020-12-18 11:57:48 +02:00
Michael Forney	38e525fb7b	Include <strings.h> for strcasecmp (#99) Though some implementations include this header indirectly through string.h by default, the POSIX header that declares strcasecmp is strings.h[0]. [0] https://pubs.opengroup.org/onlinepubs/9699919799/functions/strcasecmp.html	2020-12-15 14:46:30 +02:00
Arnold Robbins	cc9e9b68d1	Rework floating point conversions. (#98)	2020-12-08 08:05:22 +02:00
Todd C. Miller	feb247a852	Don't print extra newlines on error before awk starts parsing. (#97) If awk prints an error message while compile_time is still set to ERROR_PRINTING, don't try to print the context since there is none. This can happen due to a problem with, e.g., unknown command line options.	2020-12-03 19:30:36 +02:00
Arnold D. Robbins	3b42cfaf73	Make it compile with g++.	2020-10-13 20:52:43 +03:00
Arnold D. Robbins	07f0438423	Move exclusively to bison as parser generator.	2020-07-30 17:12:45 +03:00
Todd C. Miller	292d39f7b7	Rename dprintf to DPRINTF and use C99 cpp variadic arguments. (#82) POSIX specifies a dprintf function that operates on an fd instead of a stdio stream. Using upper case for macros is more idiomatic too. We no longer need to use an extra set of parentheses for debugging printf statements.	2020-06-25 21:32:34 +03:00
Arnold D. Robbins	cef5180110	Fix Issue 78 and apply PR 80.	2020-06-12 14:30:03 +03:00
Arnold D. Robbins	754cf93645	In fldbld(), check that inputFS is set.	2020-06-05 12:25:15 +03:00
zoulasc	ffee7780fe	3 more fixes (#75) * LC_NUMERIC radix issue. According to https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html The period character is the character recognized in processing awk programs. Make it so that during output we also print the period character, since this is what other awk implementations do, and it makes sense from an interoperability point of view. * print "T.builtin" in the error message * Fix backslash continuation line handling. * Keep track of RS processing so we apply the regex properly only once per record.	2020-02-28 13:23:54 +02:00
zoulasc	c2c8ecbedf	More minor fixes: (#73) * More minor fixes: - add missing initializers - fix sign-compare warnings - fix shadowed variable	2020-02-19 20:44:49 +02:00
zoulasc	94e4c04561	argument parsing cleanups, dynamic program file allocation, fpe error enhancement. (#72) * - enhance fpe handler to print the error type - cleanup argument parsing - dynamically allocate program filename array * bison uses enums now, not #define's, make it work with that. * We need to use either the enums or the defines but not both. This is because bison -y will create both enums and #defines, while bison without -y produces only the enums, and byacc produces just #defines. * fix indentation * Set the tokentype when we have a match in the scan, and reset it later when we decide that the match was bad. Fixes nbyacc. * - don't use pattern rules for portability - try to move both flavors of generated names for portability * Amend tests for the new error messages	2020-02-18 21:20:27 +02:00
Michael Forney	69325710b1	Use MB_LEN_MAX instead of MB_CUR_MAX to avoid VLA (#70) MB_CUR_MAX is the maximum number of bytes in a multibyte character for the current locale, and might not be a constant expression. MB_LEN_MAX is the maximum number of bytes in a multibyte character for any locale, and always expands to a constant expression.	2020-01-31 08:23:34 +02:00
zoulasc	6a8770929d	Small fixes (#68) * sprinkle const, static * account for lineno in unput * Add an EMPTY string that is used when a non-const empty string is needed. * make inputFS static and dynamically allocated * Simplify and in the process avoid -Wwritable-strings * make fs const to avoid -Wwritable-strings	2020-01-24 11:11:59 +02:00
Arnold D. Robbins	108224b484	Convert variables to bool and enum.	2019-11-10 21:19:18 +02:00
Arnold D. Robbins	c879fbf013	From Ori Bernstein, ori@eigenstate.org, for FS="" in multibyte locale.	2019-11-08 14:40:18 +02:00
zoulasc	0d8778bbbb	more cleanups (#55) * More cleanups: - sprinkle const - add a macro (setptr) that cheats const to temporarily NUL terminate strings - remove casts from allocations - use strdup instead of strlen+strcpy - use x = malloc(sizeof(x)) instead of x = malloc(sizeof(type of x)) - add -Wcast-qual (and casts through uintptr_t in the two macros we cheat (xfree, setptr)). * More cleanups: - add const - use bounded sscanf - use snprintf instead of sprintf * More cleanup: - use snprintf/strlcat instead of sprintf/strcat - use %j instead of %l since we are casting to intmax_t/uintmax_t * Merge the 3 copies of the code that evaluated array strings with separators and convert them to keep track of lengths and use memcpy instead of strcat.	2019-10-25 10:59:09 -04:00
zoulasc	6589208eaf	More cleanups: (#53) - sprinkle const - add a macro (setptr) that cheats const to temporarily NUL terminate strings - remove casts from allocations - use strdup instead of strlen+strcpy - use x = malloc(sizeof(x)) instead of x = malloc(sizeof(type of x)) - add -Wcast-qual (and casts through uintptr_t in the two macros we cheat (xfree, setptr)).	2019-10-24 09:40:15 -04:00
Arnold D. Robbins	7cae39dfa5	Make RS as regexp work without ifdef. Add doc, bump version.	2019-10-06 22:34:20 +03:00
Arnold D. Robbins	643a5a3dad	Add RS as regex code, ifdefed-out, from NetBSD.	2019-09-10 12:19:48 +03:00
Arnold D. Robbins	795a06b58c	Remove trailing whitespace on lines in all files.	2019-07-28 05:51:52 -06:00
M. Warner Losh	9310d452c9	Apply the following from FreeBSD / OpenBSD: 323965 | imp | 2017-09-23 23:04:06 -0600 (Sat, 23 Sep 2017) | 8 lines Don't display empty error context. Context extraction didn't handle this case and showed uninitialized memory. Obtained from: OpenBSD lib.c 1.21 Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D12379	2019-07-16 22:23:19 +03:00
M. Warner Losh	0939e3392b	Apply r323963 from FreeBSD (pulling in the fix from OpenBSD) | Fix uninitialized variable | | echo | awk 'BEGIN {i=$1; print i}' prints a boatload of stack | garbage. NUL terminate the memory returned from malloc to prevent it. | | Obtained from: OpenBSD run.c 1.40 | Sponsored by: Netflix | Differential Revision: https://reviews.freebsd.org/D12379	2019-07-16 22:23:02 +03:00
Cody Peter Mello	b463680594	Update field-splitting behaviour to match POSIX definition	2019-06-14 14:54:11 -07:00
Arnold D. Robbins	4189ef5d58	Fix Issue #38 - don't require non-= after = in cmd line assignment.	2019-05-29 21:04:18 +03:00
onetrueawk	79f008e853	Merge branch 'master' into nf-self-assign	2019-01-21 14:20:28 -05:00
onetrueawk	10da937340	Merge branch 'master' into subsep	2019-01-21 14:17:57 -05:00
Cody Peter Mello	7580235939	Fix initial "fields" buffer size	2018-11-12 10:34:19 -08:00
Cody Peter Mello	179536a516	Print an error message for negative NF values	2018-09-25 21:19:49 -07:00
Cody Peter Mello	52566c0aa4	Handle numeric FS, RS, OFS, and ORS values	2018-09-23 17:35:45 -07:00
Arnold D. Robbins	32093f5bbf	Fix multiple long-standing bugs, improve test suite.	2018-08-22 20:40:26 +03:00
Brian Kernighan	3ed9e245db	set baseline so Arnold can send pull request	2018-08-15 10:45:03 -04:00
Brian Kernighan	87b94932e6	initial commit for github	2012-12-22 10:35:39 -05:00

38 Commits
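
The record definition described in commit 92f9e8a9be can be sketched in isolation as below. This is only an illustration of the `(noTERM && noDATA) ? 0 : 1` rule, not the code from lib.c: the function names `isrec_char_rs` and `isrec_regex_rs` and their parameters are hypothetical, chosen so the two cases can be read side by side.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/*
 * Hypothetical helper (not from lib.c): RS is null or a single character.
 * c       - the last value returned by getc() when the search stopped
 * buf_len - bytes collected so far (the commit's `rr - buf`)
 * The search can only end on the terminator or on EOF, so noTERM is
 * simply `c == EOF`. EOF here also covers a stdio read error.
 */
static bool isrec_char_rs(int c, size_t buf_len)
{
    bool no_term = (c == EOF);
    bool no_data = (buf_len == 0);
    return !(no_term && no_data);
}

/*
 * Hypothetical helper (not from lib.c): RS is a regular expression.
 * found - whether the RS search matched a terminator
 * buf   - NUL-terminated record text collected by the search
 * A match counts as a record even when the buffer is empty and the
 * stream has already reached end-of-file.
 */
static bool isrec_regex_rs(bool found, const char *buf)
{
    bool no_term = !found;
    bool no_data = (*buf == '\0');
    return !(no_term && no_data);
}
```

Under these assumptions, `isrec_regex_rs(true, "")` reports a record, which corresponds to the `echo a | awk 1 RS='a\n'` case, while `isrec_char_rs(EOF, 0)` reports none, which is what ends the loop in the persistent-error case.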