Commit Graph

20 Commits

Author SHA1 Message Date
ozan yigit
3913329120 Merge branch 'fix-readrec' of https://github.com/mpinjr/awk into staging 2021-07-24 15:10:25 -04:00
Miguel Pineiro Jr
92f9e8a9be Fix readrec's definition of a record
I botched readrec's definition of a record, when I implemented
RS regular expression support. This is the relevant hunk from the
old diff:

```
-	return c == EOF && rr == buf ? 0 : 1;
+	isrec = *buf || !feof(inf);
+	   dprintf( ("readrec saw <%s>, returns %d\n", buf, isrec) );
+	return isrec;
```

Problem #1

Unlike a test against EOF, `*buf || !feof(inf)` is blind to stdio
errors. This can cause an infinite loop in which every iteration
fabricates an empty record.

The following demonstration uses standard terminal access control
policy to produce a persistent error condition. Note that the "i/o
error" message does not come from readrec(). It's produced much later
by closeall() at shutdown.

```
$ trap '' SIGTTIN && awk 'END {print NR}' &
[1] 33517
$ # After fg, type ^D
$ fg
trap '' SIGTTIN && awk 'END {print NR}'
13847376
awk: i/o error occurred on /dev/stdin
 input record number 13847376, file
 source line number 1
```

Each time awk tries to read the terminal from the background,
while ignoring SIGTTIN, the read fails with EIO, getc returns EOF,
the stream's end-of-file indicator remains clear, and `!feof`
erroneously promotes the empty buffer to an empty record.  So long
as the error persists, the stream's position does not advance and
end-of-file is never set.
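
The stream state described above can be reproduced without a terminal. What follows is a minimal sketch, not code from awk. It assumes a POSIX system where `/dev/null` exists and where reading from a write-only stream fails and sets the error indicator. `struct read_probe` and `probe_failed_read` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of what stdio reports after a failed read. */
struct read_probe {
	int opened;     /* 1 if the stream could be opened */
	int got_eof;    /* 1 if getc returned EOF */
	int eof_set;    /* 1 if the end-of-file indicator is set */
	int err_set;    /* 1 if the error indicator is set */
	int feof_only;  /* value of `*buf || !feof(inf)` with an empty buf */
};

/*
 * Provoke a read error by reading from a stream opened write-only
 * (assumption: POSIX behavior, where the read fails with EBADF).
 */
static struct read_probe probe_failed_read(void)
{
	struct read_probe p = { 0, 0, 0, 0, 0 };
	char buf[1] = "";                     /* nothing was read */
	FILE *inf = fopen("/dev/null", "w");

	if (inf == NULL)
		return p;
	p.opened = 1;
	p.got_eof = (getc(inf) == EOF);
	p.eof_set = (feof(inf) != 0);
	p.err_set = (ferror(inf) != 0);
	p.feof_only = (*buf || !feof(inf));   /* the test from the old diff */
	fclose(inf);
	return p;
}
```

Under those assumptions, getc returns EOF while only the error indicator is set, so the feof-based test still reports a record for an empty buffer.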

Problem #2

When RS is a regex, `*buf || !feof(inf)` can't see an empty record's
terminator at the end of a stream.

```
$ echo a | awk 1 RS='a\n'
$
```

That pipeline should have found one empty record and printed a blank
line, but `*buf || !feof(inf)` considers reaching the end of the
stream the conclusion of a fruitless search. That's only correct when
the terminator is a single character, because a regex RS search can
set the end-of-file marker even when it succeeds.

The Fix

`isrec` must be 0 **iff** no record is found. The correct definition
of "no record" is a failure to find a record terminator and a
failure to find any data (possibly from a final, unterminated
record). Conceptually, for any RS:

```
isrec = (noTERM && noDATA) ? 0 : 1
```

noDATA is an expression that's true if `buf` is empty, false otherwise.

When RS is null or a single character, noTERM is an expression
that is true when the sought-after character is not found, false
otherwise. Since the search for a single character can only end with
that character or EOF, noTERM is `c == EOF`.

```
isrec = (c == EOF && rr == buf) ? 0 : 1
```

When RS is a regular expression, noTERM is an expression that is
true if a match for RS is not found, false otherwise. This is simply
the inverse of the result of the function that conducts the search,
`!found`.

```
isrec = (found == 0 && *buf == '\0') ? 0 : 1
```
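
The two definitions can be exercised in isolation. This is a minimal sketch, not the code in readrec(): `isrec_char` and `isrec_regex` are hypothetical wrappers around the expressions above, taking `c`, `rr`, `buf`, and `found` as plain arguments.

```c
#include <assert.h>
#include <stdio.h>

/*
 * RS is null or a single character.
 * c  : last value returned by getc (the terminator, or EOF)
 * buf: start of the record buffer
 * rr : one past the last byte stored in buf
 */
static int isrec_char(int c, const char *buf, const char *rr)
{
	return (c == EOF && rr == buf) ? 0 : 1;
}

/*
 * RS is a regular expression.
 * found: nonzero if the search matched a record terminator
 * buf  : NUL-terminated record text (may be empty)
 */
static int isrec_regex(int found, const char *buf)
{
	return (found == 0 && *buf == '\0') ? 0 : 1;
}
```

For example, `isrec_char(EOF, buf, buf)` is 0 whether EOF came from end-of-file or from a read error (Problem #1), and `isrec_regex(1, "")` is 1, so an empty record whose terminator ends the stream is still counted (Problem #2).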
2021-04-23 20:08:58 -04:00
Miguel Pineiro Jr
feaf62d159 Fix regular expression RS ^-anchoring
RS ^-anchoring needs to know if it's reading the first record of a file.
Unfortunately, innew, the flag that the main i/o loop uses to track
this, didn't make it from NetBSD unscathed. This commit restores the
last of the wayward lines.

Without this fix, when reading the first record of an input file named
on the command line, the regular expression machinery will be
misconfigured, precluding a successful match.

Relevant commits:
1. 643a5a3dad (Initial import)
2. ffee7780fe (Restoring innew)
2021-04-16 20:31:36 -04:00
ozan s. yigit
178f660b5a Change T.errmsg print to file fail test.
We cannot have a test that destroys, e.g., /etc/passwd if someone
runs it as root.
2021-01-10 15:24:37 -05:00
Tim van der Molen
e22bb7c625
Fix the T.errmsg test (#91)
Co-authored-by: Tim van der Molen <tim@kariliq.nl>
2020-07-29 21:29:46 +03:00
zoulasc
ffee7780fe
3 more fixes (#75)
* LC_NUMERIC radix issue.

According to https://pubs.opengroup.org/onlinepubs/7990989775/xcu/awk.html,
the period character is the radix character recognized in processing awk
programs.  Make it so that during output we also print the period
character, since this is what other awk implementations do, and it
makes sense from an interoperability point of view.

* print "T.builtin" in the error message

* Fix backslash continuation line handling.

* Keep track of RS processing so we apply the regex properly only once
per record.
2020-02-28 13:23:54 +02:00
zoulasc
94e4c04561
argument parsing cleanups, dynamic program file allocation, fpe error enhancement. (#72)
* - enhance fpe handler to print the error type
- cleanup argument parsing
- dynamically allocate program filename array

* bison uses enums now, not #define's, make it work with that.

* We need to use either the enums or the defines but not both. This
is because bison -y will create both enums and #defines, while bison
without -y produces only the enums, and byacc produces just #defines.

* fix indentation

* Set the tokentype when we have a match in the scan, and reset it later
when we decide that the match was bad. Fixes nbyacc.

* - don't use pattern rules for portability
- try to move both flavors of generated names for portability

* Amend tests for the new error messages
2020-02-18 21:20:27 +02:00
Arnold D. Robbins
5068d20ef6 Restore zoulas fixes, step 1. 2020-02-06 22:27:31 +02:00
Arnold D. Robbins
d7a7e4d147 Revert zoulas changes until we can keep tests passing. 2020-02-06 22:08:20 +02:00
zoulasc
110bdc6b3e
misc fixes (#69)
* Add a test for german case folding.

* Add a function to copy a string with a larger allocation
  (to be used by the case folding routines)
* Add printf attributes to the printf-like functions and fix one format
  warning
* Cleanup the tempfree macro
* make more functions static
* rename fp to frp (FRame Pointer) to avoid shadowing with fp (File Pointer).
* add more const
* fix indent in UPLUS case
* add locale-aware case folding
* make nfiles size_t
* fix bugs in file closing:
    - compare fclose to EOF and pclose to -1
    - use nfiles instead of FOPEN_MAX in closeall
    - don't close files we did not open (0,1,2) fpurge/fflush instead

* - use NUL instead of 0 for char comparisons
- add ISWS() macro
- use continue; instead of ;

* Check for existence of the german locale before using it.

* Add missing parentheses, thanks Arnold.
2020-02-06 21:25:36 +02:00
Arnold D. Robbins
78c79c06d0 Fix a{0}, update tests. 2020-01-31 08:40:11 +02:00
Martijn Dekker
fed1a562c3 Make I/O errors fatal instead of mere warnings (#63)
An input/output error indicates a fatal condition, even if it
occurs when closing a file. Awk should not return success on I/O
error, but treat I/O errors as it already treats write errors.

Test case:

$ (trap '' PIPE; awk 'BEGIN { print "hi"; }'; echo "E $?" >&2) | :
awk: i/o error occurred while closing /dev/stdout
 source line number 1
E 2

The test case pipes a line into a dummy command that reads no
input, with SIGPIPE ignored so we rely on awk's own I/O checking.
No write error is detected, because the pipe is buffered; the
broken pipe is only detected as an I/O error on closing stdout.

Before this commit, "E 0" was printed (indicating status 0/success)
because an I/O error merely produced a warning. A shell script
was unable to detect the I/O error using the exit status.
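
The buffering effect described in the test case can be sketched in C. This is an illustration under assumptions, not awk's closing code: it assumes a POSIX system, uses a pipe with no reader in place of the shell pipeline, and `struct close_probe` and `probe_broken_pipe` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical record of where the broken pipe is noticed. */
struct close_probe {
	int write_failed;  /* 1 if fputs reported an error, -1 if setup failed */
	int close_failed;  /* 1 if fclose reported an error, -1 if setup failed */
};

static struct close_probe probe_broken_pipe(void)
{
	struct close_probe p = { -1, -1 };
	int fds[2];
	FILE *out;

	/* Ignore SIGPIPE process-wide, like `trap '' PIPE` in the test case. */
	signal(SIGPIPE, SIG_IGN);
	if (pipe(fds) != 0)
		return p;
	close(fds[0]);                    /* nobody will read the pipe */
	out = fdopen(fds[1], "w");
	if (out == NULL) {
		close(fds[1]);
		return p;
	}
	/* The short line only lands in the stdio buffer, so this succeeds. */
	p.write_failed = (fputs("hi\n", out) == EOF);
	/* fclose flushes the buffer; the write fails and fclose returns EOF. */
	p.close_failed = (fclose(out) == EOF);
	return p;
}
```

Under those assumptions the write appears to succeed and only the fclose result reveals the failure, which is why the close-time error has to affect the exit status.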
2020-01-17 14:02:57 +02:00
Martijn Dekker
2976507cc1 rename T.concat to T.csconcat to avoid case-insensitive conflict (#64)
On case-insensitive file systems (e.g., macOS), T.concat and
t.concat are the same file, so these conflicted. This commit
renames T.concat to avoid the conflict.
2020-01-10 12:13:26 +02:00
Arnold D. Robbins
c7eeb57210 Fix merging of concatenated string constants. 2020-01-05 21:18:36 +02:00
Arnold D. Robbins
7db55ba13f Bug fix in interval expressions. 2019-12-27 12:03:35 +02:00
zoulasc
a96aebbbd6 Fix printf format conversions. (#59)
Further simplify printf % parsing by eating the length specifiers
during the copy phase, and substitute 'j' when finalizing the format.
Add some more tests for this.
2019-12-11 09:17:34 +02:00
Arnold D. Robbins
0e1bebcc09 Small fixes in the test suite. 2019-11-08 14:36:37 +02:00
Arnold D. Robbins
147521b831 Revise testdir/T.split per PR #42. 2019-07-16 20:50:23 +03:00
Arnold D. Robbins
891690942a Update T.split to match code changes. 2019-07-16 20:37:13 +03:00
Arnold D. Robbins
d6c466c367 Extract testdir. 2019-06-23 03:13:57 -06:00