Bug reports on any software


When a bug tracker is inside the exclusive walled gardens of MS GitHub or GitLab.com, and you cannot or will not enter, where do you file your bug report? Here, of course. This is a refuge where you can report bugs that are otherwise unreportable due to technical or ethical constraints.

⚠ Of course, there are no guarantees that a report will be seen by anyone relevant. Hopefully some kind souls will volunteer to proxy the reports.

founded 3 years ago

MODERATORS

freedomPusher@sopuli.xyz

grep/pdfgrep’s inability to match across lines (sopuli.xyz)

submitted 5 months ago* (last edited 5 months ago) by freedomPusher@sopuli.xyz to c/bugs@sopuli.xyz


Some will regard this as an enhancement request. To each his own, but IMO *grep has always had a huge deficiency when processing natural languages, because it cannot match across line breaks. This hurts pdfgrep especially, because most PDF docs carry a payload of natural language.

If I need to search for “the.orange.menace” (dots are 1-char wildcards), of course I want to be told of cases like this:

A court whereby no one is above the law found the orange  
menace guilty on 34 counts of fraud.
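To make the case concrete, here is a minimal Python sketch of the behavior being asked for. It is not how grep or pdfgrep work, and `search_across_lines` is a hypothetical helper name. The assumption is simply that every run of whitespace, newlines included, counts as a single space before the pattern is applied.

```python
import re

def search_across_lines(pattern, text):
    """Return every match of `pattern` in `text`, ignoring line breaks.

    Assumption: each run of whitespace (spaces, tabs, newlines) is
    collapsed to one space before the regex is applied.
    """
    flattened = re.sub(r"\s+", " ", text)
    return [m.group(0) for m in re.finditer(pattern, flattened)]

sample = (
    "A court whereby no one is above the law found the orange  \n"
    "menace guilty on 34 counts of fraud."
)

# Line-by-line search (what grep does by default) finds nothing,
# because "orange" and "menace" sit on different lines.
line_hits = [ln for ln in sample.splitlines()
             if re.search(r"the.orange.menace", ln)]
print(line_hits)   # []

# Searching the whitespace-normalized text finds the phrase.
print(search_across_lines(r"the.orange.menace", sample))   # ['the orange menace']
```

The trade-off is that the match no longer maps to a single line, so a real tool would have to decide what to print: the sentence, the paragraph, or the original line range.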

When processing a natural language, a sentence terminator is almost always a more sensible boundary than a line break. There’s probably no command older than grep that’s still in use today, so it’s bizarre that it has not evolved much. In the 90s there was a LexisNexis search tool which was far superior for natural language queries. E.g. (IIRC):

foo w/s bar :: matches if “foo” appears within the same sentence as “bar”
foo w/4 bar :: matches if “foo” appears within four words of “bar”
foo pre/5 bar :: matches if “foo” appears before “bar”, within five words
foo w/p bar :: matches if “foo” appears within the same paragraph as “bar”
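The operators above can be roughly sketched in Python. This is an illustration of the idea only, not the LexisNexis implementation. All function names are hypothetical, and the semantics are my assumptions: "within n words" means the word positions differ by at most n, a sentence ends at `.`, `!` or `?` followed by whitespace, and paragraphs are separated by blank lines.

```python
import re

def _words(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())

def _positions(words, term):
    return [i for i, w in enumerate(words) if w == term.lower()]

def within_words(text, a, b, n):
    """foo w/n bar: some `a` is at most n word positions from some `b`."""
    words = _words(text)
    return any(abs(i - j) <= n
               for i in _positions(words, a)
               for j in _positions(words, b))

def pre_words(text, a, b, n):
    """foo pre/n bar: some `a` precedes some `b` by at most n word positions."""
    words = _words(text)
    return any(0 < j - i <= n
               for i in _positions(words, a)
               for j in _positions(words, b))

def _both_in_any(chunks, a, b):
    return any(a.lower() in ws and b.lower() in ws
               for ws in map(_words, chunks))

def within_sentence(text, a, b):
    """foo w/s bar: naive split after ., ! or ? followed by whitespace."""
    flat = re.sub(r"\s+", " ", text)
    return _both_in_any(re.split(r"(?<=[.!?])\s+", flat), a, b)

def within_paragraph(text, a, b):
    """foo w/p bar: paragraphs are assumed to be separated by blank lines."""
    return _both_in_any(re.split(r"\n\s*\n", text), a, b)

sample = (
    "A court found the orange\n"
    "menace guilty. The appeal failed.\n"
    "\n"
    "A second paragraph mentions fraud."
)

print(within_words(sample, "orange", "menace", 4))    # True
print(pre_words(sample, "menace", "orange", 5))       # False
print(within_sentence(sample, "court", "guilty"))     # True
print(within_sentence(sample, "court", "appeal"))     # False
print(within_paragraph(sample, "court", "appeal"))    # True
print(within_paragraph(sample, "court", "fraud"))     # False
```

Note that none of these checks care where the line breaks fall inside a sentence, which is the whole point. The sentence splitter is deliberately naive and will misfire on abbreviations such as "e.g.".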

Newlines as record separators are probably sensible for everything other than natural language. But for natural language, grep is a hack.


[–] freedomPusher@sopuli.xyz 0 points 5 months ago* (last edited 5 months ago)

It’s not a bug if it works as designed.

What you claim here is that software cannot have a defective design. Of course it can. Design defects are the hardest to correct.

I’d also accept “it used to do this and it doesn’t any more and not on purpose”.

This is conventional wisdom. Past behavior is no more an indication of correctness than of defectiveness. GREP’s purpose was to process natural language. A line feed is not a sensible terminator in that application. For 50 years people have just lived with the limitation, worked around it, or adapted to single-token searches. It does not cease to be a defect because workarounds were available.

that doesn’t make it a bug if it was never designed in to the program.

The original design was implemented on a system that was extremely resource-poor by today’s standards, where 64k was a HUGE amount of space. It was built to function under limitations that no longer exist. I would say the design is not defective so long as your target platform is a PDP-11 from the 1970s. Otherwise the design should evolve along with the tasks and machines.