this post was submitted on 03 Jun 2024
Bug reports on any software
When a bug tracker is inside the exclusive walled-gardens of MS Github or Gitlab.com, and you cannot or will not enter, where do you file your bug report? Here, of course. This is a refuge where you can report bugs that are otherwise unreportable due to technical or ethical constraints.
⚠ Of course, there are no guarantees it will be seen by anyone relevant. Hopefully some kind souls will volunteer to proxy the reports.
TIL there are people who (try to) use grep for natural language.
Whatever the case, I'd rather not modify (GNU) grep for this purpose. If grep can be hacked to do this (`tr '\n' ' ' | grep 'the orange menace'` or something maybe only a little more sophisticated?) then more power to you. Otherwise, making a separate tool is probably better.
If all you're advocating for is allowing grep to use some other character as a delimiter, I might be able to get behind something like bash's `$IFS` or awk's `FS` variable (maybe). But I couldn't get behind anything backwards-incompatible.
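For anyone who wants to try the flatten-then-search hack mentioned above, here is a minimal sketch. The helper name `has_phrase` is made up for illustration and is not part of grep; it only reports whether the phrase occurs, using standard `tr` and `grep` options.

```shell
# Hypothetical helper (not part of grep): succeed if the phrase in $1 occurs
# anywhere on stdin, even when it is split across line breaks.
has_phrase() {
    # tr turns every newline into a space and squeezes repeated spaces,
    # so the whole input reaches grep as one long line.
    # -F treats the phrase as a fixed string; -q only sets the exit status.
    tr -s '\n' ' ' | grep -q -F -- "$1"
}

# Plain grep misses the phrase because it straddles two lines ...
printf 'beware the orange\nmenace\n' | grep -q -F 'the orange menace' || echo 'grep: no match'
# ... while the flattened input matches.
printf 'beware the orange\nmenace\n' | has_phrase 'the orange menace' && echo 'has_phrase: match'
```

The obvious cost is that everything becomes one line, so line numbers and per-line context are gone; this only answers "is the phrase in there somewhere".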
Meanwhile, pdfgrep isn't associated with any maintainers of grep, is it? I've never used it and I don't know if you're saying "because pdfgrep is good at handling natural language, grep should be too" or "grep should be good at handling natural language and pdfgrep even more so, but neither is." Either way, I don't follow how pdfgrep is relevant to discussions about grep (unless they are related, but I'd be surprised if that were the case).
Oh, and this is 100% feature/enhancement request territory. Not a bug report in any sense.
Of course. grep has an immeasurable number of scripts dependent on it worldwide, going back 50 years, and it’s among Debian’s 23 essential packages.
Changing grep’s default behavior now would bring the world down. Dams would shatter. Nuclear power plants would melt down. Traffic lights would go berserk. It would be like a Die Hard 4 “fire sale”. Planes would fall out of the sky. Skynet would come online and wipe us all out. It would have to be a separate option.
The very first task grep was created for is specifically natural language input. Search “Federalist Papers grep”. There’s also a short documentary about this out in the wild somewhere but I don’t have any link handy.
This is conventional wisdom coming from a viewpoint that simultaneously misses grep’s intended purpose.
But now that the defect has been rooted in for ~50 years, perhaps it is fair enough to leave grep alone. For me it depends on how lean the improvement could be. Bloating grep too much would not be favorable, but substantial replication of code between two different tools is also unfavorable. Small is good, but Swiss army knives of tools also bring great value if they can be lean and internally simple.
Not at all. They both have the same problem. But this same limitation in pdfgrep is a nuisance in more situations because PDFs are proportionally more likely to contain natural language.
They have the same expression language and roughly the same options. pdfgrep is most likely not much more than a grep wrapper that extracts the text from the PDF first.
Not a defect. What is it with people equating "doesn't do this one harebrained thing I want it to" with "broken?"
It's not a bug if it works as designed. Unless somewhere some official documentation says (some specific version of) grep supports what you're advocating for but the actual grep command doesn't, it's not a defect. It's a feature request.
To qualify as a "bug", I'd also accept "it used to do this and it doesn't any more and not on purpose".
Even if (say, GNU) grep maintainers decided they'd make grep support what you're going for, there'd still be design to do. Should it be a flag? Should the regex syntax be extended to support this? Should we add an environment variable? Some combination of the three? Something else? If we go with the flag, what should it be called and what should be its semantic meaning? Should it take an argument? Etc, etc, etc.
Even assuming this feature is necessary to fulfill "grep's intended purpose" (and I'm far from convinced it is), that doesn't make it a bug if it was never designed into the program.
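On the flag question raised above, it may help to note that GNU grep already has one record-terminator switch: `-z` (`--null-data`) ends records at NUL bytes instead of newlines. On ordinary text with no NUL bytes, the whole input becomes a single record, so a pattern can span line breaks. A minimal sketch, assuming GNU grep; the helper name `spans_lines` is made up for illustration:

```shell
# Hypothetical helper: succeed if stdin matches the extended regex in $1,
# where the match is allowed to run across line breaks. Assumes GNU grep.
spans_lines() {
    # -z: records end at NUL rather than newline, so a newline inside the
    #     text is just another character the regex can match.
    # -E: extended regex syntax.
    # -q: no output, only the exit status.
    grep -q -z -E -- "$1"
}

# [[:space:]]+ matches spaces, tabs and newlines between the words.
printf 'beware the orange\nmenace\n' |
    spans_lines 'the[[:space:]]+orange[[:space:]]+menace' &&
    echo 'match across the line break'
```

This is not a sentence-aware mode. It only swaps the record terminator, and without `-q` the "matching line" printed would be the entire input, so it does not settle the design questions; it just shows that a flag of this kind has precedent.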
What you claim here is that software cannot have a defective design. Of course software can have design defects. These are the hardest to correct.
This is conventional wisdom. Past behavior is no more an indication of correctness than of defectiveness. grep’s purpose was to process natural language. A line feed is not a sensible terminator in that application. For 50 years people have just lived with the limitation, worked around it, or adapted to single-token searches. It does not cease to be a defect because workarounds were available.
The original design was implemented on an extremely resource-poor system by today’s standards, where 64k was a HUGE amount of space. It was built to function under limitations that no longer exist. I would say the design is not defective so long as your target platform is a PDP-11 from the 1970s. Otherwise the design should evolve along with the tasks and machines.