Date: 2024-11-14
I want to remove the 2nd token in some lines of file DATA.TXT (34,000 lines)

Remove the second token of each line if it is not an 8-digit integer.
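A minimal Python sketch of this rule follows. It assumes tokens are separated by whitespace and that "8-digit integer" means exactly eight ASCII digits. The function name `fix_line` is hypothetical. It uses `[0-9]{8}` instead of `\d{8}` because in Python `\d` also matches non-ASCII digits.

```python
import re

# Assumption: an "8-digit integer" is exactly eight ASCII digits, nothing else.
EIGHT_DIGITS = re.compile(r"[0-9]{8}")


def fix_line(line: str) -> str:
    """Hypothetical helper: drop the 2nd whitespace-separated token
    unless it is an 8-digit integer. Expects a line without its newline."""
    tokens = line.split()
    if len(tokens) >= 2 and not EIGHT_DIGITS.fullmatch(tokens[1]):
        del tokens[1]              # remove the offending 2nd token
        return " ".join(tokens)    # rebuild with single spaces
    return line                    # already OK: return it untouched


print(fix_line("COURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*"))
# COURTEMANCHE,STEVEN 10004331 07/31/2024 PA 1603*
```

A changed line is rebuilt with single spaces between tokens, and an unchanged line is returned as-is. Because the rule drops only the second token, a line with two extra tokens before the 8-digit number (for example a middle name plus a suffix) would still keep one of them.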

How can I do this in bash? (Probably by splitting each line into tokens, like PHP's explode().) Would I be better off using bash, JS, or PHP for this task, which I have to do each month? Would it be easier and faster to process DATA.TXT line by line as a text file, or to load it into an array first?
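On the text-file-versus-array question: the rule only ever looks at one line at a time, so the file can be streamed line by line without building an array first (34,000 short lines would also fit in memory easily, so either approach works). A shell tool such as awk could apply the same rule; the sketch below uses Python instead. The names `fix_lines`, `process_file`, `fix_data.py` and `DATA_FIXED.TXT` are hypothetical, and UTF-8 encoding is an assumption about DATA.TXT.

```python
import re
import sys
from typing import Iterable, Iterator

# Assumption: exactly eight ASCII digits counts as an "8-digit integer".
EIGHT_DIGITS = re.compile(r"[0-9]{8}")


def fix_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical generator: yield each line with its 2nd token dropped
    unless that token is an 8-digit integer. Each output ends with '\n'."""
    for raw in lines:
        line = raw.rstrip("\n")
        tokens = line.split()
        if len(tokens) >= 2 and not EIGHT_DIGITS.fullmatch(tokens[1]):
            del tokens[1]
            line = " ".join(tokens)
        yield line + "\n"


def process_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path one line at a time (no whole-file array)."""
    # Assumption: DATA.TXT is UTF-8 (plain ASCII also qualifies).
    # Text mode translates Windows "\r\n" endings to "\n" on read.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(fix_lines(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical monthly run:
    #   python fix_data.py DATA.TXT DATA_FIXED.TXT
    process_file(sys.argv[1], sys.argv[2])
```

Writing to a separate output file keeps the original DATA.TXT intact in case a month's data contains an unexpected line format.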

EXAMPLE:

OLD:

COURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

NEW: “RAYMOND” is removed. Good!

COURTEMANCHE,STEVEN 10004331 07/31/2024 PA 1603*

ANOTHER EXAMPLE:

OLD:

RACZKA,ALAN V 10001901 12/31/2099 MA 1469*

NEW: “V” is removed. Good!

RACZKA,ALAN 10001901 12/31/2099 MA 1469*

However, some other lines already have an 8-digit integer as their second token, in which case the line needs no processing. For example: "CAMPBELL,ROBERT"

DATA.TXT:

RACZKA,ALAN V 10001901 12/31/2099 MA 1469*

CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*

COURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

…
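Before overwriting anything each month, a dry run that only reports which lines would change can be a useful check. The sketch below applies the same rule (again assuming whitespace-separated tokens and exactly eight ASCII digits) to the three sample lines shown above; the remaining "…" lines are not reproduced. The name `changed_lines` is hypothetical.

```python
import re

EIGHT_DIGITS = re.compile(r"[0-9]{8}")

# The three sample lines from DATA.TXT above (the "…" remainder is omitted).
SAMPLE = [
    "RACZKA,ALAN V 10001901 12/31/2099 MA 1469*",
    "CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*",
    "COURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*",
]


def changed_lines(lines):
    """Hypothetical dry run: return (line_number, old, new) for every line
    whose 2nd token is not an 8-digit integer; nothing is written."""
    report = []
    for n, line in enumerate(lines, start=1):
        tokens = line.split()
        if len(tokens) >= 2 and not EIGHT_DIGITS.fullmatch(tokens[1]):
            new = " ".join(tokens[:1] + tokens[2:])
            report.append((n, line, new))
    return report


for n, old, new in changed_lines(SAMPLE):
    print(f"line {n}: dropped {old.split()[1]!r}")
# line 1: dropped 'V'
# line 3: dropped 'RAYMOND'
```

The CAMPBELL,ROBERT line is not reported because its second token, 10002826, is already an 8-digit integer.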