Each month, I process a new ~1.7MB ascii text file of ~35,000 lines as records. To purge the bad records, I use bash to sort and isolate the bad records which will be at the top of the file “temp.txt” and then delete them.
In this example, the first three tuples are bad.
temp.txt:
, 16309523 07/31/2025 KS 1171* 1220*
, 30270460 09/30/2025 WA 966* 830*
, 31352869 07/31/2024 NY 116/15
SMITH,ISMAEL A 16682841 10/31/2024 TX 729* S
AARANIG,ASHRITH L 15804806 02/29/2020 VA 855* 301*
AARE,RUTHVIK 30413910 02/29/2024 NC 681*
AARNDELL MARTINEZ,SAMUEL 31477094 10/31/2024 NY 241/08
#!/bin/bash
# My solution is to bash while loop over "temp.txt" and increment a counter by 1, for each bad record. Records with length <20 chars are of no use. Might be 0 bad records or 5000.
counter=0;
while read line; do
Y=${#line}; echo $Y; echo $line;
if [ "${Y}" < 20 ]; then
(($counter++)); echo $counter;
else
break;
fi
done <temp.txt
echo $counter;
Then I want to delete top lines in temp.txt based on $counter integer. If $counter = 3, delete the top 3 lines. I avoid processing all contents (~35,000 records) in “temp.txt”, thanks to the sorted file.
It seem that my coding of the “while read line” is bogus. Returns $Y = 0, often.
sed can delete the target records:
sed -i ‘1,3d’ temp.txt
This deletes first 3 lines from temp.txt. But how to apply $counter to sed command?
Any one have any ideas?