Bash remove token from text file or array (conditionally)

verlager · November 14, 2024, 7:53pm

I want to remove the 2nd token in some lines of file DATA.TXT (34,000 lines).
Remove the second token of each line if it is not an 8-digit integer.

How can I do this in bash? (Probably by ‘exploding’ each line of the text file.) Would I be better off using bash, JS, or PHP for this task, which I have to do each month? Would it be easier and faster to process DATA.TXT as a text file, or to convert it to an array first?
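Bash has no explode function, but read -r -a splits a line on whitespace into an array, which is the closest equivalent. A minimal sketch of the rule as stated (drop token 2 unless it is exactly eight digits), assuming tokens are separated by spaces or tabs; the function name drop_second_token is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines on stdin, writes the edited lines to stdout.
# Drops the 2nd whitespace-separated token unless it is exactly 8 digits.
# Tokens are re-joined with single spaces, so runs of spaces/tabs collapse.
drop_second_token() {
  local line
  local -a tokens
  while IFS= read -r line || [[ -n $line ]]; do
    # "Explode" the line into an array on whitespace
    read -r -a tokens <<< "$line"
    if (( ${#tokens[@]} > 1 )) && ! [[ ${tokens[1]} =~ ^[0-9]{8}$ ]]; then
      # Rebuild the array without element 1
      tokens=("${tokens[0]}" "${tokens[@]:2}")
    fi
    # "${tokens[*]}" joins the elements with the first character of IFS (a space)
    printf '%s\n' "${tokens[*]}"
  done
}
```

Usage: drop_second_token < DATA.TXT > DATA_new.TXT. A per-line bash loop is fine for a monthly job of this size, but it is typically much slower than a single sed or awk pass.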

EXAMPLE:
OLD:
COURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

NEW: “RAYMOND” is removed. Good!
COURTEMANCHE,STEVEN 10004331 07/31/2024 PA 1603*

ANOTHER EX.
OLD:
RACZKA,ALAN V 10001901 12/31/2099 MA 1469*

NEW: “V” is removed. Good!
RACZKA,ALAN 10001901 12/31/2099 MA 1469*

But some of the other tuples already have an 8-digit integer as the second token, in which case there is no need to process that line, e.g. “CAMPBELL,ROBERT”.

DATA.TXT:

RACZKA,ALAN V 10001901 12/31/2099 MA 1469*
CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*
lCOURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

…
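One option is to treat the file as a stream, one line at a time, rather than loading it into an array. A minimal sketch of the same rule as a single awk pass, assuming whitespace-separated fields; drop_second_token_awk is a hypothetical name, and rebuilt lines use single spaces between fields:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk pass over the given file(s), or stdin if none.
# Drops field 2 unless it is exactly 8 digits.
drop_second_token_awk() {
  awk '{
    # Field 2 is exactly 8 digits: print the line unchanged
    if (length($2) == 8 && $2 ~ /^[0-9]+$/) { print; next }
    # Otherwise rebuild the line from field 1 plus fields 3..NF
    out = $1
    for (i = 3; i <= NF; i++) out = out " " $i
    print out
  }' "$@"
}
```

Usage: drop_second_token_awk DATA.TXT > DATA_new.TXT. The length/digits test is used instead of a {8} interval so that older awks without interval support behave the same.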

rpg_digital · November 14, 2024, 10:19pm

Very much a novice with bash, but here’s my attempt:

replace.sh

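#!/usr/bin/env bash
# colours
NORMAL=$(tput sgr0 setaf 15)
PRIMARY=$(tput setaf 10 bold)
SECONDARY=$(tput setaf 5 bold)

function find_file() {
    local name=$1
    local found_file
    # quote $name so the shell does not expand it; keep only the first match,
    # since several matches would break the paths built below
    found_file=$(find . -type f -name "$name" | head -n 1)

    if [[ -z $found_file ]]; then
        return 1
    fi
    echo "$found_file"
}


function find_and_replace() {
    local name=$1
    local file=""

    if [[ -z $name ]]; then
        printf '%sNo filename entered%s\n' "$PRIMARY" "$NORMAL"
        return 1
    fi

    file=$(find_file "$name")

    if [[ -z $file ]]; then
        printf '%sFile %s%s%s does not exist%s\n' "$PRIMARY" "$SECONDARY" "$name" "$PRIMARY" "$NORMAL"
        return 1
    fi
    # works with given examples, but may need tweaking
    # (\s and the i flag are GNU sed extensions)
    sed -E 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "$file" > "$(dirname "$file")/new_$(basename "$file")"
}

# no trailing "return 0": the status of find_and_replace becomes the script's
# status, whether it is sourced (". replace.sh") or run with bash
find_and_replace "$1"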

Sample file example.txt:

RACZKA,ALAN V 10001901 12/31/2099 MA 1469*
CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*
lCOURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

Command line, Linux (WSL):

. replace.sh example.txt

Outputs to file new_example.txt:

RACZKA,ALAN 10001901 12/31/2099 MA 1469*
CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*
lCOURTEMANCHE,STEVEN 10004331 07/31/2024 PA 1603*
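The regex above deletes the first alphabetic word after LAST,FIRST, which matches the examples but never actually checks for the 8-digit number. If the rule should be enforced literally (drop token 2 only when it is not exactly eight digits), a negated address is one way to do it in sed. A sketch using only POSIX ERE features, with strict_sed as a made-up name:

```shell
#!/usr/bin/env bash
# Hypothetical helper. Lines whose 2nd token is exactly 8 digits are skipped
# by the negated address (/re/!); on all other lines the 2nd token is deleted.
# Uses POSIX bracket classes and the {8} interval, no GNU-only \s or i flag.
strict_sed() {
  sed -E '/^[^[:space:]]+[[:space:]]+[0-9]{8}([[:space:]]|$)/!s/^([^[:space:]]+)[[:space:]]+[^[:space:]]+/\1/' "$@"
}
```

Usage: strict_sed example.txt > new_example.txt. On the three sample lines this should give the same output as above; it differs only for second tokens that are neither purely alphabetic nor 8-digit numbers, such as a 9-digit number or a hyphenated middle name.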

verlager · November 14, 2024, 11:14pm

Why mess with colors? This seems silly and unnecessary.

Why do you include this line twice?
local name=$1

Wouldn’t this be simpler:

sed -E -i 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "DATA.TXT"

Thank you, Sir!

rpg_digital · November 14, 2024, 11:42pm

I worked from one of my previous bash files and just opted to leave the colours in; they are only used for the error messages, so no big deal. Feel free to remove them if you think it is silly.

I was presuming you didn’t want to overwrite the existing file, e.g. in case there is an edge case where the regex doesn’t match and replace as intended. The output file doesn’t have to have a new_ prefix. Again, feel free to change it.

It’s an example, and as previously mentioned bash is not my area of expertise. Just thought I would have a go at it.
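On not overwriting the original: one thing to avoid is sed ... DATA.TXT > DATA.TXT, because the shell truncates DATA.TXT before sed reads it, leaving an empty file. A common pattern is to write to a temporary file and only replace the original once sed has succeeded. A sketch, where replace_safely and the .orig backup name are hypothetical, using the GNU sed expression from replace.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace FILE with sed's output only if sed succeeds,
# keeping the untouched original as FILE.orig.
replace_safely() {
  local file=$1 tmp
  # Temporary file in the same directory, so mv is a plain rename
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # GNU sed syntax, as in replace.sh (\s and the i flag are GNU extensions)
  if sed -E 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "$file" > "$tmp"; then
    cp -p "$file" "${file}.orig"   # backup of the original
    mv "$tmp" "$file"
  else
    rm -f "$tmp"                   # leave the original untouched on failure
    return 1
  fi
}
```

Usage: replace_safely DATA.TXT.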

Zensei · November 15, 2024, 10:22pm

What am I missing here?

I tested this

sed -E -i 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "DATA.TXT"

It does not do what is supposed to do. In fact it does not do anything.

m_hutley · November 15, 2024, 10:42pm

~~Well at least for me, executing it on the command line would require not putting quotes around the input filename~~. Your OS may be different. (Mine, Linux Mint 21.3 Cinnamon, borked with: sed: can't read “data.txt”: No such file or directory.) It also doesn’t like fancy quote marks, so make sure your apostrophes are apostrophes and not curly fancy things.

EDIT: Correction. Tripped myself up with my own words. The fancy quote thing is what ate the filename.

Zensei · November 16, 2024, 12:25am

Yes. Mine is Mint too.
However, on my Mac it does not seem to work, but it does work on my Linux machine.

sed -E 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "test.txt" > "test_new.txt"

test.txt

RACZKA,ALAN V 10001901 12/31/2099 MA 1469*
CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*
lCOURTEMANCHE,STEVEN RAYMOND 10004331 07/31/2024 PA 1603*

test_new.txt

RACZKA,ALAN 10001901 12/31/2099 MA 1469*
CAMPBELL,ROBERT 10002826 12/31/2099 MA 1900*
lCOURTEMANCHE,STEVEN 10004331 07/31/2024 PA 1603*
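A likely cause of the Mac difference (not confirmed in this thread): macOS ships BSD sed rather than GNU sed. \s and the trailing i flag are GNU extensions that BSD sed may not support, and BSD sed's -i requires a backup-suffix argument (sed -i '' ... for no backup), so sed -E -i 's/...' DATA.TXT is parsed differently there. A sketch that avoids those features and should behave the same with both; fix_in_place is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the thread's substitution without GNU-only features.
#   \s      -> [[:space:]]
#   i flag  -> explicit [A-Za-z] ranges
#   -i      -> -i.bak (suffix attached), accepted by GNU and BSD sed;
#              the original file is kept as FILE.bak
fix_in_place() {
  sed -E -i.bak 's/([A-Za-z]+,[A-Za-z]+)[[:space:]]+[A-Za-z]+/\1/' "$1"
}
```

Usage: fix_in_place test.txt edits test.txt and leaves the original as test.txt.bak. Attaching the suffix directly to -i (no space) is what lets the same command work with both seds.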

rpg_digital · November 16, 2024, 12:52am

There is more than one flag option for extended regexes, so it may be the -E flag.

Options:
-E
-r
--regexp-extended

Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard (http://austingroupbugs.net/view.php?id=528), so use -E for portability. GNU sed has accepted -E as an undocumented option for years, and *BSD seds have accepted -E for years as well, but scripts that use -E might not port to other older systems. See Extended regular expressions.

Without the extended version I could not use the + operator, and zero-or-many was tripping me up.

I will also add: it was late.

Just to add, my version redirected the source file to a destination file (sourceFile > destinationFile):

sed -E 's/([a-z]+,[a-z]+)\s[a-z]+/\1/gi' "$file" > "$(dirname "$file")/new_$(basename "$file")"
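For comparison, here is the same substitution as a basic regular expression. POSIX BRE has no + operator, so X+ is written XX*, and groups are written \( \) (GNU sed also accepts \+ in BRE as an extension). A sketch that runs both forms on one sample line:

```shell
#!/usr/bin/env bash
# Sketch: one substitution in ERE (-E) and in POSIX BRE (no -E).
line='RACZKA,ALAN V 10001901 12/31/2099 MA 1469*'

# ERE: + and ( ) work unescaped
ere=$(printf '%s\n' "$line" | sed -E 's/([A-Za-z]+,[A-Za-z]+) [A-Za-z]+/\1/')

# BRE: "X+" is spelled "XX*", and groups are written \( ... \)
bre=$(printf '%s\n' "$line" | sed 's/\([A-Za-z][A-Za-z]*,[A-Za-z][A-Za-z]*\) [A-Za-z][A-Za-z]*/\1/')

printf 'ERE: %s\nBRE: %s\n' "$ere" "$bre"
```

Both lines should print RACZKA,ALAN 10001901 12/31/2099 MA 1469*.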

m_hutley · November 16, 2024, 1:39am

without a pipe, sed will replace in-place.

rpg_digital · November 16, 2024, 1:52am

Ah ok, didn’t know that.

Sorry for being pedantic, but it is called a ‘redirection operator’, isn’t it? I have used the pipe ‘|’ operator, and it works like function composition.

m_hutley · November 16, 2024, 2:30am

Well, sed works with either. In general I was using ‘pipe’ to mean “if it has no other defined outflow, it assumes to operate in-place in the file stream” (as sed stands for “stream editor”).
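For later readers: with GNU sed, in-place editing comes from the -i option, not from the absence of a pipe. Without -i, sed writes its result to standard output and leaves the input file unchanged; > redirects that output into a file, and | passes it to another command. A small demonstration on a throwaway file (GNU sed assumed for -i):

```shell
#!/usr/bin/env bash
# Sketch: where sed's output goes in each case (GNU sed assumed for -i).
d=$(mktemp -d)
printf 'RACZKA,ALAN V 10001901\n' > "$d/in.txt"
script='s/([A-Za-z]+,[A-Za-z]+) [A-Za-z]+/\1/'

# 1. No -i, no redirection: result printed to the terminal, in.txt unchanged
sed -E "$script" "$d/in.txt"                  # prints: RACZKA,ALAN 10001901
cat "$d/in.txt"                               # prints: RACZKA,ALAN V 10001901

# 2. Redirection (>): result written to out.txt, in.txt still unchanged
sed -E "$script" "$d/in.txt" > "$d/out.txt"

# 3. Pipe (|): result becomes the input of the next command
sed -E "$script" "$d/in.txt" | tr ',' ' '    # prints: RACZKA ALAN 10001901

# 4. -i: only now is in.txt itself rewritten
sed -E -i "$script" "$d/in.txt"
cat "$d/in.txt"                               # prints: RACZKA,ALAN 10001901
```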

system · December 16, 2024, 2:30am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
