Using sed to parse HTML with line breaks?

I have got the biggest headache!

I hope I am posting this to the right forum… I am trying to parse an HTML file using sed in a bash script. My problem is that I need to strip everything up to a particular tag, but the “everything” in this case includes tabs, spaces and line breaks.
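Because the “everything” in question spans several lines, one option is to have sed hold the whole file in its pattern space at once. The sketch below assumes GNU sed: its `-z` (`--null-data`) option splits input on NUL bytes instead of newlines, so an ordinary HTML file (which contains no NUL bytes) is read as a single record, and in GNU sed `.` also matches a newline. The function name `strip_whole_file` is hypothetical, and BSD/macOS sed has no `-z`, so this is not portable.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU sed: with -z the input is split on NUL bytes, so a
# normal text file is read as ONE record and '.*' can run across newlines.
# strip_whole_file is a hypothetical helper name used for illustration.
strip_whole_file() {
    # Greedy '.*' removes everything from the start of the input through the
    # LAST occurrence of the marker tag; the newline after the tag is kept.
    # Reads the named file(s), or standard input if no file is given.
    sed -z 's/.*<div id="box-container-inner" style="position: relative">//' "$@"
}
```

Usage would be `strip_whole_file output.txt > stripped.txt`; GNU sed also accepts `-i` together with `-z` for in-place editing. Because `.*` is greedy, the tag should appear only once in the file for this to stop at the intended place.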

The HTML that I am working on is:

 
    <div id="filters">
        <input type="hidden" name="filter-search-previous" value="">
        <input type="text" class="form-control" name="filter-search" placeholder="Search..." value="">

        <select class="form-control" name="filter-sort">
            <option >Newest products first</option>
            <option >Sort by expiry ascending</option>
            <option >Sort by expiry descending</option>
            <option >Sort by price (lowest first)</option>
            <option >Sort by price (highest first)</option>
        </select>
 
 
        <button type="button" class="btn btn-warning" style="top: -1px; position: relative; margin-left: 10px;" onclick="refresh_products(0)">Refresh / Search</button>
 
    </div>
    
    <div id="box-container-inner" style="position: relative">

        <div class="box" id="product_1208750">
        <div class="img-container">

Now, what I want to do is to strip out everything up to and including this tag:

    <div id="box-container-inner" style="position: relative">

I have tried the following commands:

sed -i 's/.*<div id="box-container-inner" style="position: relative">//' output.txt

sed -i 's/[\s\S]*<div id="box-container-inner" style="position: relative">//' output.txt

The first one just deletes that one line, and the second one doesn’t do anything. Can someone help me with this? Either that or send me a bottle of aspirin! Thanks :slight_smile:
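Both attempts run into the same limitation: sed reads its input one line at a time, so a regex in `s///` only ever sees the current line and `.*` cannot reach back across newlines. In addition, `[\s\S]` is a JavaScript/PCRE idiom; in a POSIX bracket expression the backslash is not special, so it only matches the literal characters `\`, `s` and `S`. A line-oriented alternative that should work with any POSIX sed is an address range: `1,/regex/d` deletes line 1 through the first line matching the regex. This assumes the tag sits on a line of its own, as it does in the sample, because whole lines are deleted. The name `strip_through_marker` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete every line from the top of the input through the first
# line that contains the marker tag. An address range works on whole lines,
# so it sidesteps the fact that a sed regex never sees past a newline.
# strip_through_marker is a hypothetical helper name used for illustration.
strip_through_marker() {
    # '1,/re/d' deletes lines 1 .. (first line matching re), inclusive.
    # Edge cases: if the marker never appears, the range runs to the end of
    # the input and everything is deleted; if the marker were on line 1,
    # POSIX sed would search for the range end from line 2 onward
    # (GNU sed's '0,/re/d' avoids that).
    sed '1,/<div id="box-container-inner" style="position: relative">/d' "$@"
}
```

Usage would be `strip_through_marker output.txt > stripped.txt`; with GNU sed, the same expression can be run as `sed -i '1,/…/d' output.txt` to edit in place, as in the commands above.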

@rebeltaz,

Welcome to these forums. :sunglasses:

I do not know if you have chosen the correct category, but someone here will sort that out.

coothead


lol… yeah, I always have trouble with category selection! Thanks :slight_smile:

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.