I have got the biggest headache!
I hope I am posting this to the right forum… I am trying to parse an HTML file using sed in a bash script. My problem is that I need to strip everything up to a particular tag, but the “everything” in this case includes tabs, spaces and newlines, so it spans several lines.
The HTML that I am working on is:
<div id="filters">
<input type="hidden" name="filter-search-previous" value="">
<input type="text" class="form-control" name="filter-search" placeholder="Search..." value="">
<select class="form-control" name="filter-sort">
<option >Newest products first</option>
<option >Sort by expiry ascending</option>
<option >Sort by expiry descending</option>
<option >Sort by price (lowest first)</option>
<option >Sort by price (highest first)</option>
</select>
<button type="button" class="btn btn-warning" style="top: -1px; position: relative; margin-left: 10px;" onclick="refresh_products(0)">Refresh / Search</button>
</div>
<div id="box-container-inner" style="position: relative">
<div class="box" id="product_1208750">
<div class="img-container">
Now, what I want to do is strip out everything up to, and including, the <div id="box-container-inner" style="position: relative"> tag. I have tried these two commands:
sed -i 's/.*<div id="box-container-inner" style="position: relative">//' output.txt
sed -i 's/[\s\S]*<div id="box-container-inner" style="position: relative">//' output.txt
The first one just deletes that one line, and the second one doesn’t do anything. Can someone help me with this? Either that or send me a bottle of aspirin! Thanks
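For anyone landing here with the same problem, a note on why the attempts behave this way and a sketch of one possible approach. sed reads its input one line at a time, so a pattern such as `.*` is only ever matched against a single line and cannot reach back across the newlines above the tag. Instead of a substitution, an address range can delete whole lines from the top of the file through the first line that matches. The example below assumes the tag sits on a line of its own and that nothing after it on that line needs to be kept; the temporary sample file and its trimmed-down HTML are made up for illustration and stand in for the real output.txt.

```shell
# Hypothetical sample file standing in for the real output.txt
sample=$(mktemp)
cat > "$sample" <<'EOF'
<div id="filters">
<select class="form-control" name="filter-sort">
</select>
</div>
<div id="box-container-inner" style="position: relative">
<div class="box" id="product_1208750">
<div class="img-container">
EOF

# Delete every line from line 1 through the first line that matches the
# pattern (the matching line itself is deleted as well).
result=$(sed '1,/<div id="box-container-inner"/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Once the output looks right, the same expression could be run with -i against output.txt, as in the commands above, to edit the file in place.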