A Very Silly Method to Strip Off HTML and Get the URL

I know this is a very silly way of stripping HTML tags from page source to retrieve the JPG URLs, but it works, so I am recording it here for my own use.


1. Prepare a file with all the html code and save it as txt.txt.

<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_YF1tNfQVN8w/TN66Xjc_9VI/AAAAAAABjUw/y_ImZpICYEE/s1600/53.jpg"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 266px; height: 400px;" src="http://2.bp.blogspot.com/_YF1tNfQVN8w/TN66Xjc_9VI/AAAAAAABjUw/y_ImZpICYEE/s400/53.jpg" alt="" id="BLOGGER_PHOTO_ID_5539069505528919378" border="0"></a>

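2. Split the file into records at every href=" so that every record after the first begins with a URL (a multi-character RS is an extension that gawk and mawk support; POSIX leaves it unspecified):

awk 'BEGIN { RS="href=\"" } { print $1 }' txt.txt > txt2.txt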

http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 400px; height: 266px;" src="http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s400/1.jpg" alt="" id="BLOGGER_PHOTO_ID_5539067054739232626" border="0">


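3. Keep only the text before the first double quote on each line, which is the URL. FS is set in BEGIN so that it already applies to the first line:

awk 'BEGIN { FS="\"" } { print $1 }' txt2.txt > txt3.txt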

http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg
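
Steps 2 and 3 could probably be folded into a single awk pass. The following is only a sketch under my own assumptions, not something I have run against a whole blog export: it splits each line on double quotes and prints whatever follows a field ending in href=. The file names sample.html and urls.txt and the shortened tag are made up for the demonstration; only the URLs come from the example above.

```shell
# Build a one-line sample shaped like the step 1 input (hypothetical file name).
cat > sample.html <<'EOF'
<a href="http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg"><img src="http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s400/1.jpg" alt="" border="0" /></a>
EOF

# Split on double quotes; the field right after one ending in "href=" is the link target.
# The src= URL is skipped because its preceding field does not end in "href=".
awk -F'"' '{ for (i = 1; i < NF; i++) if ($i ~ /href=$/) print $(i + 1) }' sample.html > urls.txt

cat urls.txt
```

This uses a single-character field separator, so it should not depend on the multi-character RS extension, and it also copes with several links on one line.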

4. Then use this script to generate the picture links that I want.

#!/bin/bash
# Wrap each URL from txt3.txt in a lightbox link and append the markup to txt4.txt.
while read -r inputline
do
echo '<p><a title="title" rel="lightbox" href="' >> txt4.txt
echo "${inputline}" >> txt4.txt
echo '"><img src="' >> txt4.txt
echo "${inputline}" >> txt4.txt
echo '" alt="alt text" title="title" width="600" /></a></p>' >> txt4.txt
# echo "${inputline}"
done < txt3.txt
exit 0

That's it. I know it could be shorter, but this is as far as my skills go. Experts are welcome to teach me a better way of doing this. Thank you.
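
One possibly shorter variant of the step 4 script, as a rough sketch: a single printf per URL instead of five echo lines, which also keeps each link on one line. The markup assumes the same paragraph, lightbox link and image tags as the script in step 4, and the first command only creates a stand-in txt3.txt so that the sketch runs on its own.

```shell
#!/bin/bash
# Stand-in for the step 3 output so the sketch is self-contained.
echo 'http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg' > txt3.txt

# One printf per URL; the two %s slots receive the same URL (href and src).
while IFS= read -r url
do
  printf '<p><a title="title" rel="lightbox" href="%s"><img src="%s" alt="alt text" title="title" width="600" /></a></p>\n' "$url" "$url"
done < txt3.txt > txt4.txt

cat txt4.txt
```

Note that this redirects the whole loop with >, so txt4.txt is overwritten on each run rather than appended to.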


