Thursday, November 18, 2010

A Very Silly Method to Strip Off HTML and Get the URL

I know this is a very silly way to strip the HTML tags from a page's source code in order to retrieve the JPG URLs, but it works, so I am recording it here for my own use.


1. Prepare a file with all the html code and save it as txt.txt.

<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_YF1tNfQVN8w/TN66Xjc_9VI/AAAAAAABjUw/y_ImZpICYEE/s1600/53.jpg"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 266px; height: 400px;" src="http://2.bp.blogspot.com/_YF1tNfQVN8w/TN66Xjc_9VI/AAAAAAABjUw/y_ImZpICYEE/s400/53.jpg" alt="" id="BLOGGER_PHOTO_ID_5539069505528919378" border="0"></a>

2. awk 'BEGIN { RS="href=\"" } { print $1}' txt.txt >txt2.txt

http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 400px; height: 266px;" src="http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s400/1.jpg" alt="" id="BLOGGER_PHOTO_ID_5539067054739232626" border="0">


3. awk -F'"' '{ print $1 }' txt2.txt > txt3.txt

http://1.bp.blogspot.com/_YF1tNfQVN8w/TN64I5jNF3I/AAAAAAABjOQ/Ykc2T_qJ3k4/s1600/1.jpg
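When splitting on a double quote in awk, it matters where the field separator is set. In POSIX-style awks such as gawk and mawk, an assignment to FS inside the main action takes effect only from the next record, because the current record has already been split. The sketch below compares the two placements on two made-up input lines (the sample text and the variable names are assumptions for illustration, not part of the original files).

```shell
#!/bin/bash
# Two made-up input lines, each with a double quote right after the first token.
input=$(printf '%s\n' 'first" rest' 'second" rest')

# FS assigned inside the action: the current record was already split
# on whitespace, so the new separator applies only from the next record.
late=$(printf '%s\n' "$input" | awk '{ FS="\""; print $1 }')

# -F sets the separator before any input is read, so every record,
# including the first one, is split on the double quote.
early=$(printf '%s\n' "$input" | awk -F'"' '{ print $1 }')

echo "FS set inside the action:"; echo "$late"
echo "FS set with -F:";           echo "$early"
```

With -F, the first line is cut at the quote like all the others, which is the behaviour this step needs.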

4. Then use this script to generate the picture links that I want.

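#!/bin/bash
# Read each URL from txt3.txt and append a lightbox link for it to txt4.txt.
# Output is appended (>>), so delete txt4.txt before running the script again.
while IFS= read -r inputline
do
    echo '<p><a title="title" rel="lightbox" href="' >> txt4.txt
    echo "${inputline}" >> txt4.txt
    echo '"><img src="' >> txt4.txt
    echo "${inputline}" >> txt4.txt
    echo '" alt="alt text" title="title" width="600" /></a></p>' >> txt4.txt
    # echo "${inputline}"
done < txt3.txt
exit 0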

That's it. I know it could be shorter, but this is as far as my skill level goes. Experts are welcome to teach me a better way of doing this. Thank you.
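One possible shorter version, offered as a rough sketch rather than the definitive way: it pulls every href value out of the HTML file and writes one complete link per line in a single pass. It assumes a grep that supports the -o option (GNU and BSD grep both do). The function name make_links, the temporary sample file and the example.com URLs are made up for illustration.

```shell
#!/bin/bash
# Sketch: turn every href="..." in an HTML file into one lightbox link line.
# make_links is a hypothetical helper name, not part of the original script.
make_links() {
    # grep -o prints only the matching href="..." parts, one per line;
    # cut then keeps the text between the first pair of double quotes.
    grep -o 'href="[^"]*"' "$1" | cut -d'"' -f2 |
    while IFS= read -r url
    do
        printf '<p><a title="title" rel="lightbox" href="%s"><img src="%s" alt="alt text" title="title" width="600" /></a></p>\n' "$url" "$url"
    done
}

# Demo on a throwaway sample file with a made-up URL.
sample=$(mktemp)
echo '<a href="http://example.com/s1600/1.jpg"><img src="http://example.com/s400/1.jpg" border="0"></a>' > "$sample"
links=$(make_links "$sample")
echo "$links"
rm -f "$sample"
```

To use it on the real data, something like make_links txt.txt > txt4.txt should replace steps 2 to 4, with no intermediate files.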


