I just discovered Nokogiri, and it makes HTML parsing so easy. Here is an updated script that also stores posts of interest in a SQLite database. Make sure the SQLite database (Pastebin.sqlite, containing a Pastie table) is in place before running the script.
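require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'sqlite3'
require 'digest/md5'

# The database file and its Pastie table must already exist.
db = SQLite3::Database.open("Pastebin.sqlite")

# Plain-text query; encode_www_form_component turns the spaces into "+".
search_string = "Keywords of interest site:pastebin.com"
enc_str = URI.encode_www_form_component(search_string)
# URI.open replaces the bare open(url) call, which no longer fetches URLs in Ruby 3.
doc = Nokogiri::HTML(URI.open("http://www.google.com.au/search?sclient=psy-ab&hl=en&site=&source=hp&q=#{enc_str}&btnG=Search"))

# The script assumes each <cite> element holds a result URL without its scheme.
doc.css("cite").each do |x|
  time = Time.now.to_s
  url = "http://" + x.text
  paste = Nokogiri::HTML(URI.open(url))
  blob = paste.css("textarea").text
  # Hex digest of the paste body, stored as a string and used to detect duplicates.
  blob_hash = Digest::MD5.hexdigest(blob)
  # Bound parameter instead of interpolating the hash into the SQL string.
  check = db.execute("SELECT * FROM Pastie WHERE PasteHash = ?", [blob_hash])
  if check.empty?
    db.execute("INSERT INTO Pastie (URL, Date, Paste, PasteHash) VALUES (?,?,?,?)", [url, time, blob, blob_hash])
    puts "new entry"
  else
    puts "Already Exists"
  end
end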
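The script expects Pastebin.sqlite to already contain a Pastie table. Here is a rough sketch of how that database could be created. The column names come from the INSERT statement in the script; the TEXT column types, the `PASTIE_SCHEMA` constant, and the `setup_pastebin_db` helper are my own assumptions for illustration, so adjust them to taste.

```ruby
# Column names are taken from the script's INSERT statement.
# The TEXT types are an assumption; the original schema is not shown.
PASTIE_SCHEMA = <<~SQL
  CREATE TABLE IF NOT EXISTS Pastie (
    URL       TEXT,
    Date      TEXT,
    Paste     TEXT,
    PasteHash TEXT
  )
SQL

# Hypothetical helper: opens the database file (creating it if it is
# missing) and makes sure the Pastie table exists.
def setup_pastebin_db(path = "Pastebin.sqlite")
  require 'sqlite3'
  db = SQLite3::Database.new(path)
  db.execute(PASTIE_SCHEMA)
  db
end
```

Calling `setup_pastebin_db` once before the first run should be enough; because of `IF NOT EXISTS`, running it again leaves an existing table alone.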
Disclaimer - The scripts I write are just a way for me to learn new ways of using Ruby. You can use the script but I am not responsible in any way if Google or Pastebin block your IP address if the script is run in a way that violates their T&C. It may also have bugs. All scripts supplied on this site are provided as-is and no liability is accepted for any accidental harm this script may cause.