Monday, 12 November 2012

Updated pastebin script

I just discovered Nokogiri, and it makes HTML parsing so easy. Here is an updated script that also stores posts of interest in a SQLite database. The script does not create the database itself: Pastebin.sqlite, containing a Pastie table with URL, Date, Paste and PasteHash columns, must already exist before you run it.

require 'rubygems'
require 'open-uri'
require 'nokogiri'
require 'sqlite3'
require 'digest/md5'
 
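db = SQLite3::Database.open("Pastebin.sqlite")
search_string = "Keywords of interest site:pastebin.com"
# Form-encode the query (spaces become "+"); URI::encode was removed in Ruby 3.0
enc_str = URI.encode_www_form_component(search_string)

# Kernel#open no longer accepts URLs on Ruby 3.0+, so call URI.open from open-uri
doc = Nokogiri::HTML(URI.open("http://www.google.com.au/search?sclient=psy-ab&hl=en&site=&source=hp&q=#{enc_str}&btnG=Search"))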

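doc.css("cite").each do |x|
        # Store the timestamp as text; the sqlite3 gem does not bind Time objects
        time = Time.now.to_s
        # The cite text is used as the paste URL, with a scheme added in front
        url = "http://" + x.text
        paste = Nokogiri::HTML(URI.open(url))
        # The paste body is read from the page's textarea
        blob = paste.css("textarea").text
        # Keep the hex digest string rather than the Digest object
        blob_hash = Digest::MD5.hexdigest(blob)
        # Bind the hash as a parameter and compare it exactly instead of with LIKE
        check = db.execute("SELECT * FROM Pastie WHERE PasteHash = ?", [blob_hash])
        if check.empty?
            db.execute("INSERT INTO Pastie (URL, Date, Paste, PasteHash) VALUES (?,?,?,?)",
                       [url, time, blob, blob_hash])
            puts "new entry"
        else
            puts "Already Exists"
        end
end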
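
The script expects its database to exist already, so here is a minimal sketch of one way to set it up. The table and column names come from the INSERT statement in the script. The column types, the SCHEMA constant and the create_pastie_table helper are my own assumptions for illustration, not a required layout.

```ruby
# Assumed schema: the column names match the INSERT in the script,
# but the TEXT types are a guess and can be changed to suit.
SCHEMA = <<~SQL
  CREATE TABLE IF NOT EXISTS Pastie (
    URL       TEXT,
    Date      TEXT,
    Paste     TEXT,
    PasteHash TEXT
  );
SQL

# Hypothetical helper: runs the schema on any object that responds
# to #execute, such as a SQLite3::Database.
def create_pastie_table(db)
  db.execute(SCHEMA)
end

# Example use (requires the sqlite3 gem):
#   require 'sqlite3'
#   create_pastie_table(SQLite3::Database.new("Pastebin.sqlite"))
```

Because the statement uses IF NOT EXISTS, running it against a database that already has the table should leave the stored pastes alone.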

Disclaimer - The scripts I write are just a way for me to learn new ways of using Ruby. You are welcome to use this script, but I am not responsible if Google or Pastebin block your IP address because it was run in a way that violates their T&C. It may also have bugs. All scripts on this site are provided as-is, and no liability is accepted for any accidental harm they may cause.
