Wednesday, 20 February 2013

Extending the Pastebin script

While looking for websites offering services similar to Pastebin, I found this blog post:

http://blog.c22.cc/2012/02/28/quick-post-list-of-paste-sites/

So it makes sense to extend the Pastebin script to cover multiple paste sites. Here are some snippets of the script that look beyond Pastebin.

Create an array to store the paste sites, then craft each search query from the "string of interest" (you can store a list of search strings in a file) plus one of the paste sites.

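require 'uri'

paste_site = ["pastebin.com", "paste2.org", "pastie.org", "textsnip.com", "gist.github.com", "pastie.textmate.org"]
paste_site.each { |ps|
    # search.txt holds one search string per line
    File.foreach('search.txt') { |tmp|
        # wrap the search string in quotes for an exact string search = fewer but more accurate results
        # e.g. "string to search" site:pastebin.com
        search_string = '"' + tmp.chomp + '" ' + "site:" + ps
        # URI::encode was removed in Ruby 3.0; encode_www_form_component is the stdlib replacement
        enc_str = URI.encode_www_form_component(search_string)
        # ... run the search with enc_str, as in the previous scripts ...
    }
}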
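
The query construction above can also be pulled into a small standalone helper, which makes it easy to check what is sent to the search engine. This is only a sketch: `build_queries` is a hypothetical name that is not part of the original script, and `URI.encode_www_form_component` stands in for `URI::encode`, which newer Ruby versions no longer provide.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): returns one encoded
# query string per (paste site, search term) pair.
def build_queries(terms, sites)
  sites.flat_map do |site|
    terms.map do |term|
      # Quote the term for an exact match, then restrict the search to one site.
      query = %("#{term.strip}" site:#{site})
      URI.encode_www_form_component(query)
    end
  end
end

queries = build_queries(["string to search"], ["pastebin.com", "pastie.org"])
queries.each { |q| puts q }
```

Keeping the encoding in one place means the search terms can come from `search.txt` or anywhere else without touching the loop that runs the searches.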
      
The search part is the same as in the previous scripts. Each paste site presents the raw paste in a different format, so some tinkering is needed to extract it. It is not very difficult; it only takes some time and patience.

The method below makes the script run efficiently by checking whether the DB already has the URL or, by comparing hash values, the paste itself. It also gets the raw paste for the paste sites defined in the array above.
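
One way to keep that per-site tinkering manageable is a lookup table from host name to the CSS selector that holds the raw paste. The sketch below is an assumption on my part, not how the script is written: `RAW_SELECTORS` and `raw_selector_for` are hypothetical names, and the selectors simply mirror the ones used in the `pasteGrab` method in this post, so they may well be out of date for the live sites.

```ruby
require 'uri'

# Assumed mapping from paste site host to the CSS selector for the raw paste.
RAW_SELECTORS = {
  "pastebin.com"        => "textarea",
  "textsnip.com"        => "textarea",
  "paste2.org"          => "span",
  "pastie.org"          => "span",
  "pastie.textmate.org" => "span",
  "gist.github.com"     => "span"
}.freeze

# Returns the selector for a paste URL, or nil if the site is unknown
# or the URL cannot be parsed.
def raw_selector_for(url)
  host = URI.parse(url).host
  return nil if host.nil?
  RAW_SELECTORS[host.sub(/\Awww\./, "")]
rescue URI::InvalidURIError
  nil
end

puts raw_selector_for("http://pastebin.com/AbCd1234").inspect
puts raw_selector_for("http://example.com/page").inspect
```

Adding a new paste site then means adding one line to the table once you have worked out where that site keeps the raw text.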

require 'sqlite3'
require 'nokogiri'
require 'open-uri'
require 'digest'

def pasteGrab url
    db = SQLite3::Database.open("Pastebin.sqlite")
    # if the URL is already in the DB then don't fetch it again
    # (a bound parameter avoids breaking the query on quotes or LIKE wildcards in the URL)
    indb = db.execute("SELECT * FROM Pastie WHERE URL = ?", [url])

    if indb.size == 0
        begin
            # open-uri provides URI.open (Ruby 2.5+); older Rubies use open(url)
            paste = Nokogiri::HTML(URI.open(url))
        rescue
            puts "Error opening URL: " + url
        else
            # get the raw paste
            if url =~ /(pastebin\.com|textsnip\.com)/
                blob = paste.css("textarea").text
            elsif url =~ /(past[ei][2e]\.org|pastie\.textmate\.org|gist\.github\.com)/
                blob = paste.css("span").text
            else
                puts "There was some error getting the paste " + url
                return # no blob to hash or store
            end
            # hash the paste so that you don't store the same content from different pastes
            blob_hash = Digest::MD5.hexdigest(blob)
            check = db.execute("SELECT * FROM Pastie WHERE PasteHash = ?", [blob_hash])
            if check.size == 0
                time = Time.now.to_s
                db.execute("INSERT INTO Pastie (URL, Date, Paste, PasteHash) VALUES (?,?,?,?)", [url, time, blob, blob_hash])
                puts "new entry"
            else
                puts "Duplicate Paste - Different URL"
                # need to figure how to store multiple urls and point to same paste
            end
        end
    else
        puts "URL in DB"
    end
ensure
    # always release the DB handle, including on the early return above
    db.close if db
end
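
The comment in the method about storing multiple URLs for the same paste is still an open question for me. One possible approach, sketched here in memory rather than in SQLite, is to key the paste text by its hash and keep a separate list of URLs per hash. `PasteStore` and its methods are hypothetical names for illustration; this is not the schema the script uses.

```ruby
require 'digest'

# Hypothetical in-memory store: one copy of each paste, many URLs per paste.
class PasteStore
  def initialize
    @pastes = {}                            # MD5 hex digest => paste text
    @urls = Hash.new { |h, k| h[k] = [] }   # MD5 hex digest => list of URLs
  end

  # Records the URL against the paste's hash.
  # Returns :new for unseen content, :duplicate for content already stored.
  def add(url, text)
    digest = Digest::MD5.hexdigest(text)
    status = @pastes.key?(digest) ? :duplicate : :new
    @pastes[digest] ||= text
    @urls[digest] << url unless @urls[digest].include?(url)
    status
  end

  # All URLs known to carry exactly this text.
  def urls_for(text)
    @urls.fetch(Digest::MD5.hexdigest(text), [])
  end

  # Number of distinct pastes stored.
  def size
    @pastes.size
  end
end

store = PasteStore.new
puts store.add("http://pastebin.com/aaa", "leaked text")
puts store.add("http://pastie.org/bbb", "leaked text")
puts store.urls_for("leaked text").inspect
```

In SQLite the same idea would probably become two tables, one for pastes keyed by hash and one for URLs pointing at a paste, but I have not tried that yet.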


Disclaimer - The scripts I write are just a way for me to learn new ways of using Ruby. You can use the script but I am not responsible in any way if Google or Pastebin block your IP address if the script is run in a way that violates their T&C. It may also have bugs. All scripts supplied on this site are provided as-is and no liability is accepted for any accidental harm this script may cause.