While looking for websites offering services similar to Pastebin, I found this blog post.
http://blog.c22.cc/2012/02/28/quick-post-list-of-paste-sites/
So it makes sense to extend the Pastebin script to cover multiple paste sites. Here are some snippets of the script that look beyond Pastebin.
Create an array to store the paste sites, then craft each search query from the "string of interest" (you can store a list of search strings in a file) plus one paste site.
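require 'uri'

paste_site = ["pastebin.com", "paste2.org", "pastie.org", "textsnip.com", "gist.github.com", "pastie.textmate.org"]
paste_site.each { |ps|
  # search.txt holds one search string per line
  File.foreach('search.txt') { |tmp|
    # wrap the search string in quotes for an exact-phrase search = fewer, more accurate results
    # e.g. "string to search" site:pastebin.com
    search_string = '"' + tmp.chomp + '" ' + "site:" + ps
    # URI::encode was removed in Ruby 3.0; encode_www_form_component escapes the query for use in a URL
    enc_str = URI.encode_www_form_component(search_string)
    # ... run the search with enc_str, as in the previous scripts ...
  }
}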
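If you want to test the query construction on its own, without touching the search code, it can be pulled into a small helper. This is only a sketch: `build_queries` is a name I made up for illustration, and it takes the search strings as an array instead of reading them from the file.

```ruby
require 'uri'

# Hypothetical helper (not part of the original script):
# builds one URL-encoded query per (paste site, search string) pair.
def build_queries(terms, sites)
  sites.flat_map do |site|
    terms.map do |term|
      # Quote the term for an exact-phrase match and restrict results to one site
      query = %("#{term.strip}" site:#{site})
      URI.encode_www_form_component(query)
    end
  end
end

puts build_queries(["string to search"], ["pastebin.com", "pastie.org"])
```

The search strings could come from something like `File.readlines('search.txt', chomp: true)`, which keeps the file handling separate from the string building.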
The search part is the same as in the previous scripts. Each paste site presents the raw paste in a different format, so some tinkering is needed to extract it. It is not very difficult; it just takes some time and patience.
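One way to keep that per-site tinkering in one place is a small lookup that maps a URL to the CSS selector holding the raw paste. The sketch below is an assumption about how you might organise it: `raw_paste_selector` is a hypothetical name, and the selectors are simply the ones that worked for me on these sites when I wrote the script, so they may well have changed since.

```ruby
# Hypothetical helper: returns the CSS selector that holds the raw paste
# for a given URL, or nil when the site is not one we know how to handle.
def raw_paste_selector(url)
  case url
  when /pastebin\.com|textsnip\.com/
    "textarea"
  when /paste2\.org|pastie\.org|pastie\.textmate\.org|gist\.github\.com/
    "span"
  end
end

puts raw_paste_selector("http://pastebin.com/abc123").inspect
puts raw_paste_selector("http://example.com/abc123").inspect
```

Returning nil for an unknown site lets the caller skip the paste instead of trying to hash content it never extracted.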
The method below first checks whether the DB already has the URL, and then whether it already has the paste itself by comparing hash values, so the script does not fetch or store anything twice. It also extracts the raw paste for the paste sites defined in the array above.
require 'sqlite3'
require 'nokogiri'
require 'open-uri'
require 'digest/md5'

def pasteGrab url
  db = SQLite3::Database.open("Pastebin.sqlite")
  # if the url is already in the DB then don't fetch it again
  # (bound parameters avoid quoting problems and SQL injection)
  indb = db.execute("SELECT 1 FROM Pastie WHERE URL = ?", [url])
  if indb.size == 0
    begin
      # URI.open comes from open-uri; a bare open(url) no longer fetches URLs in Ruby 3
      paste = Nokogiri::HTML(URI.open(url))
    rescue
      puts "Error opening URL: " + url
    else
      # get the raw paste
      if url =~ /(pastebin\.com|textsnip\.com)/
        blob = paste.css("textarea").text
      elsif url =~ /(past[ei][2e]\.org|pastie\.textmate\.org|gist\.github\.com)/
        blob = paste.css("span").text
      else
        puts "There was some error getting the paste " + url
      end
      # only hash and store when a paste was actually extracted
      if blob
        # hash the paste so that the same content is not stored again under a different URL
        blob_hash = Digest::MD5.hexdigest(blob)
        check = db.execute("SELECT 1 FROM Pastie WHERE PasteHash = ?", [blob_hash])
        if check.size == 0
          db.execute("INSERT INTO Pastie (URL, Date, Paste, PasteHash) VALUES (?,?,?,?)",
                     [url, Time.now.to_s, blob, blob_hash])
          puts "new entry"
        else
          puts "Duplicate Paste - Different URL"
          # need to figure out how to store multiple urls that point to the same paste
        end
      end
    end
  else
    puts "URL in DB"
  end
ensure
  db.close if db
end
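On the open question of several URLs pointing at the same paste: one possible approach is to key everything on the content hash and keep a list of URLs per hash. The sketch below shows the idea with a plain in-memory Hash instead of SQLite, so it is an illustration of the bookkeeping and not what the script currently does. `record_paste` and the status symbols are names I invented for the example.

```ruby
require 'digest/md5'

# Hypothetical helper: index maps the MD5 hex digest of a paste to the
# list of URLs where that content was seen.
# Returns :new_paste for unseen content, :duplicate_paste for known content
# at a new URL, and :known_url when this URL is already recorded for it.
def record_paste(index, url, blob)
  digest = Digest::MD5.hexdigest(blob)
  urls = (index[digest] ||= [])
  return :known_url if urls.include?(url)

  status = urls.empty? ? :new_paste : :duplicate_paste
  urls << url
  status
end

index = {}
puts record_paste(index, "http://pastebin.com/aaa", "same content")
puts record_paste(index, "http://pastie.org/111", "same content")
puts record_paste(index, "http://pastie.org/111", "same content")
```

In the database, the same idea would probably mean storing each paste once and keeping the URLs in a second table that references the paste hash, but I have not tried that yet.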
Disclaimer - The scripts I write are just a way for me to learn new ways of using Ruby. You can use the script but I am not responsible in any way if Google or Pastebin block your IP address if the script is run in a way that violates their T&C. It may also have bugs. All scripts supplied on this site are provided as-is and no liability is accepted for any accidental harm this script may cause.