titbits from the world less travelled

Archive for the 'python' Category

Python: extract all hyperlinks from a webpage

I know there must be thousands of programs that do this, but I thought I would give it a try. It's pretty easy. I will keep editing this as and when I improve my regular expression.

import re
import urllib as ul              # Python 2; urlopen lives in urllib.request in Python 3
import BeautifulSoup as bs       # BeautifulSoup 3 (Python 2 era)

# Fetch the page and parse it
myFile = ul.urlopen("http://www.sfbay.craigslist.org/roo/")
soup = bs.BeautifulSoup(myFile)

# Matches href="..." up to the closing quote
hrefPattern = re.compile(r'href="[^"]*"')

for anchor in soup.findAll("a"):
    myString = str(anchor)
    match = hrefPattern.search(myString)
    if match:
        print(match.group(0))
    else:
        print("no href in " + myString)
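The regular expression only sees the tag as text, so it depends on how the tag happens to be written. As an alternative, here is a minimal Python 3 sketch that swaps BeautifulSoup and the regex for the standard library's html.parser, which reads attributes the way a browser does, so quote style and upper/lower case don't matter. The names LinkCollector and extract_links and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all anchors in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = ('<p><a href="/roo/123.html">room</a> <a name="top">top</a> '
          "<A HREF='http://example.com/page'>ext</A></p>")
print(extract_links(sample))
```

To run it on a live page you could feed it urllib.request.urlopen(url).read().decode("utf-8", "replace"); decoding as UTF-8 is an assumption, since pages may declare other encodings.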

posted by admin in python and have No Comments

How to strip multiple and variable spaces in python

LJS93K 1300 10500
J38ZZ9 700 4750

What do you do when words are separated by a varying number of spaces? You could use split in Python, but split(" ") with an explicit separator returns an empty string for every extra space, which isn't what you want. Instead, you could do the following.

Read all the lines above into a string (myListString below), then replace newlines, tabs and other whitespace characters with spaces:

import re
newString = re.sub(r"\s", " ", myListString)

Then split the result and join the pieces with single spaces to get a uniformly spaced string:

newString = " ".join(newString.split())
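A short Python 3 sketch of both steps on a made-up string shaped like the rows above. It also shows two things worth knowing: split() with no argument already treats tabs and newlines as separators, so the re.sub step is optional, and normalizing line by line keeps one record per row instead of one long line.

```python
import re

text = "LJS93K   1300      10500\nJ38ZZ9\t700  4750"

# split(" ") keeps an empty string for every extra space
print("a  b".split(" "))

# The two steps above: whitespace characters -> spaces, then split and rejoin
flat = " ".join(re.sub(r"\s", " ", text).split())
print(flat)

# split() with no argument splits on runs of any whitespace,
# so the same result comes without re.sub
assert " ".join(text.split()) == flat

# To keep one record per line, normalize each line separately
rows = [" ".join(line.split()) for line in text.splitlines()]
print(rows)
```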

posted by admin in python and have No Comments

raw extraction of data from google blogger to wordpress

This is a small script I wrote to extract all the posts from my Blogger blog, which I had as an XML (Atom) feed, and insert them into a WordPress database.

import re
import feedparser

# Parse the Blogger export (an Atom feed) saved on disk
d = feedparser.parse("C:\\Python25\\programs\\blog.xml")

# Matches single and double quotes, which would break the SQL string literals
p = re.compile(r'[\'\"]')

# Column list of the WordPress wp_posts table
finalInsertString1 = ("INSERT INTO `wp_posts` (`ID`, `post_author`, "
    "`post_date`, `post_date_gmt`, `post_content`, `post_title`, "
    "`post_category`, `post_excerpt`, `post_status`, `comment_status`, "
    "`ping_status`, `post_password`, `post_name`, `to_ping`, `pinged`, "
    "`post_modified`, `post_modified_gmt`, `post_content_filtered`, "
    "`post_parent`, `guid`, `menu_order`, `post_type`, `post_mime_type`, "
    "`comment_count`) VALUES\n")
print(finalInsertString1)

blogID = 3300  # first wp_posts ID to use
# Values for the remaining columns, identical for every post
dbString2 = ", 1, '2009-06-17 06:11:54', '2009-06-17 06:11:54', "
dbString3 = "', 0, '', 'publish', 'open', 'open', '', 'palm-pre-and-web-os', '', "
dbString4 = "'', '2009-06-17 21:13:18', '2009-06-17 21:13:18', '', 0, "
dbString5 = "'http://kmdarshan.com/wordpress/?p="
dbString6 = "', 0, 'post', '', 0)"

rows = []
for e in d.entries[48:]:  # start at entry 48, as in the original run
    blogTitle = p.sub('', e.title)
    blogContent = p.sub('', e.content[0].value)
    rows.append("(" + str(blogID) + dbString2 + "'" + blogContent + "', '"
                + blogTitle + dbString3 + dbString4 + dbString5
                + str(blogID) + dbString6)
    blogID = blogID + 1

# Rows are separated by commas and the statement ends with a semicolon;
# encode(..., "ignore") drops non-ASCII characters
f = open('C:/Python25/programs/sqlfile.txt', 'w')
f.write(finalInsertString1)
f.write(",\n".join(rows).encode("ascii", "ignore") + ";\n")
f.close()

I did this by reading through the XML and taking out all the apostrophes (and double quotes) so that we don't get any errors while inserting into the SQL table. I also stored the statements in a file for future use. Why? Mostly because the importer on my WordPress was giving me some silly errors and not importing any of my posts.
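Stripping quotes changes the text of the posts (every "don't" becomes "dont"). An alternative that keeps the apostrophes is to let the database driver do the quoting through query parameters instead of building the SQL string by hand. The sketch below uses Python 3's built-in sqlite3 module as a stand-in so it runs anywhere; the posts table, its three columns and the sample entries are made up. With MySQLdb the same idea applies, but the placeholder is %s instead of ?.

```python
import sqlite3

# Hypothetical, trimmed-down posts table; sqlite3 stands in for MySQL here
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")

# Made-up entries; in the script above these would come from feedparser
entries = [
    ("Palm's new phone", 'It\'s "finally" here'),
    ("Second post", "No quotes at all"),
]

# The driver quotes each value itself, so nothing has to be stripped
conn.executemany(
    "INSERT INTO posts (id, title, content) VALUES (?, ?, ?)",
    [(3300 + i, title, content) for i, (title, content) in enumerate(entries)],
)
conn.commit()

for row in conn.execute("SELECT id, title, content FROM posts ORDER BY id"):
    print(row)
```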

posted by admin in python and have No Comments

Facebook app development

My two cents: the first thing a novice Facebook developer should do when developing a new application is look at the console Facebook itself provides.

The link is below:

http://developers.facebook.com/tools.php

Basically, it has both PHP and XML links to make development easier.

Attaching a screenshot below:

posted by admin in Uncategorized, python and have No Comments

Python: Creating a DocTest / simple example

We use doctest to check whether a function gives the correct output. It runs the interactive examples written in docstrings, so it is also useful for regression testing and for writing documentation that stays correct.

Consider this example

def multiplyTwoNumbers(a, b):
    product = a * b
    return product

This function multiplies two numbers, but nothing verifies that the code actually does that. What you can do is add a doctest to it:

def multiplyTwoNumbers(a, b):
    """This test is for multiplication of two numbers

    >>> multiplyTwoNumbers(3, 4)
    12
    """
    product = a * b
    return product

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _test()

This compares the output written in the docstring with what the function actually returns and reports a failure if they don't match.

To run the tests from the command line, just execute the program; it prints nothing if every example passes. Add -v after the script name (for example, python yourscript.py -v) to get a detailed report of every example tried.
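doctest can also be driven from code, which is handy for checking a single function and getting the counts back. Below is a Python 3 sketch using doctest.DocTestFinder and doctest.DocTestRunner; run_doctests and brokenMultiply are hypothetical names, and brokenMultiply is deliberately wrong so that a failure gets reported.

```python
import doctest


def multiplyTwoNumbers(a, b):
    """Multiply two numbers.

    >>> multiplyTwoNumbers(3, 4)
    12
    >>> multiplyTwoNumbers(2.5, 2)
    5.0
    """
    return a * b


def brokenMultiply(a, b):
    """
    >>> brokenMultiply(3, 4)
    12
    """
    return a + b  # deliberate bug: adds instead of multiplying


def run_doctests(func):
    """Run the docstring examples of one function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)  # prints a report only for failing examples
    return runner.summarize(verbose=False)


print(run_doctests(multiplyTwoNumbers))  # expected: 0 failures out of 2 examples
print(run_doctests(brokenMultiply))      # expected: 1 failure out of 1 example
```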

posted by admin in python and have No Comments

MySQLdb: Python: Inserting date, integer, float and other variables into MySQL

The best way to insert values into a database using MySQLdb is to use the %s placeholder and pass the values to cursor.execute() as a separate tuple, rather than formatting them into the query string. The driver then does all the conversion: you don't need to worry about converting dates, integers or floats into their respective MySQL datatypes.
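A minimal sketch of that pattern, assuming a MySQLdb cursor (MySQLdb uses the %s "format" parameter style). The function insert_price, the prices table and its columns are hypothetical, and a small recording cursor stands in for a real connection so the sketch runs without a MySQL server. The point is that the date, float and integer go into the tuple as ordinary Python objects and the %s markers carry no quotes.

```python
import datetime


def insert_price(cursor, symbol, trade_date, close, volume):
    """Insert one row into the hypothetical `prices` table.

    The values travel separately from the SQL; a MySQLdb cursor escapes
    and converts each one (date, float, int, str) itself.
    """
    query = ("INSERT INTO prices (symbol, date, close, volume) "
             "VALUES (%s, %s, %s, %s)")  # no quotes around %s
    cursor.execute(query, (symbol, trade_date, close, volume))


class RecordingCursor:
    """Stand-in for a DB-API cursor: it only records what execute() receives."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


cur = RecordingCursor()
insert_price(cur, "XYZ", datetime.date(2009, 6, 17), 10.5, 1000)
query, params = cur.calls[0]
print(query)
print(params)
```

With a real server you would pass conn.cursor() from MySQLdb.connect(...) instead of RecordingCursor and call conn.commit() afterwards.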

posted by admin in python and have No Comments

MySQLdb: inserting data into a MySQL table with variables

Of the many ways to insert values into a MySQL database, we could also use the one below:

insert_query = "INSERT INTO %s(symbol,\
date,open,high,low,close,volume) \
VALUES( %s,%s, %s , %s , %s , %s, %s )"
insert_cursor.execute(insert_query
                      % (symbol_Nasdaq,
                         symbol_Nasdaq,
                         row_Nasdaq[0],
                         row_Nasdaq[1],
                         row_Nasdaq[2],
                         row_Nasdaq[3],
                         row_Nasdaq[4],
                         volume_fin))

The above query will most likely give you an error saying that MySQL could not find a column.

What happens is that Python's % operator pastes the values straight into the SQL text without any quotes, so MySQL reads an unquoted string value, such as the symbol, as a column name. The query below does work:

insert_query = "INSERT INTO %s(symbol,\
date,open,high,low,close,volume) \
VALUES('%s','%s','%s','%s','%s','%s','%s')"
insert_cursor.execute(insert_query
                      % (symbol_Nasdaq,
                         symbol_Nasdaq,
                         row_Nasdaq[0],
                         row_Nasdaq[1],
                         row_Nasdaq[2],
                         row_Nasdaq[3],
                         row_Nasdaq[4],
                         volume_fin))

I just put apostrophes around each placeholder, and the data went into my tables correctly.
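Quoting the placeholders works for simple values, but it is still Python's % operator building the SQL, so a value containing an apostrophe would break the statement again. A safer split, sketched below under the assumption that you are using MySQLdb's %s parameter style: only the table name, which cannot be sent as a query parameter, is formatted into the SQL (after checking that it contains nothing but letters, digits and underscores), and the seven values are left as placeholders for execute() to fill. build_insert and COLUMNS are hypothetical names.

```python
import re

COLUMNS = ("symbol", "date", "open", "high", "low", "close", "volume")


def build_insert(table):
    """Return an INSERT statement for `table` with %s value placeholders.

    A table name cannot be passed as a parameter, so it is validated and
    formatted into the SQL text; the values stay as %s for the driver.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError("unsafe table name: %r" % (table,))
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    return "INSERT INTO `%s` (%s) VALUES (%s)" % (
        table, ", ".join(COLUMNS), placeholders)


query = build_insert("ABC")
print(query)
# With a real MySQLdb cursor the values go in a separate, unquoted tuple:
# insert_cursor.execute(query, (symbol_Nasdaq, row_Nasdaq[0], row_Nasdaq[1],
#                               row_Nasdaq[2], row_Nasdaq[3], row_Nasdaq[4],
#                               volume_fin))
```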

posted by admin in mysqldb, python and have No Comments

Logic errors when searching for a repeated element in a loop

I seem to have made this mistake more than once. I was inserting some elements into a database at irregular intervals and needed to make sure I didn't insert duplicate symbols, so I wrote a small method to check this. The code below is not exactly the same method, but it's enough to explain the error:

def rejected_symbols(symbol):
    rejected_list = ['TRUE', 'CAST', 'ELSE', 'LONG', 'LOOP', 'TO']
    for val in rejected_list:
        if val == symbol:
            return 0
        else:
            return 1  # returns after comparing only the first element

The problem is that we don't loop through the entire list. The function returns on the very first comparison, whether it matches or not, so only 'TRUE' is ever checked and we return the wrong value for every other rejected symbol.

The correct logic would be:

def rejected_symbols(symbol):
    rejected_list = ['TRUE', 'CAST', 'ELSE', 'LONG', 'LOOP', 'TO']
    for val in rejected_list:
        if val == symbol:
            return 0  # found a rejected symbol
    return 1  # reached only after every element has been checked

Return 1 only at the end of the loop, once the whole list has been checked; the only early return is the one taken when we actually find a match.
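In Python the whole search can also be left to the in operator, which checks every element for you; and if you do want an explicit loop, for/else states "only after the whole list" directly. A Python 3 sketch that keeps the post's 0/1 return values; the set REJECTED and the name rejected_symbols_loop are illustrative.

```python
REJECTED = {'TRUE', 'CAST', 'ELSE', 'LONG', 'LOOP', 'TO'}


def rejected_symbols(symbol):
    """Return 0 if symbol is rejected, 1 otherwise (the post's convention)."""
    return 0 if symbol in REJECTED else 1


def rejected_symbols_loop(symbol):
    """Same result with an explicit loop: the else branch runs only when
    the loop finishes without break, i.e. after every element was checked."""
    for val in ('TRUE', 'CAST', 'ELSE', 'LONG', 'LOOP', 'TO'):
        if val == symbol:
            result = 0
            break
    else:
        result = 1
    return result


for s in ('TRUE', 'TO', 'MSFT'):
    print(s, rejected_symbols(s), rejected_symbols_loop(s))
```

'TO' is the interesting case: the buggy first version returns 1 for it, because it never gets past comparing against 'TRUE'.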

posted by admin in python and have No Comments

mysql error 11001 in mysqldb

When trying to run a query with MySQLdb from Python, I was getting error 11001.

The fix was to pass the port explicitly when connecting:

import MySQLdb as ms
conn = ms.connect(host="localhost", port=3307, user="root", passwd="", db="xxx")

I got this error because I changed the port when installing MySQL. Unless told otherwise, MySQLdb connects to the default port (3306 in most installations), so the connection fails with errors such as the host localhost not being found.
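To check which port the server is actually listening on before calling connect(), a small Python 3 sketch like the one below can probe a few candidate ports. The helper name port_is_open is hypothetical, and it only tells you that something accepts TCP connections on the port, not that it is MySQL.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not found
        return False


# Probe the default MySQL port and the one used in the connect() call above
for port in (3306, 3307):
    print(port, port_is_open("localhost", port))
```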

posted by admin in python and have No Comments

funny thing about float in python

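Try dividing 10/100 in Python 2. You get 0. Try 1/2: you get 0 as well. This is not an error: in Python 2, dividing one integer by another with / does integer (floor) division. Python 3 changed this, and / always returns a float there.

See http://www.python.org/doc/2.2.3/whatsnew/node7.html for more details.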

To rectify this you can do the following:
from __future__ import division

With this import at the top of the module, / returns floating point results, so 10/100 gives 0.1; use // when you do want floor division.
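A quick Python 3 check of the operators involved. Python 3 behaves as if from __future__ import division were always on, and // keeps the old floor-division behaviour.

```python
# Python 3: / is true division, // is floor division
print(10 / 100)      # 0.1
print(1 / 2)         # 0.5
print(10 // 100)     # 0   (what 10/100 gave in Python 2)
print(1 // 2)        # 0
print(-7 // 2)       # -4  floors toward negative infinity, not toward zero
print(float(1) / 2)  # 0.5 the usual Python 2 workaround without the import
```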

posted by admin in python and have No Comments