Blog2

[ Life is either a daring adventure or nothing! Helen Keller]


Extract Abstracts from PubMed

I’m in a research group that wants to extract protein-protein interactions using partial matching. After a while we found that we needed the abstracts of some articles from the National Center for Biotechnology Information (NCBI), looked up by PubMed ID. Here is my program: it fetches each article from NCBI, extracts the abstract section, and saves it to a new file named after the PubMed ID.

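import urllib2
import re
import os
from BeautifulSoup import BeautifulSoup

DIRECTORY = "abstractFiles"
try:
    os.mkdir(DIRECTORY)
except OSError:
    print "the directory %s already exists" % DIRECTORY

# Each line of output-1.txt carries a PubMed ID in columns 13 to 20.
f = open('output-1.txt', 'r')
pubmed_ids = []
for line in f:
    pubmed_ids.append(line[13:21])
f.close()

os.chdir(DIRECTORY)
errors = []
for pmid in pubmed_ids:
    try:
        # Fetch the PubMed search page for this ID.
        rows = urllib2.urlopen(
            'http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&cmd=search&term=%s'
            % pmid).read()

        soup = BeautifulSoup(rows)
        paragraphs = soup.findAll('p', attrs={'class': re.compile("abstract")})

        ab = str(paragraphs[0])
        # Drop the first 20 characters; this assumes the opening tag
        # is exactly <p class="abstract">.
        ab = ab[20:]
        ab = ab.replace('</p>', '')

        out = open(pmid + '.txt', 'w')
        out.write(ab)
        out.close()

    except (IOError, IndexError):
        # IOError: the download failed. IndexError: no abstract paragraph found.
        errors.append(pmid)

# Write all failed IDs once, so earlier failures are not overwritten.
errf = open('bad_trickers.txt', 'w')
errf.write(str(errors))
errf.close()
print errors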

Note: This program is written in Python. To run this program you will need an external library named BeautifulSoup.
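
The slice `ab[20:]` in the program only works while the opening tag is exactly `<p class="abstract">`. As a minimal sketch of a less brittle alternative, the code below collects the text of the first `<p>` whose class contains "abstract" using only the standard library. It is written for Python 3, which is an assumption, since the program above targets Python 2 and BeautifulSoup. `AbstractExtractor` and `extract_abstract` are hypothetical names introduced for illustration, and the sample HTML is made up rather than taken from a real PubMed page.

```python
from html.parser import HTMLParser


class AbstractExtractor(HTMLParser):
    """Collect the text of the first <p> whose class contains 'abstract'."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while inside the matching <p>
        self._done = False     # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "p" and "abstract" in css_class and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_abstract(html):
    """Return the abstract text with whitespace normalized, or '' if absent."""
    parser = AbstractExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Made-up sample page, only to show the call.
sample = ('<html><body><p class="title">T</p>'
          '<p class="abstract">Protein <b>A</b> binds\n protein B.</p>'
          '</body></html>')
print(extract_abstract(sample))
```

Because the parser matches on the tag and its class rather than on character positions, extra attributes or inline markup inside the paragraph do not break the extraction.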


Artificial Intelligence

This semester I am finally taking AI. I have always wanted to be more and more involved in the AI world, and now I feel that AI and ML are my final goal. I plan to spend all my spare time studying AI. I also found some useful links for anyone who is interested.

Digvan.Com!

Finally, after several months, I updated my personal homepage. In this version I use jQuery as my AJAX framework. jQuery is amazingly easy to learn and work with. On my homepage I will keep my resume and portfolio up to date, and I will add my pictures to the photo section. Another interesting feature I added last night is “Latest Status” in the footer. Now I can update my latest status from my iPhone anytime, anywhere. It will be cool to review my status after 20 years!

Life Data Visualization

There is a guy who made this video with the help of data mining and data visualization. He tries to define the meaning of life, and he succeeds! Enjoy it!

 

Useful Python libraries

In this post I want to list all the useful Python libraries that I have used so far.

The good news is that I will update this list every time I use a new library.

So now let’s chill with Python libraries.

You are more than welcome to share your favorite Python library in the comments section.

Stock Historical Data

In my DB project I want to design a site for stock mining. The first thing I did was write a bot in Python that downloads all historical stock data. I really like Python; it makes my life a lot easier and more effective. Without it I don’t feel I’m really living effectively!

Here is the code:

import urllib2

# cusip.txt holds one comma-separated record per line; the ticker is the first field.
f = open('cusip.txt', 'r')
tickers = []
for line in f:
    tickers.append(line.split(',')[0])
f.close()

errors = []
for t in tickers:
    # Open the URL
    try:
        rows = urllib2.urlopen('http://ichart.finance.yahoo.com/table.csv?' +
                               's=%s&d=09&e=05&f=2008&g=d&a=3&b=12&c=1997' % t +
                               '&ignore=.csv').read()
        out = open(t + '.csv', 'w')
        out.write(rows)
        out.close()
    except IOError:
        errors.append(t)

# Record every ticker that failed to download.
errf = open('bad_trickers.txt', 'w')
errf.write(str(errors))
errf.close()
print errors
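
The download loop mixes URL construction, network access, and error logging. Here is a minimal sketch of how those steps might be separated, written for Python 3 (an assumption; the code in the post is Python 2). `build_url`, `read_tickers`, `download_all`, and `fake_fetch` are hypothetical helpers introduced for illustration. The `fetch` argument is passed in so the logic can be exercised without network access, and the Yahoo endpoint is copied from the post and may no longer respond.

```python
from urllib.parse import urlencode

BASE_URL = "http://ichart.finance.yahoo.com/table.csv"  # copied from the post


def build_url(ticker):
    """Return the CSV URL for one ticker, using the post's date parameters."""
    params = [("s", ticker), ("d", "09"), ("e", "05"), ("f", "2008"),
              ("g", "d"), ("a", "3"), ("b", "12"), ("c", "1997"),
              ("ignore", ".csv")]
    return BASE_URL + "?" + urlencode(params)


def read_tickers(lines):
    """Take the first comma-separated field of each non-empty line."""
    return [line.split(",")[0].strip() for line in lines if line.strip()]


def download_all(tickers, fetch):
    """Fetch every ticker; return (results, failed) without stopping on errors."""
    results, failed = {}, []
    for ticker in tickers:
        try:
            results[ticker] = fetch(build_url(ticker))
        except OSError:  # IOError is an alias of OSError in Python 3
            failed.append(ticker)
    return results, failed


def fake_fetch(url):
    """Stand-in for a real download; fails for the made-up ticker BAD."""
    if "s=BAD" in url:
        raise OSError("simulated failure")
    return "Date,Open\n"


results, failed = download_all(read_tickers(["IBM,123\n", "BAD,456\n"]), fake_fetch)
print(failed)
```

With real network access, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`; `urllib.error.URLError` is a subclass of `OSError`, so the same handler would catch it.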


Lesson from Mark Cuban

What’s the best piece of advice you can give a young entrepreneur?

I’ll tell you what I learned from Bobby Knight: everybody’s got the will to win but when it comes time to doing something, it’s always about someone else. Not many people have the will to prepare. You got to be willing to know your product and environment better than anybody. No matter what you do there is someone out there trying to kick your ass. You got to be the smartest guy in the room about your product. Then you need to have a revenue source. You need a company with revenue to make money. Concept, competition, and where the money is, plus something you love doing. I’ve never had a day of work. When I die I want to come back as me.

Who’s the entrepreneur you respect most? Current or past?

I guess Bill Gates. Larry Ellison I respect. You know, old school entrepreneurs, it was just different. There was a different creed. I used to want to be profitable every month, before going IPO. But then later I accepted running at a loss. From Netscape moving on, that’s what has happened since: the whole idea now is to get pageviews and then figure out a revenue model. I think entrepreneurs these days have been cheated because for them, it’s not about understanding how to make money. But when the money goes dry, you’re shit out of luck. When the bubble burst, 9 out of 10 businesses went away. With weblogs, our mantra was sales cures all. We used to talk about bottom line, not top line. It always came down to what you’re putting into your pocket. I want a cash-in-pocket strategy, not an exit strategy. When you walk down these halls, you don’t have people making money yet.

Building a Successful Enterprise Software Company in the Microsoft Age

  • The bad news: It would be nice to collect money from thousands of customers in exchange for shrink-wrapped closed-source software, packaged by prison labor. Sadly this niche is occupied by Microsoft. Don’t be fooled by survivorship bias into thinking that your company has a chance of ascending to this privileged position.
  • The big question: “Why would a customer want to adopt software from any supplier other than Microsoft?”
  • You can’t be richer than Microsoft or smarter than its thousands of brilliant employees; you can attack a problem that Microsoft is not attacking (and the first company to deliver a solution to a customer is the first company that can learn from watching users)
  • If the problem area is new, and it probably is if Microsoft hasn’t been there already, requirements will be evolving rapidly; open-source is the best way of ensuring that requirements are met (note that “open source” need not mean “free”)
  • Achieving critical mass before Microsoft kills you, Part I: educational marketing via high-level papers and books, one-day high-level courses, and multi-day bootcamps
  • Achieving critical mass before Microsoft kills you, Part II: freely downloadable software, easy-to-understand software architecture, clear (not voluminous) documentation
  • Key to supra-normal return on investment in a free open-source world: you control what gets added to the next version of the software, i.e., you control which customers are running a standard version of the software and can upgrade painlessly and which customers are forced into ownership of customizations.
  • Consider releasing the product from a running system with real users (instead of the usual loop of users talk to marketing, marketing talks to product management, product management talks to programmers, new version gets released after 2 years and the cycle starts anew)

Resource:

1- http://philip.greenspun.com/teaching/short-talks

Some professional notes

  • If I am not for myself, who is for me?
    When I am for myself, what am I?
    If not now, when?
    – Hillel (circa 70 B.C. - 10 A.D.)
  • If I do not document my results, who will?
    If the significance of my work is not communicated to others, what am I?
    If not now, when?
    – philg
  • Do remember that if Microsoft, Oracle, Red Hat, or Sun products either worked simply or simply worked, half of the people in the information technology industry would be out of jobs.
  • A comment header at the top of every source code file and an email address at the bottom of every page. That’s a good start toward building a professional reputation. But it isn’t enough. For every computer application that you build, you ought to prepare an overview document. This will be a single HTML page containing linear text that can be read simply by scrolling, i.e., the reader need not follow any hyperlinks in order to understand your achievement. It is probably reasonable to expect that you can hold the average person’s attention for four or five screens worth of text and illustrations. What to include in the overview illustrations? In-line images of Web or mobile browser screens that display the application’s capabilities. If the application supports a complex workflow, perhaps a graphic showing all the states and transitions.
  • Keep in mind that for every person reading this chapter a poor villager in India is learning SQL and Java. A big salary can evaporate quickly. Between March 2001 and April 2004 roughly 400,000 American jobs in information technology were eliminated. Many of those who had coded Java in obscurity ended up as cab drivers or greeters at Walmart. A personal professional reputation, by contrast, is a bit harder to build than the big salary but also harder to lose. If you don’t invest some time in writing (prose, not code), however, you’ll never have any reputation outside your immediate circle of colleagues, who themselves may end up working at McDonald’s and be unable to help you get an engineering job during a recession.
  • Professional Definition (what makes programmers such as Stallman and Torvalds professionals):
  • they practice at the state of the art, writing computer programs that are used by millions of people worldwide (the GNU set of Unix tools and the Linux kernel)
  • they have innovated; Stallman having developed the Emacs text editor (one of the first multi-window systems) and Torvalds having developed a new method for coordinating development worldwide
  • they have taught others how to practice their innovation by releasing their work as open-source software and by writing documentation
  • The 7 Objectives of a Professional:
  • 1- a professional programmer picks a worthwhile problem to attack; we are engineers, not scientists, and therefore should attempt solutions that will solve real user problems.
  • 2- a professional programmer has a dedication to the end-user experience; most computer applications built these days are Internet applications built by small teams and hence it is now possible for an individual programmer to ensure that end users aren’t confused or frustrated (in the case of a programmer working on a tool for other programmers, the goal is defined to be “dedication to ease of use by the recipient programmer”).
  • 3- a professional programmer does high quality work; we preserve the dedication to good system design, maintainability, and documentation, that constituted pride of craftsmanship.
  • 4- a professional programmer innovates; information systems are not good enough, the users are entitled to better, and it is our job to build better systems.
  • 5- a professional programmer teaches by example; open-source is the one true path for a professional software engineer.
  • 6- a professional programmer teaches by documentation; writing is hard but the best software documentation has always been written by programmers who were willing to make an extra effort.
  • 7- a professional programmer teaches face-to-face; we’ve not found a substitute for face-to-face interaction so a software engineering professional should teach fellow workers via code review, teach short overview lectures to large audiences, and help teach multi-week courses.
  • Presentation Format:
  • 1-elevator pitch, a 30-second explanation of what problem has been solved and why the system is better than existing mechanisms available to people
  • 2-demo of the completed system (see the “Content Management” chapter for some tips on making crisp demonstrations of multi-user applications) (5 minutes; make it clear whether or not the system has been publicly launched)
  • 3-a slide showing system architecture and what components were used to build the system (1 minute)
  • 4-discussion of the toughest technical challenges faced during the project and how they were addressed (2 minutes; possibly additional slides)
  • 5-tour of documentation (2 minutes) — you want to convince the audience that there is enough for long-term maintenance
  • 6-the future (1 minute) — what are the next milestones? Who is carrying on the work?
  • Panelists love documentation.
  • Panelists need to have the rationale for the application clearly explained at the beginning.
  • Decision-makers who are also good technologists like to have the scale of the challenge quantified.
  • You need to distinguish your application from packaged software and other systems that the panelists expect are easily available.
  • If at any time during a pitch someone points out that there is a Microsoft product that solves the same problem, the meeting is over.
  • At the same time, unless you’re being totally innovative, a good place to start is by framing your achievement in terms of something that the audience is already familiar with, e.g., Yahoo! Groups or generic online community toolkits and then talk about what is different.
  • Decision-makers often bring senior engineers with them to attend presentations, and these folks can get stuck on personal leitmotifs.

Resource:

1 - http://philip.greenspun.com/seia/writeup?
