13. January 2012 · Categories: Computers

A function that calculates the mean in Python:

def mean(x):
    total = 0.0  # a float start avoids integer division in Python 2
    for i in x:
        total += i
    return total / len(x)

And the same function in R:

mean <- function(x) {
    total <- 0
    for (i in x) { total <- total + i }
    total / length(x)  # the last expression is the return value
}
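
For comparison, here is a sketch of the same calculation using Python's built-ins. It assumes Python 3.4 or later for the `statistics` module; `stats_mean` is just a local alias chosen so it doesn't clash with the hand-written `mean`:

```python
from statistics import mean as stats_mean

def mean(x):
    # sum() adds the items; float() keeps the division from truncating in Python 2
    return sum(x) / float(len(x))

# The standard library version gives the same answer
print(mean([1, 2, 3, 4]))        # 2.5
print(stats_mean([1, 2, 3, 4]))  # 2.5
```

Writing the loop by hand is still a fine exercise, but `sum()` avoids shadowing the built-in of the same name, which the original version did.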

15. August 2011 · Categories: Computers, Internet, Linux

All right, I finally solved the problem of running multiple vhosts on Apache with WSGI, so evolvingpodcast.net now serves a different site from the bare IP address. As always, the solution was simple; you just don't know what you don't know.

29. July 2011 · Categories: Internet, Personal

I cannot for the life of me get vhosts to work properly while running WSGI. I’ve searched for a solution, but nothing works. Right now, requests for all domains land on the Python app or I get an error that the request is not redirecting properly (or will never complete). I don’t know if anybody actually reads this blog (it gets a couple hundred hits a day, but that could all be spam bots), but if you do, and you know how to properly configure vhosts on Apache with WSGI, please let me know.

18. July 2011 · Categories: Computers, Research

Sometimes I use batch files to run dozens of commands in Matlab, and a batch can take days to process. This saves time, because if I entered each command manually, some time would be lost between one job finishing and my noticing it. Of course, writing the batch file takes time too, so I do something like this:

from os import listdir
 
# Get a list of files to batch process
filelist = listdir('/path/to/dir/')
 
# I may only want to process some of those files
selectfiles = [filename for filename in filelist if 'term' in filename]
 
# Write one Matlab command per selected file; 'with' closes the file when done
with open('BatchFile.m', 'w') as batchfile:
    for filename in selectfiles:
        batchfile.write("command('/path/to/files/', '" + filename + "')\n")

That occasionally saves me half an hour of mindless typing.
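
The same idea can be packaged as a small reusable function. This is a sketch: `make_batch_lines` is a hypothetical name, `command` and the path are the placeholders from the script above, and sorting the names is an addition of mine, since `listdir()` returns files in arbitrary order:

```python
def make_batch_lines(filenames, term, directory='/path/to/files/'):
    """Return one Matlab command line per filename containing term."""
    # Sort so the batch runs in a predictable order
    selected = sorted(name for name in filenames if term in name)
    # Each line calls the placeholder Matlab function 'command' on one file
    return ["command('%s', '%s')\n" % (directory, name) for name in selected]

lines = make_batch_lines(['b_term.dat', 'a_term.dat', 'other.dat'], 'term')
print(''.join(lines))
```

Writing the result is then a single `batchfile.writelines(lines)` call, since `writelines()` takes an iterable of strings.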

12. July 2011 · Categories: Computers, Linux, Research

If you import a module like Numpy into your Python interactive session and inspect its contents with dir(), you’ll get a list with 530 items. A saner method is to search within that list:

import numpy
 
[item for item in dir(numpy) if 'poly' in item]
 
['poly',
 'poly1d',
 'polyadd',
 'polyder',
 'polydiv',
 'polyfit',
 'polyint',
 'polymul',
 'polysub',
 'polyval']

So we turn that into a function:

def derp(module, term):
    return [item for item in dir(module) if term in item]
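
If a plain substring match is too loose, shell-style wildcards are a small step further. Below is a sketch using the standard-library `fnmatch` module; `find_names` is a hypothetical name, and `math` stands in for Numpy so the example runs without extra packages:

```python
import math
from fnmatch import fnmatchcase

def find_names(module, pattern):
    """List the names in dir(module) that match a shell-style pattern such as 'poly*'."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.filter
    return [name for name in dir(module) if fnmatchcase(name, pattern)]

print(find_names(math, 'log*'))
```

With Numpy, `find_names(numpy, 'poly*')` would keep only names that start with "poly", whereas `derp(numpy, 'poly')` matches "poly" anywhere in a name.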

Download your sequence and remove any header lines, so the file contains nothing but sequence lines. Then:

seq = ''
 
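for line in open('seq.txt'):
    # Strip the newline and uppercase the bases so lowercase 'g'/'c' are counted too
    seq += line.rstrip().upper()
 
gc = 0
 
for char in seq:
    if char == 'G' or char == 'C':
        gc += 1
 
# Multiply by 100.0 first so Python 2 doesn't truncate gc / len(seq) to 0
print('GC content = %.2f%%' % (100.0 * gc / len(seq)))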

It turns out that Biopython makes life easier. Without removing the FASTA header, we can do:

from Bio.SeqIO import parse
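from Bio.SeqUtils import gc_fraction
 
# gc_fraction() returns a fraction between 0 and 1, so scale it to a percentage
gc = [100 * gc_fraction(record.seq) for record in parse(open('seq.txt'), 'fasta')]
print("GC content = %.2f%%" % gc[0])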

That will print the GC content of the first record, but if you have more than one record, the others are calculated and can be printed with a for loop.
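
Without Biopython, the earlier loop can be extended to handle several records and keep their headers. This is a minimal sketch that assumes plain FASTA input (header lines start with '>'); `read_fasta` and `gc_percent` are hypothetical helper names, not Biopython functions:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, ''.join(chunks)

def gc_percent(seq):
    """Percentage of G and C characters in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count('G') + seq.count('C')) / len(seq)

example = ['>first', 'ATGC', 'GGCC', '>second', 'atat']
for name, seq in read_fasta(example):
    print('%s: GC content = %.2f%%' % (name, gc_percent(seq)))
```

For a real file, pass an open file handle, e.g. `read_fasta(open('seq.txt'))`, since iterating over a file yields its lines.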

15. June 2011 · Categories: Computers

from random import shuffle
 
def fancysort(thelist):
    # Shuffle until the list happens to come out sorted (modifies thelist in place)
    while thelist != sorted(thelist):
        print(thelist)
        shuffle(thelist)
    print('The list is sorted!', thelist)
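
`fancysort` is the joke algorithm usually called bogosort: each shuffle produces a random permutation, so for n distinct items you should expect about n! shuffles on average. Below is a sketch that counts the shuffles instead of printing every attempt; `bogosort_count` is a hypothetical name, and it works on a copy so the caller's list is left untouched:

```python
import math
import random

def bogosort_count(items, rng=random):
    """Shuffle a copy of items until it is sorted; return (sorted copy, number of shuffles)."""
    items = list(items)
    shuffles = 0
    while items != sorted(items):
        rng.shuffle(items)
        shuffles += 1
    return items, shuffles

rng = random.Random(0)  # fixed seed so runs are repeatable
result, shuffles = bogosort_count([3, 1, 4, 5, 2], rng)
print(result, shuffles, 'shuffles; 5! =', math.factorial(5))
```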

15. June 2011 · Categories: Computers, Research

I've been learning Numpy lately, and one thing it has in common with Matlab is that the shape of an array follows the m x n (rows x columns) convention from matrix algebra. For example:

import numpy as np
a = np.arange(1, 11).reshape(2, 5)
print(a)
 
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

As you can see, the reshape() method does what it says, and takes the first argument as m (the number of rows) and the second argument as n (the number of columns). Since the m x n convention is always rows x columns, with the rows specified first, you would expect this to be consistent elsewhere. This works for indexing, so a[1, 3] will correctly grab the value that is in the second row and fourth column (because Python starts indexing at 0). Now, when summing rows and columns, you specify the axis with 0 and 1. Obviously, 0 comes before 1. So if rows come first in reshape() and indexing, they should come first in sum().

print(a.sum(axis=0))
[ 7  9 11 13 15]
 
print(a.sum(axis=1))
[15 40]

Of course, it can’t be that easy, and I’m forced to memorize a special case.
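
One way to keep the two cases straight is to read the axis argument as naming the index that gets collapsed: summing over axis 0 runs down the row index and leaves one total per column. The plain-Python sketch below follows that reading (`sum_axis` is a hypothetical helper, not a Numpy function) and reproduces the two outputs above:

```python
def sum_axis(rows, axis):
    """Mimic a.sum(axis=...) for a 2-D list of lists."""
    if axis == 0:
        # Collapse the row index: add a[0][j] + a[1][j] + ... for each column j
        return [sum(column) for column in zip(*rows)]
    if axis == 1:
        # Collapse the column index: add across each row
        return [sum(row) for row in rows]
    raise ValueError('axis must be 0 or 1')

a = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10]]
print(sum_axis(a, 0))  # [7, 9, 11, 13, 15]
print(sum_axis(a, 1))  # [15, 40]
```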

Aside from this little annoyance, Numpy is pretty nice, though. You don't need any special syntax for element-wise array operations like you do in Matlab, and division by zero doesn't raise a ZeroDivisionError, just like in Matlab (it printed 0 instead of Inf, but whatever).

I’d really like to replace Matlab entirely with Numpy and related tools, which is why I’m going through a tutorial to evaluate it.

Update: And just to add to the confusion, the insert() function specifies the axis with 0 for row and 1 for column, the exact opposite of sum()!!

a = np.arange(1, 16).reshape(3, 5)
print(a)
 
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
 
print(np.insert(a, 1, 777, axis=0))
 
[[  1   2   3   4   5]
 [777 777 777 777 777]
 [  6   7   8   9  10]
 [ 11  12  13  14  15]]

So basically, any sort of indexing feature follows 0 = row, 1 = column, while any descriptive feature (sum, mean, std) follows 0 = column, 1 = row.
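
Another way to read the same results is that axis=0 refers to the row index in both functions. sum() collapses that axis and insert() grows it. The shapes in the sketch below make this visible (it assumes Numpy is installed):

```python
import numpy as np

a = np.arange(1, 16).reshape(3, 5)

# sum(axis=0) removes axis 0 (the 3 rows), leaving one value per column
print(a.sum(axis=0).shape)                 # (5,)
# insert(..., axis=0) grows axis 0 by one, i.e. adds a row
print(np.insert(a, 1, 777, axis=0).shape)  # (4, 5)
# axis=1 acts on the 5 columns in both cases
print(a.sum(axis=1).shape)                 # (3,)
print(np.insert(a, 1, 777, axis=1).shape)  # (3, 6)
```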