13. January 2012 · Categories: Computers

A function that calculates the mean in Python:

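def mean(x):
    # Accumulate into `total` so the built-in sum() is not shadowed
    total = 0.0
    for i in x:
        total += i
    # Starting from a float avoids integer division under Python 2
    return total / len(x)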

or the same function in R:

mean <- function(x) {
    total <- 0
    for (i in x) { total <- total + i }
    # Return the value, as the Python version does, rather than printing it
    return(total / length(x))
}

10. January 2012 · Categories: Computers, Linux

My small contribution to the FOSS community: over 200 GB of Linux ISOs served via bittorrent.

27. December 2011 · Categories: Computers, Internet, Linux

YouHaveDownloaded.com claims to track the IP addresses of people who use BitTorrent, but I’ve been torrenting Linux ISOs for months, pushing almost 200 GB of data, and the IP address of that box is not in their database. I don’t know what their methodology is, but they’re obviously not interested in people sharing legal files.

So I finally got a smart phone, and let me tell you, going from a phone that only does voice and text to the Galaxy Nexus is like being shotgunned into the next century. We call the Galaxy Nexus a mobile phone for historical reasons only. It’s really a mobile computer that happens to do voice and text messaging, among other things. I think there are enough reviews of the Galaxy Nexus that I don’t have to give another rundown of its features here, but I do have a few thoughts on the smart phone phenomenon.

First, my “phone” has 1 GB of RAM and 30 GB of storage. My laptop from five years ago had 512 MB of RAM and a 40 GB hard drive. My desktop computer from ten years ago had 128 MB of RAM and a PIII 133 MHz processor. I don’t know how that compares to the ARM Cortex-A9 on standard benchmarks, but it’s not just hardware. Browsers today can run JavaScript 10 times faster than the same browsers just three years ago, and probably 50 times faster than Internet Explorer 6, on the same hardware.

So what will our “phones” be like in another five or ten years? In five years, they will probably have the computing power of today’s commodity desktops and laptops, and in ten years they will far surpass them. On top of that, protocol and software improvements like SPDY, Dart, NaCl, etc. (well, maybe), will push performance much farther than hardware improvements alone. I believe the future looks bright, at least from a purely technological perspective.

Of course, the changes we see are not just technological. My phone has GPS that can geolocate me to “within 30 meters” of my actual location. When I turned it on at home, the address that it gave me was my next door neighbor’s house, which is close enough. That’s really convenient when I want directions to the closest Chinese restaurant, but I’m also keenly aware that Google will never delete that data. Ever.

This always happens with technology — there’s always some catch, some unintended (or intended) side effect to our technological marvels. The combustion engine created the industrial revolution and allowed us to build cities (because it made rail and transport affordable), but it also pumped megatons of toxins and greenhouse gases into the atmosphere. For all the problems we solve, we create many new ones, but we keep going because usually the marginal benefits outweigh the costs, all things considered.

It’s just another thing to keep in mind. Email allows you to communicate easily with other people, but that doesn’t mean you should use it for every conversation. Some conversations should be reserved for face to face communication, but that fact doesn’t mean we should abandon email either. Likewise, we don’t have to abandon smart phones because we have (valid) privacy concerns. We just have to remember sometimes to turn off the GPS.

14. December 2011 · Categories: Computers, Internet, Linux

I manage a Debian server that hosts the Evolving Scientist Podcast, among other things. I have no background in computer science or Linux administration. I’m just a Linux fan who loves learning, and administering a “production” server can be simultaneously entertaining, educational, perplexing, and infuriating.

A few weeks ago, I noticed htop reporting a load over 1.0, yet the CPU use was always close to 0%. I didn’t think much of it at the time, but as the load persisted, I got more worried. Was this harming my box in some way? Turns out, if it’s not CPU use, it’s probably I/O — continuous writing to the hard drive — and that’s not good for the integrity of your data.
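One way to test that suspicion is to put the load average next to the share of CPU time spent waiting on I/O. What follows is a minimal sketch, assuming a Linux box where /proc/loadavg and /proc/stat are readable. The function names are made up for illustration, and the parsing relies on the field order documented in the proc man page.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def iowait_fraction(stat_text):
    """Fraction of CPU time since boot spent in iowait, from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":  # the aggregate line, not cpu0, cpu1, ...
            # Field order: user nice system idle iowait irq softirq steal
            ticks = [int(value) for value in fields[1:9]]
            return ticks[4] / sum(ticks)
    raise ValueError("no aggregate 'cpu' line found")
```

Feeding the functions the contents of open("/proc/loadavg").read() and open("/proc/stat").read() gives numbers to compare. A load near 1.0 with almost no user or system time suggests that something is stuck waiting rather than computing, though the counters alone will not say what.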

If something was constantly reading from or writing to the hard drive, what was it? Where was it? I just happened to do this:

$ tail -n30 /var/log/syslog
 
kernel: [   76.392218] hub 1-0:1.0: unable to enumerate USB device on port 5
kernel: [   76.580230] hub 1-0:1.0: unable to enumerate USB device on port 5
kernel: [   76.768221] hub 1-0:1.0: unable to enumerate USB device on port 5
kernel: [   76.952222] hub 1-0:1.0: unable to enumerate USB device on port 5
# repeated 30 times

What the hell? The kernel was writing error messages at a rate of about 6 times per second. That might be the problem. But what does “unable to enumerate USB device” mean? Why didn’t I see it before?
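The rate can be estimated from the bracketed kernel timestamps, which count seconds since boot. Here is a small sketch using only the four lines shown above. The helper name is hypothetical, and a longer slice of the log would give a better estimate.

```python
import re

# Matches the "[   76.392218]" seconds-since-boot stamp in a kernel log line
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def messages_per_second(lines):
    """Average message rate between the first and last timestamped line."""
    stamps = [float(m.group(1)) for m in map(TIMESTAMP.search, lines) if m]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    return (len(stamps) - 1) / (stamps[-1] - stamps[0])


sample = [
    "kernel: [   76.392218] hub 1-0:1.0: unable to enumerate USB device on port 5",
    "kernel: [   76.580230] hub 1-0:1.0: unable to enumerate USB device on port 5",
    "kernel: [   76.768221] hub 1-0:1.0: unable to enumerate USB device on port 5",
    "kernel: [   76.952222] hub 1-0:1.0: unable to enumerate USB device on port 5",
]
print(round(messages_per_second(sample), 1))  # 5.4
```

These four lines work out to a little over five messages per second, in the same neighborhood as the estimate above.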

A Google search turned up this bug, along with numerous forum posts, speculating and pontificating on the matter. I tried upgrading the kernel. I tried rebooting the server (losing 131 days of uptime), to no avail. I was ready to move everything to a new server. Finally I stumbled across this:

cd /sys/bus/pci/drivers/ehci_hcd/
sudo sh -c 'find ./ -name "0000:00:*" -print | sed "s/\.\///" > unbind'

Immediately the load dropped, and 15 minutes later, it sits comfortably near zero. It was just that simple, but as always, you don’t know what you don’t know.

This is yet another reminder that there are no hard problems — only problems that are hard to a certain level of intelligence and knowledge.

I don’t know why it worked; I was just desperate and wanted it to stop, so I pasted some code from (yet another) tutorial on the net. Now that I have some peace of mind, I can dig deeper.
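As a starting point for that digging, here is a rough Python rendering of the shell one-liner. It rests on an assumption about how sysfs works: each entry in the driver directory named like 0000:00:xx.x stands for a PCI device bound to ehci_hcd (the USB 2.0 host controller driver), and writing a device's name to the unbind file detaches it from the driver. The function names are hypothetical, and unbind_all needs root and disables those controllers, so treat this as a sketch and not a recommended fix.

```python
import glob
import os

DRIVER_DIR = "/sys/bus/pci/drivers/ehci_hcd"  # path taken from the shell snippet


def bound_devices(driver_dir=DRIVER_DIR, pattern="0000:00:*"):
    """Names of the PCI devices currently bound to the driver."""
    paths = glob.glob(os.path.join(driver_dir, pattern))
    return sorted(os.path.basename(path) for path in paths)


def unbind_all(driver_dir=DRIVER_DIR):
    """Detach every matching device from the driver (requires root)."""
    for device in bound_devices(driver_dir):
        # Assumption: sysfs takes one device name per write
        with open(os.path.join(driver_dir, "unbind"), "w") as unbind:
            unbind.write(device)
```

Calling bound_devices() on its own is harmless and shows which controllers the one-liner would have touched.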

Maybe when the grocery store automatic check out machine can determine whether I put my groceries in the bag at an accuracy exceeding random chance, I will worry about the Singularity.

18. September 2011 · Categories: Computers, Linux

A standard way of measuring hard drive performance is to write a few hundred megabytes of zeros, like this:

$ dd if=/dev/zero of=test bs=64k count=6400 conv=fdatasync

Most hard drives can write in the range of 40 – 100 MB/s this way. We know RAM is faster, but how much faster? In Linux, we can conveniently mount a tmpfs file system, which lives in RAM, and write to that.

$ sudo mount -t tmpfs tmpfs /mnt -o size=1024m
 
$ df -H
 
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda1               51G   5.4G    43G  12% /
/dev/sda5              680G   272G   374G  43% /home
tmpfs                  1.1G      0   1.1G   0% /mnt
 
$ dd if=/dev/zero of=/mnt/test bs=64k count=6400 conv=fdatasync
 
# DDR3 1066 RAM
6400+0 records in
6400+0 records out
419430400 bytes (419 MB) copied, 0.246482 s, 1.7 GB/s

Wow. Even solid state drives write at a maximum of about 250 MB/s. Imagine if hard drives were as fast as RAM. Well, for now, you can do disk IO-heavy work in a temporary RAM-mounted partition. When you’re done, remember to move or delete the files and unmount the partition:

$ rm /mnt/test
 
$ sudo umount -fl /mnt
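
The same measurement can be sketched in Python for use inside a script. This is an approximation of the dd command above, not a replacement for it: it uses os.fsync where dd uses fdatasync, and interpreter overhead may lower the reported rate. The function name and its defaults are illustrative.

```python
import os
import time


def timed_zero_write(path, block_size=64 * 1024, count=6400):
    """Write `count` blocks of zeros to `path` and return bytes per second."""
    block = bytes(block_size)  # a block of zero bytes, like reading /dev/zero
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to the device in the timing
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed
```

With the defaults, 64 KiB × 6400 comes to 419,430,400 bytes, the same amount dd reports above. Dividing the result of timed_zero_write("/mnt/test") by 1e6 gives MB/s.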

15. August 2011 · Categories: Computers, Internet, Linux

All right, I finally solved that problem of running multiple vhosts on Apache with WSGI. So now evolvingpodcast.net brings back a different result from the IP address itself. As always, the solution was simple. You just don’t know what you don’t know.

It’s always entertaining when I see computer scientists and engineers make analogies between biology and human artifacts. Then they make predictions about biomedical progress as if it marches at the same ineluctable pace as computer science.

Is the brain like a computer? Is the genome like a program? Let’s examine that. Let’s assume that evolution is a programmer.

First, our programmer has no clue what kind of program she wants to write. Unlike real programmers, who have some goal for their code, evolution just knows that she must write code. It is her nature. Second, she doesn’t know how to program. She’s blind, has no foresight, and doesn’t understand the syntax or logic of her programming language.

Luckily that doesn’t matter much, because she has several saving graces. For one, we gave her a programming language that is extremely forgiving of syntax, logic and execution. A missing parenthesis here or erroneous indentation there doesn’t crash the system. Neither does misspelling the names of variables, most of the time (the analogy here is of the vast neutral fitness zone, where most mutations are basically neutral, and only rarely are they significantly deleterious). It’s like a programming language where every statement is an implicit try: with a fallback of except: pass, but not really, since the point is that the language doesn’t care about trivial errors in syntax. It exhibits smooth fitness gradients.
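A toy version of that forgiving language can be put together in a few lines of Python. It only illustrates the analogy and is not a model of biology: each "statement" runs inside its own try block, so a broken line is skipped and the rest of the program carries on. The function name and the sample program are made up for the example.

```python
def run_forgivingly(statements):
    """Run each statement in turn, silently skipping any that fail."""
    namespace = {}
    for statement in statements:
        try:
            exec(statement, namespace)
        except Exception:
            pass  # a broken line does not crash the rest of the program
    return namespace


program = [
    "x = 1",
    "y = x +",        # syntax error: skipped
    "z = undefined",  # NameError: skipped
    "w = x + 1",      # still runs
]
result = run_forgivingly(program)
print(result["w"], "y" in result, "z" in result)  # 2 False False
```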

The second saving grace is that our programmer gets immediate user feedback. They say the reason why FOSS works, despite the lack of revenue, is because of immediate user testing and feedback (see: The Cathedral and the Bazaar by Eric Raymond). Open source is open, and rather than hammering out code for months or years in a closed environment, FOSS developers get constant, immediate feedback from a large testing community, which improves the code faster. Well, that’s evolution. The genome is open source, and there is no production, no final release, only testing.

Our programmer doesn’t know how to code. She mostly types random stuff, mashes on the keyboard, and patches from /dev/urandom. Actually, she almost exclusively patches from /dev/urandom, for lack of any other insight, but everything she writes gets quickly thrown out to user testing (the environment), and she gets immediate feedback.
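That loop of blind patching followed by user testing fits in a few lines. The sketch below is a deliberately crude illustration, not a population genetics model. The "environment" is a hypothetical scoring function chosen only so the example runs, and the programmer never sees it; she just learns whether each patch survived.

```python
import random


def evolve(genome, fitness, generations, rng):
    """Blindly patch a bytearray, keeping any change the environment tolerates."""
    current = bytearray(genome)
    score = fitness(current)
    for _ in range(generations):
        candidate = bytearray(current)
        position = rng.randrange(len(candidate))
        candidate[position] = rng.randrange(256)  # a one-byte random patch
        candidate_score = fitness(candidate)
        if candidate_score >= score:  # neutral and beneficial patches survive
            current, score = candidate, candidate_score
    return current, score


def toy_environment(genome):
    # Hypothetical stand-in for user testing: it happens to favor even bytes
    return sum(1 for byte in genome if byte % 2 == 0)


start = bytearray([1] * 32)
final, final_score = evolve(start, toy_environment, 1000, random.Random(0))
```

Most accepted patches here are neutral, changing a byte without changing the score, which is the point of the analogy: the code drifts a great deal while only occasionally getting better.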

Along the way, she randomly forks the code into various branches and tries out different things. Over time, the branches diverge to the point where they are no longer interoperable, but some relics of their shared history remain. Eventually, many branches are discarded.

The third saving grace is that she literally has all the time in the world. She’s not constrained by deadlines, and it took her a few hundred million years to publish the first usable code. In this case, it was code that accomplished the singular task of making more copies of itself. And why not? Since she had no goal in mind, it should be obvious that the code most likely to persist would simply be code that replicates itself. All other tasks that it eventually accomplishes are secondary to that goal.

And that’s how it goes for millennia. Despite the massive inefficiency of this process, it works because our programmer is extremely productive. She shotguns the problem. She iterates on the code billions of times a day and handles a bug tracker of unfathomable proportions.

What is the end result? Incredibly complex code with little underlying logic that just works, most of the time. That is why Kurzweil and many computer scientists and engineers are wrong about their predictions of biomedical progress. You’re not just reverse engineering the Kinect or some proprietary code, which you know has a purpose and internal logic. Cracking the genome is not like cracking MD5, where the time it takes to arrive at a cryptographic solution is inversely proportional to computing power. To understand the genome, and biology more generally, you have to decipher every line of code empirically.

That requires research on real biological systems, which is hamstrung by things like reproductive output, generation time, ethics, and pure luck. David Linden is right: empirical progress in the biological sciences is much more linear than you imagine. And if some aspects of it are exponential, the exponent is much smaller than you think.

Kurzweil likes to use the Human Genome Project as an example of exponential growth. It took seven years to sequence the first 1% (or thereabouts), and most scientists directly involved in the project thought it would take much longer to finish the full genome. But they underestimated the power of exponential growth in sequencing technology, and several doubling times later, a draft sequence was published in 2001 and the finished sequence in 2003. That’s great, but then what?
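The arithmetic behind "several doubling times" is easy to check. This is a small sketch that assumes output doubles at a steady pace, which is the premise of the argument and not an established fact about the project.

```python
import math


def doublings_to_finish(fraction_done):
    """Number of doublings needed to get from fraction_done to the whole."""
    return math.ceil(math.log2(1 / fraction_done))


print(doublings_to_finish(0.01))  # 7, since 0.01 * 2**7 = 1.28
```

So from 1% complete, only seven doublings are needed, which is why a project that looks hopelessly behind on a linear scale can still finish quickly.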

Then the hard problem of empirically deciphering the genome became relevant, and a decade later, we still don’t have personalized medicine. We’re not even close to understanding the human genome in its entirety. We didn’t even fully appreciate the importance of non-coding RNAs until after it was completed. We have the source code but we don’t know what to make of it.

In the widest interpretation, the genome is code, but it’s nothing like the code that you write. The programmer is nothing like you. And deciphering biology is not a straightforward engineering task, because it wasn’t made by engineers. You can’t look for internal logic, because there is none. You have to decipher each line empirically, and that’s hard work which is not subject to Moore’s Law. You can’t project current computational trends to some distant point in the future and seriously predict the emergence of a specific discovery or innovation to within a few years.

That might work when the entire system is a fabrication of goal-oriented human logic. It doesn’t work for biology.

Just as sequencing the human genome didn’t automatically produce usable knowledge of genetics and development, Kurzweil’s predictions about high resolution neural imaging will not automatically produce usable knowledge of the brain and consciousness. That will require lab work.

In the end, the analogies are superficial, and you can’t reason your way to biological conclusions through the lens of computer science or engineering. You should really learn biology and the travails of biological research to make insightful statements about them.

18. July 2011 · Categories: Computers, Research

Sometimes I use batch files to run dozens of commands in Matlab, which can take days to process. This saves time, because if I entered each command manually, some amount of time would be wasted before I realized the previous job was done. Of course, writing the batch file can also take time. So I do something like this:

from os import listdir
 
# Get a list of files to batch process
filelist = listdir('/path/to/dir/')
 
# I may only want to process some of those files
selectfiles = [filename for filename in filelist if 'term' in filename]
 
# Write one Matlab command per selected file; "with" closes the file when done
with open('BatchFile.m', 'w') as batchfile:
    for filename in selectfiles:
        batchfile.write("command('/path/to/files/', '" + filename + "')\n")

That occasionally saves me half an hour of mindless typing.