So I finally got a smart phone, and let me tell you, going from a phone that only does voice and text to the Galaxy Nexus is like being shotgunned into the next century. We call the Galaxy Nexus a mobile phone for historical reasons only. It’s really a mobile computer that happens to do voice and text messaging, among other things. I think there are enough reviews of the Galaxy Nexus that I don’t have to give another rundown of its features here, but I do have a few thoughts on the smart phone phenomenon.

First, my “phone” has 1 GB of RAM and 30 GB of storage. My laptop from five years ago had 512 MB of RAM and a 40 GB hard drive. My desktop computer from ten years ago had 128 MB of RAM and a PIII 133 MHz processor. I don’t know how that compares to the ARM Cortex-A9 on standard benchmarks, but it’s not just hardware. Browsers today can run JavaScript 10 times faster than the same browsers just three years ago, and probably 50 times faster than Internet Explorer 6, on the same hardware.

So what will our “phones” be like in another five or ten years? In five years, they will probably have the computing power of today’s commodity desktops and laptops, and in ten years they will far surpass them. On top of that, protocol and software improvements like SPDY, Dart, NaCl, etc. (well, maybe), will push performance much farther than hardware improvements alone. I believe the future looks bright, at least from a purely technological perspective.

Of course, the changes we see are not just technological. My phone has GPS that can geolocate me to “within 30 meters” of my actual location. When I turned it on at home, the address that it gave me was my next door neighbor’s house, which is close enough. That’s really convenient when I want directions to the closest Chinese restaurant, but I’m also keenly aware that Google will never delete that data. Ever.

This always happens with technology — there’s always some catch, some unintended (or intended) side effect to our technological marvels. The combustion engine created the industrial revolution and allowed us to build cities (because it made rail and transport affordable), but it also pumped megatons of toxins and greenhouse gases into the atmosphere. For all the problems we solve, we create many new ones, but we keep going because usually the marginal benefits outweigh the costs, all things considered.

It’s just another thing to keep in mind. Email allows you to communicate easily with other people, but that doesn’t mean you should use it for every conversation. Some conversations should be reserved for face to face communication, but that fact doesn’t mean we should abandon email either. Likewise, we don’t have to abandon smart phones because we have (valid) privacy concerns. We just have to remember sometimes to turn off the GPS.

Maybe when the grocery store automatic check out machine can determine whether I put my groceries in the bag at an accuracy exceeding random chance, I will worry about the Singularity.

24. July 2011 · Categories: Internet, Technology

The recent controversy over deleted Google+ accounts got me thinking again about the ephemeral nature of our digital lives. My mantra is that life is full of trade offs, and this seems to be one of them. Digital content is much easier to distribute than physical content. The trade off is that your entire digital life can be wiped out in an instant. What do you do then?

Over the last six months I have been systematically scanning old pictures that were, up until now, stored away at my parents’ house. Many of these pictures are from the 80s, when I was a kid, but some are from the 50s and 60s, when my parents were kids. Although the pictures are 25-50 years old, they are in surprisingly good condition. The pages of the albums have yellowed and the glue decomposed over time, but the pictures are as vivid as the day they were made. With continued care, they will outlast me.

That’s a guarantee that we don’t get with our digital content. The first problem is the simple fact that hard drives crash. One of the main reasons we sign up for “the cloud” is automation of back ups. When you upload your images to Picasa, Google makes redundant copies across geographically separated data centers. However, the second and bigger problem is the one we’re witnessing now: Terms of Service and Acceptable Use Policy violations.

Of course, some people (like Eben Moglen and Richard Stallman) have been warning us for years that turning our digital lives over to the whims of capricious third parties is a bad idea. Most people don’t care, because online services are so easy to use. I think the first time a significant number of people became aware of this problem was last year when WikiLeaks got booted off Amazon’s servers and bounced around several hosts. Amazon claimed that WikiLeaks had violated their AUP, but it’s not entirely clear that they had. It’s hard to argue that they were engaged in criminal activity when they still haven’t been charged with a crime.

The real problem here is that the new gatekeepers of our digital lives can do whatever the hell they want. They can inconsistently apply their AUPs when a Congressman calls them up, just as G+ is inconsistently applying its policy on real names now. If all your media are digital, then it’s just your lifetime of memories that’s at stake.

I have 750 pictures in my Picasa Web account, most of which were scanned during my recent project. If Google deleted my account without explanation (as is usually the case), what would I do? Well, ironically, I have the physical pictures, which are the ultimate back up. Short of that, the best solution is simply to make as many back ups as you can, in as many different places: multiple hard drives, different hosts, etc. If you know a little scripting, this can be automated, but it’s a nontrivial solution for most people.
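Here’s a minimal sketch of what that automation could look like, assuming a POSIX shell. The function name backup_to_all and the paths in the example are made up for illustration; a real setup would probably add rsync and at least one off-site host.

```shell
#!/bin/sh
# Copy one source directory into several backup locations.
# Usage: backup_to_all SOURCE_DIR DEST_DIR [DEST_DIR ...]
# (backup_to_all is a hypothetical helper, not part of any existing tool.)
backup_to_all() {
    src=$1
    shift
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    failed=0
    for dest in "$@"; do
        # Each destination gets its own dated copy, so older backups survive.
        target="$dest/$(basename "$src")-$(date +%Y-%m-%d)"
        if mkdir -p "$target" && cp -R "$src/." "$target/"; then
            echo "backed up $src to $target"
        else
            echo "backup to $dest failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical paths): one copy on a second drive, one on a mounted share.
# backup_to_all "$HOME/Pictures/scans" /mnt/backup-drive /mnt/nas
```

Run it from cron and you get the "as many places as you can" part without having to remember it.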

The other thing that we must impress on our new gatekeepers is that, if they expect us to turn our digital lives over to them, they need to start taking their responsibilities seriously.

As a first step, be responsive to your users. We hear over and over again about accounts being deleted, and the universal problem (at Facebook, Google and elsewhere) is that you can’t reach anyone. They’re asking us to put all our eggs in their basket. We’re “betting on them”, as Vic Gundotra said to one of the recent victims of a random deletion. That power comes with responsibility. Take it seriously, and create a mechanism where problems are reported and addressed quickly.

Their apparent insouciance on this point is creating a lot of doubt. Maybe Eben Moglen was right, and a plug server in every home is the safest mechanism for storing your data. First, we would own the bare metal, so there wouldn’t be AUPs to worry about. Second, there would be none of the spying that all cloud services currently engage in. Only a court order or warrant would give third parties access to your data. Third, distributed, encrypted backups (with version control, even) would ensure the integrity of your data. People are already working on this solution. You can read more here:

http://freedomboxfoundation.org/
http://wiki.debian.org/FreedomBox

It’s always entertaining when I see computer scientists and engineers make analogies between biology and human artifacts. Then they make predictions about biomedical progress as if it marches at the same ineluctable pace as computer science.

Is the brain like a computer? Is the genome like a program? Let’s examine that. Let’s assume that evolution is a programmer.

First, our programmer has no clue what kind of program she wants to write. Unlike real programmers, who have some goal for their code, evolution just knows that she must write code. It is her nature. Second, she doesn’t know how to program. She’s blind, has no foresight, and doesn’t understand the syntax or logic of her programming language.

Luckily that doesn’t matter much, because she has several saving graces. For one, we gave her a programming language that is extremely forgiving of syntax, logic and execution. A missing parenthesis here or erroneous indentation there doesn’t crash the system. Neither does misspelling the names of variables, most of the time (the analogy here is of the vast neutral fitness zone, where most mutations are basically neutral, and only rarely are they significantly deleterious). It’s like a programming language where every statement is an implicit try: with a fallback of except: pass, but not really, since the point is that the language doesn’t care about trivial errors in syntax. It exhibits smooth fitness gradients.

The second saving grace is that our programmer gets immediate user feedback. They say the reason why FOSS works, despite the lack of revenue, is because of immediate user testing and feedback (see: The Cathedral and the Bazaar by Eric Raymond). Open source is open, and rather than hammering out code for months or years in a closed environment, FOSS developers get constant, immediate feedback from a large testing community, which improves the code faster. Well, that’s evolution. The genome is open source, and there is no production, no final release, only testing.

Our programmer doesn’t know how to code. She mostly types random stuff, mashes on the keyboard, and patches from /dev/urandom. Actually, she almost exclusively patches from /dev/urandom, for lack of any other insight, but everything she writes gets quickly thrown out to user testing (the environment), and she gets immediate feedback.

Along the way, she randomly forks the code into various branches and tries out different things. Over time, the branches diverge to the point where they are no longer interoperable, but some relics of their shared history remain. Eventually, many branches are discarded.

The third saving grace is that she literally has all the time in the world. She’s not constrained by deadlines, and it took her a few hundred million years to publish the first usable code. In this case, it was code that accomplished the singular task of making more copies of itself. And why not? Since she had no goal in mind, it should be obvious that the code most likely to persist would simply be code that replicates itself. All other tasks that it eventually accomplishes are secondary to that goal.

And that’s how it goes for millennia. Despite the massive inefficiency of this process, it works because our programmer is extremely productive. She shotguns the problem. She iterates on the code billions of times a day and handles a bug tracker of unfathomable proportions.

What is the end result? Incredibly complex code with little underlying logic that just works, most of the time. That is why Kurzweil and many computer scientists and engineers are wrong about their predictions of biomedical progress. You’re not just reverse engineering the Kinect or some proprietary code, which you know has a purpose and internal logic. Cracking the genome is not like cracking MD5, where the time it takes to arrive at a cryptographic solution is inversely proportional to computing power. To understand the genome, and biology more generally, you have to decipher every line of code empirically.

That requires research on real biological systems, which is hamstrung by things like reproductive output, generation time, ethics, and pure luck. David Linden is right: empirical progress in the biological sciences is much more linear than you imagine. And if some aspects of it are exponential, the exponent is much smaller than you think.

Kurzweil likes to use the Human Genome Project as an example of exponential growth. It took seven years to sequence the first 1% (or thereabouts), and most scientists directly involved in the project thought it would take much longer to finish the full genome. But they underestimated the power of exponential (sequencing technology) growth, and several doubling times later, a draft sequence was published in 2001 and the finished sequence in 2003. That’s great, but then what?

Then the hard problem of empirically deciphering the genome became relevant, and a decade later, we still don’t have personalized medicine. We’re not even close to understanding the human genome in its entirety. We didn’t even fully appreciate the importance of non-coding RNAs until after it was completed. We have the source code but we don’t know what to make of it.

In the widest interpretation, the genome is code, but it’s nothing like the code that you write. The programmer is nothing like you. And deciphering biology is not a straightforward engineering task, because it wasn’t made by engineers. You can’t look for internal logic, because there is none. You have to decipher each line empirically, and that’s hard work which is not subject to Moore’s Law. You can’t project current computational trends to some distant point in the future and seriously predict the emergence of a specific discovery or innovation to within a few years.

That might work when the entire system is a fabrication of goal-oriented human logic. It doesn’t work for biology.

Just as sequencing the human genome didn’t automatically produce usable knowledge of genetics and development, Kurzweil’s predictions about high resolution neural imaging will not automatically produce usable knowledge of the brain and consciousness. That will require lab work.

In the end, the analogies are superficial, and you can’t reason your way to biological conclusions through the lens of computer science or engineering. You should really learn biology and the travails of biological research to make insightful statements about them.

Computation can be thought of as a set of operations on binary states: 1 or 0, True or False, left or right magnetization on a hard disk, pit or no pit on a compact disk. Boolean logic is a collection of such operations, which always evaluate to True or False. Logic gates perform sophisticated combinations of these operations. In the biological realm, since neural firing is an all-or-nothing event, it can also be thought of as a form of computation, even if the physiological properties of synapses are analog, and a lot of work has shown that neural structures function like logic gates.

By now you’ve probably heard of DNA computation, but have no idea how it works. After all, you’re used to thinking about computation as a property of human artifacts. But DNA molecules can also exhibit binary states: hybridized and not hybridized (among others). So, the essence of DNA computation is to take advantage of the hydrogen bonding properties of different oligomers to perform a series of hybridization / denaturation reactions in a reliable way. The rules of logic come from the sequences of the oligos. If one oligo always hybridizes at the same time as another, that’s an AND operation. If one always displaces another, that’s an OR operation. More sophisticated logic is derived from these simple rules.

A recent paper in Science describes the use of specially designed DNA oligos called “seesaws” to compute the square root of a four bit number. Granted, it took hours to solve that simple problem, which my netbook can do in microseconds, but it’s a start.

The potential benefits of this approach are, 1) DNA computation consumes far less energy than transistor based computation, and 2) DNA computation can be massively parallel, allowing it to solve certain kinds of problems much faster than traditional microchips. Energy consumption doesn’t seem like a problem if you only consider your own computer, but multiply that by 2 billion computers across the world, or just think of a single data center, and the energy consumption and heat dissipation problems it faces, and you’ll appreciate the benefits of even marginal improvements in efficiency.

Finally, keep in mind that DNA is a crappy molecule for just about everything except information storage. Biomolecular computation could be expanded to enzymes and ribozymes, opening up whole new possibilities and vastly improving the speed and efficiency of these systems.

I posted this on another forum, but it’s an important point, so I’ll repost it here.

Someone asked, “What’s wrong with using Windows to play games?”

The problem is that you’re trading the temporary convenience of gaming for the long term benefit of freedom. Software is the foundation of the 21st century infotech economy in the same way that steel was the foundation of the 20th century industrial economy. As our lives are increasingly managed by software, we have to decide whether we control the software or it (and the corporations, colluding with governments, that provide it) controls us.

Information asymmetry is an important phenomenon to understand. You’re better off keeping secrets and spying on others. When governments do it to their people, they successfully control the people. When people maintain their privacy rights while forcing institutional and governmental transparency, they control their governments.

The same principle applies to software, but transparency in software means open source. If you can’t read the code, it may be spying on you, distributing badware, or who knows what else. The Chinese government recognizes this, which is why it requires access to parts of the Windows code (with properly signed NDAs) before it will buy Windows, to make sure the software hasn’t been backdoored by the US government (and the Chinese government knows a thing or two about information asymmetry, for better or worse). This is a privilege that Microsoft will never grant you or me.

Proprietary software creates arbitrary restrictions, artificial scarcity, and walled gardens. Do we want to live in a world where software promotes restrictions and control, or where it promotes freedom? This is an important choice to make at a time when software increasingly manages our photos, books, music, social interactions, finances — our very lives. We can’t wait to make the right choice until all of our data is on someone else’s server, locked in someone else’s proprietary formats, and subject to someone else’s capricious terms of service. We can’t wait until corporations and governments have abused their privileges before we choose free software.

We have to choose free software now.

I got my Ancestry Painting results.

I will post about my own results in the near future. For now, I want to reiterate the dangers of genetic determinism. Here are two quotes that I found on the 23andMe web site:

While it is good to know that I carry no debilitating genes and I am very healthy, running little risk of anything. It’s also disheartening to know that I am very much typically, genetically, average. I guess I just HOPED I was more. I’m currently working on my 2nd (and 3rd) bachelor degree and sporting a 3.9 GPA. I know genes aren’t everything, but I feel like mine aren’t even worth passing down. — Anon1

I don’t know why this is surprising. Most people seem to be underwhelmed by their genetic data, but rationality teaches us that we should assume we’re perfectly average. Culture makes everyone think they’re special.

And if you’re looking for validation in your genes, you should get counseling, not genotyping.

I had similar feelings after getting my results. The most insignificant thing affected me the most. I always pride myself on my intellect, although I always had a suspicion that I wasn’t as smart as I told myself. I have always had difficulty learning, and come to find out I carry (GG) for rs363050 which is a measure of non-verbal IQ. It was devastating to me, because everyone else on my sharing list are typical. — Anon2

Anon2 was “devastated” by his genotype at a single SNP, rs363050, located in the synaptosomal-associated protein, 25kDa, or SNAP25 gene. According to one study:

The synaptosomal-associated protein of 25 kDa (SNAP-25) gene plays an integral role in synaptic transmission, and is differentially expressed in the mammalian brain in the neocortex, hippocampus, anterior thalamic nuclei, substantia nigra and cerebellar granular cells. Recent studies have suggested a possible involvement of SNAP-25 in learning and memory, both of which are key components of human intelligence. In addition, the SNAP-25 gene lies in a linkage area implicated previously in human intelligence. In two independent family-based Dutch samples of 391 (mean age 12.4 years) and 276 (mean age 37.3 years) subjects, respectively, we genotyped 12 single-nucleotide polymorphisms (SNPs) in the SNAP-25 gene on 20p12-20p11.2. From all individuals, standardized intelligence measures were available. Using a family-based association test, a strong association was found between three SNPs in the SNAP-25 gene and intelligence, two of which showed association in both independent samples. The strongest, replicated association was found between SNP rs363050 and performance IQ (PIQ), where the A allele was associated with an increase of 2.84 PIQ points (P=0.0002). Variance in this SNP accounts for 3.4% of the phenotypic variance in PIQ.

You read that right. 96.6% of the variance in “performance IQ” has nothing to do with this SNP. A much higher percentage of the variance may be entirely environmental. It is important to look at the effect size of any association, along with the overall estimated heritability of the trait. You will find that most complex traits have many associations of small effect, so a single genotype or environmental condition should be taken with a grain of salt. Except in rare cases of Mendelian traits or large effect sizes, these results should not be devastating.

For the record, I also have a GG genotype at this marker, and I scored in the 99.8th percentile on an official, proctored IQ test as an adult.

22. March 2011 · Categories: Internet, Linux, Technology

To get specs on a machine, there’s the obvious stuff:

cat /proc/cpuinfo
df -H
free -m
ifconfig -a
sudo lshw > hardware.txt

But to test the disk I/O performance, you can do this:

dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync

Then you’ll also have a 1 GB file that you can fetch with wget from another machine to test the bandwidth, provided the file is somewhere your web server can serve it.
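If you run this test often, you could wrap the dd call in a small function. This is only a sketch: disk_write_test is a name I made up, and conv=fdatasync is a GNU dd option, so it assumes GNU coreutils. The default count of 16384 blocks of 64k matches the 1 GB file above.

```shell
#!/bin/sh
# Write COUNT blocks of 64k zeros to FILE, forcing the data to disk before
# dd exits, then print dd's summary line (bytes, elapsed time, throughput).
# Usage: disk_write_test FILE [COUNT]
# (disk_write_test is a hypothetical helper; conv=fdatasync assumes GNU dd.)
disk_write_test() {
    file=$1
    count=${2:-16384}
    # dd reports its statistics on stderr, so capture both streams.
    summary=$(dd if=/dev/zero of="$file" bs=64k count="$count" conv=fdatasync 2>&1) || {
        echo "$summary" >&2
        return 1
    }
    echo "$summary" | tail -n 1
}

# Example: a quick 64 MB run instead of the full 1 GB.
# disk_write_test ./test 1024
```

A smaller count finishes faster, but the number is less trustworthy because caches and startup overhead make up more of the run.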

14. March 2011 · Categories: Internet, Linux, Technology

I’m writing these posts to help myself remember. After all, you don’t really learn something until you teach someone else, right?

So I’ve been version tracking some Matlab scripts with git, and I decided to make a public repository. Here’s how I did it.

First, if you haven’t started tracking changes yet, install git and initialize a repository:

sudo apt-get install git
cd /path/to/files
git init
git add .
git commit -m 'I wish I was as committed as these files!'

Now, on the remote machine, install and set up gitosis:

sudo apt-get install gitosis
sudo -H -u gitosis gitosis-init < ~/.ssh/id_rsa.pub

That last part is my public SSH key. I created the public (gitosis) repo on the same machine as my development (git) repo. If you’re creating the gitosis repo on a remote machine, you’ll have to copy your public key over first. Just remember to put it in a file that ends in .pub.

Ok, back on the development machine, clone the admin repo:

git clone gitosis@mybox.net:gitosis-admin.git

Open up the gitosis config file:

nano /path/to/gitosis-admin/gitosis.conf    # or vim, or whatever editor you prefer

And add two sections:

[repo MyProject]
description = My witty description
owner = me

[group MyProject]
writable = MyProject
members = me bro dannyboy blue gaga

Notice how the repo, group and writable names are all the same. Also, the members are based on the SSH keys that you add to /path/to/gitosis-admin/keydir; each key file should be named after its member and end in .pub (for example, me.pub).
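If you add projects regularly, you could generate the two sections instead of typing them. Here’s a rough sketch; gitosis_stanza is a helper name I invented, not a gitosis command, and it leaves out the optional description line.

```shell
#!/bin/sh
# Print the [repo] and [group] sections for one project, using the same
# name for the repo, the group and the writable entry.
# Usage: gitosis_stanza NAME OWNER "MEMBER [MEMBER ...]"
# (gitosis_stanza is a hypothetical helper, not part of gitosis.)
gitosis_stanza() {
    name=$1
    owner=$2
    members=$3
    cat <<EOF
[repo $name]
owner = $owner

[group $name]
writable = $name
members = $members
EOF
}

# Example: append the sections to the admin repo's config (placeholder path).
# gitosis_stanza MyProject me "me bro dannyboy" >> /path/to/gitosis-admin/gitosis.conf
```

Because the name is passed once and reused, the repo, group and writable entries can’t drift apart.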

We need to commit and push these changes:

git add gitosis.conf keydir
git commit -m "added MyProject"
git push origin master

Now go to your personal project directory and set that up:

git remote add origin gitosis@mybox.net:MyProject.git
git push origin master

That’s it. The other members of your team can clone the project with:

git clone gitosis@mybox.net:MyProject.git

And start pushing and pulling to their hearts’ content.
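If you’d like to rehearse the push and clone cycle before touching a real server, a bare repository on the local disk can stand in for the gitosis remote. This is only a sketch under that assumption: rehearse_push_clone, the demo file and the identity passed to git are all made up, and WORKDIR must be an absolute path.

```shell
#!/bin/sh
# Rehearse push and clone against a local bare repository that stands in
# for the gitosis remote. Nothing here contacts a real server.
# Usage: rehearse_push_clone WORKDIR   (WORKDIR must be an absolute path)
# (rehearse_push_clone is a hypothetical helper for illustration only.)
rehearse_push_clone() {
    workdir=$1
    (
        # The bare repo plays the role of gitosis@mybox.net:MyProject.git.
        git init --quiet --bare "$workdir/remote.git" &&
        mkdir -p "$workdir/project" &&
        cd "$workdir/project" &&
        git init --quiet &&
        echo "disp('hello')" > demo.m &&
        git add demo.m &&
        git -c user.name="Demo" -c user.email="demo@example.invalid" \
            commit --quiet -m "first commit" &&
        git remote add origin "$workdir/remote.git" &&
        # Push whatever the current branch is called (master or main).
        git push --quiet origin HEAD &&
        # A teammate's clone should now contain the committed file.
        git clone --quiet "$workdir/remote.git" "$workdir/teammate"
    )
}

# Example:
# rehearse_push_clone "$(mktemp -d)"
```

The subshell keeps the cd from changing your current directory, and the && chain stops at the first step that fails.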

BTW, did you know that Linus Torvalds, the creator of the Linux kernel, also created git, and many people consider it to be his greatest software contribution?