05. January 2012 · Categories: Entertainment, Media, Personal

Here’s a test of the High HDR effect on the Camera360 app for Android. Not bad.

13. December 2011 · Categories: Personal

I haven’t posted in a while, but I should, especially now that I’m done with quals.

Oh, I know nobody reads this, but I enjoy writing. It’s a great way to organize my thoughts, and if somebody else finds my ramblings useful, all the better.

Maybe I’ll start posting more often.

A new version of DIY Dodecad was released earlier today, so I dutifully reanalyzed my 23andMe genotype data. You can see the results of the first analysis here. And now the second, which is pretty similar:

        12 ancestral populations
    166462 total SNPs
     20190 flipped SNPs
     52577 heterozygous SNPs
        91 missing values
  0.999453 genotype rate
      mode genomewide
 
    91 SNPs missing
 
  9130  dQ:  1.002E-07  goal:  1.000E-07
 
      9138 total iterations
 9.999E-08 final dQ
 
 ----------------------------
 FINAL ADMIXTURE PROPORTIONS:
 ----------------------------
 
 25.79%  East_European
 39.76%  West_European
 22.55%  Mediterranean
  0.12%  Neo_African
  9.37%  West_Asian
  0.05%  South_Asian
  0.02%  Northeast_Asian
  1.28%  Southeast_Asian
  0.00%  East_African
  0.91%  Southwest_Asian
  0.13%  Northwest_African
  0.03%  Palaeo_African
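To put a number on "pretty similar", here is a minimal Python sketch that compares the two runs. The percentages are copied from the two outputs in these posts; the variable names are only for illustration. The last lines recompute the reported genotype rate, assuming it is simply 1 minus missing SNPs over total SNPs.

```python
# Admixture percentages copied from the two DIY Dodecad runs in these posts.
POPULATIONS = [
    "East_European", "West_European", "Mediterranean", "Neo_African",
    "West_Asian", "South_Asian", "Northeast_Asian", "Southeast_Asian",
    "East_African", "Southwest_Asian", "Northwest_African", "Palaeo_African",
]
first_run = [25.80, 39.77, 22.55, 0.11, 9.39, 0.01, 0.00, 1.31, 0.00, 0.92, 0.10, 0.03]
second_run = [25.79, 39.76, 22.55, 0.12, 9.37, 0.05, 0.02, 1.28, 0.00, 0.91, 0.13, 0.03]

# Absolute change per population, in percentage points (rounded to hide float noise).
diffs = {
    pop: round(abs(new - old), 2)
    for pop, old, new in zip(POPULATIONS, first_run, second_run)
}

largest_pop = max(diffs, key=diffs.get)
print(f"Largest change: {largest_pop} ({diffs[largest_pop]:.2f} points)")

# Assumed definition of the genotype rate: 1 - missing / total SNPs.
genotype_rate = 1 - 91 / 166462
print(f"Genotype rate: {genotype_rate:.6f}")
```

The biggest shift between versions is South_Asian, at 0.04 percentage points, and the recomputed genotype rate matches the 0.999453 in the output.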

In this version, you can edit the configuration file and produce graphs like this. A complete list of graphs for my chromosomes can be found here.[1] You’ll notice that the top lines are usually orange or red, as expected given the high proportion of East and West European alleles, but other populations show up in hot spots: for example, the high proportion of West Asian alleles on chromosomes 5 and 6, and Mediterranean alleles on chromosome 13.

[1] You can write a for loop in R to generate all the graphs, like this:

# Loop over chromosomes 1 to 23, producing one paint_byseg graph each
for (CHROM in 1:23) paint_byseg(chr=CHROM, calc='dv3', tofile=TRUE, width=1600, height=600)

30. July 2011 · Categories: Internet, Personal

This blog jumps around a lot. I just got a VM from AlienLayer: 190 MB RAM, 19 GB storage, 190 GB bandwidth, for (you guessed it) $19 a year. The server is in New York. The ping response is pretty good from my house (about 40 ms), which means there’s no lag over ssh. The node seems really fast, too. It’s comparable to my ChicagoVPS box, where The Evolving Scientist Podcast is hosted.

The Evolving Scientist Podcast runs on MediaCore, which is the Python app mentioned in the previous post. Since MediaCore talks to Apache through WSGI, and I can’t seem to get vhosts working properly with WSGI, for now my blog is a nomad. I was hosting it out of my lab for the last week, but figured I needed something more reliable than my personal computer.

Oh yeah, for the geeks out there, it’s an LNMP stack: Nginx running on Debian 6, with PHP-FPM pulled from Testing. It currently consumes ~130 of the 190 MB RAM.

29. July 2011 · Categories: Internet, Personal

I cannot for the life of me get vhosts to work properly while running WSGI. I’ve searched for a solution, but nothing works. Right now, requests for all domains land on the Python app or I get an error that the request is not redirecting properly (or will never complete). I don’t know if anybody actually reads this blog (it gets a couple hundred hits a day, but that could all be spam bots), but if you do, and you know how to properly configure vhosts on Apache with WSGI, please let me know.

I definitely need to read more about how this works, but for now I followed Razib Khan’s ADMIXTURE tutorial and did an ancestry analysis on my genotype data. The assumption that I used was 8 ancestral populations (K), although the ADMIXTURE Manual basically says this should be determined empirically:

How do I choose the correct value for K?

Use ADMIXTURE’s cross-validation procedure. A good value of K will exhibit a low cross-validation error compared to other K values. Cross-validation is enabled by simply adding the --cv flag to the ADMIXTURE command line. In this default setting, the cross-validation procedure will perform 5-fold CV—you can get 10-fold CV, for example, using --cv=10. The cross-validation error is reported in the output.

It goes on to explain how to automate runs for multiple values of K, which I’m doing now. On a cluster :-P
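Once the runs for several K values finish, picking the K with the lowest cross-validation error could be sketched in Python like this. It assumes each run's output was saved as text and that the CV line looks like `CV error (K=8): 0.55248`; that line format is an assumption to check against your own logs. The function name `best_k` is hypothetical, and the sample numbers are made up for illustration, not real results.

```python
import re

# Assumed log-line format, e.g. "CV error (K=8): 0.55248".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def best_k(log_text):
    """Return (K, error) for the lowest cross-validation error found in log_text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]

# Made-up numbers for illustration only; these are not real results.
sample_log = """\
CV error (K=6): 0.5310
CV error (K=7): 0.5274
CV error (K=8): 0.5262
CV error (K=9): 0.5281
"""

print(best_k(sample_log))
```

In practice you would concatenate the logs from every run and pass the combined text to `best_k`. A low error that is only marginally better than its neighbors is worth a second look before settling on one K.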

Anyway, the output was this graph (it’s huge, so I’m not embedding it). It’s kind of “busy”, but I highlighted my result with a purple line, which is about 1/5 of the way from the right. The strange thing is that the Bengali ancestral proportions are exactly the same as (and immediately to the left of) my bar. So what does that mean? My interpretation (which may be wrong) is that, given 8 theoretical ancestral populations, Bengalis and I share the same proportions of all the founders. That doesn’t mean it’s the same alleles, but it does mean the same proportion of alleles. In other words, Bengalis and I may get 10% of our alleles from Founder Group Green, but it’s not necessarily the same 10%. However, if Bengalis and I have inherited 80% of our alleles from Founder Group Red, then we almost certainly share many alleles. The problem is that my Dodecad results suggest very few South Asian alleles, and by that I mean 0.01%.

Other groups with a high proportion of Red include various Jewish and Middle Eastern populations. Meanwhile, if I’m European (and I’m almost certain that I am), my bar should be mostly Orange.

It is also possible that I screwed up somewhere along the way (in appending my data to the file, for example). I don’t know. How can I check?

I suppose one set of controls will be the other K values that I’m running.

As I’ve pointed out before, I’m extremely white. So, I didn’t expect any surprises with the recently released DIY admixture analysis tool. However, your results may be more interesting, so here’s the program, and if you’re on a Debian-based Linux distro, here’s a shell script that automates the analysis (just extract your 23andMe genotype data, name the file mysnps.txt, and run the shell script from the same folder).

So, here are my results:

166462   markers
12   ancestral populations
Genotype rate is .99945
Beginning EM iterations:
# 19810  loglik: -1.5913858752E+05  delta: 9.993E-07  goal: 1.000E-06
-----------------------------
FINAL ADMIXTURE PROPORTIONS:
19810 iterations
Log Likelihood =  -1.5913858752E+05
-----------------------------
East_European                                                    25.80%
West_European                                                    39.77%
Mediterranean                                                    22.55%
Neo_African                                                       0.11%
West_Asian                                                        9.39%
South_Asian                                                       0.01%
Northeast_Asian                                                   0.00%
Southeast_Asian                                                   1.31%
East_African                                                      0.00%
Southwest_Asian                                                   0.92%
Northwest_African                                                 0.10%
Palaeo_African                                                    0.03%

As expected, I’m mainly European, but somewhat surprisingly, it claims that I’m 1.31% Southeast Asian.

That’s interesting. I will have to learn more about how this differs from 23andMe’s ancestry analysis.

27. June 2011 · Categories: Linux, Personal, Philosophy

Learning the Linux command line has turned out to be a great decision, now that I work almost exclusively at the computer. Don’t get me wrong: I wouldn’t call myself an expert. I’ve been using the command line for the last two years, but I still learn something new every week. However, the command line has increased my productivity 2- to 10-fold compared to how I would have done things the GUI way.

Quite often, if you dedicate yourself to learning something “hard”, you will get much bigger rewards in the end. A good example is typing. My mother has been using a computer for almost 20 years, and she still hunts and pecks. I took a keyboarding class in high school. It was onerous and boring, but now I can type 80-90 wpm, whereas I would have hit an asymptote of about 40 wpm if I were still doing it her way. That commitment has paid dividends for the last 15 years, and if I spent 200 hours in keyboarding class, I’ve gained 2000 back by typing faster.

Now the command line is paying similar dividends. Say that I need to find every CSV file spread out across 20 folders, copy them to a central location, and rename all of them by prepending a word. I can do that in two commands and about 10 seconds. Mucking around in a graphical file manager, it would take me an hour.
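For comparison, here is roughly the same job sketched in Python rather than shell. The function name `collect_csvs`, the folder names, and the prefix are placeholders for illustration. In this simple version, files that share a name in different folders would overwrite each other in the destination.

```python
import shutil
from pathlib import Path

def collect_csvs(source_root, dest_dir, prefix):
    """Copy every .csv under source_root into dest_dir, prepending prefix to each name."""
    source_root = Path(source_root)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the listing before any copying starts.
    for csv_path in sorted(source_root.rglob("*.csv")):
        # Skip anything already inside the destination folder.
        if dest_dir.resolve() in csv_path.resolve().parents:
            continue
        target = dest_dir / f"{prefix}{csv_path.name}"
        shutil.copy2(csv_path, target)
        copied.append(target)
    return copied
```

A call such as `collect_csvs("experiments", "all_csvs", "batch_")` would walk the `experiments` tree and leave renamed copies in `all_csvs`, with the originals untouched.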

It’s a good life lesson: learn to do things that are hard.

25. May 2011 · Categories: Internet, Linux, Personal

I’m going to run this blog on my BuyVM box for a while. It’s still an LNMP stack, but with Debian 6 instead of Ubuntu 10.10, which is fine, since Debian always runs lighter than Ubuntu anyway. The same stack on my Ubuntu box consumes ~235 MB of RAM, while on this box it consumes ~105 MB.

The new box has 256 MB of guaranteed RAM, 30 GB of storage space, and 1 TB of bandwidth, so I’m not close to exhausting my limits. The only thing I worry about is that it’s hosted at the Hurricane Electric data center in Fremont, CA, and ssh’ing to a box that far away can be laggy.

22. May 2011 · Categories: Internet, Personal

I got an email yesterday morning from Pingdom, informing me that my web site was down. Looking further into it, the whole node was offline for what ended up being almost 13 hours. This came right after I spoke so glowingly of my hosting provider.

Well, turns out it wasn’t their fault. In an email sent out this morning, they explained that the node had been DDoSed. I can kind of understand why a bunch of bored teenagers would DDoS Visa and Mastercard, since, you know, those companies were actively destroying democracy by blacklisting an organization that promotes government transparency, or something. But who on my node, in my data center, is important enough to get DDoSed? What were they doing and who did they upset? I’ll never know.

It’s a shame, too, because my node has otherwise had great uptime the last few months.