Noisy High-Throughput Biological Data

  • Noisy because
    • Biological phenomena are inherently "noisy": heterogeneity is an essential feature of biology.
    • Experiments involve many factors that add noise to the data.
    • High-throughput systems achieve high throughput by sacrificing some precision.
  • Noise is informative
    • Large-scale data carry information on the "shape" of the noise, which can be used to quality-control and adjust the noisy data and to separate out the "non-noisy" signal.
  • Example: 1000 samples genotyped on a SNP chip for 50K SNPs.
  • Success rate of each individual sample, simulated here as a mixture of three beta distributions.
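N <- 1000   # total number of samples
n1 <- 650   # samples in the high-success group
n2 <- 300   # samples in the intermediate group

# Success rates drawn from three beta distributions (means of about 0.96, 0.83, and 0.38)
a <- rbeta(n1, 50, 2)
b <- rbeta(n2, 100, 20)
x <- rbeta(N - n1 - n2, 30, 50)   # remaining 50 samples with low success rates
# Histogram of all 1000 success rates, using 40 equally spaced breakpoints on [0, 1]
hist(c(a, b, x), breaks = seq(from = 0, to = 1, length.out = 40))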
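
The histogram shows the "shape" of the noise, and one simple way to use it for quality control is to drop samples whose success rate falls in the low cluster. The following is only a minimal sketch of that idea: the seed, the variable names `rate` and `keep`, and the cutoff of 0.6 are assumptions made for illustration (the cutoff is picked by eye from the gap in the histogram, not taken from any standard).

```r
set.seed(1)                       # assumed seed, only to make the sketch reproducible
N <- 1000; n1 <- 650; n2 <- 300
# Same three-component simulation as above, pooled into one vector
rate <- c(rbeta(n1, 50, 2), rbeta(n2, 100, 20), rbeta(N - n1 - n2, 30, 50))

cutoff <- 0.6                     # hypothetical threshold between the low and intermediate groups
keep <- rate >= cutoff            # TRUE for samples that pass quality control

table(keep)                       # number of samples kept and dropped
summary(rate[keep])               # success rates of the retained samples
hist(rate[keep], breaks = seq(from = 0, to = 1, length.out = 40))
```

With real data the cutoff would have to be chosen from the observed distribution, and a fixed threshold is only one option; fitting a mixture model to the success rates is another.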