Hubert Life (out2)

Sunday, March 31, 2019

9:15 PM

Phi X 174



What is it?

    1. A single-stranded DNA (ssDNA) virus that infects Escherichia coli
    2. The first DNA-based genome to be sequenced (1977)
    3. Well-defined, small (5,386 bp), and diverse (45% GC, 55% AT) genome
    4. FASTA file download links:
        1. PhiX_from_Illumina
        2. PhiX_from_NCBI
    5. Used as a positive control in Illumina NGS
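
The genome size and base composition quoted above can be checked against either FASTA download. The following is a minimal sketch that uses only the Python standard library. The toy record is a made-up stand-in, not PhiX sequence, and the function names are illustrative.

```python
from io import StringIO


def read_fasta_sequence(handle):
    """Concatenate the sequence lines of the first record in a FASTA handle."""
    parts = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases."""
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


# Toy record standing in for a real FASTA file (hypothetical content).
toy_fasta = StringIO(">toy_record\nGAGTTTTATC\nGCTTCCATGA\n")
seq = read_fasta_sequence(toy_fasta)
print(len(seq), round(gc_fraction(seq), 2))
```

To run it on a downloaded file, replace the `StringIO` object with `open(path)`, where `path` points to the local FASTA file.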



What are benefits of using PhiX control?

    1. Calibration control: PhiX can be run alone and serves as a calibration control for:
        1. Cluster generation: it can be used as a positive control in the clustering process
      Platform   Mode/Reagents                               Optimal Raw Cluster Density
      HiSeq      High Output, TruSeq v3                      750-850 K/mm²
      HiSeq      High Output, HiSeq v4 (required upgrade)    950-1,050 K/mm²
      HiSeq      Rapid v2                                    850-1,000 K/mm²
      MiSeq      v2                                          1,000-1,200 K/mm²
      MiSeq      v3                                          1,200-1,400 K/mm²
      MiniSeq    Mid and High Output                         170-220 K/mm²
      NextSeq    Mid and High Output, v2                     170-220 K/mm²
      [table 1] Cluster density guidelines for Illumina sequencing platforms
        2. Cross-talk matrix generation
            1. During an Illumina sequencing run, the cross-talk caused by spectral overlap between the 4 fluorescently labeled nucleotides is calculated during template generation in cycles 1-5
            2. https://www.slideshare.net/idtdna/unique-dualmatched-adapters-mitigate-index-hopping-between-ngs-samples
        3. Phasing and prephasing
            1. During sequencing by synthesis, each DNA strand in a cluster extends by 1 base per cycle
            2. A small proportion of strands may become out of phase with the current cycle, either falling a base behind (phasing) or jumping a base ahead (prephasing)
            3. For best results, use a PhiX spike-in as a control with any library that does not have a balanced base composition
            4. High-GC samples (≥ 60%) typically show higher phasing rates, and in this case a PhiX control is required


    2. Run quality monitor: because of its small size and balanced nucleotide composition, PhiX is an ideal in-run control (typically with a ≥ 1% spike-in) for run quality monitoring

      Platform                                  PhiX Aligned (%)
      iSeq 100                                  minimum 5%
      MiniSeq                                   10~50%
      MiSeq (MCS 2.2 or higher)                 minimum 5%
      NextSeq                                   10~50%
      HiSeq 2500 (HCS 2.2.38 or higher)         minimum 10%
      HiSeq 3000/4000 (HCS 3.3.76 or lower)     10~50%
      HiSeq 3000/4000 (HCS 3.4.0 or higher)     5~20%
      NovaSeq                                   minimum 10%
      [table 2] PhiX Control v3 spike-in levels that Illumina recommends when running low-diversity libraries
    3. Color balancing
        1. For low-diversity libraries, the PhiX Control v3 library provides balanced fluorescent signals at each cycle to improve the overall run quality
        2. Why nucleotide diversity matters is covered under "What is nucleotide diversity and why is it important?"
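
The phasing and prephasing behaviour described under calibration control can be illustrated with a toy model. This is only a sketch, not how the sequencing software estimates or corrects phasing. It assumes every strand independently falls one base behind or jumps one base ahead with a fixed per-cycle probability, and the default rates (0.002 and 0.001) are hypothetical values chosen for illustration.

```python
def in_phase_fraction(n_cycles, phasing=0.002, prephasing=0.001):
    """Return the fraction of strands in phase after each cycle.

    Toy model: in each cycle a strand falls one base behind with
    probability `phasing`, jumps one base ahead with probability
    `prephasing`, and otherwise extends by exactly one base.
    """
    offsets = {0: 1.0}  # offset from the current cycle -> fraction of strands
    history = []
    stay = 1.0 - phasing - prephasing
    for _ in range(n_cycles):
        updated = {}
        for offset, frac in offsets.items():
            updated[offset] = updated.get(offset, 0.0) + frac * stay
            updated[offset - 1] = updated.get(offset - 1, 0.0) + frac * phasing
            updated[offset + 1] = updated.get(offset + 1, 0.0) + frac * prephasing
        offsets = updated
        history.append(offsets.get(0, 0.0))
    return history


# Compare a low and a higher (hypothetical) phasing rate over 150 cycles.
for rate in (0.001, 0.005):
    final = in_phase_fraction(150, phasing=rate, prephasing=0.001)[-1]
    print(f"phasing={rate}: {final:.3f} of strands in phase after 150 cycles")
```

Even small per-cycle rates compound over a long read, so the in-phase fraction falls steadily with cycle number. That is why the rates need to be estimated well during the early cycles.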

          How to remove PhiX reads from the fastq


          What is nucleotide diversity and why is it important?


              1. High nucleotide diversity: a library has roughly equal proportions of all 4 nucleotides in every cycle of the run
              2. The diagram below illustrates the diversity and base balance of well-balanced and unbalanced libraries, and how this is reflected in the % base plot of Sequencing Analysis Viewer (SAV)
          [fig 1] Illustration of diversity and base balance

          Why is nucleotide diversity important?

              1. Nucleotide diversity is required for effective template generation and is important for the generation of high-quality data
              2. Diversity is especially important during the first 4-7 cycles of the first sequencing read for MiniSeq, MiSeq, NextSeq, and HiSeq 1000-2500 systems. The sequencing software uses images from these early cycles to identify the location of each cluster in a process called template generation
              3. Diversity is also important for the first 25 cycles because this is when phasing/prephasing, color matrix corrections, and the pass filter calculations occur
              4. Real-Time Analysis (RTA) software needs PhiX to be spiked in at an adequate level. See the reference below for more specific data
          ref)
          https://support.illumina.com/bulletins/2016/07/what-is-nucleotide-diversity-and-why-is-it-important.html
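
To make the idea of per-cycle diversity concrete, the sketch below computes the base fractions at each of the first few cycles from a list of reads and flags cycles dominated by a single base. It only approximates what the % base plot in SAV shows. The 0.5 cutoff and the toy reads are hypothetical, not Illumina thresholds or real data.

```python
from collections import Counter


def per_cycle_base_fractions(reads, n_cycles):
    """For each of the first n_cycles positions, return {base: fraction}."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts[base] for base in "ACGT")
        fractions.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions


def low_diversity_cycles(fractions, max_fraction=0.5):
    """Return 1-based cycles where one base exceeds max_fraction (hypothetical cutoff)."""
    return [i + 1 for i, f in enumerate(fractions) if max(f.values()) > max_fraction]


# Toy reads: a balanced library and an amplicon-like library sharing a prefix.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
amplicon = ["ACGT", "ACGA", "ACGC", "ACGG"]

print(low_diversity_cycles(per_cycle_base_fractions(balanced, 4)))
print(low_diversity_cycles(per_cycle_base_fractions(amplicon, 4)))
```

In the amplicon-like example every read starts with the same three bases, so those cycles are flagged. This is the situation in which a PhiX spike-in restores balance.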

          Friday, March 29, 2019

          [K-MOOC] Data Analytics for Forecasting and Classification: Syllabus

          • Course: Data Analytics for Forecasting and Classification
          • Professor: Chi-hyuk Jeon / POSTECH
          • Goals
            • Understanding data analysis methods for forecasting and classification based on statistics
            • Cultivating data analysis skill and application ability by using data analytics methods
          • Prerequisite
            • Probability and Statistics, Linear Algebra, Optimization
          • Schedule
          [K-MOOC] Data Analytics for Forecasting and Classification: 1-1. Regression analysis, Simple regression model, Model estimation
          • Regression Analysis
            • To explain a variable, regression analyzes the statistical causal relationships between related variables
            • Independent variable: causes
            • Dependent variable: outcomes
          • Regression Model
            • Simple Regression Model
              • 𝑿 ⇨ 𝒀
              • Observations: (𝑿₁,𝒀₁), (𝑿₂,𝒀₂), ... , (𝑿𝑛,𝒀𝑛) (𝑛 is the number of observations)
              • Simple Regression Model: 
                • 𝒀𝑖 = 𝜷₀ + 𝜷₁𝑿𝑖 + 𝜺𝑖,    𝑖 = 1,2, ... , 𝑛
                  • 𝜺𝑖: error term. 
                    • Assume that it follows a normal distribution with mean 0 and variance 𝛔²
                    • 𝜺𝑖 ~ N(0, 𝛔²)
                  • 𝑿 is not a random variable, but a given value
                  • so three parameters need to be estimated
                    • 𝜷₁: slope of the linear equation
                    • 𝜷₀: intercept
                    • 𝛔²: variance of the error term
              • Estimation of intercept 𝜷₀ and slope 𝜷₁
                • Using least squares method
                • to minimize the objective function 𝐐
                • objective function 𝐐
                  • sum of the squares of the differences between the observed value of the dependent variable 𝒀𝑖 and the fitted value on the line, 𝜷₀ + 𝜷₁𝑿𝑖
                  • 𝐐 = ∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖)²
              • How to?
                • (𝑿,𝒀) are observed values, so treat 𝐐 as a function of 𝜷₀ and 𝜷₁
                • partially differentiate 𝐐 with respect to 𝜷₀ and set the result to zero
                  ∂𝐐/∂𝜷₀ = -2∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖) = 0
                • partially differentiate 𝐐 with respect to 𝜷₁ and set the result to zero
                  ∂𝐐/∂𝜷₁ = -2∑(𝒀𝑖 - 𝜷₀ - 𝜷₁𝑿𝑖)𝑿𝑖 = 0
                • solving these two equations gives the estimates 𝜷₀-hat and 𝜷₁-hat
                • estimated equation: 𝒀-hat = 𝜷₀-hat + 𝜷₁-hat * 𝑿
              • Estimation of variance of the error term 𝛔²
                • Using the sample variance of the residuals
                  • residual
                    subtract the estimated value from the observed value of 𝒀
                    𝒆𝑖 = 𝒀𝑖 - 𝒀𝑖-hat = 𝒀𝑖 - 𝜷₀-hat - 𝜷₁-hat * 𝑿𝑖
                  • SSE
                    residual/error sum of squares
                    = ∑(𝒀𝑖 - 𝒀𝑖-hat)²
                  • estimate 𝛔² by using MSE
                    𝛔²-hat = MSE (Mean Squared Error) = SSE / (𝑛-2)
                    (𝑛-2) is the degrees of freedom
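
The estimation steps above can be sketched in a few lines of plain Python. Setting the two partial derivatives of 𝐐 to zero and solving gives the standard closed form 𝜷₁-hat = ∑(𝑿𝑖 - 𝑿-bar)(𝒀𝑖 - 𝒀-bar) / ∑(𝑿𝑖 - 𝑿-bar)² and 𝜷₀-hat = 𝒀-bar - 𝜷₁-hat * 𝑿-bar, which the code uses directly. The function name and the sample data are made up for illustration and are not from the course.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, mse)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx               # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)  # residual sum of squares
    mse = sse / (n - 2)            # estimate of the error variance
    return b0, b1, mse


# Made-up observations lying close to the line y = 2x.
x_obs = [1, 2, 3, 4, 5]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat, mse = fit_simple_regression(x_obs, y_obs)
print(f"intercept={b0_hat:.3f}, slope={b1_hat:.3f}, MSE={mse:.4f}")
```

The divisor is 𝑛-2 rather than 𝑛 because two parameters (𝜷₀ and 𝜷₁) have already been estimated from the same data.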

          Thursday, March 28, 2019

          [A6000 + 30.4] Piazzale Michelangelo6

          2019. 03
          from Piazzale Michelangelo, Florence, Italy
          Sony A6000 + Sigma 30mm f1.4
          [A6000 + 30.4] Piazzale Michelangelo5

          2019. 03
          from Piazzale Michelangelo, Florence, Italy
          Sony A6000 + Sigma 30mm f1.4
          [A6000 + 30.4] Piazzale Michelangelo4

          2019. 03
          from Piazzale Michelangelo, Florence, Italy
          Sony A6000 + Sigma 30mm f1.4