ParaHaplo is developing parallel computing tools for genome-wide association studies.
Motivation: Programmable sequence-specific endonucleases are powerful tools for altering genomes with high precision. For example, the CRISPR system enables efficient genome engineering in eukaryotic cells simply by specifying a 20-bp targeting sequence within its guide RNA. In large genomes, however, target design is complicated by sequence redundancy: a candidate 20-bp sequence may occur at more than one site.
Results: In this paper, I describe a novel method, UF, for detecting unique 20-bp sequences across an entire genome, and I use it to assess the distribution of unique sequences in the human genome. Because approximately 80% of human genome sequences are unique, UF is expected to be useful for CRISPR design.
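
To make the notion of a "unique 20-bp sequence" concrete, the following is a minimal Python sketch that counts every k-mer in a sequence and reports those occurring exactly once. It is an illustration only and is not the UF implementation: the function names `unique_kmers` and `unique_fraction` are hypothetical, only the forward strand is scanned, windows containing ambiguous bases such as N are skipped, and the whole sequence is held in memory, which would not scale to a full human genome without further engineering.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def unique_kmers(sequence, k=20):
    """Return (position, k-mer) pairs for k-mers that occur exactly once.

    Assumptions (illustrative, not taken from UF): forward strand only,
    0-based positions, windows with non-ACGT characters are ignored.
    """
    seq = sequence.upper()
    # Enumerate every overlapping window of length k.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [
        (i, kmer)
        for i, kmer in enumerate(kmers)
        if counts[kmer] == 1 and set(kmer) <= VALID_BASES
    ]


def unique_fraction(sequence, k=20):
    """Fraction of all k-length windows whose sequence is unique."""
    total = len(sequence) - k + 1
    if total <= 0:
        return 0.0
    return len(unique_kmers(sequence, k)) / total


if __name__ == "__main__":
    # Toy example with k=3 so that the repeat is easy to see by eye.
    toy = "ACGTACGA"
    print(unique_kmers(toy, k=3))
    print(unique_fraction(toy, k=3))
```

In the toy sequence, the 3-mer `ACG` appears twice and is therefore excluded, while the remaining four windows are reported as unique. For CRISPR design one would likely also need to account for the reverse strand and for near-identical off-target matches, which this sketch deliberately omits.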