Grizzly Peak fitting algorithm

We developed a model-based multi-peak algorithm, Grizzly Peak, to accurately identify significant ZLD-bound loci across the genome. Grizzly Peak is an iterative, model-based peak-fitting method that we modified from Capaldi et al. In brief, Grizzly Peak estimates the expected shape of a binding event in the ChIP-seq measurements. The algorithm then iteratively scans the genome and identifies enriched regions with high protein occupancy. These regions are expanded and analyzed to find a minimal set of peaks (each with a genomic position and an occupancy level) that optimizes the fit to the measured data. To allow for overlapping peaks, we devised a simple heuristic that considers actions such as adding or removing peaks. Each candidate step is assigned a score, and a step is taken only if it significantly improves the score. Once a genomic region has been analyzed and fitted, the optimized set of peaks is recorded and the region is excluded from further fitting. This process is repeated until no significantly bound loci remain.
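The procedure above can be illustrated with a minimal Python sketch. This is not the MATLAB code in pf0.m, and every name in it (`gaussian_kernel`, `fit_amplitudes`, `fit_region`, `call_peaks`) is hypothetical. It makes several assumptions of its own: the binding-event shape is a fixed-width Gaussian (Grizzly Peak estimates the shape from the data), the input is a background-free coverage track, the score is `rss / noise_var + penalty * number_of_peaks`, enriched regions are grown from the summit while the signal stays above a threshold and then padded by a fixed flank, and a one-base "shift" move is allowed in addition to adding and removing peaks.

```python
import numpy as np


def gaussian_kernel(n, center, sigma):
    """Assumed binding-event shape: a fixed-width Gaussian centred at `center`."""
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


def fit_amplitudes(y, centers, sigma):
    """Least-squares occupancy levels for peaks at `centers`; returns (amps, rss)."""
    if not centers:
        return np.zeros(0), float(np.sum(y ** 2))
    X = np.column_stack([gaussian_kernel(len(y), c, sigma) for c in centers])
    amps, *_ = np.linalg.lstsq(X, y, rcond=None)
    return amps, float(np.sum((y - X @ amps) ** 2))


def fit_region(y, sigma=5.0, noise_var=1.0, penalty=10.0, min_gain=1.0):
    """Greedy add/remove/shift search for a small set of peaks explaining y.

    Score (lower is better) = rss / noise_var + penalty * number_of_peaks.
    A move is accepted only if it lowers the score by at least `min_gain`.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    centers = []
    best = fit_amplitudes(y, centers, sigma)[1] / noise_var
    while True:
        moves = [centers + [c] for c in range(n) if c not in centers]  # add a peak
        moves += [[c for c in centers if c != r] for r in centers]  # remove a peak
        for i, c in enumerate(centers):  # shift one peak by 1 bp (assumed extra move)
            for s in (c - 1, c + 1):
                if 0 <= s < n and s not in centers:
                    moves.append(centers[:i] + [s] + centers[i + 1:])
        scored = []
        for m in moves:
            amps, rss = fit_amplitudes(y, m, sigma)
            if np.any(amps < 0):
                continue  # occupancy levels must be non-negative
            scored.append((rss / noise_var + penalty * len(m), m))
        if not scored:
            break
        score, m = min(scored, key=lambda t: t[0])
        if best - score < min_gain:
            break  # no significant improvement: stop fitting this region
        best, centers = score, m
    amps, _ = fit_amplitudes(y, centers, sigma)
    return sorted(zip(centers, amps.tolist()))


def call_peaks(signal, threshold=1.0, flank=15, **fit_kwargs):
    """Outer loop: fit the strongest remaining enriched region, mask it, repeat."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    remaining = np.asarray(signal, dtype=float).copy()
    peaks = []
    while remaining.max() >= threshold:
        top = int(np.argmax(remaining))
        lo = hi = top
        # Expand from the summit while the signal stays enriched, then pad by `flank`.
        while lo > 0 and remaining[lo - 1] >= threshold:
            lo -= 1
        while hi < len(remaining) - 1 and remaining[hi + 1] >= threshold:
            hi += 1
        lo, hi = max(0, lo - flank), min(len(remaining), hi + flank + 1)
        peaks += [(lo + c, a) for c, a in fit_region(remaining[lo:hi], **fit_kwargs)]
        remaining[lo:hi] = 0.0  # exclude this region from further fitting
    return sorted(peaks)
```

Calling `call_peaks(signal)` returns `(position, occupancy)` pairs in genomic order. The shift move is included because, with only add and remove steps, the first peak placed in a region containing two overlapping events tends to sit at a compromise position between them and can never be moved afterwards.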

Grizzly Peak (implemented in MATLAB)

pf0.m

Data files for Zelda binding in flies (Harrison et al., 2011)

Genome file (UCSC dm3)
Zelda ChIP-seq data from mitotic cycles 8-9
Zelda ChIP-seq data from mitotic cycles 13-14
Zelda ChIP-seq data from late mitotic cycle 14

Please cite:

Thanks!

Tommy Kaplan
tomkap@berkeley.edu