I study molecular sequences and (gene, species, coalescent) trees as discrete objects using algorithms (in the traditional sense), probabilistic data structures, and a bit of probabilistic modeling.
During my PhD, I developed several metagenomic sequence analysis tools, targeting efficient taxonomic and/or phylogenomic analysis of large-scale datasets. Some examples are:
- CONSULT-II – using locality-sensitive hashing for taxonomic identification
- KRANK – a memory-bound and taxonomy-aware k-mer sampling algorithm
- krepp – read-to-genome distance estimation and phylogenetic placement
In the past, I briefly worked on network analysis and developed a community detection method for dynamic gene co-expression networks (MuDCoD). During my undergraduate and master’s studies, I focused mostly on (applied) machine learning, in particular time series analysis in computational ethology (behavioral analysis of sleep in fruit flies) and active learning in natural language processing. I have never worked in industry.