Scalable bioinformatic pipelines to interpret the soil microbiome

A pipeline that automates the processing of whole-genome sequencing files to generate microbiome insights


Our client wanted to create a platform that generates microbiome insights from agricultural samples to support precision agriculture. Bioinformatic pipelines are complex and compute-intensive, and they require regular updates as new findings from the scientific literature are adopted into practice.

Our client needed a system that automatically analyzes whole-genome sequencing files through various bioinformatic pipelines and scales up and down with the load, without human intervention.


We partnered with our client and their academic advisor to identify the most applicable bioinformatic analyses and assemble them into a pipeline. We designed and deployed a scalable cloud architecture inside their Microsoft Azure Kubernetes environment, enabling compute-intensive workflows to scale automatically. We also developed a separate Docker container for each step of the bioinformatic pipeline, which makes the platform easier to maintain, test, and modify over the long term. Combining event triggers with Argo Workflows produced a cost-effective bioinformatic pipeline that requires no human intervention and allows our client to scale their service offerings to more customers.
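
To illustrate the one-container-per-step design, here is a minimal Python sketch that builds an Argo Workflow manifest in which each pipeline stage runs in its own container image. It is not the client's actual configuration: the step names, image names, `--input` argument, and sample path are all hypothetical placeholders, and a real deployment would likely add resource requests, artifact storage, and retry policies.

```python
import json

# Hypothetical pipeline steps and container images (assumptions for
# illustration only; the actual analyses and images are not named here).
PIPELINE_STEPS = [
    ("quality-control", "example.azurecr.io/qc:1.0"),
    ("taxonomic-profiling", "example.azurecr.io/taxonomy:1.0"),
    ("functional-annotation", "example.azurecr.io/annotation:1.0"),
]


def build_workflow(sample_uri, steps=PIPELINE_STEPS):
    """Return an Argo Workflow manifest (as a dict) for one sequencing file."""
    # One container template per step, so each stage can be rebuilt,
    # tested, and versioned independently of the others.
    container_templates = [
        {
            "name": name,
            "inputs": {"parameters": [{"name": "sample"}]},
            "container": {
                "image": image,
                # "--input" is a hypothetical CLI flag for the step's tool.
                "args": ["--input", "{{inputs.parameters.sample}}"],
            },
        }
        for name, image in steps
    ]

    # In Argo's `steps` syntax each inner list is one stage; stages run
    # sequentially, in the order listed.
    pipeline_template = {
        "name": "pipeline",
        "steps": [
            [
                {
                    "name": name,
                    "template": name,
                    "arguments": {
                        "parameters": [
                            {
                                "name": "sample",
                                "value": "{{workflow.parameters.sample}}",
                            }
                        ]
                    },
                }
            ]
            for name, _ in steps
        ],
    }

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "microbiome-"},
        "spec": {
            "entrypoint": "pipeline",
            "arguments": {
                "parameters": [{"name": "sample", "value": sample_uri}]
            },
            "templates": [pipeline_template] + container_templates,
        },
    }


if __name__ == "__main__":
    # Hypothetical sample path; in an event-driven setup this value would
    # come from the event that announces a newly uploaded file.
    print(json.dumps(build_workflow("samples/run-001.fastq.gz"), indent=2))
```

In a setup like the one described, an event trigger would submit a manifest of this shape whenever a new sequencing file arrives, and the Kubernetes cluster would schedule each step's container on whatever capacity is available.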

