ABOUT US

Synthesize Bio is an early-stage startup using generative AI to fundamentally change life sciences research and accelerate the pace of biomedical discovery.

Genomic data are at the heart of most modern molecular studies and tools, from basic research to clinical decision support, but these data are hard to work with. Genomic data require expensive laboratories, sophisticated computational infrastructure, and big teams to produce and analyze. Generating and analyzing genomic data takes weeks or months. We’re building generative AI models to cut that time down to minutes or hours.

Our platform addresses a huge need in the large and rapidly growing use of genomics in research, from basic science to therapeutic development. mRNA therapeutics alone - a subset of the overall market we serve - is projected to be a $100 billion market by 2026. This is an opportunity to be part of a mission to transform life sciences R&D by dramatically accelerating the pace of discovery.

WHO WE ARE

Our team has dedicated their careers to moving the needle in biomedical research and education and has a deep understanding of how to build a transformative product in this space. Our founders each have ~20 years of experience in life sciences, genomic data generation and analysis, and AI/ML and data science. Our team includes leading experts in the design, execution, collection, and application of AI/ML methods to large biological studies; people who have previously led and scaled startups; and experts in scientific training and enablement. We have lived the problem and are ready to solve it!

WHAT YOU’LL DO IN THIS ROLE

We are looking for a (senior) bioinformatics data engineer who wants to join our mission and be an early part of building our AI-powered platform at the intersection of machine learning and life sciences discovery. As an early member of the team, you'll have significant opportunities to shape both the technical stack and the company’s long-term direction.

You will help build out our data and bioinformatics architecture. Specifically, you will:

Be foundational in the development of the roadmap for scalable, efficient, and future-proof data and bioinformatics infrastructure
Collaborate across all our technical teams (Data, AI, and Platform) to develop data solutions that enable all teams, for example by improving the performance of and access to our training database (tens of millions of samples) used to train our AI models
Implement new data processing pipelines and improve existing ones, with an emphasis on automation, cost reduction, and robustness

TYPICAL QUALIFICATIONS

The best candidates do not always match the limited criteria mentioned in the job description. If you are excited about our vision and feel that you could be a valuable asset to our team, please apply.

This role requires experience with processing, storing, and building solutions to access genomics data and sample metadata; the exact types of data can vary, but a desire and ability to learn quickly is essential. An interest in applying your skills to our mission is the key ingredient.

Technical skills & Experience

Must have…

Fluency with data manipulation and processing in Python, R, and SQL (e.g., pandas and dplyr), and experience collaborating with git/GitHub
Experience managing and working with large datasets in the cloud (AWS and/or GCP)
Familiarity with cloud database solutions and emerging technologies (e.g., Athena, BigQuery, TileDB, and others)
Familiarity with containers (Docker, Singularity) and workflow managers (Nextflow, WDL, CWL, etc.)
Bioinformatics experience, including working with common genetic and genomics data formats (e.g., BAM, FASTQ, VCF) and processing pipelines (e.g., nf-core pipelines)
Excitement about generative AI and interest in employing AI to change how science happens

Ideally you have some…

Experience applying LLMs for data discovery, curation, or processing
Experience at a commercial biotech company
Familiarity with approaches to automate data processing
Experience processing raw bulk and single-cell genomics data (including but not limited to RNA-seq)

No minimum number of years or degree requirements, provided you can demonstrate an understanding of the range of approaches and tools used in genomics data processing and data storage

Approach

Open-minded and adaptable; willing to change focus and direction quickly
Comfortable working both independently and in close collaboration with your own team as well as other teams with complementary expertise
Strong communication skills and a desire to share results liberally with others

WHY YOU’LL LOVE WORKING WITH US

Our founders have strong reputations as thoughtful, forward-thinking colleagues and mentors who cultivate talent. We’ll think intentionally and collaboratively about your career and how we build our team. Our founders’ experiences in workforce development and commitment to diversity, equity, and inclusion are central to our approach.

Impact — Our platform will accelerate the pace of scientific discovery and therapy development to advance human health. This is an opportunity to build something from the ground up in a space that has a real, positive impact.
Early employee impact — As a member of the very early team, you will play a pivotal role in helping shape our product, team, and culture. For the rest of our days, no matter how many thousands of people join after you, you will always have that honor and distinction. It looks great on a resume, too.
Early stage equity — A benefit of joining early. None of us (neither our founders nor our investors) would be here if we didn’t think that our company will create tremendous value over time.
Flexible location and hours – We are whole people with whole lives and expect that you are, too. We’ll trust each other to make progress as a team, in times and places that work for everyone. Preference for Seattle area. Travel to Seattle will be required from time to time – it’s a beautiful city!
Visa support - We cannot provide visa support at this time.

TIMELINE

All applications received by December 31st, 2024 will be reviewed. Interviews will begin in early January 2025, with start dates between then and late February at the latest.

SALARY

Computational Biologist $110-145K
Senior Computational Biologist $140-170K
Principal Computational Biologist $165-185K

Bioinformatics Data Engineer - Open to Remote