Pratik Dutta প্রতীক দত্ত
Welcome

Decoding DNA sequence to disease mechanism through interpretable AI.
Senior Research Scientist
Department of Biomedical Informatics · MART 07-0601
Stony Brook University, Stony Brook Cancer Center
1 Lauterbur Dr, Stony Brook, NY 11794
About

Pratik Dutta is a Senior Research Scientist in the Department of Biomedical Informatics at Stony Brook University, working in Prof. Ramana V. Davuluri's lab. His research builds interpretable genomic foundation models that learn the language of DNA and can explain the biological mechanisms behind their predictions, with applications across cancer, neurodegeneration, viral genomics, spatial biology, and population-scale variation. He is broadly interested in closing the gap between the scale of modern foundation models and the interpretability needed to act on them clinically. His recent work spans pre-training large genomic language models, predicting the regulatory impact of non-coding variants, and orchestrating multi-modal reasoning over biological knowledge. Before joining Stony Brook, he completed his PhD at IIT Patna under Dr. Sriparna Saha as a Visvesvaraya Research Fellow. He holds bachelor's and master's degrees from IIEST Shibpur. He is currently exploring tenure-track faculty opportunities and welcomes prospective collaborators and students.

DNA · embeddings · interpretation
Research

Five research threads, organized by capability.

All five threads share one goal: AI that doesn't just predict, but explains. They range from building foundation models for the genome to extending them across spatial tissue context and human population variation.

01 · Foundation models

Genomic language models

Pre-training transformers on raw DNA so they learn the regulatory grammar, splicing signals, and chromatin context that matter biologically. Co-developer of the DNABERT series.

DNABERT-2 DNABERT-Enhancer HViLM DNABERT-MB
02 · Variant impact

Genomic variant interpretation

462 fine-tuned regulatory models that score the impact of non-coding variants across pan-cancer and neurodegenerative cohorts, with attention-based motif evidence. Next: agentic orchestration over the model bank.

DeepVRegulome Pan-cancer Neurodegeneration
03 · Mechanism

Mechanistic reasoning over models

Regulome-R orchestrates pretrained genomic models with biological knowledge graphs and literature, moving from "this variant is significant" to "this variant disrupts this TF in this cell type."

Regulome-R Agentic AI Knowledge graphs
04 · Tissue context · Emerging

Spatial transcriptomics for variant effect

Extending interpretable foundation models into spatial tissue context: benchmarking deep learning architectures and quantifying uncertainty for cell-cell interaction inference. Active collaboration with Oak Ridge National Laboratory.

Spatial Cell-cell ORNL
05 · Diversity · Emerging

Foundation models for population genomics

Using genomic language model embeddings to study human population structure and ancestry-specific regulatory variation, with the goal of building interpretable variant effect predictors that work across diverse populations.

1000 Genomes Population structure Equity
Selected work

Publications & impact

Citations: Google Scholar · Open-source models: Hugging Face
2026
mSystems · ASM · 1st author

HViLM: A Foundation Model for Viral Genomics Enables Multi-Task Prediction of Pathogenicity, Transmissibility, and Host Tropism

Pratik Dutta, Jack Vaska, Pallavi Surana, Rekha Sathian, Max Chao, Zhihan Zhou, Han Liu, Ramana V. Davuluri

mSystems, American Society for Microbiology, 2026 · Under review

2025
arXiv · 1st author

DeepVRegulome: DNABERT-based Deep-Learning Framework for Predicting the Functional Impact of Short Genomic Variants on the Human Regulome

Pratik Dutta, Matthew Obusan, Rekha Sathian, Max Chao, Pallavi Surana, Nimisha Papineni, Yanrong Ji, Zhihan Zhou, Han Liu, Alisa Yurovsky, et al.

arXiv:2511.09026, 2025

2024
ICLR 2024

DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genomes

Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana V. Davuluri, Han Liu

International Conference on Learning Representations (ICLR), 2024

2026
NAR Genomics & Bioinformatics

TSProm: Deep Learning Framework to Predict Tissue-Specific Regulatory Logic

Pallavi Surana, Pratik Dutta, Nimisha Papineni, Rekha Sathian, Zhihan Zhou, Han Liu, Ramana V. Davuluri

NAR Genomics and Bioinformatics, Oxford Academic, 2026 · NARGAB-2025-333.R1

2025
ISMB/ECCB · 2nd place CAMDA

Predicting Antimicrobial Resistance Using Microbiome-Pretrained DNABERT-2 and DBGWAS-Derived Genomic Features

Jack Vaska, Pratik Dutta, Max Chao, Rekha Sathian, Zhihan Zhou, Han Liu, Ramana V. Davuluri

Intelligent Systems for Molecular Biology / European Conference on Computational Biology (ISMB/ECCB), 2025 · CAMDA 2nd place

2024
Bioinformatics · Oxford

TransTEx: Novel Tissue-Specificity Scoring Method for Grouping Human Transcriptome into Different Expression Groups

Pallavi Surana, Pratik Dutta, Ramana V. Davuluri

Bioinformatics, vol. 40, Oxford Academic, 2024

2023
Bioinformatics Advances

Deep Multi-Omics Integration by Learning Correlation-Maximizing Representation Identifies Prognostically Stratified Cancer Subtypes

Yanrong Ji, Pratik Dutta, Ramana V. Davuluri

Bioinformatics Advances, vol. 3, vbad075, Oxford University Press, 2023

2021
IEEE/ACM TCBB · 1st author

DeePROG: Deep Attention-based Model for Diseased Gene Prognosis by Fusing Multi-omics Data

Pratik Dutta, Aditya Prakash Patra, Sriparna Saha

IEEE/ACM Transactions on Computational Biology and Bioinformatics, IEEE Computer Society, 2021

2020
Scientific Reports · Nature

A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering

Pratik Dutta, Sriparna Saha, Sanket Pai, Aviral Kumar

Scientific Reports, vol. 10, Nature Publishing Group, 2020

2020
IEEE J-BHI

MultiPredGO: Deep Multi-Modal Protein Function Prediction by Amalgamating Protein Structure, Sequence, and Interaction

Swagarika Jaharlal Giri*, Pratik Dutta*, Parth Halani, Sriparna Saha (*equal contribution)

IEEE Journal of Biomedical and Health Informatics, IEEE, 2020

2025
MLCB 2025

Augmenting DNABERT Embeddings with Multimodal DNA Features for Improved Regulatory Sequence Interpretation

Nimisha Papineni, Pratik Dutta, Max L. Chao, Rekha Sathian, Pallavi Surana, Ramana V. Davuluri

20th Machine Learning in Computational Biology (MLCB), 2025

2025
NeurIPS Workshop

Artificial Intelligence for Spatial Transcriptomics: A Scoping Review of Architectures and Models

Agampreet Saini, Supragya Gandotra, Abhijit Kumar, Pratik Dutta, Tirthankar Ghosal

NeurIPS 2025 Workshop on Imageomics: Discovering Biological Knowledge from Images Using AI, 2025

2020
ACL · Core A*

Amalgamation of Protein Sequence, Structure and Textual Information for Improving Protein-Protein Interaction Identification

Pratik Dutta, Sriparna Saha

Annual Meeting of the Association for Computational Linguistics (ACL), 2020 · Core A*

Trajectory

Experience & education.

2009–2013

B.E., Computer Science & Technology

Indian Institute of Engineering Science and Technology (IIEST), Shibpur · Formerly Bengal Engineering and Science University

2013–2015

M.E., Information Technology

Indian Institute of Engineering Science and Technology (IIEST), Shibpur · Advisor: Dr. Hafizur Rahaman

2016–2020

Ph.D., Computer Science & Engineering

Indian Institute of Technology Patna · Visvesvaraya Research Fellow · Advisor: Dr. Sriparna Saha

2020–2021

Research Intern

Strand Life Sciences, Bangalore · BioBERT for clinical phenotype extraction · Advisors: Dr. Vamsi Veeramachaneni and Dr. Rajesh Sundaresan (IISc)

2021–2023

Postdoctoral Research Associate

Stony Brook Cancer Center · Stony Brook University · Advisor: Prof. Ramana V. Davuluri

2023–present

Senior Research Scientist

Department of Biomedical Informatics · Stony Brook University · Davuluri Lab

Recognition & service

To the research community.

Open-source releases used by groups worldwide, peer review for top journals and conferences, invited talks, and active mentorship of graduate and undergraduate researchers.

Get in touch

Looking for collaborators, students, and a faculty home.

The best way to reach me is by email. I am on the academic job market for tenure-track positions and welcome conversations with prospective collaborators, students, and search committees.