AIAS 2025: Papers with Abstracts

| Papers |
|---|
Abstract. Diffusion models have revolutionized text-to-image generation, but their real-world applications are hampered by the extensive inference time needed for hundreds of diffusion steps. Although progressive distillation and consistency distillation have been proposed to speed up diffusion sampling to 2-8 steps, they still fall short in one-step generation due to poor abilities to generate high-frequency content. To overcome this issue, we introduce High-frequency-Promoting Adaptation (HiPA), a parameter-efficient approach to enable one-step text-to-image diffusion. Grounded in the insight that high-frequency information is essential but highly lacking in one-step diffusion, HiPA focuses on training one-step, low-rank adaptors to specifically enhance the under-represented high-frequency abilities of advanced diffusion models. The learned adaptors empower these diffusion models to generate high-quality images in just a single step. Compared with progressive distillation, HiPA achieves much better performance in one-step text-to-image generation (37.3 -> 23.8 in FID-5k on MS-COCO 2017) and 28.6x training speed-up (108.8 -> 3.8 A100 GPU days), requiring only 0.04% training parameters (7,740 million -> 3.3 million). We also demonstrate HiPA's effectiveness in text-guided image editing, inpainting and super-resolution tasks, where our adapted models consistently deliver high-quality outputs in just one diffusion step. | Abstract. Although generative AI (GenAI) is transforming healthcare, it may take decades for its clinical impact to be fully realized due to inefficiencies in clinical science. Here, we propose a new vision for open clinical AI science and present a practical framework that provides free GenAI prediction services for all doctors worldwide. This approach enables widespread participation in the generation and dissemination of new evidence for the responsible use of GenAI in clinical care across all diseases. 
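The HiPA abstract above attributes its 0.04% parameter footprint to low-rank adaptors: rather than updating a full weight matrix, training touches only two small factors whose product perturbs the frozen weights. HiPA's actual adaptor code is not given in the abstract; the following is a generic low-rank-adapter sketch in pure Python, with all names, dimensions, and the rank entirely hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_effective_weight(W, A, B, alpha=1.0):
    """W_eff = W + alpha * (B @ A): frozen weight plus low-rank update.
    W is d_out x d_in, B is d_out x r, A is r x d_in, with r << min(d_out, d_in)."""
    delta = matmul(B, A)
    return [[W[i][j] + alpha * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Trainable-parameter comparison for one hypothetical 1024x1024 layer, rank r=4:
d_out, d_in, r = 1024, 1024, 4
full_params = d_out * d_in          # trainable count under full fine-tuning
lora_params = r * d_in + d_out * r  # trainable count with the low-rank adapter
print(full_params, lora_params, f"{100 * lora_params / full_params:.2f}%")
```

The same arithmetic, applied per layer across a large diffusion backbone, is how a few million adapter parameters can steer a multi-billion-parameter model.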
Broad adoption of responsible use of open GenAI services by physicians will lead to more timely diagnoses and treatments, facilitate the sharing of synthetic data and fine-tuned AI models, and thus accelerate open clinical AI science by a factor of ten. | Abstract. Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships among diverse environmental variables over space and time? In this paper, we introduce a framework, FREE, that enables the use of varying features and available information to train a universal model. The core idea is to map available environmental data into a text space and then convert the traditional predictive modeling task in environmental science to a semantic recognition problem. Our evaluation on two societally important real-world applications, stream water temperature prediction and crop yield prediction, demonstrates the superiority of FREE over multiple baselines, even in data-sparse scenarios. | Abstract. Breast cancer subtypes defined by estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status guide treatment decisions, yet manual extraction of these biomarkers from pathology reports is time‑consuming and error‑prone. We present an end‑to‑end NLP pipeline that automates high‑risk subtype identification (HER2‑positive and triple‑negative) from digital core‑biopsy reports. 
A corpus of 2,722 reports (2,401 non‑synoptic, 321 synoptic) was annotated in Doccano, yielding 16,706 question–answer pairs. Reports were pre-processed and then split using a multi-stratified sampling approach into training (59%), validation (17%), and held‑out test (24%) sets. We fine‑tuned BioMedBERT on SQuAD 2.0 and then on our domain‑specific dataset, employing hyperparameter optimization and prediction post-processing. On the held-out test data, our model achieved 99.79% accuracy on synoptic reports and 98.83% on non‑synoptic reports, outperforming human annotators and maintaining robust performance across report formats and biomarker classes. By automatically flagging eligible patients for neoadjuvant chemotherapy triage, this pipeline has the potential to streamline clinical workflows, reduce treatment delays, and improve outcomes for high‑risk breast cancer patients. | Abstract. Large Language Models (LLMs) have demonstrated remarkable potential in advancing scientific knowledge and addressing complex challenges. In this work, we introduce OmniScience, a specialized large reasoning model for general science, developed through three key components: (1) domain adaptive pretraining on a carefully curated corpus of scientific literature, (2) instruction tuning on a specialized dataset to guide the model in following domain-specific tasks, and (3) reasoning-based knowledge distillation through fine-tuning to significantly enhance its ability to generate contextually relevant and logically sound responses. We demonstrate the versatility of OmniScience by developing a battery agent that efficiently ranks molecules as potential electrolyte solvents or additives. Comprehensive evaluations reveal that OmniScience is competitive with state-of-the-art large reasoning models on the GPQA Diamond and domain-specific battery benchmarks, while outperforming all public reasoning and non-reasoning models with similar parameter counts. 
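The biomarker pipeline above splits reports with stratified sampling so each subset preserves label proportions. The abstract does not detail its multi-stratified procedure; below is a plain single-label stratified split in pure Python, where the 59/17/24 fractions come from the abstract and every name and field is an illustrative assumption.

```python
import random

def stratified_split(items, label_of, fracs=(0.59, 0.17, 0.24), seed=0):
    """Split items into train/val/test while preserving per-label proportions.
    `label_of` maps an item to its stratum (e.g., a biomarker status)."""
    rng = random.Random(seed)
    by_label = {}
    for it in items:
        by_label.setdefault(label_of(it), []).append(it)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)                 # randomize within each stratum
        n = len(group)
        n_train = round(fracs[0] * n)
        n_val = round(fracs[1] * n)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]    # remainder goes to the test set
    return train, val, test

# Hypothetical reports: one binary stratum standing in for HER2 status.
reports = [{"id": i, "her2": i % 4 == 0} for i in range(100)]
train, val, test = stratified_split(reports, lambda r: r["her2"])
print(len(train), len(val), len(test))  # -> 59 17 24
```

A real multi-stratified split would build the stratum key from several fields at once (e.g., biomarker class plus report format), but the mechanics are the same.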
We further demonstrate via ablation experiments that domain adaptive pretraining and reasoning-based knowledge distillation are critical to attaining our performance levels across benchmarks. | Abstract. Automated semantic knowledge extraction from scientific literature promises to open vast quantities of scientific knowledge to formal analysis and computationally-driven discovery. In this work we investigate the promise of Large Language Model (LLM) agents in extracting structured knowledge from biomedical texts, specifically for grounding protein-protein interaction (PPI) relations to terms in the PSI-MI ontology of molecular interactions. While LLMs excel at summarization, they struggle to interface with structured knowledge representations. We equipped agents with various knowledge graph interaction strategies and measured their PPI grounding performance. Our central finding is that PageRank-guided traversal, a method rooted in graph topology, consistently outperforms embedding-based approaches such as retrieval augmented generation (RAG) and top-down traversal strategies including breadth-first search (BFS), depth-first search (DFS), and local greedy search in extracting knowledge previously missed by human curators. Our initial results indicate that the structure of a well-curated knowledge base is itself a powerful source of information, an underutilized principle in current agentic knowledge base interaction methods. | Abstract. Sudden unexpected death in epilepsy (SUDEP) is the leading cause of epilepsy-related mortality. Low-cost and noninvasive interictal biomarkers of SUDEP risk can help clinicians identify high-risk patients and initiate preventive actions. However, the small sample size of SUDEP patients remains a bottleneck for discriminatory analysis or biomarker discovery. Machine-driven data augmentation (DA) techniques can potentially alleviate the sample insufficiency or imbalance problem using synthetic data.
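The PPI-grounding abstract above reports that PageRank-guided traversal of the ontology graph outperforms embedding search. The PSI-MI slice and seed choice below are invented for illustration; the sketch only shows the underlying mechanism, personalized PageRank by power iteration in pure Python, with random-walk restarts at seed terms already matched in the text.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """Personalized PageRank on a directed graph given as {node: [children]}.
    Random walks restart at `seeds`, biasing scores toward nearby terms."""
    nodes = list(edges)
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = edges[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:  # dangling leaf: send its mass back to the restart set
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank

# Toy slice of an interaction-detection-method hierarchy (hypothetical labels):
ontology = {
    "interaction detection method": ["biophysical", "biochemical"],
    "biophysical": ["x-ray crystallography"],
    "biochemical": ["pull down", "tandem affinity purification"],
    "x-ray crystallography": [],
    "pull down": [],
    "tandem affinity purification": [],
}
scores = personalized_pagerank(ontology, seeds={"biochemical"})
best = max(scores, key=scores.get)
```

An agent can then offer the top-ranked terms as grounding candidates, exploiting the curated topology of the ontology rather than embedding similarity alone.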
Here we revisit an old SUDEP risk prediction problem from a new DA and generative artificial intelligence (AI) perspective, using a multicenter cohort study consisting of multichannel interictal electroencephalography (EEG) and electrocardiography (ECG) data from SUDEP patients and age-matched living epilepsy patient controls. Our results show that DA strategies can not only significantly improve the cross-validated prediction accuracy but also generalize well in newly collected held-out data samples. | Abstract. "What really happened? Who was at fault? Did the pedestrian yield, or was the driver distracted?" In high-stakes traffic incidents, understanding pedestrian-vehicle interactions is essential for safety assessment, post-crash analysis, and insurance decision-making. We propose a novel vision-language framework for traffic safety captioning and visual question answering (VQA), designed for the AI City Challenge 2025. Leveraging LLaVA 1.5 as our base vision-language model, we introduce a multi-frame collage input strategy to embed temporal context into image-based architectures. We explored three input transformation techniques (Box Stitch, Blur Stitch, and Arrow Stitch) to emphasize semantic cues such as entity localization, contextual filtering, and motion trajectory. Structured captions are generated through a two-stage process: LLaVA extracts fine-grained semantic features via targeted question answering, which are then converted into narrative descriptions using Mistral 7B. For VQA, Mistral further reasons over structured scene features to identify the most contextually appropriate response. Our best-performing configuration, Box Stitch, achieves an S_2 score of 33.93 on the official test set, demonstrating the effectiveness of structured prompting, modular caption pipelines, and strategic visual input augmentation in understanding pedestrian-vehicle interactions.
This work highlights the promise of combining static visual backbones with image-based temporal fusion for traffic scenario comprehension. | Abstract. Large language models (LLMs) are transforming the social science toolkit. Synthetic AI agents are LLM-powered programs that can reason, converse, and act autonomously. This position paper argues that these developments offer three AI-human complementary research uses. First, as interactive partners, AI agents let scholars probe human-AI social dynamics to learn more about human dynamics. Second, finely-tuned AI models can be synthetic substitutes for human subjects. This will allow rapid piloting of novel experimental designs, cross-cultural “synthetic cultural agents”, and “out-of-this-world” speculations about experiments that can't be conducted with people. Third, as analytic tools, AI agents can code, summarize, and even simulate qualitative data at scales unreachable by humans alone. We synthesize recent evidence, highlight ethical and methodological pitfalls, including bias, prompt sensitivity, and the limits of fully automated analysis, and outline elements of a research agenda in which AI complements human subjects and scientists. | Abstract. We present a foundational transformer model for gut microbiome analysis, using self-supervised learning to extract universal principles of microbial community assembly from unlabeled data. Treating microbial communities analogously to languages, our model learns representations enabling reliable cross-study generalization and automated biological discovery. Pretrained on 18,480 samples, our model achieves state-of-the-art performance on multiple downstream tasks, generalizes effectively to independent cohorts with significant distribution shifts, and highlights novel taxa associated with inflammatory bowel disease (IBD).
This work exemplifies how foundational AI models can transform scientific domains by learning generalizable patterns that traditional methods miss, opening new avenues for hypothesis generation and understanding in microbiome science. | Abstract. Recent advancements in Artificial Intelligence (AI) code generation have given rise to a practice called vibe-coding, in which a code-generation tool writes the code while the developer primarily provides feedback and corrections. Tools like Cursor and GitHub Copilot have popularized this practice by integrating AI code generation directly into the IDE, making it easier to use. However, users of these tools often err on the side of providing excessive context, supplying whole dependency trees and code bases despite the resulting drop in accuracy and rise in energy consumption. In this paper, we outline a vision for a system that extracts minimal but sufficient context for these code-generation systems. We argue that current context management systems are a critical bottleneck and propose a research agenda based on semantic retrieval to address this issue. We explore open questions and future directions that could make AI-assisted development more efficient and cost-effective. | Abstract. Artificial Intelligence (AI) offers the learning sciences new possibilities to advance not only educational practice, but also the study and theory of learning itself. AI enables richer insights from multimodal data, operationalizes long-standing educational theories in real-world contexts, and supports rapid experimentation, creating opportunities to address persistent inequities in research participation and representation. At the same time, these opportunities carry risks, including reinforcement of systemic inequalities, erosion of theoretical grounding, and privileging of narrow cultural perspectives.
This vision paper argues for an interdisciplinary and human-centered approach that integrates the values of the learning sciences with the technical capabilities of AI. We present a vision grounded in participatory methods, responsible data science, and inclusive design, supported by privacy-preserving infrastructures. By envisioning AI as a collaborator that adapts to both learners and evolving theories, we chart a path toward more equitable, rigorous, and impactful learning science. | Abstract. Urban waste management suffers from contamination, inefficiency, and poor adaptability to changing conditions. Existing AI-based waste classification systems act as static classifiers, unable to account for real-time factors such as bin capacity or evolving municipal policies. This paper presents Edge AI Agent, a framework that transforms smart bins into context-aware, decision-making systems. Built on a dual YOLOv5/YOLOv8 perception module, integrated with local policy databases and IoT-driven bin state monitoring, the agent uses an edge-native large language model to fuse perception, regulation, and infrastructure data for adaptive, user-oriented waste disposal guidance. Deployed on resource-constrained devices, the system can reroute waste when bins are full, update behavior with policy changes, and provide real-time educational feedback to users. In a federated network, these agents enable dynamic waste collection, reduce contamination, and enhance operational efficiency, offering a scalable pathway toward sustainable, circular urban economies. | Abstract. The next generation of 6G networks must seamlessly support applications ranging from holographic communications and XR to massive IoT and autonomous systems, all under diverse and stringent QoS demands. 
While network slicing enables service-specific virtual networks, current solutions remain siloed, with network-level slicing unaware of real-time spectrum conditions and physical-layer sensing lacking application and security context. This paper presents CogniSense-Slice, a cross-layer AI agent that unifies spectrum awareness, intelligent slice orchestration, and proactive security. The framework integrates a 1D CNN–based Perception Module for high-resolution spectrum mapping, a DNN–driven Orchestration Module for dynamic, context-aware slice allocation, and a Guardian Module for real-time, cross-layer threat mitigation. Through simulation using fused DeepSense and DeepSlice datasets augmented with attack scenarios, CogniSense-Slice demonstrates faster slice migration to optimal bands, improved throughput and latency, and superior detection of coordinated physical and network-layer attacks. This work advances AI-native networking by bridging the physical-network layer divide, enabling resilient, adaptive, and spectrum-efficient 6G systems. | Abstract. While digital twins have become an essential tool in many areas of engineering, they are still relatively rare in brain science. While gradient-descent-based machine learning approaches can model behavior at an aggregate level, they do not replicate neural mechanisms of individual subjects or patients, and therefore cannot serve as twins for them. Other approaches can, such as associatively connected neural maps. For example the BiLex model can be fit to individual patient's language history and impairment in stroke or dementia, and then used to evaluate different rehabilitation and mitigation strategies. This approach was found promising in an actual clinical trial, paving way for more digital twin studies in the future. | Abstract. Traditional methods for inverse optimization of non-linear PDE systems face challenges such as getting trapped in local minima. 
An AI-based approach can be an alternative, driving optimization through a stable diffusion learning process with more global context. The concept of using diffusion to generate new control sequences and control functions has been lightly explored in the literature; however, this architecture has primarily been tested on smooth control functions in PDE simulations. Recent work has noted that methods such as supervised learning and reinforcement learning are somewhat effective, but they often produce non-physical dynamics or fail to remain optimal long-term. In this paper, we test the limits of a UNet-based diffusion model that uses attention to invert a 2D heterogeneous control function coupled in an advection-diffusion PDE system. Our encouraging results suggest that attention could be an effective mechanism for inverse optimization. | Abstract. We present Ask WhAI, a debugger for multi-agent LLM interactions that records and replays encounters, probes agent belief state out of band at encounter breakpoints, and injects counterfactual evidence to test belief revision under controlled perturbations.
We integrate Ask WhAI with a medical case simulator in which role-primed specialist agents write to a shared, time-stamped electronic medical record (EMR) and query an oracle LabAgent that releases ground-truth results only when ordered.
We stress-test the system on a synthetic multi-specialty diagnostic journey for abrupt-onset neuropsychiatric symptoms. Agents primed with strong role-specific priors (e.g., "act like a neurologist") interact with a moderator across sequential encounters; breakpoints enable pre- and post-event belief probes, allowing us to separate entrenched priors from evidence integration effects.
Across controlled changes in probe framing, evidence exposure, and encounter order, we observe role-conditioned priors, resistance to counter-evidence, and order effects on belief trajectories. By separating belief probes from the clinical dialogue and enabling replay under targeted perturbations, Ask WhAI offers a reproducible way to study belief formation and epistemic silos in multi-agent reasoning. | Abstract. Foundation segmentation models (such as SAM and SAM-2) perform well on natural images but struggle with brain MRIs where structures like the caudate and thalamus lack sharp boundaries and have poor contrast. Rather than fine-tune these models (e.g., MedSAM), we propose a compositional alternative where we treat the foundation model’s output as an additional input channel (like an extra color channel) and pass it alongside the MRI to highlight regions of interest.
We generate SAM-2 segmentation prompts (e.g., a bounding box or positive/negative points) using a lightweight 3D U-Net that was previously trained on MRI segmentation. Because the U-Net may have been trained on a different dataset, its prompt guesses are often inaccurate, though they usually fall in the right region. The edges of the resulting foundation segmentation "guesses" are then smoothed to allow better alignment with the MRI. We also test prompt-less segmentation using DINO attention maps within the same framework.
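The segmentation abstract above derives SAM-2 prompts (a box plus positive/negative points) from a coarse U-Net mask, but does not specify how. Below is one plausible pure-Python version operating on a binary 2D mask; the padding amount and the centroid/far-corner point choices are assumptions, not the authors' method.

```python
def prompts_from_mask(mask, pad=1):
    """Turn a coarse binary mask (list of rows of 0/1) into SAM-style prompts:
    a padded bounding box, a positive point at the mask centroid, and a
    negative point at the padded-box corner farthest from the centroid."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None  # empty guess: no prompt can be derived
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    h, w = len(mask), len(mask[0])
    box = (max(min(rows) - pad, 0), max(min(cols) - pad, 0),
           min(max(rows) + pad, h - 1), min(max(cols) + pad, w - 1))
    centroid = (round(sum(rows) / len(rows)), round(sum(cols) / len(cols)))
    corners = [(box[0], box[1]), (box[0], box[3]),
               (box[2], box[1]), (box[2], box[3])]
    negative = max(corners, key=lambda p: (p[0] - centroid[0]) ** 2
                   + (p[1] - centroid[1]) ** 2)
    return {"box": box, "positive": centroid, "negative": negative}

coarse = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(prompts_from_mask(coarse))
```

Because only a box and a couple of points reach SAM-2, the U-Net guess merely needs to land in the right region, which is exactly the tolerance the abstract describes.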
This “has-a” architecture avoids modifying foundation weights and adapts to domain shift without retraining the foundation model. It achieves 96% volume accuracy on basal ganglia segmentation, which is sufficient for our study of longitudinal volume change. Our approach is faster, more label-efficient, and robust to out-of-distribution scans. We apply it to study inflammation-linked changes in sudden-onset pediatric OCD. | Abstract. Large language model (LLM) agent evaluators leverage specialized tools to ground the rational decision-making of LLMs, making them well-suited to aid in scientific discoveries, such as constrained retrosynthesis planning. Constrained retrosynthesis planning is an essential, yet challenging, process within chemistry for identifying synthetic routes from commercially available starting materials to desired target molecules, subject to practical constraints. Here, we present LARC, the first LLM-based Agentic framework for Retrosynthesis planning under Constraints. LARC incorporates agentic constraint evaluation, through an Agent-as-a-Judge, directly into the retrosynthesis planning process, using agentic feedback grounded in tool-based reasoning to guide and constrain route generation. We rigorously evaluate LARC on a carefully curated set of 48 constrained retrosynthesis planning tasks across 3 constraint types. LARC achieves a 72.9% success rate on these tasks, vastly outperforming LLM baselines and approaching human expert-level success in substantially less time. The LARC framework is extensible, and serves as a first step towards an effective agentic tool or a co-scientist to human experts for constrained retrosynthesis. | Abstract. Protein–protein docking tools help in studying interactions between proteins, and are essential for drug, vaccine, and therapeutic development. However, the accuracy of a docking tool depends on a robust scoring function that can reliably differentiate between native and non-native complexes. 
PIsToN is a state-of-the-art deep learning–based scoring function that uses Vision Transformers in its architecture. Recently, the Mamba architecture has demonstrated exceptional performance in both natural language processing and computer vision, often outperforming Transformer-based models in these domains. In this study, we introduce PUMBA (Protein-protein interface evaluation with Vision Mamba), which improves PIsToN by replacing its Vision Transformer backbone with Vision Mamba. This change allows us to leverage Mamba’s efficient long-range sequence modeling for sequences of image patches. As a result, the model’s ability to capture both global and local patterns in protein–protein interface features is significantly improved. Evaluation on several widely-used, large-scale public datasets demonstrates that PUMBA consistently outperforms its original Transformer-based predecessor, PIsToN. | Abstract. The role of Artificial Intelligence (AI) is growing in every stage of drug development. Nevertheless, a major challenge in drug discovery AI remains: drug pharmacokinetic (PK) and drug-target interaction (DTI) datasets collected in different studies often exhibit limited overlap, creating data overlap sparsity. Thus, data curation becomes difficult, negatively impacting downstream research investigations in high-throughput screening, polypharmacy, and drug combination. We propose xImagand-DKI, a novel SMILES/Protein-to-Pharmacokinetic/DTI (SP2PKDTI) diffusion model capable of generating an array of PK and DTI target properties conditioned on SMILES and protein inputs that exhibit data overlap sparsity. We infuse additional molecular and genomic domain knowledge from the Gene Ontology and molecular fingerprints to further improve our model performance. We show that xImagand-DKI-generated synthetic PK data closely resemble the univariate and bivariate distributions of real data, and can adequately fill in gaps among PK and DTI datasets. As such, xImagand-DKI is a promising solution for data overlap sparsity and may improve performance for downstream drug discovery research tasks. | Abstract. Early detection of mild dementia is vital for timely intervention, yet most existing voice biomarker research relies on scripted, high-quality clinical recordings.
Such settings fail to capture the acoustic variability of everyday life, limiting real-world applicability. This proof-of-concept study is the first to develop a voice-based biomarker for mild dementia using naturalistic in-vehicle audio data collected during routine driving. Audio recordings from 29 participants with sufficient speech content were processed to isolate speech segments through unsupervised clustering, followed by manual verification to ensure relevance. Speech embeddings were extracted using the Wav2Vec2 architecture and a multilayer perceptron classifier was trained and evaluated in a subject-level leave-one-subject-out (LOSO) framework. The model achieved an accuracy of 68.97%, precision of 75%, recall of 60%, and F1-score of 66.67%. These findings demonstrate the feasibility of deriving robust voice biomarkers from highly variable, noise-rich real-world audio. This work lays the groundwork for scalable, passive in-vehicle cognitive health monitoring, with future directions including larger datasets, multimodal integration, and longitudinal analysis for early dementia detection. | Abstract. The exponential growth of biomedical literature poses challenges for synthesizing thematic and emotional insights, particularly in underexplored conditions like fibromyalgia. We present a reproducible and modular pipeline that integrates BERTopic — an explainable topic modeling framework — with sentiment analysis to map 5,861 PubMed abstracts on fibromyalgia. It primarily spans publications from 1990 to 2020, with a small number of records predating 1990; this coverage enables longitudinal analysis of research themes and sentiment. Our approach combines Sentence-BERT embeddings, density-based clustering, and TF-IDF topic representation to extract 111 interpretable topics and one noise cluster. 
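Subject-level leave-one-subject-out evaluation, as used in the dementia study above, holds out every recording from one participant per fold so no speaker appears in both training and test data. The sketch below shows the fold logic in pure Python with a stand-in nearest-centroid classifier and made-up embeddings; the actual study used Wav2Vec2 embeddings and a multilayer perceptron.

```python
def loso_accuracy(samples, train_and_predict):
    """samples: list of (subject_id, features, label) tuples.
    For each subject, train on all other subjects, test on that subject's
    samples, and return overall accuracy across all held-out predictions."""
    subjects = sorted({s for s, _, _ in samples})
    correct = total = 0
    for held_out in subjects:
        train = [(x, y) for s, x, y in samples if s != held_out]
        test = [(x, y) for s, x, y in samples if s == held_out]
        predict = train_and_predict(train)
        for x, y in test:
            correct += (predict(x) == y)
            total += 1
    return correct / total

def nearest_centroid(train):
    """Stand-in classifier: predict the label whose feature mean is closest."""
    sums, counts = {}, {}
    dim = len(train[0][0])
    for x, y in train:
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [a + b for a, b in zip(sums.get(y, [0.0] * dim), x)]
    centroids = {y: [v / counts[y] for v in sums[y]] for y in sums}
    def predict(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2
                   for a, b in zip(x, centroids[y])))
    return predict

# Hypothetical embeddings: two well-separated classes across four subjects.
data = [("s1", [0.1, 0.0], 0), ("s1", [0.2, 0.1], 0),
        ("s2", [0.0, 0.2], 0),
        ("s3", [1.0, 1.1], 1), ("s3", [0.9, 1.0], 1),
        ("s4", [1.1, 0.9], 1)]
print(loso_accuracy(data, nearest_centroid))  # -> 1.0
```

Grouping folds by subject rather than by sample is what prevents the classifier from memorizing speaker identity instead of learning the biomarker.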
Key themes include sleep dysfunction, multimodal treatment, genetic biomarkers, and patient experience—an emergent area increasingly emphasized in chronic illness research. We benchmark BERTopic against Latent Dirichlet Allocation (LDA) and Contextual Topic Modeling (CTM) using four coherence metrics (C_V, UMass, NPMI, and C_UCI). While CTM achieved the highest coherence score (C_V = 0.6748), BERTopic (C_V = 0.6331) offered superior visualization, adaptability, and usability. Sentiment analysis, conducted using a DistilBERT classifier trained on the SST-2 dataset, revealed domain-specific polarity patterns — e.g., overwhelmingly negative tone in sleep-related studies and balanced sentiment in patient-centered topics. Although the sentiment model was not fine-tuned on biomedical text, it provided meaningful first-order approximations. This work contributes a scalable framework for scientific landscape mapping in low-data medical domains. We discuss limitations—including the presence of noise (Topic -1) and reliance on abstracts—and outline future directions such as domain-specific sentiment fine-tuning and full-text integration. | Abstract. Recent efforts have explored the use of large language models (LLMs) in drug discovery. As pioneers in this research line, we share our perspective on its current state and where it may lead. Our work, ChatDrug, demonstrates how LLM–human interaction, supported by a domain agent, can improve reliability: when LLMs generate incorrect or invalid outputs, the agent retrieves reference information to guide correction. While ChatDrug enhances answer accuracy and insight generation, a key limitation of current LLMs, their lack of direct perception of the physical world, remains. We believe that overcoming this boundary will require multi-modal tools integrating LLMs with domain-specific capabilities, an important direction for future research. |
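BERTopic's interpretable topic labels, as used in the fibromyalgia study above, come from a class-based TF-IDF: each cluster's documents are concatenated and terms are weighted by within-cluster frequency against cross-cluster frequency. The sketch below is a simplified pure-Python version of that c-TF-IDF idea on invented toy abstracts; real BERTopic additionally normalizes frequencies and works on sparse matrices.

```python
import math
from collections import Counter

def ctfidf_top_terms(clusters, k=3):
    """clusters: {topic_id: [doc, ...]}. Concatenate each cluster's docs and
    score terms with c-TF-IDF: tf(term, class) * log(1 + avg_words / freq(term)),
    so terms frequent in one cluster but rare overall rise to the top."""
    class_counts = {c: Counter(" ".join(docs).split())
                    for c, docs in clusters.items()}
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    avg_words = sum(total_freq.values()) / len(clusters)
    top = {}
    for c, counts in class_counts.items():
        scored = {t: n * math.log(1 + avg_words / total_freq[t])
                  for t, n in counts.items()}
        top[c] = sorted(scored, key=scored.get, reverse=True)[:k]
    return top

# Invented mini-corpus loosely echoing two of the themes named above:
clusters = {
    "sleep": ["poor sleep quality in fibromyalgia",
              "sleep disturbance and pain severity"],
    "treatment": ["exercise treatment improved pain outcomes",
                  "multimodal treatment and exercise adherence"],
}
print(ctfidf_top_terms(clusters))
```

Terms shared across clusters (like "pain" here) are down-weighted, which is why the extracted topic representations stay distinct and readable.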