AI Research Papers
Research papers on AI agents, large language models, neurosymbolic systems, and the future of autonomous intelligence.
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Proposes PACS (imPlicit Actor Critic coupling via a Supervised learning framework), a novel framework for Reinforcement Learning with Verifiable Rewards (RLVR) that addresses challenges like sparse rewards and unstable policy gradients. PACS reformulates the RLVR problem into a supervised learning task over a score function, optimized using cross-entropy loss. This formulation inherently recovers the classical policy gradient update while implicitly coupling the actor and critic roles, leading to more stable and efficient training and superior reasoning performance on challenging mathematical reasoning tasks compared to baselines like PPO and GRPO.
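A minimal sketch of the core reformulation, under assumptions: a scalar score derived from the policy's log-probabilities is treated as a logit and supervised against the verifiable 0/1 reward with cross-entropy. The `score` definition, `beta`, and tensor shapes below are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pacs_style_loss(logp_current, logp_reference, verifiable_reward, beta=1.0):
    """logp_*: summed token log-probs of each sampled response, shape (batch,);
    verifiable_reward: 0/1 labels from an automatic verifier, shape (batch,)."""
    # Assumed score: scaled policy log-ratio, used directly as a logit.
    score = beta * (logp_current - logp_reference)
    # Cross-entropy over the score implicitly couples actor and critic: its
    # gradient resembles a policy-gradient update weighted by (sigmoid(score) - reward).
    return F.binary_cross_entropy_with_logits(score, verifiable_reward)

# Toy usage with random tensors standing in for model outputs.
lp_cur = torch.randn(8, requires_grad=True)
lp_ref = torch.randn(8)
rewards = torch.randint(0, 2, (8,)).float()
pacs_style_loss(lp_cur, lp_ref, rewards).backward()
```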
PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling
Addresses the challenge of selecting the appropriate Large Language Model (LLM) from a diverse set, considering that user preferences vary in terms of performance, cost, and response style. It proposes PersonalizedRouter, a graph-based framework that models diverse user profiles and performs personalized LLM selection by leveraging interaction data. The framework converts interaction data into a heterogeneous graph to capture contextual information, significantly outperforming existing LLM selection methods.
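A minimal sketch of converting interaction logs into a heterogeneous graph in the spirit described above. The node/edge types and log fields are illustrative assumptions rather than the paper's schema; `networkx` is used for brevity.

```python
import networkx as nx

# Hypothetical interaction logs: (user, query, chosen LLM, satisfaction score).
logs = [
    ("u1", "summarize this report", "modelA", 0.9),
    ("u1", "write a haiku", "modelB", 0.7),
    ("u2", "summarize this report", "modelB", 0.4),
]

G = nx.MultiDiGraph()
for user, query, llm, score in logs:
    G.add_node(user, kind="user")
    G.add_node(query, kind="query")
    G.add_node(llm, kind="llm")
    G.add_edge(user, query, kind="issued")
    G.add_edge(query, llm, kind="answered_by", weight=score)

# A graph model trained on this structure can learn user-conditional
# preferences rather than a single global ranking of LLMs.
print(G.number_of_nodes(), G.number_of_edges())  # -> 6 6
```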
Where LLM Agents Fail and How They Can Learn From Failures
Examines how sophisticated LLM agents are vulnerable to cascading failures, where initial errors propagate and lead to total task failure. The paper proposes a Taxonomy of Failure Modes for LLM agents and introduces AgentDebug, a debugging framework that isolates root-cause failures and provides targeted corrective feedback. AgentDebug enables agents to recover and iteratively improve, yielding significant relative improvements in task success across multiple benchmarks.
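A hedged sketch of root-cause isolation in this spirit: scan the agent's trajectory in order, flag the earliest step a critic judges faulty, and return targeted feedback for a retry from that point. The `critic` stub and trajectory format are hypothetical, not AgentDebug's actual interface.

```python
def critic(step: dict) -> tuple[bool, str]:
    # Stand-in for an LLM-based judge of a single trajectory step.
    return step.get("ok", True), step.get("why", "")

def find_root_cause(trajectory: list[dict]) -> tuple[int | None, str]:
    """Return (index, feedback) for the earliest failing step, else (None, "")."""
    for i, step in enumerate(trajectory):
        ok, why = critic(step)
        if not ok:
            return i, why  # later failures are likely cascades from this step
    return None, ""

traj = [{"ok": True}, {"ok": False, "why": "misread the tool output"}, {"ok": False}]
print(find_root_cause(traj))  # -> (1, 'misread the tool output')
```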
GraphRouter: A Graph-based Router for LLM Selections
Introduces GraphRouter, a novel inductive graph framework designed to enhance the LLM selection process by leveraging contextual information among tasks, queries, and LLMs. GraphRouter constructs a heterogeneous graph and uses an edge prediction mechanism to optimize the performance-cost trade-offs for recommendations. This approach substantially surpasses existing routers and achieves enhanced generalization across new LLMs and diverse tasks.
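To make the edge-prediction idea concrete, here is a small sketch that scores (query, LLM) pairs and routes to the LLM with the best predicted performance-cost trade-off. The embedding sizes, scoring MLP, and trade-off weight `lam` are assumptions for illustration; the paper's model operates over a full heterogeneous graph.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """Predict (performance, cost) for a query-LLM edge."""
    def __init__(self, dim=64, num_llms=4):
        super().__init__()
        self.llm_emb = nn.Embedding(num_llms, dim)  # one node embedding per LLM
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 2))

    def forward(self, query_emb, llm_ids):
        h = torch.cat([query_emb, self.llm_emb(llm_ids)], dim=-1)
        return self.mlp(h)

def route(scorer, query_emb, num_llms=4, lam=0.5):
    """Pick the LLM maximizing predicted performance minus lam * cost."""
    ids = torch.arange(num_llms)
    preds = scorer(query_emb.expand(num_llms, -1), ids)
    return int((preds[:, 0] - lam * preds[:, 1]).argmax())

scorer = EdgeScorer()
print(route(scorer, torch.randn(64)))  # untrained, so the choice is arbitrary
```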
Neurosymbolic Reinforcement Learning and Planning: A Survey
A comprehensive literature survey on the emerging field of Neurosymbolic Reinforcement Learning (Neurosymbolic RL), focusing on the integration of neural, symbolic, and RL components. The paper categorizes existing works into three taxonomies based on the role of the components: Learning for Reasoning, Reasoning for Learning, and Learning-Reasoning. It analyzes the RL components of existing research and identifies future research opportunities and challenges within this dynamic field.
Overhearing LLM Agents: A Survey, Taxonomy, and Roadmap
Studies the alternative paradigm of "overhearing agents" in human-AI interaction, where LLM agents continuously monitor ambient activity and intervene only to provide contextual assistance, rather than demanding direct user attention. The paper provides the first analysis of this distinct paradigm and establishes a taxonomy of overhearing agent interactions and tasks, laying out a roadmap for future research.
Attention Is All You Need
Introduced the Transformer architecture in 2017, fundamentally reshaping AI and machine learning. The architecture dispenses with recurrence entirely and relies solely on self-attention mechanisms to model relationships in sequential data, enabling greater parallelization, improved long-range context understanding, and better scalability than predecessor models like RNNs and LSTMs.
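The core operation is scaled dot-product attention; below is a minimal single-head NumPy sketch with Q = K = V = X and no masking or learned projections (the full model adds those, plus multiple heads).

```python
import numpy as np

def self_attention(X):
    """X: (seq_len, d) token representations."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # attention-weighted mixture

X = np.random.randn(5, 8)                            # 5 tokens, dimension 8
print(self_attention(X).shape)                       # -> (5, 8)
```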
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems
Presents MedAgentAudit, a diagnostic framework and empirical study designed to analyze the internal collaborative processes of medical multi-agent systems, shifting focus from final task accuracy to internal reliability. The study establishes the first comprehensive Taxonomy of Collaborative Failure Modes (e.g., Suppression of Correct Views), based on an analysis of over 3,600 cases. The framework uses an AuditorAgent and quantitative mechanisms to generate a detailed AuditTrail, making the collaborative 'black box' transparent and analyzable.
Advancing Symbolic Integration in Large Language Models: Beyond Conventional Neurosymbolic AI
Addresses the lack of transparency in LLMs and the limitations of conventional Neurosymbolic AI (NeSy AI) approaches when applied to the unique features of LLMs. The paper proposes a novel taxonomy of symbolic integration in LLMs and a roadmap to merge symbolic techniques with LLMs across various stages and coupling mechanisms. The goal is to enhance transparency, logical reasoning, and explainability in LLMs.
Neuro-Symbolic AI in 2024: A Systematic Review
A systematic review of 167 papers published between 2020 and 2024 to map the landscape of Neuro-Symbolic AI. The review found that research is primarily concentrated in learning and inference, logic and reasoning, and knowledge representation. It identifies significant gaps in explainability, trustworthiness, and Meta-Cognition, stressing that addressing these areas is crucial for developing more reliable AI systems.
Fast and Accurate Task Planning using Neuro-Symbolic Language Models and Multi-level Goal Decomposition
Proposes a novel neuro-symbolic task planner that combines the accuracy of symbolic planners with the speed of LLM-based approaches, addressing the limitations of both (symbolic planners are accurate but slow; LLM planners are fast but less accurate). The method leverages an LLM to decompose complex tasks into multi-level subgoals, shrinking the search space, and then solves each subgoal with either a symbolic planner or an MCTS-based LLM planner. This yields significant reductions in planning time while maintaining high success rates in robotic task planning domains.
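A hypothetical sketch of the decompose-then-solve control flow: an LLM splits a goal into subgoals for a fixed number of levels, and each leaf subgoal goes to a symbolic planner over a much smaller search space. `llm_decompose` and `symbolic_plan` are stand-in stubs, not the paper's implementation.

```python
def llm_decompose(goal: str) -> list[str]:
    # Stub for an LLM call returning an ordered list of subgoals.
    return [f"{goal} / step {i}" for i in (1, 2)]

def symbolic_plan(subgoal: str) -> list[str]:
    # Stub for a symbolic (e.g., PDDL) planner solving one small subgoal.
    return [f"action for: {subgoal}"]

def plan(goal: str, levels: int = 2) -> list[str]:
    """Decompose for `levels` levels, then plan each leaf subgoal symbolically."""
    if levels == 0:
        return symbolic_plan(goal)
    actions: list[str] = []
    for sub in llm_decompose(goal):
        actions.extend(plan(sub, levels - 1))
    return actions

print(plan("set the table"))  # 4 leaf actions from a 2-level decomposition
```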
LLM-Enhanced Symbolic Control for Safety-Critical Applications
Introduces a framework to synthesize Abstraction-Based Controller Design (ABCD) for safety-critical reach-avoid problems directly from Natural Language (NL) specifications using LLMs. The system uses a Code Agent to translate NL descriptions into formal code for symbolic control software (Dionysos) and a Checker Agent to verify the code and identify specification mismatches, thereby enhancing safety. The approach lowers the barrier to formal control synthesis while maintaining safety guarantees.
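A hedged sketch of the generate-then-verify loop: a code agent drafts controller-synthesis code from the natural-language spec, and a checker agent either accepts it or returns feedback for another attempt. Both agent functions are hypothetical stubs; the paper targets the Dionysos toolchain rather than these placeholders.

```python
def code_agent(nl_spec: str, feedback: str | None = None) -> str:
    # Stub for an LLM translating the NL spec (plus any feedback) into code.
    return f"# synthesized controller code for: {nl_spec}"

def checker_agent(code: str, nl_spec: str) -> tuple[bool, str]:
    # Stub for a verifier comparing the code against the specification.
    return True, ""

def synthesize(nl_spec: str, max_rounds: int = 3) -> str:
    feedback = None
    for _ in range(max_rounds):
        code = code_agent(nl_spec, feedback)
        ok, feedback = checker_agent(code, nl_spec)
        if ok:
            return code  # accepted only after verification
    raise RuntimeError("no verified controller within the round budget")

print(synthesize("reach the goal region while avoiding obstacle A"))
```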
Large Language Model-Enhanced Symbolic Reasoning for Knowledge Base Completion
Proposes a novel framework to improve the flexibility and reliability of Knowledge Base Completion (KBC) by integrating LLMs with rule-based reasoning. The framework consists of a Subgraph Extractor, an LLM Proposer (to generate diverse and meaningful rules from subgraphs), and a Rule Reasoner (to refine the rules and mitigate LLM hallucination). This approach effectively combines the strong semantic understanding of LLMs with the logical rigor of symbolic methods.
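A toy sketch of the three-stage pipeline on a tiny triple store. The LLM proposer is a stub, and rule refinement is shown as simple support/confidence filtering, an assumed stand-in for the paper's Rule Reasoner.

```python
# Tiny knowledge base of (head, relation, tail) triples.
kb = {("alice", "mother_of", "bob"), ("bob", "father_of", "carol"),
      ("alice", "grandmother_of", "carol")}

def llm_propose_rules(subgraph):
    # Stub for LLM rule generation: r1(x,y) and r2(y,z) -> r3(x,z).
    return [("mother_of", "father_of", "grandmother_of")]

def rule_confidence(rule, kb):
    """Fraction of rule-body matches whose head triple is already in the KB."""
    r1, r2, r3 = rule
    support = hits = 0
    for (x, p, y) in kb:
        if p != r1:
            continue
        for (y2, q, z) in kb:
            if q == r2 and y2 == y:
                support += 1
                hits += (x, r3, z) in kb
    return hits / support if support else 0.0

for rule in llm_propose_rules(kb):
    if rule_confidence(rule, kb) >= 0.5:  # keep only well-supported rules
        print("accepted:", rule)
```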
AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise
This paper introduces AgentArch, a comprehensive enterprise-specific benchmark that evaluates 18 distinct agentic configurations across state-of-the-art Large Language Models (LLMs) to understand how different design dimensions interact within complex multi-agent systems. The study examines key dimensions like orchestration strategy, agent prompt implementation (ReAct vs. function calling), memory architecture, and thinking tool integration. The findings challenge the one-size-fits-all paradigm, revealing model-specific architectural preferences and significant weaknesses in performance on enterprise tasks, with even the highest scores peaking at 35.3% success on complex tasks.
Small Language Models are the Future of Agentic AI
This paper advocates that Small Language Models (SLMs), rather than Large Language Models (LLMs), are the future of agentic AI, particularly for repetitive, specialized tasks. The authors argue that SLMs are more economical, offer lower latency, and are inherently more suitable for the modular, scoped sub-tasks typical in agentic systems. For tasks that do require general conversational ability, they propose heterogeneous systems that mix SLMs with occasional LLM calls, and they outline an LLM-to-SLM agent conversion algorithm to facilitate this paradigm shift.
Scalable Chain of Thoughts via Elastic Reasoning
The paper proposes Elastic Reasoning, a framework to achieve scalable Chain of Thought (CoT) by addressing the challenge of uncontrolled output lengths in Large Reasoning Models (LRMs) under strict inference-time budget constraints. The framework explicitly separates the reasoning process into two phases: thinking and solution, each with independent token budgets. A lightweight budget-constrained rollout strategy is introduced to train models to reason adaptively when the thinking phase is truncated. This approach improves reliability under constraints and produces more concise and efficient reasoning.
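A minimal sketch of the two-phase budget idea: spend at most `t_budget` tokens thinking, force-close the thinking phase if it was truncated, then generate the solution under its own independent budget. `generate` and the `<think>` delimiters are hypothetical stand-ins for sampling from a reasoning model.

```python
def generate(prompt: str, max_tokens: int, stop: str | None = None) -> str:
    # Stub for model sampling up to max_tokens or until the stop string.
    return "..."

def elastic_reason(question: str, t_budget: int = 512, s_budget: int = 256) -> str:
    thinking = generate(f"{question}\n<think>", max_tokens=t_budget, stop="</think>")
    if not thinking.endswith("</think>"):
        thinking += "</think>"  # truncated: close the thinking phase explicitly
    # The solution phase gets its own budget regardless of how thinking ended.
    return generate(f"{question}\n<think>{thinking}\nAnswer:", max_tokens=s_budget)

print(elastic_reason("What is 17 * 23?"))
```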
Self-Adapting Language Models
This paper introduces Self-Adapting LLMs (SEAL), a framework that addresses the static nature of LLMs by enabling them to self-adapt their weights. SEAL works by having the model generate its own finetuning data and update directives, called a self-edit, in response to new input. These self-edits result in persistent weight updates via supervised finetuning. The model learns to produce effective self-edits using a reinforcement learning loop, with the reward based on downstream performance. Experiments demonstrate efficacy in knowledge incorporation and few-shot generalization.
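A high-level sketch of one SEAL-style step with every model call stubbed out: the model proposes a self-edit (finetuning data plus update directives), the weights are updated by supervised finetuning, and the downstream score becomes the reinforcement signal for the edit-generating policy. All function bodies are placeholder assumptions.

```python
def propose_self_edit(model, context):
    return {"data": [context], "lr": 1e-5}   # stub: model-generated self-edit

def apply_sft(model, edit):
    return model                             # stub: persistent weight update

def downstream_score(model, eval_task):
    return 0.5                               # stub: task performance measure

def seal_step(model, context, eval_task, rl_buffer):
    edit = propose_self_edit(model, context)
    updated = apply_sft(model, edit)
    reward = downstream_score(updated, eval_task)
    rl_buffer.append((context, edit, reward))  # RL data for the edit policy
    return updated

model, rl_buffer = object(), []
model = seal_step(model, "new fact: ...", "qa_eval", rl_buffer)
```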
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
This survey provides a comprehensive overview of collaborative strategies for Large Language Models (LLMs), designed to leverage the varying strengths of individual models to maximize efficiency and versatility. The paper categorizes these strategies into three primary approaches: Merging (integrating models in the parameter space), Ensemble (combining model outputs), and Cooperation (leveraging diverse LLMs for specific tasks). It offers in-depth introductions to these methods, discusses potential applications, and outlines future research directions.
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
This paper introduces Multiagent Finetuning, a novel self-improvement approach that applies finetuning to a multiagent society of language models to overcome the diminishing returns of single-agent self-improvement. The method initializes a group of models from the same base and then independently specializes each one using diverse data generated through their multiagent interactions (e.g., debate). This strategy enables specialization and diversification across the models, allowing the system to preserve diverse reasoning chains and achieve autonomous improvement over multiple finetuning rounds on various reasoning tasks.
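A schematic sketch of one round of the recipe, with model calls stubbed: N copies of a base model interact (e.g., debate), and each copy is then finetuned only on data from its own role, keeping the population diverse across rounds. `debate` and `finetune` are placeholder assumptions.

```python
import copy

def debate(models, question):
    # Stub: each model answers, sees the others' answers, and revises.
    return {i: f"model {i} reasoning about {question}" for i in range(len(models))}

def finetune(model, examples):
    return model  # stub: supervised finetuning on this agent's own outputs

def multiagent_round(base, questions, n_agents=3):
    models = [copy.deepcopy(base) for _ in range(n_agents)]
    per_agent = {i: [] for i in range(n_agents)}
    for q in questions:
        for i, text in debate(models, q).items():
            per_agent[i].append(text)  # each agent keeps only its own chains
    return [finetune(m, per_agent[i]) for i, m in enumerate(models)]

society = multiagent_round(object(), ["Q1", "Q2"])
print(len(society))  # -> 3 specialized models
```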