publications | Moksh Jain

2026

ICLR
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning

Minsu Kim*, Jean-Pierre Falet*, Oliver E. Richardson, and 5 more authors

In International Conference on Learning Representations, 2026

Chain-of-Thought (CoT) reasoning has advanced the capabilities and transparency of language models (LMs); however, reasoning chains can contain inaccurate statements that reduce performance and trustworthiness. To address this, we propose to augment each reasoning step in a CoT with a latent veracity (or correctness) variable. To efficiently explore this expanded space, we introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values by leveraging the LM’s joint likelihood over veracity and the final answer as a proxy reward. This efficient inference-time verification method facilitates supervised fine-tuning of an Amortized Veracity Inference (AVI) machine by providing pseudo-labels for veracity. AVI generalizes VS, enabling accurate zero-shot veracity inference in novel contexts. Empirical results demonstrate that VS reliably identifies errors in logical (ProntoQA), mathematical (GSM8K), and commonsense (CommonsenseQA) reasoning benchmarks, with AVI achieving comparable zero-shot accuracy. Finally, we demonstrate the utility of latent veracity inference for providing feedback during self-correction and self-improvement.
@inproceedings{kim2025search, title = {Latent Veracity Inference for Identifying Errors in Stepwise Reasoning}, author = {Kim*, Minsu and Falet*, Jean-Pierre and Richardson, Oliver E. and Chen, Xiaoyin and Jain, Moksh and Ahn, Sungjin and Ahn, Sungsoo and Bengio, Yoshua}, year = {2026}, booktitle = {International Conference on Learning Representations}, url = {https://openreview.net/forum?id=eux1cp8GqC}, }
AM
Navigating Ternary Doping in Li-ion Cathodes With Closed-Loop Multi-Objective Bayesian Optimization

Nooshin Zeinali Galabi, Cheng-Hao Liu, Moksh Jain, and 4 more authors

Advanced Materials, 2026

To further improve secondary battery materials, we are increasingly exploring highly complex composition spaces in attempts to optimize multiple properties simultaneously. While our past work has done this in systematic manners using high-throughput experimentation, the exponential increase in the search space with triple doping makes grid search prohibitively expensive. Here, we demonstrate a closed-loop, multi-objective machine learning approach to guide the high-throughput workflow to efficiently navigate a space with approximately 14 million unique combinations. The test system is LiCoPO4, which we have previously explored using systematic codoping that was effective in optimizing one property only: energy density. To learn multiple electrochemical metrics, we first pretrain a set transformer on the public Materials Project database as a feature extractor, then attach a multi-task Gaussian process head and finetune the entire model on our high-throughput data. Through 3 rounds of active learning, we demonstrate that with a very small number of samples (as few as 125 random compositions and 63 predicted), we are able to simultaneously optimize four key electrochemical properties. Relative to the undoped system, the best composition raises our composite figure of merit up to five times. This establishes an end-to-end workflow for accelerated battery materials design to be used in the rapidly growing field of autonomous materials discovery.
@article{zeinali2025navigating, title = {Navigating Ternary Doping in Li-ion Cathodes With Closed-Loop Multi-Objective Bayesian Optimization}, author = {Zeinali Galabi, Nooshin and Liu, Cheng-Hao and Jain, Moksh and Kamel, Marc and Jia, Shipeng and Bengio, Yoshua and McCalla, Eric}, journal = {Advanced Materials}, pages = {e19790}, year = {2026}, publisher = {Wiley Online Library}, }

2025

arXiv
A Comedy of Estimators: On KL Regularization in RL Training of LLMs

Vedant Shah*, Johan Obando-Ceron*, Vineet Jain*, and 10 more authors

arXiv preprint arXiv:2512.21852, 2025

The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regularization term, which is the reverse Kullback-Leibler (KL) divergence between the trained policy and the reference policy. Since computing the KL divergence exactly is intractable, various estimators are used in practice to estimate it from on-policy samples. Despite its wide adoption, including in several open-source libraries, there is no systematic study analyzing the numerous ways of incorporating KL estimators in the objective and their effect on the downstream performance of RL-trained models. Recent works show that prevailing practices for incorporating KL regularization do not provide correct gradients for stated objectives, creating a discrepancy between the objective and its implementation. In this paper, we further analyze these practices and study the gradients of several estimator configurations, revealing how design choices shape gradient bias. We substantiate these findings with empirical observations by RL fine-tuning Qwen2.5-7B, Llama-3.1-8B-Instruct and Qwen3-4B-Instruct-2507 with different configurations and evaluating their performance on both in- and out-of-distribution tasks. Through our analysis, we observe that, in on-policy settings: (1) estimator configurations with biased gradients can result in training instabilities; and (2) using estimator configurations resulting in unbiased gradients leads to better performance on in-domain as well as out-of-domain tasks. We also investigate the performance resulting from different KL configurations in off-policy settings and observe that KL regularization can help stabilize off-policy RL training resulting from asynchronous setups.
@article{shah2025comedy, title = {A Comedy of Estimators: On KL Regularization in RL Training of LLMs}, author = {Shah*, Vedant and Obando-Ceron*, Johan and Jain*, Vineet and Bartoldson, Brian and Kailkhura, Bhavya and Mittal, Sarthak and Berseth, Glen and Castro, Pablo Samuel and Bengio, Yoshua and Malkin, Nikolay and Jain*, Moksh and Venkatraman*, Siddarth and Courville*, Aaron}, year = {2025}, journal = {arXiv preprint arXiv:2512.21852}, }
arXiv
Benchmarking World-Model Learning

Archana Warrier, Dat Nguyen, Michelangelo Naim, and 8 more authors

arXiv preprint arXiv:2510.19788, 2025

Model-learning agents should gather information to learn world models that support many downstream tasks and inferences, such as predicting unobserved states, estimating near- and far-term consequences of actions, planning action sequences, and detecting changes in dynamics. Current methods for learning and evaluating world models diverge from this goal: training and evaluation are anchored to next-frame prediction, and success is scored by reward maximization in the same environment. We propose WorldTest, a protocol to evaluate model-learning agents that separates reward-free interaction from a scored test phase in a different but related environment. WorldTest is open-ended — models should support many different tasks unknown ahead of time — and agnostic to model representation, allowing comparison across approaches. We instantiated WorldTest with AutumnBench, a suite of 43 interactive grid-world environments and 129 tasks across three families: masked-frame prediction, planning, and predicting changes to the causal dynamics. We compared 517 human participants and three frontier models on AutumnBench. We found that humans outperform the models, and scaling compute improves performance only in some environments but not others. WorldTest provides a novel template — reward-free exploration, derived tests, and behavior-based scoring — to evaluate what agents learn about environment dynamics, and AutumnBench exposes significant headroom in world-model learning.
@article{warrier2025benchmarking, title = {Benchmarking World-Model Learning}, author = {Warrier, Archana and Nguyen, Dat and Naim, Michelangelo and Jain, Moksh and Liang, Yichao and Schroeder, Karen and Yang, Cambridge and Tenenbaum, Joshua B. and Vollmer, Sebastian and Ellis, Kevin and Tavares, Zenna}, year = {2025}, journal = {arXiv preprint arXiv:2510.19788}, }
arXiv
Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Siddarth Venkatraman*, Vineet Jain*, Sarthak Mittal*, and 9 more authors

arXiv preprint arXiv:2509.26626, 2025

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains – not just the final answers – and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains.
@article{venkatraman2025recursive, title = {Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models}, author = {Venkatraman*, Siddarth and Jain*, Vineet and Mittal*, Sarthak and Shah, Vedant and Obando-Ceron, Johan and Bengio, Yoshua and Bartoldson, Brian R. and Kailkhura, Bhavya and Lajoie, Guillaume and Berseth, Glen and Malkin, Nikolay and Jain, Moksh}, year = {2025}, journal = {arXiv preprint arXiv:2509.26626}, }
NeurIPS
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson, Siddarth Venkatraman, James Diffenderfer, and 7 more authors

In Advances in Neural Information Processing Systems, 2025

Reinforcement learning (RL) is a critical component of large language model (LLM) post-training. However, existing on-policy algorithms used for post-training are inherently incompatible with the use of experience replay buffers, which can be populated scalably by distributed off-policy actors to enhance exploration as compute increases. We propose efficiently obtaining this benefit of replay buffers via Trajectory Balance with Asynchrony (TBA), a massively scalable LLM RL system. In contrast to existing approaches, TBA uses a larger fraction of compute on search, constantly generating off-policy data for a central replay buffer. A training node simultaneously samples data from this buffer based on reward or recency to update the policy using Trajectory Balance (TB), a diversity-seeking RL objective introduced for GFlowNets. TBA offers three key advantages: (1) decoupled training and search, speeding up training wall-clock time by 4x or more; (2) improved diversity through large-scale off-policy sampling; and (3) scalable search for sparse reward settings. On mathematical reasoning, preference-tuning, and automated red-teaming (diverse and representative post-training tasks), TBA produces speed and performance improvements over strong baselines.
@inproceedings{bartoldson2025tba, title = {Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable {LLM} Post-Training}, author = {Bartoldson, Brian R. and Venkatraman, Siddarth and Diffenderfer, James and Jain, Moksh and Ben-Nun, Tal and Lee, Seanie and Kim, Minsu and Obando-Ceron, Johan and Bengio, Yoshua and Kailkhura, Bhavya}, year = {2025}, booktitle = {Advances in Neural Information Processing Systems}, volume = {38}, }
ICLR
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif, and 8 more authors

In International Conference on Learning Representations, 2025

Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches.
@inproceedings{lee2025learning, title = {Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning}, author = {Lee, Seanie and Kim, Minsu and Cherif, Lynn and Dobre, David and Lee, Juho and Hwang, Sung Ju and Kawaguchi, Kenji and Gidel, Gauthier and Bengio, Yoshua and Malkin, Nikolay and Jain, Moksh}, booktitle = {International Conference on Learning Representations}, year = {2025}, }
ICLR
Action abstractions for amortized sampling

Oussama Boussif, Léna Néhale Ezzine, Joseph D Viviano, and 6 more authors

In International Conference on Learning Representations, 2025

As trajectories sampled by policies used by reinforcement learning (RL) and generative flow networks (GFlowNets) grow longer, credit assignment and exploration become more challenging, and the long planning horizon hinders mode discovery and generalization. The challenge is particularly pronounced in entropy-seeking RL methods, such as generative flow networks, where the agent must learn to sample from a structured distribution and discover multiple high-reward states, each of which takes many steps to reach. To tackle this challenge, we propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process. Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and ‘chunking’ them into a single action that is added to the action space. In empirical evaluation on synthetic and real-world environments, our approach demonstrates improved sample efficiency in discovering diverse high-reward objects, especially on harder exploration problems. We also observe that the abstracted high-order actions are interpretable, capturing the latent structure of the reward landscape of the action space. This work provides a cognitively motivated approach to action abstraction in RL and is the first demonstration of hierarchical planning in amortized sequential sampling.
@inproceedings{boussif2025action, title = {Action abstractions for amortized sampling}, author = {Boussif, Oussama and Ezzine, L{\'e}na N{\'e}hale and Viviano, Joseph D and Koziarski, Micha{\l} and Jain, Moksh and Malkin, Nikolay and Bengio, Emmanuel and Assouel, Rim and Bengio, Yoshua}, booktitle = {International Conference on Learning Representations}, year = {2025}, url = {https://openreview.net/forum?id=ispjankYab}, }

2024

NeurIPS
Amortizing intractable inference in diffusion models for vision, language, and control

Siddarth Venkatraman*, Moksh Jain*, Luca Scimeca*, and 12 more authors

Advances in Neural Information Processing Systems, 2024

@article{venkatraman2024amortizing, title = {Amortizing intractable inference in diffusion models for vision, language, and control}, author = {Venkatraman*, Siddarth and Jain*, Moksh and Scimeca*, Luca and Kim*, Minsu and Sendera*, Marcin and Hasan, Mohsin and Rowe, Luke and Mittal, Sarthak and Lemos, Pablo and Bengio, Emmanuel and Adam, Alexandre and Rector-Brooks, Jarrid and Bengio, Yoshua and Berseth, Glen and Malkin, Nikolay}, journal = {Advances in Neural Information Processing Systems}, volume = {37}, year = {2024}, }
arXiv
Automated Discovery of Pairwise Interactions from Unstructured Data

Zuheng Xu, Moksh Jain, Alisandra Kaye Denton, and 4 more authors

arXiv preprint arXiv:2405.18540, 2024

Causal representation learning provides a suite of methods for inferring latent variables and their causal relationships with identifiability guarantees. These methods are theoretically appealing, but challenging to apply in practice because the underlying assumptions are typically untestable. In this paper we instead focus on testing for dependence between latent variables under very general conditions. We derive two interaction tests—one that tests independence and one that tests mutual exclusivity—that are based on pairwise interventions. We illustrate the value of these tests in the context of biology, where pairwise perturbation experiments are frequently used to reveal interactions that are not observable from any single perturbation. For example, in oncology, researchers seek drug combination therapies that have synergistic effects; and synthetic lethality experiments reveal genetic interactions that cause cell death when pairs of genes are knocked out, but not when either one of the genes is perturbed in isolation. Our tests can be run on unstructured data, such as the pixels in an image, which enables a more general notion of interaction than typical cell viability experiments, and can be run on cheaper experimental assays. We show that test statistics of these independence tests can be used as reward in an active learning algorithm, which enables us to overcome the quadratic experimental costs in finding pairs of perturbations that interact. We evaluate our approach on a real biological experiment where we knocked out 50 pairs of genes and measured the effect with microscopy images. We show that we are able to recover significantly more known biological interactions than random search and standard active learning baselines. In addition to this, our theoretical results give sufficient conditions that show when embeddings of single perturbations can be combined to predict embeddings of pairwise perturbations.
@article{xu2024automated, title = {Automated Discovery of Pairwise Interactions from Unstructured Data}, author = {Xu, Zuheng and Jain, Moksh and Denton, Alisandra Kaye and Whitfield, Shawn and Didolkar, Aniket and Earnshaw, Berton and Hartford, Jason}, journal = {arXiv preprint arXiv:2405.18540}, year = {2024}, }
TMLR
Multi-Fidelity Active Learning with GFlowNets

Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, and 2 more authors

Transactions on Machine Learning Research, 2024

In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design.
@article{hernandez2024multi, title = {Multi-Fidelity Active Learning with GFlowNets}, author = {Hernandez-Garcia, Alex and Saxena, Nikita and Jain, Moksh and Liu, Cheng-Hao and Bengio, Yoshua}, journal = {Transactions on Machine Learning Research}, year = {2024}, }
GemBio@ICLR
Generative Active Learning for the Search of Small-molecule Protein Binders

Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, and 31 more authors

In Generative and Experimental Perspectives for Biomolecular Design (GEMBio) workshop @ ICLR, 2024

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
@inproceedings{korablyov2024generative, title = {Generative Active Learning for the Search of Small-molecule Protein Binders}, author = {Korablyov, Maksym and Liu, Cheng-Hao and Jain, Moksh and van der Sloot, Almer M. and Jolicoeur, Eric and Ruediger, Edward and Nica, Andrei Cristian and Bengio, Emmanuel and Lapchevskyi, Kostiantyn and St-Cyr, Daniel and Alexandra Schuetz, Doris and Butoi, Victor Ion and Rector-Brooks, Jarrid and Blackburn, Simon and Feng, Leo and Nekoei, Hadi and Gottipati, SaiKrishna and Vijayan, Priyesh and Gupta, Prateek and Rampášek, Ladislav and Avancha, Sasikanth and Bacon, Pierre-Luc and Hamilton, William L. and Paige, Brooks and Misra, Sanchit and Jastrzebski, Stanislaw Kamil and Kaul, Bharat and Precup, Doina and Hernández-Lobato, José Miguel and Segler, Marwin and Bronstein, Michael and Marinier, Anne and Tyers, Mike and Bengio, Yoshua}, booktitle = {Generative and Experimental Perspectives for Biomolecular Design (GEMBio) workshop @ ICLR}, year = {2024}, }
GemBio@ICLR
Towards DNA-Encoded Library Generation with GFlowNets

Michał Koziarski, Mohammed Abukalam, Vedant Shah, and 7 more authors

In Generative and Experimental Perspectives for Biomolecular Design (GEMBio) workshop @ ICLR, 2024

DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates.
@inproceedings{malik2023batchgfn, title = {Towards DNA-Encoded Library Generation with GFlowNets}, author = {Koziarski, Michał and Abukalam, Mohammed and Shah, Vedant and Vaillancourt, Louis and Alexandra Schuetz, Doris and Jain, Moksh and van der Sloot, Almer and Bourgey, Mathieu and Marinier, Anne and Bengio, Yoshua}, booktitle = {Generative and Experimental Perspectives for Biomolecular Design (GEMBio) workshop @ ICLR}, year = {2024}, }
ICLR
Amortizing intractable inference in large language models

Edward Hu*, Moksh Jain*, Eric Elmoznino, and 4 more authors

In International Conference on Learning Representations, 2024

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest—including sequence continuation, infilling, and other forms of constrained generation—involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.
@inproceedings{hu2024amortizing, title = {Amortizing intractable inference in large language models}, author = {Hu*, Edward and Jain*, Moksh and Elmoznino, Eric and Kaddar, Younesse and Lajoie, Guillaume and Bengio, Yoshua and Malkin, Nikolay}, booktitle = {International Conference on Learning Representations}, year = {2024}, }
ICLR
Pre-Training and Fine-Tuning Generative Flow Networks

Ling Pan, Moksh Jain, Kanika Madan, and 1 more author

In International Conference on Learning Representations, 2024

Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks remains an important open challenge. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcome, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward function in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
@inproceedings{pan2024pretraining, title = {Pre-Training and Fine-Tuning Generative Flow Networks}, author = {Pan, Ling and Jain, Moksh and Madan, Kanika and Bengio, Yoshua}, booktitle = {International Conference on Learning Representations}, year = {2024}, }
ICLR
PhyloGFN: Phylogenetic Inference with Generative Flow Networks

Ming Yang Zhou, Zichao Yan, Elliot Layne, and 5 more authors

In International Conference on Learning Representations, 2024

Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.
@inproceedings{zhou2024phylogfn, title = {PhyloGFN: Phylogenetic Inference with Generative Flow Networks}, author = {Zhou, Ming Yang and Yan, Zichao and Layne, Elliot and Malkin, Nikolay and Zhang, Dinghuai and Jain, Moksh and Blanchette, Mathieu and Bengio, Yoshua}, booktitle = {International Conference on Learning Representations}, year = {2024}, }

2023

UAI
Stochastic Generative Flow Networks

Ling Pan*, Dinghuai Zhang*, Moksh Jain, and 2 more authors

In Uncertainty in Artificial Intelligence, 2023

Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control". They have shown great potential in generating high-quality and diverse candidates from a given energy landscape. However, existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics, which can limit their applicability. To overcome this challenge, this paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments. By decomposing state transitions into two steps, Stochastic GFlowNets isolate environmental stochasticity and learn a dynamics model to capture it. Extensive experimental results demonstrate that Stochastic GFlowNets offer significant advantages over standard GFlowNets as well as MCMC- and RL-based approaches, on a variety of standard benchmarks with stochastic dynamics.
@inproceedings{pan2023stochastic, title = {Stochastic Generative Flow Networks}, author = {Pan*, Ling and Zhang*, Dinghuai and Jain, Moksh and Huang, Longbo and Bengio, Yoshua}, booktitle = {Uncertainty in Artificial Intelligence}, year = {2023}, }
ICML
Multi-objective GFlowNets

Moksh Jain, Sharath Chandra Raparthy, Alex Hernández-García, and 4 more authors

In International Conference on Machine Learning, 2023

In many applications of machine learning, like drug discovery and material design, the goal is to generate candidates that simultaneously maximize a set of objectives. As these objectives are often conflicting, there is no single candidate that simultaneously maximizes all objectives, but rather a set of Pareto-optimal candidates where one objective cannot be improved without worsening another. Moreover, in practice, these objectives are often under-specified, making the diversity of candidates a key consideration. The existing multi-objective optimization methods focus predominantly on covering the Pareto front, failing to capture diversity in the space of candidates. Motivated by the success of GFlowNets for generation of diverse candidates in a single objective setting, in this paper we consider Multi-Objective GFlowNets (MOGFNs). MOGFNs consist of a novel Conditional GFlowNet which models a family of single-objective sub-problems derived by decomposing the multi-objective optimization problem. Our work is the first to empirically demonstrate conditional GFlowNets. Through a series of experiments on synthetic and benchmark tasks, we empirically demonstrate that MOGFNs outperform existing methods in terms of Hypervolume, R2-distance and candidate diversity. We also demonstrate the effectiveness of MOGFNs over existing methods in active learning settings. Finally, we supplement our empirical results with a careful analysis of each component of MOGFNs.
@inproceedings{jain2023multi, title = {Multi-objective {GFlowNets}}, author = {Jain, Moksh and Raparthy, Sharath Chandra and Hern{\'a}ndez-Garc{\'\i}a, Alex and Rector-Brooks, Jarrid and Bengio, Yoshua and Miret, Santiago and Bengio, Emmanuel}, booktitle = {International Conference on Machine Learning}, pages = {14631--14653}, year = {2023}, }
ICML
GFlowNet-EM for learning compositional latent variable models

Edward J Hu*, Nikolay Malkin*, Moksh Jain, and 3 more authors

In International Conference on Machine Learning, 2023

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.
@inproceedings{hu2023gflownet, title = {GFlowNet-EM for learning compositional latent variable models}, author = {Hu*, Edward J and Malkin*, Nikolay and Jain, Moksh and Everett, Katie E and Graikos, Alexandros and Bengio, Yoshua}, booktitle = {International Conference on Machine Learning}, pages = {13528--13549}, year = {2023}, }
ICML
GFlowOut: Dropout with generative flow networks

Dianbo Liu, Moksh Jain, Bonaventure FP Dossou, and 8 more authors

In International Conference on Machine Learning, 2023

Bayesian inference offers principled tools to tackle many critical problems with modern neural networks, such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference and to estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal, which can be difficult to approximate with standard variational inference, and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data and provide uncertainty estimates which lead to better performance in downstream tasks.
@inproceedings{liu2023gflowout, title = {{GFlowOut}: Dropout with generative flow networks}, author = {Liu, Dianbo and Jain, Moksh and Dossou, Bonaventure FP and Shen, Qianli and Lahlou, Salem and Goyal, Anirudh and Malkin, Nikolay and Emezue, Chris Chinenye and Zhang, Dinghuai and Hassen, Nadhir and others}, booktitle = {International Conference on Machine Learning}, pages = {21715--21729}, year = {2023}, }
ICML
Learning GFlowNets from partial episodes for improved convergence and stability

Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, and 6 more authors

In International Conference on Machine Learning, 2023

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD(λ) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB(λ), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB(λ) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
@inproceedings{madan2023learning, title = {Learning GFlowNets from partial episodes for improved convergence and stability}, author = {Madan, Kanika and Rector-Brooks, Jarrid and Korablyov, Maksym and Bengio, Emmanuel and Jain, Moksh and Nica, Andrei Cristian and Bosc, Tom and Bengio, Yoshua and Malkin, Nikolay}, booktitle = {International Conference on Machine Learning}, pages = {23467--23483}, year = {2023}, }
SPIGM@ICML
BatchGFN: Generative Flow Networks for Batch Active Learning

Shreshth A Malik, Salem Lahlou, Andrew Jesson, and 5 more authors

In Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML, 2023

We introduce BatchGFN – a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks.
@inproceedings{malik2023batchgfo, title = {BatchGFN: Generative Flow Networks for Batch Active Learning}, author = {Malik, Shreshth A and Lahlou, Salem and Jesson, Andrew and Jain, Moksh and Malkin, Nikolay and Deleu, Tristan and Bengio, Yoshua and Gal, Yarin}, booktitle = {Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML}, year = {2023}, }
SPIGM@ICML
Thompson Sampling for Improved Exploration in GFlowNets

Jarrid Rector-Brooks, Kanika Madan, Moksh Jain, and 5 more authors

In Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML, 2023

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
@inproceedings{rectorbrooks2023thompson, title = {Thompson Sampling for Improved Exploration in GFlowNets}, author = {Rector-Brooks, Jarrid and Madan, Kanika and Jain, Moksh and Korablyov, Maksym and Liu, Cheng-Hao and Chandar, Sarath and Malkin, Nikolay and Bengio, Yoshua}, booktitle = {Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML}, year = {2023}, }
DD
GFlowNets for AI-driven scientific discovery

Moksh Jain, Tristan Deleu, Jason Hartford, and 3 more authors

Digital Discovery, 2023

Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key challenge for current machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating reducible (epistemic) uncertainty and generating sets of diverse and informative experiments to perform. This motivated a new probabilistic machine learning framework called GFlowNets, which can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. GFlowNets learn to sample from a distribution given indirectly by a reward function corresponding to an unnormalized probability, which enables sampling diverse, high-reward candidates. GFlowNets can also be used to form efficient and amortized Bayesian posterior estimators for causal models conditioned on the already acquired experimental data. Having such posterior models can then provide estimators of epistemic uncertainty and information gain that can drive an experimental design policy. Altogether, here we will argue that GFlowNets can become a valuable tool for AI-driven scientific discovery, especially in scenarios of very large candidate spaces where we have access to cheap but inaccurate measurements or to expensive but accurate measurements. This is a common setting in the context of drug and material discovery, which we use as examples throughout the paper.
@article{jain2023gflownets, title = {GFlowNets for AI-driven scientific discovery}, author = {Jain, Moksh and Deleu, Tristan and Hartford, Jason and Liu, Cheng-Hao and Hernandez-Garcia, Alex and Bengio, Yoshua}, journal = {Digital Discovery}, volume = {2}, number = {3}, pages = {557--577}, year = {2023}, publisher = {Royal Society of Chemistry}, }
TMLR
DEUP: Direct Epistemic Uncertainty Prediction

Salem Lahlou*, Moksh Jain*, Hadi Nekoei, and 5 more authors

Transactions on Machine Learning Research, 2023

Epistemic uncertainty is a measure of the lack of knowledge of a learner which diminishes with more evidence. While existing work focuses on using the variance of the Bayesian posterior due to parameter uncertainty as a measure of epistemic uncertainty, we argue that this does not capture the part of lack of knowledge induced by model misspecification. We discuss how the excess risk, which is the gap between the generalization error of a predictor and the Bayes predictor, is a sound measure of epistemic uncertainty which captures the effect of model misspecification. We thus propose a principled framework for directly estimating the excess risk by learning a secondary predictor for the generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability. We discuss the merits of this novel measure of epistemic uncertainty, and highlight how it differs from variance-based measures of epistemic uncertainty and addresses their major pitfall. Our framework, Direct Epistemic Uncertainty Prediction (DEUP), is particularly interesting in interactive learning environments, where the learner is allowed to acquire novel examples in each round. Through a wide set of experiments, we illustrate how existing methods in sequential model optimization can be improved with epistemic uncertainty estimates from DEUP, and how DEUP can be used to drive exploration in reinforcement learning. We also evaluate the quality of uncertainty estimates from DEUP for probabilistic image classification and predicting synergies of drug combinations.
@article{lahlou2023deup, title = {DEUP: Direct Epistemic Uncertainty Prediction}, author = {Lahlou*, Salem and Jain*, Moksh and Nekoei, Hadi and Butoi, Victor I and Bertin, Paul and Rector-Brooks, Jarrid and Korablyov, Maksym and Bengio, Yoshua}, journal = {Transactions on Machine Learning Research}, year = {2023}, }

2022

NeurIPS
Trajectory balance: Improved credit assignment in GFlowNets

Nikolay Malkin, Moksh Jain, Emmanuel Bengio, and 2 more authors

Advances in Neural Information Processing Systems, 2022

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
@article{malkin2022trajectory, title = {Trajectory balance: Improved credit assignment in {GFlowNets}}, author = {Malkin, Nikolay and Jain, Moksh and Bengio, Emmanuel and Sun, Chen and Bengio, Yoshua}, journal = {Advances in Neural Information Processing Systems}, volume = {35}, pages = {5955--5967}, year = {2022}, }
HILL@NeurIPS
Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions

Chanakya Ekbote, Moksh Jain, Payel Das, and 1 more author

In Workshop on Human in the Loop Learning @ NeurIPS, 2022

Generative Flow Networks (GFlowNets) have demonstrated significant performance improvements for generating diverse discrete objects x given a reward function R(x), indicating the utility of the object and trained independently from the GFlowNet by supervised learning to predict a desirable property y given x. We hypothesize that this can lead to incompatibility between the inductive optimization biases in training R and in training the GFlowNet, potentially leading to worse samples and slow adaptation to changes in the distribution. In this work, we build upon recent work on jointly learning energy-based models with GFlowNets and extend it to learn the joint over multiple variables, such as peptide sequences and their antimicrobial activity, which we call Joint Energy-Based GFlowNets (JEBGFNs). Joint learning of the energy-based model, used as a reward for the GFlowNet, can resolve the issues of incompatibility since both the reward function and the GFlowNet sampler are trained jointly. We find that this joint training or joint energy-based formulation leads to significant improvements in generating anti-microbial peptides. As the training sequences arose out of evolutionary or artificial selection for high antibiotic activity, there is presumably some structure in the distribution of sequences that reveals information about the antibiotic activity. This results in an advantage to modeling their joint generatively vs. pure discriminative modeling. We also evaluate JEBGFN in an active learning setting for discovering anti-microbial peptides.
@inproceedings{ekbote2022consistent, title = {Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions}, author = {Ekbote, Chanakya and Jain, Moksh and Das, Payel and Bengio, Yoshua}, booktitle = {Workshop on Human in the Loop Learning @ NeurIPS}, year = {2022}, }
ICML
Biological Sequence Design with GFlowNets

Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, and 8 more authors

In International Conference on Machine Learning, 2022

Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates, in addition to a reward function, to speed up learning in GFlowNets. We present empirical results on several biological sequence design tasks, and we find that our method generates more diverse and novel batches with high scoring candidates compared to existing approaches.
@inproceedings{jain2022biological, title = {Biological Sequence Design with GFlowNets}, author = {Jain, Moksh and Bengio, Emmanuel and Hernandez-Garcia, Alex and Rector-Brooks, Jarrid and Dossou, Bonaventure FP and Ekbote, Chanakya Ajit and Fu, Jie and Zhang, Tianyu and Kilgour, Michael and Zhang, Dinghuai and others}, booktitle = {International Conference on Machine Learning}, pages = {9786--9801}, year = {2022}, }
MLDD@ICLR
Evaluating Generalization in GFlowNets for Molecule Design

Andrei Cristian Nica, Moksh Jain, Emmanuel Bengio, and 4 more authors

In Machine Learning for Drug Discovery workshop @ ICLR, 2022

Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly and time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.
@inproceedings{nica2022evaluating, title = {Evaluating Generalization in GFlowNets for Molecule Design}, author = {Nica, Andrei Cristian and Jain, Moksh and Bengio, Emmanuel and Liu, Cheng-Hao and Korablyov, Maksym and Bronstein, Michael M and Bengio, Yoshua}, booktitle = {Machine Learning for Drug Discovery workshop @ ICLR}, year = {2022}, }

2021

NeurIPS
Flow network based generative models for non-iterative diverse candidate generation

Emmanuel Bengio, Moksh Jain, Maksym Korablyov, and 2 more authors

Advances in Neural Information Processing Systems, 2021

This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where there are many modes to the reward function, and on a molecule synthesis task.
@article{bengio2021flow, title = {Flow network based generative models for non-iterative diverse candidate generation}, author = {Bengio, Emmanuel and Jain, Moksh and Korablyov, Maksym and Precup, Doina and Bengio, Yoshua}, journal = {Advances in Neural Information Processing Systems}, volume = {34}, pages = {27381--27394}, year = {2021}, }

2020

ICML
DROCC: Deep robust one-class classification

Sachin Goyal, Aditi Raghunathan, Moksh Jain, and 2 more authors

In International Conference on Machine Learning, 2020

Classical approaches for one-class problems such as one-class SVM and isolation forest require careful feature engineering when applied to structured domains like images. State-of-the-art methods aim to leverage deep learning to learn appropriate features via two main approaches. The first approach, based on predicting transformations (Golan & El-Yaniv, 2018; Hendrycks et al., 2019a), while successful in some domains, crucially depends on an appropriate domain-specific set of transformations that are hard to obtain in general. The second approach, minimizing a classical one-class loss on the learned final layer representations, e.g., DeepSVDD (Ruff et al., 2018), suffers from the fundamental drawback of representation collapse. In this work, we propose Deep Robust One Class Classification (DROCC) that is both applicable to most standard domains without requiring any side-information and robust to representation collapse. DROCC is based on the assumption that the points from the class of interest lie on a well-sampled, locally linear low-dimensional manifold. Empirical evaluation demonstrates that DROCC is highly effective in two different one-class problem settings and on a range of real-world datasets across different domains: tabular data, images (CIFAR and ImageNet), audio, and time-series, offering up to 20% increase in accuracy over the state-of-the-art in anomaly detection. Code is available at https://github.com/microsoft/EdgeML
@inproceedings{goyal2020drocc, title = {DROCC: Deep robust one-class classification}, author = {Goyal, Sachin and Raghunathan, Aditi and Jain, Moksh and Simhadri, Harsha Vardhan and Jain, Prateek}, booktitle = {International Conference on Machine Learning}, pages = {3711--3721}, year = {2020}, }

2019

SGO@NeurIPS
Proximal Policy Optimization for Improved Convergence in IRGAN

Moksh Jain and Sowmya Kamath

Smooth Games Optimization and Machine Learning, NeurIPS 2019

IRGAN is an information retrieval (IR) modeling approach that uses a theoretical minimax game between a generative and a discriminative model to iteratively optimize both of them, hence unifying the generative and discriminative approaches. Despite significant performance improvements in several information retrieval tasks, IRGAN training is an unstable process, and the solution varies widely with the random parameter initialization. In this work, we present an improved training objective based on the proximal policy optimization objective and Gumbel-Softmax based sampling for the generator. We also propose a modified training algorithm which takes a single gradient update on both the generator and the discriminator for each iteration step. We present empirical evidence of the improved convergence of the proposed model over the original IRGAN, and discuss a comparison on three different IR tasks on benchmark datasets, emphasizing the proposed model’s superior performance.
@article{jain2019proximal, title = {Proximal Policy Optimization for Improved Convergence in IRGAN}, author = {Jain, Moksh and Kamath, Sowmya}, journal = {Smooth Games Optimization and Machine Learning, NeurIPS 2019}, year = {2019}, }