The Mathematical Sciences Colloquium invites speakers from all areas of mathematics and are open to all members of the RPI community.

### 2024

### 2023

Ever since Joseph Fourier used sines and cosines to diagonalize the Laplacian and solve the heat equation in 1822, spectral decompositions of linear operators have empowered scientists to disentangle complex physical phenomena. However, the spectrum of a self-adjoint operator can be more sophisticated than its familiar matrix counterpart; it may contain a continuum and the operator may not be diagonalized by eigenvectors alone. Until now, computational techniques for operators with continuous spectrum have typically focused on narrow classes with known analytical structure. Algorithms that achieve generality and rigor have been scarce.

In this talk, we present a tool kit of algorithms for computing spectral properties associated with the continuous spectrum of self-adjoint operators. These algorithms use the resolvent to construct high-order accurate approximations of inherently infinite-dimensional spectral properties, including smooth spectral measures, projections onto absolutely continuous subspaces, and non-normalizable modes. The algorithms are embarrassingly parallelizable and capable of leveraging state-of-the-art software for the resolvent of differential, integral, and lattice operators. Their flexibility and power are illustrated with applications in quantum and condensed matter physics and we highlight several exciting new developments in the growing class of resolvent-based algorithms for modal analysis in infinite-dimensional spaces.

Traditional numerical methods based on expensive matrix factorizations struggle with the scale of modern scientific applications. For example, kernel-based algorithms take a data set of size N, form the kernel matrix of size N x N, and then perform an eigendecomposition or inversion at a cost of O(N^3) operations. For data sets of size N >= 10^5, kernel learning is too expensive, straining the limits of personal workstations and even dedicated computing clusters. Randomized iterative methods have emerged as faster alternatives to the classical approaches. These algorithms combine random exploration of matrix structures with information about which structures are most important, leading to significant speed-ups.

In this talk, I review recent developments concerning two randomized algorithms. The first is "randomized block Krylov iteration", which uses an array of random Gaussian test vectors to probe a large data matrix in order to provide a randomized principal component analysis. Remarkably, this approach works well even when the matrix of interest is not low-rank. The second algorithm is "randomly pivoted Cholesky decomposition", which iteratively samples columns from a positive semidefinite matrix using a novelty metric and reconstructs the matrix from the randomly selected columns. Ultimately, both algorithms furnish a randomized approximation of an N x N matrix with a reduced rank k << N, which enables fast inversion or singular value decomposition at a cost of O(N k^2) operations. The speed-up factor from N^3 to N k^2 operations can be 3 million. The newest algorithms achieve this speed-up factor while guaranteeing performance across a broad range of input matrices.

Nonsmooth, nonconvex-nonconcave minimax optimization has gained widespread interest in recent years in machine learning and data science. However, much of the existing work focuses on a variant of the gradient descent-ascent (GDA) algorithm, which exhibits several limitations: (1) They are only applicable to smooth problems, where gradient information is available. (2) They may not converge globally, and they may suffer from limit cycles. (3) No single algorithm can be universally applied to all minimax optimization problems, such as nonconvex-concave and convex-nonconcave minimax settings.

In this talk, we will tackle these challenges for a class of structured nonsmooth, nonconvex-nonconcave minimax optimization problems. The main difficulty in algorithmic design for minimax optimization is balancing the primal decrease and dual increase in an efficient way. To overcome this difficulty, we develop a novel primal-dual error bound that serves as the foundation for both convergence analysis and algorithmic design. This gives rise to a new algorithm design principle known as optimal primal-dual balancing. Following this principle, we develop a single algorithm, doubly smoothed (prox-linear)/ gradient descent ascent, which universally works for all minimax optimization problems. Our algorithm finds an $\epsilon$-stationary point in $O(epsilon^-4)$ iterations. If additional regularity is assumed (weaker than standard assumptions imposed in the literature), we obtain sharper, even optimal, iteration complexity results. We showcase the effectiveness of our algorithm in getting rid of limit cycles in challenging nonconvex-nonconcave minimax optimization problems.

Time-harmonic scattering in variable media can often be modeled by linear elliptic partial differential equations. Such equations pose several numerical challenges. For example, they can be intrinsically ill-conditioned, necessitate the imposition of radiation conditions, and produce pollution errors when discretized with standard finite difference or finite element methods.

To avoid these issues, it is often convenient to first reformulate the differential equation as an integral equation. The tradeoffs are that an integral operator with a singular kernel must be discretized and that the resulting linear system that must be inverted is dense. Sometimes, the latter issue can be handled using a fast matrix-vector product algorithm (e.g., the fast Fourier transform or the fast multipole method) paired with an iterative solver (e.g., GMRES). However, this approach can be prohibitively slow when there is a large amount of backscattering and when multiple incident fields are of interest. In these cases, it is better to use direct solvers.

In this talk, I describe some recent projects on developing direct solvers in this regime with applications to acoustic scattering and metasurface design.

Solving real-world stochastic optimization problems presents two key challenges: the messiness of real-world data, which can be noisy, biased, or corrupted due to factors like outliers, distribution shifts, and even adversarial attacks; and the laborious, time-intensive requirement of manually tuning step sizes in many existing algorithms.

I will introduce a simple adaptive optimization framework that avoids the need for manual step size tuning by adaptively adjusting the step size in each iteration based on the progress of the algorithm. To address the issue of messy data, the algorithm only assumes access to function-related information through probabilistic oracles, which may be biased and corrupted.

This framework is very general, encompassing a wide range of algorithms for both unconstrained and constrained optimization. It is applicable to multiple problem settings, such as expected loss minimization in machine learning, simulation optimization, and derivative-free optimization. Uder reasonable conditions on the oracles, we provide a meta-theorem to bound the sample complexity for any algorithm in the framework.

Optimization problems involving uncertainty are central to various domains, including modern statistics and data science. Despite their ubiquity, finding efficient algorithms which provide a solution to these problems remains a major challenge. Interestingly, many random optimization problems share a common feature, dubbed as a statistical-computational gap: while the optimal value can be pinpointed non-constructively, known polynomial-time algorithms find strictly sub-optimal solutions and no better algorithm is known.

In this talk, I will discuss a theoretical framework for understanding the fundamental computational limits of such problems. This framework is based on the Overlap Gap Property (OGP), an intricate geometrical property which is known to be a barrier for large classes of algorithms. I will first focus on the symmetric binary perceptron (SBP), a model for classifying/storing random patterns as well as a random constraint satisfaction problem, widely studied in probability, computer science, and statistical physics. I will show how the OGP framework yields nearly sharp lower bounds against the classes of stable algorithms and online algorithms for the SBP. These classes capture the best known algorithms for the SBP and many other random models. I will then demonstrate how the same framework extends to other models, including the random number balancing problem, which is closely related to the design of randomized controlled trials, as well as its planted counterpart.

Our results for online algorithms are the first essentially tight unconditional lower bounds in the online setting, and our results for stable algorithms are the first applications of Ramsey Theory in the study of the limits of algorithms. More importantly, our techniques yield new toolkits for studying statistical-computational gaps in other random models.

Abstract:** **There is growing interest in solving large-scale statistical machine learning problems over decentralized networks, where data are distributed across the nodes of the network and no centralized coordination is present (we termed these systems as “mesh” networks). Inference from massive datasets poses a fundamental challenge at the nexus of the computational and statistical sciences: ensuring the quality of statistical inference when computational resources, like time and communication, are constrained. While statistical-computation tradeoffs have been largely explored in the centralized setting, our understanding over mesh networks is limited: (i) distributed schemes, designed and performing well in the classical low-dimensional regime, can break down in the high-dimensional case; and (ii) existing convergence studies may fail to predict algorithmic behaviors, with some findings directly contradicted by empirical tests. This is mainly due to the fact that the majority of distributed algorithms have been designed and studied only from the optimization perspective, lacking the statistical dimension. This talk will discuss some vignettes from high-dimensional statistical inference suggesting new analyses (and designs) aiming at bringing statistical thinking in distributed optimization.

**Bio:** Gesualdo Scutari is a Professor with the School of Industrial Engineering and Electrical and Computer Engineering (by courtesy) at Purdue University, West Lafayette, IN, USA, and he is a Purdue Faculty Scholar. His research interests include continuous optimization, equilibrium programming, and their applications to signal processing and statistical learning. Among others, he was a recipient of the 2013 NSF CAREER Award, the 2015 IEEE Signal Processing Society Young Author Best Paper Award, and the 2020 IEEE Signal Processing Society Best Paper Award. He serves as an IEEE Signal Processing Distinguish Lecturer (2023-2024). He served on the editorial broad of several IEEE journals and he is currently an Associate Editor of SIAM Journal on Optimization. He is IEEE Fellow.

Abstract: Sparse coding is a technique of representing data as a sparse linear combination of a set of vectors. This representation facilitates computation and analysis of high-dimensional data that is prevalent in many applications. We study sparse coding in the setting where the set of vectors define a unique Delaunay triangulation. We propose a weighted l1 regularizer and show that it provably yields a sparse solution. Further, we show stability of sparse codes using the Cayley-Menger determinant. We make connections to dictionary learning, manifold learning and computational geometry. We discuss an optimization algorithm to learn the sparse codes and optimal set of vectors given a set of data points. Finally, we show numerical experiments to show that the resulting sparse representations give competitive performance in the problem of clustering.

Abstract: We first review the problem of the curse of dimensionality when approximating multi-dimensional functions. Several approximation results from Barron, Petrushev, Bach, and etc . will be explained. Then we present a deterministic approach using the Kolmogorov superposition theorem. Mainly I will show that the rate of convergence by using the Kolmogorov superposition theorem is O(d^2/n) for a class of functions which is K-Lipschitz continuous, where d is the dimensionality and n is the number of neurons. Moreover, I will establish the rate of convergence of a neural network computation based on the two layers for any continuous function. In addition, I will introduce the neural network approximation based on higher order ReLU functions to explain the powerful approximation of multivariate functions using deep learning algorithms with multiple layers. Finally I will use our theory to explain why the deep learning algorithm works for image classification.

Neuronal network connectivity demonstrates sparsity on multiple spatial scales and natural stimuli also possess sparse representations in numerous domains. In this talk, we underline the role of sparsity in the efficient encoding of network connectivity and inputs through nonlinear neuronal network dynamics. Addressing the fundamental challenge of recovering the structural connectivity of large-scale neuronal networks, we leverage properties of the balanced dynamical regime and compressive sensing theory to develop a theoretical framework for efficiently reconstructing sparse network connections through measurements of the network response to a relatively small ensemble of random stimuli. We further utilize sparse recovery ideas to probe the neural correlates of binocular rivalry through dynamic percept reconstructions based on the activity of a two-layer network model with competing downstream pools driven by disparate image stimuli. The resultant model dynamics agree with key experimental observations and give insights into the excitation/inhibition hypothesis for autism.

Embedded/immersed/unfitted boundary methods obviate the need for continual re-meshing in many applications involving rapid prototyping and design. Unfortunately, many finite element embedded boundary methods (cutFEM, Finite Cell Method, etc. ) are also difficult to implement due to: (a) the need to perform complex cell cutting operations at boundaries, (b) the necessity of specialized quadrature formulas, and (c) the consequences that these operations may have on the overall conditioning/stability of the ensuing algebraic problems. We present a new, stable, and simple embedded boundary method, named “Shifted Boundary Method” (SBM), which eliminates the need to perform cell cutting. Boundary conditions are imposed on a surrogate discrete boundary, lying on the interior of the true boundary interface. We then construct appropriate field extension operators by way of Taylor expansions, with the purpose of preserving accuracy when imposing the boundary conditions. We demonstrate the SBM in applications involving solid and fracture mechanics; thermomechanics; CFD and porous media flow problems. In the specific case of fracture mechanics, the SBM takes the name of Shifted Fracture Method (SFM), which can be thought of a method with the data structure of classical cohesive fracture FEM but with the accuracy and meshindependence properties of XFEM. We show how the SFM has superior accuracy in capturing the energy released during the fracture process and how the method can be combined with phase-field approaches to simulate crack branching and merging.

Abstract: Multi-agent machine learning has seen tremendous achievements in recent years; yet, translation of single-agent optimization technique to multi-agent domain may not be straightforward. Two of the basic models for multi-agent machine learning -- minimax optimization problem and variational inequality problem -- are both computationally intractable in general. However, the gains from leveraging the special structures can be huge and understanding the optimal structure-driven algorithm is important from both theoretical and practical viewpoints. In this talk, I will provide the results on the optimal structure-driven algorithm design for (1) convex-concave and highly unbalanced minimax optimization problems and (2) monotone and highly smooth variational inequality problems. In particular, I explain why the accelerated proximal point scheme and the adaptive closed-loop scheme perfectly fit the unbalanced structure and the highly smooth structure, respectively, leading to optimal acceleration in our problem of interest.

Abstract:

Strongly correlated quantum systems include some of the most challenging problems in science. I will present the first numerical analysis for the coupled cluster method tailored by matrix product states, which is a promising method for handling strongly correlated systems. I will then discuss recent applications of the coupled cluster method and matrix product states for solving the magic angle twisted bilayer graphene system at the level of interacting electrons.

Abstract:

Submodularity is an important concept in integer and combinatorial optimization. Submodular set functions capture diminishing returns, and for this desirable property, they are found in numerous real-world applications. A submodular set function models the effect of selecting items from a single ground set, and whether an item is chosen can be modeled using a binary variable. However, in many problem contexts, the decisions consist of choosing multiple copies of homogenous items, or heterogenous items from multiple sets. These scenarios give rise to generalizations of submodularity that require mixed-integer variables or multi-set functions. We call the associated optimization problems Generalized Submodular Optimization (GSO). GSO arises in numerous applications, including infrastructure design, healthcare, online marketing, and machine learning. Due to the mixed-integer decision space and the often highly nonlinear (even non-convex and non-concave) objective function, GSO is a broad subclass of challenging mixed-integer nonlinear programming problems. In this talk, we will consider two subclasses of GSO, namely k-submodular optimization and Diminishing Returns (DR)-submodular optimization. We will discuss the polyhedral theory for the mixed-integer set structures that arise from these problem classes, which leads to exact solution methods. Our algorithms demonstrate effectiveness and versatility in experiments with real world datasets, and they can be readily incorporated into the state-of-the-art optimization solvers.

Abstract: We present a class of information metric optimization methods for high-dimensional Bayesian sampling problems. First, two examples of information geometries in probability spaces, such as the Fisher-Rao and the Wasserstein-2 spaces, are studied. Then, focusing on the Wasserstein-2 metric, we introduce accelerated gradient and Newton flows to design fast and efficient sampling algorithms. We also present practical sampling approximations for the proposed dynamics in high-dimensional sample spaces. They contain optimal transport natural gradient methods, projected Wasserstein gradient methods, and convex neural network approximation of the Wasserstein gradient. Finally, numerical experiments, including PDE-constrained Bayesian inferences and parameter estimations in COVID-19 modeling, demonstrate the effectiveness of the proposed optimization-oriented sampling algorithms.

Abstract: In this talk, I will present a framework for analyzing dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and momentum (SGD+M)) when both the number of samples and dimensions are large. For the analysis, I will introduce a stochastic differential equation, called homogenized SGD. We show that homogenized SGD is the high-dimensional equivalent of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples n and number of features d are polynomially related. By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the training dynamics and generalization performance of SGD in terms of a solution of a Volterra integral equation. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak form of delocalization of sample-side singular vectors of the data. By analyzing these limiting dynamics, we can provide insights into learning rate, momentum parameter, and batch size selection. For instance, we identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of $O(1/ \kappa)$, matching optimal full-batch momentum (in particular performing as well as a full-batch but with a fraction of the size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance. Finally we show this model matches performances on real data sets.

### 2022

Abstract:

Materials' electronic properties arise from the complex dynamics of electrons flowing through the material. These dynamics are quantum mechanical and realize many surprising phenomena without classical analogues. I will present analytical and numerical work clarifying these dynamics in three novel materials which have attracted intense theoretical and experimental attention in recent years: graphene, the first ``2D'' material, whose electronic properties can be captured by an effective Dirac equation; topological insulators, whose edges host surprising one-way edge currents; and twisted bilayer graphene, an aperiodic material whose properties can be captured by an effective system of Dirac equations with periodic coefficients. I will then present ongoing and future work focused on further clarifying the remarkable properties of twisted bilayer graphene, which was recently shown to superconduct when twisted to the ``magic'' twist angle 1 degree.

Abstract: Many statistical machine learning problems, where one aims to recover an underlying low-dimensional signal, are based on optimization. Existing work often either overlooked the computational complexity in solving the optimization problem, or requires case-specific algorithm and analysis -- especially for nonconvex problems. This talk addresses the above two issues from a unified perspective of conditioning. In particular, we show that once the sample size exceeds the intrinsic dimension, (1) a broad class of convex and nonsmooth nonconvex problems are well-conditioned, (2) well conditioning, in turn, ensures the efficiency of out-of-box optimization methods and inspires new algorithms. Lastly, we show that a conditioning notion called flatness leads to accurate recovery in overparametrized matrix factorization models.

Abstract: The perspectives and opinions of people change and spread through social interactions on a daily basis. In the study of opinion dynamics on social networks, one often models social entities (such as twitter accounts) as nodes and their relationships (such as followship) as edges, and examines how opinions evolve as dynamical processes on networks, including graphs, hypergraphs, multi-layer networks, etc. In the first part of my talk, I will introduce a model of opinion dynamics and derive its mean-field limit as the total number of agents goes to infinity. The mean-field opinion density satisfies a kinetic equation of Kac type. We prove properties of the solution of this equation, including nonnegativity, conservativity, and steady-state convergence. The parameters of such opinion models play a nontrivial role in shaping the dynamics and can also be in the form of functions. In reality, it is often impractical to measure these parameters directly. In the second part of the talk, I will approach the problem from an `inverse’ perspective and present how to infer the parameters from limited partial observations. I will provide sufficient conditions of measurement for two scenarios, such that one is able to identify the parameters uniquely. I will also provide a numerical algorithm of the inference when the data set only has a limited number of data points.

Abstract:

Solving partial differential equations (PDEs) and PDE-based model reduction are challenging problems, particularly when PDEs have multiscale features. The data-driven approach has become an excellent option for some scientific computing problems. It becomes even more effective for some engineering applications with available data. There are various data-driven treatments for PDE-related problems. Many of them can be implemented in the operator learning framework as the underlying mathematical computation problems construct the operator. I will focus on and discuss operator learning. In particular, I will introduce a new framework: basis enhanced learning (Bel). Bel does not require a specific discretization of functions and achieves great prediction accuracy. Some applications, including some newly proposed engineering applications, will be discussed.

**Abstract:** The remarkable success of first-order methods on huge-scale problems has mostly been in the realm of problems where the objective function or its gradient satisfy a Lipschitz property. Here we consider important convex optimization problems where neither the objective nor its gradient are Lipschitz, and typically blows up on part of the (relative) boundary of the feasible region. Such problems appear in a wide range of applications across numerous fields, including statistical machine learning, medical imaging, experimental design, quantum physics, etc. Unfortunately, the vast majority of existing first-order methods cannot be applied to solve these problems due to their seemingly pathological behavior. In this talk we present new structures underlying these problems that lead to two new methods (the Multiplicative- Gradient method and a version of the Frank-Wolfe method) that successfully exploit these structures – both theoretically and computationally. Our theory shows that these two methods have simple and elegant computational guarantees, and our numerical experiments demonstrate the rather remarkable efficiency and efficacy of these methods in practice.

**Abstract:**

Modern data science applications require solving high dimensional optimization problems with large number of data points. Min-max optimization provides a unified framework for many problems in this context ranging from empirical risk minimization in machine learning to medical imaging and nonlinear programming. This talk will present two approaches for using randomization to design simple, practical and adaptive optimization algorithms that improve the best-known complexity guarantees for convex-concave optimization. I will describe first order primal-dual algorithms with random coordinate updates and discuss their complexity guarantees as well as practical adaptivity properties. I will then present an extragradient algorithm with stochastic variance reduction that harnesses the finite-sum min-max structure to obtain sharp complexity bounds.

Abstract: Recent years have witnessed tremendous progress in developing and analyzing quantum computing algorithms for quantum dynamics simulation of bounded operators (Hamiltonian simulation). However, many scientific and engineering problems require the efficient treatment of unbounded operators, which frequently arise due to the discretization of differential operators. Such applications include molecular dynamics, electronic structure theory, quantum control and quantum machine learning. We will introduce some recent advances in quantum algorithms for efficient unbounded Hamiltonian simulation, including Trotter type splitting and the quantum highly oscillatory protocol (qHOP) in the interaction picture. The latter yields a surprising superconvergence result for regular potentials. In the end, I will discuss briefly how Hamiltonian simulation techniques can be applied to a quantum learning task achieving optimal scaling. (The talk does not assume a priori knowledge on quantum computing.)

Abstract: Natural selection in complex biological and social systems can simultaneously operate across multiple levels of organization, ranging from genes and cells to animal groups and complex human societies. Such scenarios typically present an evolutionary conflict between the incentive of individuals to cheat and the collective incentive to establish cooperation within a group. In this talk, we will explore this conflict by considering a game-theoretic model of multilevel selection in a group-structured population featuring two types of individuals, cooperators and defectors. Assuming that individuals compete based on their payoff and groups also compete based on the average payoff of group members, we consider how the distribution of cooperators within groups changes over time depending on the relative strength of within-group and between-group competition. In the limit of infinitely many groups and of infinite group size, we can describe the state of the population through the probability density of the fraction of cooperators within groups, and derive a hyperbolic partial differential equation for the changing composition of the population. We show that there exists a threshold relative selection strength such that defectors will take over the population for sufficiently weak between-group competition, while cooperation persists in the long-time population when the strength of between-group competition exceeds the threshold. Surprisingly, when groups are best off with an intermediate level of within-group cooperation, individual-level selection casts a long shadow on the dynamics of multilevel selection: no level of between-group competition can erase the effects of the individual incentive to defect. This is joint work with Yoichiro Mori.

Abstract:

Machine learning technologies have been increasingly used in high-stakes decision making systems,

which raises a new challenge of avoiding unfair and discriminatory decisions for protected classes.

Among the techniques for improving the fairness of AI systems, the optimization-based method, which

trains a model through optimizing its prediction performance subject to fairness constraints, is most

popular because of its intuitive idea and the Pareto efficiency it provides when trading off prediction

performance against fairness. The criteria of fairness based on the area under the ROC curve (AUC) are

emerging recently because they are threshold-agnostic and effective for unbalanced data. In this work,

we formulate the training of a machine learning model under the AUC-based fairness constraints into a

min-max optimization problem with min-max constraints. Based on this formulation, we develop

stochastic first-order methods for learning predictive models with a balance between accuracy and

fairness. We present the theoretical complexity of the obtained methods and numerically demonstrate

their effectiveness on real-world data under different fairness metrics.