The Mathematical Sciences Colloquium series is held each semester, generally on Mondays at 4pm, and is sponsored by the math department. Faculty in the math department invite speakers from all areas of mathematics, and the talks are open to all members of the RPI community. The calendar is organized by the colloquium chair Yangyang Xu.

### 2021

Abstract: **Reinforcement Learning (RL)** has attracted considerable interest from both industry and academia recently. The study of RL algorithms with provable rates of convergence, however, is still in its infancy. In this talk, we discuss some recent progress on the solutions of two fundamental RL problems, i.e., stochastic policy evaluation and policy optimization. For **policy evaluation**, prior investigations in the literature focused on temporal difference (TD) learning, employing nonsmooth finite-time analyses motivated by stochastic subgradient descent that lead to certain limitations. These include the requirement of analyzing a modified TD algorithm that involves projection onto an a-priori defined Euclidean ball, a non-optimal convergence rate, and no clear way of deriving the beneficial effects of parallel implementation. We address all these issues by developing a novel analysis of TD, and by introducing new algorithms, including the conditional TD (CTD) and fast TD (FTD) algorithms, which achieve the best convergence rate known so far for policy evaluation...
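As background for the policy-evaluation part, the classical tabular TD(0) iteration can be sketched in a few lines. This is the textbook algorithm, not the CTD/FTD variants from the talk; the two-state chain and all names and parameters below are illustrative.

```python
import random

def td0_policy_evaluation(n_states, transitions, rewards, gamma=0.9,
                          alpha=0.05, n_steps=20000, seed=0):
    """Tabular TD(0): estimate the value function V of a fixed policy.

    transitions[s] lists (next_state, probability) pairs under the policy;
    rewards[s] is the one-step reward received in state s.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    s = 0
    for _ in range(n_steps):
        nxt, probs = zip(*transitions[s])
        s2 = rng.choices(nxt, weights=probs)[0]
        # TD(0) update: move V[s] toward the bootstrapped target r + gamma*V[s'].
        V[s] += alpha * (rewards[s] + gamma * V[s2] - V[s])
        s = s2
    return V

# Two-state chain with uniform transitions; solving V = r + gamma*P*V
# by hand gives exact values V = (5.5, 4.5).
trans = [[(0, 0.5), (1, 0.5)], [(0, 0.5), (1, 0.5)]]
V = td0_policy_evaluation(2, trans, rewards=[1.0, 0.0])
```

With a constant step size the iterates hover near the exact values; the convergence-rate questions in the talk concern precisely how fast this noise and bias shrink under properly chosen step sizes.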

Abstract: Operator splitting methods for convex optimization and monotone inclusions have their roots in the solution of partial differential equations, and have since become popular in machine learning and image processing applications. Their application to "operations-research-style" optimization problems has been somewhat limited.

A notable exception is their application to stochastic programming. In a paper published in 1991, Rockafellar and Wets proposed the progressive hedging (PH) algorithm to solve large-scale convex stochastic programming problems. Although they proved the convergence of the method from first principles, it was already known to them that PH was an operator splitting method.

This talk will present a framework for convex stochastic programming and show that applying the ADMM (and thus Douglas-Rachford splitting) to it yields the PH algorithm. The equivalence of PH to ADMM has long been known but not explicitly published.
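For readers unfamiliar with PH, one iteration alternates scenario subproblem solves, an averaging step, and a dual update. Here is a minimal sketch on a toy problem with quadratic per-scenario objectives; the objectives, names, and parameters are illustrative choices that make the subproblems solvable in closed form, not the stochastic programs from the paper.

```python
def progressive_hedging(costs, probs, r=1.0, iters=100):
    """Progressive hedging on a toy problem: scenario s has objective
    f_s(x) = 0.5*(x - costs[s])**2, and all scenarios must agree on x
    (nonanticipativity).  The consensus solution is then the
    probability-weighted mean of `costs`."""
    n = len(costs)
    z = 0.0                      # consensus (implementable) solution
    w = [0.0] * n                # scenario dual multipliers
    for _ in range(iters):
        # 1. Solve each augmented scenario subproblem (closed form here):
        #    argmin_x f_s(x) + w_s*x + (r/2)*(x - z)**2
        x = [(costs[s] - w[s] + r * z) / (1.0 + r) for s in range(n)]
        # 2. Average to get the new implementable solution.
        z = sum(p * xs for p, xs in zip(probs, x))
        # 3. Price the nonanticipativity constraint.
        w = [w[s] + r * (x[s] - z) for s in range(n)]
    return z

z = progressive_hedging(costs=[1.0, 2.0, 6.0], probs=[0.5, 0.3, 0.2])
```

On this instance the consensus variable converges to the weighted mean 2.3; viewing steps 1-3 as an ADMM/Douglas-Rachford iteration on the nonanticipativity constraint is exactly the equivalence discussed in the talk.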

Next, the talk will apply the projective splitting framework of Combettes and Eckstein to the same formulation, yielding a method which is similar to PH but can be implemented in an asynchronous manner. We call this method "asynchronous projective hedging" (APH). Unlike most decomposition methods, it does not need to solve every subproblem at every iteration; instead, each iteration may solve just a single subproblem or a small subset of the available subproblems.

Finally, the talk will describe work integrating the APH algorithm into mpi-sppy, a Python package for modeling and distributed parallel solution of stochastic programming problems. Mpi-sppy uses the Pyomo Python-based optimization modeling system. Our experience includes using 8,000 processor cores to solve a test problem instance with 1,000,000 scenarios.

Portions of the work described in this talk are joint with Patrick Combettes (North Carolina State University), Jean-Paul Watson (Lawrence Livermore National Laboratory, USA), and David Woodruff (University of California, Davis).

### 2020

Abstract:

This talk reviews augmented Lagrangian type methods for solving cone-convex (and hence, in particular, linearly) constrained nonconvex composite optimization (CCC-NCO) problems and discusses their corresponding complexity results. It then describes a more recent inner accelerated proximal inexact augmented Lagrangian (NL-IAPIAL) method for solving the CCC-NCO problem, which consists of: a) applying an inexact proximal point method to approximately solve a sequence of quadratic penalty subproblems for increasing values of the penalty parameter; and b) solving all the generated prox subproblems by an accelerated composite gradient (ACG) method. It is shown that the method, started from any infeasible point $x_0$, generates a $\rho$-approximate stationary point in ${\cal O}(\rho^{-3})$ ACG iterations. This improves upon results obtained for previous algorithms in that: 1) it allows for the presence of a composite term in the objective function; 2) the feasible region does not have to be bounded; 3) the initial point does not have to be feasible; and 4) it substantially reduces the previously known iteration complexity of ${\cal O}(\rho^{-6})$.

Abstract: Deep representation learning seeks to learn a data representation that transfers to downstream tasks. In this talk, we study two forms of representation learning: supervised pre-training and self-supervised learning.

Supervised pre-training uses a large labeled source dataset to learn a representation, then trains a classifier on top of the representation. We prove that supervised pre-training can pool the data from all source tasks to learn a good representation which transfers to downstream tasks with few labeled examples.

Self-supervised learning creates auxiliary pretext tasks that do not require labeled data to learn representations. These pretext tasks are created solely using input features, such as predicting a missing image patch, recovering the color channels of an image, or predicting missing words. Surprisingly, predicting this known information helps in learning a representation effective for downstream tasks. We prove that under a conditional independence assumption, self-supervised learning provably learns representations.

Abstract: The quadratic Wasserstein distance has recently been proposed as an alternative to the classical $L^2$ distance for measuring data mismatch in computational inverse problems. Extensive computational evidence showing the advantages of using the Wasserstein distance has been reported. The objective of this talk is to provide some simple observations that might help us explain the numerically-observed differences between results based on the quadratic Wasserstein distance and those based on the $L^2$ distance for general linear and nonlinear inverse problems.
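A one-dimensional toy computation, under illustrative assumptions (a uniform pulse compared against a shifted copy of itself), shows the kind of difference the talk examines: the $L^2$ misfit saturates once the supports separate, while the quadratic Wasserstein misfit keeps tracking the shift.

```python
def misfits(shift, width=1.0, n=1000):
    """Squared L2 and squared W2 misfit between a uniform unit-mass pulse
    on [0, width] and the same pulse shifted by `shift` (1D)."""
    # L2: the squared difference of the two indicator functions integrates
    # to twice the non-overlapping length -- it saturates at 2*width once
    # the pulses no longer overlap.
    overlap = max(0.0, width - abs(shift))
    l2_sq = 2.0 * (width - overlap)
    # W2: in 1D with equal-weight samples, W2^2 is the mean squared
    # difference of the *sorted* samples (the monotone transport map),
    # which for a rigid shift equals shift**2.
    a = sorted((i + 0.5) * width / n for i in range(n))
    b = sorted(x + shift for x in a)
    w2_sq = sum((x - y) ** 2 for x, y in zip(a, b)) / n
    return l2_sq, w2_sq

l2_near, w2_near = misfits(0.5)   # overlapping pulses
l2_far, w2_far = misfits(3.0)     # disjoint pulses: L2 has saturated
```

This shift-convexity of $W_2$ is one of the properties often credited for its better behavior in travel-time-like inversion.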

Abstract: We will consider the inverse problem of determining the sound speed or index of refraction of a medium by measuring the travel times of waves going through the medium. This problem arises in global seismology in an attempt to determine the inner structure of the Earth by measuring travel times of earthquakes. It also has several applications in optics and medical imaging, among others. The problem can be recast as a geometric problem: Can one determine the Riemannian metric of a Riemannian manifold with boundary by measuring the distance function between boundary points? This is the boundary rigidity problem. The linearization of this problem involves the integration of a tensor along geodesics, similar to the X-ray transform. We will also describe some recent results, joint with Plamen Stefanov and Andras Vasy, on the partial data case, where you are making measurements on a subset of the boundary. No previous knowledge of Riemannian geometry will be assumed.

Abstract: Stochastic gradient and related methods for solving stochastic optimization problems have been studied extensively in recent years. In addition, it has been shown that such algorithms and much of their convergence and complexity guarantees extend in straightforward ways when one considers problems involving simple constraints, such as when one can perform projections onto the feasible region of the problem. However, settings with general nonlinear constraints have received less attention, and many of the approaches that have been proposed for solving such problems resort to using penalty or (augmented) Lagrangian methods, which are often not the most effective strategies. In this work, we propose and analyze stochastic optimization methods based on the sequential quadratic optimization (commonly known as SQP) methodology. We discuss advantages and disadvantages of such techniques. This is joint work with Albert Berahas, Daniel P. Robinson, and Baoyu Zhou.

Graph regularization plays an important role in electroencephalography (EEG) source imaging and hyperspectral unmixing, both of which are challenging and ill-posed inverse problems. Specifically, for EEG brain source localization, we develop a graph Laplacian-based model that explicitly takes into account the label information (happiness, sadness, surprise, etc.). We also incorporate some modern regularizations, such as the L1 norm, total variation, and nuclear norm (low rank), to further improve the accuracy. On the other hand, we consider the use of graph total variation (gTV) regularization for blind hyperspectral unmixing, which is about identifying the pure spectra of individual materials (i.e., endmembers) and their proportions (i.e., abundances) at each pixel. To further alleviate the computational cost, we apply the Nystrom method to approximate a fully-connected graph by a small subset of sampled points and adopt the Merriman-Bence-Osher (MBO) scheme to solve the gTV-involved subproblem by decomposing a grayscale image into a bit-wise form...

Abstract: Many data in real-world applications lie in a high-dimensional space but are concentrated on or near a low-dimensional manifold. Our goal is to estimate functions on the manifold from finite samples of data. This talk focuses on an efficient approximation theory of deep ReLU networks for functions supported on low-dimensional manifolds. We construct a ReLU network for such function approximation where the size of the network grows exponentially with respect to the intrinsic dimension of the manifold. When the function is estimated from finite samples, we prove that the mean squared error of the function approximation converges as the number of training samples increases, with a rate depending on the intrinsic dimension of the manifold instead of the ambient dimension of the space. These results demonstrate that deep neural networks are adaptive to low-dimensional geometric structures of data. This is joint work with Minshuo Chen, Haoming Jiang, and Tuo Zhao (Georgia Institute of Technology).

Abstract: Quantized neural networks are attractive due to their inference efficiency. However, training quantized neural networks involves minimizing a discontinuous and piecewise constant loss function. Such a loss function has zero gradient almost everywhere (a.e.) with respect to the weight variables, which makes conventional back-propagation inapplicable. To this end, we study a class of biased first-order oracles, termed coarse gradients, for overcoming the vanishing gradient issue in the regression and classification settings, respectively. In addition, we propose an efficient computational method featuring a blended coarse gradient step for network quantization, which achieves state-of-the-art accuracies in image classification without extensive tuning of hyper-parameters.
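The simplest member of this family is the straight-through estimator: the forward pass uses the quantized weights, while the backward pass pretends the quantizer is the identity. A toy sketch for weights binarized to {-1, +1}; the data, names, and the plain straight-through rule are illustrative, and the talk's blended coarse gradient is a refinement of this idea rather than this exact scheme.

```python
def sign(v):
    return 1.0 if v >= 0 else -1.0

def train_binary_linear(data, lr=0.1, epochs=50):
    """Fit a linear model whose weights are quantized to {-1, +1} using a
    straight-through 'coarse gradient': forward pass uses sign(w), backward
    pass treats sign as the identity and updates latent float weights."""
    w = [0.2, 0.3]                       # latent float weights
    for _ in range(epochs):
        for x, y in data:
            q = [sign(wi) for wi in w]   # quantized weights in the forward pass
            pred = sum(qi * xi for qi, xi in zip(q, x))
            err = pred - y
            # The true gradient w.r.t. w is zero a.e. (the loss is piecewise
            # constant in w); the coarse gradient substitutes the gradient
            # taken w.r.t. the quantized weights.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [sign(wi) for wi in w]

# Data generated by the true quantized weights [+1, -1].
data = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 0.0)]
q = train_binary_linear(data)
```

Despite the vanishing true gradient, the biased update recovers the generating signs on this toy instance.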

Abstract:

We introduce a novel idea, the WaveHoltz iteration, for solving the Helmholtz equation, inspired by recent work on exact controllability (EC) methods. As in EC methods, our method makes use of time-domain methods for wave equations to design frequency-domain Helmholtz solvers, but unlike EC methods we do not require adjoint solves. We show that the WaveHoltz iteration we propose is symmetric and positive definite in the continuous setting. We also present numerical examples, using various discretization techniques, showing that our method can be used to solve problems with rather high wave numbers...

Abstract:

We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain structural information regarding the reward distributions and would like to minimize regret by exploiting it, where the regret is the performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical UCB and Thompson sampling algorithms are well known to suffer only minimal regret. Neither algorithm, however, is capable of exploiting structural information which is commonly available in practice. We propose a novel learning algorithm, which we call "DUSA", whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and which can handle a wide range of structural information. Our algorithm DUSA solves a dual/robust counterpart of the regret lower bound at the empirical reward distribution and follows the suggestion made by the dual problem. Our proposed algorithm is the first computationally viable learning policy for structured bandit problems that attains asymptotically minimal regret.
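For reference, the structure-free UCB baseline that the abstract contrasts against fits in a few lines; the Bernoulli arms, horizon, and seed below are illustrative.

```python
import math
import random

def ucb1(arm_means, horizon=20000, seed=1):
    """Classical UCB1 on Bernoulli arms: after pulling each arm once, pull
    the arm maximizing (empirical mean) + sqrt(2*ln t / pulls)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1      # initialization: pull each arm once
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts

counts = ucb1([0.3, 0.5, 0.7])   # arm 2 is best
```

UCB1 concentrates its pulls on the best arm at the unstructured information-theoretic rate; DUSA's contribution is to do better when structural information ties the arms together.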

### 2019

Abstract: We present a new statistical framework to quantify uncertainty (UQ) for recovering low-rank matrices from incomplete and noisy observations. We further develop a sequential active sampling approach guided by the uncertainties. The motivation comes from two related and widely studied problems: matrix completion, which aims to recover a low-rank matrix X from a partial, noisy observation of its entries, and low-rank matrix recovery, which recovers X from a set of linear combinations of its entries with additive noise. The proposed framework reveals several novel insights on the role of coherence and uncertainty quantification for noisy matrix completion and recovery. Using such insights, we develop an efficient posterior sampler for UQ, which is then used to guide a closed-form sampling scheme for matrix entries. We show the competitive performance of this integrated sampling/UQ methodology in simulation studies and applications to collaborative filtering and imaging, compared with existing approaches. This is joint work with Simon Mak at Duke University and Shaowu Yuchi at Georgia Tech.

Abstract: The primary visual cortex (V1) of mammals is an area of the brain that receives and interprets visual input. Neurons in V1 respond preferentially to a particular orientation angle of a visual stimulus. In mammals such as cats, V1 contains an ordered map of the orientation preference of each neuron, with cells preferring similar angles residing close to one another. In mice, however, the map of orientation preference appears random and disordered, with little correlation between preferred orientation and location in cortical space. Though much is known about orientation-preference maps in adult mammals, the mechanism underlying the formation of these maps during development is still unknown. In particular, I am interested in understanding under which circumstances the map that forms appears ordered (as in cats) or disordered (as in mice). In this talk, I will discuss a mathematical model used to describe the development of neuronal networks in V1 and suggest a testable hypothesis for the mechanism underlying the formation of either an ordered or disordered orientation-preference map.

Abstract: Randomized algorithms and greedy algorithms are both widely used in practice. In this talk, I'll present a random-then-greedy procedure, and showcase how it improves two important machine learning algorithms: stochastic gradient descent and gradient boosting machine. In the first part of the talk, we propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems...
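The biased estimator in the first part can be sketched as a random-then-greedy step: sample a mini-batch at random, then greedily keep only the q examples with the largest current loss. The scalar least-squares losses and all names below are illustrative, not the paper's experimental setup.

```python
import random

def ordered_sgd(points, q=2, batch=8, lr=0.05, epochs=200, seed=0):
    """Mini-batch SGD biased toward high-loss examples: within each batch,
    average the gradient over only the q examples with the largest current
    loss.  Toy losses are (w - p)**2 for scalar data `points`."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        b = rng.sample(points, batch)          # random step: draw a batch
        # Greedy step: keep the q points with the largest loss (w - p)**2.
        top = sorted(b, key=lambda p: (w - p) ** 2, reverse=True)[:q]
        g = sum(2.0 * (w - p) for p in top) / q
        w -= lr * g
    return w

w = ordered_sgd([float(i) for i in range(-5, 6)])
```

On this symmetric toy dataset the ordered objective is minimized near 0, and the iterates settle there; the talk's theory characterizes exactly which ordered modification of the average loss such a scheme minimizes.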

Abstract: We develop two efficient stochastic gradient-based algorithms to solve a class of composite nonconvex optimization problems that covers both finite-sum and expectation settings. In the first part of the talk, we propose a new stochastic proximal gradient framework that utilizes a well-known biased stochastic estimator called SARAH, introduced by Nguyen et al. in 2017. The algorithm consists of two steps, a proximal gradient step and an averaging step, making it different from existing nonconvex proximal-type algorithms. It only requires an average smoothness assumption on the nonconvex objective term, plus an additional bounded-variance assumption if applied to expectation problems. It works with both constant and adaptive step sizes, while allowing single samples and mini-batches. In all these cases, we prove that our algorithms can achieve the best-known complexity bounds. In the second part of this talk, we introduce a novel hybrid approach to form new stochastic estimators for objective functions and propose a hybrid stochastic proximal gradient method to solve composite nonconvex optimization problems...

Abstract:

The prediction of interactions between nonlinear laser beams is a longstanding open problem. A traditional assumption is that these interactions are deterministic. We have shown, however, that in the nonlinear Schrodinger equation (NLS) model of laser propagation, beams lose their initial phase information in the presence of input noise. Thus, the interactions between beams become unpredictable as well.

Computationally, these predictions are enabled through a novel spline-based stochastic computational method. Our algorithm efficiently estimates probability density functions (PDFs) that result from differential equations with random input. This is a new and general problem in numerical uncertainty quantification (UQ), which leads to surprising results and analysis at the intersection of probability and transport theory.

**Amir Sagiv** is a Chu Assistant Professor at Columbia University's department of Applied Physics and Applied Mathematics. Before that, he completed his Ph.D. in Applied Mathematics at Tel Aviv University under the supervision of Gadi Fibich and Adi Ditkowski, and earned his B.Sc. in Mathematics and Physics at the Hebrew University of Jerusalem.

Abstract:

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. The premise is that despite nonconvexity, the loss function may possess benign geometric properties that enable fast global convergence under carefully designed initializations. In many sample-starved problems, this benign geometry only holds over a restricted region of the entire parameter space with certain structural properties, yet gradient descent seems to follow a trajectory staying within this nice region without explicit regularization, and is thus extremely computationally efficient and admits strong statistical guarantees. In this talk, we will formally establish this “implicit regularization” phenomenon of gradient descent for the fundamental problem of estimating low-rank matrices from noisy incomplete, rank-one, or bilinear measurements, by exploiting statistical modeling in analyzing iterative optimization algorithms via a leave-one-out perturbation argument. We will also discuss an approach based on median-guided truncated gradient descent which is provably resilient to outliers while maintaining computational and statistical efficiency.

Abstract:

Fast-slow systems are notoriously difficult to simulate. Numerical stability limitations force explicit integrators to waste cycles resolving the fast dynamics. Implicit integrators often have the stability to step over the daunting short timescale, but suffer from poorly conditioned nonlinear solves when taking large timesteps, as well as accuracy issues when stepping over potentially-relevant short-timescale dynamics. In this talk I will propose a general strategy for developing improved implicit integrators that exploit a fast-slow system's slow manifold. These slow manifold integrators enjoy well-conditioned nonlinear solves and useful error estimates.

Specifically, errors are well-controlled _near the slow manifold_. After presenting the general approach, I will discuss a nontrivial application of slow manifold integration to charged particle dynamics in a strong magnetic field.

We interpret the formation of rogue waves on the surface of deep water using tools from large deviation theory and optimal control. We compute the instantons of the problem, i.e. the most likely realizations leading to extreme surface elevations via the governing nonlinear dynamics. Strikingly, the larger waves closely follow the instanton evolution, with small extra fluctuations. Our results are validated by Monte Carlo simulations and by real experimental data in a wave flume across a wide range of forcing regimes, generalizing the existing theories in the limiting linear and highly-nonlinear cases.

The results are obtained in the one-dimensional set-up of the flume, but the method is general and can be extended to the fully two-dimensional case of the ocean. In principle, the framework is exportable to other nonlinear physical systems, to study the mechanisms underlying the extreme events and assess their risk.

Abstract: The capability of using imperfect statistical reduced-order models to capture crucial statistics in turbulent flows is investigated. Much simpler and more tractable block-diagonal models are proposed to approximate the complex and high-dimensional turbulent flow equations. A systematic framework of correcting model errors with empirical information theory is introduced, and optimal model parameters under this unbiased information measure can be achieved in a training phase before the prediction...

Abstract: It is cliché to mimic biological design rules in synthetic materials, yet this is the precise challenge for regenerative medicine, therapies for disease pathologies, and vaccines. To design and engineer solutions to biological dysfunction, it is essential to understand Nature’s design rules for successful function. Today, we have data, amazing data, from advances in super-resolution (spatial and temporal) microscopy, targeted fluorescent signaling, chemical synthesis, and various passive and active probes of living systems. I will introduce two biological systems that rely on transient, short-lived, binding interactions to perform diverse functionalities: the genome in live cells with the requirement of genes to self-organize; and, the mucus barriers covering every organ with the requirement to regulate diffusive transport of foreign species within and to flow in order to clear all trapped insults. Time permitting, I will mention other examples. Each system is explored through feedback between experimental data, data analytics, mechanistic modeling and computation, and visualization of experimental and simulated data. Many collaborators will be acknowledged in the lecture.

Abstract: The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a target high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. This mapping gives rise to a deterministic partial differential equation that governs the parameter evolution under gradient descent dynamics. We also show that in the limit that the number of parameters n is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales like n^{-1}. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. This conceptual framework can be leveraged to develop algorithms that accelerate optimization using non-local transport. I will conclude by showing that using neuron birth/death processes in parameter optimization guarantees global convergence and provides a substantial acceleration in practice.

Many of the mechanisms that underlie intracellular transport by molecular motors are well-studied by both experiments and mathematical modeling. This is especially true of cargos that are being transported by individual motors and that are perturbed by certain kinds of external forces. However, our understanding of single-motor dynamics does not yet yield robust predictions for what has been observed at a whole cell scale when multiple opposite-directed motors interact with each other. In this talk, we will explore recent mathematical modeling and Bayesian inferential efforts that are directed at new experiments that are aimed at revealing multi-motor interactions in vitro.

Abstract: Graphs are universal representations of pairwise relationships. A trending topic in deep learning is to extend the remarkable success of well-established neural network architectures (e.g., CNN and RNN) for Euclidean structured data to irregular domains, including notably, graphs. A proliferation of graph neural networks (e.g., GCN) emerged recently, but the scalability challenge for training and inference persists. The essence of the problem is the prohibitive computational cost of computing a mini-batch, owing to the recursive expansion of neighborhoods. We propose a scalable approach, coined FastGCN, based on neighborhood sampling to reduce the mini-batch computation. FastGCN achieves orders of magnitude improvement in training time, compared with a standard implementation of GCN. Predictions remain comparably accurate. A curious question for this approach is why stochastic gradient descent (SGD) training ever converges. In the second part of this talk, we show that the gradient estimator so computed is not unbiased but consistent. We thus extend the standard SGD results for unbiased gradients to consistent gradients and show that their convergence behaviors are similar. These results are important and may spawn new interest in the machine learning community, since in many learning scenarios unbiased estimators may not be efficient to compute, and hence other nonstandard but fast gradient estimators serve as sound alternatives.
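The sampling idea can be illustrated on a single propagation step H' = A H: estimate each row of the product from a small set of sampled nodes, rescaled so the estimate has the right expectation. FastGCN actually samples with importance weights per layer; the uniform sampling, toy graph, and names below are simplifications for the sketch.

```python
import random

# Row-normalized adjacency of a 4-node cycle with self-loops: each node
# averages itself and its two neighbors.  (Toy graph, illustrative only.)
A = [[1 / 3 if j in (i, (i - 1) % 4, (i + 1) % 4) else 0.0 for j in range(4)]
     for i in range(4)]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]

def full_layer(A, H):
    """Exact propagation A @ H -- the per-layer cost sampling avoids."""
    n, d = len(H), len(H[0])
    return [[sum(A[i][u] * H[u][j] for u in range(n)) for j in range(d)]
            for i in range(n)]

def sampled_layer(A, H, t, rng):
    """Estimate A @ H from t nodes drawn uniformly with replacement,
    rescaled by n/t so each entry is unbiased."""
    n, d = len(H), len(H[0])
    nodes = [rng.randrange(n) for _ in range(t)]
    return [[(n / t) * sum(A[i][u] * H[u][j] for u in nodes)
             for j in range(d)] for i in range(n)]

rng = random.Random(0)
exact = full_layer(A, H)
reps = 4000
avg = [[0.0] * 2 for _ in range(4)]
for _ in range(reps):
    est = sampled_layer(A, H, t=2, rng=rng)
    for i in range(4):
        for j in range(2):
            avg[i][j] += est[i][j] / reps
```

Averaged over many draws the cheap estimates match the exact propagation; once nonlinearities and the loss are composed on top, the resulting gradient estimator is no longer unbiased, which is where the consistency analysis in the talk comes in.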

Abstract: Information retrieval from graphs plays an increasingly important role in data science and machine learning. This talk focuses on two such examples. The first one concerns the graph cuts problem: how to find the optimal k-way graph cuts given an adjacency matrix. We present a convex relaxation of ratio cut and normalized cut, which gives rise to a rigorous theoretical analysis of graph cuts. We derive deterministic bounds of finding the optimal graph cuts via a spectral proximity condition which naturally depends on the intra-cluster and inter-cluster connectivity. Moreover, our theory provides theoretic guarantees for spectral clustering and community detection under stochastic block model...
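As a concrete instance of the spectral side, here is a minimal two-way spectral cut: compute the Fiedler vector of the graph Laplacian (via power iteration, to stay dependency-free) and split nodes by sign. The dumbbell graph and all names are illustrative; the talk's convex relaxation and its guarantees are not reproduced here.

```python
import random

def fiedler_partition(adj, iters=1000, seed=0):
    """Two-way spectral cut via the Fiedler vector: power-iterate on
    c*I - L (L = D - A, graph Laplacian), deflating the constant
    eigenvector, then split nodes by the sign of the result."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    c = 2.0 * max(deg)           # shift so c*I - L is positive semidefinite
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        # Deflate the trivial all-ones eigenvector of L.
        m = sum(v) / n
        v = [x - m for x in v]
        # Apply (c*I - L): w_i = c*v_i - deg_i*v_i + sum_j A_ij v_j.
        w = [c * v[i] - deg[i] * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = max(abs(x) for x in w) or 1.0
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]

# Two triangles joined by a single edge (2-3): the cut should separate them.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
labels = fiedler_partition(adj)
```

The proximity conditions in the talk characterize, in terms of intra- versus inter-cluster connectivity, when a cut recovered this way is provably the optimal one.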

Abstract: In this talk, we consider the problems of consensus optimization and resource allocation, and how to solve them in a decentralized manner. These two problems are known to be closely related to empirical risk minimization in machine learning and management problems in operations research, respectively. By saying “decentralized”, we mean that the tasks are to be completed over a set of networked agents in which each agent is able to communicate with adjacent agents. For both problems, every agent in the network wants to collaboratively minimize a function that involves global information, while only a piece of information is available to each of them.

### 2018

Spatially compatible discretization is a term that broadly encapsulates numerical discretizations of PDEs that possess some mimetic property of the continuum operators, such as conservation, maximum principles, H(div)/H(curl) - conformity, or discrete preservation of an exact sequence. The construction of such methods is greatly facilitated by employing the exterior calculus framework and the duality between the boundary operators and the exterior derivative in the generalized Stokes theorem. In a finite element context, such tools form the so-called finite element exterior calculus (FEEC), which has unified mixed finite element theory. Meshfree methods, on the other hand, seek to solve a PDE strictly in terms of point evaluations (0-forms) to facilitate automated geometry discretization and large deformation Lagrangian mechanics...

The evolution, through spatially periodic linear dispersion, of rough initial data produces fractal, non-differentiable profiles at irrational times and, for asymptotically polynomial dispersion relations, quantized structures at rational times. Such phenomena have been observed in dispersive wave models, optics, and quantum mechanics, and lead to intriguing connections with exponential sums arising in number theory. Ramifications and recent progress on the analysis, numerics, and extensions to nonlinear wave models, both integrable and non-integrable, will be presented.

Abstract: Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and which have only asymptotic convergence proofs. In this talk, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Legendre-Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided.

Recently, Faou, Germain, and Hani introduced a continuous resonance (CR) equation by systematically applying a variant of the weak turbulence approach to the nonlinear Schroedinger equation (NLS). They obtained the CR equation as a large-box limit of the two-dimensional NLS in a weakly nonlinear regime. We consider the one-dimensional version of the CR equation and investigate critical points and their stability. For the three-dimensional CR equation, we argue that the Gaussian is a ground state by a combination of numerical and analytical methods. This is joint work with Gene Wayne (Boston University).

"Applications of Renewal Theory to Intracellular Transport"

Abstract:

Intracellular transport, especially in axons, consists primarily of different molecular motors moving along microtubules with cargos in tow. Biochemical processes at the nanoscale control the dynamics of the motors. Changes in the behavior of the motors (speed, diffusivity, processivity, etc) can then alter the distribution and dynamics of the population of transported cargos at the scale of several microns. Therefore, an important element in understanding cellular regulation of transport is the interaction between these motor-level and cell-level scales.

In this talk, I will discuss how renewal and renewal-reward processes are used to build multi-scale models of cellular transport, connecting fine-scale biophysical models (typically Markovian) to coarse-scale models that are more relevant both for experimental observations and for understanding transport at the cell level.
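
To make the renewal-reward idea concrete (with hypothetical parameters, not those of any particular motor), consider a motor that alternates between a processive run and a pause: the renewal-reward theorem says the long-run velocity is the expected displacement per cycle divided by the expected cycle duration.

```python
import numpy as np

# Renewal-reward sketch: a motor alternates between a processive "run"
# (exponential duration, constant speed) and a paused state (exponential
# duration, no movement).  The renewal-reward theorem gives the long-run
# velocity as E[displacement per cycle] / E[cycle duration].
# All rates and speeds below are hypothetical, for illustration only.

rng = np.random.default_rng(0)
rate_run, rate_pause, speed = 2.0, 1.0, 1.0   # 1/s, 1/s, um/s

runs = rng.exponential(1.0 / rate_run, 200000)
pauses = rng.exponential(1.0 / rate_pause, 200000)

v_mc = (speed * runs).sum() / (runs + pauses).sum()
v_theory = speed * (1.0 / rate_run) / (1.0 / rate_run + 1.0 / rate_pause)
assert abs(v_mc - v_theory) < 0.01
```

The same ratio structure carries over to more detailed Markovian motor models, with the cycle statistics computed from the fine-scale biophysics.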

Abstract: The Lagrangian method is widely used in many fields for multi-material flow simulations due to its distinguished advantage in capturing material interfaces and free boundaries automatically. In applications such as astrophysics and inertial confinement fusion, there are three-dimensional cylindrically symmetric multi-material problems which are usually simulated by the Lagrangian method in two-dimensional cylindrical coordinates. For this type of simulation, the critical issues for the schemes include keeping positivity of physically positive variables such as density and internal energy, and keeping spherical symmetry in the cylindrical coordinate system if the original physical problem has this symmetry.

Abstract: Inverse problems on graphs encompass many areas of physics, algorithms and statistics, and are a confluence of powerful methods, ranging from computational harmonic analysis and high-dimensional statistics to statistical physics. As with inverse problems in signal processing, learning has emerged as an intriguing alternative to regularization and other computationally tractable relaxations, opening up new questions in which high-dimensional optimization, neural networks and data play a prominent role. In this talk, I will argue that several tasks that are 'geometrically stable' can be well approximated with Graph Neural Networks, a natural extension of Convolutional Neural Networks to graphs. I will present recent work on supervised community detection, quadratic assignment, neutrino detection and beyond, showing the flexibility of GNNs to extend classic algorithms such as Belief Propagation.
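
As a minimal sketch of the 'geometric stability' at play (not the speaker's architecture), a graph-convolution layer built from a normalized adjacency matrix is automatically equivariant under relabeling of the nodes:

```python
import numpy as np

# Minimal graph-convolution layer (numpy sketch, not the talk's model):
# H' = relu(A_hat @ X @ W) with symmetric normalization
# A_hat = D^{-1/2} (A + I) D^{-1/2}.  Because A_hat is built from the
# graph itself, the layer is equivariant under any node relabeling.

def gcn_layer(A, X, W):
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    A_hat = A_tilde / np.sqrt(np.outer(d, d))
    return np.maximum(A_hat @ X @ W, 0.0)

rng = np.random.default_rng(1)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T             # undirected, no self-loops
X = rng.standard_normal((6, 3))            # node features
W = rng.standard_normal((3, 4))            # learnable weights

perm = rng.permutation(6)
P = np.eye(6)[perm]                        # permutation matrix
out = gcn_layer(A, X, W)
out_perm = gcn_layer(P @ A @ P.T, P @ X, W)
assert np.allclose(out_perm, P @ out)      # permutation equivariance
```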

As biotechnologies for data collection become more efficient and mathematical modeling becomes more ubiquitous in the life sciences, analyzing both high-dimensional experimental measurements and high-dimensional spaces of model parameters is of the utmost importance. We present a perspective inspired by differential geometry that allows for the exploration of complex datasets such as these. In the case of single-cell leukemia data we present a novel statistic for testing differential biomarker correlations across patients and within specific cell phenotypes. A key innovation here is that the statistic is agnostic to the clustering of single cells and can be used in a wide variety of situations. Finally, we consider a case in which the data of interest are parameter sets for a nonlinear model of signal transduction and present an approach for clustering the model dynamics. We motivate how this perspective can be used to avoid global bifurcation analysis and examine how parameter sets from distinct dynamic clusters differ.

The nucleus is the organizing center of a cell. We use multi-scale modeling to understand how the dozens of nuclei in multi-nucleated muscle cells position themselves and adapt their size. Positioning mechanisms involve cytoskeletal fibers, called microtubules, that interact with molecular motors to create forces. We perform large-scale computational force screens with hundreds of coarse models to predict nuclear positions...

### 2017

In the terahertz frequency range, the effective (complex-valued) surface conductivity of atomically thick 2D materials such as graphene has a positive imaginary part that is considerably larger than the real part. This feature allows for the propagation of slowly decaying electromagnetic waves, called surface plasmon-polaritons (SPPs), that are confined near the material interface with wavelengths much shorter than the wavelength of the free-space radiation.

Big data are often created by aggregating multiple data sources and are modeled as large-scale attributed networks. Many applications of big data analytics are concerned with discovering anomalous patterns (subnetworks) that are interesting or unexpected, such as disease outbreaks, subnetwork biomarkers, network intrusions, cyber threats, and societal events, among others.

Leaky oil droplets that self-propel along the concentration gradients they create form an ideal system for studying collective behavior. I will present a simple model that can be reduced to a system of non-Markov stochastic differential equations, allowing for analytical results that match the observed experimental system. The particles' interaction force is observed through their hovering above a bottom plate and their mutual repulsion. The model also displays a regime of super-diffusive scaling, likely related to the mobility transition to a constant-velocity solution (of the deterministic system). The single non-dimensional parameter in the model controls the history of the interaction, allowing the system to range from having complete memory to behaving like interacting electrostatic potentials.

All numerical calculations will fail to provide a reliable answer unless the continuous problem under consideration is well posed. Well-posedness depends in most cases only on the choice of boundary conditions. In this paper we highlight this fact and exemplify it by discussing the well-posedness of a prototype problem: the time-dependent compressible Navier–Stokes equations. We do not deal with discontinuous problems; smooth solutions with smooth and compatible data are considered.

In particular, we will discuss how many boundary conditions are required, where to impose them, and what form they should have in order to obtain a well-posed problem. Once the boundary conditions are known, one issue remains: they can be imposed weakly or strongly. It is shown that the weak and strong boundary procedures produce similar continuous energy estimates. We conclude by relating the well-posedness results to energy stability of a numerical approximation on summation-by-parts form. It is shown that the results obtained for weak boundary conditions in the well-posedness analysis lead directly to corresponding stability results for the discrete problem, if schemes on summation-by-parts form and weak boundary conditions are used.

The analysis in this paper is general and can without difficulty be extended to any coupled system of partial differential equations posed as an initial boundary value problem, coupled with a numerical method on summation-by-parts form with weak boundary conditions. Our ambition in this paper is to give a general roadmap for how to construct a well-posed continuous problem and a stable numerical approximation, not to give exact answers to specific problems.
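
The mechanism that transfers continuous energy estimates to the discrete scheme can be illustrated with a standard second-order SBP first-derivative operator (a textbook example, not necessarily the operators used in the paper): writing $D = H^{-1}Q$ with diagonal norm matrix $H$ and $Q + Q^T = \mathrm{diag}(-1,0,\dots,0,1)$ gives the discrete integration-by-parts identity $u^T H (Dv) + (Du)^T H v = u_n v_n - u_0 v_0$.

```python
import numpy as np

# Standard second-order summation-by-parts (SBP) first-derivative operator
# on a uniform grid: D = H^{-1} Q, with H the diagonal norm (quadrature)
# matrix and Q + Q^T = diag(-1, 0, ..., 0, 1).  The resulting discrete
# identity mimics integration by parts and is what lets continuous energy
# estimates carry over to the scheme.  (Illustrative textbook operator.)

n = 20
h = 1.0 / n
H = h * np.diag([0.5] + [1.0] * (n - 1) + [0.5])
Q = 0.5 * (np.diag(np.ones(n), 1) - np.diag(np.ones(n), -1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5
D = np.linalg.solve(H, Q)                 # D = H^{-1} Q

rng = np.random.default_rng(0)
u, v = rng.standard_normal(n + 1), rng.standard_normal(n + 1)
lhs = u @ H @ (D @ v) + (D @ u) @ H @ v   # discrete (u, v_x) + (u_x, v)
assert abs(lhs - (u[-1] * v[-1] - u[0] * v[0])) < 1e-12
```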

### 2016

There are numerous and diverse challenges associated with analyzing data collected from different fields of science and engineering. This talk consists of two parts. First, time-dependent oscillatory signals occur in a wide range of fields, including geophysics, biology, medicine, finance and social dynamics. Of great interest are techniques that decompose the time-dependent signals into multiple oscillatory components, with time-varying amplitudes and instantaneous frequencies. Such decompositions can help us better describe and quantify the underlying dynamics that govern the system. I will present a new advance in time-frequency representations whose effectiveness is justified by both numerical experiments and theoretical analysis. Second, the high-dimensionality of point cloud data makes investigating such data difficult. Fortunately, these data often locally concentrate along a low-dimensional subspace and this makes the problem more tractable. I will talk about utilizing low-dimensional structures for various data analysis objectives, ranging from recovering the underlying data in the presence of complex noise including Gaussian additive noise and large sparse corruptions, to recognizing subspace-based patterns in data, from robust algorithm design to theoretical analysis. The techniques for learning subspaces have broad applications: image processing, computer vision, bioinformatics, medicine, etc. At the end, I will talk about some future directions where both fields are involved.

Technological advances in data acquisition and storage have led to a rapid proliferation of big data in diverse areas such as Internet search, information technology, healthcare, biology, and engineering. The data involved in many of these applications are large and growing faster than the capabilities of modern computers. This talk presents ways to handle such large amounts of data. The strategies include sampling small amounts of data, breaking variables into small blocks, and performing parallel computing. In the first part, a block stochastic gradient (BSG) method will be introduced. BSG inherits the advantages of both stochastic gradient (SG) and block coordinate descent (BCD) methods, and it performs better than either of them individually. The second part of this talk will present an asynchronous parallel block coordinate update (ARock) method for fixed-point problems, which abstract many applications such as solving linear equations, convex optimization, statistical learning, and optimal control. Compared to its synchronous counterpart, ARock eliminates idle time, reduces memory-access congestion, and has perfect load balance. Numerical results show that ARock can achieve almost linear speed-up, while synchronous methods may suffer from load imbalance and achieve very poor speed-up.
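
A minimal sketch of the BSG idea on least squares (with illustrative step sizes and block sizes, not those analyzed in the talk): each iteration samples a mini-batch of data, the SG ingredient, and updates only one randomly chosen coordinate block, the BCD ingredient.

```python
import numpy as np

# Block stochastic gradient sketch on min_x 0.5*||A x - b||^2:
# sample a mini-batch of rows (stochastic gradient) and update one
# randomly chosen coordinate block (block coordinate descent).
# Step-size schedule and block sizes are illustrative only.

rng = np.random.default_rng(0)
m, n, nblocks = 200, 10, 2
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                                     # consistent system

x = np.zeros(n)
blocks = np.array_split(np.arange(n), nblocks)
for k in range(20000):
    rows = rng.choice(m, size=20, replace=False)   # mini-batch of data
    blk = blocks[rng.integers(nblocks)]            # random block
    g = A[rows][:, blk].T @ (A[rows] @ x - b[rows]) / len(rows)
    x[blk] -= 1e-2 / (1 + k / 5000) * g            # diminishing step

assert np.linalg.norm(x - x_true) < 0.1
```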

Low rank models exist in many applications, ranging from signal processing to data analysis. Typical examples include low rank matrix completion, phase retrieval, and spectrally sparse signal reconstruction. We will present a class of computationally efficient algorithms which are universally applicable to those low rank reconstruction problems. Theoretical recovery guarantees will be established for the proposed algorithms under different random models, showing that the sampling complexity is essentially proportional to the intrinsic dimension of the problems rather than the ambient dimension. Extensive numerical experiments demonstrate the efficacy of the algorithms.
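
As a generic illustration of low-rank reconstruction (an elementary alternating-projection scheme, not the talk's algorithms), matrix completion can be sketched by alternating a rank-r SVD truncation with re-imposition of the observed entries:

```python
import numpy as np

# Generic low-rank matrix completion sketch: alternate a rank-1 SVD
# truncation with enforcement of the observed entries.  Toy rank-1
# matrix with four entries hidden; illustrative only.

M = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0, 1.0])  # rank 1
mask = np.ones((4, 4), dtype=bool)
mask[0, 1] = mask[1, 2] = mask[2, 0] = mask[3, 3] = False  # hidden entries

X = np.where(mask, M, 0.0)
for _ in range(5000):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X = s[0] * np.outer(U[:, 0], Vt[0])        # project onto rank-1 matrices
    X[mask] = M[mask]                          # restore observed data

assert np.linalg.norm(X - M) < 1e-3           # hidden entries recovered
```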

Inspired by real-world networks consisting of layers that encode different types of connections, such as a social network at different instances in time, we study community structure in multilayer networks. We analyze fundamental limitations on the detectability of communities by developing random matrix theory for the dominant eigenvectors of modularity matrices that encode an aggregation of network layers. Aggregation is often beneficial when the layers are correlated, and it represents a crucial step for the discretization of time-varying network data, whereby layers are binned into time windows. We explore two methods for aggregation: summing the layers' adjacency matrices as well as thresholding this summation at some value. We develop theory for both large- and small-scale communities and analyze detectability phase transitions that are onset by varying either the density of within-community edges or community size. We identify layer-aggregation strategies that are optimal in that they minimize the detectability limit. Our results indicate good practices in the context of community detection for how to aggregate network layers, threshold pairwise-interaction data matrices, and discretize time-varying network data. We apply these results to synthetic and empirical networks, including a study of anomaly detection for the Enron email corpus.
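
A small sketch of the two aggregation strategies and the modularity matrix whose spectrum drives the detectability analysis (toy two-community, two-layer network; parameters illustrative):

```python
import numpy as np

# Aggregate layers by summation or by thresholding the sum, then form the
# modularity matrix B = A - k k^T / (2m); its leading eigenvector is the
# standard spectral community-detection statistic.  (Toy illustration.)

def modularity_matrix(A):
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()      # k.sum() = 2m

rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
layers = []
for _ in range(2):
    p = np.where(labels[:, None] == labels[None, :], 0.9, 0.1)
    A = (rng.random((20, 20)) < p).astype(float)
    A = np.triu(A, 1)
    layers.append(A + A.T)                   # undirected layer

A_sum = sum(layers)                          # aggregation by summation
A_thr = (A_sum >= 2).astype(float)           # aggregation by thresholding

B = modularity_matrix(A_sum)
assert np.allclose(B.sum(axis=1), 0.0)       # modularity rows sum to zero
w, V = np.linalg.eigh(B)
pred = (V[:, -1] > 0).astype(int)            # leading-eigenvector split;
# for planted structure this strong, pred typically matches `labels`
# up to a global sign flip.
```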

Photo-acoustic tomography (PAT) and thermo-acoustic tomography (TAT) are novel hybrid modalities in medical imaging. A hybrid modality combines a high-resolution physical phenomenon and a high-contrast one in an aim to preserve the advantages of both. In PAT and TAT, high-resolution ultrasound wave is coupled with high-contrast optical or electromagnetic wave through the photo-acoustic effect. The study of their mathematical models can be divided into two steps. The first step concerns recovery of the radiation absorbed by tissues from the boundary measurement of ultrasound signals. This amounts to solving an inverse source problem for the acoustic wave equation. The second step consists of recovering optical or electromagnetic parameters of tissues from the absorbed radiation. This leads to inverse problems with internal measurement. In this talk, we will discuss the models underlying PAT and TAT and obtain several results concerning uniqueness, stability, and reconstructive procedures of these inverse problems.

In this talk, I propose several efficient, reliable, and practical computational algorithms to solve challenging optimization problems arising in medical imaging and image processing. These problems are non-differentiable, ill-conditioned, non-convex, and/or highly nonlinear, so that traditional subgradient-based methods converge very slowly. To tackle the computational complexities, I use relaxation and approximation techniques. In addition, I exploit variable splitting and the alternating direction method of multipliers to decouple the original challenging problems into subproblems which are easier to solve. To obtain fast results, I develop innovative line search strategies and solve the subproblems by Fourier transforms and shrinkage operators. I present the analytical properties of these algorithms as well as various numerical experiments on parallel Magnetic Resonance imaging, image inpainting, and image colorization. A comparison with some existing state-of-the-art methods is given to show the efficiency and effectiveness of the proposed methods.

LIGO’s detection of gravitational waves from a binary black hole merger inaugurates a completely new mode of observational astronomy and represents the culmination of a quest lasting half a century. After a brief review of gravitational waves in general relativity, I will discuss the detection itself. How do the LIGO instruments work? How do we know the signal was caused by a binary black hole merger? What does this detection tell us about binary black holes? Then I will focus on how this moment came to pass. The detection required many ingredients to be in place including (1) developments in theoretical relativity to allow proof that gravitational waves were not coordinate artifacts; (2) a bold vision to recognize that gravitational wave detection was not impossible; (3) technological developments of novel vacuum systems, lasers, optical coatings, active seismic isolation, etc.; (4) the successful conclusion of a 35 year effort to simulate binary black holes on the computer; (5) development of sophisticated, new data analysis methods to tease a waveform from noisy data; (6) the growth of the field of gravitational wave science from a handful of practitioners to the more than 1000 authors on the detection paper; and finally (7) the (nearly) unwavering support of the National Science Foundation. The first detection was followed by a second one in this first "science run" and soon another science run will begin. I will end with discussion of the future — more binary black holes, other sources of gravitational waves and what we might learn, instrument upgrades, new facilities — and other ways to detect gravitational waves — from space and from monitoring millisecond pulsars.

Wave breaking in deep oceans is a challenge that still defies complete scientific understanding. Sailors know that at wind speeds of approximately 5 m/sec, the random-looking windblown surface begins to develop patches of white foam ('whitecaps') near sharply angled wave crests. We idealize such a sea locally by a family of close-to-maximum-amplitude Stokes waves and show, using highly accurate simulation algorithms based on a conformal map representation, that perturbed Stokes waves develop the universal feature of an overturning plunging jet. We analyze both the case in which surface tension is absent and the case in which it is present. In the latter case, we show the plunging jet is regularized by capillary waves which rapidly become nonlinear Crapper waves, in whose trough pockets whitecaps may be spawned.

The "particle-in-cell" (PIC) method is a technique for solving kinetic PDEs that has been a standard simulation tool in plasma physics for 50 years. Originally, the method was an attempt to circumvent the curse of dimensionality when solving high-dimensional kinetic PDEs by combining particle- and grid-based representations. The technique has been enormously successful in many regards, but even today, generating a quantitatively accurate solution in complex, three-dimensional geometry requires many hours on a massively parallel machine. Two prominent reasons for the massive complexity of PIC schemes are the statistical noise introduced by the particle representation and the fact that multiple disparate physical time-scales necessitate taking enormous numbers of time-steps. We present approaches to circumventing each of these difficulties. First, we propose the use of 'sparse grids' (see e.g. Griebel et al., 1990) to estimate grid-based quantities from particle information. We show that this can dramatically reduce statistical errors while only increasing grid-based error by a logarithmic factor. Second, we present a multilevel-in-time technique in the spirit of the multilevel Monte Carlo (MLMC) method (see e.g. Giles, 2008). The idea is to combine information from simulations using many particles and a large time step on the one hand with simulations using few particles and a small time step on the other. This is done in such a way as to generate a new solution that mimics one with many particles and a small time-step, but at dramatically reduced cost. Scalings of the computational complexity of PIC codes using each of these approaches will be discussed, and proof-of-principle results will be presented from solving the 4-D Vlasov-Poisson PDE. Finally, we will discuss the prospects for combining the two approaches, parallel issues, and other future directions.
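
The two-level combination at the heart of the MLMC-style idea can be sketched abstractly (toy stand-ins for the fine and coarse simulations, not a PIC code): many cheap coarse samples estimate the bulk of the expectation, and a few coupled fine/coarse samples correct the bias.

```python
import numpy as np

# Two-level Monte Carlo sketch in the spirit of Giles (2008):
#   E[P_fine] ~ mean(P_coarse over many samples)
#             + mean(P_fine - P_coarse over few coupled samples).
# P_fine(w) = w and P_coarse(w) = w + 0.1 are toy stand-ins for the
# same functional computed with small and large time steps.

rng = np.random.default_rng(0)

w0 = rng.standard_normal(200000)          # many cheap coarse samples
w1 = rng.standard_normal(2000)            # few coupled fine/coarse samples

coarse = lambda w: w + 0.1                # biased, cheap
fine = lambda w: w                        # unbiased, expensive

est = coarse(w0).mean() + (fine(w1) - coarse(w1)).mean()
assert abs(est - 0.0) < 0.05              # true E[P_fine] = 0

# The correction is evaluated on *coupled* samples (same w1), so its
# variance is small; most of the work goes into the cheap coarse level.
```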

Computational nanophotonics is one of the central tools of the science of light and photonic device engineering. It plays a crucial role in enabling optical technologies ranging from bio-sensing to quantum information processing. Up to the present, a plethora of techniques and commercial software founded on conventional computational electromagnetics methods have been developed. After a brief review of previous work based on the innovative methods of transformation optics, I will present a new class of elliptic omnidirectional concentrators focusing light on a disk, a thin strip, or a rod. This study expands the theory of a circular omnidirectional concentrator—an ‘optical black hole’—previously developed by our team, and then experimentally demonstrated at microwave and optical spectral bands and in acoustics. Our ray-tracing and full-wave simulations of the new elliptic designs show flawless focusing and absorbing performance over all acceptance angles.

Kernel-based non-linear dimensionality reduction methods, such as Local Linear Embedding (LLE) and Laplacian Eigenmaps, rely heavily upon pairwise distances or similarity scores, with which one can construct and study a weighted graph associated with the data set. When each individual data object carries structural details, the correspondence relations between these structures provide additional information that can be leveraged for studying the data set using the graph. In this talk, I will introduce the framework of Horizontal Diffusion Maps (HDM), a generalization of Diffusion Maps in manifold learning. This framework models a data set with pairwise structural correspondences as a fibre bundle equipped with a connection. We further demonstrate the advantage of incorporating such additional information and study the asymptotic behavior of HDM on general fibre bundles. In a broader context, HDM reveals the sub-Riemannian structure of high-dimensional data sets and provides a nonparametric learning framework for data sets with structural correspondences. More generally, it can be viewed as a geometric realization of synchronization problems. A synchronization problem for a group $G$ and a graph $\Gamma=\left(V, E\right)$ searches for an assignment of elements in $G$ to edges of $\Gamma$ so that the overall configuration minimizes an energy functional under certain compatibility constraints; it is essentially a generalization to the non-commutative setting of the little Grothendieck problem. In this talk, I will also explain some recent work on the cohomological nature of this type of problem. Our interest in synchronization and diffusion geometry arises from the emerging field of automated geometric morphometrics. At present, evolutionary anthropologists using physical traits to study evolutionary relationships among living and extinct animals analyze morphological data extracted from carefully defined anatomical landmarks. Identifying and recording these landmarks is time consuming and can be done accurately only by trained morphometricians. This necessity renders these studies inaccessible to non-morphologists and causes phenomics to lag behind genomics in elucidating evolutionary patterns. This talk will also cover the application of our work to the automation of this morphological analysis in a landmark-free manner.
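
For readers unfamiliar with the construction that HDM generalizes, here is a minimal sketch of vanilla Diffusion Maps (no fibre/connection data; parameters illustrative): a Gaussian kernel on pairwise distances is row-normalized into a Markov matrix, whose nontrivial eigenvectors give the embedding.

```python
import numpy as np

# Vanilla Diffusion Maps sketch -- the base construction that HDM
# extends with fibre-bundle/connection data.  Build a Gaussian kernel
# from pairwise distances, row-normalize it into a Markov matrix, and
# embed with its leading nontrivial eigenvectors.

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
pts = np.c_[np.cos(theta), np.sin(theta)]           # points on a circle

d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)                               # illustrative bandwidth
P = K / K.sum(axis=1, keepdims=True)                # row-stochastic matrix

w, V = np.linalg.eig(P)
order = np.argsort(-w.real)
w = w[order].real
assert abs(w[0] - 1.0) < 1e-10                      # stationary eigenvalue
assert w[1] < 1.0 - 1e-6                            # spectral gap
# The eigenvectors for w[1], w[2] give a 2-D diffusion embedding that
# recovers the circular parameterization of the data.
```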

Nonlinear evolution PDEs are a central topic in mathematical research, not only due to their inner beauty and complexity but also thanks to their broad range of real-world applications, from physics and biology to finance and economics. The first part of this talk is devoted to a new approach developed in collaboration with A.S. Fokas and A. Himonas for the well-posedness of initial-boundary value problems for such PDEs in one spatial dimension. In particular, it is shown that the nonlinear Schrödinger (NLS) and the Korteweg-de Vries (KdV) equations are well-posed on the half-line with data in appropriate Sobolev spaces. The second part of the talk is concerned with the initial value problem for a nonlocal, nonlinear evolution PDE of Camassa-Holm type with cubic nonlinearity, which is integrable, admits periodic and non-periodic multi-peakon traveling wave solutions, and can be derived as a shallow water approximation to the celebrated Euler equations. Finally, the third part of the talk addresses a long-standing open question, namely the nonlinear stage of modulational instability (a.k.a. Benjamin-Feir instability), which is one of the most ubiquitous phenomena in nonlinear science. For all those physical systems governed by the focusing NLS equation, a precise characterization of the nonlinear stage of modulational instability is obtained by computing explicitly the long-time asymptotic behavior of the relevant initial value problem formulated with nonzero boundary conditions at infinity.

Many physical systems admit mathematical models from contact geometry, and symmetries of the corresponding geometric structure provide the modeler with insights that can be obtained in no other way. In this talk I will introduce contact geometry through a selection of examples arising from fluid mechanics, Hamiltonian dynamics, and Riemannian geometry. Finally, because contact geometry is defined using the language of differential forms, it may seem appropriate only for those problems that admit smooth formulations; however, if time permits, I will also explain the extension of smooth contact dynamics to topological dynamics.

The relatively recent introduction of viscosity solutions and the Barles-Souganidis convergence framework have allowed for considerable progress in the numerical solution of fully nonlinear elliptic equations. Convergent, wide-stencil finite difference methods now exist for a variety of problems. However, these schemes are defined only on uniform Cartesian meshes over a rectangular domain. We describe a framework for constructing convergent meshfree finite difference approximations for a class of nonlinear elliptic operators. These approximations are defined on unstructured point clouds, which allows for computation on non-uniform meshes and complicated geometries. Because the schemes are monotone, they fit within the Barles-Souganidis convergence framework and can serve as a foundation for higher-order filtered methods. We present computational results for several examples including problems posed on random point clouds, computation of convex envelopes, obstacle problems, Monge-Ampère equations, and non-continuous solutions of the prescribed Gaussian curvature equation.
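
The role of monotonicity can be seen in a one-dimensional toy (far simpler than the meshfree schemes of the talk): the convex envelope solves an obstacle problem, and the monotone update u ← min(g, average of neighbors) converges to it from above.

```python
import numpy as np

# 1-D illustration of a monotone scheme for the convex envelope:
# iterate u <- min(g, (u_left + u_right)/2), the obstacle-problem
# formulation of the convex envelope.  (A toy; the talk's meshfree
# schemes handle far more general operators and geometries.)

x = np.linspace(-1.5, 1.5, 61)
g = (x**2 - 1.0) ** 2                     # double well; envelope is 0 on [-1, 1]
u = g.copy()
for _ in range(20000):
    avg = 0.5 * (u[:-2] + u[2:])
    u[1:-1] = np.minimum(g[1:-1], avg)    # monotone update

assert np.all(u <= g + 1e-12)                          # stays below the obstacle
assert np.all(u[:-2] + u[2:] - 2 * u[1:-1] >= -1e-8)   # discrete convexity
assert u[30] < 1e-3                                    # envelope ~ 0 at x = 0
```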

Most dynamical processes are continuous, whereas in experiments signals are often measured in the form of discrete spatiotemporal series, and conclusions are drawn by analyzing these sampled signals. In this talk, I will use two examples to show how different samplings may lead to artifacts in data processing, and provide corresponding approaches to extract the intrinsic properties of the underlying continuous processes. The first example is about analyzing spatiotemporal activities measured by voltage-sensitive-dye-based optical imaging in the primary visual cortex of the awake monkey. Through computational modeling, we show that our model can well capture the phenomena observed in experiment and can separate them from the statistical effects arising from spatial averaging procedures in experiment. The second example is about analyzing Granger causality for information flow within continuous dynamical processes. We show that different sampling rates may potentially yield incorrect causal inferences, and that such sampling artifacts can be present for both linear and nonlinear processes. We show how such hazards lead to incorrect network reconstructions and describe a strategy to obtain a reliable Granger causality inference.
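
As background on the quantity whose sampling-rate pitfalls the talk analyzes (a textbook estimator, not the speaker's code), pairwise Granger causality compares the residual variance of an autoregression on a series' own past against one that also uses the other series' past:

```python
import numpy as np

# Pairwise Granger causality sketch: regress the target on its own past
# (restricted model) and on the past of both series (full model); the
# statistic is log(var_restricted / var_full).  Toy VAR in which x
# drives y, sampled at the process's native rate.

def gc(source, target, p=1):
    T = len(target)
    past_t = target[p - 1:T - 1, None]
    past_s = source[p - 1:T - 1, None]
    y = target[p:]
    res = lambda X: y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    v_restricted = np.var(res(past_t))
    v_full = np.var(res(np.hstack([past_t, past_s])))
    return np.log(v_restricted / v_full)

rng = np.random.default_rng(0)
T = 5000
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

assert gc(x, y) > gc(y, x)       # true direction dominates at full rate
assert gc(x, y) > 0.1
```

The talk's point is that re-running such an analysis on a subsampled version of the same processes can reverse or obscure these inferences.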

Oscillations in the brain are associated with learning, memory, and other cognitive functions. Evidence shows that inhibitory neurons play an important role in brain oscillations. Yet, how various types of inhibitory neurons contribute to the generation of oscillations remains unclear. Here we address the issue of what mathematical tools can be used to reveal the information flow accompanying oscillations in the brain. By recording inhibitory neurons in the hippocampus of freely behaving mice and using time-delayed mutual information, we identify two classes of inhibitory neurons whose firing activities share high mutual information with the slow theta band (4-12 Hz) and the fast ripple band (100-250 Hz) of the local field potential, respectively. Information flow direction further suggests their distinct contributions to theta and ripple oscillations. In contrast, Granger causality analysis fails here to infer the causality between the activities of inhibitory neurons and hippocampal oscillations.
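
The time-delayed mutual information statistic can be sketched with a histogram estimator on toy signals (not neural recordings): for a target that is a delayed copy of the source, the mutual information as a function of lag peaks at the true delay.

```python
import numpy as np

# Time-delayed mutual information sketch (histogram estimator): for two
# signals where y is a delayed copy of x, I(x_t; y_{t+lag}) peaks at the
# true delay.  Toy signals only, standing in for spike trains and
# field-potential bands.

def mutual_info(a, b, bins=8):
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
delay = 3
y = np.roll(x, delay)                     # y lags x by 3 samples

mis = [mutual_info(x[:len(x) - L], y[L:]) for L in range(8)]
assert int(np.argmax(mis)) == delay       # MI peaks at the true delay
```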

A large variety of observable phenomena are mathematically described as transitions between metastable states in a system with many degrees of freedom, such as magnetization reversals. Metastability refers to the system spending extended periods of time, relative to its natural time scale, in localized regions of phase space, transiting infrequently between them. As a toy system for a nanomagnet, I investigate a Langevin equation which limits to a stochastic partial differential equation as the dimension goes to infinity. Consistent with an energy-barrier viewpoint, I show how time-scale separation averaging can be used to describe mean transition times in a low-dimensional, low-damping regime. For the infinite-dimensional system, I show how metastability can be explained by an entropic barrier in phase space, even though transition times remain exponential in the energy-barrier height. The difference lies in the prefactor of the exponential term, which depends on an effective dimension of the system.

As events of the past decade have tragically demonstrated, tsunamis pose a major risk to coastal populations around the world. Numerical modeling is an important tool in better understanding past tsunamis and their geophysical sources, in real-time warning and evacuation, and in assessing hazards and mitigating the risk of future tsunamis. I will discuss a variety of techniques from adaptive mesh refinement to probabilistic hazard analysis that are being used for tsunamis and related geophysical hazards.

### 2015

Cancer is a class of diseases that are characterized by abnormal cell growth and the ability to spread to other parts of the body. Different combinations of genetic mutations cause different types of cancer, and identifying the combinations of mutations responsible for cancer is essential for finding more effective treatments. Identifying these mutations, which necessitates separating driver mutations from a much larger number of passenger mutations, is a difficult task. However, the advent of inexpensive next-generation sequencing techniques, coupled with the development of novel algorithms that incorporate areas of biology, computer science, and mathematics, provides the potential for more personalized and more targeted cancer treatments. In this talk, we first briefly review the biology of cancer. We then overview various computational methods for identifying driver mutations in cancer along with their mathematical motivations. We finally explore our work on a particular computational technique for identifying groups of driver mutations using biological networks and mutation data.

Problems governed by wave propagation span much of the physical phenomena we experience. Thus the development of better tools for simulating waves has the potential for significant impact. Crucial components of an efficient time-domain solver are robust high-resolution volume discretizations applicable in complex geometry. Our focus is on high-order energy stable volume discretization methods applicable on hybrid grids. In particular we will discuss a new formulation of upwind discontinuous Galerkin methods for wave equations in second order form, Galerkin methods on structured grids, and methods built from Hermite interpolation.

For large-scale nonsmooth convex optimization problems, first-order methods involving only subgradients are usually used thanks to their scalability in the problem size. Douglas-Rachford (DR) splitting is one of the most popular first-order methods in practice. It is well known that DR applied to the dual problem is equivalent to the alternating direction method of multipliers (ADMM), widely used in nonlinear mechanics, and to the split Bregman method in the image processing community. When DR is applied to convex optimization problems such as compressive sensing, one question of practical interest is how the parameters in DR affect performance. We will show an explicit formula for the sharp asymptotic convergence rate of DR for simple L1 minimization. The analysis will be verified on examples of processing seismic data in the Curvelet domain. This is joint work with Prof. Laurent Demanet at MIT.
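
For orientation, a minimal DR iteration for basis pursuit, min ‖x‖₁ subject to Ax = b, alternates the ℓ₁ prox (soft-thresholding) with projection onto the affine constraint set. The step size γ = 1 and the tiny problem below are illustrative; it is precisely the dependence on such parameters that the sharp-rate analysis quantifies.

```python
import numpy as np

# Minimal Douglas-Rachford iteration for basis pursuit,
#   min ||x||_1  subject to  A x = b,
# alternating soft-thresholding (the l1 prox) with projection onto the
# affine set {x : A x = b}.  Illustrative toy problem and step size.

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

A = np.array([[1.0, 1.0]]); b = np.array([1.0])
AAT_inv = np.linalg.inv(A @ A.T)
proj = lambda v: v - A.T @ (AAT_inv @ (A @ v - b))   # affine projection

gamma = 1.0
z = np.zeros(2)
for _ in range(100):
    x = soft(z, gamma)                    # prox of gamma * ||.||_1
    z = z + proj(2 * x - z) - x           # DR update

assert np.allclose(x, [0.5, 0.5], atol=1e-8)   # an l1-minimal feasible point
assert abs(np.abs(x).sum() - 1.0) < 1e-8       # optimal l1 value
```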

Computer-assisted or automated analysis of atomic-scale-resolution images of polycrystalline materials has important applications in characterizing and understanding material microstructure. In this talk, we will discuss some recent progress in crystal image analysis using 2D synchrosqueezed transforms combined with variational approaches. This talk is based on joint works with Benedikt Wirth, Haizhao Yang and Lexing Ying.