### 2021

Abstract: We propose a deep learning approach for wave propagation in media with multiscale wave speed, using a second-order linear wave equation model. We use neural networks to enhance the accuracy of a given inaccurate coarse solver, which under-resolves a class of multiscale wave media and wave fields of interest. Our approach involves generating training data by the given computationally efficient coarse solver and another sufficiently accurate solver, applied to a class of wave media (described by their wave speed profiles) and initial wave fields. We find that the trained neural networks can approximate the nonlinear dependence of the propagation on the wave speed as long as the causality is appropriately sampled in training data. We combine the neural-network-enhanced coarse solver with the parareal algorithm and demonstrate that the coupled approach improves the stability of parareal algorithms for wave propagation and improves the accuracy of the enhanced coarse solvers.

Abstract: Reinforcement Learning (RL) has attracted considerable interest from both industry and academia recently. The study of RL algorithms with provable rates of convergence, however, is still in its infancy. In this talk, we discuss some recent progresses on the solutions of two fundamental RL problems, i.e., stochastic policy evaluation and policy optimization. For policy evaluation, prior investigations in the literature focused on temporal difference (TD) learning by employing nonsmooth finite time analysis motivated by stochastic subgradient descent leading to certain limitations. These encompass the requirement of analyzing a modified TD algorithm that involves projection to an a-priori defined Euclidean ball, achieving a non-optimal convergence rate and no clear way of deriving the beneficial effects of parallel implementation. We address all these issues by developing novel analysis of TD, and introducing new algorithms, including conditional TD algorithm (CTD) and fast FD (FTD) algorithm to achieve the best-known so-far convergence rate for policy evaluation...

Abstract: Operator splitting methods for convex optimization and monotone inclusions have their roots in the solution of partial differential equations, and have since become popular in machine learning and image processing applications. Their application to "operations-research-style" optimization problems has been somewhat limited.

A notable exception is their application to stochastic programming. In a paper published in 1991, Rockafellar and Wets proposed the progressive hedging (PH) algorithm to solve large-scale convex stochastic programming problems. Although they proved the convergence of the method from first principles, it was already known to them that PH was an operator splitting method.

This talk will present a framework for convex stochastic programming and show that applying the ADMM (and thus Douglas-Rachford splitting) to it yields the PH algorithm. The equivalence of PH to ADMM has long been known but not explicitly published.

Next, the talk will apply the projective splitting framework of Combettes and Eckstein to the same formulation, yielding a method which is similar to PH but can be implemented in an aynchronous manner. We call this method "asynchronous projective hedging" (APH). Unlike most decomposition methods, it does not need to solve every subproblem at every iteration; instead, each iteration may solve just a single subproblem or a small subset of the available subproblems.

Finally, the talk will describe work integrating the APH algorithm into mpi-sppy, a Python package for modeling and distributed parallel solution of stochastic programming problems. Mpi-sppy uses the Pyomo Python-based optimization modeling sytem. Our experience includes using 8,000 processor cores to solve a test problem instance with 1,000,000 scenarios.

Portions of the work described in this talk are joint with Patrick Combettes (North Carolina State University), Jean-Paul Watson (Lawrence Livermore National Laboratory, USA), and David Woodruff (University of California, Davis).

### 2020

Abstract:

This talks reviews augmented Lagrangian type methods for solving cone convex (and hence including linearly) constrained

nonconvex composite optimization (CCC-NCO) problems and discusses their corresponding complexity results. It then describes

a more recent inner accelerated proximal inexact augmented Lagrangian (NL-IAPIAL) method for solving the CCC-NCO problem which

consists of: a) applying an inexact proximal point method to approximately solve a sequence of quadratic penalty subproblems

for increasing values of the penalty parameter, and; b) solving all the generated prox subproblems by an accelerated composite

gradient (ACG) method. It is shown that the method, started from any infeasible point $x_0$ generates a $\rho$-approximate

stationary point in ${\cal O}(\rho^{-3})$ ACG iterations. This improves upon results obtained for previous algorithms in that:

1) it allows for the presence of a composite term in the objective function; 2) the feasible region does not have to be bounded;

3) the initial point does not have to be feasible; and 4) substantially reduces the previously known iteration complexity of

${\cal O}(\rho^{-6})$.

Abstract: Deep representation learning seeks to learn a data representation that transfers to downstream tasks. In this talk, we study two forms of representation learning: supevised pre-training and self-supervised learning.

Supervised pre-training uses a large labled source dataset to learn a representation, then trains a classifier on top of the representation. We prove that supervised pre-training can pool the data from all source tasks to learn a good representation which transfers to downstream tasks with few labeled examples.

Self-supervised learning creates auxilary pretext tasks that do not require labeled data to learn representations. These pretext tasks are created solely using input features, such as predicting a missing image patch, recovering the color channels of an image, or predicting missing words. Surprisingly, predicting this known information helps in learning a representation effective for downtream tasks. We prove that under a conditional independence assumption, self-supervised learning provably learns representations.

Abstract: The quadratic Wasserstein distance has recently been proposed as an alternative to the classical $L^2$ distance for measuring data mismatch in computational inverse problems. Extensive computational evidences showing the advantages of using the Wasserstein distance has been reported. The objective of this talk is to provide some simple observations that might help us explain the numerically-observed differences between results based on the quadratic Wasserstein distance and those based on the $L^2$ distance for general linear and nonlinear inverse problems.

Abstract: We will consider the inverse problem of determining the sound speed or index of refraction of a medium by measuring the travel times of waves going through the medium. This problem arises in global seismology in an attempt to determine the inner structure of the Earth by measuring travel times of earthquakes. It has also several applications in optics and medical imaging among others.The problem can be recast as a geometric problem: Can one determine the Riemannian metric of a Riemannian manifold with boundary by measuring the distance function between boundary points? This is the boundary rigidity problem. The linearization of this problems involves the integration of a tensor along geodesics, similar to the X-ray transform. We will also describe some recent results, joint with Plamen Stefanov and Andras Vasy, on the partial data case, where you are making measurements on a subset of the boundary. No previous knowledge of Riemannian geometry will be assumed.

Abstract: Stochastic gradient and related methods for solving stochastic optimization problems have been studied extensively in recent years. In addition, it has been shown that such algorithms and much of their convergence and complexity guarantees extend in straightforward ways when one considers problems involving simple constraints, such as when one can perform projections onto the feasible region of the problem. However, settings with general nonlinear constraints have received less attention, and many of the approaches that have been proposed for solving such problems resort to using penalty or (augmented) Lagrangian methods, which are often not the most effective strategies. In this work, we propose and analyze stochastic optimization methods based on the sequential quadratic optimization (commonly known as SQP) methodology. We discuss advantages and disadvantages of such techniques. This is joint work with Albert Berahas, Daniel P. Robinson, and Baoyu Zhou.