The Computer Science Colloquium series is held each semester and is sponsored by the Computer Science department. Faculty invite speakers from all areas of computer science, and the talks are open to all members of the RPI community.
The phenomenon of spreading is very broad: it can mean the spreading of electromagnetic waves, wind-blown seeds, diseases, or information. Spreading can also happen in various environments, from simple spatial ones (like waves on water) to complicated networks (like information in society). Quite often it is evident that something has spread, but it is not directly known where it came from. For physical phenomena that spread through space at constant velocity, simple triangulation may suffice to pinpoint the source. But if the process unfolds on a complex network and is so complex in nature that it can only be described stochastically (as with epidemics or information spreading), then finding the source becomes a much harder problem.
Both epidemics and information spreading share a common property: no total mass, energy, count, or volume is conserved, as it is, for example, in spreading waves (energy) or diffusing substances (mass/volume). Because of that, both can be modeled in a similar way, for example by a basic SI (Susceptible-Infected) model. The presentation will describe some existing methods for finding the sources of such spreading processes, focus on a maximum-likelihood method introduced by Pinto et al., and describe derivative methods that offer much better performance as well as improved accuracy.
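To make the SI dynamics concrete, here is a minimal sketch of a synchronous SI simulation on an arbitrary contact network (plain Python; the toy ring graph, infection probability `beta`, and seed node are illustrative assumptions, not code from the talk):

```python
import random

def si_step(adj, infected, beta, rng):
    """One synchronous SI update: each infected node independently
    infects each susceptible neighbor with probability beta."""
    new = set(infected)
    for u in infected:
        for v in adj[u]:
            if v not in infected and rng.random() < beta:
                new.add(v)
    return new

def simulate_si(adj, source, beta=0.3, steps=50, seed=0):
    """Run SI dynamics from a single source; return the final infected set.
    Infected nodes never recover, so the set only grows."""
    rng = random.Random(seed)
    infected = {source}
    for _ in range(steps):
        infected = si_step(adj, infected, beta, rng)
    return infected

# Toy contact network: a 6-node ring.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(sorted(simulate_si(adj, source=0, beta=1.0, steps=3)))  # [0, 1, 2, 3, 4, 5]
```

With `beta=1.0` the infection advances one hop per step along shortest paths, which is exactly the deterministic limit the localization method below builds on.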
Assume we have a network in which information or an epidemic has spread, and we only know when it arrived at certain specific points in the network (which we call observers). We further assume that spreading always happens along shortest paths from the source to any node, and that the delays on links can be approximated by a normal distribution. It is then possible, for each potential source in the network (in general, every node), to calculate the expected arrival times as well as the variances and covariances of these times. For each node we therefore have a multivariate normal distribution of arrival times at all observers. Comparing these distributions with the actual observations, we can find the source whose distribution best fits the observed arrival times.
One drawback of this method is that for a large number N of nodes in the network, the computational cost becomes prohibitive, with complexity up to O(N^4). We have proposed a derivative method that limits the set of observers considered and makes a smart choice of suspected nodes. We take into account only the observers closest to the real source (those with the earliest observed infection times), greatly decreasing the computational complexity. Instead of calculating the distribution for every node, we start at the closest observer and follow a gradient of increasing likelihood. These changes not only greatly improve performance (worst-case complexity O(N^2 log N)) but also increase accuracy on scale-free network topologies.
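A simplified sketch of the candidate-scoring idea follows (Python, for illustration only). It replaces the full multivariate-normal likelihood of Pinto et al. with an identity covariance, so maximizing the likelihood reduces to least squares between expected and observed arrival times; the graph, observers, and delays are assumed toy values:

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src along shortest paths (unit edge length)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def locate_source(adj, observed, mean_delay=1.0):
    """Score every candidate node by the squared error between expected
    arrival times (shortest-path distance * mean link delay) and the
    observed times at the observers; return the best-fitting candidate.
    Simplification: with identity covariance, maximum likelihood for a
    multivariate normal reduces to this least-squares fit."""
    best, best_err = None, float("inf")
    for cand in adj:
        dist = bfs_dist(adj, cand)
        err = sum((mean_delay * dist.get(o, float("inf")) - t) ** 2
                  for o, t in observed.items())
        if err < best_err:
            best, best_err = cand, err
    return best

# Toy graph: path 0-1-2-3-4 with a branch 2-5; the true source is node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
observed = {0: 2.0, 4: 2.0, 5: 1.0}  # noiseless arrival times from node 2
print(locate_source(adj, observed))  # 2
```

Even this simplified version scans every candidate node; the full method additionally computes a covariance matrix per candidate, which is what drives the prohibitive costs, and the gradient-following variant described above avoids scoring all nodes.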
Krzysztof Suchecki is an assistant professor at Warsaw University of Technology, Faculty of Physics. He has MSc and PhD degrees in physics. His research topics focus on dynamics of and on complex networks, such as Ising and voter model dynamics, including co-evolution of network structure and dynamical node states. His current research is focused on spreading of information in networks and methods to identify sources of information.
enabled this expansion, and how an increasing community of researchers has made it possible.
Julian Dolby is a Research Staff Member at the IBM Thomas J. Watson Research Center, where he works on a range of topics, including static program analysis, software testing, concurrent programming models and the semantic Web. He is one of the primary authors of the publicly available Watson Libraries for Analysis (WALA) program analysis infrastructure, and his recent WALA work has focused on creating the WALA Mobile infrastructure.
Complex systems typically display a modular structure, as modules are easier to assemble than the individual units of the system and more resilient to failures. In the network representation of complex systems, modules, or communities, appear as subgraphs whose nodes have an appreciably larger probability of being connected to each other than to other nodes of the network. In this talk I will address three fundamental questions: How is community structure generated? How can it be detected? How can the performance of community detection algorithms be tested? I will show that communities emerge naturally in growing network models favoring triadic closure, a mechanism necessary for generating large classes of systems, e.g. social networks. I will discuss the limits of the most popular class of clustering algorithms, those based on the optimization of a global quality function, such as modularity maximization. Testing algorithms is probably the single most important issue in network community detection, as it implicitly involves the concept of community, which is still controversial. I will discuss the importance of using realistic benchmark graphs with built-in community structure, as well as the role of metadata.
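As a small illustration of the global quality function mentioned above, the following sketch computes Newman's modularity Q for a given partition (evaluating the function, not optimizing it; the two-triangle graph is an assumed toy example, not from the talk):

```python
def modularity(edges, communities):
    """Newman modularity Q = sum_c [ L_c/m - (d_c/(2m))^2 ], where L_c is
    the number of intra-community edges and d_c the total degree of
    community c. Q is high when communities hold many more internal
    edges than a degree-preserving random graph would predict."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for comm in communities:
        nodes = set(comm)
        l_c = sum(1 for u, v in edges if u in nodes and v in nodes)
        d_c = sum(degree[n] for n in nodes)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: a clear two-community graph.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 4))  # 0.3571
```

Modularity maximization searches over partitions for the one with the largest Q; the talk's point is that this class of algorithms has intrinsic limits (e.g. a resolution scale) despite its popularity.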
The next generation of complex engineered systems will be endowed with sensors and computing capabilities that enable new design concepts and new modes of decision-making. For example, new sensing capabilities on aircraft will be exploited to assimilate data on system state, make inferences about system health, and issue predictions on future vehicle behavior---with quantified uncertainties---to support critical operational decisions. However, data alone is not sufficient to support this kind of decision-making; our approaches must exploit the synergies of physics-based predictive modeling and dynamic data. This talk describes our recent work in adaptive and multifidelity methods for optimization under uncertainty of large-scale problems in engineering design. We combine traditional projection-based model reduction methods with machine learning methods, to create data-driven adaptive reduced models. We develop multifidelity formulations to exploit a rich set of information sources, using cheap approximate models as much as possible while maintaining the quality of higher-fidelity estimates and associated guarantees of convergence.
Karen E. Willcox is Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology. She is also Co-Director of the MIT Center for Computational Engineering and formerly the Associate Head of the MIT Department of Aeronautics and Astronautics. Before joining the faculty at MIT, she worked at Boeing Phantom Works with the Blended-Wing-Body aircraft design group. Willcox is currently Co-director of the Department of Energy DiaMonD Multifaceted Mathematics Capability Center on Mathematics at the Interfaces of Data, Models, and Decisions, and she leads an Air Force MURI on optimal design of multi-physics systems. She is also active in education innovation, serving as co-Chair of the MIT Online Education Policy Initiative, co-Chair of the 2013-2014 Institute-wide Task Force on the Future of MIT Education, and lead of the Fly-by-Wire project developing blended learning technology as part of the Department of Education's First in the World program.
A central theme in mechanism design is understanding the tradeoff between the simplicity and the optimality of the designed mechanism. An important and challenging task here is to design simple multi-item mechanisms that approximately optimize revenue, as the revenue-optimal mechanisms are known to be extremely complex and thus hard to implement. Recently, we have witnessed several breakthroughs on this front, obtaining simple and approximately optimal mechanisms when the buyers have unit-demand (Chawla et al. '10) or additive (Yao '15) valuations. Although these two settings are relatively similar, the techniques employed in these results are completely different and seemed difficult to extend to more general settings. In this talk, I will present a principled approach to designing simple and approximately optimal mechanisms based on duality theory. Our approach unifies and improves both of the aforementioned results, and extends these simple mechanisms to broader settings, e.g. multiple buyers with XOS valuations.
Based on joint work with Nikhil R. Devanur, Matt Weinberg and Mingfei Zhao.
Yang Cai is a William Dawson Assistant Professor of Computer Science at McGill University. He received his PhD in computer science from MIT, advised by Costis Daskalakis, and was a postdoctoral researcher at UC Berkeley. Yang's research interests lie in the area of theoretical computer science, in particular algorithmic game theory, online algorithms and logic.
Abstract: Although the concept of ransomware is not new (such attacks date back at least as far as the 1980s), this type of malware has recently experienced a resurgence in popularity. In fact, in 2014 and 2015, a number of high-profile ransomware attacks were reported, such as the large-scale attack against Sony that prompted the company to delay the release of the film "The Interview". Ransomware typically operates by locking the desktop of the victim to render the system inaccessible to the user, or by encrypting, overwriting, or deleting the user's files. However, while many generic malware detection systems have been proposed, none of these systems have attempted to specifically address the ransomware detection problem. In this keynote, I talk about some of the trends we are seeing in ransomware. Then, I present a novel dynamic analysis system called UNVEIL that is specifically designed to detect ransomware. The key insight of the analysis is that in order to mount a successful attack, ransomware must tamper with a user's files or desktop. UNVEIL automatically generates an artificial user environment and detects when ransomware interacts with user data. In parallel, the approach tracks changes to the system's desktop that indicate ransomware-like behavior. Our evaluation shows that UNVEIL significantly improves on the state of the art and is able to identify previously unknown evasive ransomware that was not detected by the anti-malware industry.
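As a toy illustration of that key insight (not UNVEIL's actual implementation), one can plant decoy "user" files, fingerprint them, and flag anything that encrypts, overwrites, or deletes them; the file names and contents below are invented for this sketch:

```python
import hashlib
import os
import tempfile

def plant_decoys(directory, count=3):
    """Create decoy user files and record a SHA-256 fingerprint of each,
    mimicking an artificial user environment."""
    baseline = {}
    for i in range(count):
        path = os.path.join(directory, f"document_{i}.txt")
        with open(path, "wb") as f:
            f.write(f"decoy contents {i}".encode())
        with open(path, "rb") as f:
            baseline[path] = hashlib.sha256(f.read()).hexdigest()
    return baseline

def detect_tampering(baseline):
    """Return the decoy paths that were modified or deleted since planting,
    the signature of ransomware-like activity."""
    tampered = []
    for path, digest in baseline.items():
        if not os.path.exists(path):
            tampered.append(path)
        else:
            with open(path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != digest:
                    tampered.append(path)
    return tampered

with tempfile.TemporaryDirectory() as d:
    baseline = plant_decoys(d)
    print(detect_tampering(baseline))  # [] -> nothing touched yet
    victim = sorted(baseline)[0]
    with open(victim, "wb") as f:       # simulate ransomware overwriting a file
        f.write(b"ENCRYPTED!")
    print(len(detect_tampering(baseline)))  # 1
```

A real system would additionally monitor which process performed the writes and watch for desktop-locking behavior, but the decoy-plus-fingerprint loop captures the core detection idea.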
Bio: Engin Kirda is a professor of computer science at Northeastern University in Boston. Before that, he held faculty positions at Institute Eurecom on the French Riviera and at the Technical University of Vienna, where he co-founded the Secure Systems Lab that is now distributed over five institutions in Europe and the United States. Professor Kirda's research has focused on malware analysis (e.g., Anubis, Exposure, and Fire) and detection, web application security, and practical aspects of social networking security. He has co-authored more than 100 peer-reviewed scholarly publications and served on the program committees of numerous well-known international conferences and workshops. Professor Kirda was the program chair of the International Symposium on Recent Advances in Intrusion Detection (RAID) in 2009, the program chair of the European Workshop on Systems Security (Eurosec) in 2010 and 2011, the program chair of the well-known USENIX Workshop on Large Scale Exploits and Emergent Threats in 2012, and the program chair of the flagship security conference, the Network and Distributed System Security Symposium (NDSS), in 2015. In the past, Professor Kirda has consulted for the European Commission on emerging threats, and recently gave a Congressional Briefing in Washington, D.C. on advanced malware attacks and cyber-security. He also spoke at SXSW Interactive 2015 about "Malware in the Wild" and at Black Hat 2015. Besides his roles at Northeastern, Professor Kirda is a co-founder of Lastline Inc., a Silicon Valley-based company that specializes in the detection and prevention of advanced targeted malware.
Abstract: Driven by modern applications and the abundance of empirical network data, network science aims at analysing real-world complex systems arising in the social, biological and physical sciences by abstracting them into networks (or graphs). The size and complexity of real networks have produced a deep change in the way that graphs are approached, and led to the development of new mathematical and computational tools. In this talk, I will present a data-driven network-based methodology for approaching data analysis problems arising in a variety of contexts, while highlighting state-of-the-art network models including spatial, multilayer, interdependent and modular networks. I will describe different stages in the analysis of data, starting from (1) network representations of urban transportation systems, then (2) inference of structural patterns in networks and phase transitions in their detectability, and finally (3) understanding the implications of structural features for dynamical processes taking place on networks. I will conclude with a discussion of the future directions of my work and its intersection with complexity theory, machine learning and statistics.
Bio: I received my Bachelor of Science degree, cum laude, in Mathematics and Computer Science (double degree) from the Israel Institute of Technology (Technion) in 2008. I then shifted my focus towards acquiring industrial experience and worked as a Software Engineer at Diligent Technologies, an Israeli startup that was acquired by IBM. I completed my PhD in Computer Science at the University of St Andrews, UK under the guidance of Simon Dobson in 2014. During my PhD I was supported by a full prize scholarship from the Scottish Informatics and Computer Science Alliance (SICSA). In 2014 I joined Peter Mucha's group in the Department of Mathematics at the University of North Carolina at Chapel Hill as a Postdoctoral Scholar. My research has focused on the development of mathematical and computational tools for modeling and analyzing complex systems, roughly defined as large networks of simple components with no central control that give rise to emergent complex behavior. I believe that looking at data through a "network lens" provides us with useful perspectives on diverse problems, from designing optimal transportation systems and smart cities to clustering high-dimensional data. I am constantly looking for new datasets to which I can apply my "network machinery" to solve real-world problems and to inspire the development of new methodologies.
Abstract: Thanks to information technologies, online user behaviors are now recorded at an unprecedented level. This gives us an opportunity to gain insights into human behaviors and our societies from real data at a scale that makes manual analysis and inspection completely impractical, and to use those insights in building intelligent and trustworthy systems. In this talk I will discuss research problems, challenges, principles, and methodologies for developing network-based computational models for behavioral analysis. Specifically, I will present recent approaches to (1) modeling user behavior intentions with knowledge from the social and behavioral sciences for behavior prediction, recommendation, and suspicious behavior detection, (2) modeling social and spatiotemporal information for knowledge from behavioral contexts, (3) structuring behavioral content into information networks of entities and attributes, and (4) integrating structured and unstructured behavior data to support decision-making and information systems. Two results, CatchSync and MetaPAD, will be presented in detail. I will conclude with my thoughts on future directions.
Bio: Dr. Meng Jiang is a postdoctoral researcher at the University of Illinois at Urbana-Champaign. He received his Ph.D. from the Department of Computer Science at Tsinghua University in 2015 and his bachelor's degree from the same department in 2010. He visited Carnegie Mellon University in 2013 and the University of Maryland, College Park in 2016. Find more about him here: http://www.meng-jiang.com.
His research lies in the field of data mining, focusing on user behavior modeling. He has written two book chapters and delivered two conference tutorials on this topic. His Ph.D. thesis won the Dissertation Award at Tsinghua. His work on "Suspicious Behavior Detection" was selected as one of the Best Paper Finalists at KDD '14. His work on "Social Contextual Recommendation" has been deployed in the Tencent social network. The package from his work on "Attribute Discovery from Text Corpora" is being transferred to the U.S. Army Research Lab. He has published 20 papers with 560+ citations, including 15 papers with 360+ citations as first author.
Abstract: Complex systems exist in almost every aspect of science and technology. My research focuses on how to understand, predict, control, and ultimately survive real-world complex systems in an ever-changing world facing the global challenges of climate change, weather extremes, and other natural and human disasters. I will present three recent works in the field of network science and complex systems: resilience, robustness, and control. (I) Resilience, a system's ability to adjust its activity to retain its basic functionality when errors and environmental changes occur, is a defining property of many complex systems. I will show a set of analytical tools with which to identify the natural control and state parameters of a multi-dimensional complex system, helping us derive an effective one-dimensional dynamics that accurately predicts the system's resilience. The analytical results unveil the network characteristics that can enhance or diminish resilience, offering ways to prevent the collapse of ecological, biological, or economic systems, and guiding the design of technological systems resilient to both internal failures and environmental changes. (II) Increasing evidence shows that real-world systems interact with one another, and the real goal of network science should not be just to understand individual networks, but to decipher the dynamical interactions in networks of networks (NONs). Malfunction of a few nodes in one network layer can cause cascading failures and catastrophic collapse of the entire system. I will show a general theoretical framework for analyzing the robustness of, and cascading failures in, NONs. The results on NONs have been surprisingly rich, and they differ from those for single networks in that they present a new paradigm. (III) Controlling complex networks is the ultimate goal of understanding their dynamics. I will present a k-walk theory and a greedy algorithm for target control of complex networks.
Extending these three aspects of my current research, I will describe two future research directions: the universality of competitive dynamics, and the control and recovery of damaged complex systems.
Bio: Dr. Jianxi Gao is a research assistant professor in the Center for Complex Network Research at Northeastern University. Dr. Gao received his Ph.D. degree from Shanghai Jiao Tong University (2008-2012). During his Ph.D. he was a visiting scholar at Prof. H. Eugene Stanley's lab at Boston University (2009-2012). Dr. Gao's major contributions include the theory of robustness for networks of networks and the resilience of complex networks. Since 2010, Dr. Gao has published over 20 journal papers in Nature, Nature Physics, Nature Communications, Proceedings of the National Academy of Sciences, Physical Review Letters and more, with over 1,500 citations on Google Scholar. Dr. Gao has served on the editorial board of Scientific Reports, as a distinguished referee of EPL (2014-2016), and as a referee for Science, PNAS, PRL, PRX and more. His publications have been covered more than 20 times by public and professional media worldwide.
Abstract: Can the people we follow on Twitter be recommended as our potential friends on Facebook? What box office can US movies achieve in China? How do weather and nearby points of interest (POIs) affect the traffic routes planned for vehicles? For the same information entities, a large amount of information can be collected from various sources, each of which provides a specific signature of the entity from a unique underlying aspect. Effective fusion of these different information sources provides an opportunity to understand the information entities more comprehensively.
My thesis work investigates the principles, methodologies and algorithms for knowledge discovery across multiple aligned information sources, and evaluates the corresponding benefits. Fusing and mining multiple information sources of large Volumes and diverse Varieties is a fundamental problem in Big Data studies. In this talk, I will discuss information fusion and synergistic knowledge discovery, focusing on online social media, and present my algorithmic work on multi-source learning frameworks together with evaluation results. I will also provide my future vision of fusion learning for broader real-world applications at the conclusion of the talk.
Bio: Jiawei Zhang is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Chicago (UIC), under the supervision of Prof. Philip S. Yu since August 2012. Prior to joining UIC, he obtained his Bachelor's degree in Computer Science from Nanjing University, China. His research interests span the fields of Data Science, Data Mining, Network Mining, and Machine Learning. His research focuses on fusing multiple large-scale information sources of diverse varieties and carrying out synergistic data mining tasks across these fused sources in one unified analytic framework.
His fusion learning works have appeared in KDD, ICDM, SDM, ECML/PKDD, IJCAI, WWW, WSDM, CIKM, and IEEE Transactions on Knowledge and Data Engineering (TKDE). He received the Best Student Paper Runner-Up Award at ASONAM '16. He has been serving as the information specialist and director of ACM Transactions on Knowledge Discovery from Data (TKDD) since August 2014. He is also a PC member of WWW '17, KDD '16, CIKM '16, CIKM '15, and AIRS '16. Besides his academic experience at the University of Illinois at Chicago, he has industrial research experience from Microsoft Research in 2014 and the IBM T.J. Watson Research Center in 2015.
The prediction and control of the dynamics of networked systems is one of the central problems in network science. Structural and dynamical similarities among different real networks suggest that some universal laws might accurately describe the dynamics of these networks, though the nature and common origins of such laws remain elusive. Do these universal laws exist? We do not have the answer to this question...yet. I will talk about the latent geometry approach to network systems, which, in my opinion, could be a first step toward the formulation of universal laws of network dynamics. In this approach, networks underlying complex systems are viewed as discretizations of smooth geometric spaces: network nodes are points in these spaces, and the probability of a connection between two nodes decreases with the latent distance between them. I will start my talk with a motivation and a high-level introduction to the latent geometry concept. I will continue with a (semi-)rigorous discussion of the mathematics underlying the approach and computational algorithms for uncovering the latent geometries of real systems. I will conclude my talk by describing existing and prospective applications of latent geometry, including Internet interdomain routing, large-scale dynamics of networked systems, human diseases and social dynamics.
Bio: Dr. Kitsak is an associate research scientist in the Department of Physics and the Network Science Institute at Northeastern University. Dr. Kitsak earned his Ph.D. in theoretical physics from Boston University in 2009 under the direction of Prof. H. E. Stanley. He has held postdoctoral positions at the Center for Applied Internet Data Analysis (CAIDA), UC San Diego (2009-2012), and the Center for Complex Network Research (CCNR), Northeastern University (2012-2014). His research focuses on the development of theoretical and computational approaches to networked systems.
Abstract: Probabilistic reasoning lets you predict the future, infer past causes of current observations, and learn from experience. It can be hard to implement a probabilistic application because you have to implement the representation, inference, and learning algorithms. Probabilistic programming makes this much easier by providing an expressive language to represent models as well as inference and learning algorithms that automatically apply to models written in the language. In this talk, I will present the past, present, and future of probabilistic programming and our Figaro probabilistic programming system. I will start with the motivation for probabilistic programming and Figaro. After presenting some basic Figaro concepts, I will introduce several applications we have been developing at Charles River Analytics using Figaro. Finally, I will describe our future vision of providing a probabilistic programming tool that domain experts with no machine learning knowledge can use. In particular, I will present a new inference method that is designed to work well on a wide variety of problems with no user configuration. Prior knowledge of machine learning is not required to follow this talk.
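Figaro itself is a Scala library; as a language-agnostic illustration of what a probabilistic program does, the following Python sketch represents a tiny generative model and answers a query by exact enumeration over all possible worlds (the rain/sprinkler model and its numbers are assumed for the example, not taken from the talk):

```python
from itertools import product

# A tiny generative model (illustrative numbers):
# rain ~ Bernoulli(0.2); sprinkler ~ Bernoulli(0.4);
# the grass is wet with prob 0.9 if it rained, 0.8 if the sprinkler
# ran (and it did not rain), and 0.1 otherwise.
def wet_prob(rain, sprinkler):
    if rain:
        return 0.9
    if sprinkler:
        return 0.8
    return 0.1

def posterior_rain_given_wet():
    """P(rain | grass is wet): enumerate every world, weight it by its
    prior probability times the likelihood of the evidence, and
    normalize. This is the inference step a probabilistic programming
    system automates for models written in its language."""
    num = den = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        prior = (0.2 if rain else 0.8) * (0.4 if sprinkler else 0.6)
        weight = prior * wet_prob(rain, sprinkler)
        den += weight
        if rain:
            num += weight
    return num / den

print(round(posterior_rain_given_wet(), 3))  # 0.372
```

Enumeration only scales to tiny discrete models; the appeal of systems like Figaro is that the same model description can be handed to more scalable inference algorithms without rewriting it.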
Bio: Dr. Avi Pfeffer is Chief Scientist at Charles River Analytics. Dr. Pfeffer is a leading researcher on a variety of computational intelligence techniques including probabilistic reasoning, machine learning, and computational game theory. Dr. Pfeffer has developed numerous innovative probabilistic representation and reasoning frameworks, such as probabilistic programming, which enables the development of probabilistic models using the full power of programming languages, and statistical relational reasoning. He is the lead developer of Charles River Analytics' Figaro probabilistic programming language. As an Associate Professor at Harvard, he developed IBAL, the first general-purpose probabilistic programming language. While at Harvard, he also produced systems for representing, reasoning about, and learning the beliefs, preferences, and decision-making strategies of people in strategic situations. Prior to joining Harvard, he invented object-oriented Bayesian networks and probabilistic relational models, which form the foundation of the field of statistical relational learning. Dr. Pfeffer serves as Program Chair of the Conference on Uncertainty in Artificial Intelligence. He has published many journal and conference articles and is the author of a text on probabilistic programming. Dr. Pfeffer received his Ph.D. in computer science from Stanford University and his B.A. in computer science from the University of California, Berkeley.
With great progress made in the areas of Cognitive Computing and Human-scale Sensory Research, the paradigm of human-computer interaction will become a collaboration between human beings and intelligent machines through a human-scale, immersive experience. What research challenges exist today to enable this new paradigm, how it will help human beings (especially groups of people), and what transformations it will bring are all important questions. In this talk, Dr. Su will introduce the vision of the future Cognitive and Immersive Situations Room, an immersive, interactive, intelligent physical environment, and the research agenda to realize that vision.
Bio: Dr. Hui Su is the Director of the Cognitive and Immersive Systems Lab, a collaboration between IBM Research and Rensselaer Polytechnic Institute. He has been a technical leader and an executive at IBM Research. Most recently, he was the Director of the IBM Cambridge Research Lab in Cambridge, MA, responsible for a broad scope of global missions in IBM Research, including Cognitive User Experience, the Center for Innovation in Visual Analytics and the Center for Social Business. As a technical leader and researcher for 20 years at IBM Research, Dr. Su has been an expert in multiple areas, ranging from Human-Computer Interaction and Cloud Computing to Visual Analytics and Neural Network Algorithms for Image Recognition. As an executive, he has led research labs and research teams in the U.S. and China.
Dimensionality reduction methods have been used to represent words with vectors in NLP applications since at least the introduction of latent semantic indexing in the late 1980s, but word embeddings developed in the past several years have exhibited a robust ability to map semantics in a surprisingly straightforward manner onto simple linear algebraic operations. These embeddings are trained on cooccurrence statistics and intuitively justified by appealing to the distributional hypothesis of Harris and Firth, but are typically presented in an ad-hoc algorithmic manner. We consider the canonical skip-gram Word2vec embedding, arguably the most well-known of these recent word embeddings, and establish a corresponding generative model that maps the composition of words onto the addition of their embeddings. By virtue of, first, the fact that words can be replaced more generally with arbitrary symbols and, second, natural connections between Word2vec, the classical RC(M) association model for contingency tables, Bayesian PCA decompositions, Poisson matrix completion, and the Sufficient Dimensionality Reduction model of Globerson and Tishby, our results are meaningful in a broad context.