## Seminar Information

Seminar Description: In this seminar, we study a number of approaches for determining the relative importance of nodes in networks. The problem of relative relevance or importance in network and graphical data appears in a very broad spectrum of applications, such as: (1) Given a directed graph of web pages linking to other web pages, which web page is the most "relevant" given a certain keyword query? (2) Given a social network with undirected friendships between users: Which user in the network is the most "similar" to a chosen user? (3) Given an interaction network of users on eBay with a subset of nodes identified as fraudsters (labeled training data), which are the most likely other users that are fraudsters too based on common interaction patterns? (4) Given an electrical network and assuming independent failure of edges: which node from a chosen subset is the one that is most likely reachable (i.e. connected) to the source set ("source-terminal reliability")?Our goal is to understand the connections between seemingly unconnected scenarios and algorithms by studying the workings, assumptions, and limitations of a few selected approached in depth, then trying to find common properties and differences. In doing so, our analysis and discussions will be guided by questions, such as:

- What is a good measure of "relevance" or "similarity?"
- When or why does one approach work better (what is the right metric) than another?
- What are some conditions under which one model can be transferred into another model?
- Why can some methods be evaluated in polynomial time while others are computationally hard?

Recommended Background Knowledge: The seminar is targeted towards PhD students with a good understanding of the following topics (or willingness to get up to speed during the seminar). We will, at time, read sections of these books:

- Linear Algebra

- [Lay 2011]: Lay: Linear algebra and its applications (book, 4th ed, 2011): excellent book!

- Probability Theory

- Fundamentals of network theory

- [Newman 2010]: Ch 6 and 7 of "Newman: Networks - An Introduction (book, 2010)"

- Graphical models

- [Bishop 2006]: Ch 8 of "Bishop: Pattern Recognition and Machine Learning (book, 2006)": Graphical models (free)
- [Koller,F 2009]: Koller, Friedman: Probabilistic graphical models: principles and techniques (book, 2009)
- [Murphy 2012]: Murphy: Machine Learning - a Probabilistic Perspective (book, 2012) (free)

- Complexity theory

- Ch 8 of "Dasgupta, Papadimitriou, Vazirani: Algorithms (book, 2006)": excellent free online book!
- Ch 29 of "Jeff Erickson' Algorithm class": excellent free online class material

Instructor: Wolfgang Gatterbauer

Assistant Professor of Business Technologies, Tepper School of Business, CMU.

Courtesy Assistant Professor, School of Computer Science, CMU.

Email:

Office: Tepper 354

Office Hours: via email

## Seminar Requirements

1. Presentations and in-class contributions (50%): We will study the workings and predictions of several different approaches. Everybody will read the papers upfront, and analyze the methods on suggested example graphs in the manner they find most appropriate (e.g., simulation in Matlab, or handwritten drawings). Students are given 20-30min at the beginning of each class to discuss their findings with another student that is randomly assigned by the instructor. Each team is expected to be able present their findings to the class (ideally on the overhead projector with handwritten slides). One pair of students are announced in the previous class and is in lead of the topic for each class. The lead team is expected to understand the paper more thoroughly then everybody else. Material from each students is handed in each class. Important: you do not need to hand in digitally prepared slides, often drawing by pen on paper together with your assigned partner is far more effective. Goal is to understand the paper, and to help others understand your insights, not to produce fancy slides. Thus come with your pen and lots of paper. We may also possibly use Baiboard for collaborative drawing in class by students and instructors (to be determined). Here are some guiding questions:- What is the paper's setting? How does it differ from other papers read so far?
- What are the explicit and implicit assumptions made by the paper? Why this assumptions? Are they justified or not?
- What is the abstract problem the paper tries to solve? How does it differ from other papers read so far?
- What are the main strengths and weaknesses of this paper? What prevented the authors from addressing the weaknesses?
- What are the main technical contributions? Where does the paper fit in the related literature?
- How does the method work and what does it predict on a specific illustrative graph?
- How does the method generalize to other possible applications? Is there any relationship to empirical work? Are there testable implications?
- What are the conclusions?
- What did you learn from the paper?
- What are some possible follow-questions to be resolved?
- What are some other questions for additional discussion to which you may not know the answer yourself?

2. Written research proposal or small research project with presentation. (50%): Minimum 1000 words and 3 drawings, maximum 5000 words (no limit on drawings). Students will hand in and present their research proposals on the last day of class. The discussion will be very interactive, i.e. questions will be posed during presentation, not only after.

- 2a) Methods option (mainly targeted towards CS and ACO students)
- Deals with a method / approach / algorithm that is motivated by or related to the class material.
- Reviews, critiques and compares the relevant literature,
- Proposes one or more pertinent and interesting research questions,
- Communicates the (potential) contributions from studying these questions and their relevance both to research and practice,
- Presents a possible approach, along with any developed model/method or analysis, and
- Concludes with a summary and discussion of the main ideas and/or results and/or possible extensions.

- 2b) Applied option (mainly targeted towards more applied fields, e.g., Finance or OM)
- Deals with a topic that is motivated by the students' discipline and data set of interest, yet still related to our class topic.
- Motivates the relevance of the problem and reviews the current state-of the art
- Adapts and compares one or several algorithms to this real-world application
- Evaluates empirically how well the approach works, and why or why not.
- Concludes with a summary and discussion of the main ideas and/or results and/or possible extensions.