ACM WSDM 2013 Tutorial

Anomaly, Event, and Fraud Detection in Large Graph Datasets

Leman Akoglu

Stony Brook University
Department of Computer Science

Christos Faloutsos

Carnegie Mellon University
School of Computer Science

Slides (see Proposal Outline)

Part I-I. (~4MB)

Part I-II. (~2MB)

Part II. (~2MB)

Part III. (~3MB)

Abstract

Detecting anomalies and events in data is a vital task, with numerous applications in security, finance, health care, law enforcement, and many other areas. Many techniques have been developed over the years for spotting outliers and anomalies in unstructured collections of multi-dimensional points. With graph data becoming ubiquitous, however, techniques for structured graph data have recently come into focus.

The goal of this tutorial is to provide a general, comprehensive overview of the state-of-the-art methods for anomaly, event, and fraud detection in data represented as graphs. As a key contribution, we provide a thorough exploration of both data mining and machine learning algorithms for these detection tasks. We organize the algorithms in a general framework, categorized by setting: unsupervised vs. (semi-)supervised learning, and static vs. dynamic data. We focus on the scalability and effectiveness of the methods, and highlight results on crucial real-world applications, including accounting fraud and opinion spam detection.

List of references

The following publications are referenced in the tutorial (categorized by each major topic).

Outlier and Anomaly detection

Outlier detection in clouds of multi-dimensional points:

M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD, pages 93–104, 2000.
S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE, 2003.
C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. SIGMOD, 2001.
A. Ghoting, S. Parthasarathy, and M. Otey. Fast Mining of Distance Based Outliers in High-Dimensional Datasets. DAMI, 2008.
Y. Wang, S. Parthasarathy, and S. Tatikonda. Locality Sensitive Outlier Detection. ICDE, 2011.
A. Ghoting, M. E. Otey, and S. Parthasarathy. LOADED: Link-based outlier and anomaly detection in evolving data sets. ICDM, 2004.
K. Smets and J. Vreeken. The Odd One Out: Identifying and Characterising Anomalies. SDM, 2011.
L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast and Reliable Anomaly Detection in Categorical Data. CIKM, 2012.

Anomaly detection in graph data:

L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
Best Research Paper Award
W. Eberle and L. B. Holder. Discovering structural anomalies in graph-based data. ICDM Workshops, pages 393–398, 2007.
C. C. Noble and D. J. Cook. Graph-based anomaly detection. KDD, pages 631–636, 2003.
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.
H. Tong and C.-Y. Lin. Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143–153, 2011.
H. D. K. Moonesinghe and P.-N. Tan. OutRank: a graph-based outlier detection framework using random walks. International Journal on Artificial Intelligence Tools, 17(1), 2008.

Event/Outbreak, and Fraud detection

Event/Outbreak detection:

T. Ide and H. Kashima. Eigenspace-Based Anomaly Detection in Computer Systems. KDD, 2004.
L. Akoglu, M. McGlohon, C. Faloutsos. Event Detection in Time Series of Mobile Communication Graphs. Army Science Conference, 2010.
J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. GraphScope: parameter-free mining of large time-evolving graphs. KDD, 2007.
W.-K. Wong, A. Moore, G. Cooper, and M. Wagner. What's Strange About Recent Events (WSARE): An Algorithm for the Early Detection of Disease Outbreaks. Journal of Machine Learning Research, 2005.
M. Van Leeuwen, A. Siebes. StreamKrimp: Detecting Change in Data Streams. ECML PKDD, 2008.

Graph-based Fraud detection:

S. Pandit, D. H. Chau, S. Wang, C. Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW, 2007.
M. McGlohon, S. Bay, M. G. Anderle, D. M. Steier, C. Faloutsos. SNARE: a link analytic system for graph labeling and risk detection. KDD, 2009.
L. Akoglu, R. Chandy, C. Faloutsos. Opinion Fraud Detection in Review Networks. CMU-CS-12-130, 2012.
Z. Li, H. Xiong, Y. Liu, A. Zhou. Detecting Blackhole and Volcano Patterns in Directed Networks. ICDM, pp. 294–303, 2010.
G. Wang, S. Xie, B. Liu, P. S. Yu. Review Graph based Online Store Review Spammer Detection. ICDM, 2011.
J. Neville, O. Simsek, D. Jensen, J. Komoroske, K. Palmer, and H. Goldberg. Using Relational Knowledge Discovery to Prevent Securities Fraud. KDD, pp. 449–458, 2005.

Relational Learning with networks

P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective Classification in Network Data. AI Magazine, Special Issue on AI and Networks, 29(3):93–106, 2008.
J. Neville and D. Jensen. Collective Classification with Relational Dependency Networks. KDD Workshops, 2003.
J. Neville and D. Jensen. Iterative Classification in Relational Data. AAAI Workshops, 2000.
S. A. Macskassy and F. Provost. A Simple Relational Classifier. KDD Workshops, 2003.
S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. SIGMOD, 1998.
B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. UAI, pages 485-492, 2002.
J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding Belief Propagation and Its Generalizations. Morgan Kaufmann, pp. 239–269. ISBN 1-55860-811-7. (Also see links on Wikipedia)
D. Zhou and B. Schölkopf. Learning from Labeled and Unlabeled Data Using Random Walks. DAGM-Symposium, 2004.
A. Broder, R. Krauthgamer, and M. Mitzenmacher. Improved Classification via Connectivity Information. SODA, 2000.
A. Blum, S. Chawla. Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML, 2001.

Links to talks/tutorials by tutors

List of talks/tutorials by Christos Faloutsos
Videolectures by Christos Faloutsos
Videolecture by Leman Akoglu
Example slides for conference talks by Leman Akoglu

Contact information

Christos Faloutsos
christos@cs.cmu.edu
Carnegie Mellon University,
School of Computer Science
GHC 8019 Pittsburgh, PA 15213

Leman Akoglu
leman@cs.stonybrook.edu
Stony Brook University,
Department of Computer Science
1425 CS Bldg. Stony Brook, NY 11794