Slides (see Proposal Outline)
Abstract
Detecting anomalies and events in data is a vital task, with numerous applications in security, finance, health care, law enforcement, and many other domains. Many techniques have been developed over the years for spotting outliers and anomalies in unstructured collections of multi-dimensional points. With graph data becoming ubiquitous, however, techniques for structured graph data have recently come into focus.
The goal of this tutorial is to provide a general, comprehensive overview of the state-of-the-art methods for
anomaly, event, and fraud detection in data represented as graphs.
As a key contribution, we provide a thorough exploration of both data mining and machine learning algorithms for these detection tasks.
We give a general framework for the algorithms, categorized by setting: unsupervised vs. (semi-)supervised methods, and static vs. dynamic data.
We focus on the scalability and effectiveness aspects of the methods, and highlight results on crucial real-world applications, including accounting fraud and opinion spam detection.
List of references
The following publications are referenced in the tutorial (categorized by each major topic).
Outlier and Anomaly detection
Outlier detection in clouds of multi-dimensional points:
- M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD, pages 93–104, 2000.
- S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. ICDE, 2003.
- C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. SIGMOD, 2001.
- A. Ghoting, S. Parthasarathy, and M. Otey. Fast Mining of Distance Based Outliers in High-Dimensional Datasets. DAMI, 2008.
- Y. Wang, S. Parthasarathy, and S. Tatikonda. Locality Sensitive Outlier Detection. ICDE, 2011.
- A. Ghoting, M. E. Otey, and S. Parthasarathy. LOADED: Link-based outlier and anomaly detection in evolving data sets. ICDM, 2004.
- K. Smets and J. Vreeken. The Odd One Out: Identifying and Characterising Anomalies. SDM, 2011.
- L. Akoglu, H. Tong, J. Vreeken, and C. Faloutsos. Fast and Reliable Anomaly Detection in Categorical Data. CIKM, 2012.
- Survey: V. Chandola, A. Banerjee, and V. Kumar. Anomaly Detection: A Survey. ACM Computing Surveys, 41(3), Article 15, 2009.
Anomaly detection in graph data:
- L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010.
- W. Eberle and L. B. Holder. Discovering structural anomalies in graph-based data. ICDM Workshops, pages 393–398, 2007.
- C. C. Noble and D. J. Cook. Graph-based anomaly detection. KDD, pages 631–636, 2003.
- J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005.
- H. Tong and C.-Y. Lin. Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection. SDM, pages 143–153, 2011.
- N. Katenka, Q. Ding, P. Barford, E. Kolaczyk, and M. Crovella. Intrusion as (Anti)social Communication: Characterization and Detection. KDD, 2012.
- M. Davis, W. Liu, P. Miller, and G. Redpath. Detecting anomalies in graphs with numeric labels. CIKM, pages 1197–1202, 2011.
- H. D. K. Moonesinghe and P.-N. Tan. OutRank: a graph-based outlier detection framework using random walks. International Journal on Artificial Intelligence Tools, 17(1), 2008.
- J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han. On community outliers and their efficient detection in information networks. KDD, pages 813–822, 2010.
- M. Gupta, J. Gao, Y. Sun, and J. Han. Integrating community matching and outlier detection for mining evolutionary community outliers. KDD, pages 859–867, 2012.
- F. D. Malliaros, V. Megalooikonomou, and C. Faloutsos. Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly Detection. SDM, pages 942–953, 2012.
- B. Boden, S. Günnemann, H. Hoffmann, and T. Seidl. Mining Coherent Subgraphs in Multi-Layer Graphs with Edge Labels. KDD, 2012.
Event/Outbreak and Fraud detection
Event/Outbreak detection:
Relational Learning with networks
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective Classification in Network Data. AI Magazine, Special Issue on AI and Networks, 29(3):93–106, 2008.
- J. Neville and D. Jensen. Collective Classification with Relational Dependency Networks. KDD Workshops, 2003.
- J. Neville and D. Jensen. Iterative Classification in Relational Data. AAAI Workshops, 2000.
- S. A. Macskassy and F. Provost. A Simple Relational Classifier. KDD Workshops, 2003.
- S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. SIGMOD, 1998.
- B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. UAI, pages 485-492, 2002.
- J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding Belief Propagation and Its Generalizations. In Exploring Artificial Intelligence in the New Millennium, Morgan Kaufmann, pages 239–269, 2003. ISBN 1-55860-811-7. (Also see links on Wikipedia)
- D. Zhou and B. Schölkopf. Learning from Labeled and Unlabeled Data Using Random Walks. DAGM-Symposium 2004.
- A. Broder, R. Krauthgamer, and M. Mitzenmacher. Improved Classification via Connectivity Information. SODA, 2000.
- A. Blum, S. Chawla. Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML, 2001.
Links to talks/tutorials by tutors
Contact information
Leman Akoglu
leman@cs.stonybrook.edu
Stony Brook University,
Department of Computer Science
1425 CS Bldg. Stony Brook, NY 11794
Christos Faloutsos
christos@cs.cmu.edu
Carnegie Mellon University,
School of Computer Science
GHC 8019 Pittsburgh, PA 15213