Publications

(see also: papers on Google Scholar)
2024
The calibration gap between model and human confidence in large language models [ PDF ]
M. Steyvers, H. Tejeda, A. Kumar, C. Belem, S. Karny, X. Hu, L. Mayer, P. Smyth
arXiv

Perceptions of linguistic uncertainty by language models and humans [ PDF ]
C. Belem, M. Kelly, M. Steyvers, S. Singh, and P. Smyth
Proceedings of the 2024 Empirical Methods in Natural Language Processing Conference (EMNLP 2024)

Benchmark data repositories for better benchmarking [ PDF ]
R. Longjohn, M. Kelly, S. Singh, P. Smyth
38th Neural Information Processing Systems Conference (NeurIPS 2024)

Dynamic conditional optimal transport through simulation-free flows [ PDF (arXiv preprint) ]
G. Kerrigan, G. Migliorini, P. Smyth
38th Neural Information Processing Systems Conference (NeurIPS 2024)

Functional flow matching [ PDF ] [ Code ]
G. Kerrigan, G. Migliorini, P. Smyth
27th International Conference on AI and Statistics (Outstanding Student Paper Award)

Probabilistic modeling for sequences of sets in continuous time [ PDF ] [ Code ]
Y. Chang, A. Boyd, P. Smyth
27th International Conference on AI and Statistics

Bayesian online learning for consensus prediction [ PDF ] [ Code ]
S. Showalter, A. Boyd, P. Smyth, M. Steyvers
27th International Conference on AI and Statistics

Likelihood ratios for changepoints in categorical event data with applications in digital forensics [ PDF ]
R. Longjohn, P. Smyth
Journal of Forensic Sciences

2023
Capturing humans' mental models of AI: an item response theory approach [ PDF ] [ Code/data ]
M. Kelly, A. Kumar, P. Smyth, M. Steyvers
2023 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)

Fair survival time prediction via mutual information minimization [ PDF ]
H. Do, Y. Chang, Y. S. Cho, P. Smyth, J. Zhong
Proceedings of the Machine Learning for Healthcare Conference (MLHC 2023)

Diffusion generative models in infinite dimensions [ Paper ] [ Code ]
G. Kerrigan, J. Ley, P. Smyth
26th International Conference on AI and Statistics (AISTATS 2023)

Probabilistic querying of continuous-time event sequences [ Paper ] [ Code ]
A. Boyd, Y. Chang, S. Mandt, P. Smyth
26th International Conference on AI and Statistics (AISTATS 2023)

Variable-based calibration for machine learning classifiers [ PDF ] [ Code ]
M. Kelly, P. Smyth
37th AAAI Conference on Artificial Intelligence (AAAI 2023)

Inference for mark-censored temporal point processes [ PDF ]
A. Boyd, Y. Chang, S. Mandt, P. Smyth
39th Conference on Uncertainty in AI (UAI 2023)

Deep anomaly detection under labeling budget constraints [ PDF ]
A. Li, C. Qiu, M. Kloft, P. Smyth, S. Mandt, M. Rudolph
40th International Conference on Machine Learning (ICML 2023)

A cell-level discriminative neural network model for diagnosis of blood cancers [ PDF ]
E. Robles, Y. Jin, P. Smyth, et al.
Bioinformatics, 39:10

Climate-driven changes in the predictability of seasonal precipitation [ PDF ]
P. Le, J. T. Randerson, R. Willett, S. Wright, P. Smyth, C. Guilloteau, A. Mamalakis, E. Foufoula-Georgiou
Nature Communications, 14:3822, 2023

A brief tour of deep learning from a statistical perspective [ Journal link ]
E. Nalisnick, P. Smyth, D. Tran
Annual Review of Statistics and Its Application, 2023

Differentiating mental models of self and others: a hierarchical framework for knowledge assessment [ PDF ]
A. Kumar, P. Smyth, M. Steyvers
Psychological Review, to appear

2022
Predictive querying for autoregressive neural sequence models [ PDF ] [ Code ]
A. Boyd, S. Showalter, S. Mandt, P. Smyth
Neural Information Processing Systems 35 (NeurIPS 2022)

AI-assisted decision-making: a cognitive modeling approach to infer latent reliance strategies [ PDF ]
H. Tejeda, A. Kumar, P. Smyth, M. Steyvers
Computational Brain and Behavior, Oct 2022

Likelihood ratios for categorical count data with applications in digital forensics [ Journal link ]
R. Longjohn, P. Smyth, H. Stern
Law, Probability, and Risk

Fair generalized linear models with a convex penalty [ PDF ]
H. Do, P. Putzel, A. Martin, P. Smyth, J. Zhong
International Conference on Machine Learning (ICML 2022)

Bayesian modeling of human-AI complementarity [ Journal link ]
M. Steyvers, H. Tejeda, G. Kerrigan, P. Smyth
Proceedings of the National Academy of Sciences

A joint fairness model with applications to risk predictions for underrepresented populations [ Journal link ]
H. Do, S. Nandi, P. Putzel, P. Smyth, J. Zhong
Biometrics

California wildfire spread derived using VIIRS satellite observations and an object-based tracking system [ Journal link ]
Y. Chen, S. Hantson, N. Andela, S. Coffield, C. Graff, D. Morton, L. Ott, E. Foufoula-Georgiou, P. Smyth, M. Goulden, J. Randerson
Scientific Data

2021
Combining human predictions with model probabilities via confusion matrices and calibration [ PDF ]
G. Kerrigan, M. Steyvers, P. Smyth
Neural Information Processing Systems 34 (NeurIPS 2021)

Detecting and adapting to irregular distribution shifts in Bayesian online learning [ PDF ]
A. Li, A. Boyd, P. Smyth, S. Mandt
Neural Information Processing Systems 34 (NeurIPS 2021)

Active Bayesian assessment of black-box classifiers [ PDF ]
D. Ji, R. Logan, P. Smyth, and M. Steyvers
35th AAAI Conference on Artificial Intelligence (AAAI 2021)

Zonally contrasting shifts of the tropical rain belt in response to climate change [ Journal link ]
A. Mamalakis, J. T. Randerson, J-Y Yu, M. S. Pritchard, G. Magnusdottir, P. Smyth, P. A. Levine, S. Yu, E. Foufoula-Georgiou
Nature Climate Change, January 2021

Graph-guided regularized regression of Pacific Ocean climate variables to increase predictive skill of southwestern US winter precipitation [ Journal link ]
A. Stevens, R. Willett, A. Mamalakis, E. Foufoula-Georgiou, A. Tejedor, J. T. Randerson, P. Smyth, S. Wright
Journal of Climate, January 2021

2020
Can I trust my fairness metric? Assessing fairness with unlabeled data and Bayesian inference [ PDF ]
D. Ji, P. Smyth, and M. Steyvers
Neural Information Processing Systems 33 (NeurIPS 2020)

User-dependent neural sequence models for continuous-time event data [ PDF ]
A. Boyd, R. Bamler, S. Mandt, and P. Smyth
Neural Information Processing Systems 33 (NeurIPS 2020)

Forecasting global fire emissions on sub-seasonal to seasonal (S2S) timescales [ Journal link ]
Y. Chen, J. T. Randerson, S. R. Coffield, E. Foufoula-Georgiou, P. Smyth, C. A. Graff, D. C. Morton, N. Andela, G. R. van der Werf, L. Giglio, L. E. Ott
Journal of Advances in Modeling Earth Systems, 12(9), 2020

Forecasting daily wildfire activity using Poisson regression [ Journal link ]
C. A. Graff, S. R. Coffield, Y. Chen, E. Foufoula-Georgiou, J. T. Randerson, P. Smyth
IEEE Transactions on Geoscience and Remote Sensing, 58(7):4837-4851, 2020

Quantifying the association between discrete event time series with applications to digital forensics [ Online journal version ]
C. Galbraith, P. Smyth, and H. Stern
Journal of the Royal Statistical Society A, 183(3):1005-1027, 2020

2019
Dropout as a structured shrinkage prior [ Link to proceedings ]
E. Nalisnick, J. M. Hernandez-Lobato, P. Smyth
Proceedings of the 36th International Conference on Machine Learning (ICML)

Bayesian evaluation of black-box classifiers [ PDF ]
D. Ji, R. Logan, P. Smyth, M. Steyvers
ICML Workshop on Uncertainty and Robustness in Deep Learning

Detecting conversation topics in primary care office visits from transcripts of patient-provider interactions [ Link to journal ]
J. Park, D. Kotzias, P. Kuo, R. Logan, K. Merced, S. Singh, M. Tanana, E. Karra Taniskidou, J. Elston Lafata, D. Atkins, M. Tai-Seale, Z. Imel, P. Smyth
Journal of the American Medical Informatics Association (JAMIA)

Machine learning of discriminative gate locations for clinical diagnosis [ Link to journal ]
D. Ji, P. Putzel, Y. Qian, I. Chang, A. Mandava, R. H. Scheuermann, J. D. Bui, H-Y Wang, P. Smyth
Cytometry A

Machine learning to predict final fire size at the time of ignition [ PDF ]
S. Coffield, C. Graff, Y. Chen, P. Smyth, E. Foufoula-Georgiou, J. Randerson
International Journal of Wildland Fire

2018
Bayesian trees for automated cytometry analysis [ PDF ]
D. Ji, E. Nalisnick, Y. Qian, R. H. Scheuermann, and P. Smyth
Proceedings of the 3rd Machine Learning for Healthcare Conference

Learning priors for invariance [ PDF ]
E. Nalisnick and P. Smyth
Proceedings of the 21st International Conference on AI and Statistics

Understanding student procrastination via mixture models [ PDF ]
J. Park, R. Yu, F. Rodriguez, R. Baker, P. Smyth, and M. Warschauer
Proceedings of the 11th International Conference on Educational Data Mining (Best Paper Award)

Prediction of sparse user-item consumption rates with zero-inflated Poisson regression [ PDF ]
M. Lichman and P. Smyth
Proceedings of the WWW 2018 Conference

Predicting consumption patterns with repeated and novel events [ Link to journal ]
D. Kotzias, M. Lichman, and P. Smyth
IEEE Transactions on Knowledge and Data Engineering

Using social media to measure temporal ambient population: does it explain local crime rates? [ Link to journal ]
J. R. Hipp, C. Bates, M. Lichman, and P. Smyth
Justice Quarterly

2017
Science and data science (PNAS perspective article) [ link to PNAS ]
D. Blei and P. Smyth
Proceedings of the National Academy of Sciences, 2017

Bayesian non-homogeneous Markov models via Polya-Gamma data augmentation with applications to rainfall modeling [ ArXiv link ]
T. Holsclaw, A. M. Greene, A. W. Robertson, and P. Smyth
Annals of Applied Statistics, 2017

Learning approximately objective priors [ ArXiv link ]
E. Nalisnick and P. Smyth

Analyzing user-event data using score-based likelihood ratios with marked point processes [ Link to journal ]
C. Galbraith and P. Smyth

Stick-breaking variational autoencoders [ ArXiv link ]
E. Nalisnick and P. Smyth

Detecting changes in student behavior from clickstream data [ ACM Proceedings link ]
J. Park, K. Denaro, F. Rodriguez, P. Smyth, M. Warschauer
Proceedings of the Learning Analytics and Knowledge Conference (LAK 2017), March 2017

2016
Bayesian detection of changepoints in finite-state Markov chains for multiple sequences
P. Arnesen, T. Holsclaw, and P. Smyth
Technometrics, doi:10.1080/00401706.2015.1044118, 58(2), 205-213, 2016
Personalized location models with adaptive mixtures
M. Lichman, D. Kotzias, and P. Smyth
Proceedings of the ACM SIGSPATIAL Conference, October 2016
Daily states of the March-April east Pacific ITCZ in three decades of high-resolution satellite data
C. Haffke, G. Magnusdottir, D. Henke, P. Smyth, and Y. Peings
Journal of Climate, doi:10.1175/JCLI-D-15-0224.1, 29(8), 2981-2995, 2016
A Bayesian hidden Markov model of daily precipitation over South and East Asia
T. Holsclaw, A. M. Greene, A. W. Robertson, and P. Smyth
Journal of Hydrometeorology, doi:10.1175/JHM-D-14-0142.1, 17(1):3-25, 2016
Analyzing NIH funding patterns over time with statistical text analysis
J. Park, M. Blume-Kohout, R. Krestel, E. Nalisnick, P. Smyth
Thirtieth AAAI Conference: Proceedings of the Workshop on Scholarly Big Data, 2016
2015
Content coding of psychotherapy transcripts using labeled topic models
G. Gaut, M. Steyvers, Z. E. Imel, D. C. Atkins, and P. Smyth
IEEE Journal of Biomedical and Health Informatics, November 2015
From group to individual labels using deep features
D. Kotzias, M. Denil, N. De Freitas, and P. Smyth
Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015)
Modeling response time in digital human communication
N. Navaroli and P. Smyth
Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2015)
Hot swapping for online adaptation of optimization hyperparameters
K. Bache, P. Smyth, and D. DeCoste
International Conference on Learning Representations (ICLR 2015)
Measurement error and outcome distributions: methodological issues in regression analyses of behavioral coding data
T. Holsclaw, K. A. Hallgren, M. Steyvers, P. Smyth, and D. C. Atkins
Psychology of Addictive Behaviors, doi:10.1037/adb0000091, 29(4):1031-1040, 2015
2014
Annealing paths for the evaluation of topic models
J. Foulds and P. Smyth
Proceedings of 30th Conference on Uncertainty in Artificial Intelligence (UAI 2014)
Modeling human location data with mixtures of kernel densities
M. Lichman and P. Smyth
Proceedings of the 20th ACM SIGKDD Conference (KDD 2014)
Approximate slice sampling for Bayesian posterior inference
C. DuBois, A. Korattikara, M. Welling, P. Smyth
Proceedings of the 17th International Conference on AI and Statistics
Beyond MAP estimation with the track-oriented multiple hypothesis tracker
A. Frank, P. Smyth, and A. Ihler
IEEE Transactions on Signal Processing
Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification
D. C. Atkins, M. Steyvers, Z. E. Imel, and P. Smyth
Implementation Science
The co-factor of LIM domains (CLIM/LDB/NLI) maintains basal mammary epithelial stem cells and promotes breast tumorigenesis
M. Salmans, Z. Yu, K. Watanabe, E. Cam, P. Sun, P. Smyth, X. Dai, B. Andersen
PLOS Genetics
2013
Modeling scientific impact with topical influence regression
J. Foulds and P. Smyth
Proceedings of the Empirical Methods in Natural Language Processing Conference (EMNLP 2013), October 2013.
Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation
J. Foulds, L. Boyles, C. DuBois, P. Smyth, M. Welling
Proceedings of the 19th ACM SIGKDD Conference, August 2013.
Text-based measures of document diversity
K. Bache, D. Newman, P. Smyth
Proceedings of the 19th ACM SIGKDD Conference, August 2013.
Recommending patents based on latent topics
R. Krestel, P. Smyth
Proceedings of the 7th ACM Recommender Systems Conference (RecSys), October 2013.
Hierarchical models for relational event sequences
C. DuBois, C. T. Butts, D. McFarland, P. Smyth
Journal of Mathematical Psychology, May 2013.
Stochastic blockmodeling of relational event dynamics
C. DuBois, C. T. Butts, P. Smyth
16th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2013.
Modeling individual email patterns over time with latent variable models
N. Navaroli, C. DuBois, P. Smyth
Machine Learning, May 2013. (extended version of ACML 2012 paper)
Windows into relational events: data structures for contiguous subsequences of edges
M. J. Bannister, C. DuBois, D. Eppstein, P. Smyth
ACM-SIAM Symposium on Discrete Algorithms (SODA13), January 2013.
2012
Statistical topic models for multi-label document classification
T. N. Rubin, A. Chambers, P. Smyth, M. Steyvers
Machine Learning, 88(1-2), 157-208, 2012.
Distributed Gibbs sampling for latent variable models
A. Asuncion, P. Smyth, M. Welling, D. Newman, I. Porteous, S. Triglia
in Scaling Up Machine Learning, R. Bekkerman, M. Bilenko, and J. Langford (eds.), Cambridge University Press, 2012.
Statistical models for exploring individual email communication behavior
N. Navaroli, C. DuBois, P. Smyth
Proceedings of the 4th Asian Conference on Machine Learning (ACML 2012), November 2012.
A graphical model representation of the track-oriented multiple hypothesis tracker
A. Frank, P. Smyth, A. T. Ihler
IEEE Statistical Signal Processing Workshop, August 2012.
Brain and muscle Arnt-like protein-1 (BMAL1) controls circadian cell proliferation and susceptibility to UVB-induced DNA damage in the epidermis
M. Geyfman, V. Kumar, Q. Liu, R. Ruiz, W. Gordon, F. Espitia, E. Cam, S. E. Millar, P. Smyth, A. Ihler, J. S. Takahashi, B. Andersen
Proceedings of the National Academy of Sciences, 2012.
Automated analysis of the temporal behavior of the double Intertropical Convergence Zone over the east Pacific
D. Henke, P. Smyth, C. Haffke, G. Magnusdottir
Remote Sensing of Environment, May 2012.
TopicNets: visual analysis of large text corpora with topic modeling
B. Gretarsson, J. O'Donovan, S. Bostandjiev, T. Hollerer, A. Asuncion, D. Newman, P. Smyth
ACM Transactions on Intelligent Systems and Technology, February 2012.
2011
Continuous-time regression models for longitudinal networks
D. Q. Vu, A. Asuncion, D. R. Hunter, P. Smyth
Proceedings of the 25th Conference on Neural Information Processing Systems (NIPS 2011), December 2011.
Latent set models for two-mode network data
C. DuBois, J. Foulds, P. Smyth
Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, July 2011.
Dynamic egocentric models for citation networks
D. Q. Vu, A. Asuncion, D. R. Hunter, P. Smyth
Proceedings of the 28th International Conference on Machine Learning (ICML 2011), June 2011.
A dynamic relational infinite feature model for longitudinal social networks
J. Foulds, A. Asuncion, C. DuBois, C. T. Butts, P. Smyth
Proceedings of the 14th International Conference on AI and Statistics, April 2011.
Revisiting MAP estimation, message passing, and perfect graphs
J. Foulds, N. Navaroli, P. Smyth, A. T. Ihler
Proceedings of the 14th International Conference on AI and Statistics, April 2011.
Optimal use of land surface temperature data to detect changes in forest cover
T. van Leeuwen, A. J. Frank, Y. Jin, P. Smyth, M. L. Goulden, G. R. van der Werf, J. T. Randerson
Journal of Geophysical Research - Biogeosciences, 116, G02002, doi:10.1029/2010JG00148
Multi-instance mixture models and semi-supervised learning
J. Foulds and P. Smyth
SIAM International Conference on Data Mining, April 2011.
Downscaling projections of Indian monsoon rainfall using a non-homogeneous hidden Markov model
A. M. Greene, A. W. Robertson, P. Smyth, and S. Triglia
Quarterly Journal of the Royal Meteorological Society, 137(655), 347-359, January 2011.
Detecting the ITCZ in instantaneous satellite data using spatiotemporal statistical modeling: ITCZ climatology in the east Pacific
C. L. Bain, J. De Paz, J. Kramer, G. Magnusdottir, P. Smyth, H. Stern, and C. Wang
Journal of Climate, 24(1), 216-230, January 2011.
Combining background knowledge and learned topics
M. Steyvers, P. Smyth, and C. Chemudugunta
Topics in Cognitive Science, 3(1), 18-47, January 2011.
Asynchronous distributed estimation of topic models for document analysis
A. Asuncion, P. Smyth, and M. Welling
Statistical Methodology, 8(1), 3--17, January 2011.
2010
Learning concept graphs from text with stick-breaking priors
A. L. Chambers, P. Smyth, and M. Steyvers
Neural Information Processing Systems Conference (NIPS), 2010.
Diurnal cycle of the Intertropical Convergence Zone in the east Pacific
C. L. Bain, G. Magnusdottir, P. Smyth, and H. Stern
Journal of Geophysical Research - Atmospheres, 115, D23116, December 2010.
A Bayesian mixture approach to modeling spatial activation patterns in multi-site fMRI data
S. Kim, P. Smyth, and H. Stern
IEEE Transactions on Medical Imaging, 29(6), 1260-1274, June 2010.
A Bayesian framework for storm tracking using a hidden-state representation
L. Scharenbroich, G. Magnusdottir, P. Smyth, H. Stern, and C. Wang
Monthly Weather Review, 138(6), 2132-2148, June 2010.
Particle filtered MCMC-MLE with connections to contrastive divergence
A. Asuncion, Q. Liu, A. T. Ihler, and P. Smyth
Proceedings of the 27th International Conference on Machine Learning (ICML), July 2010.
Modeling relational events via latent classes
C. DuBois and P. Smyth
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2010.
Learning with blocks: composite likelihood and contrastive divergence
A. Asuncion, Q. Liu, A. T. Ihler, and P. Smyth
Thirteenth International Conference on AI and Statistics, May 2010.
Estimating replicate time-shifts using Gaussian process regression
Q. Liu, K. K. Lin, B. Andersen, P. Smyth, and A. T. Ihler
Bioinformatics, February 2010
Learning author-topic models from text corpora
M. Rosen-Zvi, C. Chemudugunta, T. Griffiths, P. Smyth, and M. Steyvers
ACM Transactions on Information Systems, 28(1), January 2010.
2009
Particle-based variational inference for continuous systems
A. T. Ihler, A. J. Frank, and P. Smyth
Neural Information Processing Systems Conference (NIPS), 2009.
Bayesian detection of non-sinusoidal periodic patterns in circadian expression data
D. Chudova, A. T. Ihler, K. K. Lin, B. Andersen, and P. Smyth
Bioinformatics, 25:3114-3120, 2009.
Distributed algorithms for topic models
D. Newman, A. Asuncion, P. Smyth, and M. Welling
Journal of Machine Learning Research, 10(Aug):1801-1828, 2009.
Circadian clock genes contribute to the regulation of hair follicle cycling
K. K. Lin, V. Kumar, M. Geyfman, D. Chudova, A. T. Ihler, P. Smyth, R. Paus, J. S. Takahashi, B. Andersen
On smoothing and inference for topic models
A. Asuncion, M. Welling, P. Smyth, Y. W. Teh
The 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), June 2009.
2008
Asynchronous distributed learning of topic models
A. Asuncion, P. Smyth, and M. Welling
Neural Information Processing Systems (NIPS) 21, December 2008.
Modeling documents by combining semantic concepts with unsupervised statistical learning
C. Chemudugunta, A. Holloway, P. Smyth, and M. Steyvers
Proceedings of the International Semantic Web Conference (ISWC-08), October 2008.
Combining concept hierarchies and statistical topic models
C. Chemudugunta, P. Smyth, and M. Steyvers
Proceedings of the Conference on Information and Knowledge Management (CIKM-08), October 2008.
Fast collapsed Gibbs sampling for latent Dirichlet allocation
I. Porteous, D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling
Proceedings of the 14th ACM SIGKDD Conference (KDD-08), August 2008.
Probabilistic analysis of a large-scale urban traffic data set
J. Hutchins, A. Ihler, and P. Smyth
Second International Workshop on Knowledge Discovery from Sensor Data (ACM SIGKDD Conference, KDD-08), August 2008.
2007
Modeling count data from multiple sensors: a building occupancy model
J. Hutchins, A. Ihler, and P. Smyth
IEEE International Workshop on Computational Advances in MultiSensor Adaptive Processing, December 2007.
Distributed inference for latent Dirichlet allocation
D. Newman, A. Asuncion, P. Smyth, and M. Welling
Advances in Neural Information Processing Systems 20, December 2007.
Learning to detect events with Markov-modulated Poisson processes
A. T. Ihler, J. Hutchins, and P. Smyth
ACM Transactions on Knowledge Discovery from Data, December 2007.
Test-retest and between-site reliability in a multicenter fMRI study
L. Friedman et al. (long list of authors including S. Kim and P. Smyth)
Human Brain Mapping.
Infinite mixtures of trees
S. Kirshner and P. Smyth
Proceedings of the 24th International Conference on Machine Learning (ICML), June 2007.
Probabilistic clustering of extratropical cyclones using regression mixture models
S. J. Gaffney, A. W. Robertson, P. Smyth, S. J. Camargo, and M. Ghil
Climate Dynamics, 29(4), 423-440, 2007.
Subject metadata enhancement using statistical topic models
D. Newman, K. Hagedorn, P. Smyth, and C. Chemudugunta
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007), June 2007.
Cluster analysis of typhoon tracks. Part I: General properties
S. J. Camargo, A. W. Robertson, S. J. Gaffney, P. Smyth, and M. Ghil
Journal of Climate, 20:3635-3653, 2007.
Cluster analysis of typhoon tracks. Part II: Large-scale circulation and ENSO
S. J. Camargo, A. W. Robertson, S. J. Gaffney, P. Smyth, and M. Ghil
Journal of Climate, 20:3654-3676, 2007.
Graphical models for statistical inference and data assimilation
A. T. Ihler, S. Kirshner, M. Ghil, A. W. Robertson, P. Smyth
Physica D, 230(1-2), 72-87, 2007.
2006
Learning time-intensity profiles of human activity using non-parametric Bayesian models
A. Ihler and P. Smyth
Advances in Neural Information Processing Systems 19, December 2006.
Modeling general and specific aspects of documents with a probabilistic topic model
C. Chemudugunta, P. Smyth, and M. Steyvers
Advances in Neural Information Processing Systems 19, December 2006.
Hierarchical Dirichlet processes with random effects
S. Kim and P. Smyth
Advances in Neural Information Processing Systems 19, December 2006.
A nonparametric Bayesian approach to detecting spatial activation patterns in fMRI data
S. Kim, P. Smyth, and H. Stern
Proceedings of the 9th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 2006.
Segmental hidden Markov models with random effects for waveform modeling
S. Kim and P. Smyth
Journal of Machine Learning Research, 7(Jun):945-969, 2006.
Adaptive event detection with time-varying Poisson processes
A. Ihler, J. Hutchins, and P. Smyth
Proceedings of the 12th ACM SIGKDD Conference (KDD-06), August 2006.
Statistical entity-topic models
D. Newman, C. Chemudugunta, P. Smyth, and M. Steyvers
Proceedings of the 12th ACM SIGKDD Conference (KDD-06), August 2006.
Gibbs sampling for coupled infinite mixture models in the stick-breaking representation
I. Porteous, A. Ihler, P. Smyth, and M. Welling
Proceedings of the Uncertainty in AI Conference, July 2006.
Analyzing entities and topics in news articles using statistical topic models
D. Newman, C. Chemudugunta, P. Smyth, and M. Steyvers
IEEE International Conference on Intelligence and Security Informatics, Springer Lecture Notes in Computer Science (LNCS), May 2006.
Subseasonal-to-interdecadal variability of the Australian monsoon over North Queensland
A. Robertson, S. Kirshner, P. Smyth, S. Charles, and B. Bates
Quarterly Journal of the Royal Meteorological Society, 132, 519-542, 2006.
Imaging phenotypes and genotypes in schizophrenia
J. Turner, P. Smyth, F. Macciardi, J. H. Fallon, J. Kennedy, S. Potkin
Neuroinformatics, 4(1), 21-50, March 2006.
Cluster analysis of Western North Pacific tropical cyclone tracks
S.J. Camargo, A.W. Robertson, S.J. Gaffney, P. Smyth and M. Ghil
IRI Technical Report 05-03, International Research Institute for Climate and Society, Columbia University, Palisades, NY, January 2006.
Probabilistic clustering of extratropical cyclones using regression mixture models
S. Gaffney, A. Robertson, P. Smyth, S. Camargo, M. Ghil
Technical Report UCI-TR 06-02, Bren School of Information and Computer Sciences, UC Irvine, January 2006.
2005
Prediction and ranking algorithms for event-based network data
J. O'Madadhain, J. Hutchins, P. Smyth
ACM SIGKDD Explorations: Special Issue on Link Mining, 7(2), 23-30, December 2005.
Parametric response surface models for analysis of multi-site fMRI data
S. Kim, P. Smyth, H. Stern, and J. Turner
Proceedings of the 8th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 2005.
EventRank: a framework for ranking in time-varying networks
J. O'Madadhain and P. Smyth
ACM SIGKDD Workshop on Link Discovery, August 2005.
A spectral approach to finding communities in graphs
S. White and P. Smyth
SIAM 2005 Conference on Data Mining, April 2005.
2004
Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance
K. Lin, D. Chudova, G. W. Hatfield, P. Smyth, and B. Andersen
Proceedings of the National Academy of Sciences, November 2004.
Joint probabilistic curve clustering and alignment
S. Gaffney and P. Smyth
Advances in Neural Information Processing Systems 17, 2004.
Probabilistic author-topic models for information discovery
M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths
in Proceedings of the 10th ACM SIGKDD Conference, August 2004.
Modeling waveform shapes with random effects segmental hidden Markov models
S. Kim, P. Smyth, and S. Luther
Technical Report UCI-ICS 04-05, March 2004.
(shorter version in Proceedings of the 20th International Conference on Uncertainty in AI, July 2004.)
Conditional Chow-Liu tree structures for modeling discrete-valued vector time series
S. Kirshner, P. Smyth, and A. Robertson
Technical Report UCI-ICS 04-04, March 2004.
(shorter version in Proceedings of the 20th International Conference on Uncertainty in AI, July 2004.)
The author-topic model for authors and documents
M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth
in Proceedings of the 20th International Conference on Uncertainty in AI, July 2004.
Hidden Markov models for modeling daily rainfall occurrence over Brazil
A. Robertson, S. Kirshner, and P. Smyth
Technical Report UCI-ICS 03-27. Revised version appeared in the Journal of Climate.
Cluster analysis of Western North Pacific tropical cyclone tracks
S. J. Camargo, A. W. Robertson, S. J. Gaffney, and P. Smyth
Extended Abstract; Proceedings of the 26th Conference on Hurricanes and Tropical Meteorology, 3-7 May 2004, Miami, FL, 10A.7, p. 250-251.
Learning stochastic path planning models from video images
S. Parise, P. Smyth
Technical Report UCI-ICS 04-12.
Gene expression clustering with functional mixture models
D. Chudova, C. Hart, E. Mjolsness, and P. Smyth
in Advances in Neural Information Processing Systems 16, MIT Press, 2004.
2003
Modeling the Internet and the Web: Probabilistic Methods and Algorithms
P. Baldi, P. Frasconi, and P. Smyth
John Wiley and Sons, 2003.
The JUNG (Java Universal Network/Graph) Framework
J. O'Madadhain, D. Fisher, S. White, and Y. Boey
Technical Report UCI-ICS 03-17.
Algorithms for estimating relative importance in networks
S. White and P. Smyth
Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining (SIGKDD 2003)
Probabilistic models for joint clustering and alignment of multidimensional curves
D. Chudova, S. Gaffney, and P. Smyth
In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 2003.
Model-based clustering and visualization of navigation patterns on a Web site
I. V. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White
Journal of Data Mining and Knowledge Discovery, 7(4), 2003.
(extended version of ACM SIGKDD 2000 conference paper)
Beyond independence: probabilistic models for query approximation on binary transaction data
D. Pavlov, H. Mannila, P. Smyth
IEEE Transactions on Knowledge and Data Engineering, September 2003.
Translation-invariant mixture models for curve clustering
D. Chudova, S. Gaffney, E. Mjolsness, and P. Smyth
Technical Report UCI-ICS 03-09, March 2003.
(extended version of a paper in Proceedings of the Ninth ACM Conference on Knowledge Discovery and Data Mining)
Unsupervised learning from permuted data
S. Kirshner, S. Parise, and P. Smyth
Technical Report UCI-ICS 03-03, February 2003.
(extended version of a paper in Proceedings of the Twentieth International Conference on Machine Learning, ICML-03)
Approximate query answering by model averaging
D. Pavlov and P. Smyth
In Proceedings of the SIAM Third International Conference on Data Mining, May 2003.
Curve clustering with random effects regression mixtures
S. Gaffney and P. Smyth
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, January 2003.
Clustering Markov states into equivalence classes using SVD and heuristic search algorithms
X. Ge, S. Parise, and P. Smyth
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, January 2003.
AISTATS-2003 poster: PDF, PS
2002
Learning to classify galaxy shapes using the EM algorithm
S. Kirshner, I. V. Cadez, P. Smyth, C. Kamath
NIPS 2002, in Advances in Neural Information Processing Systems 15, MIT Press.
Sequential pattern discovery under a Markov assumption
D. Chudova and P. Smyth
Technical Report UCI-ICS 02-08.
A revised version of this tech report appears in the Journal of Data Mining and Knowledge Discovery, 7(3), 273-299, 2003. An earlier, shorter version of the paper appeared in Proceedings of the Eighth ACM International Conference on Knowledge Discovery and Data Mining (KDD-2002), August 2002 (winner, best research paper award)
Probabilistic model-based detection of bent-double radio galaxies
S. Kirshner, I. V. Cadez, P. Smyth, C. Kamath and E. Cantu-Paz
In Proceedings of the International Conference on Pattern Recognition, ICPR 2002, August 2002.
Also available in slightly extended form as Technical Report UCI-ICS 02-14 (PDF), with more details on kernel density estimation.
The Markov modulated Poisson process and Markov Poisson cascade with applications to Web traffic modeling
S. L. Scott and P. Smyth
Presented at the Seventh Valencia Conference on Bayesian Statistics. Revised version in Bayesian Statistics 7, Oxford University Press, 2003.
2001
Predictive profiles for transaction data using finite mixture models
I. V. Cadez, P. Smyth, E. Ip, and H. Mannila
Technical Report UCI-ICS 01-67, December 2001.
Classification of disorders of anemia on the basis of mixture model parameters
C. E. McLaren, I. V. Cadez, P. Smyth, and G. J. McLachlan
Technical Report UCI-ICS 01-56, November 2001.
Hidden Markov models for endpoint detection in plasma etch processes
X. Ge and P. Smyth
Technical Report UCI-ICS 01-54, September 2001.
The distribution of loop lengths in graphical models for turbo decoding
X. Ge, D. Eppstein, and P. Smyth
IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2549-2553, September 2001.
Data mining at the interface of computer science and statistics
P. Smyth
A review paper on why statistical thinking should be an essential component in data mining, primarily written for a computer science and engineering audience.
A revised version appeared as a chapter in Data Mining for Scientific and Engineering Applications, Kluwer, pages 35-61, 2001.
2000
Model complexity, goodness of fit, and diminishing returns
I. Cadez and P. Smyth
in Advances in Neural Information Processing Systems 13, MIT Press, 2001.
Theoretical results on how goodness of fit (squared error, log-likelihood) changes as a function of model complexity (e.g., for best-subset linear regression and mixture models). Under appropriate assumptions, goodness of fit is concave (or convex) in model complexity to first order.
Data mining: data analysis on a grand scale?
P. Smyth
Technical Report UCI-ICS 00-20, July 2000.
(revised version appeared in Statistical Methods in Medical Research)
A review paper that addresses the question "What is data mining, and how is it different from statistics?" (primarily written for a statistical audience).
A general probabilistic framework for clustering individuals
I. Cadez, S. Gaffney and P. Smyth
Technical Report UCI-ICS 00-09, March 2000.
Revised version in ACM SIGKDD 2000 Proceedings.
Outlines a general EM-based framework for clustering sets of sequences, curves, and other non-vector objects, with applications to gene expression data, Web page requests, and red blood cell histograms.
Visualization of navigation patterns on a Web site using model-based clustering
I. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White
Technical Report MSR-TR-00-18, Microsoft Research, March 2000.
Shorter version in ACM SIGKDD 2000 Proceedings.
Application of mixtures of Markov models to the clustering and visualization of several hundred thousand page request sequences from a large commercial Web site.
Towards scalable support vector machines using squashing
D. Pavlov, D. Chudova, P. Smyth
Revised version in ACM SIGKDD 2000 Proceedings.
Demonstrates how pseudo-datasets constructed by squashing can be combined with support vector machines to handle very large data sets.
Deformable Markov model templates for time-series pattern matching
X. Ge and P. Smyth
Technical Report UCI-ICS 00-10, March 2000.
Revised version in ACM SIGKDD 2000 Proceedings (Runner up for best research paper award).
A model-based approach to online local pattern detection in time series. Provides a Viterbi algorithm for optimal detection, with applications to endpoint detection in plasma etching for semiconductor manufacturing.
Segmental semi-Markov models for change-point detection with applications to semiconductor manufacturing
X. Ge and P. Smyth
Technical Report UCI-ICS 00-08, March 2000.
Application of segmental semi-Markov models to the classic problem of change-point detection, with results on a real semiconductor manufacturing problem.
Probabilistic models for query approximation with large sparse binary datasets
D. Pavlov, H. Mannila, P. Smyth
Technical Report UCI-ICS 00-07, March 2000.
Revised version in UAI-2000 Proceedings.
Shows how frequent itemsets (or association rules) can be combined in a coherent manner into a multivariate probabilistic model (a Markov random field).
1999
Maximum likelihood estimation of mixture densities for binned and truncated multivariate data
I. Cadez, P. Smyth, G. J. McLachlan, and C. E. McLaren
Technical Report UCI-ICS 99-13, March 1999.
A revised longer version appeared in the journal Machine Learning in 2002.
Provides a general EM framework for fitting mixture models to binned and truncated data, generalizing Geoff McLachlan's earlier univariate framework to the multivariate case.
The distribution of cycle lengths in graphical models for iterative decoding
X. Ge, D. Eppstein, and P. Smyth
Technical Report UCI-ICS 99-10, March 1999.
Revised version to appear in IEEE Transactions on Information Theory.
Derives approximate analytical results on the distribution of cycle lengths in turbo and LDPC decoding graphs: in typical graphs, almost all nodes are found to lie on cycles of length 20 or more, and relatively few nodes lie on cycles of length 8 or less.
Modeling of inhomogeneous Markov random fields with applications to cloud screening
I. Cadez and P. Smyth
Technical Report UCI-ICS 98-21 (revised March 1999).
Derives a computationally efficient method for fitting spatially varying Markov random field models to pixel data, with illustrations on simulated data and remote-sensing image data.
Hierarchical models for screening of iron deficiency anemia
I. Cadez, C. E. McLaren, P. Smyth, and G. J. McLachlan
Technical Report UCI-ICS 99-14, March 1999.
Revised version appeared in the Proceedings of the 1999 International Conference on Machine Learning.
Hierarchical mixture models applied to the problem of screening blood samples for various anemia-related disorders.
Trajectory clustering using mixtures of regression models
S. Gaffney and P. Smyth
Technical Report UCI-ICS 99-15, March 1999.
Revised version appeared in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 1999.
General framework for clustering curves and trajectories using EM, including an extension to mixtures of non-parametric kernel regression models.
Prediction with local patterns using cross-entropy
H. Mannila, D. Pavlov, and P. Smyth
Revised version appeared in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 1999.
Early results on the association rule/probabilistic model theme (see Pavlov, Mannila, and Smyth, 2000, above, for more recent work).
Discovering Chinese Words from Unsegmented Text
X. Ge, W. Pratt and P. Smyth
SIGIR-99.
Uses the EM algorithm to perform unsupervised segmentation of Chinese text into words.
Local context matching for page replacement
X. Ge, S. Gaffney, D. Pavlov, P. Smyth
Technical Report UCI-ICS 99-37, Sept 1999.
Proposes a nearest-neighbor-style algorithm for sequence prediction that outperforms well-known algorithms such as LRU for page caching in computer systems.
Probabilistic clustering using hierarchical models
I. Cadez and P. Smyth
Technical Report UCI-ICS 99-16, March 1999.
Derives a generalized EM (GEM) algorithm for model-based clustering using probabilistic hierarchical models.
Probabilistic model-based clustering of multivariate and sequential data
P. Smyth
in Proceedings of the Seventh International Workshop on AI and Statistics, D. Heckerman and J. Whittaker (eds.), Los Gatos, CA: Morgan Kaufmann, January 1999.
A general mixture model framework for clustering individuals based on a combination of sequential and static information.
1998
An evaluation of linearly combining density estimators via stacking
P. Smyth and D. Wolpert
Technical Report UCI-ICS 98-25.
Shorter version appeared in Machine Learning, vol. 36, no. 1/2, pp. 53-89, July 1999.
Shows how to apply David Wolpert's stacking idea to density estimation, yielding weighted mixtures of densities that outperform any of the individual component models out of sample.
Model selection for probabilistic clustering using cross-validated likelihood
P. Smyth
Technical Report UCI-ICS 98-09.
Revised version appeared in Statistics and Computing, vol. 10, no. 1, pp. 63-72, 2000.
Illustrates how cross-validated log-likelihood can be used as a data-driven alternative to methods such as BIC for model selection in unsupervised learning.
Multiple regimes in Northern hemisphere height fields via mixture model clustering
P. Smyth, M. Ghil, and K. Ide
Technical Report UCI-ICS 98-08.
Extended version appeared in the Journal of the Atmospheric Sciences, vol. 56, no. 21, November 1999.
Uses Gaussian mixture model clustering to confirm earlier non-probabilistic studies in the atmospheric science literature, which indicated that low-frequency upper-atmosphere pressure patterns in the Northern Hemisphere are dominated by three specific regimes.
1997
Belief networks, hidden Markov models, and Markov random fields: a unifying view
P. Smyth
Revised version appeared in Pattern Recognition Letters, 18, 1261-1268, 1997.
Short high-level review discussing how graphical models can be viewed as a unifying framework for multivariate probabilistic models in many different fields.
Probabilistic independence networks for hidden Markov models
P. Smyth, D. Heckerman, and M. Jordan
Revised version appeared in Neural Computation, vol. 9, no. 2, pp. 227-269, 1997.
Illustrates in detail the links between hidden Markov models and graphical models. Argues that the graphical model framework is a natural one for developing extensions of HMMs in a systematic manner.
Bounds on the mean classification error rate of multiple experts
P. Smyth
Pattern Recognition Letters, 1997.
If K experts provide subjective class labels for a set of N objects or images, can we say anything about how accurate they are relative to ground truth? Provides some simple bounds on the average accuracy of the K experts.
1996 and earlier
Clustering sequences with hidden Markov models
P. Smyth
in Advances in Neural Information Processing Systems 9, M. C. Mozer, M. I. Jordan, and T. Petsche (eds.), MIT Press, 1997.
Older paper on clustering sequences with hidden Markov models: see the Cadez, Gaffney, and Smyth (2000) paper above for a more recent and general framework.
Clustering using Monte Carlo cross-validation
P. Smyth
in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996.
Earlier conference version of the Statistics and Computing paper (Smyth, 2000) above.
Automating the hunt for volcanoes on Venus
M. Burl et al.
in Proceedings of the 1994 Conference on Computer Vision and Pattern Recognition.
Early results on a pattern recognition system developed at JPL for automatically detecting volcanoes in radar images of Venus. A longer version of this paper with more details appeared in the Machine Learning journal in 1997.
Markov monitoring with unknown states
P. Smyth
in IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Intelligent Signal Processing for Communications, December 1994.
Application of hidden Markov models to a large-scale, real-world fault detection problem involving online monitoring of electromechanical antenna control systems for NASA's Deep Space Network.
Inferring ground truth from subjective labelling of Venus images
P. Smyth et al.
in Advances in Neural Information Processing Systems 7, MIT Press, 1995.
An EM-based model for inferring reliability from multiple sets of class labels provided subjectively by a set of experts, with applications to labelling volcanoes in images of Venus.