Figure 1. Human-centric knowledge discovery and decision optimization. In this loop, an in-depth understanding of humans improves the systems' utility (i.e., the flow from humans to systems), and customized knowledge services optimize humans' decision making (i.e., the flow from systems to humans). The proposed research tasks are labeled at their corresponding positions in the loop.
CAREER: Human-Centric Knowledge Discovery and Decision Optimization
Principal Investigator: Hongning Wang, CAREER#1553568
January 15, 2016 to April 30, 2021
Research Objectives
Research Thrusts
- Joint text and behavior analysis. To exploit as many types of human-generated data as possible and capture the dependencies among them, this project develops a set of novel probabilistic generative models to perform integrative analysis of text and behavior data.
- Task-based online decision optimization. Traditional static, ad-hoc and passive machine-human interactions are inadequate to optimize humans' dynamic decision making processes. To address this limitation, users' longitudinal information seeking activities are organized into tasks, where new online learning algorithms are applied to proactively infer users' intents and adapt the systems for long-term utility optimization.
- Explainable personalization. Existing personalized systems are black boxes to their users. Users typically have little control over how their information is used to personalize systems. To help ordinary users be aware of how the system's behavior is customized and increase their trust in such systems, statistical learning algorithms are built to generate both system-oriented and user-oriented explanations.
- System implementation and prototyping. User studies are conducted in a prototype system integrated with all the algorithms developed in this project to evaluate the deployed algorithms. Evaluation and feedback from real users are circulated back to refine the assumptions and design of the developed algorithms.
Expected Outcome
- Algorithmic solutions for user behavior modeling and online interactive learning.
- Open source tools and web services that will provide joint analysis of human-generated text data and behavior data in various applications, such as search logs, forum discussions, and opinionated reviews.
- Annotated corpora and new evaluation metrics that will enable researchers to conduct follow-up research in related domains.
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant CAREER#1553568. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Supported Students
- Qingyun Wu, PhD, 2015-now, area of research: online learning and bandit algorithms
- Huazheng Wang, PhD, 2015-2019, area of research: online learning and bandit algorithms
- Lin Gong, PhD, 2015-2019, area of research: user modeling and sentiment analysis
- Nan Wang, PhD, 2018-now, area of research: explainable recommendation
Research Progress
- Hidden Topic Sentiment Model (Thrust I)
- Publication:
- Md Mustafizur Rahman and Hongning Wang. Hidden Topic Sentiment Model. The 25th International World-Wide Web Conference (WWW'2016), p155-165, 2016. (PDF)
- Yue Wang, Hongning Wang and Hui Fang. Extracting User-Reported Mobile Application Defects from Online Reviews. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), SENTIRE (2017), p422-429, 2017. (PDF)
- Code: Java Implementation of HTSM is available here.
- Data sets: NewEgg Reviews (JSON, Readme), Amazon Reviews (JSON, Readme), Manually Selected Prior and Seed Words (ZIP)
- Accounting for the Correspondence in Commented Data (Thrust I)
- Publication:
- Renqin Cai, Chi Wang and Hongning Wang. Accounting for the Correspondence in Commented Data. The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), p365-374, 2017. (PDF)
- Code: Java Implementation of CCTM is available here.
- Chrome Extension: A Chrome Extension for amazon.com that automatically maps users' review comments to the product specifications (currently it only works for the "Computers & Accessories" category on Amazon). It can be downloaded here.
- Modeling Student Learning Styles in MOOCs (Thrust I)
- Publication:
- Yuling Shi, Zhiyong Peng and Hongning Wang. Modeling Student Learning Styles in MOOCs. The 26th International Conference on Information and Knowledge Management (CIKM 2017), p979-988, 2017. (PDF)
- Diheng Zhang, Yuling Shi, Hongning Wang, and Bethany A. Teachman. Predictors of Attrition in a Public Online Interpretation Training Program for Anxiety. 51st Annual Convention of the Association for Behavioral and Cognitive Therapies (poster paper), 2017. (Poster)
- Renqin Cai, Xueying Bai, Yuling Shi, Zhenrui Wang, Parikshit Sondhi and Hongning Wang. Modeling Sequential Online Interactive Behaviors with Temporal Point Process. The 27th International Conference on Information and Knowledge Management (CIKM 2018), p873-882, 2018. (PDF)
- Code: Python Implementation of L2S model is available here.
- Modeling Social Norms Evolution for Personalized Sentiment Classification (Thrust I)
- Publication:
- Lin Gong, Mohammad Al Boni and Hongning Wang. Modeling Social Norms Evolution for Personalized Sentiment Classification. The 54th Annual Meeting of the Association for Computational Linguistics (ACL'2016), p855-865, 2016. (PDF)
- Lin Gong, Benjamin Haines and Hongning Wang. Clustered Model Adaptation for Personalized Sentiment Analysis. The 26th International World Wide Web Conference (WWW 2017), p937-946, 2017. (PDF)
- Lin Gong and Hongning Wang. When Sentiment Analysis Meets Social Network: A Holistic User Behavior Modeling in Opinionated Data. The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), p1455-1464, 2018. (PDF, Video)
- Code:
- Data sets:
- Joint Topic and Network Embedding for User Representation Learning (Thrust I)
- Publication:
- Lin Gong, Lu Lin, Weihao Song and Hongning Wang. JNET: Learning User Representations via Joint Network Embedding and Topic Embedding. The 13th ACM International Conference on Web Search and Data Mining (WSDM 2020), p205-213, 2020. (PDF)
- Lu Lin, Lin Gong and Hongning Wang. Learning Personalized Topical Compositions with Item Response Theory. The 12th ACM International Conference on Web Search and Data Mining (WSDM 2019), p609-617, 2019. (PDF)
- Code:
- A Java Implementation of our joint topic and network embedding algorithm can be found here.
- Data sets: Our experimentation data sets can be found here.
- Contextual Bandits in A Collaborative Environment (Thrust II)
- Publication:
- Qingyun Wu, Huazheng Wang, Quanquan Gu and Hongning Wang. Contextual Bandits in A Collaborative Environment. The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2016), p529-538, 2016. (PDF)
- Code: Python Implementation of CoLin can be found here.
- Data sets: Our experimentation data sets (including the simulator and the processed Yahoo Frontpage, LastFM, and Delicious data sets) can be found here.
- Learning Hidden Features for Contextual Bandits (Thrust II)
- Publication:
- Huazheng Wang, Qingyun Wu and Hongning Wang. Learning Hidden Features for Contextual Bandits. The 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), p1633-1642, 2016. (PDF)
- Huazheng Wang, Qingyun Wu and Hongning Wang. Factorization Bandits for Interactive Recommendation. The Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017). (PDF, Supplement)
- Code: Python Implementation of hLinUCB and factorUCB can be found here.
- Data sets: Our experimentation data sets (including the simulator and the processed Yahoo Frontpage, LastFM, and Delicious data sets) can be found here.
- Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems (Thrust II)
- Publication:
- Qingyun Wu, Hongning Wang, Liangjie Hong and Yue Shi. Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems. The 26th International Conference on Information and Knowledge Management (CIKM 2017), p1927-1936, 2017. (PDF)
- Code: Python Implementation of r2Bandit can be found here.
- Learning Contextual Bandits in a Non-stationary Environment (Thrust II)
- Publication:
- Qingyun Wu, Naveen Iyer and Hongning Wang. Learning Contextual Bandits in a Non-stationary Environment. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p495-504, 2018. (PDF)
- Qingyun Wu, Huazheng Wang, Yanen Li and Hongning Wang. Dynamic Ensemble of Contextual Bandits to Satisfy Users' Changing Interests. The Web Conference 2019 (WWW 2019), p2080-2090, 2019. (PDF)
- Code: Python Implementation of dLinUCB and DenBand can be found here.
- Online Learning to Rank (Thrust II)
- Publication:
- Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook and Hongning Wang. Efficient Exploration of Gradient Space for Online Learning to Rank. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p145-154, 2018. (PDF)
- Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu and Hongning Wang. Variance Reduction in Gradient Exploration for Online Learning to Rank. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), p835-844, 2019. Best Paper Award (PDF)
- Code: Python Implementation of NSGD and DSP algorithms can be found here.
- Explainable Recommendation (Thrust III)
- Publication:
- Yiyi Tao, Yiling Jia, Nan Wang and Hongning Wang. The FacT: Taming Latent Factor Models for Explainability with Factorization Trees. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019), p295-304, 2019. (PDF)
- Nan Wang, Yiling Jia, Yue Yin and Hongning Wang. Explainable Recommendation via Multi-Task Learning in Opinionated Text Data. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p165-174, 2018. (PDF)
- Code:
- A Python Implementation of our factorization tree algorithm for explainable recommendation can be found here.
Result Highlights:
In this work, we develop a Commented Correspondence Topic Model (CCTM) to model the correspondence in commented text data. We focus on two levels of correspondence. First, to capture topic-level correspondence, we treat the topic assignments in a commented document as the prior of its comments' topic proportions, which captures the thematic dependency between commented documents and their comments. Second, to capture word-level correspondence, we model topics with the Dirichlet compound multinomial distribution, which captures the word repetition patterns within the commented data. By integrating these two aspects, our model demonstrated encouraging performance in capturing the correspondence structure, yielding improved results in modeling user-generated content, spam comment detection, and sentence-based comment retrieval compared with state-of-the-art topic model solutions for correspondence modeling.
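To make the two correspondence mechanisms concrete, here is a minimal generative sketch under simplified assumptions (symmetric priors, toy sizes, a single article-comment thread); the variable names and the Polya-urn treatment of the Dirichlet compound multinomial are illustrative and do not mirror the released Java implementation.

```python
# Toy generative sketch of CCTM's two correspondence mechanisms (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
K, V = 5, 1000           # number of topics, vocabulary size (toy values)
alpha, beta = 0.1, 0.01  # symmetric Dirichlet hyperparameters

def sample_document(n_words, topic_prior, topic_word_urns):
    """Draw one document; the Polya-urn emission mimics DCM word burstiness."""
    theta = rng.dirichlet(topic_prior)          # document-level topic proportions
    topic_counts = np.zeros(K)
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)              # topic assignment for this word
        topic_counts[z] += 1
        urn = topic_word_urns[z]
        w = rng.choice(V, p=urn / urn.sum())    # draw a word from the topic's urn
        urn[w] += 1.0                           # reinforce it: repeated words become more likely
        words.append(w)
    return words, topic_counts

# One urn per topic, local to this article-comment thread (word-level correspondence:
# words used in the article become more likely to reappear in its comments).
urns = [np.full(V, beta) for _ in range(K)]

# Article: a symmetric Dirichlet prior over topics.
article, article_topic_counts = sample_document(200, np.full(K, alpha), urns)

# Comment: the article's topic-assignment counts serve as the prior of the comment's
# topic proportions (topic-level correspondence).
comment, _ = sample_document(50, alpha + article_topic_counts, urns)
```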
Result Highlights:
Figure T1.2.1 Illustration of the topics learned by CCTM on the ArsTechnica news dataset. Each box in the first level shows the top 10 words of a global topic; the second level shows the top 10 words from six randomly selected article-comment threads, with each article's title labeled under its leaf node. Word clouds highlight the content of the selected articles and comments on the second level. The inferred topic distributions in articles and comments are shown at the bottom of the figure.
Result Highlights:
In this work, we develop a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updates. We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared with conventional independent bandit algorithms. Extensive experiments on both synthetic data and three large-scale real-world datasets verified the improvement of our proposed algorithm over several state-of-the-art contextual bandit algorithms.
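A much-simplified sketch of the collaboration idea is given below: instead of the joint estimation over all users used in our algorithm, it merely shares each observation with the acting user's neighbors, weighted by a row-normalized adjacency matrix W. The class name, dimensions, and exploration weight are illustrative, not the released code.

```python
# Simplified neighborhood-sharing LinUCB sketch (not the exact collaborative update rule).
import numpy as np

class SharedLinUCB:
    def __init__(self, n_users, d, W, alpha=0.5, lam=1.0):
        self.W = W                                        # row-normalized user adjacency matrix
        self.alpha = alpha                                # exploration weight
        self.A = np.stack([lam * np.eye(d)] * n_users)    # per-user design matrices
        self.b = np.zeros((n_users, d))                   # per-user response vectors

    def select(self, user, arms):
        """arms: list of d-dimensional context vectors; returns the index of the chosen arm."""
        A_inv = np.linalg.inv(self.A[user])
        theta = A_inv @ self.b[user]
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in arms]
        return int(np.argmax(scores))

    def update(self, user, x, reward):
        # Share the observation with the user's neighbors (weighted by W), so that
        # connected users help estimate each other's preference parameters.
        for j, w in enumerate(self.W[user]):
            if w > 0:
                self.A[j] += w * np.outer(x, x)
                self.b[j] += w * reward * x

# Toy usage: two fully connected users, three random arms in one round.
W = np.array([[0.5, 0.5], [0.5, 0.5]])
bandit = SharedLinUCB(n_users=2, d=4, W=W)
arms = [np.random.rand(4) for _ in range(3)]
a = bandit.select(0, arms)
bandit.update(0, arms[a], reward=1.0)
```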
Result Highlights:
In this work, we propose to learn the hidden features for contextual bandit algorithms. Hidden features are explicitly introduced in our reward generation assumption, in addition to the observable contextual features. A scalable bandit algorithm is achieved via coordinate descent, in which closed form solutions exist at each iteration for both hidden features and bandit parameters. Most importantly, we rigorously prove that the developed contextual bandit algorithm achieves a sublinear upper regret bound with high probability, and a linear regret is inevitable if one fails to model such hidden features. Extensive experimentation on both simulations and large-scale real-world datasets verified the advantages of the proposed algorithm compared with several state-of-the-art contextual bandit algorithms and existing ad-hoc combinations between bandit algorithms and matrix factorization methods.
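The sketch below illustrates the alternating closed-form updates under the simplification that each observation carries its own hidden vector; the dimensions, regularization, and function name are assumptions for illustration, not the paper's exact algorithm.

```python
# Coordinate-descent sketch: alternate ridge-style closed-form updates between the
# bandit parameter and the hidden features (simplified: one hidden vector per row).
import numpy as np

def coordinate_descent_step(X, V, r, lam=1.0):
    """X: observed contexts (n x d_x); V: current hidden features (n x d_v);
    r: observed rewards (n,). Returns the updated (theta, V)."""
    d_x, d_v = X.shape[1], V.shape[1]
    Z = np.hstack([X, V])                                   # full feature = [observed, hidden]
    # 1) Fix the hidden features and solve ridge regression for the bandit parameter.
    theta = np.linalg.solve(Z.T @ Z + lam * np.eye(d_x + d_v), Z.T @ r)
    th_x, th_v = theta[:d_x], theta[d_x:]
    # 2) Fix the bandit parameter and solve for each hidden feature in closed form.
    residual = r - X @ th_x
    denom = np.outer(th_v, th_v) + lam * np.eye(d_v)
    V_new = np.array([np.linalg.solve(denom, th_v * res) for res in residual])
    return theta, V_new

# Toy usage: 20 observations, 5 observed dimensions, 2 hidden dimensions.
rng = np.random.default_rng(0)
X, V, r = rng.normal(size=(20, 5)), rng.normal(size=(20, 2)), rng.normal(size=20)
for _ in range(10):                      # a few alternations between the two updates
    theta, V = coordinate_descent_step(X, V, r)
```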
Result Highlights:
In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning, including exploitation for immediate click, exploitation for expected future clicks, and exploration of unknowns for model estimation. We rigorously prove that with a high probability our proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making the recommendations. Extensive experimentation on both simulations and a large-scale real-world dataset collected from Yahoo frontpage news recommendation log verified the effectiveness and significant improvement of our proposed algorithm compared with several state-of-the-art online learning baselines for recommendation.
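The scoring rule below sketches the three-way trade-off in a hedged form: an arm is valued by its estimated immediate click reward plus a weighted estimate of its effect on the user's return, each with its own confidence bonus. The weight gamma, the two ridge estimators, and the function name are placeholders rather than the paper's exact formulation.

```python
# Illustrative arm-scoring rule balancing immediate clicks, expected future returns,
# and exploration (placeholders, not the exact objective of our algorithm).
import numpy as np

def score_arm(x, theta_click, A_click_inv, theta_return, A_return_inv,
              gamma=0.5, alpha_c=0.5, alpha_r=0.5):
    """x: d-dim arm context; theta_*/A_*_inv: ridge estimates and inverse design
    matrices of separate click and return models."""
    click = x @ theta_click + alpha_c * np.sqrt(x @ A_click_inv @ x)      # exploit + explore clicks
    future = x @ theta_return + alpha_r * np.sqrt(x @ A_return_inv @ x)   # exploit + explore returns
    return click + gamma * future   # gamma trades off immediate clicks against future visits

# Toy usage with a 3-dimensional context.
d = 3
x = np.array([0.2, 0.5, 0.3])
I = np.eye(d)
print(score_arm(x, np.ones(d), I, np.ones(d), I))
```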
Result Highlights:
Figure T2.4.1 Illustration of dLinUCB. The master bandit model maintains the "badness" estimation of slave models over time to detect changes in the environment. At each round, the most promising slave model is chosen to interact with the environment, and the acquired feedback is shared across all admissible slave models for model update.
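A schematic of this master-slave bookkeeping is sketched below, assuming a slave interface with select, update, and out_of_bound methods; the sliding window and badness threshold are illustrative choices, not the released implementation.

```python
# Schematic master-slave bookkeeping for change detection (illustrative interface).
from collections import deque

class Master:
    def __init__(self, make_slave, n_slaves=3, window=100, threshold=0.3):
        self.make_slave, self.window, self.threshold = make_slave, window, threshold
        self.slaves = [make_slave() for _ in range(n_slaves)]
        self.errors = [deque(maxlen=window) for _ in range(n_slaves)]

    def badness(self, i):
        """Fraction of recent rounds where slave i's prediction fell outside its confidence bound."""
        e = self.errors[i]
        return sum(e) / len(e) if e else 0.0

    def choose(self, arms):
        admissible = [i for i, _ in enumerate(self.slaves) if self.badness(i) < self.threshold]
        if not admissible:                        # change detected: spawn a fresh slave model
            self.slaves.append(self.make_slave())
            self.errors.append(deque(maxlen=self.window))
            admissible = [len(self.slaves) - 1]
        best = min(admissible, key=self.badness)  # the most promising admissible slave acts
        return best, self.slaves[best].select(arms)

    def update(self, arms, chosen, reward):
        # Feedback is shared across all admissible slave models, and each slave's
        # badness record notes whether its prediction was within its confidence bound.
        for i, slave in enumerate(self.slaves):
            if self.badness(i) < self.threshold:
                self.errors[i].append(1 if slave.out_of_bound(arms[chosen], reward) else 0)
                slave.update(arms[chosen], reward)
```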
Result Highlights:
In this work, we focus on Dueling Bandit Gradient Descent (DBGD) based OL2R algorithms, which constitute a major line of research in this direction. In particular, we aim to reduce the variance of gradient estimation in DBGD-type OL2R algorithms. After an interleaved test, we project the selected updating direction (i.e., the winning direction) onto the space spanned by the feature vectors of the examined documents under the current query (the "document space" for short). Our key insight is that the result of an interleaved test is solely governed by a user's relevance evaluation of the examined documents. Hence, the true gradient introduced by this test is only reflected in the constructed document space, and the components of the proposed gradient that are orthogonal to the document space can be safely removed for variance reduction. We prove that this projected gradient is still an unbiased estimate of the true gradient, and show that this lower-variance gradient estimation leads to significant regret reduction. Our proposed method is compatible with all existing DBGD-type OL2R algorithms that rank documents with a linear model. Extensive experimental comparisons with several best-performing DBGD-type OL2R algorithms confirmed the effectiveness of our method in reducing the variance of gradient estimation and improving overall ranking performance.
Figure T2.5.2 Illustration of the model update of DBGD-DSP in a three-dimensional space. Dashed lines represent the trajectory of DBGD following different update directions. ut is the direction selected by DBGD in the 3-d space. The red bases represent the document space St, a 2-d plane. ut is projected onto St to obtain gt for the model update.
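The projection step itself is small; a sketch under the stated linear-ranker assumption is given below (the helper name and the toy vectors are illustrative).

```python
# Sketch of the gradient projection step: the winning direction is projected onto the
# span of the examined documents' feature vectors, and the orthogonal part is discarded.
import numpy as np

def project_to_document_space(u, D):
    """u: proposed update direction (d,); D: examined document features (n x d).
    Returns the component of u lying in the row space of D."""
    # Solve min_c ||D^T c - u||; then D^T c is the projection of u onto the document space.
    coeffs, *_ = np.linalg.lstsq(D.T, u, rcond=None)
    return D.T @ coeffs

# Toy usage: a 3-d direction projected onto a 2-document space (cf. Figure T2.5.2).
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
u = np.array([0.6, 0.3, 0.8])
print(project_to_document_space(u, D))   # keeps [0.6, 0.3, 0.]; the orthogonal 0.8 is dropped
```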
Result Highlights:
In follow-up work, we integrate regression trees to guide the learning of latent factor models for recommendation, and use the learned tree structure to explain the resulting latent factors. Specifically, we build regression trees on users and items respectively from user-generated reviews, and associate a latent profile with each node on the trees to represent users and items. As the regression trees grow, the latent factors are gradually refined under the regularization imposed by the tree structure. As a result, we are able to track the creation of latent profiles by following the path of each factor on the regression trees, which thus serves as an explanation for the resulting recommendations.
Extensive experiments on two large collections of Amazon and Yelp reviews demonstrate the advantage of our model over several competitive baseline algorithms. In addition, our user study confirms the practical value of the explainable recommendations generated by our model.
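A toy sketch of the tree-guided idea follows: users are split by whether their reviews mention an opinion phrase, each node fits a shared latent profile by ridge regression against fixed item factors, and the split phrases along a user's path are read out as the explanation. The split-selection rule, the handling of item trees, and all names here are simplified assumptions rather than the FacT algorithm itself.

```python
# Toy tree-guided latent profiles with path-based explanations (simplified stand-in for FacT).
import numpy as np

class Node:
    def __init__(self, profile, phrase=None, yes=None, no=None):
        self.profile, self.phrase, self.yes, self.no = profile, phrase, yes, no

def fit_profile(users, ratings, item_factors, dim, lam=1.0):
    """Ridge-fit one latent profile shared by a set of users; ratings are (user, item, rating)."""
    rows = [(item_factors[i], r) for u, i, r in ratings if u in users]
    if not rows:
        return np.zeros(dim)
    X = np.array([x for x, _ in rows])
    y = np.array([r for _, r in rows])
    return np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)

def grow(users, ratings, item_factors, phrases, mentions, dim, depth=0, max_depth=2):
    """Grow a user tree over a set of user ids; each level splits on one phrase here,
    whereas FacT picks the split that most reduces the squared rating error."""
    node = Node(fit_profile(users, ratings, item_factors, dim))
    if depth == max_depth or depth >= len(phrases):
        return node
    node.phrase = phrases[depth]
    yes = {u for u in users if node.phrase in mentions.get(u, set())}
    node.yes = grow(yes, ratings, item_factors, phrases, mentions, dim, depth + 1, max_depth)
    node.no = grow(users - yes, ratings, item_factors, phrases, mentions, dim, depth + 1, max_depth)
    return node

def explain(node, user, mentions, path=()):
    """Follow a user down the tree; the sequence of split phrases is the explanation."""
    if node.phrase is None:
        return path, node.profile
    hit = node.phrase in mentions.get(user, set())
    step = f"{'mentions' if hit else 'does not mention'} '{node.phrase}'"
    return explain(node.yes if hit else node.no, user, mentions, path + (step,))
```

The design choice this sketch highlights is that the explanation is a byproduct of training: the path of splits that produced a user's latent profile doubles as the reason shown to that user.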
- Publication:
- Derek Wu and Hongning Wang. ReviewMiner: An Aspect-based Review Analytics System. The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017), p1285-1288, 2017. (PDF)
- Demo system: ReviewMiner
Result Highlights:
Based on the proposed algorithm, we develop Hide-n-Seek, an intent-aware privacy protection plugin for personalized web search. In addition to users' genuine search queries, Hide-n-Seek submits k cover queries and corresponding clicks to an external search engine to disguise a user's search intent, which is grounded and reinforced in a search session, by mimicking the true query sequence. The cover queries are synthesized and randomly sampled from a topic hierarchy, in which each node represents a coherent search topic estimated by both n-gram and neural language models built over crawled web documents. Hide-n-Seek also personalizes the returned search results by re-ranking them based on the genuine user profile developed and maintained on the client side. Through a variety of graphical user interfaces, we present the topic-based query obfuscation mechanism to end users so that they can understand how their search privacy is protected.
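To illustrate the obfuscation idea, the sketch below picks a few cover topics per session and keeps drawing cover queries from those same topics alongside each genuine query; the flat toy dictionary and its hard-coded query lists stand in for the topic hierarchy and the n-gram/neural language models used in the actual plugin.

```python
# Toy session-level cover-query sampling (stand-in for language-model-based query synthesis).
import random

TOPIC_NODES = {                      # placeholder topic hierarchy, flattened to leaf nodes
    "sports":  ["nba playoff schedule", "marathon training plan"],
    "cooking": ["sourdough starter ratio", "cast iron seasoning tips"],
    "travel":  ["cheap flights to lisbon", "japan rail pass price"],
}

def start_session(true_topic, k=2, rng=None):
    """Pick k cover topics for the whole session, so each fake intent stays coherent
    and mimics how a genuine search intent persists across a query sequence."""
    rng = rng or random.Random(0)
    others = [t for t in TOPIC_NODES if t != true_topic]
    return rng.sample(others, k=min(k, len(others)))

def cover_queries(session_topics, rng=None):
    """For each genuine query the user issues, emit one cover query per cover topic."""
    rng = rng or random.Random(0)
    return [rng.choice(TOPIC_NODES[t]) for t in session_topics]

topics = start_session("cooking", k=2)
print(cover_queries(topics))   # cover queries submitted alongside the genuine query
```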
- Publication:
- Wasi Ahmad, Kai-Wei Chang and Hongning Wang. Intent-aware Query Obfuscation for Privacy Protection in Personalized Web Search. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), p285-294, 2018. (PDF)
- Puxuan Yu, Wasi Ahmad and Hongning Wang. Hide-n-Seek: An Intent-aware Privacy Protection Plugin for Personalized Web Search. The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018), demo track, p1333-1336, 2018. (PDF)
- Implementations:
- IQP query obfuscation algorithm: a Python implementation of the algorithm can be found here.
- Hide-n-Seek: Chrome Extension
Research Dissemination
- Open Source Implementations
Description | Citation | Link |
---|---|---|
Hidden Topic Sentiment Model | bib | ReadMe, Java Version |
Commented Correspondence Topic Model | bib | ReadMe, Java Version |
Multi-task Linear Adaptation Model | bib | ReadMe, Java Version |
Clustered Linear Adaptation Model | bib | ReadMe, Java Version |
Collaborative Linear Bandit Model | bib | ReadMe, Python Version |
Hidden Factor Linear Bandit Model | bib | ReadMe, Python Version |
factorUCB Model | bib | ReadMe, Python Version |
r2Bandit Model | bib | ReadMe, Python Version |
FacT Model | bib | ReadMe, Python Version |
Dueling Bandit OL2R | bib | ReadMe, Python Version |
- Public Systems
Description | Citation | Link |
---|---|---|
ReviewMiner System | bib | System Demo |
Amazon Review Mapping | bib | Chrome Extension |
Hide-n-Seek | bib | Chrome Extension |
- Data Sets
Description | Citation | Link |
---|---|---|
NewEgg Reviews | bib | Readme, JSON |
Amazon Reviews | bib | Readme, JSON |
Manually Selected Prior and Seed Words for Amazon Reviews | bib | ZIP |
Multi-armed Bandit Evaluation Simulator | bib | ReadMe, Python Version |
Yahoo Frontpage, LastFM, and Delicious Evaluation Data Set | bib | ReadMe, Download |