TagHelper Tools: Facilitating Reliable Content Analysis of Corpus Data

Funded in part by: the ONR Cognitive and Neural Sciences division and NSF through the Pittsburgh Science of Learning Center
PI: Carolyn Rose
Students, Staff, and Collaborators Past and Present: Pinar Donmez, Gahgene Gweon, Jaime Arguello, Hao-Chuan Wang, Yue Cui, Mahesh Joshi, Yi-Chia Wang, Rohit Kumar, Sourish Chaudhuri, Edmund Huber, William Cohen, Myroslava Dzikovska, and Cammie Williams
Website: http://www.cs.cmu.edu/~cprose/TagHelper.html

There will be a free, all-day TagHelper tools tutorial on July 8, 2008 at Carnegie Mellon University. Email cprose@cs.cmu.edu if you are interested in signing up.

You can download TagHelper from the TagHelper Download Page, along with A Basic User's Manual and a PowerPoint file with more information.

TagHelper supports analysis of corpora in English, German, Spanish, and Chinese, and has been used as an instructional tool in two university courses, namely Machine Learning in Practice and Computer Supported Collaborative Learning.

If you publish work using TagHelper tools, please cite the following paper:

Rose, C. P., Wang, Y. C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (In Press). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning.

And let us know you're using it! Email cprose@cs.cmu.edu.

Email Carolyn Rose at cprose@cs.cmu.edu if you are interested in taking an online course on applied machine learning and text processing.

Building on the early success of the TagHelper project, an exciting development in the past year has been two successful evaluations of fully automatic adaptive collaborative learning support interventions. These interventions "listen in" on student conversational behavior using text processing technology developed on the TagHelper project, decide based on that behavior when to intervene, and offer support to make the learning experience more successful. In both studies, the automatic collaborative learning support led to significant increases in learning gains compared with a no-support control condition.

Women in CS is hosting a workshop related to TagHelper tools on Saturday, October 6.

There was a TagHelper Tools Tutorial at AI in Education 2007. Please email Carolyn Rose at cprose@cs.cmu.edu for more information (or to report any difficulties you have with the code). You can download my slides from the tutorial here.

There was an all-day TagHelper tools tutorial on June 19 at Carnegie Mellon University. Here are the slides from lectures 1, 2, 3, and 4.

The goal of our research is to develop text classification technology that addresses concerns specific to classifying sentences with coding schemes developed for behavioral research. A wide range of behavioral researchers, including social scientists, psychologists, learning scientists, and education researchers, collect, code, and analyze large quantities of natural language corpus data as an important part of their research. A particular focus of our work is text classification that performs well on highly skewed data sets, where a few codes account for most of the data and the codes of interest are rare. This is an active area of machine learning research.
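To illustrate why skewed data sets need special care, here is a minimal Python sketch. It is not TagHelper code. The labels "other" and "question" and the 95/5 split are hypothetical. It compares plain accuracy with Cohen's kappa, a chance-corrected agreement measure commonly used in content analysis, for a classifier that always predicts the majority code.

```python
from collections import Counter

def accuracy(gold, predicted):
    """Fraction of segments where the predicted code matches the gold code."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def cohen_kappa(gold, predicted):
    """Agreement between gold and predicted codes, corrected for chance."""
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that both assign the same code at random,
    # given each one's own code distribution.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # only one code in use; kappa is undefined, report 0
    return (observed - expected) / (1 - expected)

# Hypothetical skewed corpus: 95 segments coded "other", 5 coded "question".
gold = ["other"] * 95 + ["question"] * 5
# A classifier that ignores the text and always predicts the majority code.
majority = ["other"] * len(gold)

print(accuracy(gold, majority))     # 0.95
print(cohen_kappa(gold, majority))  # 0.0
```

The majority-code classifier looks strong by accuracy but shows no agreement beyond chance, which is why a chance-corrected measure is the more informative check when rare codes matter.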

TagHelper is built on top of the Weka Toolkit.

TagHelper tools is licensed under the GNU General Public License (GPL).


Publications

Rose, C. P., Wang, Y. C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (In Press). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning.

Wang, H. C., Kumar, R., Rose, C. P., Li, T., Chang, C. (2007). A Hybrid Ontology Directed Feedback Generation Algorithm for Supporting Creative Problem Solving Dialogues, Proceedings of IJCAI 07.

McLaren, B., Scheuer, O., De Laat, M., Hever, R., de Groot, R. & Rose, C. P. (to appear). Using Machine Learning Techniques to Analyze and Support Mediation of Student E-Discussions, Proceedings of AIED 2007.

Kumar, R., Rose, C. P., Wang, Y. C., Joshi, M., Robinson, A. (to appear). Tutorial Dialogue as Adaptive Collaborative Learning Support, Submitted to AIED 2007

Wang, H. C., Rose, C. P., Cui, Y., Chang, C. Y., Huang, C. C., Li, T. Y. (to appear). Thinking Hard Together: The Long and Short of Collaborative Idea Generation for Scientific Inquiry, Proceedings of CSCL 2007.

Wang, Y., Rose, C. P., Joshi, M., Fischer, F., Weinberger, A., Stegmann, K. (to appear). Context Based Classification for Automatic Collaborative Learning Process Analysis, Submitted to AIED 2007 (poster).

Wang, H. C. & Rose, C. P. (to appear). Supporting Collaborative Idea Generation: A Closer Look Using Statistical Process Analysis Techniques, Submitted to AIED 2007 (poster).

Wang, Y. C., Joshi, M., & Rose, C. P. (to appear). A Feature Based Approach for Leveraging Context for Classifying Newsgroup Style Discussion Segments, Proceedings of the Association for Computational Linguistics (poster).

Stegmann, K., Weinberger, A., Fischer, F., & Rose, C. P. (2006). Automatische Analyse natürlich-sprachlicher Daten aus Onlinediskussionen [Automatic corpus analysis of natural language data of online discussions]. Paper presented at the 68th Tagung der Arbeitsgruppe für Empirische Pädagogische Forschung (AEPF, Working Group for Empirical Educational Research), Munich, Germany.

Gweon, G., Rose, C. P., Zaiss, Z., & Carey, R. (2006). Providing Support for Adaptive Scripting in an On-Line Collaborative Learning Environment, Proceedings of CHI 06: ACM Conference on Human Factors in Computing Systems. New York: ACM Press.

Arguello, J., Butler, B., Joyce, E., Kraut, R., Ling, K., Wang, X., Rose, C. (2006). Talk to Me: Foundations for Successful Individual-Group Interactions in Online Communities, Proceedings of CHI 06: ACM Conference on Human Factors in Computing Systems. New York: ACM Press.

Arguello, J. and Rose, C. P. (2006). Museli: A Multi-source Evidence Integration Approach to Topic Segmentation of Spontaneous Dialogue, Proceedings of the North American Chapter of the Association for Computational Linguistics.

Gweon, G., Rose, C. P., Wittwer, J., Nueckles, M. (2005). An Adaptive Interface that Facilitates Reliable Content Analysis of Corpus Data, Proceedings of Interact '05.

Rose, C., Donmez, P., Gweon, G., Knight, A., Junker, B., Cohen, W., Koedinger, K., & Heffernan, N. (2005). Automatic and Semi-Automatic Skill Coding with a View Towards Supporting On-Line Assessment, Proceedings of AI in Education '05.

Donmez, P., Rose, C. P., Stegmann, K., Weinberger, A., and Fischer, F. (2005). Supporting CSCL with Automatic Corpus Analysis Technology, to appear in the Proceedings of Computer Supported Collaborative Learning.

Rose, C. P., and VanLehn, K. (2005). An Evaluation of a Hybrid Language Understanding Approach for Robust Selection of Tutoring Goals, International Journal of AI in Education 15(4).