rada mihalcea
downloads
[see also the research page for related information]
Various software modules and data sets that are or were used in my research. They are made available under the terms of the GNU General Public License. Both data and software are distributed without any warranty.
For any questions regarding the content of this page, please contact Rada Mihalcea, rada at cs.unt.edu
GWSD: Unsupervised Graph-based Word Sense Disambiguation
- GWSD is a system for unsupervised all-words graph-based word sense disambiguation. [download GWSD 1.0] (September 13, 2007).
- Ravi Sinha and Rada Mihalcea, Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity, In Proceedings of the IEEE International Conference on Semantic Computing (ICSC 2007), Irvine, CA, September 2007. [pdf]
- Rada Mihalcea, Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data Labeling, In Proceedings of the Joint Conference on Human Language Technology / Empirical Methods in Natural Language Processing (HLT/EMNLP), Vancouver, October, 2005. [pdf]
Affective Text: Data Annotated for Emotions and Polarity
- Affective Text is a data set consisting of 1000 test headlines and 200 development headlines, each annotated with the six Ekman emotions and the polarity orientation. [download] (July 13, 2007).
- Carlo Strapparava and Rada Mihalcea, SemEval-2007 Task 14: Affective Text, in Proceedings of the 4th International Workshop on the Semantic Evaluations (SemEval 2007), Prague, Czech Republic, June 2007. [pdf]
- Read more about the task here.
SenseLearner: All-Words Word Sense Disambiguation Tool
- SenseLearner 2.0 [download] (June 13, 2005).
- Changes in version 2.0: a client-server model that allows for significantly faster tagging; a simpler input file format (the SemCor-like format is no longer required).
- SenseLearner 1.0 (beta) [download] (Nov 18, 2004)
Benchmark for the evaluation of back-of-the-book indexing systems
- A benchmark for the evaluation of systems for back-of-the-book indexing [download].
The benchmark is described in:
Andras Csomai and Rada Mihalcea, Creating a Testbed for the Evaluation of Automatically Generated Back-of-the-book Indexes, in Proceedings of the Conference on Computational Linguistics and Intelligent Text Processing (CICLing), LNCS, Mexico City, February 2006. [pdf]
FrameNet - WordNet verb sense mapping
- FnWnVerbMap 1.0 [download]
A mapping between verb lexical units in FrameNet II and verb senses in WordNet. The mapping process is described in:
Lei Shi and Rada Mihalcea, Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing, CICLing 2005, Mexico. [pdf]
Resources and Tools for Romanian NLP
- Romanian corpus of newspaper articles (and two novels), 50 mil. words. [research purpose only - send a request to rada at cs unt edu]
- Romanian sense tagged data, 39 ambiguous words [download]
- Romanian-English parallel texts, sentence-aligned, 1 mil. words (each side) [download; research purpose only - send a request to rada at cs unt edu]
- Romanian-English word aligned data (2003) [download]
- Romanian-English word aligned data (2005) [download]
- Romanian-English dictionary (38,000 entries) [download]
- For other resources and tools for Romanian, see the ConsiLR webpage.
Open Mind Word Expert Sense Tagged Data
- OMWE 1.0: Sense tagged data for 288 nouns, created within the Open Mind Word Expert framework during one year of activity (2002) [download]
- OMWE 2.0: Sense tagged data for nouns, verbs, adjectives, created within the Open Mind Word Expert framework. These data sets were used during the Senseval-3 evaluations.
- Romanian OMWE: Data for 39 ambiguous words in Romanian [download]
- English OMWE: Data for 57 ambiguous words, annotated with WordNet/Wordsmyth senses [download]
- English-Hindi OMWE: Data for 41 English words annotated with their corresponding Hindi translation [download]
TWA Sense Tagged Data
- Sense tagged data for six words with two-way ambiguities (bass, crane, motion, palm, plant, tank). [download]
Resources for Word Alignment
- Word aligned data for Romanian-English, English-French.
- Parallel texts for training.
- Code for word alignment evaluation.
All of these are available from the webpage of the HLT/NAACL 2003 workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond.
SemCor
Texts semantically annotated with WordNet 1.6 senses (created at Princeton University), and automatically mapped to WordNet 1.7, WordNet 1.7.1, WordNet 2.0, and WordNet 2.1.
- SemCor 1.6 [download]
- SemCor 1.7 [download]
- SemCor 1.7.1 [download]
- SemCor 2.0 [download]
- SemCor 2.1 [download]
WordNet mappings
A mapping between synset offsets in various WordNet versions.
- WordNet 1.6 - 1.7 [download]
- WordNet 1.6 - 1.7.1 [download]
- WordNet 1.7 - 1.7.1 [download]
- WordNet 1.6 - 2.0 [download]
- WordNet 1.7.1 - 2.0 [download]
Senseval-2 and Senseval-3 English all-words data converted into SemCor format
Text Filtering
- Evaluation software for text filtering systems. It implements the normalized utility, F-measure, precision, and recall, as defined in the TREC 2002 Filtering task. Usage is straightforward and closely follows the TREC 2002 Filtering guidelines. [download]
- More soon...
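For readers unfamiliar with the measures named in the text filtering item above, here is a minimal sketch of how precision, recall, an F-measure, and a scaled linear utility can be computed from filtering counts. It is not the distributed evaluation software. The class and function names are hypothetical, and the default parameters (beta = 0.5, a gain of 2 per relevant document retrieved, a cost of 1 per non-relevant document retrieved, and a utility floor of -0.5) are assumptions about the TREC 2002 settings that should be checked against the official guidelines.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Per-topic counts for a filtering run (hypothetical container)."""
    relevant_retrieved: int      # relevant documents the system delivered
    nonrelevant_retrieved: int   # non-relevant documents the system delivered
    relevant_missed: int         # relevant documents the system did not deliver


def precision(c: FilterCounts) -> float:
    retrieved = c.relevant_retrieved + c.nonrelevant_retrieved
    return c.relevant_retrieved / retrieved if retrieved else 0.0


def recall(c: FilterCounts) -> float:
    relevant = c.relevant_retrieved + c.relevant_missed
    return c.relevant_retrieved / relevant if relevant else 0.0


def f_measure(c: FilterCounts, beta: float = 0.5) -> float:
    """F-beta; beta < 1 weights precision more heavily than recall."""
    p, r = precision(c), recall(c)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def scaled_utility(c: FilterCounts, gain: float = 2.0, cost: float = 1.0,
                   floor: float = -0.5) -> float:
    """Linear utility, normalized by the best achievable score and
    rescaled so that the floor maps to 0 and a perfect run maps to 1.
    The default gain, cost, and floor are assumptions (see lead-in)."""
    relevant = c.relevant_retrieved + c.relevant_missed
    if relevant == 0:
        return 0.0  # assumption: topics with no relevant documents score 0
    raw = gain * c.relevant_retrieved - cost * c.nonrelevant_retrieved
    normalized = raw / (gain * relevant)
    return (max(normalized, floor) - floor) / (1 - floor)


if __name__ == "__main__":
    counts = FilterCounts(relevant_retrieved=6, nonrelevant_retrieved=2,
                          relevant_missed=4)
    print(precision(counts), recall(counts))
    print(f_measure(counts), scaled_utility(counts))
```

With the floor in place, a system that delivers nothing scores above zero on the scaled utility, while a system that delivers many non-relevant documents is clamped at zero instead of receiving an unbounded negative score.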
QA Data Set: Annotated questions
- Annotations for about 5,500 questions used in an analysis of information requests. Questions are drawn from the Excite log and from the TREC QA benchmark. This is the data set used in the experiments reported in:
- Rada Mihalcea, The Semantic Wildcard, in Proceedings of the LREC 2002 Workshop on "Using Semantics for Information Retrieval and Filtering: State of the Art and Future Research", Las Palmas, Spain, May 2002.