The Tsinghua NLP (thunlp) Group is devoted to making its NLP algorithms and methods available to everyone, with a focus on Chinese NLP and knowledge graphs. The code is produced by members of the thunlp Lab, headed by Prof. Maosong Sun and Assoc. Prof. Zhiyuan Liu.
- THUNRE: A package of Neural Relation Extraction. [Git]
Neural relation extraction aims to extract relations from plain text with neural models, which have become the state-of-the-art approach to relation extraction. In this package, we provide our implementations of CNN [Zeng et al., 2014] and PCNN [Zeng et al., 2015], as well as their extended versions with a sentence-level attention scheme [Lin et al., 2016].
- THUNSC: A package of Neural Sentiment Classification. [Git]
Neural sentiment classification aims to classify the sentiment of a document with neural models, which have become the state-of-the-art approach to sentiment classification. In this package, we provide our implementations of NSC, NSC+LA and NSC+UPA [Chen et al., 2016], in which user and product information is incorporated via attention over different semantic levels.
- THUTAG: A package of Keyphrase Extraction and Social Tag Suggestion. [Git]
The package contains several keyphrase extraction methods, including TextRank, ExpandRank, Topical PageRank and WAM, and social tag suggestion methods, including KNN, PMI, TagLDA, TAM and WTM. The package powers Weibo Keywords, one of the most popular microblog apps, which has more than 3.5 million registered users.
- PLDA+: A package of Parallel LDA. [Git]
PLDA is a parallel C++ implementation of Latent Dirichlet Allocation (LDA). We present a highly optimized parallel implementation of the Gibbs sampling algorithm for LDA training and inference. The carefully designed architecture is expected to support extensions of this algorithm. PLDA+, an enhanced parallel implementation of LDA, further improves the scalability of LDA by significantly reducing the unparallelizable communication bottleneck and achieving good load balancing.
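For readers unfamiliar with the Gibbs sampling step that PLDA parallelizes, the following is a minimal sequential sketch of collapsed Gibbs sampling for LDA. It is not taken from the PLDA or PLDA+ code base and does no parallelization. The function name `lda_gibbs`, its parameters, and the default hyperparameters are assumptions chosen for illustration, and documents are assumed to be lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=50, seed=0):
    """Sequential collapsed Gibbs sampling for LDA (illustrative only).

    docs is a list of documents, each a list of integer word ids in
    [0, vocab_size). All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic
                # given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word


# Toy usage: two tiny documents over a four-word vocabulary.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
z, n_dk, n_kw = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
```

In this sketch every token update reads and writes the shared count tables, which is the kind of dependency a parallel implementation has to manage through communication between workers.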
Knowledge Representation Learning
- KB2E: A package of Knowledge Base to Embeddings. [Git]
The package contains state-of-the-art knowledge representation learning methods including TransE, TransH, TransR and PTransE.
- KR-EAR: Knowledge Representation Learning with Entities, Attributes and Relations. [Git]
This is the lab code of our IJCAI 2016 paper "Knowledge Representation Learning with Entities, Attributes and Relations".
- TKRL: Type-embodied Knowledge Representation Learning. [Git]
This is the lab code of our IJCAI 2016 paper "Representation Learning of Knowledge Graphs with Hierarchical Types". The method is expected to support knowledge representation learning with hierarchical types of entities.
- DKRL: Description-embodied Knowledge Representation Learning. [Git]
This is the lab code of our AAAI 2016 paper "Representation Learning of Knowledge Graphs with Entity Descriptions". The method is expected to support knowledge representation learning with entity descriptions.
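As a rough illustration of the translation-based models listed under KB2E above, TransE treats a relation as a translation in the embedding space and scores a triple (h, r, t) by the distance ||h + r - t||, with lower scores meaning more plausible triples. The sketch below is not taken from the KB2E code. The function names `transe_score` and `margin_loss`, the plain-list vectors, and the default margin are assumptions made for illustration.

```python
import math


def transe_score(head, relation, tail, norm=1):
    """Return ||h + r - t|| under the L1 (norm=1) or L2 norm.

    Lower scores indicate more plausible triples. Vectors are plain
    lists of floats of equal length (hypothetical representation).
    """
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(x) for x in diff)
    return math.sqrt(sum(x * x for x in diff))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss between a true and a corrupted triple."""
    return max(0.0, margin + pos_score - neg_score)


# Toy usage: a triple that fits the translation exactly, and a corrupted one.
h, r = [1.0, 0.0], [0.0, 1.0]
good_tail, bad_tail = [1.0, 1.0], [3.0, 1.0]
loss = margin_loss(transe_score(h, r, good_tail), transe_score(h, r, bad_tail))
```

Training minimizes this loss over true triples paired with corrupted ones, so that true triples end up at least `margin` closer than corrupted triples.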
Language Representation Learning
Last update: 23 Nov, 2016.
- MMDW: Max-Margin DeepWalk. [Git]
This is the lab code of our IJCAI 2016 paper "Max-Margin DeepWalk: Discriminative Learning of Network Representation". The method is expected to support discriminative network representation learning with node labels.
- CWE: Character Word Embeddings. [Git]
This is the lab code of our IJCAI 2015 paper "Joint Learning of Character and Word Embeddings". This method is expected to learn Chinese word embeddings by taking the characters within words into consideration. The Chinese analogical reasoning dataset is available in the data folder.
- CLWE: Cross-Lingual Word Embeddings. [Homepage]
This is the lab code of our ACL 2015 short paper "Learning Cross-lingual Word Embeddings via Matrix Co-factorization". This method is expected to learn cross-lingual word embeddings with a matrix co-factorization framework.
- OIWE: Online Interpretable Word Embeddings. [Git]
This is the lab code of our EMNLP 2015 short paper "Online Learning of Interpretable Word Embeddings". This method is expected to learn interpretable word embeddings based on the OIWE-IPG model proposed in our paper.
- TADW: Text-Associated DeepWalk. [Git]
This is the lab code of our IJCAI 2015 paper "Network Representation Learning with Rich Text Information". The method is expected to support network representation learning with rich text information within each node. The code requires a 64-bit Linux machine with MATLAB installed.
- TWE: Topical Word Embeddings. [Git]
This is the lab code of our AAAI 2015 paper "Topical Word Embeddings". The method is expected to perform representation learning of words with their topic assignments by latent topic models such as Latent Dirichlet Allocation.