LaText 

Text Mining based on Latent Variable Models


▒ Research Goals ▒

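This research is being carried out as part of the Brain Neuroinformatics Research Program of the Ministry of Science and Technology. The ultimate goal of the Inference and Learning Technology team is to develop models of the cognitive-neural mechanisms underlying human memory and learning, to build on these models to develop accurate and flexible neural-network-based inference and learning techniques, and to develop application systems that put these techniques to engineering use.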

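The Information Search team, to which our research group belongs, studies cognitive-psychology-based machine learning techniques for information classification, filtering, and extraction, together with web content mining techniques. It applies them to the development of a large-scale, neural-network-based information retrieval system, with the ultimate aim of developing Neuro-IR, a high-capacity, high-performance information retrieval system.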


 

▒ Research Plan and Methods ▒

 

R&D Goals

R&D Content and Scope

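Year 1

  1. Development of latent-variable neural network models for text analysis
    1. Study of latent-variable neural network models for analyzing and classifying text documents (multiple-cause models, PLSA, LSA, NMF, ICA, HMM, etc.)
    2. Study of document indexing methods based on latent-variable neural networks
    3. Development of latent-variable neural network models for extracting topic words from documents
  2. Study of methods for analyzing, classifying, and filtering diverse web content
    1. Study of methods for analyzing the content of diverse web sites
    2. Study of neural-network-based methods for analyzing, classifying, and filtering web content
  3. Survey of information classification systems and development of cognitive-psychological and mathematical-psychological models
    1. Study of human information classification and categorization through cognitive psychology experiments
    2. Development of behavioral and mathematical models of the human information classification system
    3. Study of cognitive mechanisms specific to text processing
    4. Study of individual differences in information classification and categorization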

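Year 2

  1. Development of information retrieval technology based on latent-variable neural network learning
    1. Study of automatic learning methods for neural network models for information retrieval
    2. Development of techniques for analyzing, classifying, and filtering large-scale text documents
    3. Baseline performance test on 10 GB of document data
  2. Development of neural-network-based web content information extraction technology
    1. Study of techniques for extracting analyzed web content that matches the user's needs or preferences
  3. Study of how to build systems suited to human users and how to resolve the limitations that arise in building them
    1. Study of the feasibility of the proposed models and of techniques for implementing them
    2. Study of implementation methods for systems that exploit individual differences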

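Year 3

  1. Implementation and evaluation of Neuro-IR, a high-performance information retrieval system based on latent-variable neural network models
    1. Development of text mining technology based on latent-variable neural networks
    2. Development of Neuro-IR, reaching 105% of the performance of the top group in TREC ad-hoc retrieval
    3. Performance evaluation of Neuro-IR on a news assistant handling 100 GB of documents
  2. Database construction and system integration with other projects
    1. Construction of a product information database
    2. Verification of the usefulness of the database
    3. Integration with the systems of other projects
  3. Actual implementation of the Stage 2 system and evaluation of the implemented system
    1. Development of implementation methods for the models
    2. Comparative study of the developed system against other baseline models
    3. Study of the performance of the system that exploits individual differences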

  

▒ Publications ▒

  1. International Journal
    1. Word Sense Disambiguation by Learning Decision Trees from Unlabeled Data, Seong-Bae Park and Byoung-Tak Zhang, Applied Intelligence, vol. 19, pp. 27-38, 2003
    2. Genetic Mining of HTML Structures for Effective Web-Document Retrieval, Sun Kim and Byoung-Tak Zhang, Applied Intelligence, 18(3), pp. 243-256, 2003.
    3. Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering, Jeong-Ho Chang, Sung Wook Chi, and Byoung-Tak Zhang, Genomics and Informatics, vol. 1, no. 1, pp. 34-40, 2003 (to appear)
    4. An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition, Yu-Seop Kim, Jeong-Ho Chang, and Byoung-Tak Zhang, Lecture Notes in Artificial Intelligence, vol. 2637, pp. 111-116, 2003
    5. Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information, Seong-Bae Park and Byoung-Tak Zhang, Lecture Notes in Artificial Intelligence, vol. 2637, pp. 88-99, 2003.
    6. A Bayesian Evolutionary Approach to the Design and Learning of Heterogeneous Neural Trees, Byoung-Tak Zhang, Integrated Computer-Aided Engineering, vol. 9, no. 1, pp. 73-86, 2002
    7. Topic Extraction from Text Documents using Multiple-cause Networks, Jeong-Ho Chang, Jae Won Lee, Yuseop Kim, and Byoung-Tak Zhang, Lecture Notes in Artificial Intelligence, vol. 2417, pp. 434-443, 2002
    8. Construction of Large-Scale Bayesian Networks by Local to Global Search, Kyu-Baek Hwang, Jae Won Lee, Seung-Woo Chung, and Byoung-Tak Zhang, Lecture Notes in Artificial Intelligence, vol. 2417, pp. 375-383, 2002
    9. Target Word Selection using WordNet and Data-driven Models in Machine Translation, Yu-Seop Kim, Jeong-Ho Chang, and Byoung-Tak Zhang, Lecture Notes in Artificial Intelligence, vol. 2417, p. 607, 2002
    10. Customer Data Mining and Visualization by Generative Topographic Mapping Methods, Jin-San Yang and Byoung-Tak Zhang, Data Mining and Knowledge Discovery, 2002 (submitted)
  2. Domestic Journal
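    1. Bayesian Network-Based Microarray Data Analysis Using an Efficient Structure Learning Algorithm and Data Dimensionality Reduction (in Korean), Kyu-Baek Hwang, Jeong-Ho Chang, and Byoung-Tak Zhang, Journal of the Korea Information Science Society: Software and Applications, vol. 29, no. 11/12, 2002
    2. Information Extraction from Web Documents Using Self-Organizing HMMs (in Korean), Jae-Hong Eom and Byoung-Tak Zhang, Journal of the Korea Information Science Society: Software and Applications, 2002 (submitted)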
  3. International Conference
    1. Classification of the Risk Types of Human Papilloma Virus by Decision Trees, Seong-Bae Park, Sohyun Hwang, and Byoung-Tak Zhang, The Fourth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL03), 2003 (accepted)
    2. Automatic Webpage Classification Enhanced by Unlabeled Data, Seong-Bae Park and Byoung-Tak Zhang, The Fourth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL03), 2003 (accepted)
    3. Analysis of Gene Expression Profiles and Drug Activity Patterns by Clustering and Bayesian Network Learning, Jeong-Ho Chang, Kyu-Baek Hwang, and Byoung-Tak Zhang, In Methods of Microarray Data Analysis II (Papers from CAMDA'01), Kluwer Academic Publishers, pp. 169-184, 2002
    4. A Boosted Maximum Entropy Model for Learning Text Chunking, Seong-Bae Park and Byoung-Tak Zhang, In Proceedings of 19th International Conference on Machine Learning (ICML'02), pp. 482-489, 2002
    5. Stock Trading System using Reinforcement Learning with Cooperative Agents, Jang-Min O, Jae Won Lee, and Byoung-Tak Zhang, In Proceedings of 19th International Conference on Machine Learning (ICML'02), pp. 451-458, 2002
    6. A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation, Yuseop Kim, Jeong-Ho Chang, and Byoung-Tak Zhang, Proceedings of the 19th International Conference on Computational Linguistics (COLING2002), vol. 1, pp. 453-459, 2002.  
    7. Concurrent Evolution of Neural Networks and Their Data Sets, Je-Gun Joung and Byoung-Tak Zhang, In Proceedings of 8th International Conference on Neural Information Processing (ICONIP'01), pp. 115-120, 2001.
  4. Domestic Conference
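    1. Document Similarity Measurement Using a Semantic Kernel Based on Helmholtz Machine Learning (in Korean), Jeong-Ho Chang, Yu-Seop Kim, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 440-442, 2003
    2. Classification of Gene Expression Data by Ensemble Bayesian Networks (in Korean), Kyu-Baek Hwang, Jeong-Ho Chang, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 434-436, 2003
    3. Double Clustering of Gene Expression Data by the Information Bottleneck Method (in Korean), Byoung-Hee Kim, Kyu-Baek Hwang, Jeong-Ho Chang, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 362-364, 2003
    4. Classification of Human Papillomavirus by Cost-Sensitive Learning (in Korean), Sohyun Hwang, Seong-Bae Park, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 401-403, 2003
    5. Molecular Neural Networks Based on Synaptic Potential Activity (in Korean), Ho-Jin Jeong, Dong-Yeon Cho, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 416-418, 2003
    6. Natural Language Parsing Using Evolutionary Computation (in Korean), Dong-Min Kim, Seong-Bae Park, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 419-421, 2003
    7. Part-of-Speech Disambiguation Using a Maximum Entropy Boosting Model (in Korean), Seong-Bae Park and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), pp. 522-524, 2003
    8. Risk-Group Classification of Human Papillomavirus by Decision Trees (in Korean), Sohyun Hwang, Seong-Bae Park, and Byoung-Tak Zhang, Proceedings of the Korea Data Mining Society Fall Conference, pp. 148-160, 2002
    9. Document Recommendation Using Latent Variable Models (in Korean), Jong-Woo Lee and Byoung-Tak Zhang, Proceedings of the Korea Intelligent Information Systems Society Fall Conference, pp. 514-519, 2002
    10. Prepositional Phrase Attachment Disambiguation Using a Maximum Entropy Boosting Model (in Korean), Seong-Bae Park and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Fall Conference (II), vol. 29, no. 2, pp. 670-672, 2002
    11. Extraction of Higher-Order Interactions among Yeast Proteins through Hierarchical Clustering (in Korean), Jae-Hong Eom and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Fall Conference (II), vol. 29, no. 2, pp. 364-366, 2002
    12. Document Classification Using Co-Trained Support Vector Machines (in Korean), Seong-Bae Park and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), vol. 29, no. 1, pp. 259-261, 2002
    13. Translation Word Selection by Word Similarity Based on Latent Semantic Structure (in Korean), Jeong-Ho Chang, Yu-Seop Kim, and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), vol. 29, no. 1, pp. 502-504, 2002
    14. Text Information Extraction Using S-HMM (in Korean), Jae-Hong Eom and Byoung-Tak Zhang, Proceedings of the Korea Information Science Society Spring Conference (B), vol. 29, no. 1, pp. 328-330, 2002
    15. A Comparative Study of Text Learning Based on Latent Variable Models (in Korean), Jeong-Ho Chang and Byoung-Tak Zhang, Proceedings of the Korean Brain Society Conference, pp. 120-121, 2002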


Project Title

LaText: Text Mining based on Latent Variable Models

Sponsor

Ministry of Science and Technology

Principal Investigator

Prof. Byoung-Tak Zhang

Researchers

Jong-Woo Lee Park

Jeong-Ho Chang

Jang-Min O

Kyu-Baek Hwang

Duration

August 2001 - May 2004

Cooperative Research Institute

Soongsil University, Chonnam National University


Contact Jeong-Ho Chang
E-Mail jhchang@bi.snu.ac.kr
Phone +82-2-880-1847
Fax +82-2-875-2240


This page is maintained by Jeong-Ho Chang (jhchang@bi.snu.ac.kr).
Last updated: April 28, 2003.