08 실습 - 표현(Representation) - 단어의 표현 (TF-IDF, nGram)
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"08 실습 - 표현(Representation) - 단어의 표현 (TF-IDF, nGram)","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"I_TVhSXBJk2g"},"source":["# Word Representation\n","\n","\n","Machines cannot process text directly, so words are converted into numbers.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"wwB1D_6MP8bg"},"source":["# 1 Word vectors using TF-IDF"]},{"cell_type":"markdown","metadata":{"id":"3HGBaQ4bXuSo"},"source":["## 1-1 Implementing it from scratch"]},{"cell_type":"markdown","metadata":{"id":"Y8lmTqCA9ZBs"},"source":["weighting scheme|weight|description\n","--|--|--\n","term frequency|<img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/91699003abf4fe8bdf861bbce08e73e71acf5fd4\" />|= token count / total number of tokens in the document\n","inverse document frequency|<img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/864fcfdc0c16344c11509f724f1aa7081cf9f657\" />|= log(total number of documents / number of documents containing the token)"]},{"cell_type":"code","metadata":{"id":"y56mwVir0L3a"},"source":["d1 = \"The cat sat on my face I hate a cat\"\n","d2 = \"The dog sat on my bed I love a dog\""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"eO1kEEmceE1P"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_fjur4eRRK_x"},"source":["## 1-2 Using sklearn"]},{"cell_type":"code","metadata":{"id":"mp-oidh9QAEY"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"6LAITxYuQDL8"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"nOH8TC8DROUZ"},"source":["## 1-3 Using gensim"]},{"cell_type":"code","metadata":{"id":"9YvqERDRRUMM"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"GhHpb-SDINae"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"0W5nqEpVRUas"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"J7fR-R8oRTly"},"source":["\n","\n","---\n","\n"]}]}
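
The notebook leaves the from-scratch cell in section 1-1 empty. One possible way to fill it in, following the two formulas in the weighting table, is sketched below. This is not the notebook author's solution. The helper names (`tokenize`, `tf`, `idf`) are hypothetical, and the sketch assumes lowercased whitespace tokenization, the natural logarithm, and no smoothing or normalization.

```python
import math

d1 = "The cat sat on my face I hate a cat"
d2 = "The dog sat on my bed I love a dog"


def tokenize(doc):
    # Assumption: lowercase and split on whitespace; no stemming or stop-word removal.
    return doc.lower().split()


docs = [tokenize(d1), tokenize(d2)]

# Vocabulary: every distinct token that appears in any document, in sorted order.
vocab = sorted({tok for doc in docs for tok in doc})


def tf(token, doc):
    # Term frequency: occurrences of the token divided by the document's token count.
    return doc.count(token) / len(doc)


def idf(token, docs):
    # Inverse document frequency: log(total documents / documents containing the token).
    # The document frequency is never zero here because vocab is built from docs.
    df = sum(1 for doc in docs if token in doc)
    return math.log(len(docs) / df)


# One TF-IDF dictionary per document, keyed by vocabulary token.
tfidf = [{tok: tf(tok, doc) * idf(tok, docs) for tok in vocab} for doc in docs]

for tok in vocab:
    print(f"{tok:>5}  d1={tfidf[0][tok]:.4f}  d2={tfidf[1][tok]:.4f}")
```

With only two documents, tokens shared by both (such as "the" or "sat") get an IDF of log(2/2) = 0, so only the words that distinguish the documents ("cat", "face", "hate" versus "dog", "bed", "love") carry weight. The values will not match sklearn's `TfidfVectorizer` defaults, which smooth the IDF, L2-normalize each row, and drop single-character tokens such as "I" and "a".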