🐻
simple bare necessities!

Organizations

@apache @isi-nlp @microsoft @USCDataScience @MicrosoftCopilot

thammegowda/README.adoc
  • 🔭 I’m currently working on neural machine translation and imbalanced learning

  • 🌱 I’m currently learning …​

  • 👯 I’m looking to collaborate on …​

  • 🤔 I’m looking for help with …​

  • 💬 Ask me about …​ anything

  • 📫 How to reach me: @thammegowda

  • 😄 Pronouns: he/him/his

  • ⚡ Fun fact: …​

Tools

Research

|===
| Repo | Description | Status | Note

| PyMarian | Python bindings to Marian C++; pip install pymarian | Complete ✅ | Paper; PyPI
| BotEval | Facilitating human evaluation of chatbots; pip install boteval | Complete ✅ | Paper @ ACL 2024 Demos; Demos; PyPI
| Cometoid | Distilling strong reference-based metrics into stronger reference-less metrics | Complete ✅ | Paper @ WMT 2023; Models on Hugging Face
| sotastream | A streaming approach to machine translation training; pip install sotastream | Complete ✅ | Paper @ NLP OSS 2023; PyPI
| 016-many-eng-v2 | Many-to-English (v2) | Complete ✅ |
| 015-nmt-ablation | Transformer ablation, showing that the model can work without an encoder | Complete ✅ |
| 014-udhr-dataset | Parallel sentence alignment from the Universal Declaration of Human Rights corpus | WIP/Incomplete ◒ |
| 013-nmt-codeswitching | Done | Complete ✅ | Paper
| 012-macrobert | Macro sampling in BERT | Didn’t work ❌ | Maybe we should revisit
| 011-imb-learn | Imbalanced machine learning: case studies in image recognition, text classification, and machine translation | Incomplete ◒ | Docs
| 010-hyperparam-theory | A theory of hyperparameters | Incomplete ◒ | Book idea! Needs more time. 🕙
| 009-nmt-toolkits | A survey of NMT toolkits | Incomplete ◒ | Lost interest
| 008-asr-eval-macro | Macro-averaged evaluation for automatic speech recognition | Incomplete ◒ | Some positive results, but needs more evidence
| 007-mt-eval-macro | Macro Average: Rare Types are Important Too | Complete ✅ | NAACL 2021
| 006-many-to-eng | Many-to-English machine translation tools, data, and pretrained models | Complete ✅ | ACL 2021 Demos; Demo page
| 005-nmt-imbalance | Finding the optimal vocabulary size for neural machine translation | Complete ✅ | EMNLP 2020 Findings
| 005-nmt-imbalance-old | Neural machine translation with imbalanced classes | Complete ✅ | Rejected from *ACL; arXiv link
| 004-nmt-learning-curve | NMT learning curve revisited | Complete ✅ | Not published
| image-forensics-MFSec17 | An Approach for Automatic and Large Scale Image Forensics | Complete ✅ | MFSec 2017
| autoextractor | Clustering webpages based on structure and style similarity | Complete ✅ | IEEE IRI 2016
|===

Pinned

  1. mtdata (Public)

    A tool that locates, downloads, and extracts machine translation corpora

    Python · 147 stars · 22 forks

  2. USCDataScience/sparkler (Public)

    Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

    Java · 410 stars · 141 forks

  3. isi-nlp/rtg (Public)

    Reader Translator Generator: an NMT toolkit based on PyTorch

    Jupyter Notebook · 30 stars · 6 forks

  4. isi-nlp/nlcodec (Public)

    Natural Language EnCoder-Decoder: word, char, BPE, etc.

    Python · 5 stars · 2 forks

  5. tensorflow-grpc-java (Public)

    TensorFlow gRPC Java client for image recognition against a served Inception model

    Java · 39 stars · 15 forks

  6. tika-ner-corenlp (Public)

    Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser

    Java · 13 stars · 6 forks