Skip to content

CIS526 final project to identify foreign language context and to translate with respect to that context

Notifications You must be signed in to change notification settings

jtrovato/Parallel_Text_Extraction

Repository files navigation

For Students attempting the homework:

There are two python files that you will use to complete this assignment.

default.py takes in command line arguements specificy parallel wikipedia articels and outputs two filenames, the first contains english sentences from the input data and the second contains the corresponding aligned spanish sentences.

grade.py takes in two filenames, in the format mentioned above, adn output a score for the alignment based on hand aligned data.

These scripts can be used with pipes like so...

./default.py | ./grade.py

The data you will be using is contained in ./wikidata/es/ . In this directory, you find the original wikipedia articles that look like orig.enu.snt and the test file that have a ".test" appended to the file name. In this folder, there are also aligned sentence in files that begin with "pairs".

We have also included a file called dict.es which is a Spanish to English dictionary used for the default code.

For CIS526 Staff:

Everything mentioned for the students also applies to you. Also in this directory are python scipts for the baseline performance as well as 6 extensions that all offer improvement on the baseline implementation. If released as a homework the *.test files should be removed and be sed for calculating leaderboard positions. Similarly, the baseline and extension files should also be removed.

About

CIS526 final project to identify foreign language context and to translate with respect to that context

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •