FIG. 2 is shorthand for a hypothetical stochastic process by which an English string 200 gets converted into a French string 205. FIG. 3 is a flowchart describing, at a high level, such a stochastic process 300. Every English word in the string is first assigned a fertility (block 305). These assignments may be made stochastically according to a table n(φ|e_i). Any word with fertility zero is deleted from the string, any word with fertility two is duplicated, etc.
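The fertility step (block 305) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `apply_fertility`, the dictionary layout of the table, the fallback fertility of one for unlisted words, and the example words and probabilities are all hypothetical.

```python
import random


def apply_fertility(words, fertility_table, rng):
    """Sample a fertility for each word and copy the word that many times.

    fertility_table maps a word e to {fertility: probability}, standing in
    for the table n(phi | e). This layout is an assumption made for the sketch.
    """
    output = []
    for word in words:
        # Assumption: a word missing from the table always gets fertility 1.
        dist = fertility_table.get(word, {1: 1.0})
        fertilities = list(dist.keys())
        weights = list(dist.values())
        phi = rng.choices(fertilities, weights=weights, k=1)[0]
        # Fertility 0 deletes the word, 1 keeps it, 2 duplicates it, etc.
        output.extend([word] * phi)
    return output


# Hypothetical table with probabilities of 1.0 so the result is deterministic.
table = {"did": {0: 1.0}, "not": {2: 1.0}}
result = apply_fertility(["Mary", "did", "not", "slap"], table, random.Random(0))
print(result)
```

With real fertility probabilities the output varies from run to run, so passing a seeded `random.Random` keeps a run reproducible.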
Classifications: G06F40/45 - Example-based machine translation; Alignment. G06F40/40 - Processing or translation of natural language. G06F40/00 - Handling natural language data.

Assignors: University of Southern California. Application granted; publication of US7340388B2. Status: Active.
Inventors: Radu Soricut, Daniel Marcu, Kevin Knight. Original assignee: University of Southern California (USC). Current assignee: University of Southern California (USC). Application number: US10/401,134, filed by University of Southern California (USC). Priority: US36807102P. Other versions: US20030233222A1. Status: Active.

Legal events: assignment of assignors' interest from Kevin Knight, Daniel Marcu, and Radu Soricut to the University of Southern California (see document for details); confirmatory license assigned to DARPA (see document for details); publication of US20030233222A1.
US7340388B2 - Statistical translation using a large monolingual corpus

Publication number: US7340388B2. Authority: United States (US). Prior art keywords: alternate translations input text translations language corpus.