#
# This is a non-breaking prefix list for the Portuguese language.
# The file is used for sentence tokenization (text -> sentence splitting).
#
# The file was taken from the Lingua::Sentence package:
#     http://search.cpan.org/~achimru/Lingua-Sentence-1.03/lib/Lingua/Sentence.pm
#

# File adapted for PT by H. Leal Fontes from the EN & DE versions published
# with moses-2009-04-13. Last update: 10.11.2009.

# Any entry in this file, when followed by a period (and then an upper-case word),
# does NOT mark the end of a sentence.
# Entries marked #NUMERIC_ONLY# are non-breaking ONLY when a number (0-9) follows.

# Any single upper case letter followed by a period is not a sentence ender
# (excluding I occasionally, but we leave it in).
# Usually upper case letters are initials in a name.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z

# Accented upper-case letters of the Portuguese alphabet (likewise usually initials in a name)
Á
Â
Ã
À
Ç
É
Ê
Í
Ó
Ô
Õ
Ú

# English -- but these work globally for all languages
Mr
Mrs
No
pp
St
no
Sr
Jr
Bros
etc
vs
esp
Fig
fig
Jan
Feb
Mar
Apr
Jun
Jul
Aug
Sep
Sept
Oct
Okt
Nov
Dec
Ph.D
PhD
# in "et al."
al
cf
Inc
Ms
Gen
Sen
Prof
Dr
Corp
Co

# Roman numerals
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
XV
XVI
XVII
XVIII
XIX
XX
i
ii
iii
iv
v
vi
vii
viii
ix
x
xi
xii
xiii
xiv
xv
xvi
xvii
xviii
xix
xx

# List of titles
# These are often followed by upper-case names, but do not indicate sentence breaks.
Adj
Adm
Adv
Art
Ca
Capt
Cmdr
Col
Comdr
Con
Corp
Cpl
DR
DRA
Dr
Dra
Dras
Drs
Eng
Enga
Engas
Engos
Ex
Exo
Exmo
Fig
Gen
Hosp
Insp
Lda
MM
MR
MRS
MS
Maj
Mrs
Ms
Msgr
Op
Ord
Pfc
Ph
Prof
Pvt
Rep
Reps
Res
Rev
Rt
Sen
Sens
Sfc
Sgt
Sr
Sra
Sras
Srs
Sto
Supt
Surg
adj
adm
adv
art
cit
col
con
corp
cpl
dr
dra
dras
drs
eng
enga
engas
engos
ex
exo
exmo
fig
op
prof
sr
sra
sras
srs
sto

# Misc.
# Odd period-ending items that NEVER indicate breaks (p.m. does NOT fall into this
# category - it sometimes ends a sentence).
v
vs
i.e
rev
e.g

# Numbers only
# These are treated as non-breaking prefixes only when followed by a numeric sequence;
# otherwise the period may still end the sentence.
# Add #NUMERIC_ONLY# after the word to enable this behavior.

# This case is mostly for the English "No.", which can either be a sentence of its own or,
# if followed by a number, a non-breaking prefix.
No #NUMERIC_ONLY#
Nos
Art #NUMERIC_ONLY#
Nr
p #NUMERIC_ONLY#
pp #NUMERIC_ONLY#

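# Portuguese abbreviations, in roughly alphabetical order.
# These are plain non-breaking prefixes (no #NUMERIC_ONLY# restriction).
A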
Av
a.C
A.D
a.D
a.m
AA
abr
abrev
acad
adj
adm
aer
agr
agric
Al
alf
álg
alm
alt
altit
alv
anat
ap
apart
arc
arcaic
arit
arqueol
arquit
art
arts
assem
assemb
assoc
astron
át
at.te
atm
atte
aum
autom
B
bel
bibliogr
biofís
biogr
bioq
bot
bras
btl
C
C.-alm
C.G.S
cap
caps
Cel
cf
Cia
ciênc
círc
cit
clim
climatol
cód
col
cols
com
comp
compl
cons
consel
conselh
const
cont
cos
cp
créd
cronol
cx
D
d.C
D.C
DD
dec
demog
demogr
Dep
dep
deps
des
desc
dic
dipl
doc
docs
Dr
Dra
Dras
Drs
dz
E
E.C
E.D
e.g
E.M
ed
edif
educ
EE
elem
eletr
eletrôn
Ema
Emb
emb
embriol
eng
enol
equit
Esc
esp
est
Est
etc
ex
Exa
Exmo
f
F
fac
farmac
fasc
fem
ff
fg
fig
fil
filat
filol
filos
fís
fisiol
fl
fol
folcl
fols
fot
fr
Fr
fs
G
G.M.T
gal
gen
gên
genét
geom
gír
gr
gram
H
h.c
hab
hip
hist
histol
I
i.e
ib
id
Ilmo
impr
índ
inf
inform
Ir
J
Jr
jur
just
K
l
L
lat
lb
leg
lég
legisl
légs
lit
liter
liv
livr
log
lóg
logar
long
Ltda
m
M
M.T.S
Maj
maj
Mal
mat
Me
mec
med
méd
méd.vet
memo
memor
met
metal
meteor
mit
mitol
Mlle
MM
mme
mob
mod
Mons
morf
morfol
mun
mus
mús
n
N
N.E
N.O
N.S
N.Sra
N.T
nac
náut
num
núm
O
ob
obs
odont
odontol
of
ópt
org
organiz
oz
p
P
P.B
P.D
p.ex
P.L
p.m
p.p
P.S
pág
págs
pal
pat
patol
pc
pç
pça
Pe
perf
pg
Ph.D
pl
poét
pol
polít
port
pp
proc
prod
Prof
prof
Profa
profa
Profas
profas
Profs
profs
pron
psic
psican
psicol
Q
Q.G
ql
quím
R
r.s.v.p
Rdv
ref
reg
rel
Rel
relat
Relg
Remte
Rep
res
Ret
rev
Revmo
rg
Rod
Rtn
s
S
S.A
s.d
S.E
S.Ema
S.Emas
S.Exa
S.Exas
S.O
S.O.S
S.Revma
S.Revmas
S.Sa
S.Sas
S.W
sarg
sc
scs
sec
seç
séc
secr
sécs
seg
segs
semin
Símb
soc
Sociol
Sr
Sra
Sras
Srs
Srta
ss
SS.AA
sta
sto
suc
t
T
tb
teat
tecn
técn
tecnol
tel
tele
Ten
ten
teol
terapêut
tes
tip
tipogr
tít
ton
topogr
trad
transp
Trav
trig
trigon
trim
tt
turism
U
u.e
un
univ
univers
us
v
V
v.-alm
V.A
V.Ema
V.Emas
V.Exa
V.Exas
v.g
V.M
V.Revma
V.Revmas
V.S
W
X
Y
Z

