negspacy package

Submodules

negspacy.negation module

class negspacy.negation.Negex(nlp, language='en', ent_types=[], psuedo_negations=[], preceding_negations=[], following_negations=[], termination=[], chunk_prefix=[])

Bases: object

A spaCy pipeline component which identifies negated tokens in text.

Based on: NegEx - A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries

Chapman, Bridewell, Hanbury, Cooper, Buchanan

Parameters
  • nlp (object) – spaCy language object

  • ent_types (list) – list of entity types whose entities are checked for negation

  • language (str) – language code used to select the default termsets (e.g. “en” for English)

  • psuedo_negations (list) – phrases that cancel out a negation (pseudo-negations); if empty, the defaults are used

  • preceding_negations (list) – negation phrases that appear before an entity; if empty, the defaults are used

  • following_negations (list) – negation phrases that appear after an entity; if empty, the defaults are used

  • termination (list) – phrases, such as “but”, that “terminate” a sentence for negation-processing purposes; if empty, the defaults are used
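The “if empty, defaults are used” behaviour shared by the termset parameters can be pictured with a small pure-Python sketch. The function `resolve_termsets` and the abbreviated `ILLUSTRATIVE_DEFAULTS` table are hypothetical: they are not negspacy's API and not its real default termsets. The sketch only shows how an empty list falls back to a per-language default.

```python
from typing import Dict, List

# Hypothetical, heavily abbreviated termsets for illustration only;
# these are NOT negspacy's actual "en" defaults.
ILLUSTRATIVE_DEFAULTS: Dict[str, Dict[str, List[str]]] = {
    "en": {
        "pseudo_negations": ["no increase", "not only"],
        "preceding_negations": ["no", "denies", "without"],
        "following_negations": ["ruled out", "unlikely"],
        "termination": ["but", "however"],
    }
}


def resolve_termsets(language: str = "en", **overrides: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: use each caller-supplied list if non-empty, else the default."""
    defaults = ILLUSTRATIVE_DEFAULTS[language]
    resolved: Dict[str, List[str]] = {}
    for name, default_terms in defaults.items():
        user_terms = overrides.get(name) or []
        # An empty (or missing) list falls back to the language default,
        # mirroring the "if empty, defaults are used" wording above.
        resolved[name] = list(user_terms) if user_terms else list(default_terms)
    return resolved
```

For example, `resolve_termsets("en", termination=["except"])` keeps the caller's termination list and fills every other termset from the illustrative defaults.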

get_patterns()

Returns the phrase patterns used for the various negation dictionaries.

Returns

patterns – mapping of each pattern type to its list of patterns (pattern_type: [patterns])

Return type

dict
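One way to picture the dictionary returned by get_patterns(), and how such patterns are then matched against text, is the pure-Python sketch below. Both functions (`get_patterns_sketch`, `find_matches`) are hypothetical stand-ins: whitespace splitting replaces the spaCy tokenizer and a naive scan stands in for a phrase matcher such as spaCy's `PhraseMatcher`. It illustrates the data shape, not negspacy's implementation.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]  # a phrase as a tuple of lowercase tokens


def get_patterns_sketch(termsets: Dict[str, List[str]]) -> Dict[str, List[Pattern]]:
    """Hypothetical: map each pattern type to its phrases as lowercase token tuples.

    Whitespace splitting stands in for the spaCy tokenizer.
    """
    return {
        pattern_type: [tuple(phrase.lower().split()) for phrase in phrases]
        for pattern_type, phrases in termsets.items()
    }


def find_matches(tokens: List[str], patterns: Dict[str, List[Pattern]]) -> List[Tuple[str, int, int]]:
    """Hypothetical: return (pattern_type, start, end) for each case-insensitive exact match."""
    lowered = [t.lower() for t in tokens]
    matches = []
    for pattern_type, phrase_list in patterns.items():
        for phrase in phrase_list:
            n = len(phrase)
            # Slide a window of the phrase's length over the token list.
            for start in range(len(lowered) - n + 1):
                if tuple(lowered[start:start + n]) == phrase:
                    matches.append((pattern_type, start, start + n))
    # Order by position, as a matcher would report them.
    return sorted(matches, key=lambda m: (m[1], m[2]))
```

With the tokens of "Patient denies fever but has cough", the patterns `{"preceding": ["denies"], "termination": ["but"]}` yield one preceding match at tokens 1 to 2 and one termination match at tokens 3 to 4.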

negex(doc)

Marks entities of interest as negated when a negation phrase applies to them.

Parameters

doc (object) – spaCy Doc object
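The core NegEx decision made by negex(doc) can be sketched with plain token offsets. In this hypothetical `negated_flags` function, an entity counts as negated if, within the same sub-sentence, a preceding negation ends at or before the entity's start, or a following negation starts at or after the entity's end. The tuple layout and the rule that an empty `ent_types` means "check every type" are assumptions for illustration, not negspacy's exact code.

```python
from typing import List, Sequence, Tuple

Span = Tuple[str, int, int]  # (label or match id, start, end) as token offsets


def negated_flags(
    ents: Sequence[Span],
    preceding: Sequence[Span],
    following: Sequence[Span],
    boundaries: Sequence[Tuple[int, int]],
    ent_types: Sequence[str] = (),
) -> List[bool]:
    """Hypothetical sketch: one flag per entity, True if a negation governs it."""
    flags = [False] * len(ents)
    for b_start, b_end in boundaries:
        # Only negations that start inside this sub-sentence can apply to it.
        sub_pre = [m for m in preceding if b_start <= m[1] < b_end]
        sub_fol = [m for m in following if b_start <= m[1] < b_end]
        for i, (label, e_start, e_end) in enumerate(ents):
            if not (b_start <= e_start < b_end):
                continue
            # Assumption: an empty ent_types means every entity type is checked.
            if ent_types and label not in ent_types:
                continue
            before = any(m[2] <= e_start for m in sub_pre)  # e.g. "denies fever"
            after = any(m[1] >= e_end for m in sub_fol)  # e.g. "pneumonia ruled out"
            if before or after:
                flags[i] = True
    return flags
```

In "Patient denies fever but has cough", splitting at "but" keeps the preceding negation "denies" away from "cough", so only "fever" is flagged.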

process_negations(doc)

Finds negations in doc and cleans the candidate negations to remove pseudo-negations.

Parameters

doc (object) – spaCy Doc object

Returns

  • preceding (list) – list of tuples for preceding negations

  • following (list) – list of tuples for following negations

  • terminating (list) – list of tuples of terminating phrases
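The pseudo-negation cleanup described for process_negations(doc) amounts to discarding candidate matches that sit inside a pseudo-negation phrase. The sketch below assumes matches are `(pattern_type, start, end)` tuples with string labels such as "pseudo" and "preceding"; a spaCy matcher returns hashed match IDs instead, so the labels and the `process_negations_sketch` name are hypothetical.

```python
from typing import List, Sequence, Tuple

Match = Tuple[str, int, int]  # (pattern type, start, end) as token offsets


def process_negations_sketch(
    matches: Sequence[Match],
) -> Tuple[List[Match], List[Match], List[Match]]:
    """Hypothetical: split matches by type, dropping any that begin inside a pseudo-negation."""
    pseudo = [m for m in matches if m[0] == "pseudo"]
    preceding: List[Match] = []
    following: List[Match] = []
    terminating: List[Match] = []
    buckets = {"preceding": preceding, "following": following, "termination": terminating}
    for match in matches:
        kind, start, _end = match
        if kind not in buckets:
            continue  # pseudo matches themselves are not returned
        # A candidate starting within a pseudo-negation span (e.g. "no" inside
        # "no increase") is treated as part of that phrase and discarded.
        if any(p_start <= start < p_end for _, p_start, p_end in pseudo):
            continue
        buckets[kind].append(match)
    return preceding, following, terminating
```

For "no increase in pain but no fever", the first "no" is swallowed by the pseudo-negation "no increase", while the second "no" survives as a preceding negation.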

termination_boundaries(doc, terminating)

Creates sub-sentences based on the terminations found in the text.

Parameters
  • doc (object) – spaCy Doc object

  • terminating (list) – list of tuples with (match_id, start, end)

Returns

boundaries – list of (start, end) tuples, one per sub-sentence span

Return type

list
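A minimal sketch of how termination boundaries can be derived: collect a cut point at the start of each terminating phrase and pair consecutive cut points into (start, end) spans. It additionally assumes that sentence starts act as cut points, so negations never cross sentence boundaries. `termination_boundaries_sketch` and its arguments are hypothetical and operate on plain integer offsets rather than a spaCy Doc.

```python
from typing import List, Sequence, Tuple


def termination_boundaries_sketch(
    doc_length: int,
    sentence_starts: Sequence[int],
    terminating: Sequence[Tuple[str, int, int]],
) -> List[Tuple[int, int]]:
    """Hypothetical: cut [0, doc_length) at every sentence start and termination start."""
    # A set removes duplicate cut points (e.g. a termination at a sentence start),
    # which would otherwise produce empty spans.
    cuts = sorted({0, doc_length, *sentence_starts, *(start for _, start, _ in terminating)})
    # Pair each cut point with the next one to form half-open spans.
    return [(a, b) for a, b in zip(cuts, cuts[1:])]
```

For a single six-token sentence with a termination at token 3, this yields the spans (0, 3) and (3, 6); the terminating token opens the second sub-sentence.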

negspacy.test module

Module contents