Commit

get_identified_elements() will now always return pronouns
jftuga committed Jan 3, 2025
1 parent 265e4b4 commit 6700681
Showing 3 changed files with 9 additions and 8 deletions.
9 changes: 3 additions & 6 deletions README.md
@@ -29,18 +29,15 @@ Download the required spaCy model:
python -m spacy download en_core_web_trf
```

-For debugging, by setting `config.debug=True`, you will also need [VeryPrettyTable](https://github.com/smeggingsmegger/):
-```bash
-pip install VeryPrettyTable
-```

## Usage

### Command Line Interface

The package includes a command-line tool for quick de-identification of text files:

```bash
+deidentify input_file [options]
+# or:
python -m deidentification.deidentify input_file [options]
```

@@ -55,7 +52,7 @@ Options:
Example:
```bash
# De-identify a text file and save with HTML markup
-python -m deidentification.deidentify input.txt -H -o output.html -r "[REDACTED]"
+deidentify input.txt -H -o output.html -r "[REDACTED]"
```

### Python API Usage
6 changes: 5 additions & 1 deletion deidentification/deidentification.py
@@ -69,6 +69,9 @@ def __init__(self, config: DeidentificationConfig = DeidentificationConfig()):
# this combines all self.all_persons lists from multiple passes of self._find_all_persons()
self.aggregate_persons: list[dict] = []

+# this combines all self.all_pronouns lists from multiple loop iterations in self.deidentify()
+self.aggregate_pronouns: list[dict] = []

self.all_pronouns: list[dict] = []
self.doc: Optional[Doc] = None
self.table_class = None
@@ -139,6 +142,7 @@ def deidentify(self, text: str) -> str:
self.__debug_log(f"deidentify(): next iter, persons={len(self.all_persons)}")
if persons_count == 0:
break
+self.aggregate_pronouns.extend(self.all_pronouns)
self.all_pronouns = []
merged = self._merge_metadata()
replaced_text = self._replace_merged(replaced_text, merged)
@@ -167,7 +171,7 @@ def deidentify_with_wrapped_html(self, text: str, html_begin: str = HTML_BEGIN,
return buffer.getvalue()

def get_identified_elements(self) -> dict:
-elements = {"message": self.replaced_text, "entities": self.aggregate_persons, "pronouns": self.all_pronouns}
+elements = {"message": self.replaced_text, "entities": self.aggregate_persons, "pronouns": self.aggregate_pronouns}
return elements

def _find_all_persons(self) -> int:
2 changes: 1 addition & 1 deletion deidentification/deidentification_constants.py
@@ -1,6 +1,6 @@
pgmName = "deidentification"
pgmUrl = "https://github.com/jftuga/deidentification"
-pgmVersion = "1.1.1"
+pgmVersion = "1.1.2"

GENDER_PRONOUNS = {
"he": "HE/SHE",

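Note on the change to `deidentification.py`: `self.all_pronouns` is reset to `[]` inside the `deidentify()` loop, so a caller reading it afterwards could see an empty list. The commit copies each iteration's pronouns into `self.aggregate_pronouns` before the reset. The sketch below isolates that accumulate-before-reset pattern. `PronounCollector`, its `process()` method, and the sample dictionaries are hypothetical stand-ins written for illustration, not the package's actual class or data, and the real loop also exits early when no persons are found.

```python
class PronounCollector:
    """Hypothetical stand-in; models only the accumulate-before-reset pattern."""

    def __init__(self) -> None:
        # Scratch list that is cleared at the end of every iteration.
        self.all_pronouns: list[dict] = []
        # Running total that survives across iterations.
        self.aggregate_pronouns: list[dict] = []

    def process(self, passes: list[list[dict]]) -> None:
        # `passes` stands in for the pronouns found on each loop iteration.
        for found in passes:
            self.all_pronouns = list(found)
            # Copy into the aggregate before the scratch list is reset.
            self.aggregate_pronouns.extend(self.all_pronouns)
            self.all_pronouns = []

    def get_identified_elements(self) -> dict:
        # Report the aggregate, not the (now empty) scratch list.
        return {"pronouns": self.aggregate_pronouns}


collector = PronounCollector()
collector.process([[{"text": "he"}], [{"text": "her"}], []])
result = collector.get_identified_elements()
```

Returning `self.all_pronouns` here instead would always yield an empty list once `process()` finishes, which appears to be the behavior the commit title refers to.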