Package 'EndoMineR'

Title: Functions to mine endoscopic and associated pathology datasets
Description: This package comprises the functions that are used to clean up endoscopic reports and pathology reports, as well as many of the functions used for analysis. The functions assume the endoscopy and histopathology datasets have already been merged, but they can also be used with the unmerged datasets.
Authors: Sebastian Zeki [aut, cre]
Maintainer: Sebastian Zeki <[email protected]>
License: GPL-3
Version: 2.0.1.9000
Built: 2024-09-02 10:18:26 UTC
Source: https://github.com/ropensci/EndoMineR

Help Index


Determine the Follow up group

Description

This determines the follow-up rule a patient should fit into (according to the British Society of Gastroenterology guidance on Barrett's oesophagus). Specifically, it combines the presence of intestinal metaplasia with the Prague score so that the follow-up group can be determined. It relies on the presence of a Prague score. It should be run after Barretts_PathStage, which looks for the worst stage of a specimen and which will determine the presence or absence of intestinal metaplasia if the sample is non-dysplastic. Because reports often do not record a Prague score, a more pragmatic approach has been to assess the M stage and, if this is not present, to use the C stage extrapolated using the Barretts_PragueScore function.
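The M-stage-first fallback described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: `segment_length` is a hypothetical helper, and the `CStage` and `MStage` values are invented demo data.

```r
# Hypothetical helper (not part of EndoMineR): choose the segment length
# to use for follow-up, preferring the M stage over the C stage.
segment_length <- function(CStage, MStage) {
  # Use the M stage where it is recorded; otherwise fall back to the C stage
  ifelse(is.na(MStage), CStage, MStage)
}

# Invented demo data: row 2 has no M stage, row 3 has no Prague score at all
demo <- data.frame(
  CStage = c(1, 4, NA),
  MStage = c(2, NA, NA)
)
demo$Length <- segment_length(demo$CStage, demo$MStage)
demo
```

Rows with neither stage stay `NA`, which matches the statement that the function relies on a Prague score being present.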

Usage

Barretts_FUType(dataframe, CStage, MStage, IMorNoIM)

Arguments

dataframe

the dataframe (which must first have been processed by the Barretts_PathStage function to get IMorNoIM and by the Barretts_PragueScore function to get the C and M stages, if available)

CStage

CStage column

MStage

MStage column

IMorNoIM

IMorNoIM column

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsAll(), BarrettsBxQual(), BarrettsParisEMR(), Barretts_PathStage(), Barretts_PragueScore()

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)
# The function relies on the other Barrett's functions being run as well:
v$IMorNoIM <- Barretts_PathStage(v, "Histology")
v <- Barretts_PragueScore(v, "Findings")

# The follow-up group depends on the histology and the Prague score for a
# patient so it takes the processed Barrett's data and then looks in the
# Findings column for permutations of the Prague score.
v$FU_Type <- Barretts_FUType(v, "CStage", "MStage", "IMorNoIM")
rm(v)

Get the worst pathological stage for Barrett's

Description

This extracts the pathological stage from the histopathology specimen. It is done using 'degradation', so that it will look for the worst overall grade in the histology specimen and, if that is not found, it will look for the next worst, and so on. It looks per report, not per biopsy (it is more common for histopathology reports to contain the worst overall grade rather than individual biopsy grades). Specifically, it extracts the worst histopathology grade within the specimen. For the sake of accuracy this should always be used after the HistolDx function, which removes negative sentences such as 'there is no dysplasia'. This function should be used on the column derived from HistolDx, which is called Dx_Simplified.
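A 'degradation' search of this kind can be sketched as an ordered cascade of patterns. The grade labels and regular expressions below are illustrative assumptions and are not the package's own lexicon; `worst_grade` is a hypothetical helper.

```r
# Grades ordered from worst to mildest; labels and patterns are
# illustrative assumptions only
grade_patterns <- c(
  Cancer = "adenocarcinoma|carcinoma",
  HGD    = "high[ -]grade dysplasia",
  LGD    = "low[ -]grade dysplasia",
  IM     = "intestinal metaplasia"
)

# Hypothetical helper: return the worst grade mentioned in each report
worst_grade <- function(text) {
  text <- tolower(text)
  result <- rep("No_IM", length(text))
  # Walk from mildest to worst so that worse grades overwrite milder ones
  for (grade in rev(names(grade_patterns))) {
    hit <- grepl(grade_patterns[[grade]], text)
    result[hit] <- grade
  }
  result
}

reports <- c(
  "Intestinal metaplasia with focal low grade dysplasia.",
  "Columnar mucosa only.",
  "High-grade dysplasia arising in intestinal metaplasia."
)
worst_grade(reports)
```

The sketch does no negation handling, which is why the text recommends removing sentences such as 'there is no dysplasia' beforehand.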

Usage

Barretts_PathStage(dataframe, PathColumn)

Arguments

dataframe

dataframe with column of interest

PathColumn

column of interest

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsAll(), BarrettsBxQual(), BarrettsParisEMR(), Barretts_FUType(), Barretts_PragueScore()

Examples

# The function takes the Histology column from the Mypath demo dataset
# and extracts the worst histological grade for each specimen
b <- Barretts_PathStage(Mypath, "Histology")
rm(b)

Extract the Prague score

Description

The aim is to extract a C and M stage (Prague score) for Barrett's samples. This is done using a regex where C and M stages are explicitly mentioned in the free text. Specifically, it extracts the Prague score.
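As a rough sketch of regex-based Prague score extraction in base R: the pattern below is an assumption about how scores are written (for example 'C2M4' or 'C0 M1') and is not the package's own regex; `extract_stage` is a hypothetical helper and the findings are invented.

```r
findings <- c(
  "Barrett's oesophagus C2M4 with no visible lesions.",
  "Short tongue of Barrett's, C0 M1.",
  "Normal oesophagus."
)

# Hypothetical helper: capture the digits that follow a given stage letter
extract_stage <- function(text, letter) {
  pattern <- paste0(letter, "\\s*([0-9]{1,2})")
  m <- regmatches(text, regexec(pattern, text))
  # Each match holds the full match plus one capture group; no match gives NA
  as.numeric(vapply(
    m,
    function(x) if (length(x) == 2) x[2] else NA_character_,
    character(1)
  ))
}

prague <- data.frame(
  CStage = extract_stage(findings, "C"),
  MStage = extract_stage(findings, "M")
)
prague
```

Reports that never state the stages explicitly yield `NA`, so any downstream follow-up logic has to cope with missing values.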

Usage

Barretts_PragueScore(dataframe, EndoReportColumn, EndoReportColumn2)

Arguments

dataframe

dataframe with column of interest

EndoReportColumn

column of interest

EndoReportColumn2

second column of interest

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsAll(), BarrettsBxQual(), BarrettsParisEMR(), Barretts_FUType(), Barretts_PathStage()

Examples

# The example takes the endoscopy demo dataset and searches the
# Findings column (which contains endoscopy free text about the
# procedure itself). It then extracts the Prague score if relevant. I
# find it easiest to use this on a Barrett's subset of data rather than
# a dump of all endoscopies but of course this is a permissible dataset
# too


aa <- Barretts_PragueScore(Myendo, "Findings", "OGDReportWhole")

Run all the basic Barrett's functions

Description

Function to encapsulate all the Barrett's functions together. This includes the Prague score and the worst pathological grade and then feeds both of these things into the follow up function. The output is a dataframe with all the original data as well as the new columns that have been created.

Usage

BarrettsAll(
  Endodataframe,
  EndoReportColumn,
  EndoReportColumn2,
  Pathdataframe,
  PathColumn
)

Arguments

Endodataframe

endoscopy dataframe of interest

EndoReportColumn

Endoscopy report field of interest as a string vector

EndoReportColumn2

Second endoscopy report field of interest as a string vector

Pathdataframe

pathology dataframe of interest

PathColumn

Pathology report field of interest as a string vector

Value

Newdf

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsBxQual(), BarrettsParisEMR(), Barretts_FUType(), Barretts_PathStage(), Barretts_PragueScore()

Examples

Barretts_df <- BarrettsAll(Myendo, "Findings", "OGDReportWhole", Mypath, "Histology")

Get the number of Barrett's biopsies taken

Description

This function gets the number of biopsies taken per endoscopy and compares it to the Prague score for that endoscopy. Endoscopists should be taking a certain number of biopsies given the length of a Barrett's segment, so it should be straightforward to detect a shortfall in the number of biopsies being taken. The output is the shortfall per endoscopist.

Usage

BarrettsBxQual(dataframe, Endo_ResultPerformed, PatientID, Endoscopist)

Arguments

dataframe

dataframe

Endo_ResultPerformed

Date of the Endoscopy

PatientID

Patient's unique identifier

Endoscopist

name of the column with the Endoscopist names

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsAll(), BarrettsParisEMR(), Barretts_FUType(), Barretts_PathStage(), Barretts_PragueScore()

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
Mypath$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
Mypath$BxSize <- HistolBxSize(Mypath$Macroscopicdescription)

# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", Mypath, "Dateofprocedure",
  "HospitalNumber"
)

# The function relies on the other Barrett's functions being run as well:
b1 <- Barretts_PragueScore(v, "Findings")
b1$PathStage <- Barretts_PathStage(b1, "Histology")

# The follow-up group depends on the histology and the Prague score for a
# patient so it takes the processed Barrett's data and then looks in the
# Findings column for permutations of the Prague score.
b1$FU_Type <- Barretts_FUType(b1, "CStage", "MStage", "PathStage")


colnames(b1)[colnames(b1) == "pHospitalNum"] <- "HospitalNumber"
# The average number of biopsies is then calculated and
# compared to the average Prague C score so that those who are taking
# too few biopsies can be determined
hh <- BarrettsBxQual(
  b1, "Date.x", "HospitalNumber",
  "Endoscopist"
)
rm(v)

Run the Paris classification versus worst histopath grade for Barrett's

Description

This creates a column of Paris grade for all samples where this is mentioned.

Usage

BarrettsParisEMR(Column, Column2)

Arguments

Column

Endoscopy report field of interest as a string vector

Column2

Another endoscopy report field of interest as a string vector

Value

a string vector

See Also

Other Disease Specific Analysis - Barretts Data: BarrettsAll(), BarrettsBxQual(), Barretts_FUType(), Barretts_PathStage(), Barretts_PragueScore()

Examples

# 
Myendo$EMR<-BarrettsParisEMR(Myendo$ProcedurePerformed,Myendo$Findings)

Index biopsy locations

Description

This function returns all the conversions from common versions of events to a standardised event list, much like the Location standardisation function. This does not include EMR, as this is extracted from the pathology and so is part of pathology type. It is used for automated OPCS-4 coding.

Usage

BiopsyIndex()

See Also

Other NLP - Lexicons: EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()


Group anything by endoscopist and return the table

Description

This creates a proportion table for categorical variables by endoscopist. It relies on an Endoscopist column being present.
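The idea of a per-endoscopist proportion table can be illustrated with base R's `table()` and `prop.table()`. This is a minimal sketch on invented data and may differ from the layout the package function returns.

```r
# Invented demo data: worst histological grade per procedure
endoscopist <- c("A", "A", "A", "B", "B")
grade <- c("IM", "No_IM", "IM", "LGD", "IM")

# Cross-tabulate, then convert counts to row-wise proportions
counts <- table(endoscopist, grade)
props <- prop.table(counts, margin = 1)
round(props, 2)
```

With `margin = 1` each row sums to 1, so every endoscopist's findings are expressed as a share of their own procedures.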

Usage

CategoricalByEndoscopist(ProportionColumn, EndoscopistColumn)

Arguments

ProportionColumn

The column (categorical data) of interest

EndoscopistColumn

The endoscopist column

See Also

Other Grouping by endoscopist: MetricByEndoscopist()

Examples

# The function returns a proportion table of any categorical variable by endoscopist.
# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)
# The function relies on the other Barrett's functions being run as well:
v$IMorNoIM <- Barretts_PathStage(v, "Histology")
colnames(v)[colnames(v) == "pHospitalNum"] <- "HospitalNumber"
# The function takes the column with the extracted worst grade of
# histopathology and returns the proportion of each finding (ie
# proportion with low grade dysplasia, high grade etc.) for each
# endoscopist
kk <- CategoricalByEndoscopist(v$IMorNoIM, v$Endoscopist)
rm(v)

Fake Lower GI Endoscopy Set

Description

A dataset containing fake lower GI endoscopy reports. The report field is provided as a whole report without any fields having already been extracted.

Usage

ColonFinal

Format

A data frame with 2000 rows and 1 variable:

OGDReportWhole

The whole report, in text


Tidy up messy columns

Description

This does a general clean-up of whitespace, semi-colons and full stops at the start of lines, and converts end-of-sentence full stops to new lines.
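The kind of clean-up described can be sketched with a few `gsub()` calls. The regular expressions here are assumptions chosen for illustration, not the ones the package uses, and `tidy_text` is a hypothetical helper.

```r
# Hypothetical helper illustrating the clean-up steps
tidy_text <- function(x) {
  x <- gsub(";", " ", x, fixed = TRUE)    # drop semi-colons
  x <- gsub("^[[:space:].]+", "", x)      # strip leading whitespace and full stops
  x <- gsub("\\.[[:space:]]*", "\n", x)   # full stops become new lines
  x <- gsub("[ \t]+", " ", x)             # collapse runs of spaces and tabs
  trimws(x)
}

cleaned <- tidy_text(".  Normal stomach;  no ulcer. Biopsies taken.")
cat(cleaned)
```

Note that this naive version would also split decimal numbers such as '2.5' at the full stop, so real report text needs a more careful pattern.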

Usage

ColumnCleanUp(vector)

Arguments

vector

column of interest

Value

This returns a character vector

See Also

Other NLP - Text Cleaning and Extraction: DictionaryInPlaceReplace(), Extractor(), NegativeRemoveWrapper(), NegativeRemove(), textPrep()

Examples

ii<-ColumnCleanUp(Myendo$Findings)

OPCS-4 Coding

Description

This function extracts the OPCS-4 codes for all Barrett's procedures. It should take the OPCS-4 code from the EVENT column, perhaps also using the extent of the examination, depending on how the coding is done. The EVENT column will need to extract multiple findings. The hope is that the OPCS-4 column will then map from the EVENT column. This returns a nested list column with the procedure, furthest path site and event performed.

Usage

dev_ExtrapolateOPCS4Prep(dataframe, Procedure, PathSite, Event, extentofexam)

Arguments

dataframe

the dataframe

Procedure

The Procedure column

PathSite

The column containing the Pathology site

Event

the EVENT column

extentofexam

the furthest point reached in the examination

Examples

# Need to run the HistolTypeSite and EndoscopyEvent functions first here
# SelfOGD_Dunn$OPCS4w<-ExtrapolateOPCS4Prep(SelfOGD_Dunn,"PROCEDUREPERFORMED",
# "PathSite","EndoscopyEvent")

Dictionary In Place Replace

Description

This maps terms in the text and replaces them with the standardised term (mapped in the lexicon file) within the text. It is used within the textPrep function.
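In-place term standardisation can be sketched as a loop over a lexicon. The lexicon structure assumed here (names are regex patterns, values are the standardised terms) is an illustration and may not match the format returned by the package's lexicon functions; `replace_in_place` is a hypothetical helper.

```r
# Hypothetical lexicon: names are patterns, values are standardised terms
lexicon <- list(
  "gastric antrum|antral" = "Antrum",
  "oesophageal|esophageal" = "Oesophagus"
)

# Hypothetical helper: replace every lexicon pattern within the text
replace_in_place <- function(text, lexicon) {
  for (pattern in names(lexicon)) {
    text <- gsub(pattern, lexicon[[pattern]], text, ignore.case = TRUE)
  }
  text
}

standardised <- replace_in_place("Antral erosions and esophageal candida", lexicon)
standardised
```

The rest of the sentence is left untouched, which is the difference from an extraction step that returns only the matched terms.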

Usage

DictionaryInPlaceReplace(inputString, list)

Arguments

inputString

the input string (ie the full medical report)

list

The replacing list

Value

This returns a character vector

See Also

Other NLP - Text Cleaning and Extraction: ColumnCleanUp(), Extractor(), NegativeRemoveWrapper(), NegativeRemove(), textPrep()

Examples

inputText<-DictionaryInPlaceReplace(TheOGDReportFinal$OGDReportWhole,LocationList())

Basic graph creation using the template specified in theme_Publication.

Description

This creates a basic graph using the template specified in theme_Publication. It takes a numeric column and plots it against any non-numeric x axis in a ggplot

Usage

EndoBasicGraph(dataframe, xdata, number)

Arguments

dataframe

dataframe

xdata

The x column

number

The numeric column

Value

Myplot: the final plot

See Also

Other Data Presentation helpers: scale_colour_Publication(), scale_fill_Publication(), theme_Publication()

Examples

# This function plots numeric y vs non-numeric x
# Get some numeric columns e.g. number of biopsies and size
Mypath$Size <- HistolBxSize(Mypath$Macroscopicdescription)
Mypath$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
Mypath2 <- Mypath[, c("NumBx", "Size")]
EndoBasicGraph(Mypath, "Size", "NumBx")

Merge endoscopy and histology data.

Description

This takes the endoscopy dataset's date-performed and hospital number columns and merges on the equivalent columns in the pathology dataset. The merge uses a 7-day time frame, as pathology is often reported after the endoscopy.
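A merge on hospital number with a 7-day date window can be sketched in base R as follows. This is an illustration on invented data rather than the package's implementation; in particular, the assumption that the pathology date falls on or up to 7 days after the endoscopy date is mine.

```r
# Invented demo data
endo <- data.frame(
  HospitalNumber = c("A1", "B2"),
  EndoDate = as.Date(c("2020-01-01", "2020-02-01")),
  stringsAsFactors = FALSE
)
path <- data.frame(
  HospitalNumber = c("A1", "B2"),
  PathDate = as.Date(c("2020-01-04", "2020-03-15")),
  stringsAsFactors = FALSE
)

# Join on hospital number, then keep pairs whose dates are within the window
merged <- merge(endo, path, by = "HospitalNumber")
gap <- as.numeric(merged$PathDate - merged$EndoDate)  # difference in days
merged <- merged[gap >= 0 & gap <= 7, ]
merged
```

Patient B2 is dropped because the pathology report is dated more than 7 days after the endoscopy.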

Usage

Endomerge2(x, EndoDate, EndoHospNumber, y, PathDate, PathHospNumber)

Arguments

x

Endoscopy dataframe

EndoDate

The date the endoscopy was performed

EndoHospNumber

The unique hospital number in the endoscopy dataset

y

Histopathology dataframe

PathDate

The date the endoscopy was performed, as recorded in the pathology dataset

PathHospNumber

The unique hospital number in the pathology dataset

Examples

v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber",
  Mypath, "Dateofprocedure", "HospitalNumber"
)

EndoMineR: A package for analysis of endoscopic and related pathology

Description

The goal of EndoMineR is to extract as much information as possible from endoscopy reports and their associated pathology specimens. The package is intended for use by gastroenterologists, pathologists and anyone interested in the analysis of endoscopic and pathological datasets. Gastroenterology now has many standards against which practice is measured, although many reporting systems do not include the reporting capability to give anything more than basic analysis. Much of the data is locked in semi-structured text. However, the nature of semi-structured text means that data can be extracted in a standardised way; it just requires more manipulation. This package provides that manipulation so that complex endoscopic-pathological analyses, in line with recognised standards for these analyses, can be done. The package is basically in three parts:

Details

  • The extraction- This is really when the data is provided as full text reports. You may already have the data in a spreadsheet in which case this part isn't necessary.

  • Cleaning- These are a group of functions that allow the user to extract and clean data commonly found in endoscopic and pathology reports. The cleaning functions usually remove common typos or extraneous information and do some reformatting.

  • Analyses- The analyses provide graphing functions as well as analyses according to the cornerstone questions in gastroenterology, namely surveillance, patient tracking, quality of endoscopy and pathology reporting, and diagnostic yield questions.

To learn more about EndoMineR, start with the vignettes: 'browseVignettes(package = "EndoMineR")'


Paste endoscopy and histology results into one

Description

As spreadsheets are likely to be submitted with pre-segregated data, as output by endoscopy software, these should be remerged prior to cleaning. This function takes the column headers and places each one before its text so that the original full text is recreated. It will use the column headers as the delimiters. This should be used before textPrep, as the textPrep function takes only a character vector (ie the whole report and not a segregated one).
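Recreating a whole-report text from segregated columns can be sketched by pasting each column header in front of its cell value. This is a minimal base R illustration on invented data; the separator (a new line) and the exact output layout are assumptions and may differ from what the package returns.

```r
# Invented demo data with already-segregated report fields
df <- data.frame(
  Findings = c("Normal", "Ulcer"),
  Histology = c("NAD", "Gastritis"),
  stringsAsFactors = FALSE
)

# The column headers double as the delimiter list for later extraction
delim <- names(df)

# Prefix every cell with its header, then join the fields row by row
pieces <- Map(function(header, column) paste(header, column), delim, df)
merged_text <- do.call(paste, c(unname(pieces), sep = "\n"))
merged_text
```

Keeping `delim` alongside the merged text mirrors the described return value of a merged text column plus a delimiter list.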

Usage

EndoPaste(x)

Arguments

x

the dataframe

Value

This returns a list with a dataframe containing one column of the merged text and a character vector which is the delimiter list for when the textPrep function is used

Examples

testList<-structure(list(PatientName = c("Tom Hardy", "Elma Fudd", "Bingo Man"
), HospitalNumber = c("H55435", "Y3425345", "Z343424"), Text = c("All bad. Not good", 
"Serious issues", "from a land far away")), class = "data.frame", row.names = c(NA, -3L))
EndoPaste(testList)

Clean endoscopist column

Description

If an endoscopist column is part of the dataset once the extractor function has been used, this cleans the endoscopist column from the report. It gets rid of titles. It gets rid of common entries that are not needed. It should be used after the textPrep function.

Usage

EndoscEndoscopist(EndoscopistColumn)

Arguments

EndoscopistColumn

The endoscopy text column

Value

This returns a character vector

See Also

Other Endoscopy specific cleaning functions: EndoscInstrument(), EndoscMeds(), EndoscopyEvent()

Examples

Myendo$Endoscopist <- EndoscEndoscopist(Myendo$Endoscopist)

Clean instrument column

Description

This cleans the Instrument column from the report, assuming such a column exists (where instrument usually refers to the endoscope number being used). It gets rid of common entries that are not needed. It should be used after the textPrep function. Note that this may be deprecated in the next version, as the endoscope coding used here is not widely used.

Usage

EndoscInstrument(EndoInstrument)

Arguments

EndoInstrument

column of interest

Value

This returns a character vector

See Also

Other Endoscopy specific cleaning functions: EndoscEndoscopist(), EndoscMeds(), EndoscopyEvent()

Examples

Myendo$Instrument <- EndoscInstrument(Myendo$Instrument)

Clean medication column

Description

This cleans the medication column from the report, assuming such a column exists. It gets rid of common entries that are not needed. It also splits the medication into fentanyl and midazolam numeric doses for use. It should be used after the textPrep function.

Usage

EndoscMeds(MedColumn)

Arguments

MedColumn

column of interest as a string vector

Value

This returns a dataframe

See Also

Other Endoscopy specific cleaning functions: EndoscEndoscopist(), EndoscInstrument(), EndoscopyEvent()

Examples

MyendoNew <- cbind(EndoscMeds(Myendo$Medications), Myendo)

Extract the endoscopic event.

Description

This extracts the endoscopic event. It looks for the event term and then looks in the event sentence as well as the one above to see if the location is listed. It only looks within the endoscopy fields. If tissue is taken then this will be extracted with the HistolTypeAndSite function rather than being listed as a result as this is cleaner and more robust.

Usage

EndoscopyEvent(dataframe, EventColumn1, Procedure, Macroscopic, Histology)

Arguments

dataframe

dataframe of interest

EventColumn1

The relevant endoscopy free text column describing the findings

Procedure

Column saying which procedure was performed

Macroscopic

Column describing all the macroscopic specimens

Histology

Column with free text histology (usually microscopic histology)

Value

This returns a character vector

See Also

Other Endoscopy specific cleaning functions: EndoscEndoscopist(), EndoscInstrument(), EndoscMeds()

Examples

# Myendo$EndoscopyEvent<-EndoscopyEvent(Myendo,"Findings",
# "ProcedurePerformed","MACROSCOPICALDESCRIPTION","HISTOLOGY")

See if words from two lists co-exist within a sentence

Description

See if words from two lists co-exist within a sentence. Eg site and tissue type. This function only looks in one sentence for the two terms. If you suspect the terms may occur in adjacent sentences then use the EntityPairs_TwoSentence function.
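Same-sentence co-occurrence can be sketched by splitting the text into sentences and checking each sentence against both lists. The sentence splitter and the 'term1:term2' output format are assumptions for illustration; `pairs_in_sentence` is a hypothetical helper and handles a single text string.

```r
# Hypothetical helper: find terms from two lists that share a sentence
pairs_in_sentence <- function(text, list1, list2) {
  # Naive sentence split on ., ! or ?
  sentences <- unlist(strsplit(text, "[.!?]\\s*"))
  out <- character(0)
  for (s in sentences) {
    in1 <- vapply(list1, function(w) grepl(w, s, ignore.case = TRUE), logical(1))
    in2 <- vapply(list2, function(w) grepl(w, s, ignore.case = TRUE), logical(1))
    hits1 <- list1[in1]
    hits2 <- list2[in2]
    if (length(hits1) > 0 && length(hits2) > 0) {
      combos <- expand.grid(a = hits1, b = hits2, stringsAsFactors = FALSE)
      out <- c(out, paste(combos$a, combos$b, sep = ":"))
    }
  }
  out
}

pairs <- pairs_in_sentence(
  "Biopsy taken from the antrum. Polyp seen in the colon.",
  c("biopsy", "polyp"),
  c("antrum", "colon")
)
pairs
```

Because each sentence is checked on its own, 'biopsy' is never paired with 'colon' here; terms split across adjacent sentences need the two-sentence variant.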

Usage

EntityPairs_OneSentence(inputText, list1, list2)

Arguments

inputText

The relevant pathology text column

list1

First list to refer to

list2

The second list to look for

See Also

Other Basic Column mutators: EntityPairs_TwoSentence(), ExtrapolatefromDictionary(), ListLookup(), MyImgLibrary()

Examples

# tbb<-EntityPairs_OneSentence(Mypath$Histology,HistolType(),LocationList())

Look for relationships between site and event

Description

This is used to look for relationships between site and event, especially for endoscopy events described in sentences such as 'The stomach polyp was large. It was removed with a snare', ie where the therapy and the site are in two different sentences.

Usage

EntityPairs_TwoSentence(inputString, list1, list2)

Arguments

inputString

The relevant pathology text column

list1

The initial list to assess

list2

The other list to look for

See Also

Other Basic Column mutators: EntityPairs_OneSentence(), ExtrapolatefromDictionary(), ListLookup(), MyImgLibrary()

Examples

# tbb<-EntityPairs_TwoSentence(Myendo$Findings,EventList(),HistolType())

Extract the Prague score

Description

The aim is to extract a C and M stage (Prague score) for Barrett's samples. This is done using a regex where C and M stages are explicitly mentioned in the free text. Specifically, it extracts the Prague score.

Usage

Eosinophilics(dataframe, findings, histol, IndicationsFroExamination)

Arguments

dataframe

dataframe with column of interest

findings

column of interest

histol

second column of interest

IndicationsFroExamination

second column of interest

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)


aa <- Eosinophilics(v, "Findings", "Histology","Indications")

Use list of endoscopic events and procedures

Description

This function returns all the conversions from common versions of events to a standardised event list, much like the Location standardisation function. This does not include EMR, as this is extracted from the pathology and so is part of pathology type.

Usage

EventList()

See Also

Other NLP - Lexicons: BiopsyIndex(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()

Examples

# unique(unlist(EventList(), use.names = FALSE))

Extract columns from the raw text

Description

This is the main extractor for the Endoscopy and Histology report. This relies on the user creating a list of words representing the subheadings. The list is then fed to the Extractor so that it acts as the beginning and the end of the regex used to split the text. Whatever has been specified in the list is used as a column header. Column headers don't tolerate special characters like : or ? and / and don't allow numbers as the start character so these have to be dealt with in the text before processing
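Using subheadings as extraction boundaries can be sketched as follows. This is a simplified illustration for a single report string, assuming the delimiters appear in order; it is not the package's implementation, and `extract_between` is a hypothetical helper.

```r
# Hypothetical helper: take the text between each delimiter and the next one
extract_between <- function(text, delim) {
  out <- list()
  for (i in seq_along(delim)) {
    start <- as.integer(regexpr(delim[i], text, fixed = TRUE))
    if (start == -1) {
      out[[delim[i]]] <- NA_character_
      next
    }
    from <- start + nchar(delim[i])
    to <- nchar(text)
    if (i < length(delim)) {
      nxt <- as.integer(regexpr(delim[i + 1], text, fixed = TRUE))
      if (nxt != -1) to <- nxt - 1
    }
    out[[delim[i]]] <- trimws(substr(text, from, to))
  }
  # data.frame() makes the names syntactically valid, so "Histology:"
  # becomes the column "Histology."
  as.data.frame(out, stringsAsFactors = FALSE)
}

report <- "Hospital Number: H123 Histology: Intestinal metaplasia. Diagnosis: Barrett's"
res <- extract_between(report, c("Hospital Number:", "Histology:", "Diagnosis:"))
res
```

The renamed columns show why special characters in the dividing words are best dealt with in the text before processing.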

Usage

Extractor(inputString, delim)

Arguments

inputString

the column to extract from

delim

the vector of words that will be used as the boundaries to extract against

See Also

Other NLP - Text Cleaning and Extraction: ColumnCleanUp(), DictionaryInPlaceReplace(), NegativeRemoveWrapper(), NegativeRemove(), textPrep()

Examples

# As column names can't start with a number, one of the dividing
# words has to be converted
# A list of dividing words (which will also act as column names)
# is then constructed
mywords<-c("Hospital Number","Patient Name:","DOB:","General Practitioner:",
"Date received:","Clinical Details:","Macroscopic description:",
"Histology:","Diagnosis:")
Mypath2<-Extractor(PathDataFrameFinal$PathReportWhole,mywords)

Extrapolate from Dictionary

Description

Provides term mapping and extraction in one. Standardises any term according to a mapping lexicon provided and then extracts the term. This is different to the DictionaryInPlaceReplace in that it provides a new column with the extracted terms as opposed to changing it in place

Usage

ExtrapolatefromDictionary(inputString, list)

Arguments

inputString

The text string to process

list

of words to iterate through

See Also

Other Basic Column mutators: EntityPairs_OneSentence(), EntityPairs_TwoSentence(), ListLookup(), MyImgLibrary()

Examples

#Firstly we extract histology from the raw report
# The function then standardises the histology terms through a series of
# regular expressions and then extracts the type of tissue 
Mypath$Tissue<-suppressWarnings(
suppressMessages(
ExtrapolatefromDictionary(Mypath$Histology,HistolType()
)
)
)
rm(MypathExtraction)

Index of GI symptoms

Description

This function returns all the common GI symptoms. They are simply listed as is, without grouping or mapping. They have been derived from a manual list with synonyms derived from the UMLS Metathesaurus using the browser.

Usage

GISymptomsList()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()


Create GRS metrics by endoscopist (X-ref with pathology)

Description

This extracts the polyp types from the data (for colonoscopy and flexible sigmoidoscopy data) and outputs the adenoma, adenocarcinoma and hyperplastic detection rates by endoscopist, as well as the overall number of colonoscopies. This will be extended to other GRS outputs in the future.

Usage

GRS_Type_Assess_By_Unit(dataframe, ProcPerformed, Endo_Endoscopist, Dx, Histol)

Arguments

dataframe

The dataframe

ProcPerformed

The column containing the Procedure type performed

Endo_Endoscopist

column containing the Endoscopist name

Dx

The column with the Histological diagnosis

Histol

The column with the Histology text in it

Examples

nn <- GRS_Type_Assess_By_Unit(
  vColon, "ProcedurePerformed",
  "Endoscopist", "Diagnosis", "Original.y"
)

Determine the largest biopsy size from the histology report

Description

This extracts the biopsy size from the report. If there are multiple biopsies it will extract the overall size of each one (size is calculated usually in cubic mm from the three dimensions provided). This will result in row duplication.
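Computing a size from the three stated dimensions can be sketched with a regex and a product. The 'a x b x c' pattern is an assumption about how dimensions are written in the macroscopic description, and the sample text is invented.

```r
macro <- "Three pieces of tissue, the largest measuring 3 x 2 x 1 mm and the smallest 1x1x1mm"

# Find every 'a x b x c' triple in the text
triples <- regmatches(
  macro,
  gregexpr("[0-9]+\\s*x\\s*[0-9]+\\s*x\\s*[0-9]+", macro)
)[[1]]

# Multiply the three dimensions of each triple to get a volume
sizes <- vapply(
  strsplit(triples, "\\s*x\\s*"),
  function(d) prod(as.numeric(d)),
  numeric(1)
)
sizes
max(sizes)
```

Each triple gives one size, and taking the maximum gives the largest biopsy in the report.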

Usage

HistolBxSize(MacroColumn)

Arguments

MacroColumn

The macroscopic description column

Details

This is usually from the Macroscopic description column.

See Also

Other Histology specific cleaning functions: HistolNumbOfBx(), HistolTypeAndSite()

Examples

rr <- HistolBxSize(Mypath$Macroscopicdescription)

Extract the number of biopsies taken from the histology report

Description

This extracts the number of biopsies taken from the pathology report. This is usually from the Macroscopic description column. It collects everything from the regex [0-9]{1,2}.{0,3} to whatever the string boundary is (z).

Usage

HistolNumbOfBx(inputString, regString)

Arguments

inputString

The input text to process

regString

The keyword to remove and to stop at in the regex

See Also

Other Histology specific cleaning functions: HistolBxSize(), HistolTypeAndSite()

Examples

qq <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")

Use list of pathology types

Description

This standardizes terms to describe the pathology tissue type being examined

Usage

HistolType()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()


Extract the site a specimen was removed from as well as the type

Description

This needs some blurb to be written. Used in the OPCS4 coding

Usage

HistolTypeAndSite(inputString1, inputString2, procedureString)

Arguments

inputString1

The first column to look in

inputString2

The second column to look in

procedureString

The column with the procedure in it

Value

a list with two columns, one is the type and site and the other is the index to be used for OPCS4 coding later if needed.

See Also

Other Histology specific cleaning functions: HistolBxSize(), HistolNumbOfBx()

Examples

Myendo2<-Endomerge2(Myendo,'Dateofprocedure','HospitalNumber',
Mypath,'Dateofprocedure','HospitalNumber')
PathSiteAndType <- HistolTypeAndSite(Myendo2$PathReportWhole,
Myendo2$Macroscopicdescription, Myendo2$ProcedurePerformed)

Number of tests done per month and year by indication

Description

Get an overall idea of how many endoscopies have been done for an indication by year and month. This is a more involved version of the SurveilCapacity function. It takes a string to search for in the indication for the test.

Usage

HowManyOverTime(dataframe, Indication, Endo_ResultPerformed, StringToSearch)

Arguments

dataframe

dataframe

Indication

Indication column

Endo_ResultPerformed

column containing date the Endoscopy was performed

StringToSearch

The string in the Indication to search for

Details

This returns a list which contains a plot (number of tests for that indication over time) and a table with the same information broken down by month and year.
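The tabular half of that output can be sketched in base R by filtering on the indication and counting by year and month. The data and column names below are invented for illustration, and the plotting step is omitted.

```r
# Invented demo data
endo <- data.frame(
  Indications = c("Surveillance", "Dysphagia", "Barrett's surveillance", "Surveillance"),
  Dateofprocedure = as.Date(c("2020-01-10", "2020-01-15", "2020-01-20", "2020-02-05")),
  stringsAsFactors = FALSE
)

# Keep the tests whose indication matches the search string
hits <- endo[grepl("Surv", endo$Indications, ignore.case = TRUE), ]

# Count matching tests per year-month
hits$YearMonth <- format(hits$Dateofprocedure, "%Y-%m")
counts <- as.data.frame(table(YearMonth = hits$YearMonth), responseName = "n")
counts
```

Searching on '.*' instead of 'Surv' would keep every row and so count all tests.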

See Also

Other Basic Analysis - Surveillance Functions: SurveilFirstTest(), SurveilLastTest(), SurveilTimeByRow(), TimeToStatus()

Examples

# This takes the dataframe Myendo (part of the package examples) and looks in
# the column which holds the test indication (in this example it is called
# 'Indications'). The date of the procedure column (which can be date format or
# POSIX format) is also necessary. Finally, the string which indicates the test
# indication needs to be inputted. To find all endoscopies done
# where the indication is surveillance, searching on 'Surv' will do fine.
# If you want all the tests then put '.*' instead of Surv
rm(list = ls(all = TRUE))
ff <- HowManyOverTime(Myendo, "Indications", "Dateofprocedure", ".*")

Extract IBD scores if present

Description

This extracts all of the relevant IBD scores where present from the medical text.

Usage

IBD_Scores(inputColumn1)

Arguments

inputColumn1

column of interest as a string vector

Value

This returns a dataframe with all the scores in it

Examples

# Example to be provided

Extract from report, using words from a list

Description

The aim here is simply to produce a document term matrix to get the frequency of all the words, then to extract the words you are interested in, then to find which reports contain those words, and finally to find what proportion of the reports contain those terms.
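The final step, the proportion of reports containing each term, can be sketched without a document term matrix by using `grepl()` directly. This swaps in a simpler technique than the one described, purely for illustration, on invented findings.

```r
# Invented demo data
findings <- c(
  "Barrett's oesophagus seen",
  "Normal",
  "Possible coeliac disease",
  "Long segment Barrett's"
)
words <- c("arrett", "oeliac")

# Proportion of reports that mention each word
prop <- vapply(
  words,
  function(w) mean(grepl(w, findings, ignore.case = TRUE)),
  numeric(1)
)
prop
```

Dropping the first letter of each search word, as in 'arrett', sidesteps capitalisation differences even without `ignore.case`.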

Usage

ListLookup(theframe, EndoReportColumn, myNotableWords)

Arguments

theframe

the dataframe,

EndoReportColumn

the column of interest,

myNotableWords

list of words you are interested in

See Also

Other Basic Column mutators: EntityPairs_OneSentence(), EntityPairs_TwoSentence(), ExtrapolatefromDictionary(), MyImgLibrary()

Examples

# The function relies on defining a list of
# words you are interested in and then choosing the column you are
# interested in looking in for these words. This can be for histopathology
# free text columns or endoscopic. In this example it is for endoscopic
# columns
myNotableWords <- c("arrett", "oeliac")
jj <- ListLookup(Myendo, "Findings", myNotableWords)
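As a rough illustration of the "proportion of reports containing each term" idea, the sketch below uses plain substring matching with grepl rather than a document term matrix, so it is a simplification and not the method ListLookup itself uses. The findings vector is invented for the example.

```r
# Hypothetical free-text findings; not taken from the package datasets
findings <- c(
  "Barrett's oesophagus seen, C2M4",
  "Normal duodenum, biopsies taken to exclude coeliac disease",
  "Hiatus hernia only",
  "Long segment Barrett's with a nodule"
)
myNotableWords <- c("arrett", "oeliac")

# For each term, flag the reports that contain it (one column per term)
containsTerm <- sapply(myNotableWords,
                       function(w) grepl(w, findings, fixed = TRUE))

# Percentage of reports containing each term
propPct <- colMeans(containsTerm) * 100
print(propPct)
```

Dropping the first letter of each term, as in the example above, sidesteps differences in capitalisation without needing a case-insensitive match.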

Use list of upper and lower GI standard locations

Description

This is a list of standard locations at endoscopy. It is used for the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationList()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), RFACath(), WordsToNumbers()


Use list of standard locations for lower GI endoscopy

Description

This is a list of standard locations at endoscopy that is used in the extraction of the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationListLower()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()


Use list of standard locations for upper GI endoscopy

Description

This is a list of standard locations at endoscopy that is used in the extraction of the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationListUniversal()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUpper(), LocationList(), RFACath(), WordsToNumbers()


Use list of standard locations for upper GI endoscopy

Description

This is a list of standard locations at endoscopy that is used in the extraction of the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationListUpper()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationList(), RFACath(), WordsToNumbers()


Plot a metric by endoscopist

Description

This takes any of the numerical metrics in the dataset and plots it by endoscopist. It of course relies on an Endoscopist column being present.

Usage

MetricByEndoscopist(dataframe, Column, EndoscopistColumn)

Arguments

dataframe

The dataframe

Column

The column (numeric data) of interest

EndoscopistColumn

The endoscopist column

See Also

Other Grouping by endoscopist: CategoricalByEndoscopist()

Examples

#The function gives a table with any numeric
# metric by endoscopist
# In this example we tabulate medication by
# endoscopist
# Let's bind the output of EndoscMeds to the main dataframe so we
# have a complete dataframe with all the meds extracted
MyendoNew<-cbind(EndoscMeds(Myendo$Medications),Myendo)

# Now let's look at the fentanyl use per endoscopist:
kk<-MetricByEndoscopist(MyendoNew,'Endoscopist','Fent')
#EndoBasicGraph(MyendoNew, "Endoscopist", "Fent") #run this
#if you want to see the graph
rm(Myendo)
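To make the idea of "a numeric metric by endoscopist" concrete, here is a small base R sketch. It assumes the metric is summarised as a mean per endoscopist, which is an assumption for illustration and not a statement of how MetricByEndoscopist computes its output. The data are invented.

```r
# Hypothetical fentanyl doses per procedure (micrograms); NA = not recorded
meds <- data.frame(
  Endoscopist = c("Dr A", "Dr A", "Dr B", "Dr B", "Dr B"),
  Fent = c(50, 100, 25, NA, 75)
)

# Mean dose per endoscopist; the formula interface drops rows
# with a missing dose before summarising
avgFent <- aggregate(Fent ~ Endoscopist, data = meds, FUN = mean)
print(avgFent)
```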

Fake Endoscopies

Description

A dataset containing fake endoscopy reports. The report fields have already been extracted from the whole report as follows:

Myendo<-TheOGDReportFinal
Myendo$OGDReportWhole<-gsub('2nd Endoscopist:','Second endoscopist:',Myendo$OGDReportWhole)
EndoscTree<-list('Hospital Number:','Patient Name:','General Practitioner:',
  'Date of procedure:','Endoscopist:','Second endoscopist:','Medications',
  'Instrument','Extent of Exam:','Indications:','Procedure Performed:','Findings:',
  'Endoscopic Diagnosis:')
for(i in 1:(length(EndoscTree)-1)) {
  Myendo<-Extractor(Myendo,'OGDReportWhole',as.character(EndoscTree[i]),
    as.character(EndoscTree[i+1]),as.character(EndoscTree[i]))
}
Myendo$Dateofprocedure<-as.Date(Myendo$Dateofprocedure)

Usage

Myendo

Format

A data frame with 2000 rows and 13 variables:

OGDReportWhole

The whole report, in text

HospitalNumber

Hospital Number, in text

PatientName

Patient Name, in text

GeneralPractitioner

General Practitioner, in text

Dateofprocedure

Date of the procedure, as date

Endoscopist

Endoscopist, in text

Secondendoscopist

Second endoscopist, in text

Medications

Medications, in text

Instrument

Instrument, in text

ExtentofExam

Extent of exam, in text

Indications

Indications, in text

ProcedurePerformed

Procedure Performed, in text

Findings

Endoscopic findings, in text
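
The delimiter-driven extraction used to derive these fields can be pictured with the base R sketch below. The helper extractBetween is hypothetical and is not the package's Extractor function; it only shows the principle of taking the text between one heading and the next. The report string is invented.

```r
# Hypothetical whole report and its ordered list of headings
report <- paste(
  "Hospital Number: H123 Patient Name: Jane Doe",
  "Endoscopist: Dr Smith Findings: Normal oesophagus"
)
delims <- c("Hospital Number:", "Patient Name:", "Endoscopist:", "Findings:")

# Return the text after `start` and before `end` (or to the end of the
# report when `end` is NULL or not found); NA when `start` is absent
extractBetween <- function(txt, start, end = NULL) {
  from <- as.integer(regexpr(start, txt, fixed = TRUE))
  if (from == -1) return(NA_character_)
  from <- from + nchar(start)
  to <- nchar(txt)
  if (!is.null(end)) {
    e <- as.integer(regexpr(end, substring(txt, from), fixed = TRUE))
    if (e != -1) to <- from + e - 2
  }
  trimws(substring(txt, from, to))
}

# Walk the headings in order, using each following heading as the end marker
fields <- list()
for (i in seq_along(delims)) {
  nextDelim <- if (i < length(delims)) delims[i + 1] else NULL
  fields[[delims[i]]] <- extractBetween(report, delims[i], nextDelim)
}
print(fields)
```

Because each heading is bounded by the next one in the list, the headings need to be supplied in the order in which they appear in the report.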


Clean html endoscopic images

Description

This is used to pick and clean endoscopic images from html exports so they can be prepared before being linked to pathology and endoscopy reports

Usage

MyImgLibrary(file, delim, location)

Arguments

file

The html report to extract (the html will have all the image references in it)

delim

The phrase that separates individual endoscopies

location

The folder containing the actual images

See Also

Other Basic Column mutators: EntityPairs_OneSentence(), EntityPairs_TwoSentence(), ExtrapolatefromDictionary(), ListLookup()

Examples

# MyImgLibrary("~/Images Captured with Proc Data Audit_Findings1.html",
#                         "procedureperformed","~/")

Fake Pathology report

Description

A dataset containing fake pathology reports. The report fields are derived from the whole report as follows:

Mypath<-PathDataFrameFinalColon
HistolTree<-list('Hospital Number','Patient Name','DOB:','General Practitioner:',
  'Date of procedure:','Clinical Details:','Macroscopic description:','Histology:','Diagnosis:','')
for(i in 1:(length(HistolTree)-1)) {
  Mypath<-Extractor(Mypath,'PathReportWhole',as.character(HistolTree[i]),
    as.character(HistolTree[i+1]),as.character(HistolTree[i]))
}
Mypath$Dateofprocedure<-as.Date(Mypath$Dateofprocedure)

Usage

Mypath

Format

A data frame with 2000 rows and 10 variables:

PathReportWhole

The whole report, in text

HospitalNumber

Hospital Number, in text

PatientName

Patient Name, in text

DOB

Date of Birth, in text

GeneralPractitioner

General Practitioner, in text

Dateofprocedure

Date of the procedure, as date

ClinicalDetails

Clinical Details, in text

Macroscopicdescription

Macroscopic description of the report, in text

Histology

Histology, in text

Diagnosis

Diagnosis, in text


Remove negative and normal sentences

Description

Extraction of the negative sentences so that normal findings can be removed and not counted when searching for true diseases, e.g. remove 'No evidence of candidal infection' so it doesn't get included when looking for candidal infections. It is used by default as part of the textPrep function but can be turned off as an optional parameter.

Usage

NegativeRemove(inputText)

Arguments

inputText

column of interest

Value

This returns a column within a dataframe. This should be changed to a character vector eventually

See Also

Other NLP - Text Cleaning and Extraction: ColumnCleanUp(), DictionaryInPlaceReplace(), Extractor(), NegativeRemoveWrapper(), textPrep()

Examples

# Build a character vector and then
# incorporate into a dataframe
anexample<-c("There is no evidence of polyp here",
"Although the prep was poor,there was no adenoma found",
"The colon was basically inflammed, but no polyp was seen",
"The Barrett's segment was not biopsied",
"The C0M7 stretch of Barrett's was flat")
anexample<-data.frame(anexample)
names(anexample)<-"Thecol"
# Run the function on the dataframe and it should get rid of sentences (and
# parts of sentences) with negative parts in them.
hh<-NegativeRemove(anexample$Thecol)
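A much simplified sketch of the idea is shown below. It drops whole sentences that contain a negation word, whereas the package also handles parts of sentences, so this is an illustration under stated assumptions and not the NegativeRemove implementation. The negation pattern and the helper dropNegativeSentences are hypothetical.

```r
reports <- c(
  "There is no evidence of polyp here. The mucosa was erythematous.",
  "The Barrett's segment was not biopsied.",
  "The C0M7 stretch of Barrett's was flat."
)

# Assumed, deliberately small negation pattern for illustration only
negPattern <- "\\b(no|not|without)\\b"

dropNegativeSentences <- function(txt) {
  # Split the report into sentences on full stops
  sentences <- trimws(unlist(strsplit(txt, ".", fixed = TRUE)))
  sentences <- sentences[nzchar(sentences)]
  # Keep only the sentences that contain no negation word
  kept <- sentences[!grepl(negPattern, sentences,
                           ignore.case = TRUE, perl = TRUE)]
  if (length(kept) == 0) return("")
  paste0(paste(kept, collapse = ". "), ".")
}

cleaned <- vapply(reports, dropNegativeSentences, character(1),
                  USE.NAMES = FALSE)
print(cleaned)
```

With this toy rule the second report becomes an empty string, so a later search for Barrett's would no longer count a segment that was explicitly not biopsied.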

Wrapper for Negative Remove

Description

This performs negative removal on a per sentence basis

Usage

NegativeRemoveWrapper(inputText)

Arguments

inputText

the text to remove Negatives from

Value

This returns a column within a dataframe. This should be changed to a character vector eventually

See Also

Other NLP - Text Cleaning and Extraction: ColumnCleanUp(), DictionaryInPlaceReplace(), Extractor(), NegativeRemove(), textPrep()

Examples

# Build a character vector and then
# incorporate into a dataframe
anexample<-c("There is no evidence of polyp here",
"Although the prep was poor,there was no adenoma found",
"The colon was basically inflammed, but no polyp was seen",
"The Barrett's segment was not biopsied",
"The C0M7 stretch of Barrett's was flat")
anexample<-data.frame(anexample)
names(anexample)<-"Thecol"
# Run the function on the dataframe and it should get rid of sentences (and
# parts of sentences) with negative parts in them.
#hh<-NegativeRemoveWrapper(anexample$Thecol)

Fake Upper GI Pathology Set

Description

A dataset containing fake pathology reports for upper GI endoscopy tissue specimens. The report field is provided as a whole report without any fields having been already extracted

Usage

PathDataFrameFinal

Format

A data frame with 2000 rows and 1 variable:

PathReportWhole

The whole report, in text


Fake Lower GI Pathology Set

Description

A dataset containing fake pathology reports for lower GI endoscopy tissue specimens. The report field is provided as a whole report without any fields having been already extracted

Usage

PathDataFrameFinalColon

Format

A data frame with 2000 rows and 1 variable:

PathReportWhole

The whole report, in text


Create a Circos plot for patient flow

Description

This allows us to look at the overall flow from one type of procedure to another using circos plots. A good example of its use might be to see how patients move from one state (e.g. having an EMR) to another state (e.g. undergoing RFA).

Usage

PatientFlow_CircosPlots(
  dataframe,
  Endo_ResultPerformed,
  HospNum_Id,
  ProcPerformed
)

Arguments

dataframe

dataframe

Endo_ResultPerformed

the column containing the date of the procedure

HospNum_Id

Column with the patient's unique hospital number

ProcPerformed

The procedure that you want to plot (e.g. EMR or radiofrequency ablation for Barrett's, but it can be any description of a procedure you desire)

Examples

# This function builds a circos plot which gives a more aggregated
# overview of how patients flow from one state to another than the
# SurveySankey function
# Build a list of procedures
Event <- list(
  x1 = "Therapeutic- Dilatation",
  x2 = "Other-", x3 = "Surveillance",
  x4 = "APC", x5 = "Therapeutic- RFA TTS",
  x6 = "Therapeutic- RFA 90",
  x7 = "Therapeutic- EMR", x8 = "Therapeutic- RFA 360"
)
EndoEvent <- replicate(2000, sample(Event, 1, replace = FALSE))
# Merge the list with the Myendo dataframe
fff <- unlist(EndoEvent)
fff <- data.frame(fff)
names(fff) <- "col1"
Myendo$EndoEvent<-fff$col1
names(Myendo)[names(Myendo) == "HospitalNumber"] <- "PatientID"
names(Myendo)[names(Myendo) == "fff$col1"] <- "EndoEvent"
# Myendo$EndoEvent<-as.character(Myendo$EndoEvent)
# Run the function using the procedure information (the date of the
# procedure, the Event type and the individual patient IDs)
hh <- PatientFlow_CircosPlots(Myendo, "Dateofprocedure", "PatientID", "EndoEvent")
rm(Myendo)
rm(EndoEvent)

Create a plot over time of patient categorical findings as a line chart

Description

This plots the findings at endoscopy (or pathology) over time for individual patients. An example might be with worst pathological grade on biopsy for Barrett's oesophagus over time

Usage

PatientFlowIndividual(
  theframe,
  EndoReportColumn,
  myNotableWords,
  DateofProcedure,
  PatientID
)

Arguments

theframe

dataframe

EndoReportColumn

the column containing the categorical findings to plot

myNotableWords

The terms from a column with categorical variables

DateofProcedure

Column with the date of the procedure

PatientID

Column with the patient's unique identifier

See Also

Other Patient Flow functions: SurveySankey()

Examples

# This function builds a chart of categorical outcomes for individual patients over time
# It allows a two dimensional visualisation of patient progress. A perfect example is 
# visualising the Barrett's progression for patients on surveillance and then
# therapy if dysplasia develops and highlighting recurrence if it happens
# Barretts_df <- BarrettsAll(Myendo, "Findings", "OGDReportWhole", Mypath, "Histology")
# myNotableWords<-c("No_IM","IM","LGD","HGD","T1a","IGD","SM1","SM2")
# PatientFlowIndividual(Barretts_df,"IMorNoIM",myNotableWords,DateofProcedure,"HospitalNumber")
# Once the function is run you should always call dev.off()

Use list of catheters used in radiofrequency ablation

Description

This returns a list of catheters used in radiofrequency ablation.

Usage

RFACath()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), WordsToNumbers()


Create a basic consort diagram from dataframes

Description

This function creates a consort diagram using DiagrammeR by assessing all of the dataframes in your script and populating each box in the consort diagram with the number of rows in each dataframe as well as how the dataframes are linked together. The user just provides a pathname for the script.

Usage

sanity(pathName)

Arguments

pathName

The path to the R script to assess

Examples

#pathName<-paste0(here::here(),"/inst/TemplateProject/munge/PreProcessing.R")
#sanity(pathName)
# This creates a consort diagram from any R script (not Rmd). It
# basically tells you how all the dataframes are related and how many
# rows each dataframe has so you can see if any data has been lost
# on the way.

Set the colour theme for all the ggplots

Description

This standardises the colours for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

scale_colour_Publication()

See Also

Other Data Presentation helpers: EndoBasicGraph(), scale_fill_Publication(), theme_Publication()

Examples

# None needed

Set the fills for all the ggplots

Description

This standardises the fills for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

scale_fill_Publication()

See Also

Other Data Presentation helpers: EndoBasicGraph(), scale_colour_Publication(), theme_Publication()

Examples

# None needed

Find and Replace

Description

This is a helper function for finding and replacing from lexicons like the event list. The lexicons are all named lists where the name is the text to replace and the value is what it should be replaced with. It uses fuzzy find and replace to account for spelling errors.

Usage

spellCheck(pattern, replacement, x, fixed = FALSE)

Arguments

pattern

the pattern to look for

replacement

the pattern to replace with

x

the target string

fixed

whether the pattern should be matched literally (TRUE) or as a regular expression (FALSE). Defaults to FALSE.

Value

This returns a character vector

Examples

L <- tolower(stringr::str_split(HistolType(),"\\|"))
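The fuzzy find-and-replace idea can be sketched in base R using edit distance. The approach below compares each word with the lexicon names using utils::adist and replaces close matches; this is an assumed, simplified technique for illustration and not how spellCheck is implemented. The lexicon, the helper fuzzyReplace and the maxDist threshold are all hypothetical.

```r
# Hypothetical lexicon: name = text to look for, value = replacement
lexicon <- c(dilatation = "dilatation", ablation = "RFA")

fuzzyReplace <- function(x, lexicon, maxDist = 1) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  # Edit distance between every word (rows) and every lexicon name (columns)
  d <- utils::adist(tolower(words), names(lexicon))
  for (i in seq_along(words)) {
    best <- which.min(d[i, ])
    # Replace the word only if it is within maxDist edits of a lexicon name
    if (d[i, best] <= maxDist) words[i] <- lexicon[[best]]
  }
  paste(words, collapse = " ")
}

fixed1 <- fuzzyReplace("Oesophageal dilitation performed", lexicon)
fixed2 <- fuzzyReplace("ablaton done", lexicon)
print(c(fixed1, fixed2))
```

A small maxDist keeps the matching conservative; raising it catches more misspellings at the cost of more false replacements.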

Extracts the first test only per patient

Description

Extracts the first test only per patient and returns a new dataframe listing the patientID and the first test done

Usage

SurveilFirstTest(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

dataframe

dataframe

HospNum_Id

Patient ID

Endo_ResultPerformed

Date of the Endoscopy

See Also

Other Basic Analysis - Surveillance Functions: HowManyOverTime(), SurveilLastTest(), SurveilTimeByRow(), TimeToStatus()

Examples

dd <- SurveilFirstTest(
  Myendo, "HospitalNumber",
  "Dateofprocedure"
)

Extract the last test done by a patient only

Description

This extracts the last test only per patient and returns a new dataframe listing the patientID and the last test done

Usage

SurveilLastTest(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

dataframe

dataframe

HospNum_Id

Patient ID

Endo_ResultPerformed

Date of the Endoscopy

See Also

Other Basic Analysis - Surveillance Functions: HowManyOverTime(), SurveilFirstTest(), SurveilTimeByRow(), TimeToStatus()

Examples

cc <- SurveilLastTest(Myendo, "HospitalNumber", "Dateofprocedure")
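The first-test and last-test selections can be pictured with the base R sketch below. It is an illustration on invented data and makes no claim about how SurveilFirstTest or SurveilLastTest are implemented.

```r
# Hypothetical procedures for two patients
tests <- data.frame(
  HospitalNumber = c("A1", "B2", "A1", "B2", "A1"),
  Dateofprocedure = as.Date(c("2016-03-01", "2015-07-15", "2014-01-20",
                              "2017-02-02", "2015-06-30")),
  stringsAsFactors = FALSE
)

# Sort by patient and date so that the earliest test comes first
ordered <- tests[order(tests$HospitalNumber, tests$Dateofprocedure), ]

# First test: keep the first row seen for each patient
firstTest <- ordered[!duplicated(ordered$HospitalNumber), ]

# Last test: the same idea, scanning from the end
lastTest <- ordered[!duplicated(ordered$HospitalNumber, fromLast = TRUE), ]

print(firstTest)
print(lastTest)
```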

Extract the time difference between each test in days

Description

This determines the time difference between each test for a patient in days. It returns the time since the first and the last study as a new dataframe.

Usage

SurveilTimeByRow(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

dataframe

dataframe,

HospNum_Id

Patient ID

Endo_ResultPerformed

Date of the Endoscopy

See Also

Other Basic Analysis - Surveillance Functions: HowManyOverTime(), SurveilFirstTest(), SurveilLastTest(), TimeToStatus()

Examples

aa <- SurveilTimeByRow(
  Myendo, "HospitalNumber",
  "Dateofprocedure"
)
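A minimal base R sketch of per-patient time differences is given below. The output column names DaysSincePrevious and DaysSinceFirst are assumptions chosen for this example and may not match the columns SurveilTimeByRow returns.

```r
# Hypothetical procedures for two patients
tests <- data.frame(
  HospitalNumber = c("A1", "A1", "A1", "B2", "B2"),
  Dateofprocedure = as.Date(c("2014-01-01", "2014-01-31", "2015-01-31",
                              "2016-05-01", "2016-05-11")),
  stringsAsFactors = FALSE
)
tests <- tests[order(tests$HospitalNumber, tests$Dateofprocedure), ]

# Work on the dates as day counts so that differences are in days
dayNum <- as.numeric(tests$Dateofprocedure)

# Days since the previous test for the same patient (NA for a first test)
tests$DaysSincePrevious <- ave(dayNum, tests$HospitalNumber,
                               FUN = function(d) c(NA, diff(d)))

# Days since that patient's first test
tests$DaysSinceFirst <- ave(dayNum, tests$HospitalNumber,
                            FUN = function(d) d - d[1])
print(tests)
```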

Create a Sankey plot for patient flow

Description

The purpose of the function is to provide a Sankey plot which allows the analyst to see the proportion of patients moving from one state (in this case type of Procedure) to another. This allows us to see for example how many EMRs are done after RFA.

Usage

SurveySankey(dfw, ProcPerformedColumn, PatientID)

Arguments

dfw

the dataframe extracted using the standard cleanup scripts

ProcPerformedColumn

the column containing the test, like ProcPerformed for example

PatientID

the column containing the patient's unique identifier, e.g. hospital number

See Also

Other Patient Flow functions: PatientFlowIndividual()

Examples

names(Myendo)[names(Myendo) == "HospitalNumber"] <- "PatientID"
gg <- SurveySankey(Myendo, "ProcedurePerformed", "PatientID")
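The quantity a Sankey plot visualises, namely how many patients move from one procedure to the next, can be tabulated with the base R sketch below. It only builds the transition counts on invented data; it does not draw a plot and is not the SurveySankey implementation. The date column is an assumption used to order each patient's procedures.

```r
# Hypothetical procedures for two patients
procs <- data.frame(
  PatientID = c("A1", "A1", "A1", "B2", "B2"),
  Dateofprocedure = as.Date(c("2015-01-01", "2015-04-01", "2015-08-01",
                              "2015-02-01", "2015-06-01")),
  ProcedurePerformed = c("EMR", "RFA", "RFA", "RFA", "EMR"),
  stringsAsFactors = FALSE
)
procs <- procs[order(procs$PatientID, procs$Dateofprocedure), ]

# Pair each procedure with the next procedure for the same patient
n <- nrow(procs)
samePatient <- procs$PatientID[-n] == procs$PatientID[-1]
transitions <- data.frame(
  from = procs$ProcedurePerformed[-n][samePatient],
  to = procs$ProcedurePerformed[-1][samePatient]
)

# Cross-tabulate: rows are the earlier state, columns the later state
flow <- table(transitions$from, transitions$to)
print(flow)
```

Reading across the RFA row shows, for example, how many EMRs were done after RFA.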

Combine all the text cleaning and extraction functions into one

Description

This function prepares the data by cleaning punctuation, checking spelling against the lexicons, mapping terms according to the lexicons and lower casing everything. It contains several of the other functions in the package for ease of use.

Usage

textPrep(inputText, delim)

Arguments

inputText

The relevant pathology text columns

delim

the delimiters so the extractor can be used

Value

This returns a string vector.

See Also

Other NLP - Text Cleaning and Extraction: ColumnCleanUp(), DictionaryInPlaceReplace(), Extractor(), NegativeRemoveWrapper(), NegativeRemove()

Examples

mywords<-c("Hospital Number","Patient Name:","DOB:","General Practitioner:",
"Date received:","Clinical Details:","Macroscopic description:",
"Histology:","Diagnosis:")
CleanResults<-textPrep(PathDataFrameFinal$PathReportWhole,mywords)

Set the publication theme for all the ggplots

Description

This standardises the theme for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

theme_Publication(base_size = 14, base_family = "Helvetica")

Arguments

base_size

the base size

base_family

the base family

See Also

Other Data Presentation helpers: EndoBasicGraph(), scale_colour_Publication(), scale_fill_Publication()

Examples

# None needed

Fake Upper GI Endoscopy Set

Description

A dataset containing fake endoscopy reports. The report field is provided as a whole report without any fields having been already extracted

Usage

TheOGDReportFinal

Format

A data frame with 2000 rows and 1 variable:

OGDReportWhole

The whole report, in text


Extract the time to an event

Description

This function selects patients who have had a start event and an end event of the user's choosing so you can determine things like how long it takes to get a certain outcome. For example, how long does it take to get a patient into a fully squamous oesophagus after Barrett's ablation for dysplasia?

Usage

TimeToStatus(dataframe, HospNum, EVENT, indicatorEvent, endEvent)

Arguments

dataframe

The dataframe

HospNum

The Hospital Number column

EVENT

The column that contains the outcome of choice

indicatorEvent

The name of the start event (can be a regular expression)

endEvent

The name of the endpoint (can be a regular expression)

See Also

Other Basic Analysis - Surveillance Functions: HowManyOverTime(), SurveilFirstTest(), SurveilLastTest(), SurveilTimeByRow()

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)

# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)

# The function relies on the other Barrett's functions being run as well:
b1 <- Barretts_PragueScore(v, "Findings")
b1$IMorNoIM <- Barretts_PathStage(b1, "Histology")
colnames(b1)[colnames(b1) == "pHospitalNum"] <- "HospitalNumber"

# The function groups the procedures by patient and gives
# all the procedures between
# the indicatorEvent and the procedure just after the endpoint.
# Eg if the start is RFA and the
# endpoint is biopsies then it will give all RFA procedures and
# the first biopsy procedure

b1$EndoscopyEvent <- EndoscopyEvent(
  b1, "Findings", "ProcedurePerformed",
  "Macroscopicdescription", "Histology"
)
nn <- TimeToStatus(b1, "eHospitalNum", "EndoscopyEvent", "rfa", "dilat")
rm(v)
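A related calculation, the number of days from a patient's first start event to the first end event that follows it, can be sketched in base R as below. The helper timeToEvent, the Date column and the toy data are hypothetical; TimeToStatus itself returns the relevant procedures rather than this single number.

```r
# Hypothetical event history for two patients
events <- data.frame(
  HospitalNumber = c("A1", "A1", "A1", "B2", "B2"),
  Date = as.Date(c("2015-01-01", "2015-03-01", "2015-06-30",
                   "2015-02-01", "2015-05-01")),
  EndoscopyEvent = c("rfa", "rfa", "dilat", "emr", "rfa"),
  stringsAsFactors = FALSE
)

# Days from the first start event to the first later end event for one
# patient; NA when either event is missing
timeToEvent <- function(df, startEvent, endEvent) {
  df <- df[order(df$Date), ]
  starts <- df$Date[grepl(startEvent, df$EndoscopyEvent)]
  if (length(starts) == 0) return(NA_real_)
  ends <- df$Date[grepl(endEvent, df$EndoscopyEvent) & df$Date > starts[1]]
  if (length(ends) == 0) return(NA_real_)
  as.numeric(ends[1] - starts[1])
}

daysToEnd <- sapply(split(events, events$HospitalNumber),
                    timeToEvent, startEvent = "rfa", endEvent = "dilat")
print(daysToEnd)
```

Both event names are treated as regular expressions here, in line with the argument descriptions above.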

Fake Lower GI Endoscopy Set including Pathology

Description

A dataset containing fake lower GI endoscopy reports and pathology reports all pre-extracted

Usage

vColon

Format

A data frame with 2000 rows and 26 variables:

pHospitalNum

The HospitalNum, in text

PatientName.x

The PatientName, in text

GeneralPractitioner.x

The GeneralPractitioner report, in text

Date.x

The Date, in date

Endoscopist

The Endoscopist report, in text

Secondendoscopist

The Secondendoscopist report, in text

Medications

The Medications report, in text

Instrument

The Instrument report, in text

ExtentofExam

The ExtentofExam report, in text

Indications

The Indications report, in text

ProcedurePerformed

The ProcedurePerformed report, in text

Findings

The Findings report, in text

EndoscopicDiagnosis

The EndoscopicDiagnosis report, in text

Original.x

The Original endoscopy report, in text

eHospitalNum

The HospitalNum, in text

PatientName.y

The PatientName, in text

DOB

The DOB, in date

GeneralPractitioner.y

The GeneralPractitioner report, in text

Date.y

The Date, in date

ClinicalDetails

The ClinicalDetails report, in text

Natureofspecimen

The Natureofspecimen report, in text

Macroscopicdescription

The Macroscopicdescription report, in text

Histology

The Histology report, in text

Diagnosis

The Diagnosis report, in text

Original.y

The whole report, in text

Days

Days, in numbers


Convert words to numbers especially for the histopathology text

Description

This function converts words to numbers.

Usage

WordsToNumbers()

See Also

Other NLP - Lexicons: BiopsyIndex(), EventList(), GISymptomsList(), HistolType(), LocationListLower(), LocationListUniversal(), LocationListUpper(), LocationList(), RFACath()