Package 'EndoMineR' reference manual

Title:	Functions to mine endoscopic and associated pathology datasets
Description:	This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data set is merged already but it can also be used of course with the unmerged datasets.
Authors:	Sebastian Zeki [aut, cre]
Maintainer:	Sebastian Zeki <sebastiz@hotmail.com>
License:	GPL-3
Version:	2.0.1.9000
Built:	2025-03-20 20:42:05 UTC
Source:	https://github.com/ropensci/EndoMineR

Determine the Follow up group

Description

This determines the follow-up rule a patient should fit into, according to the British Society of Gastroenterology guidance on Barrett's oesophagus. Specifically, it combines the presence of intestinal metaplasia with the Prague score so that the follow-up group can be determined. It relies on the presence of a Prague score. It should be run after Barretts_PathStage, which looks for the worst stage of a specimen and determines the presence or absence of intestinal metaplasia if the sample is non-dysplastic. Because reports often do not record a full Prague score, a more pragmatic approach has been taken: the M stage is assessed and, if this is not present, the C stage extracted by the Barretts_PragueScore function is used instead.

Usage

Barretts_FUType(dataframe, CStage, MStage, IMorNoIM)

Arguments

`dataframe`	the dataframe, which must first have been processed by Barretts_PathStage (to get IMorNoIM) and by Barretts_PragueScore (to get the C and M stages, if available)
`CStage`	CStage column
`MStage`	MStage column
`IMorNoIM`	IMorNoIM column
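
The fallback described in the Description (use the M stage and, if it is missing, the C stage) can be sketched in base R as follows. This is only an illustration: `fu_group_sketch`, the 3 cm cut-off and the group labels are assumptions made for the example and are not the package's implementation or output values.

```r
# Hypothetical helper, NOT the package's implementation.
# The 3 cm threshold and the labels are illustrative assumptions.
fu_group_sketch <- function(c_stage, m_stage, im_status) {
  # Prefer the M stage; fall back to the C stage when M is missing.
  seg_len <- ifelse(is.na(m_stage), c_stage, m_stage)
  out <- rep("No Prague score", length(seg_len))
  short <- !is.na(seg_len) & seg_len < 3
  long <- !is.na(seg_len) & seg_len >= 3
  out[short & im_status %in% "No_IM"] <- "Short segment, no IM"
  out[short & im_status %in% "IM"] <- "Short segment, IM"
  out[long] <- "Long segment"
  out
}

fu <- fu_group_sketch(
  c_stage = c(1, 2, NA, 4),
  m_stage = c(2, NA, NA, 6),
  im_status = c("IM", "No_IM", "IM", "IM")
)
```

Rows with neither a C nor an M stage stay in a separate group, which mirrors the statement that the function relies on the presence of a Prague score.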

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)
# The function relies on the other Barrett's functions being run as well:
v$IMorNoIM <- Barretts_PathStage(v, "Histology")
v <- Barretts_PragueScore(v, "Findings")

# The follow-up group depends on the histology and the Prague score for a
# patient so it takes the processed Barrett's data and then looks in the
# Findings column for permutations of the Prague score.
v$FU_Type <- Barretts_FUType(v, "CStage", "MStage", "IMorNoIM")
rm(v)

Get the worst pathological stage for Barrett's

Description

This extracts the pathological stage from the histopathology specimen. It is done using 'degradation', so that it will look for the worst overall grade in the histology specimen and, if that is not found, it will look for the next worst and so on. It looks per report, not per biopsy (it is more common for histopathology reports to contain the worst overall grade rather than individual biopsy grades). Specifically, it extracts the worst histopathological grade within the specimen. For the sake of accuracy this should always be used after the HistolDx function, which removes negative sentences such as 'there is no dysplasia'. This function should be used on the column derived from HistolDx, which is called Dx_Simplified.

Usage

Barretts_PathStage(dataframe, PathColumn)

Arguments

`dataframe`	dataframe with column of interest
`PathColumn`	column of interest
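
The 'degradation' idea in the Description (look for the worst grade first, then the next worst) can be illustrated with a small base R sketch. The function name, the regular expressions and the labels below are assumptions for illustration only and may differ from what the package uses.

```r
# Hypothetical sketch, NOT the package's implementation.
# Patterns and labels are illustrative assumptions.
worst_grade_sketch <- function(reports) {
  # Ordered from worst to mildest.
  grades <- c(
    "Cancer" = "carcinoma",
    "HGD" = "high[- ]grade dysplasia",
    "LGD" = "low[- ]grade dysplasia",
    "IM" = "intestinal metaplasia"
  )
  out <- rep("No_IM", length(reports))
  # Apply from mildest to worst so that a worse grade overwrites a milder one.
  for (label in rev(names(grades))) {
    hit <- grepl(grades[[label]], reports, ignore.case = TRUE)
    out[hit] <- label
  }
  out
}

stages <- worst_grade_sketch(c(
  "Intestinal metaplasia with low grade dysplasia",
  "Invasive adenocarcinoma arising in high-grade dysplasia",
  "Squamous mucosa only"
))
```

A sketch like this would also match negated phrases such as 'no dysplasia', which is why the Description recommends running the function on text that has already had negative sentences removed.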

Examples

# The function takes the Histology column from the Mypath demo dataset
# and extracts the worst histological grade for each specimen.
b <- Barretts_PathStage(Mypath, "Histology")

Extract the Prague score

Description

The aim is to extract a C and M stage (Prague score) for Barrett's samples. This is done using a regex that finds C and M stages where they are explicitly mentioned in the free text.

Usage

Barretts_PragueScore(dataframe, EndoReportColumn, EndoReportColumn2)

Arguments

`dataframe`	dataframe with column of interest
`EndoReportColumn`	column of interest
`EndoReportColumn2`	second column of interest
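
A minimal base R sketch of regex-based Prague score extraction is shown below. The helper `prague_sketch` and its pattern are assumptions for illustration; the package's own regular expressions are not reproduced here and are likely to handle more permutations.

```r
# Hypothetical helper; the pattern is an assumption, not the package's regex.
prague_sketch <- function(text) {
  grab <- function(letter) {
    # The letter, not preceded by another letter, then optional spaces, then digits.
    pattern <- paste0("(?<![A-Za-z])", letter, "\\s*(\\d+)")
    m <- regmatches(text, regexec(pattern, text, perl = TRUE))
    as.numeric(vapply(
      m,
      function(x) if (length(x) >= 2) x[2] else NA_character_,
      character(1)
    ))
  }
  data.frame(CStage = grab("C"), MStage = grab("M"))
}

findings <- c(
  "Barrett's oesophagus C2M4 seen",
  "Normal oesophagus",
  "Prague C 1 M 3"
)
prague <- prague_sketch(findings)
```

Reports without an explicit score return NA for both stages.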

Examples

# The example takes the endoscopy demo dataset and searches the
# Findings column (which contains endoscopy free text about the
# procedure itself). It then extracts the Prague score if relevant. I
# find it easiest to use this on a Barrett's subset of data rather than
# a dump of all endoscopies but of course this is a permissible dataset
# too


aa <- Barretts_PragueScore(Myendo, "Findings", "OGDReportWhole")

Run all the basic Barrett's functions

Description

Function to encapsulate all the Barrett's functions together. This includes the Prague score and the worst pathological grade and then feeds both of these things into the follow up function. The output is a dataframe with all the original data as well as the new columns that have been created.

Usage

BarrettsAll(
  Endodataframe,
  EndoReportColumn,
  EndoReportColumn2,
  Pathdataframe,
  PathColumn
)

Arguments

`Endodataframe`	endoscopy dataframe of interest
`EndoReportColumn`	Endoscopy report field of interest as a string vector
`EndoReportColumn2`	Second endoscopy report field of interest as a string vector
`Pathdataframe`	pathology dataframe of interest
`PathColumn`	Pathology report field of interest as a string vector

Value

Newdf

Examples

Barretts_df <- BarrettsAll(Myendo, "Findings", "OGDReportWhole", Mypath, "Histology")

Get the number of Barrett's biopsies taken

Description

This function gets the number of biopsies taken per endoscopy and compares it to the Prague score for that endoscopy. Endoscopists should be taking a certain number of biopsies given the length of a Barrett's segment, so it should be straightforward to detect a shortfall in the number of biopsies being taken. The output is the shortfall per endoscopist.

Usage

BarrettsBxQual(dataframe, Endo_ResultPerformed, PatientID, Endoscopist)

Arguments

`dataframe`	dataframe
`Endo_ResultPerformed`	Date of the Endoscopy
`PatientID`	Patient's unique identifier
`Endoscopist`	name of the column with the Endoscopist names
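
The shortfall calculation can be pictured with the following base R sketch. The toy data and the rule for the expected number of biopsies (four per 2 cm of segment length) are assumptions for illustration; the rule actually applied by the package may differ.

```r
# Toy data; the column names follow the demo examples but the values are made up.
bx <- data.frame(
  Endoscopist = c("A", "A", "B"),
  MStage = c(4, 2, 6),
  NumBx = c(8, 4, 6),
  stringsAsFactors = FALSE
)

# ASSUMPTION: four biopsies expected for every 2 cm of Barrett's segment.
bx$Expected <- 4 * ceiling(bx$MStage / 2)
bx$Shortfall <- bx$NumBx - bx$Expected

# Mean shortfall per endoscopist (negative means too few biopsies).
shortfall <- aggregate(Shortfall ~ Endoscopist, data = bx, FUN = mean)
```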

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
Mypath$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
Mypath$BxSize <- HistolBxSize(Mypath$Macroscopicdescription)

# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", Mypath, "Dateofprocedure",
  "HospitalNumber"
)

# The function relies on the other Barrett's functions being run as well:
b1 <- Barretts_PragueScore(v, "Findings")
b1$PathStage <- Barretts_PathStage(b1, "Histology")

# The follow-up group depends on the histology and the Prague score for a
# patient so it takes the processed Barrett's data and then looks in the
# Findings column for permutations of the Prague score.
b1$FU_Type <- Barretts_FUType(b1, "CStage", "MStage", "PathStage")


colnames(b1)[colnames(b1) == "pHospitalNum"] <- "HospitalNumber"
# The average number of biopsies is then calculated and
# compared to the average Prague C score so that those who are taking
# too few biopsies can be determined
hh <- BarrettsBxQual(
  b1, "Date.x", "HospitalNumber",
  "Endoscopist"
)
rm(v)

Run the Paris classification versus worst histopath grade for Barrett's

Description

This creates a column of Paris grade for all samples where this is mentioned.

Usage

BarrettsParisEMR(Column, Column2)

Arguments

`Column`	Endoscopy report field of interest as a string vector
`Column2`	Another endoscopy report field of interest as a string vector

Value

a string vector

Examples

Myendo$EMR<-BarrettsParisEMR(Myendo$ProcedurePerformed,Myendo$Findings)

Index biopsy locations

Description

This function returns all the conversions from common versions of events to a standardised event list, much like the Location standardisation function. This does not include EMR, as this is extracted from the pathology and so is part of pathology type. It is used for automated OPCS-4 coding.

Usage

BiopsyIndex()

Group anything by endoscopist and return the table

Description

This creates a proportion table for categorical variables by endoscopist. It relies on an Endoscopist column being present.

Usage

CategoricalByEndoscopist(ProportionColumn, EndoscopistColumn)

Arguments

`ProportionColumn`	The column (categorical data) of interest
`EndoscopistColumn`	The endoscopist column
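
As a point of comparison, a proportion table of a categorical variable by endoscopist can be built in base R with `table` and `prop.table`. The data below are made up, and the shape of the package's own output may differ.

```r
# Toy data: worst grade per procedure and the endoscopist who performed it.
grade <- c("IM", "No_IM", "IM", "LGD", "IM", "No_IM")
endoscopist <- c("A", "A", "B", "B", "B", "A")

counts <- table(endoscopist, grade)
# margin = 1 gives proportions within each endoscopist (each row sums to 1).
props <- prop.table(counts, margin = 1)
round(props, 2)
```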

Examples

# The function creates a proportion table of any categorical variable by endoscopist.
# Firstly relevant columns are extrapolated from the Mypath demo dataset.
# These functions are all part of Histology data cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)
# The function relies on the other Barrett's functions being run as well:
v$IMorNoIM <- Barretts_PathStage(v, "Histology")
colnames(v)[colnames(v) == "pHospitalNum"] <- "HospitalNumber"
# The function takes the column with the extracted worst grade of
# histopathology and returns the proportion of each finding (ie
# proportion with low grade dysplasia, high grade etc.) for each
# endoscopist
kk <- CategoricalByEndoscopist(v$IMorNoIM, v$Endoscopist)
rm(Myendo)

Fake Lower GI Endoscopy Set

Description

A dataset containing fake lower GI endoscopy reports. The report field is provided as a whole report without any fields having been already extracted.

Usage

ColonFinal

Format

A data frame with 2000 rows and 1 variable:

OGDReportWhole: The whole report, in text

Tidy up messy columns

Description

This does a general clean-up of whitespace, semi-colons and full stops at the start of lines, and converts end-of-sentence full stops to new lines.

Usage

ColumnCleanUp(vector)

Arguments

`vector`	column of interest
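
The kind of clean-up listed in the Description can be sketched with a few `gsub` calls. The helper name and the exact patterns are assumptions for illustration and are not taken from the package source.

```r
# Hypothetical helper; the patterns are illustrative assumptions.
clean_column_sketch <- function(x) {
  # Collapse runs of spaces and tabs into a single space.
  x <- gsub("[ \t]+", " ", x, perl = TRUE)
  # Drop full stops, semi-colons and spaces at the start of each line.
  x <- gsub("(^|\n)[ .;]+", "\\1", x, perl = TRUE)
  # Turn an end-of-sentence full stop (followed by whitespace) into a new line.
  x <- gsub("\\.\\s+", "\n", x, perl = TRUE)
  trimws(x)
}

cleaned <- clean_column_sketch(".  Normal stomach.   Small polyp; removed.")
```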

Value

This returns a character vector

Examples

ii<-ColumnCleanUp(Myendo$Findings)

OPCS-4 Coding

Description

This function extracts the OPCS-4 codes for all Barrett's procedures. It should take the OPCS-4 code from the EVENT column, perhaps also using the extent of the examination, depending on how the coding is done. The EVENT column will need to extract multiple findings. The hope is that the OPCS-4 column will then map from the EVENT column. This returns a nested list column with the procedure, furthest path site and event performed.

Usage

dev_ExtrapolateOPCS4Prep(dataframe, Procedure, PathSite, Event, extentofexam)

Arguments

`dataframe`	the dataframe
`Procedure`	The Procedure column
`PathSite`	The column containing the Pathology site
`Event`	the EVENT column
`extentofexam`	the furthest point reached in the examination

Examples

# Need to run the HistolTypeSite and EndoscopyEvent functions first here
# SelfOGD_Dunn$OPCS4w<-ExtrapolateOPCS4Prep(SelfOGD_Dunn,"PROCEDUREPERFORMED",
# "PathSite","EndoscopyEvent")

Dictionary In Place Replace

Description

This maps terms in the text and replaces them with the standardised term (mapped in the lexicon file) within the text. It is used within the textPrep function.

Usage

DictionaryInPlaceReplace(inputString, list)

Arguments

`inputString`	the input string (ie the full medical report)
`list`	The replacing list
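
In-place term standardisation can be pictured as a loop of `gsub` calls over a lexicon. In this sketch the lexicon is a named character vector whose names are patterns and whose values are the standardised terms; that structure, the helper name and the entries are assumptions for illustration and may not match what `LocationList()` returns.

```r
# Hypothetical helper and lexicon, for illustration only.
replace_terms_sketch <- function(text, lexicon) {
  for (pattern in names(lexicon)) {
    text <- gsub(pattern, lexicon[[pattern]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

lexicon <- c(
  "\\b(oesophageal|esophageal|esophagus)\\b" = "Oesophagus",
  "\\bgastric\\b" = "Stomach"
)
standardised <- replace_terms_sketch("Esophageal and gastric biopsies taken", lexicon)
```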

Value

This returns a character vector

Examples

inputText<-DictionaryInPlaceReplace(TheOGDReportFinal$OGDReportWhole,LocationList())

Basic graph creation using the template specified in theme_Publication.

Description

This creates a basic graph using the template specified in theme_Publication. It takes a numeric column and plots it against any non-numeric x axis in a ggplot

Usage

EndoBasicGraph(dataframe, xdata, number)

Arguments

`dataframe`	dataframe
`xdata`	The x column
`number`	The numeric column

Value

Myplot: the final plot

Examples

# This function plots numeric y vs non-numeric x
# Get some numeric columns e.g. number of biopsies and size
Mypath$Size <- HistolBxSize(Mypath$Macroscopicdescription)
Mypath$NumBx <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")
Mypath2 <- Mypath[, c("NumBx", "Size")]
EndoBasicGraph(Mypath, "Size", "NumBx")

Merge endoscopy and histology data.

Description

This takes the date performed and hospital number columns of the endoscopy dataset and merges them with the equivalent columns in the pathology dataset. The merge allows a 7 day time frame, as pathology is often reported after the endoscopy.

Usage

Endomerge2(x, EndoDate, EndoHospNumber, y, PathDate, PathHospNumber)

Arguments

`x`	Endoscopy dataframe
`EndoDate`	The date the endoscopy was performed
`EndoHospNumber`	The unique hospital number in the endoscopy dataset
`y`	Histopathology dataframe
`PathDate`	The date of the procedure as recorded in the histopathology dataset
`PathHospNumber`	The unique hospital number in the histopathology dataset
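
A merge on hospital number that keeps only pairs within a 7 day window can be sketched in base R as below. The toy data, the column names and the choice to require the pathology date to fall on or after the endoscopy date are assumptions for illustration; the package's own joining logic is not reproduced here.

```r
# Toy endoscopy and pathology tables (values are made up).
endo <- data.frame(
  HospitalNumber = c("A1", "A1", "B2"),
  EndoDate = as.Date(c("2020-01-01", "2020-06-01", "2020-03-10")),
  stringsAsFactors = FALSE
)
path <- data.frame(
  HospitalNumber = c("A1", "B2"),
  PathDate = as.Date(c("2020-01-04", "2020-04-30")),
  stringsAsFactors = FALSE
)

# Join every endoscopy to every pathology report for the same patient.
merged <- merge(endo, path, by = "HospitalNumber")
# Keep pairs where pathology falls 0 to 7 days after the endoscopy.
gap <- as.numeric(merged$PathDate - merged$EndoDate)
merged <- merged[gap >= 0 & gap <= 7, ]
```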

Examples

v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber",
  Mypath, "Dateofprocedure", "HospitalNumber"
)

EndoMineR: A package for analysis of endoscopic and related pathology

Description

The goal of EndoMineR is to extract as much information as possible from endoscopy reports and their associated pathology specimens. The package is intended for use by gastroenterologists, pathologists and anyone interested in the analysis of endoscopic and pathological datasets. Gastroenterology now has many standards against which practice is measured, although many reporting systems do not include the reporting capability to give anything more than basic analysis. Much of the data is locked in semi-structured text. However, the nature of semi-structured text means that data can be extracted in a standardised way; it just requires more manipulation. This package provides that manipulation so that complex endoscopic-pathological analyses, in line with recognised standards for these analyses, can be done. The package is basically in three parts:

Details

The extraction- This is really when the data is provided as full text reports. You may already have the data in a spreadsheet in which case this part isn't necessary.
Cleaning- These are a group of functions that allow the user to extract and clean data commonly found in endoscopic and pathology reports. The cleaning functions usually remove common typos or extraneous information and do some reformatting.
Analyses- The analyses provide graphing function as well as analyses according to the cornerstone questions in gastroenterology- namely surveillance, patient tracking, quality of endoscopy and pathology reporting and diagnostic yield questions.

To learn more about EndoMineR, start with the vignettes: 'browseVignettes(package = "EndoMineR")'

Paste endoscopy and histology results into one

Description

As spreadsheets are likely to be submitted with pre-segregated data, as output by endoscopy software, these should be remerged prior to cleaning. This function takes the column headers and places each one before its text so that the original full text is recreated. It will use the column headers as the delimiters. This should be used before textPrep, as the textPrep function only takes a character vector (ie the whole report and not a segregated one).

Usage

EndoPaste(x)

Arguments

`x`	the dataframe

Value

This returns a list with a dataframe containing one column of the merged text and a character vector which is the delimiter list for when the textPrep function is used
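
The re-merging step can be pictured with the base R sketch below, which prefixes each value with its column header and joins the pieces into one report per row. The helper name, the ": " separator and the names of the returned list elements are assumptions for illustration, not the package's actual output format.

```r
# Hypothetical helper; separator and output names are illustrative assumptions.
paste_report_sketch <- function(df) {
  delim <- names(df)
  # For every column, put the column header in front of each value.
  pieces <- Map(function(name, col) paste0(name, ": ", col), delim, df)
  # Join the pieces row by row into a single text per report.
  merged <- do.call(paste, c(unname(pieces), sep = "\n"))
  list(
    report = data.frame(Report = merged, stringsAsFactors = FALSE),
    delim = delim
  )
}

toy <- data.frame(
  PatientName = c("Tom Hardy", "Elma Fudd"),
  Text = c("All bad", "Serious issues"),
  stringsAsFactors = FALSE
)
pasted <- paste_report_sketch(toy)
```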

Examples

testList<-structure(list(PatientName = c("Tom Hardy", "Elma Fudd", "Bingo Man"
), HospitalNumber = c("H55435", "Y3425345", "Z343424"), Text = c("All bad. Not good", 
"Serious issues", "from a land far away")), class = "data.frame", row.names = c(NA, -3L))
EndoPaste(testList)

Clean endoscopist column

Description

If an endoscopist column is part of the dataset once the extractor function has been used, this cleans the endoscopist column from the report. It gets rid of titles and of common entries that are not needed. It should be used after the textPrep function.

Usage

EndoscEndoscopist(EndoscopistColumn)

Arguments

`EndoscopistColumn`	The endoscopy text column

Value

This returns a character vector

Examples

Myendo$Endoscopist <- EndoscEndoscopist(Myendo$Endoscopist)

Clean instrument column

Description

This cleans the Instrument column from the report, assuming such a column exists (where instrument usually refers to the endoscope number being used). It gets rid of common entries that are not needed. It should be used after the textPrep function. Note that this may be deprecated in the next version, as the endoscope coding used here is not widely used.

Usage

EndoscInstrument(EndoInstrument)

Arguments

`EndoInstrument`	column of interest

Value

This returns a character vector

Examples

Myendo$Instrument <- EndoscInstrument(Myendo$Instrument)

Clean medication column

Description

This cleans the medication column from the report, assuming such a column exists. It gets rid of common entries that are not needed. It also splits the medication into numeric fentanyl and midazolam doses for use. It should be used after the textPrep function.

Usage

EndoscMeds(MedColumn)

Arguments

`MedColumn`	column of interest as a string vector
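
Splitting a medication string into numeric doses can be sketched with a capture group, as below. The helper `dose_sketch`, the pattern and the output column names are assumptions for illustration; real medication text is likely to need more permutations than this handles.

```r
# Hypothetical helper; pattern and column names are illustrative assumptions.
dose_sketch <- function(meds, drug) {
  # Drug name, optional spaces or colon, then a number (optionally with decimals).
  pattern <- paste0(drug, "[ :]*(\\d+(?:\\.\\d+)?)")
  m <- regmatches(meds, regexec(pattern, meds, ignore.case = TRUE, perl = TRUE))
  as.numeric(vapply(
    m,
    function(x) if (length(x) >= 2) x[2] else NA_character_,
    character(1)
  ))
}

meds <- c("Fentanyl 50mcg, Midazolam 2.5mg", "Midazolam 1mg", "No sedation")
doses <- data.frame(
  Fent = dose_sketch(meds, "fentanyl"),
  Midaz = dose_sketch(meds, "midazolam")
)
```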

Value

This returns a dataframe

Examples

MyendoNew <- cbind(EndoscMeds(Myendo$Medications), Myendo)

Extract the endoscopic event.

Description

This extracts the endoscopic event. It looks for the event term and then looks in the event sentence as well as the one above to see if the location is listed. It only looks within the endoscopy fields. If tissue is taken then this will be extracted with the HistolTypeAndSite function rather than being listed as a result as this is cleaner and more robust.

Usage

EndoscopyEvent(dataframe, EventColumn1, Procedure, Macroscopic, Histology)

Arguments

`dataframe`	dataframe of interest
`EventColumn1`	The relevant endoscopy free text column describing the findings
`Procedure`	Column saying which procedure was performed
`Macroscopic`	Column describing all the macroscopic specimens
`Histology`	Column with free text histology (usually microscopic histology)

Value

This returns a character vector

Examples

# Myendo$EndoscopyEvent<-EndoscopyEvent(Myendo,"Findings",
# "ProcedurePerformed","MACROSCOPICALDESCRIPTION","HISTOLOGY")

See if words from two lists co-exist within a sentence

Description

See if words from two lists co-exist within a sentence. Eg site and tissue type. This function only looks in one sentence for the two terms. If you suspect the terms may occur in adjacent sentences then use the EntityPairs_TwoSentence function.

Usage

EntityPairs_OneSentence(inputText, list1, list2)

Arguments

`inputText`	The relevant pathology text column
`list1`	First list to refer to
`list2`	The second list to look for
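
The single-sentence co-occurrence check can be sketched as follows: split the report into sentences, then keep every pairing of a term from each list that appears in the same sentence. The helper name, the sentence splitter and the "term1:term2" output format are assumptions for illustration.

```r
# Hypothetical helper; splitting rule and output format are illustrative assumptions.
pairs_one_sentence_sketch <- function(report, list1, list2) {
  sentences <- strsplit(report, "\\.\\s*")[[1]]
  found <- character(0)
  for (s in sentences) {
    hits1 <- list1[vapply(list1, function(p) grepl(p, s, ignore.case = TRUE), logical(1))]
    hits2 <- list2[vapply(list2, function(p) grepl(p, s, ignore.case = TRUE), logical(1))]
    # Only record a pair when both lists have a term in this sentence.
    if (length(hits1) > 0 && length(hits2) > 0) {
      found <- c(found, as.vector(outer(hits1, hits2, paste, sep = ":")))
    }
  }
  found
}

pairs <- pairs_one_sentence_sketch(
  "Stomach biopsy shows gastritis. Duodenal mucosa is normal. Polyp seen.",
  list1 = c("stomach", "duodenal"),
  list2 = c("biopsy", "polyp")
)
```

Terms that appear in different sentences (here "duodenal" and "polyp") are not paired, which is the case the two-sentence variant is intended to cover.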

Examples

# tbb<-EntityPairs_OneSentence(Mypath$Histology,HistolType(),LocationList())

Look for relationships between site and event

Description

This is used to look for relationships between site and event, especially for endoscopy events described in sentences such as 'The stomach polyp was large. It was removed with a snare', ie where the therapy and the site are in two different sentences.

Usage

EntityPairs_TwoSentence(inputString, list1, list2)

Arguments

`inputString`	The relevant pathology text column
`list1`	The initial list to assess
`list2`	The other list to look for

Examples

# tbb<-EntityPairs_TwoSentence(Myendo$Findings,EventList(),HistolType())

Extract the Prague score

Description

Usage

Eosinophilics(dataframe, findings, histol, IndicationsFroExamination)

Arguments

`dataframe`	dataframe with column of interest
`findings`	column of interest
`histol`	second column of interest
`IndicationsFroExamination`	third column of interest (the indications for the examination)

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)
# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)


aa <- Eosinophilics(v, "Findings", "Histology","Indications")

Use list of endoscopic events and procedures

Description

This function returns all the conversions from common versions of events to a standardised event list, much like the Location standardisation function. This does not include EMR, as this is extracted from the pathology and so is part of pathology type.

Usage

EventList()

Examples

# unique(unlist(EventList(), use.names = FALSE))

Extract columns from the raw text

Description

This is the main extractor for the Endoscopy and Histology report. This relies on the user creating a list of words representing the subheadings. The list is then fed to the Extractor so that each word acts as the beginning and the end of the regex used to split the text. Whatever has been specified in the list is used as a column header. Column headers don't tolerate special characters like : or ? and / and can't start with a number, so these have to be dealt with in the text before processing.

Usage

Extractor(inputString, delim)

Arguments

`inputString`	the column to extract from
`delim`	the vector of words that will be used as the boundaries to extract against
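
Splitting a whole report on its subheadings can be sketched in base R as below: each delimiter marks where one field starts and the previous field ends. The helper name is hypothetical, and the sketch assumes that every delimiter is present exactly once and in the order given, which the package itself may not require.

```r
# Hypothetical helper. ASSUMES every delimiter occurs once, in the given order.
extract_sections_sketch <- function(report, delim) {
  # Position at which each delimiter starts in the report.
  starts <- vapply(
    delim,
    function(d) regexpr(d, report, fixed = TRUE)[1],
    integer(1),
    USE.NAMES = FALSE
  )
  # Each field ends just before the next delimiter (or at the end of the report).
  ends <- c(starts[-1] - 1L, nchar(report))
  values <- trimws(substring(report, starts + nchar(delim), ends))
  # Strip characters that are not tolerated in column headers.
  names(values) <- gsub("[^A-Za-z0-9]", "", delim)
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
}

report <- "Hospital Number: H123 Patient Name: Tom Hardy Histology: Normal mucosa"
fields <- extract_sections_sketch(
  report,
  c("Hospital Number:", "Patient Name:", "Histology:")
)
```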

Examples

# As column names can't start with a number, one of the dividing
# words has to be converted
# A list of dividing words (which will also act as column names)
# is then constructed
mywords<-c("Hospital Number","Patient Name:","DOB:","General Practitioner:",
"Date received:","Clinical Details:","Macroscopic description:",
"Histology:","Diagnosis:")
Mypath2<-Extractor(PathDataFrameFinal$PathReportWhole,mywords)

Extrapolate from Dictionary

Description

Provides term mapping and extraction in one. Standardises any term according to a mapping lexicon provided and then extracts the term. This differs from DictionaryInPlaceReplace in that it provides a new column with the extracted terms, as opposed to changing the text in place.

Usage

ExtrapolatefromDictionary(inputString, list)

Arguments

`inputString`	The text string to process
`list`	The list of words to iterate through

Examples

#Firstly we extract histology from the raw report
# The function then standardises the histology terms through a series of
# regular expressions and then extracts the type of tissue 
Mypath$Tissue<-suppressWarnings(
suppressMessages(
ExtrapolatefromDictionary(Mypath$Histology,HistolType()
)
)
)
rm(MypathExtraction)

Index of GI symptoms

Description

This function returns all the common GI symptoms. They are simply listed as is, without grouping or mapping. They have been derived from a manual list with synonyms derived from the UMLS Metathesaurus using the browser.

Usage

GISymptomsList()

Create GRS metrics by endoscopist (X-ref with pathology)

Description

This extracts the polyp types from the data (for colonoscopy and flexible sigmoidoscopy data) and outputs the adenoma, adenocarcinoma and hyperplastic detection rates by endoscopist, as well as the overall number of colonoscopies. This will be extended to other GRS outputs in the future.

Usage

GRS_Type_Assess_By_Unit(dataframe, ProcPerformed, Endo_Endoscopist, Dx, Histol)

Arguments

`dataframe`	The dataframe
`ProcPerformed`	The column containing the Procedure type performed
`Endo_Endoscopist`	column containing the Endoscopist name
`Dx`	The column with the Histological diagnosis
`Histol`	The column with the Histology text in it

Examples

nn <- GRS_Type_Assess_By_Unit(
  vColon, "ProcedurePerformed",
  "Endoscopist", "Diagnosis", "Original.y"
)

Determine the largest biopsy size from the histology report

Description

This extracts the biopsy size from the report. If there are multiple biopsies it will extract the overall size of each one (size is calculated usually in cubic mm from the three dimensions provided). This will result in row duplication.
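
As a rough illustration of the size calculation described above, the following base R sketch multiplies the three dimensions of each biopsy to give a volume in cubic mm. The helper name biopsy_volumes and the assumed "A x B x C mm" wording (whole numbers only) are illustrative assumptions; this is not the package's own implementation.

```r
# Hypothetical helper (not part of EndoMineR): volume in cubic mm for every
# "A x B x C mm" measurement found in a single macroscopic description.
biopsy_volumes <- function(text) {
  pattern <- "[0-9]+\\s*x\\s*[0-9]+\\s*x\\s*[0-9]+\\s*mm"
  # All measurements in the first (and only expected) string
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  vapply(hits, function(hit) {
    # Pull out the three numbers and multiply them together
    dims <- as.numeric(regmatches(hit, gregexpr("[0-9]+", hit))[[1]])
    prod(dims)
  }, numeric(1), USE.NAMES = FALSE)
}

macro <- "Two pieces of tissue measuring 3 x 2 x 1 mm and 2x2x2mm"
volumes <- biopsy_volumes(macro)
print(volumes)
```

Because one report can yield several volumes, flattening the result into one row per biopsy is what would lead to the row duplication mentioned above.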

Usage

HistolBxSize(MacroColumn)

Arguments

`MacroColumn`	Macdescrip

Details

This is usually from the Macroscopic description column.

Examples

rr <- HistolBxSize(Mypath$Macroscopicdescription)

Extract the number of biopsies taken from the histology report

Description

This extracts the number of biopsies taken from the pathology report. This is usually from the Macroscopic description column. It collects everything from the regex [0-9]{1,2}.{0,3} up to the string boundary given by regString.
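
The matching idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the helper name count_biopsies is hypothetical, the pattern is the one quoted in the description with the keyword appended, and the counts found in one report are simply summed. The package function may behave differently.

```r
# Hypothetical helper (not part of EndoMineR): sum the numbers that appear
# up to three characters before a keyword such as "specimen".
count_biopsies <- function(text, keyword) {
  pattern <- paste0("[0-9]{1,2}.{0,3}", keyword)
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  vapply(hits, function(h) {
    if (length(h) == 0) return(NA_real_)  # no count found in this report
    sum(as.numeric(sub("^([0-9]{1,2}).*$", "\\1", h)))
  }, numeric(1))
}

macro <- c(
  "3 specimens collected from the oesophagus",
  "2 specimen received. 4 specimen received",
  "No tissue"
)
counts <- count_biopsies(macro, "specimen")
print(counts)
```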

Usage

HistolNumbOfBx(inputString, regString)

Arguments

`inputString`	The input text to process
`regString`	The keyword to remove and to stop at in the regex

Examples

qq <- HistolNumbOfBx(Mypath$Macroscopicdescription, "specimen")

Use list of pathology types

Description

This standardizes terms to describe the pathology tissue type being examined

Usage

HistolType()

Extract the site a specimen was removed from as well as the type

Description

This extracts the site a specimen was removed from together with the type of specimen. It is used in the OPCS4 coding.

Usage

HistolTypeAndSite(inputString1, inputString2, procedureString)

Arguments

`inputString1`	The first column to look in
`inputString2`	The second column to look in
`procedureString`	The column with the procedure in it

Value

A list with two columns: one is the type and site, and the other is the index to be used for OPCS4 coding later if needed.

Examples

Myendo2<-Endomerge2(Myendo,'Dateofprocedure','HospitalNumber',
Mypath,'Dateofprocedure','HospitalNumber')
PathSiteAndType <- HistolTypeAndSite(Myendo2$PathReportWhole,
Myendo2$Macroscopicdescription, Myendo2$ProcedurePerformed)

Number of tests done per month and year by indication

Description

Get an overall idea of how many endoscopies have been done for an indication by year and month. This is a more involved version of the SurveilCapacity function. It takes a string to search for in the indication for the test.

Usage

HowManyOverTime(dataframe, Indication, Endo_ResultPerformed, StringToSearch)

Arguments

`dataframe`	dataframe
`Indication`	Indication column
`Endo_ResultPerformed`	column containing date the Endoscopy was performed
`StringToSearch`	The string in the Indication to search for

Details

This returns a list which contains a plot (number of tests for that indication over time) and a table with the same information broken down by month and year.
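
The tabulation part can be approximated in base R as shown below. This is only a sketch: count_by_month and the demo data frame are hypothetical, no plot is produced, and the package function may group or label the output differently.

```r
# Hypothetical helper (not part of EndoMineR): number of tests per year-month
# whose indication matches a regular expression.
count_by_month <- function(df, indication_col, date_col, pattern) {
  hits <- df[grepl(pattern, df[[indication_col]]), , drop = FALSE]
  year_month <- format(as.Date(hits[[date_col]]), "%Y-%m")
  counts <- table(year_month)
  data.frame(
    year_month = names(counts),
    n = as.vector(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up demo data, only for illustration
demo <- data.frame(
  Indications = c("Surveillance Barrett's", "Dysphagia",
                  "Surveillance", "Surveillance"),
  Dateofprocedure = as.Date(c("2016-01-05", "2016-01-20",
                              "2016-01-28", "2016-03-02")),
  stringsAsFactors = FALSE
)
surv_counts <- count_by_month(demo, "Indications", "Dateofprocedure", "Surv")
print(surv_counts)
```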

Examples

# This takes the dataframe Myendo (part of the package examples) and looks in
# the column which holds the test indication (in this example it is called
# 'Indications'). The date of the procedure column (which can be date format or
# POSIX format) is also necessary. Finally the string to search for in the
# indication needs to be input. To find all endoscopies done where the
# indication is surveillance, searching on 'Surv' will do fine.
# Here we want all the tests, so '.*' is used instead of 'Surv'.
rm(list = ls(all = TRUE))
ff <- HowManyOverTime(Myendo, "Indications", "Dateofprocedure", ".*")

Extract IBD scores from the medical text

Description

This extracts all of the relevant IBD scores where present from the medical text.

Usage

IBD_Scores(inputColumn1)

Arguments

`inputColumn1`	column of interest as a string vector

Value

This returns a dataframe with all the scores in it

Examples

 # Example to be provided

Extract from report, using words from a list

Description

The aim here is simply to produce a document term matrix to get the frequency of all the words, then extract the words you are interested in (supplied in myNotableWords), find which reports contain those words, and finally work out what proportion of the reports contain those terms.
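
The final step, the proportion of reports containing each term, can be illustrated without a document term matrix. The sketch below swaps in plain pattern matching with grepl; term_proportions and the demo reports are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of EndoMineR): proportion of reports that
# contain each term, using simple case-insensitive pattern matching.
term_proportions <- function(reports, terms) {
  vapply(terms, function(term) {
    mean(grepl(term, reports, ignore.case = TRUE))
  }, numeric(1))
}

# Made-up demo reports, only for illustration
reports <- c(
  "Barrett's oesophagus seen",
  "Normal",
  "Coeliac disease likely",
  "Long segment barrett's"
)
props <- term_proportions(reports, c("arrett", "oeliac"))
print(props)
```

Searching on a fragment such as "arrett" rather than the full word is the same trick used in the package example, so that differences in capitalisation of the first letter do not matter.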

Usage

ListLookup(theframe, EndoReportColumn, myNotableWords)

Arguments

`theframe`	the dataframe
`EndoReportColumn`	the column of interest
`myNotableWords`	list of words you are interested in

Examples

# The function relies on defining a list of
# words you are interested in and then choosing the column you want
# to search for these words. This can be a histopathology
# free text column or an endoscopic one. In this example it is an endoscopic
# column
myNotableWords <- c("arrett", "oeliac")
jj <- ListLookup(Myendo, "Findings", myNotableWords)

Use list of upper and lower GI standard locations

Description

This is a list of standard locations at endoscopy. It is used for the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationList()

Use list of standard locations for lower GI endoscopy

Description

This is a list of standard locations at endoscopy that is used in the extraction of the site of biopsies/EMRs and potentially in functions looking at the site of a therapeutic event. It just returns the list in the function.

Usage

LocationListLower()

Use list of standard locations for upper GI endoscopy

Description

Usage

LocationListUniversal()

Use list of standard locations for upper GI endoscopy

Description

Usage

LocationListUpper()

Plot a metric by endoscopist

Description

This takes any of the numerical metrics in the dataset and plots it by endoscopist. It of course relies on an Endoscopist column being present.
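
The underlying grouping can be sketched with base R's aggregate. The demo data frame and its values are made up for illustration, and the mean is used here as an assumed summary; the package function may summarise and plot the metric differently.

```r
# Made-up demo data: fentanyl dose recorded per procedure
demo <- data.frame(
  Endoscopist = c("Dr A", "Dr A", "Dr B"),
  Fent = c(50, 100, 25),
  stringsAsFactors = FALSE
)

# Mean of the numeric metric for each endoscopist
avg <- aggregate(Fent ~ Endoscopist, data = demo, FUN = mean)
print(avg)
```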

Usage

MetricByEndoscopist(dataframe, Column, EndoscopistColumn)

Arguments

`dataframe`	The dataframe
`Column`	The column (numeric data) of interest
`EndoscopistColumn`	The endoscopist column

Examples

# The function gives a table with any numeric
# metric by endoscopist.
# In this example we tabulate medication by
# endoscopist.
# Let's bind the output of EndoscMeds to the main dataframe so we
# have a complete dataframe with all the meds extracted
MyendoNew<-cbind(EndoscMeds(Myendo$Medications),Myendo)

# Now let's look at the fentanyl use per Endoscopist:
kk<-MetricByEndoscopist(MyendoNew,'Endoscopist','Fent')
# EndoBasicGraph(MyendoNew, "Endoscopist", "Fent") # run this
# if you want to see the graph
rm(Myendo)

Fake Endoscopies

Description

A dataset containing fake endoscopy reports. The report fields have already been extracted. They are derived from the whole report as follows:

Myendo<-TheOGDReportFinal
Myendo$OGDReportWhole<-gsub('2nd Endoscopist:','Second endoscopist:',Myendo$OGDReportWhole)
EndoscTree<-list('Hospital Number:','Patient Name:','General Practitioner:',
'Date of procedure:','Endoscopist:','Second endoscopist:','Medications',
'Instrument','Extent of Exam:','Indications:','Procedure Performed:','Findings:',
'Endoscopic Diagnosis:')
for(i in 1:(length(EndoscTree)-1)) {
Myendo<-Extractor(Myendo,'OGDReportWhole',as.character(EndoscTree[i]),
as.character(EndoscTree[i+1]),as.character(EndoscTree[i]))
}
Myendo$Dateofprocedure<-as.Date(Myendo$Dateofprocedure)

Usage

Myendo

Format

A data frame with 2000 rows and 13 variables:

OGDReportWhole: The whole report, in text
HospitalNumber: Hospital Number, in text
PatientName: Patient Name, in text
GeneralPractitioner: General Practitioner, in text
Dateofprocedure: Date of the procedure, as date
Endoscopist: Endoscopist, in text
Secondendoscopist: Second endoscopist, in text
Medications: Medications, in text
Instrument: Instrument, in text
ExtentofExam: Extent of exam, in text
Indications: Indications, in text
ProcedurePerformed: Procedure Performed, in text
Findings: Endoscopic findings, in text

Clean html endoscopic images

Description

This is used to pick and clean endoscopic images from html exports so they can be prepared before being linked to pathology and endoscopy reports

Usage

MyImgLibrary(file, delim, location)

Arguments

`file`	The html report to extract (the html will have all the image references in it)
`delim`	The phrase that separates individual endoscopies
`location`	The folder containing the actual images

Examples

# MyImgLibrary("~/Images Captured with Proc Data Audit_Findings1.html",
#                         "procedureperformed","~/")

Fake Pathology report

Description

A dataset containing fake pathology reports. The report fields are derived from the whole report as follows:

Mypath<-PathDataFrameFinalColon
HistolTree<-list('Hospital Number','Patient Name','DOB:','General Practitioner:',
'Date of procedure:','Clinical Details:','Macroscopic description:','Histology:','Diagnosis:','')
for(i in 1:(length(HistolTree)-1)) {
Mypath<-Extractor(Mypath,'PathReportWhole',as.character(HistolTree[i]),
as.character(HistolTree[i+1]),as.character(HistolTree[i]))
}
Mypath$Dateofprocedure<-as.Date(Mypath$Dateofprocedure)

Usage

Mypath

Format

A data frame with 2000 rows and 10 variables:

PathReportWhole: The whole report, in text
HospitalNumber: Hospital Number, in text
PatientName: Patient Name, in text
DOB: Date of Birth, in text
GeneralPractitioner: General Practitioner, in text
Dateofprocedure: Date of the procedure, as date
ClinicalDetails: Clinical Details, in text
Macroscopicdescription: Macroscopic description of the report, in text
Histology: Histology, in text
Diagnosis: Diagnosis, in text

Remove negative and normal sentences

Description

Extraction of the negative sentences so that normal findings can be removed and not counted when searching for true diseases, e.g. remove 'No evidence of candidal infection' so it doesn't get included when looking for candidal infections. It is used by default as part of the textPrep function but can be turned off as an optional parameter.
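
The general idea can be sketched in base R as follows. This is a deliberately simplified illustration: remove_negatives is a hypothetical helper that drops whole sentences containing "no", "not" or "without", whereas the package function uses its own, more extensive set of patterns and can also remove parts of sentences.

```r
# Hypothetical helper (not part of EndoMineR): drop every sentence that
# contains a simple negation word.
remove_negatives <- function(text) {
  negative <- "\\b(no|not|without)\\b"
  vapply(text, function(report) {
    # Split on whitespace that follows sentence-ending punctuation
    sentences <- unlist(strsplit(report, "(?<=[.?!])\\s+", perl = TRUE))
    kept <- sentences[!grepl(negative, sentences,
                             ignore.case = TRUE, perl = TRUE)]
    paste(kept, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

reports <- c(
  "There is no evidence of polyp here. The mucosa was erythematous.",
  "The C0M7 stretch of Barrett's was flat."
)
cleaned <- remove_negatives(reports)
print(cleaned)
```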

Usage

NegativeRemove(inputText)

Arguments

`inputText`	column of interest

Value

This returns a column within a dataframe. This should be changed to a character vector eventually.

Examples

# Build a character vector and then
# incorporate into a dataframe
anexample<-c("There is no evidence of polyp here",
"Although the prep was poor,there was no adenoma found",
"The colon was basically inflammed, but no polyp was seen",
"The Barrett's segment was not biopsied",
"The C0M7 stretch of Barrett's was flat")
anexample<-data.frame(anexample)
names(anexample)<-"Thecol"
# Run the function on the dataframe and it should get rid of sentences (and
# parts of sentences) with negative parts in them.
hh<-NegativeRemove(anexample$Thecol)

Wrapper for Negative Remove

Description

This performs negative removal on a per sentence basis.

Usage

NegativeRemoveWrapper(inputText)

Arguments

`inputText`	the text to remove negatives from

Value

This returns a column within a dataframe. This should be changed to a character vector eventually

Examples

# Build a character vector and then
# incorporate into a dataframe
anexample<-c("There is no evidence of polyp here",
"Although the prep was poor,there was no adenoma found",
"The colon was basically inflammed, but no polyp was seen",
"The Barrett's segment was not biopsied",
"The C0M7 stretch of Barrett's was flat")
anexample<-data.frame(anexample)
names(anexample)<-"Thecol"
# Run the function on the dataframe and it should get rid of sentences (and
# parts of sentences) with negative parts in them.
#hh<-NegativeRemoveWrapper(anexample$Thecol)

Fake Upper GI Pathology Set

Description

A dataset containing fake pathology reports for upper GI endoscopy tissue specimens. The report field is provided as a whole report without any fields having been already extracted

Usage

PathDataFrameFinal

Format

A data frame with 2000 rows and 1 variable:

PathReportWhole: The whole report, in text

Fake Lower GI Pathology Set

Description

A dataset containing fake pathology reports for lower GI endoscopy tissue specimens. The report field is provided as a whole report without any fields having been already extracted

Usage

PathDataFrameFinalColon

Format

A data frame with 2000 rows and 1 variable:

PathReportWhole: The whole report, in text

Create a Circos plot for patient flow

Description

This allows us to look at the overall flow from one type of procedure to another using circos plots. A good example of its use might be to see how patients move from one state (e.g. having an EMR) to another state (e.g. undergoing RFA).

Usage

PatientFlow_CircosPlots(
  dataframe,
  Endo_ResultPerformed,
  HospNum_Id,
  ProcPerformed
)

Arguments

`dataframe`	dataframe
`Endo_ResultPerformed`	the column containing the date of the procedure
`HospNum_Id`	Column with the patient's unique hospital number
`ProcPerformed`	The procedure that you want to plot (e.g. EMR, radiofrequency ablation for Barrett's, but it can be any description of a procedure you desire)

Examples

# This function builds a circos plot which gives a more aggregated
# overview of how patients flow from one state to another than the
# SurveySankey function
# Build a list of procedures
Event <- list(
  x1 = "Therapeutic- Dilatation",
  x2 = "Other-", x3 = "Surveillance",
  x4 = "APC", x5 = "Therapeutic- RFA TTS",
  x6 = "Therapeutic- RFA 90",
  x7 = "Therapeutic- EMR", x8 = "Therapeutic- RFA 360"
)
EndoEvent <- replicate(2000, sample(Event, 1, replace = FALSE))
# Merge the list with the Myendo dataframe
fff <- unlist(EndoEvent)
fff <- data.frame(fff)
names(fff) <- "col1"
Myendo$EndoEvent<-fff$col1
names(Myendo)[names(Myendo) == "HospitalNumber"] <- "PatientID"
names(Myendo)[names(Myendo) == "fff$col1"] <- "EndoEvent"
# Myendo$EndoEvent<-as.character(Myendo$EndoEvent)
# Run the function using the procedure information (the date of the
# procedure, the Event type and the individual patient IDs)
hh <- PatientFlow_CircosPlots(Myendo, "Dateofprocedure", "PatientID", "EndoEvent")
rm(Myendo)
rm(EndoEvent)

Create a plot over time of patient categorical findings as a line chart

Description

This plots the findings at endoscopy (or pathology) over time for individual patients. An example might be with worst pathological grade on biopsy for Barrett's oesophagus over time

Usage

PatientFlowIndividual(
  theframe,
  EndoReportColumn,
  myNotableWords,
  DateofProcedure,
  PatientID
)

Arguments

`theframe`	dataframe
`EndoReportColumn`	the column containing the categorical findings to plot (for example the worst pathological grade)
`myNotableWords`	The terms from a column with categorical variables
`DateofProcedure`	Column with the date of the procedure
`PatientID`	Column with the patient's unique identifier

Examples

# This function builds a chart of categorical outcomes for individual patients over time.
# It allows a two dimensional visualisation of patient progress. A perfect example is
# visualising the Barrett's progression for patients on surveillance and then
# therapy if dysplasia develops, highlighting recurrence if it happens
# Barretts_df <- BarrettsAll(Myendo, "Findings", "OGDReportWhole", Mypath, "Histology")
# myNotableWords<-c("No_IM","IM","LGD","HGD","T1a","IGD","SM1","SM2")
# PatientFlowIndividual(Barretts_df,"IMorNoIM",myNotableWords,DateofProcedure,"HospitalNumber")
# Once the function is run you should always call dev.off()

Use list of catheters used in radiofrequency ablation

Description

This returns a list of catheters used in radiofrequency ablation.

Usage

RFACath()

Create a basic consort diagram from dataframes

Description

This function creates a consort diagram using diagrammeR by assessing all of the dataframes in your script and populating each box in the consort diagram with the number of rows in each dataframe as well as how the dataframes are linked together. The user just provides a pathname for the script

Usage

sanity(pathName)

Arguments

`pathName`	The path to the R script to assess

Examples

#pathName<-paste0(here::here(),"/inst/TemplateProject/munge/PreProcessing.R")
#sanity(pathName)
# This creates a consort diagram from any R script (not Rmd). It
# basically tells you how all the dataframes are related and how many
# rows each dataframe has so you can see if any data has been lost
# on the way.

Set the colour theme for all the ggplots

Description

This standardises the colours for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

scale_colour_Publication()

Examples

# None needed

Set the fills for all the ggplots

Description

This standardises the fills for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

scale_fill_Publication()

Examples

# None needed

Find and Replace

Description

This is a helper function for finding and replacing from lexicons like the event list. The lexicons are all named lists where the name is the text to replace and the value is what it should be replaced with. It uses fuzzy find and replace to account for spelling errors.
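
One way to picture fuzzy replacement from a named lexicon is the word-level sketch below, which uses edit distance from utils::adist. The helper fuzzy_replace, its max_dist parameter and the demo lexicon are all hypothetical; the package function works on patterns rather than whole words and may use a different matching method.

```r
# Hypothetical helper (not part of EndoMineR): replace each word that is
# within max_dist edits of a lexicon name with the corresponding value.
fuzzy_replace <- function(x, lexicon, max_dist = 1) {
  vapply(x, function(sentence) {
    words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
    # Edit distance between every word and every lexicon name
    d <- utils::adist(tolower(words), tolower(names(lexicon)))
    for (i in seq_along(words)) {
      best <- which.min(d[i, ])
      if (d[i, best] <= max_dist) words[i] <- lexicon[[best]]
    }
    paste(words, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up lexicon: name = text to replace, value = replacement
lexicon <- list(gastroscopy = "OGD", colonoscopy = "colonoscopy")
fixed_text <- fuzzy_replace("A gastroscpy was performed", lexicon)
print(fixed_text)
```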

Usage

spellCheck(pattern, replacement, x, fixed = FALSE)

Arguments

`pattern`	the pattern to look for
`replacement`	the text to replace the pattern with
`x`	the target string
`fixed`	whether the pattern should be matched as a fixed string rather than as a regular expression. Defaults to FALSE.

Value

This returns a character vector

Examples

L <- tolower(stringr::str_split(HistolType(),"\\|"))

Extracts the first test only per patient

Description

Extracts the first test only per patient and returns a new dataframe listing the patientID and the first test done
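
The selection logic can be sketched in base R by sorting on patient and date and keeping the first row per patient. The helper first_test and the demo data are hypothetical and only illustrate the idea; they are not the package implementation.

```r
# Hypothetical helper (not part of EndoMineR): earliest test per patient.
first_test <- function(df, id_col, date_col) {
  ordered <- df[order(df[[id_col]], df[[date_col]]), , drop = FALSE]
  # After sorting, the first row seen for each patient is the earliest test
  ordered[!duplicated(ordered[[id_col]]), , drop = FALSE]
}

# Made-up demo data, only for illustration
demo <- data.frame(
  HospitalNumber = c("P1", "P2", "P1", "P2"),
  Dateofprocedure = as.Date(c("2016-03-01", "2015-06-10",
                              "2014-01-15", "2017-02-02")),
  stringsAsFactors = FALSE
)
first <- first_test(demo, "HospitalNumber", "Dateofprocedure")
print(first)
```

Sorting the dates in decreasing order instead would give the last test per patient.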

Usage

SurveilFirstTest(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

`dataframe`	dataframe
`HospNum_Id`	Patient ID
`Endo_ResultPerformed`	Date of the Endoscopy

Examples

dd <- SurveilFirstTest(
  Myendo, "HospitalNumber",
  "Dateofprocedure"
)

Extract the last test done by a patient only

Description

This extracts the last test only per patient and returns a new dataframe listing the patientID and the last test done

Usage

SurveilLastTest(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

`dataframe`	dataframe
`HospNum_Id`	Patient ID
`Endo_ResultPerformed`	Date of the Endoscopy

Examples

cc <- SurveilLastTest(Myendo, "HospitalNumber", "Dateofprocedure")

Extract the time difference between each test in days

Description

This determines the time difference between each test for a patient in days. It returns the time since the first and the last study as a new dataframe.
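
A minimal base R sketch of the per-patient calculation is given below. The helper days_between_tests, the output column names DaysSincePrevious and DaysSinceFirst, and the demo data are assumptions made for illustration; the package function's output columns may be named and computed differently.

```r
# Hypothetical helper (not part of EndoMineR): days since the previous test
# and since the first test, calculated within each patient.
days_between_tests <- function(df, id_col, date_col) {
  df <- df[order(df[[id_col]], df[[date_col]]), , drop = FALSE]
  dates <- as.numeric(as.Date(df[[date_col]]))
  df$DaysSincePrevious <- ave(dates, df[[id_col]],
                              FUN = function(d) c(NA_real_, diff(d)))
  df$DaysSinceFirst <- ave(dates, df[[id_col]],
                           FUN = function(d) d - d[1])
  df
}

# Made-up demo data, only for illustration
demo <- data.frame(
  HospitalNumber = c("P1", "P1", "P1", "P2"),
  Dateofprocedure = as.Date(c("2016-01-01", "2016-01-11",
                              "2016-02-10", "2016-03-01")),
  stringsAsFactors = FALSE
)
gaps <- days_between_tests(demo, "HospitalNumber", "Dateofprocedure")
print(gaps)
```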

Usage

SurveilTimeByRow(dataframe, HospNum_Id, Endo_ResultPerformed)

Arguments

`dataframe`	dataframe
`HospNum_Id`	Patient ID
`Endo_ResultPerformed`	Date of the Endoscopy

Examples

aa <- SurveilTimeByRow(
  Myendo, "HospitalNumber",
  "Dateofprocedure"
)

Create a Sankey plot for patient flow

Description

The purpose of the function is to provide a Sankey plot which allows the analyst to see the proportion of patients moving from one state (in this case type of Procedure) to another. This allows us to see for example how many EMRs are done after RFA.

Usage

SurveySankey(dfw, ProcPerformedColumn, PatientID)

Arguments

`dfw`	the dataframe extracted using the standard cleanup scripts
`ProcPerformedColumn`	the column containing the test, like ProcPerformed for example
`PatientID`	the column containing the patient's unique identifier, e.g. hospital number

Examples

names(Myendo)[names(Myendo) == "HospitalNumber"] <- "PatientID"
gg <- SurveySankey(Myendo, "ProcedurePerformed", "PatientID")

Combine all the text cleaning and extraction functions into one

Description

This function prepares the data by cleaning punctuation, checking spelling against the lexicons, mapping terms according to the lexicons and lower casing everything. It contains several of the other functions in the package for ease of use.

Usage

textPrep(inputText, delim)

Arguments

`inputText`	The relevant pathology text columns
`delim`	the delimiters so the extractor can be used

Value

This returns a string vector.

Examples

mywords<-c("Hospital Number","Patient Name:","DOB:","General Practitioner:",
"Date received:","Clinical Details:","Macroscopic description:",
"Histology:","Diagnosis:")
CleanResults<-textPrep(PathDataFrameFinal$PathReportWhole,mywords)

Set the publication theme for all the ggplots

Description

This standardises the theme for any ggplot plot produced. If you do use it, like all ggplots it can be extended using the "+" to add whatever else is necessary

Usage

theme_Publication(base_size = 14, base_family = "Helvetica")

Arguments

`base_size`	the base size
`base_family`	the base family

Examples

# None needed

Fake Upper GI Endoscopy Set

Description

A dataset containing fake endoscopy reports. The report field is provided as a whole report without any fields having been already extracted

Usage

TheOGDReportFinal

Format

A data frame with 2000 rows and 1 variable:

OGDReportWhole: The whole report, in text

Extract the time to an event

Description

This function selects patients who have had a start event and an end event of the user's choosing so you can determine things like how long it takes to get a certain outcome. For example, how long does it take to get a patient into a fully squamous oesophagus after Barrett's ablation for dysplasia?

Usage

TimeToStatus(dataframe, HospNum, EVENT, indicatorEvent, endEvent)

Arguments

`dataframe`	The dataframe
`HospNum`	The Hospital Number column
`EVENT`	The column that contains the outcome of choice
`indicatorEvent`	The name of the start event (can be a regular expression)
`endEvent`	The name of the endpoint (can be a regular expression)

Examples

# Firstly relevant columns are extrapolated from the
# Mypath demo dataset. These functions are all part of Histology data
# cleaning as part of the package.
v <- Mypath
v$NumBx <- HistolNumbOfBx(v$Macroscopicdescription, "specimen")
v$BxSize <- HistolBxSize(v$Macroscopicdescription)

# The histology is then merged with the Endoscopy dataset. The merge occurs
# according to date and Hospital number
v <- Endomerge2(
  Myendo, "Dateofprocedure", "HospitalNumber", v, "Dateofprocedure",
  "HospitalNumber"
)

# The function relies on the other Barrett's functions being run as well:
b1 <- Barretts_PragueScore(v, "Findings")
b1$IMorNoIM <- Barretts_PathStage(b1, "Histology")
colnames(b1)[colnames(b1) == "pHospitalNum"] <- "HospitalNumber"

# The function groups the procedures by patient and gives
# all the procedures between
# the indicatorEvent and the procedure just after the endpoint.
# Eg if the start is RFA and the
# endpoint is biopsies then it will give all RFA procedures and
# the first biopsy procedure

b1$EndoscopyEvent <- EndoscopyEvent(
  b1, "Findings", "ProcedurePerformed",
  "Macroscopicdescription", "Histology"
)
nn <- TimeToStatus(b1, "eHospitalNum", "EndoscopyEvent", "rfa", "dilat")
rm(v)

Fake Lower GI Endoscopy Set including Pathology

Description

A dataset containing fake lower GI endoscopy reports and pathology reports all pre-extracted

Usage

vColon

Format

A data frame with 2000 rows and 26 variables:

pHospitalNum: The HospitalNum, in text
PatientName.x: The PatientName, in text
GeneralPractitioner.x: The GeneralPractitioner report, in text
Date.x: The Date, in date
Endoscopist: The Endoscopist report, in text
Secondendoscopist: The Secondendoscopist report, in text
Medications: The Medications report, in text
Instrument: The Instrument report, in text
ExtentofExam: The ExtentofExam report, in text
Indications: The Indications report, in text
ProcedurePerformed: The ProcedurePerformed report, in text
Findings: The Findings report, in text
EndoscopicDiagnosis: The EndoscopicDiagnosis report, in text
Original.x: The Original endoscopy report, in text
eHospitalNum: The HospitalNum, in text
PatientName.y: The PatientName, in text
DOB: The DOB, in date
GeneralPractitioner.y: The GeneralPractitioner report, in text
Date.y: The Date.y, in date
ClinicalDetails: The ClinicalDetails report, in text
Natureofspecimen: The Natureofspecimen report, in text
Macroscopicdescription: The Macroscopicdescription report, in text
Histology: The Histology report, in text
Diagnosis: The Diagnosis report, in text
Original.y: The whole report, in text
Days: Days, in numbers

Convert words to numbers, especially for the histopathology text

Description

This function converts words to numbers.

Usage

WordsToNumbers()
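
The idea of converting number words to digits in free text can be sketched as follows. This is an illustrative base R example only: the lookup `number_words` and the helper `words_to_digits` are hypothetical names, and nothing here is taken from the body of WordsToNumbers itself. It assumes a small fixed vocabulary and whole-word, case-insensitive matching.

```r
# Hypothetical lookup of number words to digits (not the package's own table)
number_words <- c(
  one = "1", two = "2", three = "3", four = "4", five = "5",
  six = "6", seven = "7", eight = "8", nine = "9", ten = "10"
)

# Hypothetical helper: replace whole number words with digits, ignoring case.
# The \\b word boundaries stop "one" from matching inside "Someone".
words_to_digits <- function(text, lookup = number_words) {
  for (w in names(lookup)) {
    pattern <- paste0("\\b", w, "\\b")
    text <- gsub(pattern, lookup[[w]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

words_to_digits("Three specimens received, the largest measuring 2mm")
# "3 specimens received, the largest measuring 2mm"
```

Normalising counts in this way makes a macroscopic description easier to parse with a single numeric pattern, for example when counting specimens.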

Package 'EndoMineR'

Help Index

Determine the Follow up group

Description

Usage

Arguments

See Also

Examples

Get the worst pathological stage for Barrett's

Description

Usage

Arguments

See Also

Examples

Extract the Prague score

Description

Usage

Arguments

See Also

Examples

Run all the basic Barrett's functions

Description

Usage

Arguments

Value

See Also

Examples

Get the number of Barrett's biopsies taken

Description

Usage

Arguments

See Also

Examples

Run the Paris classification versus worst histopath grade for Barrett's

Description

Usage

Arguments

Value

See Also

Examples

Index biopsy locations

Description

Usage

See Also

Group anything by Endoscopist and returns the table

Description

Usage

Arguments

See Also

Examples

Fake Lower GI Endoscopy Set

Description

Usage

Format

Tidy up messy columns

Description

Usage

Arguments

Value

See Also

Examples

OPCS-4 Coding

Description

Usage

Arguments

Examples

Dictionary In Place Replace

Description

Usage

Arguments

Value

See Also

Examples

Basic graph creation using the template specified in theme_Publication.

Description

Usage

Arguments

Value

See Also

Examples