Title: | Highlight Conserved Edits Across Versions of a Document |
---|---|
Description: | Input multiple versions of a source document, and receive HTML code for a highlighted version of the source document indicating the frequency of occurrence of phrases in the different versions. This method is described in Chapter 3 of Rogers (2024) <https://digitalcommons.unl.edu/dissertations/AAI31240449/>. |
Authors: | Center for Statistics and Applications in Forensic Evidence [aut, cph, fnd], Rachel Rogers [aut, cre], Susan VanderPlas [aut] |
Maintainer: | Rachel Rogers <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2.9000 |
Built: | 2024-11-16 06:22:27 UTC |
Source: | https://github.com/rachelesrogers/highlightr |
This function provides the frequency of collocations in comments that correspond to the provided transcript.
collocate_comments(transcript_token, note_token, collocate_length = 5)
transcript_token |
transcript token to act as baseline for notes, resulting from |
note_token |
tokenized document of notes, resulting from |
collocate_length |
the length of the collocation. Default is 5 |
data frame of the transcript and corresponding note frequency
# Rename the notes column to the name expected by token_comments()
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
# Tokenize the first 100 participant comments
toks_comment <- token_comments(comment_example_rename[1:100, ])
# Rename the transcript column to the name expected by token_transcript()
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
# Count collocations shared between the transcript and the comments
collocation_object <- collocate_comments(toks_transcript, toks_comment)
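To make the idea of a collocation frequency concrete, the sketch below counts how often each five-word window of a transcript appears verbatim in a set of notes. It uses only base R on made-up text. The helper names `simple_tokens`, `make_ngrams`, and `count_collocations` are hypothetical and not part of the package, whose internal implementation may differ.

```r
# Hypothetical helper: split text into lowercase word tokens.
simple_tokens <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9' ]", " ", x)   # replace punctuation with spaces
  words <- strsplit(x, " +")[[1]]
  words[nzchar(words)]               # drop empty strings from leading spaces
}

# Hypothetical helper: every run of n consecutive tokens, pasted together.
make_ngrams <- function(tokens, n = 5) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# For every transcript n-gram, count its exact occurrences across all notes.
count_collocations <- function(transcript, notes, n = 5) {
  transcript_grams <- make_ngrams(simple_tokens(transcript), n)
  note_grams <- unlist(lapply(notes, function(z) make_ngrams(simple_tokens(z), n)))
  data.frame(
    collocation = transcript_grams,
    frequency = vapply(transcript_grams, function(g) sum(note_grams == g),
                       integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Made-up transcript and notes for illustration only.
transcript <- "The examiner compared the bullet to the test fire from the gun."
notes <- c("examiner compared the bullet to the test fire",
           "He compared the bullet to the gun")
result <- count_collocations(transcript, notes)
```

In this toy input the window "compared the bullet to the" appears in both notes, so it receives the highest count, while windows that no participant copied receive zero.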
This function provides the frequency of collocations in comments that correspond to the provided transcript, using fuzzy matching.
collocate_comments_fuzzy(transcript_token, note_token, collocate_length = 5)
transcript_token |
transcript token to act as baseline for notes, resulting from |
note_token |
tokenized document of notes, resulting from |
collocate_length |
the length of the collocation. Default is 5 |
data frame of the transcript and corresponding note frequency
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
toks_comment <- token_comments(comment_example_rename)
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
# Same inputs as collocate_comments(), but near matches are also counted
collocation_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
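The difference between exact and fuzzy matching can be illustrated with edit distance. The sketch below is an assumption about what "fuzzy" could mean, not the package's documented method: it treats a note phrase as a match when its Levenshtein distance to a transcript phrase (computed with base R's `utils::adist()`) is at most `max_dist`. The helper `fuzzy_phrase_counts` and the threshold of 2 are hypothetical.

```r
# Hypothetical helper: count note phrases within `max_dist` edits of each
# transcript phrase, alongside the exact-match count for comparison.
fuzzy_phrase_counts <- function(transcript_phrases, note_phrases, max_dist = 2) {
  # Rows index transcript phrases, columns index note phrases.
  distances <- utils::adist(transcript_phrases, note_phrases)
  data.frame(
    collocation = transcript_phrases,
    exact = as.integer(rowSums(distances == 0)),
    fuzzy = as.integer(rowSums(distances <= max_dist)),
    stringsAsFactors = FALSE
  )
}

# Made-up phrases for illustration only.
transcript_phrases <- c("compared the bullet to the", "the bullet to the test")
note_phrases <- c("compared the bulet to the",           # one letter dropped
                  "compared the bullet to the",          # exact copy
                  "a completely unrelated phrase here")
counts <- fuzzy_phrase_counts(transcript_phrases, note_phrases)
```

Here the misspelled note phrase is missed by exact matching but picked up by the fuzzy count, which is the kind of transcription error fuzzy matching is meant to tolerate.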
This assigns colors based on frequency to the words in the transcript.
collocation_plot(frequency_doc, n_scenario = 1, colors = c("#f251fc", "#f8ff1b"))
frequency_doc |
document of frequencies (returned from |
n_scenario |
number of scenarios in which this transcript appeared. Default is 1 |
colors |
character vector of colors specifying the gradient. Default is c("#f251fc", "#f8ff1b") |
list of plot, plot object, and frequency
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
toks_comment <- token_comments(comment_example_rename)
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
collocation_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
# Attach the collocation frequencies to the words of the transcript
merged_frequency <- transcript_frequency(transcript_example_rename, collocation_object)
# Assign gradient colors to the words by frequency
freq_plot <- collocation_plot(merged_frequency)
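As a rough illustration of how frequencies can be turned into gradient colors, the sketch below rescales a numeric vector to the unit interval and interpolates between the two default colors with `grDevices::colorRamp()`. The helper `frequency_to_hex` is hypothetical, and the linear RGB interpolation is an assumption rather than a description of what `collocation_plot()` does internally.

```r
# Hypothetical helper: map frequencies onto a two-color gradient.
frequency_to_hex <- function(freq, colors = c("#f251fc", "#f8ff1b")) {
  span <- max(freq) - min(freq)
  # Rescale to [0, 1]; if every frequency is equal, use the low end.
  scaled <- if (span == 0) rep(0, length(freq)) else (freq - min(freq)) / span
  ramp <- grDevices::colorRamp(colors)   # maps [0, 1] to rows of RGB values
  grDevices::rgb(ramp(scaled), maxColorValue = 255)
}

# The lowest frequency gets the first color, the highest gets the second.
word_colors <- frequency_to_hex(c(0, 3, 6))
```

Swapping in a different `colors` vector changes the two ends of the gradient in the same way the `colors` argument is described above.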
Participant comments for the initial description used in the jury perception study
comment_example
A data frame with 125 rows and 2 columns:
Participant Identifier
Participant notes
Jury Perception Study (see Rogers (2024) https://digitalcommons.unl.edu/dissertations/AAI31240449/)
Adds HTML tags to create a highlighted testimony corresponding to word frequency.
highlighted_text(plot_object, labels = c("", ""))
plot_object |
plot object resulting from |
labels |
lower and upper labels for the gradient scale |
HTML code for highlighted text
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
toks_comment <- token_comments(comment_example_rename)
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
collocation_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
merged_frequency <- transcript_frequency(transcript_example_rename, collocation_object)
freq_plot <- collocation_plot(merged_frequency)
# Per the usage above, the second argument is labels, so only the plot object is passed
page_highlight <- highlighted_text(freq_plot)
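The general shape of the HTML output can be sketched with base R string formatting. The helper `highlight_words` below is hypothetical and only illustrates wrapping each word in a `<span>` with a background color. The markup actually produced by `highlighted_text()` may differ, and this sketch does not escape HTML special characters in the words.

```r
# Hypothetical helper: wrap each word in a span carrying its background color.
highlight_words <- function(words, hex_colors) {
  stopifnot(length(words) == length(hex_colors))
  spans <- sprintf('<span style="background-color: %s">%s</span>',
                   hex_colors, words)
  paste(spans, collapse = " ")
}

# Made-up words and colors for illustration only.
html <- highlight_words(c("the", "bullet"), c("#F251FC", "#F8FF1B"))
```

A string like this can be written to a file with `writeLines()` and opened in a browser to inspect the highlighting.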
This function tokenizes comments that are to be used in collocate_comments_fuzzy() or collocate_comments().
token_comments(comment_document)
comment_document |
document containing notes by individual, where the column containing the notes is named page_notes |
tokenized comments
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
toks_comment <- token_comments(comment_example_rename)
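When preparing your own notes, the only requirement stated above is that the notes column is named `page_notes`. The sketch below builds a small made-up data frame and renames its column in base R, which is equivalent to the `dplyr::rename()` call in the example. The `ID` column and the text are invented for illustration.

```r
# Made-up notes; in practice there would be one row per participant.
my_notes <- data.frame(
  ID = c("p1", "p2"),
  Notes = c("The examiner compared the bullet.", "Bullet matched the gun."),
  stringsAsFactors = FALSE
)

# Base R equivalent of dplyr::rename(my_notes, page_notes = Notes).
names(my_notes)[names(my_notes) == "Notes"] <- "page_notes"

# Guard that the required column is present before tokenizing.
stopifnot("page_notes" %in% names(my_notes))
```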
This function tokenizes a transcript document that is to be used in collocate_comments_fuzzy() or collocate_comments().
token_transcript(transcript_file)
transcript_file |
data frame of the transcript, where the transcript text is in a column named text. |
a tokenized object
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
Text corresponding to participant comments
transcript_example
A data frame with 1 row and 1 column:
Transcript text corresponding to the jury perception study
Jury Perception Study (see Rogers (2024) https://digitalcommons.unl.edu/dissertations/AAI31240449/ and Garrett et al. (2020) doi:10.1037/lhb0000423)
This function connects the collocation frequency calculated in collocate_comments_fuzzy() to the base transcript.
transcript_frequency(transcript, collocate_object)
transcript |
transcript document |
collocate_object |
collocation object (returned from |
a data frame of the transcript document with collocation values by word
comment_example_rename <- dplyr::rename(comment_example, page_notes = Notes)
toks_comment <- token_comments(comment_example_rename)
transcript_example_rename <- dplyr::rename(transcript_example, text = Text)
toks_transcript <- token_transcript(transcript_example_rename)
collocation_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
merged_frequency <- transcript_frequency(transcript_example_rename, collocation_object)
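One plausible way to obtain "collocation values by word" from phrase-level counts is to average the frequencies of every window that covers a given word. The sketch below shows that calculation in base R. Both the helper `word_level_frequency` and the averaging rule are assumptions made for illustration, and the package may combine the values differently.

```r
# Hypothetical helper: average the frequencies of all n-grams covering each word.
# gram_freq[i] is the frequency of the window spanning words i to i + n - 1.
word_level_frequency <- function(gram_freq, n_words, n = 5) {
  stopifnot(length(gram_freq) == n_words - n + 1)
  vapply(seq_len(n_words), function(w) {
    # Indices of the windows that include word w, clipped to the valid range.
    covering <- max(1, w - n + 1):min(w, length(gram_freq))
    mean(gram_freq[covering])
  }, numeric(1))
}

# Frequencies of the three two-word windows in a four-word transcript.
word_freq <- word_level_frequency(c(2, 0, 4), n_words = 4, n = 2)
```

Words at the start and end of the transcript are covered by fewer windows, so their values rest on fewer counts than words in the middle.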
Text corresponding to versions of the Wikipedia article for Highlighter
wiki_pages
A data frame with 50 rows and 1 column:
text of the Wikipedia page for Highlighter
Wikipedia: https://en.wikipedia.org/w/index.php?title=Highlighter&action=history