Sorts and keeps the top n integration sites based on the values in a given column.
Source: R/analysis-functions.R (top_integrations.Rd)
The input data frame is sorted in descending order by the values in the specified columns, and the top n rows are returned as output. The user can keep additional columns in the output by passing a vector of column names or one of two "shortcuts":
- keep = "everything" keeps all columns in the original data frame
- keep = "nothing" keeps only the mandatory columns (mandatory_IS_vars()) plus the columns in the columns parameter.
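The sorting and column selection described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `top_n_sketch` is a hypothetical helper, and the mandatory column names (`chr`, `integration_locus`, `strand`) are an assumption about what `mandatory_IS_vars()` returns.

```r
# Hypothetical helper, NOT part of the package: mimics the documented behaviour.
# `mandatory` is an assumed stand-in for the output of mandatory_IS_vars().
top_n_sketch <- function(df, n, columns, keep = "everything",
                         mandatory = c("chr", "integration_locus", "strand")) {
  # Sort descending: the first column is the primary key, the others break ties
  sort_keys <- unname(as.list(df[columns]))
  ord <- do.call(order, c(sort_keys, list(decreasing = TRUE)))
  sorted <- df[ord, , drop = FALSE]
  # Resolve the two "shortcuts" or an explicit vector of column names
  cols <- if (identical(keep, "everything")) {
    names(df)
  } else if (identical(keep, "nothing")) {
    unique(c(mandatory, columns))
  } else {
    unique(c(mandatory, columns, keep))
  }
  out <- head(sorted[, cols, drop = FALSE], n)
  rownames(out) <- NULL
  out
}

smpl <- data.frame(
  chr = c("1", "2", "3", "4", "5", "6"),
  integration_locus = c(14536, 14544, 14512, 14236, 14522, 14566),
  strand = c("+", "+", "-", "+", "-", "+"),
  Value = c(3, 10, 40, 2, 15, 150),
  Value2 = c(456, 87, 87, 9, 64, 96),
  Value3 = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

res_nothing <- top_n_sketch(smpl, 3, c("Value", "Value2"), keep = "nothing")
res_everything <- top_n_sketch(smpl, 3, c("Value", "Value2"))
print(res_nothing)   # rows with Value 150, 40, 15; Value3 is dropped
print(names(res_everything))  # all original columns are retained
```

With keep = "nothing" the sketch drops Value3, because it is neither a mandatory column nor one of the sorting columns.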
Usage
top_integrations(
  x,
  n = 20,
  columns = "fragmentEstimate_sum_RelAbundance",
  keep = "everything",
  key = NULL
)
Arguments
- x
An integration matrix (data frame containing mandatory_IS_vars())
- n
How many integrations should be sliced (in total or for each group)? Must be numeric or integer and greater than 0.
- columns
Columns to use for the sorting. If more than one column is supplied, primary ordering is done on the first column and secondary ordering on all other columns.
- keep
Names of the columns to keep besides mandatory_IS_vars() and columns
- key
Either NULL or a character vector of column names to group by. If not NULL, the input will be grouped and the top n rows will be extracted from each group.
Value
Either a data frame with at most n rows or, when key is not NULL, a data frame with at most n*(number of groups) rows.
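To see why the grouped result has "at most" n*(number of groups) rows, here is a minimal base R sketch of per-group slicing. `top_n_by_group_sketch` is a hypothetical helper written for illustration and may differ from how the package groups and slices internally.

```r
# Hypothetical helper, NOT part of the package: top n rows per group.
top_n_by_group_sketch <- function(df, n, column, key) {
  # Sort the whole table descending first, so each group inherits that order
  sorted <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  # Split on the key column(s) and keep the first n rows of each group
  groups <- split(sorted, sorted[key], drop = TRUE)
  out <- do.call(rbind, lapply(groups, head, n))
  rownames(out) <- NULL
  out
}

smpl <- data.frame(
  chr = c("1", "2", "3", "4", "5", "6"),
  CompleteAmplificationID = c("ID1", "ID2", "ID1", "ID1", "ID3", "ID2"),
  Value = c(3, 10, 40, 2, 15, 150),
  stringsAsFactors = FALSE
)

res_grouped <- top_n_by_group_sketch(smpl, n = 2, column = "Value",
                                     key = "CompleteAmplificationID")
print(res_grouped)
# ID1 and ID2 contribute 2 rows each, ID3 has only 1 row to give:
# 5 rows in total, below the upper bound of 2 * 3 = 6
```

A group with fewer than n rows contributes all of its rows, which is why the row count is an upper bound rather than an exact figure.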
Required tags
The function will explicitly check for the presence of these tags:
- All columns declared in mandatory_IS_vars()
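A presence check of this kind could look like the following sketch. The helper name `check_mandatory_sketch` is hypothetical, and the default column names are an assumption about what `mandatory_IS_vars()` returns; the package's actual check and error message may differ.

```r
# Hypothetical helper, NOT part of the package.
# `required` is an assumed stand-in for the output of mandatory_IS_vars().
check_mandatory_sketch <- function(df,
                                   required = c("chr", "integration_locus", "strand")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing mandatory columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

ok_df <- data.frame(chr = "1", integration_locus = 14536, strand = "+")
bad_df <- data.frame(chr = "1", Value = 3)

check_ok <- check_mandatory_sketch(ok_df)
# Capture the error message instead of aborting the script
check_bad <- tryCatch(
  check_mandatory_sketch(bad_df),
  error = function(e) conditionMessage(e)
)
print(check_bad)
```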
See also
Other Analysis functions: CIS_grubbs(), HSC_population_size_estimate(), compute_abundance(), cumulative_is(), gene_frequency_fisher(), is_sharing(), iss_source(), sample_statistics(), top_targeted_genes()
Examples
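library(ISAnalytics)

# Small integration matrix with the mandatory columns plus three value columns
smpl <- tibble::tibble(
  chr = c("1", "2", "3", "4", "5", "6"),
  integration_locus = c(14536, 14544, 14512, 14236, 14522, 14566),
  strand = c("+", "+", "-", "+", "-", "+"),
  CompleteAmplificationID = c("ID1", "ID2", "ID1", "ID1", "ID3", "ID2"),
  Value = c(3, 10, 40, 2, 15, 150),
  Value2 = c(456, 87, 87, 9, 64, 96),
  Value3 = c("a", "b", "c", "d", "e", "f")
)

# Top 3 rows overall, sorted by Value and then Value2;
# only the mandatory and sorting columns are kept
top <- top_integrations(smpl,
  n = 3,
  columns = c("Value", "Value2"),
  keep = "nothing"
)

# Top 3 rows within each CompleteAmplificationID group, also keeping Value2
top_key <- top_integrations(smpl,
  n = 3,
  columns = "Value",
  keep = "Value2",
  key = "CompleteAmplificationID"
)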