Produces a summary of the number of integration events per gene, orders the table in decreasing order and slices the first n rows, either on the whole data frame or by group.
Usage
top_targeted_genes(
    x,
    n = 20,
    key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
    consider_chr = TRUE,
    consider_gene_strand = TRUE,
    as_df = TRUE
)
Arguments
- x
An integration matrix - must be annotated
- n
Number of rows to slice
- key
If slice has to be performed for each group, the character vector of column names that identify the groups. If NULL, considers the whole input data frame.
- consider_chr
Logical, should the chromosome be taken into account? See details.
- consider_gene_strand
Logical, should the gene strand be taken into account? See details.
- as_df
If computation is performed by group, TRUE returns all groups merged in a single data frame with a column containing the group id. If FALSE, returns a named list.
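The interplay between n, key and as_df can be sketched in base R on a toy table. This is only an illustration of the documented behaviour, not the package's implementation: the helper top_n_rows() and the toy column values are hypothetical.

```r
# Toy per-gene summary; values and the helper below are illustrative only.
gene_counts <- data.frame(
    SubjectID = c("PT1", "PT1", "PT1", "PT2", "PT2"),
    GeneName = c("A", "B", "C", "A", "D"),
    n_IS = c(5L, 2L, 7L, 1L, 4L),
    stringsAsFactors = FALSE
)

# Hypothetical helper: order by decreasing count, keep the first n rows.
top_n_rows <- function(df, n) {
    ordered <- df[order(df$n_IS, decreasing = TRUE), , drop = FALSE]
    head(ordered, n)
}

# Analogue of key = NULL: slice the whole table.
top_overall <- top_n_rows(gene_counts, n = 2)

# Analogue of key = "SubjectID", as_df = FALSE: one slice per group,
# returned as a named list.
top_list <- lapply(
    split(gene_counts, gene_counts$SubjectID),
    top_n_rows,
    n = 2
)

# Analogue of as_df = TRUE: the groups merged in a single data frame,
# where the SubjectID column identifies the group.
top_df <- do.call(rbind, top_list)
rownames(top_df) <- NULL
```

In this sketch top_list has one element per SubjectID value, while top_df stacks the same rows in a single table.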
Details
Gene grouping
When producing a summary of IS by gene, there are different options that can be chosen.
The argument consider_chr accounts for the fact that some genes (same gene symbol) may span more than one chromosome: if set to TRUE, counts of IS will be separated for those genes that span 2 or more chromosomes - in other words they will be in 2 different rows of the output table. On the contrary, if the argument is set to FALSE, counts will be produced in a single row.
NOTE: the function counts DISTINCT integration events, which logically corresponds to a union of sets. Be aware that counts per group and counts obtained with different arguments may differ: for example, if counts are computed considering the chromosome and one gene symbol has 2 different counts, the sum of those 2 will likely not be equal to the count obtained without considering the chromosome.
The same reasoning can be applied for the argument consider_gene_strand, that takes into account the strand of the gene.
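The effect described in the note can be reproduced with a small base R sketch. It assumes, purely for illustration, that an event is identified by its locus value alone; the toy data and the helper n_distinct_loci() are hypothetical and are not taken from the package.

```r
# Toy annotated sites for a single gene symbol found on two chromosomes.
# Assumption for this sketch: an event is identified by its locus only.
sites <- data.frame(
    SubjectID = c("PT1", "PT1", "PT2", "PT2"),
    chr = c("1", "2", "1", "2"),
    locus = c(100L, 100L, 100L, 250L),
    GeneName = "G1",
    stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct events in a set of loci.
n_distinct_loci <- function(loci) length(unique(loci))

# Without considering the chromosome: union of all loci, {100, 250}.
overall <- n_distinct_loci(sites$locus)

# Considering the chromosome: one count per chromosome.
by_chr <- tapply(sites$locus, sites$chr, n_distinct_loci)

# Counting per group instead of on the whole table.
by_subject <- tapply(sites$locus, sites$SubjectID, n_distinct_loci)

overall         # 2
sum(by_chr)     # 3: locus 100 is counted once on each chromosome
sum(by_subject) # 3: locus 100 is counted once in each subject
```

Because each count is the size of a set, the per-chromosome or per-group counts add up to the overall count only when the sets do not overlap.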
Required tags
The function will explicitly check for the presence of these tags:
chromosome
locus
gene_symbol
gene_strand
Note that the tags "gene_strand" and "chromosome" are explicitly required only if consider_chr = TRUE and/or consider_gene_strand = TRUE.
See also
Other Analysis functions: CIS_grubbs(), HSC_population_size_estimate(), compute_abundance(), cumulative_is(), gene_frequency_fisher(), is_sharing(), iss_source(), sample_statistics(), top_integrations()
Examples
data("integration_matrices", package = "ISAnalytics")
top_targ <- top_targeted_genes(
    integration_matrices,
    key = NULL
)
top_targ
#> # A tibble: 20 × 4
#> GeneName GeneStrand chr n_IS
#> <chr> <chr> <chr> <int>
#> 1 KMT5B - 11 5
#> 2 RERE - 1 4
#> 3 ACOX1 - 17 3
#> 4 ADD1 + 4 3
#> 5 ANKFY1 - 17 3
#> 6 R3HDM2 - 12 3
#> 7 SPATS2 + 12 3
#> 8 ATF1 + 12 2
#> 9 ATF7 - 12 2
#> 10 AXIN1 - 16 2
#> 11 C6orf10 - 6 2
#> 12 CD163 - 12 2
#> 13 CDH2 - 18 2
#> 14 CLECL1 - 12 2
#> 15 CNOT6L - 4 2
#> 16 DNMT1 - 19 2
#> 17 FCHSD2 - 11 2
#> 18 GRB2 - 17 2
#> 19 HBE1 - 11 2
#> 20 HNRNPM + 19 2
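The example above uses key = NULL. A grouped call needs the key columns to be present in the input, which in this sketch is obtained by first aggregating the matrix with the sample metadata. This assumes that the package's association_file example dataset and aggregate_values_by_key() are available in the installed version and that the aggregated table keeps the annotation columns; output is omitted because it depends on the package data.

```r
# Sketch of a by-group call; runs only if ISAnalytics is installed.
if (requireNamespace("ISAnalytics", quietly = TRUE)) {
    library(ISAnalytics)
    data("integration_matrices", package = "ISAnalytics")
    data("association_file", package = "ISAnalytics")

    # Aggregate by the default key so that SubjectID, CellMarker,
    # Tissue and TimePoint become columns of the matrix.
    aggregated <- aggregate_values_by_key(
        x = integration_matrices,
        association_file = association_file,
        value_cols = c("seqCount", "fragmentEstimate")
    )

    # Top 5 targeted genes for each subject, merged in one data frame.
    top_by_subject <- top_targeted_genes(
        aggregated,
        n = 5,
        key = "SubjectID"
    )
}
```

Setting as_df = FALSE in the last call would return a named list with one element per group instead.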