Aggregates matrix values based on the specified key.
Source: R/aggregate-functions.R
Performs aggregation on values contained in the integration matrices based
on the key and the specified lambda. For more details on how to use this
function:
vignette("workflow_start", package = "ISAnalytics")
Usage
aggregate_values_by_key(
  x,
  association_file,
  value_cols = "Value",
  key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  lambda = list(sum = ~sum(.x, na.rm = TRUE)),
  group = c(mandatory_IS_vars(), annotation_IS_vars()),
  join_af_by = "CompleteAmplificationID"
)
Arguments
- x
A single integration matrix or a list of imported integration matrices
- association_file
The imported association file
- value_cols
A character vector containing the names of the columns to which the given lambdas are applied. These must be numeric or integer columns.
- key
A string or a character vector with column names of the association file to take as key
- lambda
A named list of functions or purrr-style lambdas. See details section.
- group
Other variables to include in the grouping besides key; can be set to NULL
- join_af_by
A character vector representing the joining key between the matrix and the metadata. Useful to re-aggregate already aggregated matrices.
Details
Setting the lambda parameter
The lambda parameter should always contain a named list of either functions or purrr-style lambdas. In both forms, a function can also be given together with its namespace.
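As a minimal sketch of the two accepted forms, the lists below give each function either by bare name or with an explicit namespace. The list names (`lambda_functions`, `lambda_formulas`) and the choice of `stats::median` are illustrative assumptions, not taken from the package.

```r
# Form 1: a named list of plain functions.
# A function can be given by bare name or qualified with its namespace.
lambda_functions <- list(
  sum = sum,
  median = stats::median
)

# Form 2: a named list of purrr-style lambdas (one-sided formulas).
# The namespace can be written inside the formula as well.
lambda_formulas <- list(
  sum = ~ base::sum(.x, na.rm = TRUE),
  median = ~ stats::median(.x, na.rm = TRUE)
)
```

Either list could then be passed as the `lambda` argument. Every element is named, as the parameter description requires.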
Purrr-style lambdas make it possible to pass additional arguments to the functions; the first parameter must always be .x.
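The sketch below shows how extra arguments can be fixed inside a purrr-style lambda while `.x` stands for the vector being aggregated. The `trimmed_mean` entry is an illustrative assumption, and `purrr::as_mapper()` is used here only to evaluate a lambda on a plain vector; how the package converts lambdas internally is not stated in this page.

```r
library(purrr)

# Each lambda takes the values to aggregate as .x and fixes
# additional arguments of the underlying function.
lambda <- list(
  sum = ~ sum(.x, na.rm = TRUE),
  trimmed_mean = ~ mean(.x, trim = 0.1, na.rm = TRUE)
)

# For illustration only: convert one lambda to a function
# and apply it to a small vector containing a missing value.
sum_fun <- as_mapper(lambda$sum)
sum_fun(c(1, NA, 3)) # 4
```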
It is also possible to use custom user-defined functions. Keep in mind that the symbol is evaluated in the calling environment: for example, if the function is called in the global environment and lambda contains "foo" as a function, "foo" is evaluated in the global environment.
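A user-defined function can be sketched as follows. The function `nonzero_count` and the list entry names are hypothetical; the function must be defined in the environment from which `aggregate_values_by_key()` is called so that the symbol can be found.

```r
# Hypothetical user-defined aggregation function, defined in the
# environment where the aggregation will later be called.
nonzero_count <- function(x) {
  sum(x != 0, na.rm = TRUE)
}

# The custom function can sit next to purrr-style lambdas.
lambda <- list(
  sum = ~ sum(.x, na.rm = TRUE),
  nonzero = nonzero_count
)

# Quick check on a plain numeric vector.
lambda$nonzero(c(0, 2, NA, 5)) # 2
```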
Constraints on aggregation functions
Functions passed in the lambda parameter must respect a few constraints to work properly, and it is the user's responsibility to ensure this.
- Functions have to accept a numeric or integer vector as input
- Functions should return a single value or a list/data frame: if a list or a data frame is returned, all of its columns are added to the final data frame
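The two kinds of return value allowed by the constraints above can be sketched with a pair of functions. Both `total` and `range_stats` are hypothetical names; how the returned columns are named in the final data frame is not shown here.

```r
# Returns a single value for a numeric or integer vector.
total <- function(x) {
  sum(x, na.rm = TRUE)
}

# Returns a one-row data frame; per the constraints above,
# each of its columns would be added to the final data frame.
range_stats <- function(x) {
  data.frame(
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE)
  )
}

values <- c(3, NA, 10)
single_result <- total(values)      # one numeric value
multi_result <- range_stats(values) # data frame with columns min and max
```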
See also
Other Data cleaning and pre-processing: aggregate_metadata(), compute_near_integrations(), default_meta_agg(), outlier_filter(), outliers_by_pool_fragments(), purity_filter(), realign_after_collisions(), remove_collisions(), threshold_filter()
Examples
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
aggreg <- aggregate_values_by_key(
  x = integration_matrices,
  association_file = association_file,
  value_cols = c("seqCount", "fragmentEstimate")
)
head(aggreg)
#> # A tibble: 6 × 11
#>   chr   integration_locus strand GeneName GeneStrand SubjectID CellMarker Tissue
#>   <chr>             <dbl> <chr>  <chr>    <chr>      <chr>     <chr>      <chr>
#> 1 1               8464757 -      RERE     -          PT001     MNC        BM
#> 2 1               8464757 -      RERE     -          PT001     MNC        BM
#> 3 1               8607357 +      RERE     -          PT001     MNC        BM
#> 4 1               8607357 +      RERE     -          PT001     MNC        BM
#> 5 1               8607357 +      RERE     -          PT001     MNC        BM
#> 6 1               8607362 -      RERE     -          PT001     MNC        BM
#> # ℹ 3 more variables: TimePoint <chr>, seqCount_sum <dbl>,
#> #   fragmentEstimate_sum <dbl>