A collision is an integration (i.e. a unique combination of the provided mandatory_IS_vars()) that is observed in more than one independent sample.
The function tries to decide which independent sample an integration event should be assigned to; if no decision can be taken, the integration is removed entirely from the data frame.
For more details refer to the vignette "Collision removal functionality":
vignette("workflow_start", package = "ISAnalytics")
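To make the definition concrete, here is a minimal base-R sketch that flags collisions in a toy data frame. It assumes that an integration is identified by chr, integration_locus and strand (the columns visible in the Examples output) and that an independent sample is identified by ProjectID and SubjectID (the default of independent_sample_id). The helper find_collisions() is hypothetical and is not how remove_collisions() works internally.

```r
# Toy data: each row is one observation of an integration in a sample.
# Column names mirror the defaults shown on this page; the values are made up.
toy <- data.frame(
  chr               = c("1", "1", "1", "2"),
  integration_locus = c(100, 100, 200, 300),
  strand            = c("+", "+", "-", "+"),
  ProjectID         = c("PJ01", "PJ01", "PJ01", "PJ01"),
  SubjectID         = c("S1", "S2", "S1", "S2"),
  seqCount          = c(50, 3, 10, 7)
)

# Hypothetical helper (not part of ISAnalytics): return the keys of the
# integrations observed in more than one independent sample.
find_collisions <- function(df,
                            is_vars = c("chr", "integration_locus", "strand"),
                            sample_cols = c("ProjectID", "SubjectID")) {
  # One key per integration and one key per independent sample
  is_key <- do.call(paste, c(df[is_vars], sep = "_"))
  sample_key <- do.call(paste, c(df[sample_cols], sep = "_"))
  # Number of distinct independent samples in which each integration appears
  n_samples <- tapply(sample_key, is_key, function(s) length(unique(s)))
  names(n_samples)[n_samples > 1]
}

find_collisions(toy)
# [1] "1_100_+"
```

Only the integration at chr 1, locus 100, strand "+" is seen in both S1 and S2, so it is the single collision; the other two integrations each belong to one independent sample and need no decision.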
Usage
remove_collisions(
x,
association_file,
independent_sample_id = c("ProjectID", "SubjectID"),
date_col = "SequencingDate",
reads_ratio = 10,
quant_cols = c(seqCount = "seqCount", fragmentEstimate = "fragmentEstimate"),
report_path = default_report_path(),
max_workers = NULL
)
Arguments
- x
Either a multi-quantification matrix (recommended) or a named list of matrices (names must be quantification types).
- association_file
The association file imported via import_association_file().
- independent_sample_id
A character vector of column names that identify independent samples.
- date_col
The date column that should be considered.
- reads_ratio
A single numeric value that represents the ratio that has to be considered when deciding between seqCount values.
- quant_cols
A named character vector where names are quantification types and values are the names of the corresponding columns. The quantification seqCount MUST be included in the vector.
- report_path
The path where the report file should be saved. Can be a folder or NULL if no report should be produced. Defaults to {user_home}/ISAnalytics_reports.
- max_workers
Maximum number of parallel workers to distribute the workload. If NULL (default), the maximum number of workers allowed is used; otherwise, a numeric value is expected. WARNING: a higher number of workers speeds up computation at the cost of memory consumption! Tune this parameter accordingly.
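The role of reads_ratio can be pictured with the simplified sketch below. It assumes that, for a single colliding integration, seqCount is summed per independent sample and the integration is assigned to the top sample only if its total is at least reads_ratio times the runner-up's; otherwise no decision is taken. This covers only a reads-based comparison: the complete decision procedure, which also involves other arguments such as date_col, is described in the vignette referenced above, and the helper assign_by_reads_ratio() is hypothetical.

```r
# Hypothetical helper, NOT the ISAnalytics implementation: decide which
# independent sample a single colliding integration belongs to, using only
# a reads-ratio criterion.
assign_by_reads_ratio <- function(seq_counts, sample_ids, reads_ratio = 10) {
  # Total seqCount per independent sample, sorted from highest to lowest
  totals <- vapply(split(seq_counts, sample_ids), sum, numeric(1))
  totals <- sort(totals, decreasing = TRUE)
  if (length(totals) < 2) {
    return(names(totals)[1])  # only one sample observed it: nothing to decide
  }
  # Assign to the top sample only if it dominates the runner-up
  if (totals[[1]] / totals[[2]] >= reads_ratio) {
    return(names(totals)[1])
  }
  NA_character_  # no decision: the integration would be dropped
}

assign_by_reads_ratio(c(50, 3), c("S1", "S2"))  # returns "S1" (50 / 3 >= 10)
assign_by_reads_ratio(c(20, 5), c("S1", "S2"))  # returns NA   (20 / 5 < 10)
```

Lowering reads_ratio makes the rule more permissive: with reads_ratio = 4, the second call above would assign the integration to S1 instead of dropping it.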
Required tags
The function will explicitly check for the presence of these tags:
project_id
pool_id
pcr_replicate
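A check of this kind can be sketched as follows, assuming each tag is resolved to an association-file column through a lookup. The lookup below is a hypothetical named vector whose column names are placeholders, not necessarily the ones ISAnalytics uses, and check_required_tags() is likewise hypothetical.

```r
# Hypothetical tag -> column lookup; the column names are placeholders.
tag_lookup <- c(
  project_id    = "ProjectID",
  pool_id       = "PoolID",
  pcr_replicate = "PCRReplicate"
)

# Hypothetical helper: stop with an informative error if any required tag
# has no corresponding column in the association file.
check_required_tags <- function(af, lookup,
                                required = c("project_id", "pool_id",
                                             "pcr_replicate")) {
  unmapped <- !(required %in% names(lookup))
  absent <- !(lookup[required] %in% colnames(af))
  missing_tags <- required[unmapped | absent]
  if (length(missing_tags) > 0) {
    stop("Missing required tags: ", paste(missing_tags, collapse = ", "))
  }
  invisible(TRUE)
}

af <- data.frame(ProjectID = "PJ01", PoolID = "POOL01", PCRReplicate = 1)
check_required_tags(af, tag_lookup)  # passes silently
msg <- tryCatch(check_required_tags(af[, 1:2], tag_lookup),
                error = conditionMessage)
msg  # "Missing required tags: pcr_replicate"
```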
See also
Other Data cleaning and pre-processing: aggregate_metadata(), aggregate_values_by_key(), compute_near_integrations(), default_meta_agg(), outlier_filter(), outliers_by_pool_fragments(), purity_filter(), realign_after_collisions(), threshold_filter()
Examples
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
no_coll <- remove_collisions(
x = integration_matrices,
association_file = association_file,
report_path = NULL
)
#> Identifying collisions...
#> Processing collisions...
#> Finished!
head(no_coll)
#> # A tibble: 6 × 8
#> chr integration_locus strand GeneName GeneStrand CompleteAmplificationID
#> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 16602483 + FBXO42 - PJ01_POOL01_LTR83LC46_PT00…
#> 2 1 16602483 + FBXO42 - PJ01_POOL01_LTR37LC2_PT001…
#> 3 1 16602483 + FBXO42 - PJ01_POOL01_LTR85LC54_PT00…
#> 4 1 26446899 + PDIK1L + PJ01_POOL01_LTR85LC54_PT00…
#> 5 1 26446899 + PDIK1L + PJ01_POOL01_LTR83LC46_PT00…
#> 6 1 26446899 + PDIK1L + PJ01_POOL01_LTR69LC52_PT00…
#> # ℹ 2 more variables: seqCount <dbl>, fragmentEstimate <dbl>