Identifies and removes collisions.

A collision is an integration (that is, a unique combination of the provided mandatory_IS_vars()) that is observed in more than one independent sample. The function tries to decide which independent sample each such integration event should be assigned to; if no decision can be made, the integration is removed from the data frame entirely. For more details refer to the vignette "Collision removal functionality": vignette("workflow_start", package = "ISAnalytics")
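The definition of a collision can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame and all its values are made up, and the columns chr, integration_locus and strand are assumed to stand in for what mandatory_IS_vars() returns.

```r
# Toy data (hypothetical values): one row per integration per sample.
toy <- data.frame(
  chr = c("1", "1", "1", "2"),
  integration_locus = c(100, 100, 250, 300),
  strand = c("+", "+", "-", "+"),
  ProjectID = c("PJ01", "PJ01", "PJ01", "PJ01"),
  SubjectID = c("PT001", "PT002", "PT001", "PT002"),
  seqCount = c(500, 20, 80, 40),
  stringsAsFactors = FALSE
)

# Assumed stand-ins for mandatory_IS_vars() and independent_sample_id.
is_vars <- c("chr", "integration_locus", "strand")
sample_vars <- c("ProjectID", "SubjectID")

# Keep each (integration, independent sample) pair once.
pairs <- unique(toy[, c(is_vars, sample_vars)])

# Count the independent samples in which each integration is observed.
pairs$n_samples <- 1L
counts <- aggregate(
  n_samples ~ chr + integration_locus + strand,
  data = pairs,
  FUN = sum
)

# An integration seen in more than one independent sample is a collision.
collisions <- counts[counts$n_samples > 1, ]
collisions
```

In this toy data only the integration at chr 1, locus 100, strand + is shared between two subjects, so it is the only row flagged.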

Usage

remove_collisions(
  x,
  association_file,
  independent_sample_id = c("ProjectID", "SubjectID"),
  date_col = "SequencingDate",
  reads_ratio = 10,
  quant_cols = c(seqCount = "seqCount", fragmentEstimate = "fragmentEstimate"),
  report_path = default_report_path(),
  max_workers = NULL
)

Arguments

x: Either a multi-quantification matrix (recommended) or a named list of matrices (names must be quantification types)
association_file: The association file imported via import_association_file()
independent_sample_id: A character vector of column names that identify independent samples
date_col: The date column that should be considered.
reads_ratio: A single numeric value that represents the ratio to be considered when deciding between seqCount values.
quant_cols: A named character vector where names are quantification types and values are the names of the corresponding columns. The quantification seqCount MUST be included in the vector.
report_path: The path where the report file should be saved. Can be a folder or NULL if no report should be produced. Defaults to {user_home}/ISAnalytics_reports.
max_workers: Maximum number of parallel workers over which the workload is distributed. If NULL (default), the maximum number of workers allowed is used; otherwise a numeric value is required. WARNING: a higher number of workers speeds up computation at the cost of memory consumption! Tune this parameter accordingly.
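
As a rough illustration of how a ratio such as reads_ratio could be used to decide between seqCount values, the sketch below assigns a colliding integration to the sample with the highest seqCount only when that value is at least reads_ratio times the second highest. The helper pick_by_reads_ratio() is hypothetical and not part of ISAnalytics, and the exact comparison the package applies is an assumption here; refer to the vignette for the actual decision steps.

```r
# Hypothetical helper, NOT part of ISAnalytics: an assumed reading of
# how a reads ratio could break a tie between independent samples.
# seq_counts is a named numeric vector, one seqCount per sample.
pick_by_reads_ratio <- function(seq_counts, reads_ratio = 10) {
  if (length(seq_counts) < 2) {
    return(names(seq_counts)[1])
  }
  ordered <- sort(seq_counts, decreasing = TRUE)
  # Assumption: the top value must be at least reads_ratio times the
  # runner-up for a decision to be taken.
  if (ordered[[1]] > 0 && ordered[[1]] >= reads_ratio * ordered[[2]]) {
    return(names(ordered)[1])
  }
  # No decision possible: the integration would be discarded.
  NA_character_
}

pick_by_reads_ratio(c(PT001 = 500, PT002 = 20)) # ratio 25, above 10
pick_by_reads_ratio(c(PT001 = 80, PT002 = 40))  # ratio 2, below 10
```

Under these assumptions the first call selects PT001, while the second returns NA, which corresponds to the case where the integration is removed.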

Value

Either a multi-quantification matrix or a list of data frames

Required tags

The function will explicitly check for the presence of these tags:

project_id
pool_id
pcr_replicate

Examples

data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
no_coll <- remove_collisions(
    x = integration_matrices,
    association_file = association_file,
    report_path = NULL
)
#> Identifying collisions...
#> Processing collisions...
#> Loading required package: foreach
#> Loading required package: foreach
#> Finished!
head(no_coll)
#> # A tibble: 6 × 8
#>   chr   integration_locus strand GeneName GeneStrand CompleteAmplificationID    
#>   <chr>             <dbl> <chr>  <chr>    <chr>      <chr>                      
#> 1 1              16602483 +      FBXO42   -          PJ01_POOL01_LTR83LC46_PT00…
#> 2 1              16602483 +      FBXO42   -          PJ01_POOL01_LTR37LC2_PT001…
#> 3 1              16602483 +      FBXO42   -          PJ01_POOL01_LTR85LC54_PT00…
#> 4 1              26446899 +      PDIK1L   +          PJ01_POOL01_LTR85LC54_PT00…
#> 5 1              26446899 +      PDIK1L   +          PJ01_POOL01_LTR83LC46_PT00…
#> 6 1              26446899 +      PDIK1L   +          PJ01_POOL01_LTR69LC52_PT00…
#> # ℹ 2 more variables: seqCount <dbl>, fragmentEstimate <dbl>
