[Experimental] Given an integration matrix whose samples can be grouped over time, this function computes the cumulative set of integrations for each group, assuming that an integration observed at time point "t" is also observed at time point "t+1" and, by extension, at every later time point.
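
The cumulative assumption can be illustrated with a minimal base R sketch. This is not the package's implementation: the site identifiers (`is_a`, `is_b`, ...) and the object names are hypothetical, and the sketch only shows the idea of carrying each time point's integrations forward to the following ones.

```r
# Toy data (hypothetical): integration sites observed at each time point
observed <- list(
  "30" = c("is_a", "is_b"),
  "60" = c("is_b", "is_c"),
  "90" = c("is_d")
)

# Cumulative union: every time point inherits all sites seen at earlier ones
cumulative <- Reduce(union, observed, accumulate = TRUE)
names(cumulative) <- names(observed)

# Number of distinct sites accumulated up to each time point
cum_counts <- lengths(cumulative)
cum_counts
```

Here `is_b` is counted once even though it appears at two time points, and the site seen only at time point 90 is added on top of everything observed earlier.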

Usage

cumulative_is(
  x,
  key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  timepoint_col = "TimePoint",
  include_tp_zero = FALSE,
  counts = TRUE,
  keep_og_is = FALSE,
  expand = TRUE
)

Arguments

x

An integration matrix, ideally aggregated via aggregate_values_by_key()

key

The aggregation key used

timepoint_col

The name of the time point column

include_tp_zero

Should time point 0 be included?

counts

Logical. Should cumulative counts be added?

keep_og_is

Keep original set of integrations as a separate column?

expand

If FALSE, the set of integration sites for each group is returned as a nested table in a separate column; otherwise that column is unnested.

Value

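A data frame. When counts = TRUE (the default), the result is instead a list, as in the example below, with the integrations in $coordinates and the cumulative counts in $counts.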

Required tags

The function will explicitly check for the presence of these tags:

Examples

data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
aggreg <- aggregate_values_by_key(
    x = integration_matrices,
    association_file = association_file,
    value_cols = c("seqCount", "fragmentEstimate")
)
cumulated_is <- cumulative_is(aggreg)
cumulated_is
#> $coordinates
#> # A tibble: 2,375 × 9
#>    SubjectID CellMarker Tissue TimePoint chr   integration_locus strand GeneName
#>    <chr>     <chr>      <chr>      <dbl> <chr>             <dbl> <chr>  <chr>   
#>  1 PT001     MNC        BM            30 1               8464757 -      RERE    
#>  2 PT001     MNC        BM            30 1              16186297 -      SPEN    
#>  3 PT001     MNC        BM            30 1              40689188 +      RLF     
#>  4 PT001     MNC        BM            30 1             157759338 -      FCRL1   
#>  5 PT001     MNC        BM            30 1             234596545 -      TARBP1  
#>  6 PT001     MNC        BM            30 10            122533902 -      WDR11-A…
#>  7 PT001     MNC        BM            30 11              5306480 +      HBE1    
#>  8 PT001     MNC        BM            30 11             64633964 +      EHD1    
#>  9 PT001     MNC        BM            30 11             65949729 -      PACS1   
#> 10 PT001     MNC        BM            30 11             72097513 +      CLPB    
#> # ℹ 2,365 more rows
#> # ℹ 1 more variable: GeneStrand <chr>
#> 
#> $counts
#> # A tibble: 20 × 5
#>    SubjectID CellMarker Tissue TimePoint is_n_cumulative
#>    <chr>     <chr>      <chr>      <dbl>           <int>
#>  1 PT001     MNC        BM            30              54
#>  2 PT001     MNC        BM            60             147
#>  3 PT001     MNC        BM            90             179
#>  4 PT001     MNC        BM           180             240
#>  5 PT001     MNC        BM           360             240
#>  6 PT001     MNC        PB            30              28
#>  7 PT001     MNC        PB            60              77
#>  8 PT001     MNC        PB            90             104
#>  9 PT001     MNC        PB           180             121
#> 10 PT001     MNC        PB           360             121
#> 11 PT002     MNC        BM            30              98
#> 12 PT002     MNC        BM            60             126
#> 13 PT002     MNC        BM            90             141
#> 14 PT002     MNC        BM           180             184
#> 15 PT002     MNC        BM           360             265
#> 16 PT002     MNC        PB            30              15
#> 17 PT002     MNC        PB            60              26
#> 18 PT002     MNC        PB            90              38
#> 19 PT002     MNC        PB           180              62
#> 20 PT002     MNC        PB           360             109
#>
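
As a rough sanity check on the `$counts` table printed above, the cumulative counts for one group can be differenced to recover how many new integrations appear at each time point. The sketch below copies the values for PT001 / MNC / BM by hand into a plain data frame so that it runs without the package; the object names are illustrative only.

```r
# Values copied from the printed $counts table (PT001, MNC, BM)
counts_pt001_bm <- data.frame(
  TimePoint = c(30, 60, 90, 180, 360),
  is_n_cumulative = c(54L, 147L, 179L, 240L, 240L)
)

# Newly observed integrations per time point:
# the first value, then the successive differences of the cumulative count
new_is <- c(
  counts_pt001_bm$is_n_cumulative[1],
  diff(counts_pt001_bm$is_n_cumulative)
)

# A cumulative count should never decrease over time
is_monotone <- all(diff(counts_pt001_bm$is_n_cumulative) >= 0)
```

With these values, no new integrations are added between time points 180 and 360, which is why the cumulative count stays at 240.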