
Conversion of precursor, modified peptide and proteinGroup entries to a standardized format.

Usage

convert_all_levels(
  input_df,
  input_MQ_pg,
  software = c("MaxQuant", "DIA-NN", "Spectronaut", "PD")
)

Arguments

input_df

A tibble with precursor, modified peptide and proteinGroup level information. For MaxQuant: evidence.txt (proteinGroups.txt is passed separately via input_MQ_pg); for PD: PSMs.txt exported with R-friendly headers enabled; for DIA-NN and Spectronaut: the default output reports.

input_MQ_pg

For MaxQuant: a tibble with proteinGroup-level information (proteinGroups.txt).

software

The analysis software used: MaxQuant, PD, DIA-NN or Spectronaut. Defaults to MaxQuant.
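The usage signature gives software a character-vector default, a common R idiom in which the first element acts as the default and any other value must come from the listed set. Whether convert_all_levels() resolves it with match.arg() internally is an assumption; the hypothetical helper resolve_software() below only illustrates the idiom:

```r
# Hypothetical helper illustrating the character-vector default idiom.
# It is NOT part of traceR; convert_all_levels() may resolve the argument differently.
resolve_software <- function(software = c("MaxQuant", "DIA-NN", "Spectronaut", "PD")) {
  # match.arg() returns the first element when the argument is left at its
  # default, and raises an error for values outside the allowed set.
  match.arg(software)
}

resolve_software()          # "MaxQuant"
resolve_software("DIA-NN")  # "DIA-NN"
```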

Value

The function returns the submitted tibble, input_df, with the following new columns added:

  • traceR_precursor - software-independent standardized text for precursor entries.

  • traceR_precursor_unknownMods - logical; TRUE if a modification was detected that could not be converted to the standardized format.

  • traceR_mod.peptides - software-independent standardized text for modified peptide entries.

  • traceR_mod.peptides_unknownMods - logical; TRUE if a modification was detected that could not be converted to the standardized format.

  • traceR_proteinGroups - software-independent standardized text for proteinGroups.
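The two *_unknownMods columns mark entries whose modifications could not be mapped. A minimal sketch of such a check, assuming the standardized text writes known modifications as (UniMod:<number>), as the example output further down suggests; the helper flag_unknown_mods() is hypothetical and not traceR's implementation:

```r
# Hypothetical helper (not part of traceR): TRUE for each standardized
# sequence that still contains a modification not written as "(UniMod:<number>)".
flag_unknown_mods <- function(sequences) {
  # Extract every parenthesised modification, e.g. "(UniMod:35)"
  mods <- regmatches(sequences, gregexpr("\\([^()]*\\)", sequences))
  # A sequence without modifications yields character(0), and any() of an
  # empty logical vector is FALSE, so unmodified peptides are never flagged
  vapply(mods, function(m) any(!grepl("^\\(UniMod:[0-9]+\\)$", m)), logical(1))
}

flag_unknown_mods(c("AACLLPK",
                    "ALTDM(UniMod:35)PQM(UniMod:35)R",
                    "ALTDM(DummyModification)PQMK"))
# FALSE FALSE TRUE
```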

Details

The input entries are converted to a software-independent format, and the generated entries are appended as new columns to the submitted data frame.
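To give a feel for what the conversion involves for MaxQuant strings, the hypothetical sketch below maps MaxQuant modification labels to UniMod accessions, removes underscores, and appends the charge to build a precursor string. The one-entry lookup table (Oxidation (M) to UniMod:35) and both helper names are assumptions chosen to mirror the example output; traceR's actual rules cover more modifications and the other supported software formats:

```r
# Hypothetical lookup table: MaxQuant modification label -> UniMod accession.
# Only oxidation is listed here; this is an illustrative assumption.
mq_to_unimod <- c("Oxidation (M)" = "UniMod:35")

# Hypothetical helper, not traceR's implementation
standardize_mq_sequence <- function(modified_sequence) {
  out <- modified_sequence
  for (label in names(mq_to_unimod)) {
    # fixed = TRUE so the parentheses in the label are matched literally
    out <- gsub(paste0("(", label, ")"),
                paste0("(", mq_to_unimod[[label]], ")"),
                out, fixed = TRUE)
  }
  # Drop all underscores: MaxQuant's flanking "_" and any inside unmapped
  # labels (the example output shows "Dummy_Modification" as "DummyModi...")
  gsub("_", "", out, fixed = TRUE)
}

# Precursor string = standardized sequence followed by the charge
standardize_mq_precursor <- function(modified_sequence, charge) {
  paste0(standardize_mq_sequence(modified_sequence), charge)
}

standardize_mq_sequence("_ALTDM(Oxidation (M))PQM(Oxidation (M))R_")
# "ALTDM(UniMod:35)PQM(UniMod:35)R"
standardize_mq_precursor("_AACLLPK_", 2)
# "AACLLPK2"
```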

Author

Oliver Kardell

Examples

# Load libraries
library(traceR)
library(dplyr)
library(stringr)
library(tidyr)
library(comprehenr)
library(tibble)

# MaxQuant example data
evidence <- tibble::tibble(
  "Modified sequence" = c("_AACLLPK_",
   "_ALTDM(Oxidation (M))PQM(Oxidation (M))R_",
   "ALTDM(Dummy_Modification)PQMK"),
  Charge = c(2,2,3),
  "Protein group IDs" = c("26", "86;17", "86;17")
)

proteingroups <- tibble::tibble(
  "Protein IDs" = c("A0A075B6P5;P01615;A0A087WW87;P01614;A0A075B6S6", "P02671", "P02672"),
  id = c(26, 86, 17)
)

# Conversion
convert_all_levels(
  input_df = evidence,
  input_MQ_pg = proteingroups,
  software = "MaxQuant"
)
#> # A tibble: 5 x 10
#>   `Protein IDs`      id `Modified sequenc~ Precursor.Id  Charge traceR_mod.pept~
#>   <chr>           <dbl> <chr>              <chr>          <dbl> <chr>           
#> 1 A0A075B6P5;P01~    26 _AACLLPK_          AACLLPK2           2 AACLLPK         
#> 2 P02671             86 _ALTDM(Oxidation ~ ALTDM(Oxidat~      2 ALTDM(UniMod:35~
#> 3 P02671             86 ALTDM(Dummy_Modif~ ALTDM(DummyM~      3 ALTDM(DummyModi~
#> 4 P02672             17 _ALTDM(Oxidation ~ ALTDM(Oxidat~      2 ALTDM(UniMod:35~
#> 5 P02672             17 ALTDM(Dummy_Modif~ ALTDM(DummyM~      3 ALTDM(DummyModi~
#> # ... with 4 more variables: traceR_mod.peptides_unknownMods <lgl>,
#> #   traceR_precursor <chr>, traceR_precursor_unknownMods <lgl>,
#> #   traceR_proteinGroups <chr>
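The output has five rows although evidence has three: each evidence entry assigned to "86;17" appears once per protein group. A minimal base-R sketch that reproduces this expansion, assuming the link is made by splitting Protein group IDs on ";" and matching against the id column of proteinGroups.txt (an inference from the output above, not from traceR's source):

```r
# Example data from above, reduced to the columns needed here
evidence <- data.frame(
  `Modified sequence` = c("_AACLLPK_",
                          "_ALTDM(Oxidation (M))PQM(Oxidation (M))R_",
                          "ALTDM(Dummy_Modification)PQMK"),
  `Protein group IDs` = c("26", "86;17", "86;17"),
  check.names = FALSE
)
proteingroups <- data.frame(
  `Protein IDs` = c("A0A075B6P5;P01615;A0A087WW87;P01614;A0A075B6S6", "P02671", "P02672"),
  id = c(26, 86, 17),
  check.names = FALSE
)

# Split "86;17" into c("86", "17"): one vector of group IDs per evidence row
group_ids <- strsplit(evidence$`Protein group IDs`, ";", fixed = TRUE)

# Repeat each evidence row once per protein group ID it references
long <- data.frame(
  `Modified sequence` = rep(evidence$`Modified sequence`, lengths(group_ids)),
  id = as.numeric(unlist(group_ids)),
  check.names = FALSE
)

# Attach the protein IDs of each group (inner join on the group id)
joined <- merge(long, proteingroups, by = "id")
nrow(joined)  # 5
```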