Title: | Categorical Functional Data Analysis |
---|---|
Description: | Package for the analysis of categorical functional data. The main purpose is to compute an encoding (real functional variable) for each state <doi:10.3390/math9233074>. It also provides functions to perform basic statistical analysis on categorical functional data. |
Authors: | Cristian Preda [aut], Quentin Grimonprez [aut, cre], Vincent Vandewalle [ctb] |
Maintainer: | Quentin Grimonprez <[email protected]> |
License: | AGPL-3 |
Version: | 0.12.0 |
Built: | 2024-11-22 10:25:59 UTC |
Source: | https://github.com/modal-inria/cfda |
2000 16 year-long family life sequences built from the retrospective biographical survey carried out by the Swiss
Household Panel (SHP) in 2002. Data from TraMineR
package.
data(biofam2)
data(biofam2)
A data.frame containing three columns:
id id of individuals (2000 different ids)
time age in years where a change occurs
state new state.
The biofam2 dataset derives from the biofam dataset from TraMineR
package.
The biofam2 format is adapted to cfda
functions.
The biofam data set was constructed by Müller et al. (2007) from the data of the retrospective biographical survey
carried out by the Swiss Household Panel (SHP) in 2002.
The data set contains sequences of family life states from age 15 to 30 (sequence length is 16).
The sequences are a sample of 2000 sequences of those created from the SHP biographical survey.
It includes only individuals who were at least 30 years old at the time of the survey.
The biofam data set describes family life courses of 2000 individuals born between 1909 and 1972.
The eight states are defined from the combination of five basic states, namely Living with parents (Parent), Left home (Left), Married (Marr), Having Children (Child), Divorced: "Parent", "Left", "Married", "Left+Marr", "Child", "Left+Child", "Left+Marr+Child", "Divorced"
Swiss Household Panel https://forscenter.ch/projects/swiss-household-panel/
Müller, N. S., M. Studer, G. Ritschard (2007). Classification de parcours de vie à l'aide de l'optimal matching. In XIVe Rencontre de la Société francophone de classification (SFC 2007), Paris, 5 - 7 septembre 2007, pp. 157–160.
data(biofam2) head(biofam2) plotData(biofam2) # It is recommended to increase the number of cores to reduce computation time set.seed(42) basis <- create.bspline.basis(c(15, 30), nbasis = 4, norder = 4) fmca <- compute_optimal_encoding(biofam2, basis, nCores = 2) plot(fmca, harm = 1) plot(fmca, harm = 2) plotEigenvalues(fmca, cumulative = TRUE, normalize = TRUE) plotComponent(fmca, comp = c(1, 2), addNames = FALSE)
data(biofam2) head(biofam2) plotData(biofam2) # It is recommended to increase the number of cores to reduce computation time set.seed(42) basis <- create.bspline.basis(c(15, 30), nbasis = 4, norder = 4) fmca <- compute_optimal_encoding(biofam2, basis, nCores = 2) plot(fmca, harm = 1) plot(fmca, harm = 2) plotEigenvalues(fmca, cumulative = TRUE, normalize = TRUE) plotComponent(fmca, comp = c(1, 2), addNames = FALSE)
Boxplot of time spent in each state
## S3 method for class 'timeSpent' boxplot(x, col = NULL, ...)
## S3 method for class 'timeSpent' boxplot(x, col = NULL, ...)
x |
output of |
col |
a vector containing color for each state |
... |
extra parameters for |
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other Descriptive statistics:
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) # compute time spent by each id in each state timeSpent <- compute_time_spent(d_JK2) # plot the result boxplot(timeSpent, col = c("#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F")) # modify the plot using ggplot2 library(ggplot2) boxplot(timeSpent, notch = TRUE, outlier.colour = "black") + coord_flip() + labs(title = "Time spent in each state")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) # compute time spent by each id in each state timeSpent <- compute_time_spent(d_JK2) # plot the result boxplot(timeSpent, col = c("#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F")) # modify the plot using ggplot2 library(ggplot2) boxplot(timeSpent, notch = TRUE, outlier.colour = "black") + coord_flip() + labs(title = "Time spent in each state")
Care trajectories of patients diagnosed with a serious and chronic condition
data(care)
data(care)
A data.frame containing three columns:
id id of individuals (2929 different ids)
time number of months since the diagnosis
state new state.
In this study, patients were followed from the time they were diagnosed with a serious and chronic condition and their care trajectories were tracked monthly from the time of diagnosis. The status variable contains the care status of each individual for each month of follow-up. Trajectories have different lengths.
The four states are:
D: diagnosed, but not in care
C: in care, but not on treatment
T: on treatment, but infection not suppressed
S: on treatment and suppressed infection
https://larmarange.github.io/analyse-R/data/care_trajectories.RData https://larmarange.github.io/analyse-R/trajectoires-de-soins.html
Other datasets:
biofam2
,
flours
data(care) head(care) plotData(care) # Individuals has not the same length. In order to compute the encoding, # we keep individuals with at least 18 months of history and work # with the 18 first months. duration <- compute_duration(care) idToKeep <- as.numeric(names(duration[duration >= 18])) care2 <- cut_data(care[care$id %in% idToKeep, ], 18) head(care2) # It is recommended to increase the number of cores to reduce computation time set.seed(42) basis <- create.bspline.basis(c(0, 18), nbasis = 10, norder = 4) fmca <- compute_optimal_encoding(care2, basis, nCores = 2) plotEigenvalues(fmca, cumulative = TRUE, normalize = TRUE) plot(fmca) plot(fmca, addCI = TRUE) plotComponent(fmca, addNames = FALSE)
data(care) head(care) plotData(care) # Individuals has not the same length. In order to compute the encoding, # we keep individuals with at least 18 months of history and work # with the 18 first months. duration <- compute_duration(care) idToKeep <- as.numeric(names(duration[duration >= 18])) care2 <- cut_data(care[care$id %in% idToKeep, ], 18) head(care2) # It is recommended to increase the number of cores to reduce computation time set.seed(42) basis <- create.bspline.basis(c(0, 18), nbasis = 10, norder = 4) fmca <- compute_optimal_encoding(care2, basis, nCores = 2) plotEigenvalues(fmca, cumulative = TRUE, normalize = TRUE) plot(fmca) plot(fmca, addCI = TRUE) plotComponent(fmca, addNames = FALSE)
For each individual, compute the duration
compute_duration(data)
compute_duration(data)
data |
data.frame containing |
a vector containing the duration of each trajectories
Cristian Preda, Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute duration of each individual duration <- compute_duration(d_JK) hist(duration)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute duration of each individual duration <- compute_duration(d_JK) hist(duration)
For each individual, compute the number of jumps performed
compute_number_jumps(data, countDuplicated = FALSE)
compute_number_jumps(data, countDuplicated = FALSE)
data |
data.frame containing |
countDuplicated |
if |
A vector containing the number of jumps for each individual
Cristian Preda, Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute the number of jumps nJump <- compute_number_jumps(d_JK)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute the number of jumps nJump <- compute_number_jumps(d_JK)
Compute the optimal encoding for categorical functional data using an extension of the multiple correspondence analysis to a stochastic process.
compute_optimal_encoding( data, basisobj, computeCI = TRUE, nBootstrap = 50, propBootstrap = 1, method = c("precompute", "parallel"), verbose = TRUE, nCores = max(1, ceiling(detectCores()/2)), ... )
compute_optimal_encoding( data, basisobj, computeCI = TRUE, nBootstrap = 50, propBootstrap = 1, method = c("precompute", "parallel"), verbose = TRUE, nCores = max(1, ceiling(detectCores()/2)), ... )
data |
data.frame containing |
basisobj |
basis created using the |
computeCI |
if TRUE, perform a bootstrap to estimate the variance of encoding functions coefficients |
nBootstrap |
number of bootstrap samples |
propBootstrap |
size of bootstrap samples relative to the number of individuals: propBootstrap * number of individuals |
method |
computation method: "parallel" or "precompute": precompute all integrals (efficient when the number of unique time values is low) |
verbose |
if TRUE print some information |
nCores |
number of cores used for parallelization (only if method == "parallel"). Default is half the cores. |
... |
parameters for |
See the vignette for the mathematical background: RShowDoc("cfda", package = "cfda")
Extra parameters (...) for the integrate
function can be:
subdivisions the maximum number of subintervals.
rel.tol relative accuracy requested.
abs.tol absolute accuracy requested.
A list containing:
eigenvalues
eigenvalues
alpha
optimal encoding coefficients associated with each eigenvectors
pc
principal components
F
matrix containing the
V
matrix containing the
G
covariance matrix of V
basisobj
basisobj
input parameter
pt
output of estimate_pt
function
bootstrap
Only if computeCI = TRUE
. Output of every bootstrap run
varAlpha
Only if computeCI = TRUE
. Variance of alpha parameters
runTime
Total elapsed time
Cristian Preda, Quentin Grimonprez
Deville J.C. (1982) Analyse de données chronologiques qualitatives : comment analyser des calendriers ?, Annales de l'INSEE, No 45, p. 45-104.
Deville J.C. et Saporta G. (1980) Analyse harmonique qualitative, DIDAY et al. (editors), Data Analysis and Informatics, North Holland, p. 375-389.
Saporta G. (1981) Méthodes exploratoires d'analyse de données temporelles, Cahiers du B.U.R.O, Université Pierre et Marie Curie, 37-38, Paris.
Preda C, Grimonprez Q, Vandewalle V. Categorical Functional Data Analysis. The cfda R Package. Mathematics. 2021; 9(23):3074. https://doi.org/10.3390/math9233074
link{plot.fmca}
link{print.fmca}
link{summary.fmca}
link{plotComponent}
link{get_encoding}
Other encoding functions:
get_encoding()
,
plot.fmca()
,
plotComponent()
,
plotEigenvalues()
,
predict.fmca()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 5 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 5 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) summary(encoding) # plot the optimal encoding plot(encoding) # plot the two first components plotComponent(encoding, comp = c(1, 2)) # extract the optimal encoding get_encoding(encoding, harm = 1)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 5 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 5 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) summary(encoding) # plot the optimal encoding plot(encoding) # plot the two first components plotComponent(encoding, comp = c(1, 2)) # extract the optimal encoding get_encoding(encoding, harm = 1)
For each individual, compute the time spent in each state
compute_time_spent(data)
compute_time_spent(data)
data |
data.frame containing |
a matrix with K
columns containing the total time spent in each state for each individual
Cristian Preda, Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) # compute time spent by each id in each state timeSpent <- compute_time_spent(d_JK2)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) # compute time spent by each id in each state timeSpent <- compute_time_spent(d_JK2)
Convert data to categorical functional data
convertToCfd( x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, times = NULL, idLabels = NULL, nx = 200, byrow = FALSE )
convertToCfd( x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, times = NULL, idLabels = NULL, nx = 200, byrow = FALSE )
x |
matrix or fd object |
breaks |
either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut. |
labels |
labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor. |
include.lowest |
logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for right = FALSE) ‘breaks’ value should be included. |
right |
logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa. |
times |
vector containing values at which |
idLabels |
vector containing id labels. If NULL it use the names found in the matrix or fd object |
nx |
Only if |
byrow |
Only if |
a data.frame in the cfda format
Other format:
cut_data()
,
matrixToCfd()
,
remove_duplicated_states()
# fd object data("CanadianWeather") temp <- CanadianWeather$dailyAv[, , "Temperature.C"] basis <- create.bspline.basis(c(1, 365), nbasis = 8, norder = 4) fd <- smooth.basis(1:365, temp, basis)$fd # "Very Cold" = [-50:-10), "Cold" = [-10:0), ... out <- convertToCfd(fd, breaks = c(-50, -10, 0, 10, 20, 50), labels = c("Very Cold", "Cold", "Fresh", "OK", "Hot"), times = 1:365 ) # matrix out2 <- convertToCfd(temp, breaks = c(-50, -10, 0, 10, 20, 50), labels = c("Very Cold", "Cold", "Fresh", "OK", "Hot"), times = 1:365, byrow = FALSE )
# fd object data("CanadianWeather") temp <- CanadianWeather$dailyAv[, , "Temperature.C"] basis <- create.bspline.basis(c(1, 365), nbasis = 8, norder = 4) fd <- smooth.basis(1:365, temp, basis)$fd # "Very Cold" = [-50:-10), "Cold" = [-10:0), ... out <- convertToCfd(fd, breaks = c(-50, -10, 0, 10, 20, 50), labels = c("Very Cold", "Cold", "Fresh", "OK", "Hot"), times = 1:365 ) # matrix out2 <- convertToCfd(temp, breaks = c(-50, -10, 0, 10, 20, 50), labels = c("Very Cold", "Cold", "Fresh", "OK", "Hot"), times = 1:365, byrow = FALSE )
Cut data to a maximal given time
cut_data( data, Tmax, prolongLastState = "all", NAstate = "Not observed", warning = FALSE )
cut_data( data, Tmax, prolongLastState = "all", NAstate = "Not observed", warning = FALSE )
data |
data.frame containing |
Tmax |
max time considered |
prolongLastState |
list of states to prolong (can be "all"). In the case where the last state of a trajectory is
lesser than |
NAstate |
state value used when the last state is not prolonged. |
warning |
if TRUE, the function raises warnings when it has prolonged a trajectory with NAstate |
a data.frame with the same format as data
where each individual has Tmax
as last time entry.
Cristian Preda
Other format:
convertToCfd()
,
matrixToCfd()
,
remove_duplicated_states()
# Simulate the Jukes-Cantor model of nucleotide replacement set.seed(42) K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) tail(d_JK) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) tail(d_JK2) # do not prolong any state try(d_JK2 <- cut_data(d_JK, Tmax = 12, prolongLastState = c()))
# Simulate the Jukes-Cantor model of nucleotide replacement set.seed(42) K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) tail(d_JK) # cut at Tmax = 8 d_JK2 <- cut_data(d_JK, Tmax = 8) tail(d_JK2) # do not prolong any state try(d_JK2 <- cut_data(d_JK, Tmax = 12, prolongLastState = c()))
Calculates crude initial values for transition intensities by assuming that the data represent the exact transition times of the Markov process.
estimate_Markov(data)
estimate_Markov(data)
data |
data.frame containing |
list of two elements: Q
, the estimated transition matrix, and lambda
,
the estimated time spent in each state
Cristian Preda
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # estimation mark <- estimate_Markov(d_JK) mark$P mark$lambda
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # estimation mark <- estimate_Markov(d_JK) mark$P mark$lambda
Estimate probabilities to be in each state
estimate_pt(data, NAafterTmax = FALSE, timeValues = NULL)
estimate_pt(data, NAafterTmax = FALSE, timeValues = NULL)
data |
data.frame containing |
NAafterTmax |
if TRUE, return NA if t > Tmax otherwise return the state associated with Tmax (useful when individuals has different lengths) |
timeValues |
time values at which probabilities are computed, if NULL, unique(data$time) are used |
A list of two elements:
t: vector of time
pt: a matrix with K (= number of states) rows and with length(t)
columns containing the
probabilities to be in each state at each time.
Cristian Preda, Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) d_JK2 <- cut_data(d_JK, 10) # estimate probabilities estimate_pt(d_JK2)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) d_JK2 <- cut_data(d_JK, 10) # estimate probabilities estimate_pt(d_JK2)
Resistance of dough during the kneading process
data(flours)
data(flours)
flours
is a list of 3 elements:
data
A matrix of size 241*115 containing the resistance of dough (measured every 2s) during the kneading
process. One dough batch = 1 column
quality
Quality of cookies baked with the associated dough (1=Good, 2=Medium, 3=Bad)
time
time values
data(flours) matplot(flours$time, flours$data, col = flours$quality, type = "l", lty = 1) # convert to categorical data flours_cfd <- convertToCfd(flours$data, breaks = c(-Inf, 150, 300, 450, 600, Inf), times = flours$time ) plotData(flours_cfd, group = flours$quality) # convert to categorical data after projecting in a basis of functions basis <- create.bspline.basis(c(0, 480), nbasis = 10) flours_fd <- Data2fd(flours$time, flours$data, basis) plot(flours_fd) flours_cfd2 <- convertToCfd(flours_fd, breaks = c(-Inf, 150, 300, 450, 600, Inf)) plotData(flours_cfd2, group = flours$quality)
data(flours) matplot(flours$time, flours$data, col = flours$quality, type = "l", lty = 1) # convert to categorical data flours_cfd <- convertToCfd(flours$data, breaks = c(-Inf, 150, 300, 450, 600, Inf), times = flours$time ) plotData(flours_cfd, group = flours$quality) # convert to categorical data after projecting in a basis of functions basis <- create.bspline.basis(c(0, 480), nbasis = 10) flours_fd <- Data2fd(flours$time, flours$data, basis) plot(flours_fd) flours_cfd2 <- convertToCfd(flours_fd, breaks = c(-Inf, 150, 300, 450, 600, Inf)) plotData(flours_cfd2, group = flours$quality)
Generate individuals such that each individual starts at time 0 with state 0 and then an unique change
to state 1 occurs at a time generated using an uniform law between 0 and 1.
generate_2State(n)
generate_2State(n)
n |
number of individuals |
a data.frame with 3 columns: id
, id of the trajectory, time
, time at which a change occurs and
state
, new state.
Cristian Preda, Quentin Grimonprez
Simulate individuals from a Markov process defined by a transition matrix, time spent in each time and initial probabilities.
generate_Markov( n = 5, K = 2, P = (1 - diag(K))/(K - 1), lambda = rep(1, K), pi0 = c(1, rep(0, K - 1)), Tmax = 1, labels = NULL )
generate_Markov( n = 5, K = 2, P = (1 - diag(K))/(K - 1), lambda = rep(1, K), pi0 = c(1, rep(0, K - 1)), Tmax = 1, labels = NULL )
n |
number of trajectories to generate |
K |
number of states |
P |
matrix containing the transition probabilities from one state to another. Each row contains positive reals summing to 1. |
lambda |
time spent in each state |
pi0 |
initial distribution of states |
Tmax |
maximal duration of trajectories |
labels |
state names. If |
For one individual, assuming the current state is at time
,
the next state and time is simulated as follows:
generate one sample, , of an exponential law of parameter
lambda[s_j]
define the next time values as:
generate the new state using a multinomial law with probabilities
Q[s_j,]
a data.frame with 3 columns: id
, id of the trajectory, time
,
time at which a change occurs and state
, new state.
Cristian Preda
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10, labels = c("A", "C", "G", "T") ) head(d_JK)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10, labels = c("A", "C", "G", "T") ) head(d_JK)
Extract the encoding as an fd
object or as a matrix
get_encoding(x, harm = 1, fdObject = FALSE, nx = NULL)
get_encoding(x, harm = 1, fdObject = FALSE, nx = NULL)
x |
Output of |
harm |
harmonic to use for the encoding |
fdObject |
If TRUE returns a |
nx |
(Only if |
The encoding is .
a fd
object or a list of two elements y
, a matrix with nx
rows containing
the encoding of the state and x
, the vector with time values.
Cristian Preda
Other encoding functions:
compute_optimal_encoding()
,
plot.fmca()
,
plotComponent()
,
plotEigenvalues()
,
predict.fmca()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # extract the encoding using 1 harmonic encodFd <- get_encoding(encoding, fdObject = TRUE) encodMat <- get_encoding(encoding, nx = 200)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # extract the encoding using 1 harmonic encodFd <- get_encoding(encoding, fdObject = TRUE) encodMat <- get_encoding(encoding, nx = 200)
Extract the state of each individual at a given time
get_state(data, t, NAafterTmax = FALSE)
get_state(data, t, NAafterTmax = FALSE)
data |
data.frame containing |
t |
time at which extract the state |
NAafterTmax |
if TRUE, return NA if t > Tmax otherwise return the state associated with Tmax (useful when individuals has different lengths) |
a vector containing the state of each individual at time t
Cristian Preda, Quentin Grimonprez
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # get the state of each individual at time t = 6 get_state(d_JK, 6) # get the state of each individual at time t = 12 (> Tmax) get_state(d_JK, 12) # if NAafterTmax = TRUE, it will return NA for t > Tmax get_state(d_JK, 12, NAafterTmax = TRUE)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # get the state of each individual at time t = 6 get_state(d_JK, 6) # get the state of each individual at time t = 12 (> Tmax) get_state(d_JK, 12) # if NAafterTmax = TRUE, it will return NA for t > Tmax get_state(d_JK, 12, NAafterTmax = TRUE)
Plot the duration
## S3 method for class 'duration' hist(x, breaks = NULL, ...)
## S3 method for class 'duration' hist(x, breaks = NULL, ...)
x |
output of |
breaks |
number of breaks. If not given, use the Sturges rule |
... |
parameters for |
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute duration of each individual duration <- compute_duration(d_JK) hist(duration) # modify the plot using ggplot2 library(ggplot2) hist(duration) + labs(title = "Distribution of the duration")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # compute duration of each individual duration <- compute_duration(d_JK) hist(duration) # modify the plot using ggplot2 library(ggplot2) hist(duration) + labs(title = "Distribution of the duration")
Plot the number of jumps
## S3 method for class 'njump' hist(x, breaks = NULL, ...)
## S3 method for class 'njump' hist(x, breaks = NULL, ...)
x |
output of |
breaks |
number of breaks. If not given, use the Sturges rule |
... |
parameters for |
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
plot.pt()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) nJump <- compute_number_jumps(d_JK) hist(nJump) # modify the plot using ggplot2 library(ggplot2) hist(nJump, fill = "#984EA3") + labs(title = "Distribution of the number of jumps")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) nJump <- compute_number_jumps(d_JK) hist(nJump) # modify the plot using ggplot2 library(ggplot2) hist(nJump, fill = "#984EA3") + labs(title = "Distribution of the number of jumps")
Convert a matrix to a cfda data.frame
matrixToCfd(X, times = NULL, labels = NULL, byrow = FALSE)
matrixToCfd(X, times = NULL, labels = NULL, byrow = FALSE)
X |
matrix containing the states |
times |
time values. If |
labels |
id labels. If |
byrow |
if |
a data.frame in the cfda format
Other format:
convertToCfd()
,
cut_data()
,
remove_duplicated_states()
x <- matrix( c( "a", "b", "c", "c", "c", "a", "a", "a", "b", "c", "a", "b" ), ncol = 4, byrow = TRUE, dimnames = list(NULL, paste0("ind", 1:4)) ) matrixToCfd(x)
x <- matrix( c( "a", "b", "c", "c", "c", "a", "a", "a", "b", "c", "a", "b" ), ncol = 4, byrow = TRUE, dimnames = list(NULL, paste0("ind", 1:4)) ) matrixToCfd(x)
Plot the optimal encoding
## S3 method for class 'fmca' plot( x, harm = 1, states = NULL, addCI = FALSE, coeff = 1.96, col = NULL, nx = 128, ... )
## S3 method for class 'fmca' plot( x, harm = 1, states = NULL, addCI = FALSE, coeff = 1.96, col = NULL, nx = 128, ... )
x |
output of |
harm |
harmonic to use for the encoding |
states |
states to plot (default = NULL, it plots all states) |
addCI |
if TRUE, plot confidence interval (only when |
coeff |
the confidence interval is computed with +- coeff * the standard deviation |
col |
a vector containing color for each state |
nx |
number of time points used to plot |
... |
not used |
The encoding for the harmonic h
is .
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plotComponent()
,
plotEigenvalues()
,
predict.fmca()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # plot the encoding produced by the first harmonic plot(encoding) # modify the plot using ggplot2 library(ggplot2) plot(encoding, harm = 2, col = c("red", "blue", "darkgreen", "yellow")) + labs(title = "Optimal encoding")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # plot the encoding produced by the first harmonic plot(encoding) # modify the plot using ggplot2 library(ggplot2) plot(encoding, harm = 2, col = c("red", "blue", "darkgreen", "yellow")) + labs(title = "Optimal encoding")
Plot the transition graph between the different states. A node corresponds to a state with the mean time spent in this state. Each arrow represents the probability of transition between states.
## S3 method for class 'Markov' plot(x, ...)
## S3 method for class 'Markov' plot(x, ...)
x |
output of |
... |
parameters of |
Some useful extra parameters:
main
main title.
dtext
controls the position of arrow text relative to arrowhead (default = 0.3).
relsize
scaling factor for size of the graph (default = 1).
box.size
size of label box, one value or a vector with dimension = number of rows of x$P
.
box.cex
relative size of text in boxes, one value or a vector with dimension=number of rows of x$P
.
arr.pos
relative position of arrowhead on arrow segment/curve.
No return value, called for side effects
Cristian Preda
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # estimation mark <- estimate_Markov(d_JK) # transition graph plot(mark)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 100, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # estimation mark <- estimate_Markov(d_JK) # transition graph plot(mark)
Plot the probabilities of each state at each given time
## S3 method for class 'pt' plot(x, col = NULL, ribbon = FALSE, ...)
## S3 method for class 'pt' plot(x, col = NULL, ribbon = FALSE, ...)
x |
output of |
col |
a vector containing color for each state |
ribbon |
if TRUE, use ribbon to plot probabilities |
... |
only if |
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plotData()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) d_JK2 <- cut_data(d_JK, 10) pt <- estimate_pt(d_JK2) plot(pt, ribbon = TRUE)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) d_JK2 <- cut_data(d_JK, 10) pt <- estimate_pt(d_JK2) plot(pt, ribbon = TRUE)
Plot Components
plotComponent( x, comp = c(1, 2), addNames = TRUE, nudge_x = 0.1, nudge_y = 0.1, size = 4, ... )
plotComponent( x, comp = c(1, 2), addNames = TRUE, nudge_x = 0.1, nudge_y = 0.1, size = 4, ... )
x |
output of |
comp |
a vector of two elements indicating the components to plot |
addNames |
if TRUE, add the id labels on the plot |
nudge_x , nudge_y
|
horizontal and vertical adjustment to nudge labels by |
size |
size of labels |
... |
|
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plot.fmca()
,
plotEigenvalues()
,
predict.fmca()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) plotComponent(encoding, comp = c(1, 2)) # modify the plot using ggplot2 library(ggplot2) plotComponent(encoding, comp = c(1, 2), shape = 23) + labs(title = "Two first components")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) plotComponent(encoding, comp = c(1, 2)) # modify the plot using ggplot2 library(ggplot2) plotComponent(encoding, comp = c(1, 2), shape = 23) + labs(title = "Two first components")
Plot categorical functional data
plotData( data, group = NULL, col = NULL, addId = TRUE, addBorder = TRUE, sort = FALSE, nCol = NULL )
plotData( data, group = NULL, col = NULL, addId = TRUE, addBorder = TRUE, sort = FALSE, nCol = NULL )
data |
data.frame containing |
group |
vector, of the same length as the number individuals of |
col |
a vector containing color for each state (can be named) |
addId |
If TRUE, add id labels |
addBorder |
If TRUE, add black border to each individual |
sort |
If TRUE, id are sorted according to the duration in their first state |
nCol |
number of columns when |
a ggplot
object that can be modified using ggplot2
package.
On the plot, each row represents an individual over [0:Tmax].
The color at a given time gives the state of the individual.
Cristian Preda, Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
statetable()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # add a line with time Tmax at the end of each individual d_JKT <- cut_data(d_JK, Tmax = 10) plotData(d_JKT) # modify the plot using ggplot2 library(ggplot2) plotData(d_JKT, col = c("red", "blue", "green", "brown")) + labs(title = "Trajectories of a Markov process") # use the group variable: create a group with the 3 first variables and one with the others group <- rep(1:2, c(3, 7)) plotData(d_JKT, group = group) # use the group variable: remove the id number 5 and 6 group[c(5, 6)] <- NA plotData(d_JKT, group = group)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # add a line with time Tmax at the end of each individual d_JKT <- cut_data(d_JK, Tmax = 10) plotData(d_JKT) # modify the plot using ggplot2 library(ggplot2) plotData(d_JKT, col = c("red", "blue", "green", "brown")) + labs(title = "Trajectories of a Markov process") # use the group variable: create a group with the 3 first variables and one with the others group <- rep(1:2, c(3, 7)) plotData(d_JKT, group = group) # use the group variable: remove the id number 5 and 6 group[c(5, 6)] <- NA plotData(d_JKT, group = group)
Plot Eigenvalues
plotEigenvalues(x, cumulative = FALSE, normalize = FALSE, ...)
plotEigenvalues(x, cumulative = FALSE, normalize = FALSE, ...)
x |
output of |
cumulative |
if TRUE, plot the cumulative eigenvalues |
normalize |
if TRUE eigenvalues are normalized for summing to 1 |
... |
|
a ggplot
object that can be modified using ggplot2
package.
Quentin Grimonprez
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plot.fmca()
,
plotComponent()
,
predict.fmca()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # plot eigenvalues plotEigenvalues(encoding, cumulative = TRUE, normalize = TRUE) # modify the plot using ggplot2 library(ggplot2) plotEigenvalues(encoding, shape = 23) + labs(caption = "Jukes-Cantor model of nucleotide replacement")
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # plot eigenvalues plotEigenvalues(encoding, cumulative = TRUE, normalize = TRUE) # modify the plot using ggplot2 library(ggplot2) plotEigenvalues(encoding, shape = 23) + labs(caption = "Jukes-Cantor model of nucleotide replacement")
Plot reconstructed indicators
plotIndicatorsReconstruction(reconstruction, id, states = NULL)
plotIndicatorsReconstruction(reconstruction, id, states = NULL)
reconstruction |
output of |
id |
id of the individual to plot. |
states |
states to plot, by default all states are plotted |
ggplot
Quentin Grimonprez
set.seed(42) # Simulate the Jukes-Cantor model of nucleotide replacement K <- 3 Tmax <- 1 d_JK <- generate_Markov(n = 100, K = K, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 20 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) indicators <- reconstructIndicators(encoding) # we plot the first path and its reconstructed indicators iInd <- 3 plotData(d_JK2[d_JK2$id == iInd, ]) plotIndicatorsReconstruction(indicators, id = iInd) # the column state contains the state associated with the greatest indicator. # So, the output can be used with plotData function plotData(remove_duplicated_states(indicators[indicators$id == iInd, ]))
set.seed(42) # Simulate the Jukes-Cantor model of nucleotide replacement K <- 3 Tmax <- 1 d_JK <- generate_Markov(n = 100, K = K, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 20 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) indicators <- reconstructIndicators(encoding) # we plot the first path and its reconstructed indicators iInd <- 3 plotData(d_JK2[d_JK2$id == iInd, ]) plotIndicatorsReconstruction(indicators, id = iInd) # the column state contains the state associated with the greatest indicator. # So, the output can be used with plotData function plotData(remove_duplicated_states(indicators[indicators$id == iInd, ]))
Predict the principal components for new trajectories
## S3 method for class 'fmca' predict( object, newdata = NULL, method = c("precompute", "parallel"), verbose = TRUE, nCores = max(1, ceiling(detectCores()/2)), ... )
## S3 method for class 'fmca' predict( object, newdata = NULL, method = c("precompute", "parallel"), verbose = TRUE, nCores = max(1, ceiling(detectCores()/2)), ... )
object |
output of |
newdata |
data.frame containing |
method |
computation method: "parallel" or "precompute": precompute all integrals (efficient when the number of unique time values is low) |
verbose |
if TRUE print some information |
nCores |
number of cores used for parallelization (only if method == "parallel"). Default is half the cores. |
... |
parameters for |
principal components for the individuals
Quentin Grimonprez
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plot.fmca()
,
plotComponent()
,
plotEigenvalues()
,
print.fmca()
,
summary.fmca()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # predict principal components d_JK_predict <- generate_Markov( n = 5, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK_predict2 <- cut_data(d_JK, Tmax) pc <- predict(encoding, d_JK_predict2, nCores = 1)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 Tmax <- 6 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov( n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 6 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) # predict principal components d_JK_predict <- generate_Markov( n = 5, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax, labels = c("A", "C", "G", "T") ) d_JK_predict2 <- cut_data(d_JK, Tmax) pc <- predict(encoding, d_JK_predict2, nCores = 1)
fmca
objectPrint a fmca
object
## S3 method for class 'fmca' print(x, n = 6, ...)
## S3 method for class 'fmca' print(x, n = 6, ...)
x |
|
n |
maximal number of rows and cols to print |
... |
Not used. |
No return value, called for side effects
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plot.fmca()
,
plotComponent()
,
plotEigenvalues()
,
predict.fmca()
,
summary.fmca()
The reconstruction formula is:
)
with , the i-th principal component,
encoding
and
reconstructIndicators( x, nComp = NULL, timeValues = NULL, propMinEigenvalues = 1e-04 )
reconstructIndicators( x, nComp = NULL, timeValues = NULL, propMinEigenvalues = 1e-04 )
x |
output of |
nComp |
number of components to use for the reconstruction. By default, all are used. |
timeValues |
vector containing time values at which compute the indicators. If NULL, the time values from the data |
propMinEigenvalues |
Only if nComp = NULL. Minimal proportion used to estimate the number of non-null eigenvalues |
a data.frame with columns: time, id, state1, ..., stateK, state. state1 contains the estimated indicator values for the first state. state contains the state with the maximum values of all indicators
Quentin Grimonprez
set.seed(42) # Simulate the Jukes-Cantor model of nucleotide replacement K <- 3 Tmax <- 1 d_JK <- generate_Markov(n = 100, K = K, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 20 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) indicators <- reconstructIndicators(encoding) # we plot the first path and its reconstructed indicators iInd <- 3 plotData(d_JK2[d_JK2$id == iInd, ]) plotIndicatorsReconstruction(indicators, id = iInd) # the column state contains the state associated with the greatest indicator. # So, the output can be used with plotData function plotData(remove_duplicated_states(indicators[indicators$id == iInd, ]))
set.seed(42) # Simulate the Jukes-Cantor model of nucleotide replacement K <- 3 Tmax <- 1 d_JK <- generate_Markov(n = 100, K = K, Tmax = Tmax) d_JK2 <- cut_data(d_JK, Tmax) # create basis object m <- 20 b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4) # compute encoding encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1) indicators <- reconstructIndicators(encoding) # we plot the first path and its reconstructed indicators iInd <- 3 plotData(d_JK2[d_JK2$id == iInd, ]) plotIndicatorsReconstruction(indicators, id = iInd) # the column state contains the state associated with the greatest indicator. # So, the output can be used with plotData function plotData(remove_duplicated_states(indicators[indicators$id == iInd, ]))
Remove duplicated consecutive states from data. If for an individual there is two or more consecutive states that are identical, only the first is kept. Only time when the state changes are kept.
remove_duplicated_states(data, keep.last = TRUE)
remove_duplicated_states(data, keep.last = TRUE)
data |
data.frame containing |
keep.last |
if TRUE, keep the last state for every individual even if it is a duplicated state. |
data
without duplicated consecutive states
Quentin Grimonprez
Other format:
convertToCfd()
,
cut_data()
,
matrixToCfd()
data <- data.frame( id = rep(1:3, c(10, 3, 8)), time = c(1:10, 1:3, 1:8), state = c(rep(1:5, each = 2), 1:3, rep(1:3, c(1, 6, 1))) ) out <- remove_duplicated_states(data)
data <- data.frame( id = rep(1:3, c(10, 3, 8)), time = c(1:10, 1:3, 1:8), state = c(rep(1:5, each = 2), 1:3, rep(1:3, c(1, 6, 1))) ) out <- remove_duplicated_states(data)
Calculates a frequency table counting the number of times each pair of states were observed in successive observation times.
statetable(data, removeDiagonal = FALSE)
statetable(data, removeDiagonal = FALSE)
data |
data.frame containing |
removeDiagonal |
if TRUE, does not count transition from a state i to i |
a matrix of size K*K
containing the number of transition for each pair
Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
summary_cfd()
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # table of transitions statetable(d_JK)
# Simulate the Jukes-Cantor model of nucleotide replacement K <- 4 PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K)) lambda_PJK <- c(1, 1, 1, 1) d_JK <- generate_Markov(n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = 10) # table of transitions statetable(d_JK)
Get a summary of the data.frame containing categorical functional data
summary_cfd(data, max.print = 10)
summary_cfd(data, max.print = 10)
data |
data.frame containing |
max.print |
maximal number of states to display |
a list containing:
nRow
number of rows
nInd
number of individuals
timeRange
minimal and maximal time value
uniqueStart
TRUE, if all individuals have the same time start value
uniqueEnd
TRUE, if all individuals have the same time start value
states
vector containing the different states
visit
number of individuals visiting each state
Quentin Grimonprez
Other Descriptive statistics:
boxplot.timeSpent()
,
compute_duration()
,
compute_number_jumps()
,
compute_time_spent()
,
estimate_pt()
,
hist.duration()
,
hist.njump()
,
plot.pt()
,
plotData()
,
statetable()
data(biofam2) summary_cfd(biofam2)
data(biofam2) summary_cfd(biofam2)
Summary of a fmca
object
## S3 method for class 'fmca' summary(object, n = 6, ...)
## S3 method for class 'fmca' summary(object, n = 6, ...)
object |
|
n |
maximal number of rows and cols to print |
... |
Not used. |
No return value, called for side effects
Other encoding functions:
compute_optimal_encoding()
,
get_encoding()
,
plot.fmca()
,
plotComponent()
,
plotEigenvalues()
,
predict.fmca()
,
print.fmca()