Computes (average) Identity-by-State for a set of people and markers

ibs {GenABEL}    R Documentation

Description

Given a set of SNPs, computes a matrix of average IBS for a group of people.

Usage

ibs(data, snpsubset, idsubset, cross.idsubset, weight="no", snpfreq)

Arguments

data object of snp.data-class
snpsubset Index, character or logical vector with subset of SNPs to run analysis on. If missing, all SNPs from data are used for analysis.
idsubset IDs of people to be analysed. If missing, all people from data are used for analysis.
cross.idsubset Parameter allowing parallel implementation. Not normally needed. If supplied together with idsubset, the IBS/kinship is computed for all pairs between idsubset and cross.idsubset.
weight "no" for direct IBS computations, "freq" to weight by allelic frequency
snpfreq When weight="freq" is used, fixed allele frequencies can be supplied here.

Details

This function facilitates quality control of genomic data. For example, a pair of people with extremely high IBS (close to 1) may be duplicated samples (or twins), while moderately high IBS values may indicate relatives.

When weight="freq" is used, IBS for a pair of people i and j is computed as

f_{i,j} = \frac{1}{N} \sum_k \frac{(x_{i,k} - p_k) (x_{j,k} - p_k)}{p_k (1 - p_k)}

where k runs from 1 to N, the number of SNPs genome-wide; x_{i,k} is the genotype of the ith person at the kth SNP, coded as 0, 1/2, 1; and p_k is the frequency of the "+" allele. The per-SNP terms are averaged over the N SNPs, which is consistent with the diagonal value 0.5*(1+inbreeding) given under Value. This apparently provides an unbiased estimate of the kinship coefficient.
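The weighted formula can be illustrated in base R without GenABEL. The following is only a minimal sketch, not the package's implementation: the toy matrix x, the complete (non-missing) genotypes, the estimation of p_k from the sample as column means, and the averaging of the per-SNP terms over N are all assumptions made for illustration.

```r
# Toy genotype matrix: 4 people (rows) x 5 SNPs (columns), coded 0, 1/2, 1.
x <- matrix(c(0,   0.5, 1,   0.5, 0,
              0,   0.5, 1,   0.5, 0,
              1,   0.5, 0,   0.5, 1,
              0.5, 1,   0.5, 0,   0.5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("id", 1:4), paste0("snp", 1:5)))

# With this coding, the column mean is the sample frequency of the "+" allele.
p <- colMeans(x)

# Drop monomorphic SNPs (p = 0 or 1), which would give a zero denominator.
keep <- p > 0 & p < 1
x <- x[, keep, drop = FALSE]
p <- p[keep]

# Centre each SNP by p_k and scale by sqrt(p_k * (1 - p_k)).
z <- sweep(x, 2, p, "-")
z <- sweep(z, 2, sqrt(p * (1 - p)), "/")

# Average the per-SNP products over the N SNPs for every pair of people.
f <- tcrossprod(z) / ncol(z)
round(f, 3)
```

In this toy data id1 and id2 carry identical genotypes, so f["id1", "id2"] equals the diagonal entry f["id1", "id1"], which is the pattern expected for duplicated samples.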

Monomorphic SNPs are regarded as non-informative only when the "freq" option is used.

ibs() may take a long time to run for a large number of people.

Value

An (Npeople x Npeople) matrix giving the average IBS (kinship) value for each pair below the diagonal, and the number of SNP genotypes measured for both members of the pair above the diagonal.

On the diagonal, homozygosity 0.5*(1+inbreeding) is provided.

attr(computedobject, "Var") returns the variance (replacing the diagonal when the object is used by egscore).
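Because the returned matrix mixes IBS values (below the diagonal) with SNP counts (above it), it can help to unpack it into one row per pair before screening for suspicious pairs. The sketch below uses a hand-made 3 x 3 matrix in place of a real ibs() result; the values, the names pair_table, person_a, person_b and n_snps, and the 0.95 cut-off are hypothetical choices for illustration.

```r
# Toy stand-in for an ibs() result: IBS below the diagonal, number of
# jointly genotyped SNPs above it, homozygosity on the diagonal.
ids <- c("id1", "id2", "id3")
a <- matrix(c(0.70, 1000,  990,
              0.99, 0.72,  995,
              0.75, 0.74, 0.71),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Row/column positions of the cells below the diagonal.
low <- which(lower.tri(a), arr.ind = TRUE)

pair_table <- data.frame(
  person_a = rownames(a)[low[, "row"]],
  person_b = colnames(a)[low[, "col"]],
  ibs      = a[low],
  # The matching SNP count sits in the mirrored cell above the diagonal.
  n_snps   = a[low[, c("col", "row"), drop = FALSE]],
  stringsAsFactors = FALSE
)

# Hypothetical cut-off for "extremely high" IBS; tune it for real data.
threshold <- 0.95
suspect <- pair_table[pair_table$ibs > threshold, ]
suspect
```

Keeping n_snps next to each IBS value makes it easy to discount pairs whose IBS rests on only a few jointly genotyped markers.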

Author(s)

Yurii Aulchenko

See Also

check.marker, summary.snp.data, snp.data-class

Examples

data(ge03d2c)
# compute IBS based on a random sample of 1000 autosomal markers
a <- ibs(ge03d2c,snpsubset=sample(autosomal(ge03d2c),1000,replace=FALSE))
a[1:5,1:5]
mds <- cmdscale(as.dist(1-a))
plot(mds)
# identify smaller cluster of outliers
km <- kmeans(mds,centers=2,nstart=1000)
cl1 <- names(which(km$cluster==1))
cl2 <- names(which(km$cluster==2))
if (length(cl1) > length(cl2)) cl1 <- cl2
cl1
# PAINT THE OUTLIERS IN RED
points(mds[cl1,],pch=19,col="red")
# compute genomic kinship matrix to be used with e.g. polygenic, mmscore, etc
a <- ibs(ge03d2c,snpsubset=sample(autosomal(ge03d2c),1000,replace=FALSE),weight="freq")
a[1:5,1:5]
# now replace the diagonal with an EIGENSTRAT-type diagonal to be used for egscore
diag(a) <- hom(ge03d2c[,autosomal(ge03d2c)])$Var
a[1:5,1:5]

[Package GenABEL version 1.6-7 Index]