Convert genotypic data in transposed-ped format (.tped and .tfam) to the internal genotypic data format

convert.snp.tped {GenABEL}    R Documentation

Description

Converts genotypic data in transposed-ped format (.tped and .tfam) to a file in the internal genotypic data format.

Usage

convert.snp.tped(tpedfile, tfamfile, outfile, strand = "+", bcast = 10000)

Arguments

tpedfile Name of transposed-ped format (.tped) file to read
tfamfile Name of individual data (.tfam) file to read
outfile Name for output data file
strand Specification of strand, one of "u" (unknown), "+", "-" or "file". In the latter case, an extra column specifying the strand (again, one of "u", "+", or "-") should be included in the input file.
bcast Progress is reported each time this number of SNPs has been read

Details

The transposed-ped file format may be preferred when extremely large numbers of markers have been genotyped. This file format is supported by PLINK; see http://pngu.mgh.harvard.edu/~purcell/plink/ for details.

The conversion is performed by C++ code that is both fast and memory efficient.

The genotype data are stored in the main transposed-ped format file, usually with a .tped file extension. If there are NSNP markers genotyped in NIND individuals, this file has NSNP rows and 4+NIND*2 columns. There is one row per marker, and no header. The first four columns are:

Chromosome

Marker name (e.g. rs number)

Genetic position (in Morgans)

Physical position (in bp)

These are followed by two columns per individual, which contain the genotype, coded as two characters. The ‘0’ character is used for missing data. For example, a file containing data for six individuals genotyped at two SNPs would look like:

1 rs1234 0 5000650 A A 0 0 C C A C C C C C

1 rs5678 0 5000830 G T G T G G T T G T T T

In this example, the second individual is missing data for SNP rs1234, etc. The alleles can be coded by any two distinct characters, e.g. 'C' and 'G', or '1' and '2'. The '0' character is reserved for missing data, and each individual genotype must be either complete, or completely missing. In the current implementation, only the physical positions of the SNPs are read, and the genetic positions are ignored.
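The layout described above can be checked with a few lines of base R before running the conversion. This is only a sketch: it writes the two example rows to a temporary file, confirms the 4+NIND*2 column count, and looks for genotypes that are half missing. The variable names are illustrative and nothing here is part of GenABEL.

```r
# Write the two-SNP, six-individual example to a temporary .tped file.
tped_lines <- c(
  "1 rs1234 0 5000650 A A 0 0 C C A C C C C C",
  "1 rs5678 0 5000830 G T G T G G T T G T T T"
)
tped_path <- tempfile(fileext = ".tped")
writeLines(tped_lines, tped_path)

# Read everything as character so that the allele "T" is not turned into
# the logical TRUE and the missing code "0" stays a string.
tped <- read.table(tped_path, header = FALSE, colClasses = "character")

n_snp <- nrow(tped)
n_ind <- (ncol(tped) - 4) / 2   # 4 leading columns, then 2 per individual

# Separate the first and second allele of each individual.
alleles <- as.matrix(tped[, -(1:4)])
first   <- alleles[, seq(1, 2 * n_ind, by = 2), drop = FALSE]
second  <- alleles[, seq(2, 2 * n_ind, by = 2), drop = FALSE]

# A genotype must be complete or completely missing ("0 0").
half_missing <- xor(first == "0", second == "0")
missing_gt   <- first == "0" & second == "0"

if (any(half_missing)) stop("Found genotypes with only one allele missing")
```

In this sketch, missing_gt is a logical matrix with one row per SNP and one column per individual, so which(missing_gt, arr.ind = TRUE) lists the missing genotypes.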

The individual IDs that label the genotype columns are stored in a separate file, usually with a .tfam file extension. Traditionally, this file has six columns and no header, with one row per individual. In the current implementation, only the second column is used. This column must contain the individual ID. The other columns are ignored.
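As an illustration, a matching .tfam file for six individuals could be prepared and inspected as follows. The sketch assumes the usual PLINK column order (family ID, individual ID, father ID, mother ID, sex, phenotype); the IDs and the other values are made-up placeholders.

```r
# Hypothetical six-column .tfam content; only column 2 (individual ID)
# is used by the conversion according to the text above.
ids_in <- sprintf("id%d", 1:6)
tfam_lines <- paste("fam1", ids_in, "0", "0", "1", "-9")
tfam_path <- tempfile(fileext = ".tfam")
writeLines(tfam_lines, tfam_path)

# Read as character so that IDs such as "007" keep their leading zeros.
tfam <- read.table(tfam_path, header = FALSE, colClasses = "character")
ids  <- tfam[[2]]

# Basic sanity checks: six columns and no repeated individual IDs.
has_six_columns <- ncol(tfam) == 6
ids_unique      <- anyDuplicated(ids) == 0
```

The number of rows in this file should equal NIND, that is, half the number of genotype columns in the .tped file.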

Value

Does not return any value; the converted data are written to outfile.

Note

The function does not check whether "outfile" already exists; an existing file of that name is always overwritten.
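Because an existing file is overwritten without warning, callers may want to check for it first. The wrapper below is a minimal sketch of such a guard; safe_convert_tped and its overwrite argument are hypothetical and not part of GenABEL.

```r
# Hypothetical helper (not part of GenABEL): refuse to overwrite an
# existing output file unless overwrite = TRUE is given.
safe_convert_tped <- function(tpedfile, tfamfile, outfile,
                              overwrite = FALSE, ...) {
  if (file.exists(outfile) && !overwrite) {
    stop("Output file already exists: ", outfile)
  }
  if (!requireNamespace("GenABEL", quietly = TRUE)) {
    stop("Package GenABEL is not installed")
  }
  # Further arguments such as strand and bcast are passed through.
  GenABEL::convert.snp.tped(tpedfile = tpedfile, tfamfile = tfamfile,
                            outfile = outfile, ...)
}
```

A call such as safe_convert_tped("c21.tped", "c21.tfam", "c21.raw") then stops with an error if "c21.raw" is already present.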

Author(s)

Toby Johnson <toby.johnson@unil.ch>

See Also

convert.snp.ped, convert.snp.illumina, convert.snp.text, convert.snp.mach, load.gwaa.data

Examples

#
# convert.snp.tped(tpedfile = "c21.tped", tfamfile = "c21.tfam", outfile = "c21.raw")
#

[Package GenABEL version 1.6-7 Index]