Ten overlapping CR1 cDNA clones that span 5.5 kb were isolated from a tonsillar library and sequenced in whole or in part. A single long open reading frame, beginning at the 5' end of the clones and extending 4.7 kb downstream to a stop codon, was identified. This sequence represents approximately 80% of the estimated 6 kb of coding sequence for the F allotype of CR1. Three tandem, direct, long homologous repeats (LHRs) of 450 amino acids each were identified. Analysis of the sequences of tryptic peptides provided evidence for a fourth LHR in the F allotype of CR1. Amino acid identity between the LHRs ranged from 70% between the first and third repeats to 99% between the NH2-terminal 250 amino acids of the first and second repeats. Each LHR comprises seven short consensus repeats (SCRs) of 60-70 amino acids that resemble the SCRs of other C3/C4 binding proteins, such as complement receptor type 2, factors B and H, C4 binding protein, and C2. Two additional SCRs join the LHRs to a single membrane-spanning domain of 25 amino acids; thus, the F allotype of CR1 probably contains at least 30 SCRs, 23 of which have been sequenced. Each SCR is predicted to form a triple loop structure in which the four conserved half-cystines form disulfide linkages. The linear alignment of 30 SCRs as a semi-rigid structure would extend 1,140 Å from the plasma membrane and might facilitate the interaction of CR1 with C3b and C4b located within the interstices of immune complexes and microbial cell walls. The COOH-terminal cytoplasmic domain of 43 residues contains a six-amino-acid sequence that is homologous to the sequence in the epidermal growth factor receptor that is phosphorylated by protein kinase C.
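
The SCR counts and derived figures quoted above can be checked with a short calculation. The following is a minimal sketch that uses only numbers stated in the abstract; the variable and function names are illustrative, and it assumes that the two membrane-proximal SCRs are among those sequenced, as the stated count of 23 implies.

```python
# Figures taken from the abstract; all names here are illustrative.
SCRS_PER_LHR = 7            # SCRs in each long homologous repeat
LINKER_SCRS = 2             # SCRs joining the LHRs to the membrane-spanning domain
LHRS_SEQUENCED = 3          # LHRs found in the cDNA clones
LHRS_PREDICTED = 4          # includes the fourth LHR inferred from tryptic peptides
EXTENDED_LENGTH_A = 1140    # predicted extension from the membrane, in angstroms
ORF_KB = 4.7                # open reading frame identified
CODING_KB = 6.0             # estimated coding sequence of the F allotype
LHR_LENGTH_AA = 450         # amino acids per LHR


def scr_count(n_lhrs: int) -> int:
    """SCRs contributed by n_lhrs LHRs plus the two membrane-proximal SCRs."""
    return n_lhrs * SCRS_PER_LHR + LINKER_SCRS


total_scrs = scr_count(LHRS_PREDICTED)          # predicted for the F allotype
sequenced_scrs = scr_count(LHRS_SEQUENCED)      # assumes linker SCRs were sequenced
length_per_scr = EXTENDED_LENGTH_A / total_scrs # implied angstroms per SCR
coding_fraction = ORF_KB / CODING_KB            # share of coding sequence covered
mean_scr_length = LHR_LENGTH_AA / SCRS_PER_LHR  # mean residues per SCR within an LHR

print(f"Predicted SCRs: {total_scrs}")
print(f"Sequenced SCRs: {sequenced_scrs}")
print(f"Implied length per SCR: {length_per_scr:.0f} A")
print(f"Coding sequence covered: {coding_fraction:.0%}")
print(f"Mean SCR length: {mean_scr_length:.0f} residues")
```

Under these assumptions the totals reproduce the 30 predicted and 23 sequenced SCRs, the extension of 1,140 Å corresponds to about 38 Å per SCR, and the mean SCR length of roughly 64 residues falls within the stated range of 60-70 amino acids.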