# Problem
# For positive integers a and n, a modulo n (written amodn in shorthand) is the remainder when a is divided by n. For example, 29mod11=7 because 29=11×2+7.
# Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that a and b are congruent modulo n if amodn=bmodn; in this case, we use the notation a≡bmodn.
# Two useful facts in modular arithmetic are that if a≡bmodn and c≡dmodn, then a+c≡b+dmodn and a×c≡b×dmodn. To check your understanding of these rules, you may wish to verify these relationships for a=29, b=73, c=10, d=32, and n=11.
# As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.
#
# Given: A protein string of length at most 1000 aa.
# Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)
#
# Sample Dataset
# MA
# Sample Output
# 12
# 這道題目是關(guān)于模運算和蛋白質(zhì)翻譯的問題。給定一個不超過1000個氨基酸(字母“F”,“L”,“I”,“M”,“V”,“S”,“P”,“T”,“A”,“Y”,“H”,“Q”,“N”,“K”,“D”,“E”,“C”,“W”,“R”,“G”的字符串),求可以翻譯成該蛋白質(zhì)的不同RNA序列數(shù)目,答案對1,000,000取模。
# 簡單介紹一下蛋白質(zhì)翻譯。RNA是由四種堿基(腺嘌呤、尿嘧啶、胞嘧啶和鳥嘌呤)組成的。氨基酸則由RNA上的三個堿基編碼而來,其中有一個“起始密碼子”(AUG),還有三個“停止密碼子”(UAA,UAG和UGA),分別表示蛋白質(zhì)鏈的開端和結(jié)束。因此,一個蛋白質(zhì)可以用RAN序列編碼,一個RNA序列也可能被翻譯成不同的蛋白質(zhì)。
# 這里需要用到模運算。模運算是一種基本的整數(shù)運算,它給出了除法的余數(shù)。例如a mod n(記作amodn)表示a除以n的余數(shù)。當(dāng)比較大的整數(shù)進行計算時,可以使用模運算來簡化問題。如果兩個整數(shù)在模n時相等,則它們被稱為模n同余。
# 這道題目的思路如下:
# 準(zhǔn)備好DNA中20個氨基酸對應(yīng)的RNA序列表格;
# 對于給定的蛋白質(zhì)序列,根據(jù)RNA序列表格找到所有可能的RNA序列組合;
# 注意考慮到終止密碼子;
# 對于所有可能的RNA序列,計算它們的總數(shù),并將結(jié)果對1,000,000取模。
# 計算RNA序列數(shù)
def count_rna_strings(protein_string):
? ? # 翻譯表
? ? codon_table = {
? ? ? ? "UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",
? ? ? ? "UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",
? ? ? ? "UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",
? ? ? ? "UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",
? ? ? ? "UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",
? ? ? ? "UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",
? ? ? ? "UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",
? ? ? ? "UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",
? ? ? ? "UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",
? ? ? ? "UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",
? ? ? ? "UAA": "*", "CAA": "Q", "AAA": "K", "GAA": "E",
? ? ? ? "UAG": "*", "CAG": "Q", "AAG": "K", "GAG": "E",
? ? ? ? "UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",
? ? ? ? "UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",
? ? ? ? "UGA": "*", "CGA": "R", "AGA": "R", "GGA": "G",
? ? ? ? "UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"
? ? }
? ? # 統(tǒng)計RNA序列數(shù)量
? ? rna_count = 1? # 起始為1,包含終止密碼子
? ? for aa in protein_string:
? ? ? ? codons_count = list(codon_table.values()).count(aa)
? ? ? ? rna_count = (rna_count * codons_count) % 1000000
? ? return rna_count * 3? # 乘上三個終止密碼子
# 給出蛋白質(zhì)序列
protein_string = "MA"
# 輸出結(jié)果
print(count_rna_strings(protein_string))