I've used Python a lot, and the basic, everyday techniques turn out to be few, so here is a summary of them.
1. Data type conversion
1.1. int <-> string
# convert int to string: s = str(int_value)
# convert string to int: n = int(str_value)  (don't name the result int; that shadows the built-in)
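A quick round trip, plus two details the one-liners above leave out: int() also takes a base, and it raises ValueError for text that is not an integer. The helper name to_int_or_none is made up for illustration.

```python
n = 42
s = str(n)                 # '42'
back = int(s)              # 42
hex_value = int('ff', 16)  # 255: the second argument is the base
padded = int(' 7\n')       # surrounding whitespace is ignored -> 7


def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None


print(s, back, hex_value, padded, to_int_or_none('3.5'))  # 42 42 255 7 None
```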
1.2. string <-> list
# convert string to list: l = list(s)
str0 = "asdf"
list0 = list(str0)
print(list0)  # ['a', 's', 'd', 'f']
str1 = "www.google.com"
list1 = str1.split(".")
print(list1)  # ['www', 'google', 'com']
str2 = "i am yanan"
list2 = str2.split(" ")  # str2.split() with no argument splits on any run of whitespace
print(list2)  # ['i', 'am', 'yanan']
# convert list to string: s = "".join(lst) or s = ".".join(lst)
list0 = ['a', 'b', 'c', 'd']
str0 = "".join(list0)
print(str0)  # abcd
str1 = ".".join(list0)
print(str1)  # a.b.c.d
1.3. list <-> dictionary
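This heading has no example in the post, so here is a minimal sketch of the usual built-in conversions, assuming Python 3.7+ (where dicts keep insertion order, so the list outputs below are stable).

```python
keys = ['aa', 'bb', 'cc']
values = [3, 2, 1]

# two parallel lists -> dict
d = dict(zip(keys, values))        # {'aa': 3, 'bb': 2, 'cc': 1}

# list of (key, value) pairs -> dict, and back again
pairs = [('x', 1), ('y', 2)]
d2 = dict(pairs)                   # {'x': 1, 'y': 2}
pairs_back = list(d2.items())      # [('x', 1), ('y', 2)]

# dict -> separate key and value lists
key_list = list(d.keys())          # ['aa', 'bb', 'cc']
value_list = list(d.values())      # [3, 2, 1]

print(d, d2, pairs_back, key_list, value_list)
```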
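2. Sets: intersection, union, difference (relative complement), symmetric difference
Unlike ordered lists, sets are unordered. They are a built-in Python data type, created with the factory functions set() and frozenset(), which give a mutable set (elements can be added or removed) and an immutable frozenset respectively. Since Python 2.7, a mutable set can also be written as a literal such as {1, 2, 3}.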
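A short sketch of the mutable/immutable difference described above: a set can be changed in place, while a frozenset cannot, and in exchange is hashable, so it can serve as a dict key or as a member of another set.

```python
s = set([1, 2, 3])          # mutable
s.add(4)
s.discard(1)                # remove if present; no error if missing
fs = frozenset([1, 2, 3])   # immutable: has no add/discard methods

# frozensets are hashable, so they can be dict keys or members of other sets
groups = {fs: 'first group'}
nested = {frozenset([1, 2]), frozenset([3])}

try:
    {s: 'x'}                # a plain set is unhashable
except TypeError as err:
    print('TypeError:', err)

print(sorted(s), fs in groups, len(nested))  # [2, 3, 4] True 2
```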
list0 = [0,1,3,5,9]
list1 = [1,3,5]
list2 = [3,5,7]
0. list <-> set: converting between lists and sets
# convert list to set: set(lst)
>>> s = set(list0)
>>> s
{0, 1, 3, 5, 9}
# convert set to list: list(s)  (element order is not guaranteed; use sorted(s) if it matters)
>>> list(s)
[0, 1, 3, 5, 9]
1. Intersection, union, difference and symmetric difference with operators and built-in methods
1.1. Intersection
>>> set(list1) & set(list2)
{3, 5}
>>> list(set(list1) & set(list2))
[3, 5]
>>> list(set(list1).intersection(set(list2)))
[3, 5]
1.2. Union
>>> set(list1) | set(list2)
{1, 3, 5, 7}
>>> list(set(list1) | set(list2))
[1, 3, 5, 7]
>>> list(set(list1).union(set(list2)))
[1, 3, 5, 7]
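1.3. Difference, or relative complement (s - t contains the elements that are in s but not in t)
list1 = [1, 3, 5]
list2 = [3, 5, 7]
list1 minus list2 leaves 1; list2 minus list1 leaves 7 (difference is not symmetric).
>>> set(list1) - set(list2)
{1}
>>> list(set(list1) - set(list2))
[1]
>>> list(set(list1).difference(set(list2)))
[1]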
1.4. Symmetric difference (elements that are in exactly one of the two sets)
>>> set(list1) ^ set(list2)
{1, 7}
>>> list(set(list1) ^ set(list2))
[1, 7]
>>> list(set(list1).symmetric_difference(set(list2)))
[1, 7]

[Figure: the differences between the four set operations]
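3. A module of specialized container types: collections
The collections module was introduced in Python 2.4 and provides specialized container types beyond dict, set, list and tuple:
- OrderedDict class: a dictionary that remembers insertion order (it is not a sorted dictionary); a subclass of dict. Added in 2.7.
- namedtuple() function: creates tuple subclasses with named fields; a factory function. Added in 2.6.
- Counter class (***): counts hashable objects; a subclass of dict. Added in 2.7.
- deque: a double-ended queue. Added in 2.4.
- defaultdict: a dict subclass that calls a factory function to supply values for missing keys, so you don't have to handle missing keys yourself. Added in 2.5.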
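A minimal tour of the four types other than Counter, assuming Python 3 (OrderedDict.move_to_end() needs 3.2+). The type name Read is only an illustration.

```python
from collections import OrderedDict, defaultdict, deque, namedtuple

# namedtuple: a tuple whose fields also have names
Read = namedtuple('Read', ['name', 'seq'])   # 'Read' is an illustrative name
r = Read('read1', 'ATGC')
print(r.name, r.seq, r[1])                   # read1 ATGC ATGC

# deque: fast appends and pops at both ends
q = deque([2, 3])
q.appendleft(1)
q.append(4)
first = q.popleft()                          # 1, leaving deque([2, 3, 4])

# defaultdict: a missing key is created by the factory (list here)
by_length = defaultdict(list)
for word in ['aa', 'bb', 'ccc']:
    by_length[len(word)].append(word)        # no KeyError on first access

# OrderedDict: remembers insertion order; move_to_end() reorders
od = OrderedDict([('b', 2), ('a', 1)])
od.move_to_end('b')
print(list(od))                              # ['a', 'b']
```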
Documentation: collections.Counter
# Counter
>>> from collections import Counter
>>> li = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
>>> c = Counter(li)
>>> c
Counter({'aa': 3, 'bb': 2, 'cc': 1})
>>> c_top_list = c.most_common(2)  # the 2 most frequent items, highest count first
>>> c_top_list
[('aa', 3), ('bb', 2)]
>>> c_top_dict = dict(c_top_list)
>>> c_top_dict
{'aa': 3, 'bb': 2}
>>> c_top_sorted_list = sorted(c_top_dict.items(), key=lambda item: item[1], reverse=True)  # sort a dict's items by value
>>> c_top_sorted_list
[('aa', 3), ('bb', 2)]
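Counter accepts any iterable of hashable items, including a string, so counting the bases of a read takes one call. The GC-content line is an illustrative use chosen for this sketch, not part of the original example.

```python
from collections import Counter

seq = 'ATGCATGCGG'
base_counts = Counter(seq)            # counts of each character in the read

gc = (base_counts['G'] + base_counts['C']) / len(seq)   # fraction of G/C bases
missing = base_counts['N']            # missing keys count as 0, no KeyError
base_counts.update('NN')              # add counts from another iterable

print(base_counts['G'], gc, missing, base_counts['N'])  # 4 0.6 0 2
```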
4. Files and directories: the os module
import os
# list all entries in a directory (subdirectories included)
files_list = []
if os.path.exists(dir_path):
    for name in sorted(os.listdir(dir_path)):
        files_list.append(os.path.join(dir_path, name))
print(files_list)  # paths of everything in dir_path; absolute only if dir_path is absolute
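os.listdir() returns subdirectories as well as files. If only regular files are wanted, one option is to filter with os.path.isfile(). A minimal sketch; the function name list_files is hypothetical, and the demo runs on a throwaway temporary directory.

```python
import os
import tempfile


def list_files(dir_path):
    """Hypothetical helper: sorted paths of the regular files directly in dir_path."""
    if not os.path.isdir(dir_path):
        return []
    paths = [os.path.join(dir_path, name) for name in sorted(os.listdir(dir_path))]
    return [p for p in paths if os.path.isfile(p)]   # skip subdirectories


# demo: a temporary directory with two files and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'subdir'))
    for name in ['b.txt', 'a.txt']:
        open(os.path.join(tmp, name), 'w').close()   # create an empty file
    found = [os.path.basename(p) for p in list_files(tmp)]

print(found)   # ['a.txt', 'b.txt']
```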
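# create a directory (missing parent directories are created too)
if not os.path.exists(dir_path):
    os.makedirs(dir_path)  # portable, and safe with spaces in the path, unlike os.system('mkdir ...')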
# keep only files that exist and are not empty
if os.path.exists(file_path):
    if os.path.getsize(file_path):  # size in bytes; raises OSError if the file does not exist
        print('file exists and is not empty.')
# the different parts of a path
os.path.abspath(path)   # absolute path
os.path.dirname(path)   # directory part of the path
os.path.basename(path)  # file name (last component)
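The same functions applied to a concrete path, with the outputs shown for a POSIX system (the path itself is made up for illustration). Note that splitext() only splits off the last extension, which matters for names like reads.fastq.gz.

```python
import os

path = '/data/run1/sample.fastq'            # illustrative path
print(os.path.dirname(path))                # /data/run1
print(os.path.basename(path))               # sample.fastq
print(os.path.splitext(path))               # ('/data/run1/sample', '.fastq')
print(os.path.splitext('reads.fastq.gz'))   # ('reads.fastq', '.gz'): last extension only
print(os.path.abspath('sample.fastq'))      # depends on the current working directory
```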
# delete files of a given type, identified by their extension
os.path.splitext(path)  # split into a (root, extension) tuple, e.g. ('a/b', '.txt')
if os.path.splitext(file_path)[1] == '.txt':
    os.remove(file_path)
shutil: high-level file operations (copying, deleting, etc.) that complement os
import shutil
shutil.rmtree(dir_path)  # recursively delete a directory and everything in it
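A minimal sketch of the copying side of shutil next to rmtree(), run inside a scratch directory from tempfile so it touches nothing else. copytree() expects the destination directory not to exist yet.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                  # scratch directory for the demo
src_dir = os.path.join(base, 'src')
os.makedirs(os.path.join(src_dir, 'nested'))
with open(os.path.join(src_dir, 'nested', 'a.txt'), 'w') as f:
    f.write('hello')

shutil.copy(os.path.join(src_dir, 'nested', 'a.txt'),
            os.path.join(base, 'a_copy.txt'))        # copy a single file
shutil.copytree(src_dir, os.path.join(base, 'dst'))   # copy a whole tree
shutil.move(os.path.join(base, 'a_copy.txt'),
            os.path.join(base, 'moved.txt'))         # move / rename

copied = os.path.exists(os.path.join(base, 'dst', 'nested', 'a.txt'))
shutil.rmtree(base)                        # recursive delete of the scratch dir
print(copied, os.path.exists(base))        # True False
```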
5. Reading command-line arguments
# -*- coding: utf-8 -*-
'''Some function description of the current script.'''
import os
import sys
import time

# Hypothetical stand-ins: the real pipeline functions and base_path are
# not shown in this post; replace them with your own.
base_path = os.path.dirname(os.path.abspath(__file__))

def pipeliner_pe(base_path, database):
    print('paired-end pipeline, database: %s' % database)

def pipeliner_se(base_path, database):
    print('single-end pipeline, database: %s' % database)

def method1():
    pass  # other helper functions go here

def main():
    time_start = time.time()
    if len(sys.argv) < 2:
        print('No -pe or -se, and no -viruses or -bacterias specified.')
        sys.exit(1)

    # informational options
    if sys.argv[1] == '-version':
        print('Version 1.0.0\n----------------')
        sys.exit(0)
    if sys.argv[1] == '-help':
        print('This is an NGS analysis pipeline, mainly for SE or PE sequencing '
              'and virus or bacteria identification.\n----------------')
        sys.exit(0)

    # first parameter: sequencing mode
    if sys.argv[1] not in ('-se', '-pe'):
        print("You need to set the first parameter to '-pe' or '-se'.")
        sys.exit(1)
    # second parameter: reference database
    if len(sys.argv) < 3:
        print("You need to supply the parameter '-viruses' or '-bacterias'.")
        sys.exit(1)
    if sys.argv[2] not in ('-viruses', '-bacterias'):
        print("You need to set the second parameter to '-viruses' or '-bacterias'.")
        sys.exit(1)

    option_method = sys.argv[1][1:]    # 'pe' or 'se'
    option_database = sys.argv[2][1:]  # 'viruses' or 'bacterias'
    if option_method == 'pe':
        print('This is for paired-end input FASTQ data.')
        pipeliner_pe(base_path, option_database)
    else:
        print('This is for single-end input FASTQ data.')
        pipeliner_se(base_path, option_database)

    total_time = time.time() - time_start
    print('total_time: %s' % total_time)

if __name__ == '__main__':
    main()
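Hand-parsing sys.argv gets verbose quickly. As an alternative technique (not what the script above uses), the standard library's argparse can express the same -se/-pe and -viruses/-bacterias choices as two required, mutually exclusive groups, and generates -h/--help and usage errors automatically. A minimal sketch; build_parser is a hypothetical name, and parse_args() is given an explicit list here only so the example runs without a real command line.

```python
import argparse


def build_parser():
    """Hypothetical helper that declares the pipeline's options."""
    parser = argparse.ArgumentParser(
        description='NGS analysis pipeline for SE or PE sequencing, '
                    'virus or bacteria identification.')
    parser.add_argument('--version', action='version', version='1.0.0')

    # exactly one sequencing mode is required
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument('-se', dest='method', action='store_const', const='se',
                      help='single-end input FASTQ data')
    mode.add_argument('-pe', dest='method', action='store_const', const='pe',
                      help='paired-end input FASTQ data')

    # exactly one reference database is required
    db = parser.add_mutually_exclusive_group(required=True)
    db.add_argument('-viruses', dest='database', action='store_const', const='viruses')
    db.add_argument('-bacterias', dest='database', action='store_const', const='bacterias')
    return parser


args = build_parser().parse_args(['-pe', '-viruses'])  # normally: parse_args()
print(args.method, args.database)                      # pe viruses
```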
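6. Bioinformatics: the reverse complement of a read
seq = 'ATGCATGC'
# "".join(reversed(seq)) or seq[::-1] only reverses the read: 'CGTACGTA'
complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'N': 'N'}
rev_comp = "".join(complement[base] for base in reversed(seq))
print(rev_comp)  # GCATGCAT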
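Reversing alone gives the reverse, not the reverse complement: each base must also be swapped for its partner (A with T, C with G). A minimal sketch using a translation table, assuming Python 3 (str.maketrans; Python 2 had string.maketrans) and that N should map to itself and lowercase bases should stay lowercase.

```python
# translation table: complement each base, keeping N and case
COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq):
    """Complement every base via the table, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement('ATGCATGC'))   # GCATGCAT
print(reverse_complement('AACGTN'))     # NACGTT
```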
More is coming:
- Reading and writing Excel files: xlrd/xlwt/pandas
- Plotting: matplotlib
- Accessing a MySQL database
END