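Original article: https://medium.com/@mattklein123/the-human-scalability-of-devops-e36c37d3db6a
Key points:
1. Don't always aim to hire do-everything generalist engineers; hire engineers who specialize in a particular area.
2. Dedicated operations engineers are still necessary. They focus on lower-level operations: networking, security, scaling, and so on.
3. Developers should also take part in day-to-day operations, but with a focus on product-related operations.
4. Dedicated operations engineers should be embedded in each development team. They act as a bridge between the development and operations teams and train developers in operations, for example in writing service interface documentation and in operational best practices.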
What is the right SRE model?
Given the many models currently implemented across the industry, there is no single right answer to this question, and every model has its holes and resulting issues. I will outline what I think the sweet spot is, based on my observations over the last 10 years:
Recognize that operations and reliability engineering is a discrete and hugely valuable skillset. Our rush to automate everything, and the idea that software engineers are fungible, is marginalizing a subset of the engineering workforce that is as valuable as (if not more valuable than!) software engineers. An operations engineer doesn’t have to be comfortable with empty source files, just as a software engineer doesn’t have to be comfortable debugging and firefighting during a stressful outage. Operations engineers and software engineers are partners, not interchangeable cogs.
SREs are not on-call, dashboard, and deploy monkeys. They are software engineers who focus on reliability tasks, not product tasks. An ideal structure requires all engineers to perform basic operational tasks, including on-call, deployments, monitoring, etc. I think this is critically important, as it helps to avoid class/job stratification between reliability and software engineers and makes software engineers more directly accountable for product quality.
SREs should be embedded into product teams, while not reporting to the product team engineering manager. This allows the SREs to scrum with their team, gain mutual trust, and still have appropriate checks and balances in place such that a real conversation can take place when attempting to weigh reliability versus features.
The goal of embedded SREs is to increase the reliability of their products by implementing reliability oriented features and automation, mentoring and educating the rest of the team on operational best practices, and acting as a liaison between product teams and infrastructure teams (feedback on documentation, pain points, needed features, etc.).
A successful SRE program, implemented early in the growth phase as outlined above and backed by real investment in new-hire training, continuing education, and documentation, can raise the bar of the entire engineering organization while mitigating many of the human scaling issues previously described.