SAS Base

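Variable names

Names must be at most 32 bytes long. (A letter takes 1 byte; a Chinese character takes 2 bytes in a double-byte encoding such as GBK.)
Must begin with a letter or an underscore.
May contain letters, digits, or underscores, but not special characters such as %$!*&#@.
Letters may be upper or lower case; names are not case-sensitive.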
Missing numeric data are represented by a single period (.) and missing character data are represented by blanks.


Library names

1 to 8 characters. The first character must be a letter or an underscore; the remaining characters may be letters, digits, or underscores.
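
As a rough illustration of the naming rule, the sketch below assigns a libref and writes one temporary and one permanent dataset. The libref mylib, the folder C:\sasdata, and the dataset name scores are placeholders that do not come from these notes, and the folder is assumed to exist already.

```sas
/* Hypothetical libref: 5 characters, starts with a letter (the limit is 8). */
libname mylib 'C:\sasdata';   /* assumed folder; change to one that exists */

/* One-level name: stored in WORK and deleted when the session ends. */
data scores;
  input name $ score;
  datalines;
Ann 90
Bob 85
;
run;

/* Two-level name (libref.dataset): kept on disk after the session ends. */
data mylib.scores;
  set scores;
run;
```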

Comments

Start with an asterisk (*) and end with a semicolon (;).
Or start with slash-asterisk (/*) and end with asterisk-slash (*/).
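
A minimal sketch of both comment styles; the DATA _NULL_ step and the variable x are only there to give the comments something to sit in.

```sas
* A statement comment starts with an asterisk and ends at the first semicolon;

/* A block comment can span several lines
   and may contain semicolons; like this one. */
data _null_;
  x = 1;      /* block comments can also follow a statement on the same line */
  put x=;     /* writes x=1 to the log */
run;
```

Because a statement comment ends at the first semicolon, it should not contain one in its text.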

Differences between DATA steps and PROC steps


The DATA statement does three things

  1. Tells SAS that a DATA step is starting.
  2. Names the SAS dataset being created.
  3. Sets the variables used in the DATA step to missing values.

Three default windows

  1. Program Editor window
  2. Log window
  3. Output window

The basics of using SAS

  1. Prepare the SAS program
  2. Submit it for analysis
  3. Review the resulting log for errors
  4. Examine the output files to view the results of your analysis

Executing the program

  1. Pull down the Locals menu and select Submit.
  2. Click the run icon on the toolbar, which is a picture of a running man.
  3. Press F8.
  4. Highlight the text and click the run icon.
  5. Note: a DATA or PROC step is not executed until SAS reaches the next DATA or PROC statement. Use a RUN; statement to force execution.

Reading a .dat file

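DATA NAME;
  /* FIRSTOBS=4: start reading at line 4 of the file. */
  /* DLM= is omitted: column input reads fixed positions, not delimited fields. */
  INFILE 'E:\data\a.dat' FIRSTOBS=4;
  /* V2 starts at column 6 so that it does not overlap V1 at column 5. */
  INPUT V1 1-5   V2 6-10   V3 $ 15;
RUN;
PROC PRINT DATA=NAME; RUN;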

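INFILE options

Syntax: INFILE 'AAAAA.DAT' options;
FIRSTOBS=n  The line at which to start reading data.
OBS=n  The last line to read.
MISSOVER  If the INPUT statement reaches the end of a line before every variable has a value, SAS does not continue on the next line; the remaining variables are set to missing. Without it, SAS automatically reads the remaining values from the next line.
TRUNCOVER  For column or formatted input where some lines are shorter than others: SAS reads each variable up to the end of the line or the last column specified, whichever comes first.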
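
A small sketch of what MISSOVER changes, using in-stream data instead of an external file. The dataset names and values are made up for illustration. With list input, the first data line is one value short.

```sas
/* Default behaviour (FLOWOVER): c is taken from the next line. */
data flow;
  input a b c;
  datalines;
1 2
3 4 5
;
run;
/* Result: one observation, a=1 b=2 c=3; the rest of line 2 is discarded. */

/* MISSOVER: stay on the current line and set c to missing. */
data miss;
  infile datalines missover;
  input a b c;
  datalines;
1 2
3 4 5
;
run;
/* Result: two observations, (1, 2, .) and (3, 4, 5). */
```

In the first step the log also carries a note that SAS went to a new line, which is usually the first hint that an option such as MISSOVER is needed.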

INPUT Notes

(1) Duplicate formats can be used when variables have the same format. The examples below represent the same formats of variables x1-x5.

INPUT x1 4. x2 4. x3 4. x4 4. x5 4.;
INPUT (x1 x2 x3 x4 x5) (4. 4. 4. 4. 4.);
INPUT (x1-x5) (5*4.);

(2) @@ tells SAS to hold the line of raw data and use it when processing the next
observation. The @@ must be the last entry in the INPUT statement.
(3) @ tells SAS to hold this line of data for possible use by INPUT statements later in the DATA step. The @ must be the last entry in the INPUT statement.
(4) / tells SAS to move to the next line of the raw dataset.
(5) #n tells SAS to skip to the nth line of the raw data for the observation.
(6) @n tells SAS to move to the nth column.

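Special characters

@40 moves to column 40; @'aa' moves to just after the string aa.
A slash (/) moves to the next line of raw data.
#2 moves to the second line of raw data for the observation.
For several observations on one line, put @@ at the end of the INPUT statement.
A trailing @ at the end of the INPUT statement holds the line so that a later INPUT can read it; this can be used to select only part of the data.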
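
The two line-hold specifiers can be sketched as follows. The dataset names, variables, and values are invented for illustration.

```sas
/* @@ : several observations packed onto one raw line. */
data scores;
  input name $ score @@;
  datalines;
Ann 90 Bob 85 Cy 77
;
run;
/* Result: three observations, one per name/score pair. */

/* Trailing @ : read one field, test it, then read the rest of the line. */
data freeways;
  input type $ @;                      /* hold the line after reading type */
  if type ne 'freeway' then delete;    /* skip lines we do not want */
  input street $ traffic;              /* continue on the same held line */
  datalines;
freeway Main 4000
surface Oak 1200
freeway Bay 5200
;
run;
/* Result: two observations, Main and Bay. */
```

Reading the first field before deciding avoids parsing the rest of a line that will be thrown away anyway.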


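Reading delimited files in the DATA step

DLM=',' specifies a comma delimiter; DLM='09'x specifies a tab delimiter.
DSD ignores delimiters inside quoted values. For example, in the observation Joseph,76,"Red Racers, Washington" the commas outside the quotes are treated as delimiters, while the comma inside the quotes is not. DSD also removes the quotation marks from character values and treats two adjacent delimiters as a missing value.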
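
A minimal sketch of DSD on in-stream data. The first data line is the example from the text; the second line, the variable names, and the lengths are assumptions added to show the adjacent-delimiter case.

```sas
data players;
  infile datalines dsd;            /* with DSD the default delimiter is a comma */
  length name $ 10 team $ 30;      /* assumed lengths, long enough for the values */
  input name $ score team $;
  datalines;
Joseph,76,"Red Racers, Washington"
Ann,,"Blue Team"
;
run;

proc print data=players; run;
/* Joseph: score=76, team=Red Racers, Washington (quotes removed).  */
/* Ann:    score is missing because of the two adjacent commas.     */
```

For a tab-delimited file, DLM='09'x can be combined with DSD in the same INFILE statement.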

Reading Excel data

PROC IMPORT DATAFILE='D:\A.XLS' OUT=A  REPLACE DBMS=XLS; GETNAMES=YES; SHEET="Sheet1"; RUN;
PROC PRINT DATA=A; RUN;

OUT= names the output dataset.
DBMS= gives the file type: XLS or XLSX.

Reading a sas7bdat file (a file on the desktop)

data new; set 'C:\Users\sdkyc\Desktop\hsb2.sas7bdat'; run;
proc print data=new; run;

Temporary vs. permanent datasets

Variable assignment and arithmetic

IF-THEN DO IF-ELSE

  1. DO and END form a pair; every action between them is executed.
DATA A;
INFILE 'C:\A.DAT';
INPUT V1 $ V2 V3;
IF V2 = .  THEN   V4='MISSING';
  ELSE IF V2<100  THEN   V4='LOW';
  ELSE IF V2<1000  THEN   V4='MEDIUM';
  ELSE V4 = 'HIGH';
RUN;
  2. Can be used to build subsets.
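
The two points above can be sketched in one step: a DO ... END group that runs several actions under one condition, and a subsetting IF that keeps only some rows. The variable names, the cut-off of 60, and the data are hypothetical.

```sas
data graded;
  input id score;
  length level $ 7;                  /* long enough for 'MISSING' */
  if score = . then level = 'MISSING';
  else if score < 60 then do;        /* DO ... END groups several actions */
    level = 'LOW';
    retake = 1;
  end;
  else level = 'OK';
  if level ne 'MISSING';             /* subsetting IF: keep only these rows */
  datalines;
1 55
2 .
3 80
;
run;
/* Result: ids 1 and 3 are kept; retake is 1 for id 1 and missing for id 3. */
```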

Simplifying programs with arrays (ARRAY)

ARRAY array-name <{n}> <$> <length> <elements> <(initial-values)>;
array-name - is the name of the array.
{n} - is either the dimension of the array, or an asterisk (*) to indicate that the dimension is determined from the number of array elements or initial values.
$ - indicates that the array type is character.
length - is the maximum length of elements in the array. For character arrays, the maximum length cannot exceed 32,767 bytes (the limit was 200 in SAS version 6).
elements - are the variables that make up the array; they either exist in a dataset or are created before the array definition.
initial-values - are the values used to initialize some or all of the array elements. Separate these values with commas or blanks.

ARRAY rain {5} janr febr marr aprr mayr;
ARRAY days{7} d1-d7;
ARRAY month{*} jan feb jul oct nov;
ARRAY x{*} _NUMERIC_;
ARRAY qbx{10};
ARRAY meal{3};
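
Arrays pay off once they are combined with a DO loop. The sketch below reuses the days array from the declarations above; treating 999 as a missing-value code is an assumption made only for this example, as are the dataset name and data.

```sas
data clean;
  input d1-d7;
  array days{7} d1-d7;
  do i = 1 to dim(days);                 /* dim() returns the number of elements */
    if days{i} = 999 then days{i} = .;   /* recode the assumed 999 code to missing */
  end;
  drop i;                                /* the loop counter is not needed in the output */
  datalines;
1 2 999 4 5 999 7
;
run;
/* Result: d3 and d6 are missing; the other values are unchanged. */
```

Without the array, the same recoding would need seven separate IF statements.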

Link to notes on each PROC

https://stats.idre.ucla.edu/other/annotatedoutput/

PROC CONTENTS: returns the descriptor portion of a dataset, not the data itself.

PROC MEANS

Outputs descriptive statistics; its function overlaps with PROC UNIVARIATE.
MAXDEC= sets the number of decimal places.
proc means data=a N NMISS MEAN STD STDERR MAXDEC=4; run;

PROC UNIVARIATE t-test sample mean mu0

The Tests for Location table includes a two-tailed t-test. Look at the Student's t row: if p < α, the mean of write is significantly different from 30.
proc univariate data = "D:\hsb2" plots normal mu0=30; var write; run;
Also used to test normality and draw plots. If the Shapiro-Wilk p-value is greater than α, the data are consistent with a normal distribution.
proc univariate data=a normal plot; var write; run;

1. These tests check the assumption that the data follow a normal distribution.
2. Null hypothesis: the data are normal. Alternative hypothesis: the data are not normal.
3. A large p-value (e.g. > 0.05) indicates that the data are consistent with normality (we fail to reject the null hypothesis).
4. If 6 < sample size < 2001, use Shapiro-Wilk.
5. If sample size > 2000, use the Kolmogorov-Smirnov test.
6. Within the appropriate sample size range, Shapiro-Wilk is more powerful than the Kolmogorov-Smirnov test.
7. Any marked departure from skewness = 0 and kurtosis = 0 suggests non-normality.


PROC FREQ TABLES chisq

Tests whether there is an association between variables, that is, whether they are independent. Find the chi-square statistic in the output: a large chi-square statistic corresponds to a small p-value. If the p-value is small enough (say < 0.05), we reject the null hypothesis that the two variables are independent and conclude that there is an association between the row and the column variables.
PROC FREQ DATA=CLASSFIT2; TABLES SEX*HT/CHISQ; RUN;

PROC REG

Assumption

a. Normality of errors: the error distribution is normal.
b. Normality of errors is checked by residual analysis. We first calculate the residuals ($r = y - \hat{y}$, the observed value minus the fitted value), then verify the normality of the residuals using PROC UNIVARIATE or Q-Q plots.
c. Independence: the errors or observations are independent of each other. Example: Apple stock price recorded on 10 consecutive days. Here the 10 observations are not independent.
d. The variables must be numeric.

PROC ANOVA

Assumption: the sampled populations are normally distributed.
One-way ANOVA: only one factor (a single variable, which can have several levels).
See the slides (ppt).

PROC GLM contrast

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#glm_toc.htm
1. Question: is the mean height the same at every age? H0: μ1=μ2=μ3=μ4=μ5=μ6 (one mean per age, 11 to 16)
proc glm data=a; class age; model height=age; run;
2. Question: does the mean height of 11- and 12-year-olds differ from the mean height of 13- to 16-year-olds?

proc glm data=a; class age; 
model height=age;
contrast '11&12 vs. rest' 
age 2 2 -1 -1 -1 -1; run; quit;

PROC CORR

Shows the Pearson correlation coefficients between variables. A negative value means negative correlation; a positive value means positive correlation.
NOSIMPLE suppresses the descriptive statistics.
proc corr data = "D:\hsb2" pearson nosimple; var read write; run;

PROC TTEST t-test

Assumption: all variables are normally distributed.

  1. Single-sample t-test. Example: test whether the mean of score equals 50. If p < α, the mean is significantly different from 50.
    proc ttest data="D:\hsb2" H0=50; var score; run;
  2. Dependent-group t-test (paired t-test). Example: a group of students each took two exams; are the mean write and read scores the same? If p < α, they are significantly different.
    proc ttest data="D:\hsb2"; paired write*read; run;
  3. Independent-group t-test. Example: does sex affect the write score?

If Pr > F in the Equality of Variances table is less than α, the two groups have different variances, so use the Satterthwaite (unequal) method and read the Pr > |t| reported for it.
Otherwise use the Pooled method.
proc ttest data="D:\hsb2"; class sex; var write; run;

PROC NPAR1WAY

Can be used for the Wilcoxon rank-sum test on independent groups. The example questions and assumptions below describe the paired (signed-rank) version, which PROC UNIVARIATE reports for a variable of within-pair differences.
Example questions:
Are test scores different from 4th grade to 5th grade on the same students?
Does a particular diet drug have an effect on BMI when tested on the same individuals?
The assumptions of the test are:
Data come from two matched, or dependent, populations.
The data are continuous.
Because it is a non-parametric test, it does not require a particular distribution of the dependent variable in the analysis.
It is especially suited to small sample sizes.
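
A minimal sketch of both Wilcoxon variants with made-up data: the signed-rank test on paired differences via PROC UNIVARIATE, and the rank-sum test for two independent groups via PROC NPAR1WAY. All dataset names, variable names, and values are hypothetical.

```sas
/* Made-up BMI values for five subjects, measured before and after. */
data bmi;
  input id before after;
  diff = after - before;
  datalines;
1 31.2 29.8
2 28.5 28.9
3 35.0 33.1
4 30.4 30.0
5 27.9 26.5
;
run;

/* Paired case: read the Signed Rank row of the Tests for Location table. */
proc univariate data=bmi;
  var diff;
run;

/* Independent groups: made-up scores for two different sets of students. */
data grades;
  input grade score @@;
  datalines;
4 71 4 65 4 80 5 78 5 85 5 90
;
run;

proc npar1way data=grades wilcoxon;
  class grade;
  var score;
run;
```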

one- and two-tail test

P value

If the test is H0: mean = 0 and the result is p < α, then reject H0: the mean is significantly different from 0.

Template code

proc print data= ; run;
