Book contents
- Frontmatter
- Contents
- Acknowledgments
- About This Book
- How to Use This Book
- Chapter 1 Navigation
- Chapter 2 Preliminary Data Exploration
- Chapter 3 Storing and Manipulating Data
- Chapter 4 Advanced Concepts in Dataset and Variable Manipulation
- Chapter 5 Introduction to Common Procedures
- Chapter 6 Procedures for Simple Statistics
- Chapter 7 More about Common Procedures
- Chapter 8 Data Visualization
- Chapter 9 JMP as an Alternative
- Index
Chapter 7 - More about Common Procedures
Published online by Cambridge University Press: 05 June 2016
Summary
In this chapter, we delve deeper into the simple statistics procedures introduced in Chapter 6 and discuss how these procedures can be employed to accomplish even more tasks. Using BY and CLASS statements, output can be stratified by groups of interest. Various options for reporting missing values (MISSPRINT, MISSING, NMISS) are appropriate and useful depending on the purpose of the output. Simple statistical tests can be performed and results featured in the output while new datasets can be created containing the information provided by this output.
STRATIFIED OUTPUT USING THE BY AND CLASS STATEMENTS
Frequently, data needs to be reviewed in a stratified format. Variables like height and weight typically need to be reviewed separately for males and females, and in a clinical trial, patient outcomes must often be reviewed separately for each treatment group. Using a BY or CLASS statement allows the programmer to access the full functionality of a procedure in a stratified setting. A BY statement can be used in the MEANS, FREQ, UNIVARIATE, and CORR procedures. Before a BY statement can be used, the dataset must first be sorted by the variable named in that statement. Of these four procedures, only MEANS and UNIVARIATE accept a CLASS statement, which does not require that the dataset be presorted.
Let's continue with the sashelp dataset ‘cars.’ Suppose we are curious about any differences in the distribution of ‘msrp’ based on ‘origin.’ Since we will be using a BY statement, we must first sort the dataset by the variable named in that statement. SAS does not allow us to sort a dataset in place in the permanent sashelp library, so we first copy it to a work dataset and then sort the copy. Finally, we use the MEANS procedure with a BY statement (Example 7.1).
EXAMPLE 7.1. MEANS Procedure Syntax with BY Statement.
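/* Copy the sashelp dataset into the work library so it can be sorted */
data cars; set sashelp.cars; run;
/* Sort by the variable that will appear in the BY statement */
proc sort data = cars; by origin; run;
/* Summarize msrp separately for each value of origin */
proc means data = cars;
  var msrp;
  by origin;
run;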
Figure 7.1 shows the resulting output. It contains the same statistics that PROC MEANS produces without a BY statement, but they are reported separately for each value of the variable ‘origin.’
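For comparison, the same stratification can be sketched with a CLASS statement in place of the BY statement. This is a minimal illustration rather than one of the book's numbered examples. It assumes the same ‘cars’ dataset and reads it directly from sashelp, since a CLASS statement does not require a sorted copy.

```sas
/* Sketch: stratify msrp by origin with CLASS instead of BY. */
/* No DATA step copy or PROC SORT is needed, because CLASS   */
/* does not require the input dataset to be presorted.       */
proc means data = sashelp.cars;
  class origin;  /* grouping variable */
  var msrp;      /* analysis variable */
run;
```

With CLASS, PROC MEANS typically reports the groups as rows of a single table rather than as a separate block of output for each group, so the layout may differ from Figure 7.1 even though the statistics are the same.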
- Data Management Essentials Using SAS and JMP, pp. 94-107. Publisher: Cambridge University Press. Print publication year: 2016.