
Title Description (Page under Construction)
Some Introductory Comments about SAS...

SAS Introduction: SAS is statistical analysis software developed by SAS Institute in Cary, North Carolina. It has been around for years, and SAS Institute is probably the largest supplier of statistical software in the nation. Some people use it but are not interested in its statistical capabilities. These individuals like the way SAS handles very large datasets and crunches a lot of numbers for reporting. Other people use it for modeling purposes. Some use it to develop and maintain a data warehouse.

SAS is primarily a command driven statistical programming language. It has something called PROCS (procedures) that are typically short commands designed to do a certain task very quickly. There is a proc for just about everything - from printing to estimating a host of regression models. A good book for beginners is The Little SAS Book by Lora Delwiche and Susan Slaughter. In addition to a lot of handy procs, SAS can be used as a flexible programming language. However, as a programming language, SAS will appear quirky if you are used to working in other languages.

The purpose of the following downloads is to give you some help in getting up to speed on programming in SAS as it relates to modeling. The code that follows was not written for efficiency's sake, but to illustrate concepts. As always, I have to say that these programs are without warranty and I accept no liability for their use or installation.

Calling SAS from VB6

Before we get started with the SAS stuff, there are times you may want to call SAS from a Visual Basic program. This is easy to do and can be done by leaving the user in SAS to examine the output, or you can do everything in the background and hide SAS completely. It's easy if you know how!

Summing Variables in SAS

One of the things SAS is quirky about is that summing a column of numbers takes a different approach than in other programming languages. This is because SAS automatically reads the rows (observations) for you, rather than you providing the code to loop over them as you would in Visual Basic and other languages. Because ordinary Data Step variables are reset to missing at the start of each observation, you need the RETAIN statement to carry a running total from one observation to the next. This is highlighted in a simple example.
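Here is a minimal sketch of the idea (not the downloadable program itself). The dataset name `have` and the variables `x` and `total` are made up for illustration.

```sas
/* Hypothetical input: five observations of a numeric variable x */
data have;
    input x @@;
    datalines;
10 20 30 40 50
;
run;

data running;
    set have;
    retain total 0;      /* keep total across observations; start it at 0 */
    total = total + x;   /* running sum; the last observation holds the column total */
run;

proc print data=running;
run;
```

Without the RETAIN statement, `total` would be reset to missing on every observation and the sum would never accumulate. Note that a missing `x` would turn the total missing in this sketch.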

Reordering Variables

It's easy to reorder the variables in a SAS dataset if you know how. Simply use a RETAIN statement before the SET statement.
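A small sketch of the trick, assuming a made-up dataset `have` with variables `a`, `b`, and `c`. Because RETAIN comes before SET, SAS sees the variables in the RETAIN order first and stores them that way.

```sas
/* Hypothetical input with variables stored in the order a, b, c */
data have;
    input a b c;
    datalines;
1 2 3
4 5 6
;
run;

data reordered;
    retain c b a;   /* list the variables in the order you want them stored */
    set have;       /* values still come from HAVE; only the order changes */
run;

proc contents data=reordered varnum;
run;
```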

Handling Missing Data

You can use an ARRAY statement to quickly run through a dataset and substitute a selected value for missing information.
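As a rough sketch (the dataset, the variable names, and the replacement value of 0 are assumptions for illustration only):

```sas
/* Hypothetical input with some missing numeric values */
data have;
    input a b c;
    datalines;
1 . 3
. 5 6
7 8 .
;
run;

data filled;
    set have;
    array nums{*} _numeric_;          /* every numeric variable read so far */
    do i = 1 to dim(nums);
        if missing(nums{i}) then nums{i} = 0;   /* substitute the chosen value */
    end;
    drop i;                           /* the loop counter is not part of the data */
run;
```

The loop counter `i` is created after the ARRAY statement, so it is not swept into the `_numeric_` list.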

SAS Date Manipulation

Date manipulation can be tricky. Here is a useful programming example in SAS.

Numeric & Character Variables

Converting Numeric to Character values and vice-versa is easy if you know how.
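The usual tools are the PUT function (numeric to character) and the INPUT function (character to numeric). A minimal sketch, with made-up variable names and formats:

```sas
data converted;
    num = 1234.5;
    char_from_num = put(num, 8.1);      /* numeric -> character, using the 8.1 format */

    txt = '20240';
    num_from_char = input(txt, 8.);     /* character -> numeric, using the 8. informat */
run;

proc contents data=converted;
run;
```

PROC CONTENTS shows which variables ended up as character and which as numeric.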

Merging Datasets

Merging (joining) datasets together is easy using the MERGE statement along with the IN= dataset option. Be sure both datasets are sorted by the BY variable(s) first.
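A short sketch of a match-merge, using two made-up datasets `a` and `b` keyed on a hypothetical `id` variable. The IN= flags tell you which dataset contributed to each row.

```sas
data a;
    input id x;
    datalines;
1 10
2 20
3 30
;
run;

data b;
    input id y;
    datalines;
2 200
3 300
4 400
;
run;

proc sort data=a; by id; run;
proc sort data=b; by id; run;

data both;
    merge a(in=ina) b(in=inb);
    by id;
    if ina and inb;   /* keep only ids found in both datasets (an inner join) */
run;
```

Changing the subsetting IF to `if ina;` would keep every row of `a` instead, like a left join.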

Making .PDF Files

The ODS statement in SAS can make you a nice looking .pdf file for any output including graphs.
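The basic pattern is to open the PDF destination, run whatever procs you want, and close it again. In this sketch the file name `report.pdf` is a placeholder, and `sashelp.class` is the sample dataset that ships with SAS.

```sas
ods pdf file="report.pdf";      /* hypothetical output path */

proc means data=sashelp.class;
    var height weight;
run;

ods pdf close;                  /* the file is not complete until the destination is closed */
```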

Simple Macro

This program shows you how to write a simple macro, which is very similar to a subroutine in other languages in that you can pass parameters to it. The secret to using a macro centers on the "&" symbol, which SAS treats as the trigger for text substitution: a macro variable reference is resolved dynamically within the program.
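For illustration only, here is a tiny macro with two parameters. The macro name `show` and its parameter names are made up. Each `&ds` and `&nobs` is replaced by the text passed in before the proc runs.

```sas
%macro show(ds, nobs);
    proc print data=&ds (obs=&nobs);
    run;
%mend show;

/* Prints the first 5 observations of the SAS sample dataset */
%show(sashelp.class, 5)
```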

Using SYMPUT in your Macro

In order to perform the same level of coding in SAS as you do in other programming languages, you need to understand CALL SYMPUT. This routine lets you take a value from the Data Step and store it in a macro variable, where it is held in memory for use later in the program. You really can't live without SYMPUT if you're programming in SAS.
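A minimal sketch of the idea, assuming the sample dataset `sashelp.class`. The macro variable name `n_obs` and the dataset `first_half` are made up. Keep in mind that the macro variable is not available until the Data Step that creates it has finished running.

```sas
data _null_;
    set sashelp.class end=last;
    /* on the final observation, save the row count into macro variable n_obs */
    if last then call symput('n_obs', strip(put(_n_, 8.)));
run;

%put Number of observations: &n_obs;

/* use the stored value later in the program */
data first_half;
    set sashelp.class (obs=%eval(&n_obs / 2));
run;
```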

Removing 1st Character of Variable Name

When you do programming in SAS, sometimes you have to manipulate the names of the variables. This program drops the 1st character of the variable name for ALL variables in your dataset. This can be modified to other things very easily. SWEET!

Cross-Tab Macro

SAS can develop crosstabs easily through PROC FREQ and the TABLES statement. Using these, you can print out a nice looking nXn matrix of counts and percentages. However, there is no super quick way to place this information in a SAS dataset unless you do a little programming with PROC TRANSPOSE and use the SPARSE option in PROC FREQ. Here is a macro that does all the hard stuff for you. Believe me, this will save you a great deal of time over the years!
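The core of the approach looks roughly like the sketch below. It is not the macro itself, and the dataset names `counts` and `wide` are made up. SPARSE makes PROC FREQ write a row for every row/column combination, including those with a zero count, so that PROC TRANSPOSE produces a full matrix.

```sas
proc freq data=sashelp.class noprint;
    tables sex * age / sparse out=counts;   /* one output row per sex/age cell */
run;

proc transpose data=counts out=wide(drop=_name_ _label_) prefix=age_;
    by sex;        /* one output row per value of sex */
    id age;        /* one output column per value of age */
    var count;     /* cell counts become the matrix entries */
run;

proc print data=wide;
run;
```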

Importance Ratio

Sometimes the analyst has the problem of having TOO many variables to work with. There are a number of alternatives in trying to condense the sheer volume of variables down to some manageable number. You could perform some Factor Analysis on the data. A quick alternative, however, might be to calculate an index that measures the importance of the variable in relation to the dependent variable. This routine shows you how to calculate such a ratio that can be used in a univariate sense to trim down the number of variables for your regressions. The higher the 'importance ratio', the more predictive the variable from a relative perspective.

Identify / Remove Correlated Vars

In regression analysis, one of the most frustrating challenges is that sometimes variables are too correlated with one another to all be included as predictors. This program provides at least one satisfying solution to that problem.

Delete Missing Data Vars

In regression analysis, you typically do not want to include variables with a large percentage of missing information. This program deletes them for you automatically.

Variable Clustering

In regression analysis, you might have data that is correlated, so you could identify clusters of information that are attempting to explain the same thing. This program shows you how and extracts the recommended variable from each cluster that you might want to try in your regression.

Bootstrap Resampling

Bootstrap resampling is a way to determine if your regression model is over-optimistic with regard to accuracy. The idea is to automatically create resamples of your original dataset of the same size a number of times and re-estimate your model each time. Next, you would measure its accuracy each time (KS, R-sq, AIC, etc.) and average the results. This procedure is especially useful when you have a limited number of bads and cannot afford to have a separate hold-out sample. Note: this is a sampling procedure WITH replacement, so you could have an observation from your original dataset appear more than once in each bootstrap sample. Great idea!
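One common way to draw the resamples is PROC SURVEYSELECT with unrestricted random sampling. The sketch below is an assumption about how this could be set up, not the downloadable program: it uses the sample dataset `sashelp.class`, an ordinary regression instead of a scoring model, and R-squared as the accuracy measure.

```sas
/* Draw 200 resamples, each the same size as the original, WITH replacement */
proc surveyselect data=sashelp.class out=boot
                  method=urs samprate=1 outhits
                  reps=200 seed=12345;
run;

/* Re-estimate the model on each resample; RSQUARE adds _RSQ_ to the OUTEST= dataset */
proc reg data=boot outest=est rsquare noprint;
    by replicate;
    model weight = height;
run;

/* Average the accuracy measure across the resamples */
proc means data=est mean std;
    var _rsq_;
run;
```

OUTHITS writes one row for every time an observation is selected, which is what lets the same observation appear more than once in a resample.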

Jackknife Estimation

In this example, Jackknife sampling holds out a single observation from the original dataset for model estimation. However, it does it repeatedly for as many times as there are observations in your data. Jackknifing can be used for a variety of purposes, but one application is to examine the influence of each observation on your regression estimates. If you see the estimate for a particular coefficient change a significant amount, then you might view that observation as a possible outlier. This example uses logistic regression to estimate the model and to score the single holdout observation each time.

Split Sample Jackknife

Simple Jackknife estimation typically uses a single holdout observation for analysis purposes. A more generalized approach is Split Sampling where the size of the holdout sample can vary. In this example, a holdout sample of 10% is selected, repeated 20 times, and scored. This is another good method of testing the accuracy of your model - especially when the number of bads is small.

Scanning Macro Var Lists

This is a neat code example where you can make a list of characters that can be used to scan a dataset to flag, say, invalid records. Cool use of the SCAN function in SAS.

VARS Upper/Lower Case

Here is a neat bit of code to automatically change your variable names to upper or lower case. Uses %sysfunc and the RENAME statement. The macro also allows you to do a search and replace on a character value within the variable name in case there are some troublemakers there. A good example is to replace '_' with 'x' in all the variable names. This part uses the TRANSLATE function. Very quick routine!

Alphabetically Reordering Variables

Although I did not write this code, this program shows you how to automatically reorder your variables alphabetically. Very handy.

Dynamically Reading Variable Names

This program shows you how to dynamically retrieve variable names using the %SYSFUNC macro function from a DO LOOP in SAS.
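One way this can be done is sketched below. The macro name `list_vars` is made up, and this is not necessarily how the downloadable program does it. %SYSFUNC lets the macro call the Data Step functions OPEN, ATTRN, VARNAME, and CLOSE directly.

```sas
%macro list_vars(ds);
    %local dsid n i rc;
    %let dsid = %sysfunc(open(&ds));             /* open the dataset, get a handle */
    %let n    = %sysfunc(attrn(&dsid, nvars));   /* number of variables */
    %do i = 1 %to &n;
        %put Variable &i: %sysfunc(varname(&dsid, &i));
    %end;
    %let rc = %sysfunc(close(&dsid));            /* always close the handle */
%mend list_vars;

/* Writes each variable name of the sample dataset to the log */
%list_vars(sashelp.class)
```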

Handling Missing Values

This program shows you how to take your data and substitute the mean, mode, or median values for missing values. Useful in credit scoring applications where you typically have 25-30% of your data missing. If you do not do something about missing values, your regression procedure will automatically skip the observation, substantially reducing the size of your data.
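For mean or median substitution, one compact option is PROC STDIZE with the REPONLY option, which replaces missing values without standardizing anything else. This is a sketch with a made-up dataset and may differ from the downloadable program (PROC STDIZE has no mode method, for instance).

```sas
/* Hypothetical input with missing values */
data have;
    input x y;
    datalines;
1 10
. 20
3 .
5 30
;
run;

/* REPONLY: only replace missing values; use METHOD=MEAN for the mean instead */
proc stdize data=have out=imputed method=median reponly;
    var x y;
run;

proc print data=imputed;
run;
```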

Handling Missing Values

Sometimes you may want to handle missing values a little differently than using the mean, mode, or median values as proxies. Typically in scoring applications, if the event probabilities are significantly different between accounts that have missing values and accounts that have valid values, then assigning proxies from sample averages could be less than optimal. This program uses a discretizing process to collapse a continuous variable into intervals and determine the closest proxy for missing using the sample event probabilities.

Macro Variables in Regression

This program illustrates how to get the variable names found significant in a logistic regression and to use them later in a dynamic way in your program. Very useful.

Creating Credit Scoring Dataset

If your modeling data is in the form of monthly snapshots, you may have no built-in easy way to determine if a set of accounts went "BAD" over the performance period. This program illustrates how to create your dependent variable for a credit scoring application and merge it with your attribute data obtained from the observation point.


Look at correlations across cross sectional units over time.

Graphing Logit Models

This program illustrates how to automatically graph logit models for each of the independent variables in the regression. Very useful in determining where the model variables are most sensitive and if the relationship to the event probability is S-shaped. Contains more advanced macro programming.

Validating Logit Models

This program illustrates how to validate a logit or probit model if you are given the predicted probability and the event variable for a set of observations. This program produces a ranking table as well as a lift curve (sometimes called a power curve). Very useful.

Implementation Code

This program illustrates how to dynamically produce the implementation code associated with a logit model so you can score a data set easily. Again, very useful.

Single Sided TOBIT

This program shows you how to estimate a single sided tobit model in SAS (PROC LIFEREG) and how to score a dataset with implementation code. Output matches SHAZAM econometrics program. Note: This may not work on SAS version 8.2+.

Double Sided TOBIT

This program shows you how to estimate a double sided tobit model in SAS (PROC QLIM) and how to score a dataset with implementation code. Output matches SHAZAM econometrics program which is provided on the MISCELLANEOUS download page. This will not work on SAS version 8.2+.

Pairwise & Bivariate Correlations

Have you ever needed to get information from the correlation matrix in a useful and meaningful way? Well, it's a little tricky, but this program shows you how by pulling out the upper right hand portion of the matrix into useful tables for analysis purposes.

Stacking and Unstacking Data

Using Macros, this shows how you can stack and unstack your data if you are working with cross sectional data such as state and county level information.

Shading Recessions

Useful for showing economic recessions - simple line graph.

Shading Recessions

Plotting dual axis graphs with recession shading.

Forecasting a System of Equations

Only use PROC MODEL in SAS for Time Series (non Box-Jenkins) forecasting. This is because PROC MODEL allows you to control for autocorrelation like PROC AUTOREG, but also gives you the flexibility to estimate a system of equations (e.g., SUR, 2SLS, 3SLS). However, its syntax is a little unusual. This program gives you an example of how to do a number of things in PROC MODEL that are usually needed in time series regression modeling, such as confidence levels, autocorrelation correction, and ex-post forecast validations.

Creating a Slideshow

Simple new proc in SAS creates a PDF slideshow and allows you to import pictures.

State Thematic Maps

State Thematic Maps are easy in SAS if you know how!

County Thematic Maps

County Thematic Maps are easy in SAS if you know how!

Color Based Scatter Plots

SAS can provide you with a Scatter Plot (XY graph) that is grouped by colors. This is an excellent way to show differences in a dataset.

Bivariate Granger Causality

The Bivariate Granger Causality Test is a good way to determine if one variable is a leading indicator of another variable. This knowledge is often desirable in forecasting applications. The program here is done using a pooled data format.

Multiple Bivariate Granger Causality

The Bivariate Granger Causality Test in this program is automated to include testing for numerous variables automatically from a list of variables supplied by the user. Again, it is done using a pooled framework.

Multiple Bivariate Granger Causality (NonPooled)

The Bivariate Granger Causality Test in this program is automated to include testing for numerous variables automatically from a list of variables supplied by the user. This is for nonpooled data.

Bootstrapping Logit Confidence Intervals

Useful for bootstrapping confidence intervals from a logistic regression model.

Time Series Bootstrapping Confidence Intervals

Useful for bootstrapping confidence intervals from a time series regression model.

Equal Slopes Graphical Plot

When using Ordinal Logistic Regression, you will often fail the proportional odds test. That is because the test is very anticonservative and especially problematic for models with many observations or with continuous variables. This is a plot to show parallel lines and can be used as a substitute for the Score test in SAS.

Mapping Zip code locations

If you know the 5 digit zipcode, you can easily map locations by state(s).

Mapping locations with Radius

Shows how to map customers around a 10 mile radius from a business. All you need is lat/long.

Mapping locations with Radius

Instead of State, let's look at counties. All you need is lat/long and the county and state FIPS code.

Mapping locations with Custom Legend Texts

Use proc format to change legend texts.

Summarizing and Counting Data Automatically with One-to-Many Relationships

Summarizing data with Character Data can be difficult. Here is a macro that automates the process.

Useful Frequency Rpt

This is a useful program written by Chris Swenson for consolidating Frequencies for Character and Numeric Variables. Check out his website. Nicely done, Chris.

WOE Binning Program for Credit Scoring Applications

Zipped File containing SAS Binning routine and an Excel application for collapsing the bins and producing a scorecard. Example files included.
