1. codebook displays information about variables' names, labels and values.
2. clear command clears out the dataset that is currently in memory. We need to do this before we can create or read a new dataset.
3. describe displays a summary of a Stata dataset, describing the variables and other information.
4. edit command opens up a spreadsheet-like window in which you can enter and change data. You can also get to the "Data Editor" from the pull-down "Window" menu or by clicking on the "Data Editor" icon on the tool bar.
Enter values and press return. Double-click on a column head to change the name of that variable. When you are done, click the "close box" for the "Data Editor" window.
5. inspect displays information about the values of variables and is useful for checking data accuracy.
6. list command without any variable names displays the values of all the variables for all the cases. list with variable names displays the values of only the variables listed after the command.
7. log using filename.log opens a log file called "filename.log" that records everything you type and all of the output from the commands (For example: log using my).
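8. log using filename.log, noproc opens a log file called "filename.log" that records only what you type with no output. (The noproc option belongs to older versions of Stata; current versions provide the cmdlog using command for recording only what you type.)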
9. log using filename.log, append opens an existing log file called "filename.log" and adds the current log to the end of it.
10. log using filename.log, replace opens an existing log file called "filename.log," deletes its contents and places the current log into the file.
11. The command log off temporarily suspends logging while log on turns logging back on again.
12. The command log close closes and saves the current log file.
The log command is an important command. Having a log of commands and output is extremely useful for documenting your data analyses and recovering from errors that may occur. You will be tempted not to use the log feature of Stata because it can seem like a bother. But the time will come when you need to recreate a data file or data analysis, and the only way you will be successful is if you created and saved a log file.
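The log commands above can be put together in a short session. This is only a sketch: the file name my.log and the auto example dataset (installed with Stata) are stand-ins chosen for illustration, not the data used elsewhere in this document.

```stata
* Close any log that is already open so that log using does not fail
capture log close

* Open my.log, overwriting an earlier copy if one exists
log using my.log, replace

* auto is an example dataset installed with Stata (a stand-in for your own data)
sysuse auto, clear
summarize price

* Suspend logging, run something you do not want recorded, then resume
log off
describe
log on

* Close and save the log, then display it on the screen
log close
type my.log
```

Because the file name ends in .log, the log is saved as plain text and can be opened in any editor.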
13. Stata displays --more-- whenever it fills up the computer screen. Pressing the 'space bar' will display the next screen, and so on, until all of the information has been displayed. To get out of --more--, you can click on the 'Break' button, select 'Break' from the pull-down 'Tools' menu, or press the 'q' key.
14. The save command saves the dataset that is in memory to the disk. (For example: save my.dta). Editing the dataset changes the data in the computer's memory; it does not change the data that is stored on the computer's disk. The replace option allows you to save a changed file to the disk, replacing the original file.
15. The type command displays the contents of a text file to the screen.
16. The use command loads a Stata dataset into memory for use.
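A typical round trip through the data commands in this list might look like the sketch below. It assumes the auto example dataset that is installed with Stata, and my.dta is a hypothetical file name.

```stata
* Load an example dataset installed with Stata (a stand-in for your own data)
sysuse auto, clear

* Look at the dataset before changing anything
describe
codebook rep78
inspect rep78
list make price mpg in 1/5

* Save a copy to disk; replace overwrites my.dta if it already exists
save my.dta, replace

* Empty the memory, then read the saved file back in
clear
use my.dta
```

Here list is restricted to the first five cases with in 1/5 so that the output fits on one screen.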
Statistics with Stata
1. The summarize command displays basic descriptive statistics: n, mean, standard deviation, min and max. The detail option provides more descriptive statistics, including the variance, skewness, kurtosis, the median and other percentiles. (For example: summarize ses, detail)
2. The graph command is used to create one- and two-dimensional graphs.
3. The correlate command displays a matrix of Pearson correlations for the variables listed.
4. graph twoway scatter variable1 variable2: This version of the graph command uses two variables and displays a scatterplot. (For example: graph twoway scatter read math)
5. The tabulate command with one variable creates a frequency distribution table. Note that the nolabel option shows the numeric values of the variable instead of the "value label."
The tabulate command with two variables creates a two-way table or cross tabulation. (For example, tabulate ses female).
With the chi2 option, the tabulate command also displays the chi-square value along with its p-value. (For example, tabulate ses female, chi2)
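The three forms of tabulate can be tried on any dataset with categorical variables. The sketch below assumes the auto example dataset installed with Stata, where rep78 and foreign play the roles of ses and female; it also shows that the chi-square value and its p-value are left behind in r(chi2) and r(p).

```stata
* Example dataset installed with Stata (a stand-in for the data used in this document)
sysuse auto, clear

* One-way frequency table, first with value labels, then with numeric values
tabulate foreign
tabulate foreign, nolabel

* Two-way table with the chi-square test
tabulate rep78 foreign, chi2

* The test results are also stored for later use
display "chi2 = " r(chi2) "   p = " r(p)
```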
6. t-tests: examples
This example involves the single-sample t-test, testing whether the sample was drawn from a population with a mean of 50. By the way, the standardized writing test in this sample was normed nationally with a mean of 50.
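The command for this example does not appear above. A minimal sketch of what it would look like follows, using simulated scores as a stand-in because the dataset itself is not part of this document; only the variable name write and the test value of 50 come from the text.

```stata
* Simulated writing scores (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate write = rnormal(52, 9)

* Single-sample t-test: is the mean of write different from 50?
ttest write == 50
```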
This example makes use of the t-test for dependent samples. In this case, we are testing whether there is a significant difference between the math and the science test scores.
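Again the command itself is not shown above. Under the assumption that the two scores are stored in variables named math and science, the dependent-samples test could be sketched as follows, with simulated scores standing in for the real data.

```stata
* Simulated paired scores (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate math = rnormal(52, 9)
generate science = math + rnormal(0, 6)

* t-test for dependent samples: is the mean difference zero?
ttest math == science
```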
ttest write, by(female)
ttest write, by(female) unequal
The t-test for independent groups comes in two varieties: pooled variance and unequal variance. We want to look at the differences in writing test scores between the two groups defined by the female variable. We will begin with the ttest for independent groups with pooled variances and compare the results to the ttest for independent groups using unequal variances.
There is a test for heterogeneity of variance, sdtest, but it is overly sensitive to non-normality and statisticians do not recommend using it to screen for heterogeneity of variance.
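To see how the two varieties differ, and what sdtest reports, you can run all three commands on the same data. The sketch below uses simulated scores as a stand-in; the variable names follow the ttest commands above, and the group sizes and means are made up for illustration.

```stata
* Simulated writing scores for two groups (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate female = mod(_n, 2)
generate write = rnormal(50, 9) + 5*female

* Variance-ratio test for equal standard deviations in the two groups
sdtest write, by(female)

* Pooled-variance t-test: degrees of freedom are n1 + n2 - 2
ttest write, by(female)
display "pooled df = " r(df_t)

* Unequal-variance t-test: degrees of freedom are approximated
ttest write, by(female) unequal
display "unequal-variance df = " r(df_t)
```

The most visible difference between the two ttest outputs is the degrees of freedom, which the unequal option replaces with an approximation.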
7. Analysis of Variance Examples:
oneway write prog, tabulate
anova write prog
by prog: summarize write
table prog, contents(n write mean write sd write)
Here are two different ways to perform a one-way analysis of variance (ANOVA). They both give exactly the same answer. The most visible difference is that oneway includes a test for homogeneity of variance.
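One way to check that the two commands agree is to store the F statistic from each and compare them. This is a sketch on simulated data (a hypothetical stand-in with three program types); oneway leaves its F in r(F), while anova leaves it in e(F).

```stata
* Simulated writing scores for three program types (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate prog = 1 + mod(_n, 3)
generate write = rnormal(50, 9) + 2*prog

* One-way ANOVA with the oneway command
oneway write prog, tabulate
scalar F_oneway = r(F)

* The same model with the anova command
anova write prog
scalar F_anova = e(F)

display "F from oneway = " F_oneway "   F from anova = " F_anova
```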
anova write female prog female*prog
This example demonstrates a 2 X 3 factorial analysis of variance (female by prog), with both main effects and their interaction.
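The female*prog notation above follows older versions of Stata. As far as we know, releases from Stata 11 onward write interactions with factor-variable notation, where female##prog stands for both main effects plus the interaction. The sketch below assumes such a release and uses simulated data as a stand-in.

```stata
* Simulated data: two levels of female, three levels of prog (hypothetical stand-in)
clear
set seed 12345
set obs 200
generate female = mod(_n, 2)
generate prog = 1 + mod(_n, 3)
generate write = rnormal(50, 9) + 3*female + 2*prog

* Factorial ANOVA: main effects of female and prog plus their interaction
anova write female##prog

* Model degrees of freedom: 1 (female) + 2 (prog) + 2 (interaction)
display "model df = " e(df_m)
```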
8. Regression Examples:
regress write read
regress write read, beta
predict pre1
generate pre2 = 23.95944 + .5517051*read
list pre1 pre2
graph twoway (scatter write read)(lfit write read)
graph twoway (scatter write read, jitter(2))(lfit write read)
These are two examples of simple linear regression. The first one displays confidence intervals for the regression coefficients, while the second one displays standardized regression coefficients along with the 'regular' regression coefficients. The predict command computes a predicted writing score for each observation. Compare 'pre1' with 'pre2', which was created using the generate command. The graph command, in this example, displays a scatter plot of read and write along with the regression line of write on read. The second graph example uses the jitter option to help you see the points where there are multiple observations on one point.
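Rather than typing the coefficients by hand, you can refer to them as _b[_cons] and _b[read] after regress. The sketch below repeats the comparison of 'pre1' and 'pre2' on simulated data (a hypothetical stand-in, so the coefficients will not match the ones shown above).

```stata
* Simulated reading and writing scores (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate read = rnormal(52, 10)
generate write = 24 + 0.55*read + rnormal(0, 7)

regress write read

* Predicted values computed by Stata
predict pre1

* The same predicted values computed from the stored coefficients
generate pre2 = _b[_cons] + _b[read]*read

list pre1 pre2 in 1/5
```

The two columns should agree, because predict applies the same equation to each observation.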
regress write read math
regress write read math female
This time we have two examples of a multiple regression, the first one with two predictor variables and the second one with three.
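To compare the two models, you can save the R-squared from each regression, which regress leaves in e(r2). This is a sketch on simulated data; the variable names follow the commands above, but the values are made up for illustration.

```stata
* Simulated scores (hypothetical stand-in data)
clear
set seed 12345
set obs 200
generate female = mod(_n, 2)
generate read = rnormal(52, 10)
generate math = 0.6*read + rnormal(21, 7)
generate write = 15 + 0.3*read + 0.3*math + 5*female + rnormal(0, 7)

* Two predictors
regress write read math
scalar r2_two = e(r2)

* Three predictors
regress write read math female
scalar r2_three = e(r2)

display "R-squared: " r2_two " (two predictors)   " r2_three " (three predictors)"
```

Adding a predictor never lowers R-squared on the same sample, so the second value is at least as large as the first.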
Copy Stata Output and Stata Graphs
Stata holds only about 500 lines of output; anything beyond that is discarded.
You cannot use the pull-down menus to save the contents of the results window (i.e. you cannot choose File and then Save to save the results).
1. Copy from the Results Window and Paste into Word
You can use the mouse to scroll through the results window and mark an area that you want to save. You can then use the pull-down menu to choose Edit and then Copy.
You can then go to Microsoft Word and from its pull-down menu choose Edit and then Paste. Most likely, the results will look lousy, with the text badly aligned. This happens because the output from Stata uses fixed-space fonts, and most fonts in Microsoft Word are proportionally spaced fonts (for example, the text in the window above is Times New Roman). If you select the text in Word and choose a fixed-space font like Courier, the output will look as it did in Stata.
2. Copying Graphs to Word
If you create a graph in Stata, you can copy that graph and then paste it into Microsoft Word. You need to copy the graph by choosing from the pull down menu Edit then Copy or Copy Graph in Windows.
You can then open Word and then paste the graph by choosing from the Word pull down menu Edit and then Paste. You will then see the Stata graph pasted into the Word document.
When you know the name of the command (e.g. summarize) you can type help summarize in the command window to get help on the summarize command. You can also use the pull-down menu by clicking on Help and then Stata Command and then typing summarize in the window to get help for the summarize command.
When you don't know the name of the command, you can search the Stata help files (as well as their help on their website) based on keywords. For example, you want to know more about increasing memory in Stata so you want to search for the keyword "memory". You can type search memory or you can use the help pull down menu by clicking Help and then click search and then type in the window the keyword you want to search for, e.g. memory.
When you are looking for a program to download, you can use the findit command. findit works best if you know the name of the program, but it will also search on more general topics.