A simple tabulation should always be your first stab at your data. The tabulate command returns a frequency and cumulative distribution table in the Stata Results window. Let’s say you want to know the proportion of respondents in the sample who ever got a flu shot: a one-way tabulation of the flu shot variable answers that.

Note that you can combine the tabulate command with the by (or bysort) prefix to look at the tabulation for subgroups in your dataset. For example, you can check whether the patterns of flu shots look different for each province. The prefix “bysort” is a combination of “by” and “sort”. You could equivalently break it into two commands, but it is generally simpler to use “bysort”: Stata will first sort the data, then return the information by category.

If you are interested in only one subgroup, you can also use the “if” qualifier with the tabulate command. For instance, the “if” qualifier can restrict the flu shot tabulation to respondents from Ontario.

Finally, you can use the tabulate command to do a simple cross-tabulation using categorical variables. Say you want to know how many of the women in the sample smoked over 100 cigarettes in their life: cross-tabulating sex against the smoking variable gives that count.

Once you have tabulated your data, you can start looking at summary statistics other than frequency. The summarize command returns the mean, standard deviation, minimum, maximum and number of observations. The example is built the same way the tabulate example was. First we look at the summary statistics for the whole sample, and then we look at the statistics for subsamples (each province). Using the “if” qualifier returns the summary statistics for a specific subgroup.

In these examples we have focused on splitting the sample by province, but any categorical variable can be used. In subsequent examples, we will look at men and women, smokers and non-smokers, physically active or not. The way you look at your data depends on the type of questions you want to ask: the clearer your question, the more specific your analysis can be.

Combining tabulate with its summarize() option lets you create simple one-way and two-way summary statistics tables in Stata. The first part of the command (tabulate) will split your data according to a categorical variable (here we will use sex). The second part will give summary statistics for another variable (preferably quantitative). Let’s say you want to know whether, and how, men and women differ in their daily consumption of fruit and vegetables. This table will give us the mean, standard deviation and frequency of the daily consumption of fruit and vegetables for men and women in the sample.

If you want to know whether men and women from different provinces have different patterns in their average daily consumption of fruit and vegetables, you can use the bysort prefix again to do the same query province by province. Note that you can also use the “if” qualifier here (as we did in the tabulate and summarize commands) to look at, say, one province only.

You can also use the tabulate, summarize() command to create a quick two-way summary statistics table. For example, if you wanted to look at patterns of daily fruit and vegetable consumption for men and women with different smoking habits, you could create a table for that. The result seems to show a certain pattern: smokers look like they eat less fruit and vegetables than non-smokers, and women seem to eat more fruit and vegetables than men, on average.

The tabstat command displays summary statistics for a series of numeric variables in one table, possibly broken down on (conditioned by) another variable. Without the by() option, tabstat is a useful alternative to summarize because it allows you to specify the list of statistics to be displayed. With the by() option, tabstat resembles tabulate used with its summarize() option in that both report statistics of varlist for the different values of varname. The tabstat command allows more flexibility in terms of the statistics presented and the format of the table.

The tabstat example uses three lines of command. The first line will return the statistics (mean, standard deviation and frequency) for 4 variables (HWTGHTM HWTGWTK HWTGBMI PACFD) for the whole sample. The second line tells Stata to do the same, but to split the sample between male and female. Notice how we also get the total, so if you are interested in the split samples and the total, there is no need to do both separately.

Finally, the third line of command, with the bysort prefix, will do the same in turn for each province, and split each sub-sample into male and female. The results are in the same format; however, this returns the subsample (i.e. provincial) total for male and female combined, but not the grand total for all provinces.
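The commands discussed in this post can be sketched together in one short do-file. This is only an illustration under stated assumptions: the post names four survey variables (HWTGHTM HWTGWTK HWTGBMI PACFD) but not the variables for sex, province, flu shots, smoking or fruit and vegetable consumption, so the sketch simulates a small stand-in dataset. Every variable name in it (province, female, flu_shot, smoker, fruit_veg, bmi) is hypothetical and would need to be replaced with the names in your own data.

```stata
* Simulated stand-in data. All variable names are hypothetical,
* not the variables from the survey used in the post.
clear
set seed 12345
set obs 500
generate byte province = ceil(3 * runiform())   // 1, 2 or 3
generate byte female   = runiform() < 0.5       // 0 = male, 1 = female
generate byte flu_shot = runiform() < 0.4       // 0 = no, 1 = yes
generate byte smoker   = runiform() < 0.25      // 0 = no, 1 = yes
generate fruit_veg     = max(0, rnormal(5, 2))  // daily servings
generate bmi           = rnormal(26, 4)

* One-way frequency table
tabulate flu_shot

* Same table within each province (bysort sorts first, then loops)
bysort province: tabulate flu_shot

* One subgroup only, using the if qualifier
tabulate flu_shot if province == 1

* Cross-tabulation of two categorical variables
tabulate female smoker

* Summary statistics: whole sample, each province, one province
summarize fruit_veg
bysort province: summarize fruit_veg
summarize fruit_veg if province == 1

* One-way and two-way tables of summary statistics
tabulate female, summarize(fruit_veg)
bysort province: tabulate female, summarize(fruit_veg)
tabulate female smoker, summarize(fruit_veg)

* tabstat: whole sample, split by sex, then by sex within province
tabstat fruit_veg bmi, statistics(mean sd n)
tabstat fruit_veg bmi, statistics(mean sd n) by(female)
bysort province: tabstat fruit_veg bmi, statistics(mean sd n) by(female)
```

Because the data are random draws, the tables will not show the patterns described in the post; the point is only the shape of each command. The `//` comments work when the block is run as a do-file rather than typed line by line in the Command window.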