J'écris parce que j'oubliais

Monday, January 16, 2012

How do I identify runs of consecutive observations in panel data?

The STATA mailing list has a way to identify runs of consecutive observations. After some googling, it turns out SAS can do something similar. Here's how.

Suppose you want to figure out how many observations you have per GVKEY:

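/* Running count within each gvkey (merge must already be sorted by gvkey) */
data merge2; set merge; by gvkey; cnt+1; if first.gvkey then cnt=1; run;
/* Put the largest count first within each gvkey */
proc sort data=merge2; by gvkey descending cnt; run;
/* Copy that largest count (the total) to every row of the gvkey */
data merge3; set merge2; by gvkey; retain totcnt; if first.gvkey then totcnt=cnt; output; run;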
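The code above gives the number of observations per GVKEY, not the runs themselves. A minimal sketch of one way to flag runs of consecutive years follows. It assumes the panel has a numeric year variable, here called fyear; that name and the dataset names panel and runs are assumptions, not part of the original recipe.

```sas
/* Assumed input: dataset merge with gvkey and a numeric year variable fyear */
proc sort data=merge out=panel;
  by gvkey fyear;
run;

data runs;
  set panel;
  by gvkey;
  prev_year = lag(fyear);      /* call lag() on every row, never inside an IF */
  if first.gvkey or fyear ne prev_year + 1 then do;
    run_id + 1;                /* a new run starts here */
    run_len = 0;
  end;
  run_len + 1;                 /* position of this row within its run */
  drop prev_year;
run;
```

Each run gets its own run_id, and the last row of a run carries the run's length in run_len.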

Monday, November 21, 2011

Merging with CIK

char_cik=put(cik,z10.);

CIKs are 10-character strings, but if you manually code them from EDGAR as numbers, they won't have leading zeroes. So use the "z10." format, which pads the number with zeroes on the left to a width of 10.
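A small self-contained sketch of the conversion. The dataset name cik_demo and the two input values are illustrative only.

```sas
data cik_demo;
  input cik;                    /* numeric CIK as typed in by hand */
  length char_cik $10;
  char_cik = put(cik, z10.);    /* zero-padded to 10 characters */
  datalines;
320193
789019
;
run;

proc print data=cik_demo;
run;
```

Here 320193 becomes 0000320193, which can then be merged with a character CIK from another dataset.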

Tuesday, October 11, 2011

Averages across years

PROC MEANS nolabel DATA=[dataset] ;
CLASS [year];
VAR [variable];
OUTPUT OUT=[output dataset] MEAN= ;
RUN;

MEAN can be replaced by MEDIAN.
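A filled-in sketch of the template above, with hypothetical names (firmyears, fyear, roa, yearly_avg). It adds two options that are not in the original: NOPRINT suppresses the printed table, and NWAY keeps only the per-year rows. Without NWAY, the output dataset also contains an overall row with _TYPE_=0.

```sas
proc means data=firmyears noprint nway;
  class fyear;                  /* one output row per year */
  var roa;
  output out=yearly_avg mean=mean_roa median=median_roa;
run;
```

Naming the statistics (mean_roa, median_roa) lets the mean and the median sit side by side in one output dataset.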


Sunday, October 09, 2011

Esttab and multiple dummy indicators

Suppose your empirical specification has both unit and time fixed effects. You don't want the table to be cluttered with n or m dummy variables, do you?

Adding the option indicate("Time fixed effects = " "Unit fixed effects = ") to the esttab command will do the trick. Check the STATA regression output for the names of the dummy variables to put after the equals signs.

Ado Path

Emerald at UNC has STATA 11.2 but is not cooperative when it comes to downloading *.ado files (e.g. estout). So I downloaded estout in STATA 9.2 through Latte at Fuqua and then copied the folder to my Emerald account. Do note that you must use "scp" and not "sftp" due to the latter's restriction on recursive copying (i.e. folders).

The help for adopath suggested that the folder be copied to the "/netscr/[username]" folder. After that, esttab worked just fine.

Thursday, September 22, 2011

Destringing GVKEY

GVKEY as-is from COMPUSTAT in WRDS comes as a string variable. To destring it in STATA, type

destring gvkey, replace

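To destring it in SAS, a quick-and-dirty way is to multiply it by 1 and store the result in a new variable. SAS converts the string automatically and writes a conversion note to the log.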
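A cleaner alternative, sketched here with hypothetical dataset names (comp, comp_num), is to convert explicitly with the input function. The result has to go into a new variable because SAS cannot change the type of an existing one.

```sas
data comp_num;
  set comp;                        /* comp: assumed Compustat extract with character gvkey */
  gvkey_num = input(gvkey, 8.);    /* explicit character-to-numeric conversion */
run;
```

This avoids the conversion note that the multiply-by-1 trick leaves in the log.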

Friday, July 22, 2011

Converting from 4-Digit SIC to 2-Digit SIC

Table 4 of "The power of the pen and executive compensation" (Core, Guay, and Larcker 2008) contains "fixed effects for year and 2-digit SIC codes." But ExecuComp in WRDS gives you the 4-digit SIC codes. How do you get from 4 digits to 2 in SAS?

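First, note that the first two digits denote the "major industry group." Second, the 4-digit SIC from ExecuComp is, unsurprisingly, numeric. So substr won't work unless you first convert from numeric to string. Use the z4. format so that codes below 1000 keep their leading zero:

sicchar=put(sic,z4.);

Then you can use substr to keep the first two characters. Just be sure to convert the result back to numeric, and store it in a new variable, because SAS will not change a variable's type once it has been created as character:

sictwo=input(substr(sicchar,1,2),2.);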
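Since sic is already numeric, a purely arithmetic shortcut also works. This is an alternative to the string route above, not part of the original recipe, and the dataset sic_demo and its values are only illustrative.

```sas
data sic_demo;
  input sic;
  sictwo = floor(sic / 100);    /* drop the last two digits */
  datalines;
100
2834
7372
;
run;
```

This gives 1, 28, and 73, and it needs no special handling for codes below 1000.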

Now run your fixed effects regression.