First names and FAME

Jørgen Ouren
Statistics Norway
Change over time of popularity
[Figure: popularity in per cent of the names Ingrid and Andreas; vertical axis 0.0 to 2.5, horizontal axis ticks 1880 to 1980]

Most common names 1998, boys
No.  Name        Number  Per 1000
 1   Kristian    672     22
 2   Markus      635     21
 3   Martin      615     20
 4   Andreas     590     19
 5   Sander      534     17
 6   Kristoffer  525     17
 7   Daniel      512     17
 8   Thomas      481     16
 9   Fredrik     457     15
10   Alexander   449     15
Norwegian Central Register of Persons in Oracle
County Year of birth ; Surname ; First name     ; Middle name
0847   1947          ; JENSEN  ; RICHARD ARNOLD ; BERG
AWK or gawk
Input: 0847 1947; JENSEN ; RICHARD ARNOLD ; BERG

out13.awk
BEGIN {FS=";"}
{print $1 $3}
Output: 0847 1947 RICHARD ARNOLD

printnr32.awk
{print $3, $2}
Output: RICHARD 1947
gawk -f out13.awk NM | gawk -f printnr32.awk | sort | gawk -f count.awk > innmale.inp
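As a rough illustration, the two extraction steps (out13.awk, then printnr32.awk) can be mirrored in Python. This is only a sketch: the function name `name_and_year` is hypothetical, and the record layout is assumed to match the single sample line shown above.

```python
def name_and_year(record):
    """Return (first given name, year of birth) from one register line."""
    fields = record.split(";")        # same separator as FS=";" in out13.awk
    county_year = fields[0].split()   # e.g. ["0847", "1947"]
    given_names = fields[2].split()   # e.g. ["RICHARD", "ARNOLD"]
    # Keep only the first given name, as printnr32.awk does with $3.
    return given_names[0], county_year[1]


sample = "0847 1947; JENSEN ; RICHARD ARNOLD ; BERG"
print(*name_and_year(sample))
```

Only the first of several given names is kept, so RICHARD ARNOLD is counted as RICHARD.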
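# tella1.awk: counts occurrences from a sorted file of "name year" lines
# navn = name, aar = year, antall = count
BEGIN {navn=""; aar=""; antall=0}
{
  if ($1 == navn && $2 == aar) {
    antall++                                  # same name and year: count one more
  } else if ($1 == navn) {
    print "set " navn ".ANT[" aar "]=" antall  # same name, new year: write the finished year
    aar=$2; antall=1
  } else {
    if (NR > 1) print "set " navn ".ANT[" aar "]=" antall  # close the previous name
    print "series " $1 ".ANT"                  # new name: declare its series
    aar=$2; navn=$1; antall=1
  }
}
END {print "set " navn ".ANT[" aar "]=" antall}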
Result
series RICHARD.ANT
set RICHARD.ANT[1903]=4
set RICHARD.ANT[1904]=7
...
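The counting step can also be sketched in Python, as a cross-check on the AWK logic. The function name `fame_commands` is hypothetical. Unlike the AWK script, this version uses a `Counter` and so does not depend on the input being sorted first.

```python
from collections import Counter


def fame_commands(pairs):
    """Turn (name, year) pairs into 'series' and 'set' command lines."""
    counts = Counter(pairs)
    lines = []
    previous = None
    for name, year in sorted(counts):
        if name != previous:
            # First time this name is seen: declare its series.
            lines.append(f"series {name}.ANT")
            previous = name
        lines.append(f"set {name}.ANT[{year}]={counts[(name, year)]}")
    return lines


pairs = [("RICHARD", "1903")] * 4 + [("RICHARD", "1904")] * 7
print("\n".join(fame_commands(pairs)))
```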
NB: beware of names containing Æ, Ø or Å, and of the name ELSE, which is also a FAME keyword
Now we are in FAME!
Sum up:
IGNORE ON
IGNORE ADD ON
series BOYS
loop for i in wildlist(nm,"?.ant")
  set boys = boys + i
end loop
TUNE ARGUMENT 50000
GRAPH RICHARD.ANT/BOYS*100
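The arithmetic behind the graph is a yearly share: the count for one name divided by the count for all names, times 100. The sketch below shows it in Python. The function names are hypothetical and the count for OLE is invented for illustration. A year missing for one name simply contributes nothing to the total, which is assumed to be the purpose of IGNORE ADD ON above.

```python
def yearly_totals(series_by_name):
    """Sum the yearly counts of all names, like the BOYS series."""
    totals = {}
    for counts in series_by_name.values():
        for year, n in counts.items():
            totals[year] = totals.get(year, 0) + n
    return totals


def popularity(name_counts, totals):
    """Share of one name per year, in per cent of all names."""
    return {year: 100.0 * n / totals[year] for year, n in name_counts.items()}


# Illustrative numbers only; the OLE count is made up.
series = {"RICHARD": {1903: 4, 1904: 7}, "OLE": {1903: 96}}
totals = yearly_totals(series)
print(popularity(series["RICHARD"], totals))
```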
[Figure: Richard, popularity in per cent; vertical axis 0.0 to 0.4, horizontal axis ticks 1880 to 1980]
We need a list of all the "large" names
/LL = WILDLIST(NM,"?.ANT")
/LL2 = SELECTNAMES( LL, SUM(@NAME) GT 1000)
LOOP FOR I IN LL2
TYPE name(i), sum(i), i[1997]
END LOOP
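The selection rule is simply "total count over all years greater than 1000". A minimal Python sketch follows; the function name `large_names` and the sample counts are made up for illustration.

```python
def large_names(series_by_name, threshold=1000):
    """Names whose count summed over all years exceeds the threshold."""
    return sorted(
        name
        for name, counts in series_by_name.items()
        if sum(counts.values()) > threshold   # strictly greater, like GT
    )


# Invented counts, for illustration only.
demo = {"A": {1990: 600, 1991: 500}, "B": {1990: 1000}}
print(large_names(demo))
```

A name with exactly 1000 occurrences is left out, because GT is a strict comparison.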
One file for each name:
$graphsimple "BOYS", LL2

procedure $graphsimple
argument sex, list
block
loop for i in list
  date 1880 to *
  if sex eq "GIRLS"
    execute "picture <acc o> "+QUOTE+"GIRLS/"+name(i)+".eps"+QUOTE
    work'gser=i&ant/GIRLS*100
  else
    execute "picture <acc o> "+QUOTE+"BOYS/"+name(i)+".eps"+QUOTE
    work'gser=i&ant/BOYS*100
  end if
  channel reports picture
  -- Set up the graphic page
  device graphic postpa4
  glue dot
  deci 0
  tick label area left height 0.10
  tick label area right height 0
  tick label area top height 0
  tick label area bottom height auto
  tick label size xsmall
  tick label font f5
  tick left zero on
  tick mark bottom direct out
  caption null
  plot color green
  plot thick medium
  grid left on
  grid style dotted
  legend off
  -- Define character size according to points
  character size xsmall 0.00882 -- 7 point
  -- character size small 0.01134 -- 9 point
  character size small 0.0126 -- 10 point
  character size medium 0.01512 -- 12 point
  page dim hori 6.2/2.52
  page frame off
  comma on
  replace comma " "
  if max(gser) lt 2
    page dim vert 3.0/2.52
    tick left numeric lin (0 to 2 step 1)
  else if max(gser) lt 3
    page dim vert 3.8/2.52
    tick left numeric lin (0 to 3 step 1)
  ...
  else if max(gser) lt 8
    page dim vert 7.8/2.52
    tick left numeric lin (0 to 5 step 1)
  else
    /lm = round(max(gser)+0.45)
    page dim vert 8.6/2.52
    tick left numeric lin (0 to lm step 1)
  end if
  date *
  gra <date 1880 to *> if gser eq 0 then ND else gser
end loop
end block
end procedure
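Two small rules in the procedure are easy to check outside FAME. The fallback axis limit, round(max(gser)+0.45), rounds the peak up unless it lies less than 0.05 above a whole number. The expression "if gser eq 0 then ND else gser" turns zeros into missing values. The Python sketch below mirrors both. The function names are hypothetical, and round-half-up behaviour is an assumption about FAME's round.

```python
import math


def axis_top(peak):
    """Upper axis limit for the fallback branch: round(peak + 0.45)."""
    # floor(x + 0.5) rounds to the nearest integer, halves going up
    # (an assumption about how FAME's round behaves).
    return math.floor(peak + 0.45 + 0.5)


def mask_zeros(values):
    """Replace zeros by None, like 'if gser eq 0 then ND else gser'."""
    return [None if v == 0 else v for v in values]


print(axis_top(8.0), axis_top(8.2))
print(mask_zeros([0, 0.3, 0]))
```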
Finally: the top 10 for each year.
Really a challenge for FAME
REPORT
TITLE "Most commonly used names"
SELECT CASE (1 TO 10)
LOOP FOR YEAR = 1880A TO 1997A
  PRINT SL(SORTNAMES(LL2,@NAME[YEAR],DOWN)) AS STRING(YEAR)
END LOOP
END LOOP
END REPORT
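The idea behind the report is to sort the names by their value in one year, largest first, and keep the first ten. A minimal Python sketch of that ranking follows; the function name `top_names` and the counts are invented for illustration and are not taken from the register.

```python
def top_names(series_by_name, year, n=10):
    """The n names with the highest count in the given year."""
    ranked = sorted(
        series_by_name,
        key=lambda name: series_by_name[name].get(year, 0),
        reverse=True,   # largest first, like DOWN in SORTNAMES
    )
    return ranked[:n]


# Invented counts, for illustration only.
girls = {"IDA": {1998: 3}, "SILJE": {1998: 2}, "MALIN": {1998: 1}}
print(top_names(girls, 1998, n=2))
```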
Most popular names
Girls 1990-1998
No.  1990      1991      1992      1993      1994      1995      1996      1997      1998
1    Ida       Silje     Ida       Ida       Karoline  Ida       Ida       Ingrid    Ida
2    Karoline  Marte     Camilla   Karoline  Silje     Karoline  Ingrid    Ida       Silje
3    Marte     Karoline  Silje     Silje     Ida       Marte     Julie     Marte     Malin
4    Silje     Ida       Marte     Camilla   Marte     Silje     Marte     Karoline  Marte
5    Camilla   Kristine  Karoline  Kristine  Camilla   Kristine  Karoline  Julie     Karoline