Title: | Spatial Microsimulation |
---|---|
Description: | Produce small area population estimates by fitting census data to survey data. |
Authors: | Dimitris Kavroudakis |
Maintainer: | Dimitris Kavroudakis |
License: | GPL-3 |
Version: | 2.3.1 |
Built: | 2025-02-05 02:38:47 UTC |
Source: | https://github.com/cran/sms |
Generate small area population microdata from census and survey datasets. Fit the survey data to census area descriptions and export the population of small areas (microdata).
Generate small area population microdata from census and panel datasets. Fit the survey data to census area descriptions and export the population of small areas.
Dimitris Kavroudakis
Kavroudakis D (2015). sms: An R Package for the Construction of Microdata for Geographical Analysis. Journal of Statistical Software, 68(2), pp. 1-23. https://doi.org/10.18637/jss.v068.i02
Add an association between a census column and a survey column to a data lexicon
addDataAssociation(indf, data_names)
indf | A data lexicon (data.frame) created with the function createLexicon(). |
---|---|
data_names | A vector with two elements. The first element should be the name of the census column and the second the name of the associated survey column. |
indf The input data lexicon with one extra column.
Dimitris Kavroudakis
library(sms)
data(survey)
data(census)
in.lexicon=createLexicon()
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
print(in.lexicon)
Calculate the error of a selection.
calculate_error(selection, area_census, lexicon)
selection | A population selection whose error will be evaluated. |
---|---|
area_census | A census area (a single row of the census data). |
lexicon | A data.frame with details about the data associations (the data lexicon). |
Calculates the Total Absolute Error (TAE) of a selection for a census area.
TAE Total Absolute Error of this selection against the census description of this area.
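As a rough illustration of the TAE, the sketch below sums, over every census/survey column pair in the lexicon, the absolute difference between the census count for the area and the count in the selection. The function name `tae_sketch` is hypothetical, and it assumes a lexicon with one association per row (census column first, survey column second) and survey columns coded 0/1; the package's own lexicon layout and code may differ.

```r
# Hypothetical sketch of a Total Absolute Error (TAE) calculation; not the sms source.
# Assumes: one lexicon row per association, column 1 = census name, column 2 = survey name,
# and survey columns coded 0/1 so that sum() counts the matching individuals.
tae_sketch <- function(selection, area_census, lexicon) {
  total <- 0
  for (i in seq_len(nrow(lexicon))) {
    census_count <- area_census[[lexicon[i, 1]]]        # count reported by the census
    simulated_count <- sum(selection[[lexicon[i, 2]]])  # count in the selected records
    total <- total + abs(census_count - simulated_count)
  }
  total
}

# Toy data, invented for illustration.
area_toy <- data.frame(areaid = 1, population = 4, he = 2, females = 3)
selection_toy <- data.frame(pid = 1:4, he = c(1, 0, 0, 0), female = c(1, 1, 0, 0))
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

tae_sketch(selection_toy, area_toy, lexicon_toy)  # |2 - 1| + |3 - 2| = 2
```

A TAE of 0 would mean the selection reproduces every associated census count exactly.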
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
in.lexicon=createLexicon() # Create a data lexicon for holding the associated column names.
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
# Select the first area from the census table
this_area=as.data.frame(census[1,])
# Make a random selection of individuals for this area.
selection=random_panel_selection(survey, this_area$population)
# Evaluate the Total Absolute Error (TAE) for this selection
error=calculate_error(selection, this_area, in.lexicon)
print(error) # print the error of the selection
A sample census dataset containing descriptive information about 10 geographical areas. The variables in the dataset are as follows:
areaid: The unique identifier of the area
population: The number of individuals in the area.
he: Number of individuals in the area with at least a Higher Education degree
females: Number of female individuals in the area
data(census)
A data frame with 10 rows and 4 variables
Check the lexicon data.frame
check_lexicon(inlex)
inlex | A data.frame to be used as a data lexicon, listing the associated data columns. |
---|---|
Dimitris Kavroudakis
library(sms)
df=createLexicon()
df=addDataAssociation(df, c("ena","duo"))
check_lexicon(df)
Check the integrity of the data lexicon
checkIfNamesInDataColumns(names, incensus, insurvey)
names | A vector of names to check for existence as column names in the data (census and survey). |
---|---|
incensus | The census data. |
insurvey | The survey data. |
anumber If both names are valid data column names, it returns 1; otherwise it returns 0.
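A minimal sketch of this check, assuming (as in addDataAssociation) that the first name refers to a census column and the second to a survey column. `check_names_sketch` is a hypothetical name, not part of sms.

```r
# Hypothetical sketch: returns 1 if both names are valid column names, else 0.
# Assumes names[1] must be a census column and names[2] a survey column.
check_names_sketch <- function(names, incensus, insurvey) {
  if (names[1] %in% colnames(incensus) && names[2] %in% colnames(insurvey)) 1 else 0
}

census_toy <- data.frame(areaid = 1, population = 4, he = 2, females = 3)
survey_toy <- data.frame(pid = 1:2, he = c(1, 0), female = c(1, 0))

check_names_sketch(c("females", "female"), census_toy, survey_toy)  # 1
check_names_sketch(c("female", "female"), census_toy, survey_toy)   # 0: no census column "female"
```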
Dimitris Kavroudakis
Create a data lexicon for holding the associated column names
createLexicon()
dataLexicon A data.frame holding the associated column names.
Dimitris Kavroudakis
library(sms)
data(survey)
data(census)
in.lexicon=createLexicon()
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
print(in.lexicon)
Find the best selection of individual records for a census area.
find_best_selection(area, insms, inseed = -1)
area | A census area. |
---|---|
insms | A microsimulation object holding the data and details of the simulation, such as the iterations and the lexicon. |
inseed | A number to be used as the random seed. |
Calculates the best representation of an area after a series of selection tries.
list A list with results (#areaid, #selection, #tae, #tries, #error_states).
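The Details describe keeping the best representation after a series of tries. One simple way to realise that idea is a best-of-N random search: draw `population` records at random, score each draw by its TAE, and keep the lowest. This is a hypothetical sketch (`best_of_n_sketch`, sampling with replacement and the lexicon layout are assumptions); the package's own search strategy may differ.

```r
# Hypothetical best-of-N random search for one area; not the sms implementation.
best_of_n_sketch <- function(survey, area, lexicon, iterations = 20) {
  best <- NULL
  best_tae <- Inf
  tries <- numeric(iterations)  # TAE of every try, kept for inspection
  for (k in seq_len(iterations)) {
    # Assumption: `population` records are drawn with replacement.
    cand <- survey[sample(nrow(survey), area$population, replace = TRUE), , drop = FALSE]
    err <- 0
    for (i in seq_len(nrow(lexicon))) {
      err <- err + abs(area[[lexicon[i, 1]]] - sum(cand[[lexicon[i, 2]]]))
    }
    tries[k] <- err
    if (err < best_tae) {  # keep the lowest-TAE selection seen so far
      best <- cand
      best_tae <- err
    }
  }
  list(areaid = area$areaid, selection = best, tae = best_tae, tries = tries)
}

set.seed(1900)
survey_toy <- data.frame(pid = 1:8, he = c(1, 1, 0, 0, 1, 0, 0, 0),
                         female = c(1, 0, 1, 0, 1, 0, 1, 0))
area_toy <- data.frame(areaid = 1, population = 4, he = 2, females = 2)
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

res <- best_of_n_sketch(survey_toy, area_toy, lexicon_toy)
res$tae
```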
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
in.lexicon=createLexicon() # Create a data lexicon for holding the associated column names.
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
this_area=as.data.frame(census[1,]) # Select the first area from the census table
insms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=10)
best=find_best_selection(this_area, insms)
print(best)
Find the best selection of individual records for a census area using Simulated Annealing
find_best_selection_SA(area_census, insms, inseed = -1)
area_census | A census area (a single row of the census data). |
---|---|
insms | A microsimulation object holding the data and details of the simulation, such as the iterations and the lexicon. |
inseed | A number to be used as the random seed. |
msm_results An object with the results of the simulation for this area.
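For readers unfamiliar with Simulated Annealing, the sketch below applies the generic technique to one area: replace one selected record at a time, always accept moves that lower the TAE, accept worse moves with probability exp(-delta / T), and cool T geometrically. The starting temperature, cooling rate, move type and the name `sa_sketch` are assumptions for illustration, not the parameters used by find_best_selection_SA.

```r
# Generic Simulated Annealing for one area (hypothetical; not the sms code).
sa_sketch <- function(survey, area, lexicon, iterations = 200,
                      t0 = 2, cooling = 0.98, inseed = 1900) {
  set.seed(inseed)
  tae <- function(rows) {
    sel <- survey[rows, , drop = FALSE]
    err <- 0
    for (i in seq_len(nrow(lexicon))) {
      err <- err + abs(area[[lexicon[i, 1]]] - sum(sel[[lexicon[i, 2]]]))
    }
    err
  }
  current <- sample(nrow(survey), area$population, replace = TRUE)
  current_err <- tae(current)
  best <- current
  best_err <- current_err
  temp <- t0
  for (k in seq_len(iterations)) {
    cand <- current
    cand[sample(length(cand), 1)] <- sample(nrow(survey), 1)  # replace one record
    delta <- tae(cand) - current_err
    # Metropolis rule: accept improvements, sometimes accept worse moves
    if (delta <= 0 || runif(1) < exp(-delta / temp)) {
      current <- cand
      current_err <- current_err + delta
    }
    if (current_err < best_err) {  # remember the best state ever visited
      best <- current
      best_err <- current_err
    }
    temp <- temp * cooling  # geometric cooling schedule
  }
  list(selection = survey[best, , drop = FALSE], tae = best_err)
}

survey_toy <- data.frame(pid = 1:8, he = c(1, 1, 0, 0, 1, 0, 0, 0),
                         female = c(1, 0, 1, 0, 1, 0, 1, 0))
area_toy <- data.frame(areaid = 1, population = 4, he = 2, females = 2)
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

res <- sa_sketch(survey_toy, area_toy, lexicon_toy)
res$tae
```

Accepting occasional worse moves early on, while the temperature is high, is what lets annealing escape selections where no single swap improves the TAE.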
Dimitris Kavroudakis
library(sms)
data(survey)
data(census)
in.lexicon=createLexicon()
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
this_area=as.data.frame(census[1,]) # Select the first area from the census table
insms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=5)
myselection=find_best_selection_SA(this_area, insms, inseed=1900)
print(myselection)
getInfo Generic
getInfo(object)
object | A microsimulation object whose information is requested. |
---|---|
Dimitris Kavroudakis
Get information from a microsimulation object
## S4 method for signature 'microsimulation'
getInfo(object)
object | A microsimulation object whose information is requested. |
---|---|
Dimitris Kavroudakis
Get the TAEs (Total Absolute Errors) from a microsimulation object.
getTAEs(object)
object | A microsimulation object from which to get the TAEs. |
---|---|
Dimitris Kavroudakis
getTAEs Method
## S4 method for signature 'microsimulation'
getTAEs(object)
object | A microsimulation object from which to get the TAEs. |
---|---|
taes A list of numbers indicating the Total Absolute Error of the fitting process for each of the census areas.
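getTAEs is documented as an S4 generic with a method for the microsimulation class. The sketch below reproduces that generic/method pattern with the base methods package on a stand-in class; the class `msmSketch`, its single `results` slot, and the assumption that each per-area result carries a `tae` element are hypothetical and do not describe the real microsimulation class.

```r
library(methods)

# Hypothetical stand-in class holding only a list of per-area results.
setClass("msmSketch", slots = c(results = "list"))

# S4 generic plus a method for the class, the pattern used by getTAEs and getInfo.
setGeneric("getTAEsSketch", function(object) standardGeneric("getTAEsSketch"))

setMethod("getTAEsSketch", "msmSketch", function(object) {
  # Assumption: each result is a list with a numeric `tae` element.
  lapply(object@results, function(r) r$tae)
})

obj <- new("msmSketch", results = list(list(areaid = 1, tae = 3),
                                       list(areaid = 2, tae = 0)))
getTAEsSketch(obj)  # a list holding 3 and 0, one entry per area
```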
Dimitris Kavroudakis
The microsimulation class holds all the details and objects of a microsimulation, such as the data and the results.
census: | A census data.frame where each row contains census information about a geographical area. |
---|---|
panel: | A data.frame containing the individual-based records from a panel survey. These data will be fitted to small-area constraints and will populate each virtual area. |
lexicon: | A data.frame containing the associations of columns between the census data and the panel data. Each row contains a connection between the census and panel data.frames. |
resuls: | A list of results from the fitting process. |
iterations: | The number of iterations until the end of the fitting process. |
Dimitris Kavroudakis
mysetSeed
mysetSeed(inseed)
inseed | A number to set as the random seed. |
---|---|
library(sms)
sms::mysetSeed(1900)
Plot the selection process of an area from a microsimulation object.
plotTries(insms, number)
insms | The input results of a microsimulation (for example, as returned by run_parallel_SA). |
---|---|
number | The number (index) of the area to plot. |
Plots the errors during the selection process for an area.
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
in.lexicon=createLexicon() # Create a data lexicon for holding the associated column names.
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
ansms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=5)
sa=run_parallel_SA(ansms, inseed=1900)
plotTries(sa, 1)
Select n random rows from a dataframe
random_panel_selection(indf, n)
indf | The initial data frame from which the selection will be made. |
---|---|
n | The number of random rows to select. |
A selection of rows, as a data frame.
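In base R, selecting n random rows amounts to indexing the data frame with a sample of row numbers. The sketch below is a hypothetical equivalent (`random_rows_sketch`); whether random_panel_selection samples with or without replacement is not stated here, so that choice is exposed as an assumption through the `replace` argument.

```r
# Hypothetical base-R equivalent of selecting n random rows from a data frame.
random_rows_sketch <- function(indf, n, replace = FALSE) {
  # drop = FALSE keeps a data frame even when indf has a single column
  indf[sample(nrow(indf), n, replace = replace), , drop = FALSE]
}

set.seed(1900)
toy <- data.frame(pid = 1:10, female = rep(c(1, 0), 5))
picked <- random_rows_sketch(toy, 4)
print(picked)
```

With `replace = FALSE`, n cannot exceed the number of rows; sampling with replacement lifts that limit at the cost of repeating individuals.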
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
some.individuals=random_panel_selection(survey, 4)
print(some.individuals) # Print the selection of individuals
Run a simulation in serial mode with Hill Climbing
run_parallel_HC(insms, inseed = -1)
insms | A microsimulation object holding the data and details of the simulation, such as the iterations and the lexicon. |
---|---|
inseed | A number to be used as the random seed. |
msm_results An object with the results of the simulation, for each area.
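Hill Climbing differs from Simulated Annealing in that it never accepts a worse solution. A generic sketch, run once per census area (row) in serial, might look as follows; the move type (replace one record), the iteration count, the name `hc_area_sketch` and the toy data are assumptions, not the package's implementation.

```r
# Generic hill climbing for one area (hypothetical; not the sms code).
hc_area_sketch <- function(survey, area, lexicon, iterations = 100) {
  tae <- function(rows) {
    sel <- survey[rows, , drop = FALSE]
    err <- 0
    for (i in seq_len(nrow(lexicon))) {
      err <- err + abs(area[[lexicon[i, 1]]] - sum(sel[[lexicon[i, 2]]]))
    }
    err
  }
  current <- sample(nrow(survey), area$population, replace = TRUE)
  current_err <- tae(current)
  for (k in seq_len(iterations)) {
    cand <- current
    cand[sample(length(cand), 1)] <- sample(nrow(survey), 1)  # replace one record
    cand_err <- tae(cand)
    if (cand_err <= current_err) {  # never accept a worse selection
      current <- cand
      current_err <- cand_err
    }
  }
  list(areaid = area$areaid, rows = current, tae = current_err)
}

set.seed(1900)
survey_toy <- data.frame(pid = 1:8, he = c(1, 1, 0, 0, 1, 0, 0, 0),
                         female = c(1, 0, 1, 0, 1, 0, 1, 0))
census_toy <- data.frame(areaid = 1:2, population = c(4, 3),
                         he = c(2, 1), females = c(2, 3))
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

# One independent hill-climbing run per census area (row).
results <- lapply(seq_len(nrow(census_toy)), function(j)
  hc_area_sketch(survey_toy, census_toy[j, , drop = FALSE], lexicon_toy))
sapply(results, function(r) r$tae)
```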
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
in.lexicon=createLexicon() # Create a data lexicon for holding the associated column names.
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
insms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=10)
re=run_parallel_HC(insms, inseed=1900)
print(re)
Run a simulation in parallel mode with Simulated Annealing
run_parallel_SA(insms, inseed = -1)
insms | A microsimulation object holding the data and details of the simulation, such as the iterations and the lexicon. |
---|---|
inseed | A number to be used as the random seed. |
msm_results An object with the results of the simulation, for each area.
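Running the fit in parallel works because each census area is an independent task. A hypothetical sketch with the base parallel package: `fit_area_sketch` is an invented stand-in for the per-area search (a best-of-N random draw), and `mclapply` relies on forking, so the sketch falls back to one core on Windows. The sms package's own parallel back end is not described here and may differ.

```r
library(parallel)

# Hypothetical per-area worker (best-of-N random draws scored by TAE);
# a stand-in for the package's own per-area search.
fit_area_sketch <- function(area, survey, lexicon, tries = 50) {
  best_err <- Inf
  for (k in seq_len(tries)) {
    sel <- survey[sample(nrow(survey), area$population, replace = TRUE), , drop = FALSE]
    err <- 0
    for (i in seq_len(nrow(lexicon))) {
      err <- err + abs(area[[lexicon[i, 1]]] - sum(sel[[lexicon[i, 2]]]))
    }
    best_err <- min(best_err, err)
  }
  list(areaid = area$areaid, tae = best_err)
}

survey_toy <- data.frame(pid = 1:8, he = c(1, 1, 0, 0, 1, 0, 0, 0),
                         female = c(1, 0, 1, 0, 1, 0, 1, 0))
census_toy <- data.frame(areaid = 1:3, population = c(4, 3, 5),
                         he = c(2, 1, 3), females = c(2, 3, 2))
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

# L'Ecuyer-CMRG gives each forked worker its own random-number stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(1900)

# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(seq_len(nrow(census_toy)), function(j)
  fit_area_sketch(census_toy[j, , drop = FALSE], survey_toy, lexicon_toy),
  mc.cores = cores)
sapply(results, `[[`, "tae")
```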
Dimitris Kavroudakis
library(sms)
data(survey)
data(census)
in.lexicon=createLexicon()
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
insms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=5)
results=run_parallel_SA(insms, inseed=1900)
print(results)
Run a simulation in serial mode
run_serial(insms)
insms | A microsimulation object holding the data and details of the simulation, such as the iterations and the lexicon. |
---|---|
msm_results An object with the results of the simulation, for each area.
Dimitris Kavroudakis
library(sms)
data(survey)
data(census)
in.lexicon=createLexicon()
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
insms=new("microsimulation", census=census, panel=survey, lexicon=in.lexicon, iterations=5)
results=run_serial(insms)
print(results)
Make a single selection of individual records for a census area.
selection_for_area(inpanel, area_census, inlexicon)
inpanel | The panel dataset. |
---|---|
area_census | A census area. |
inlexicon | A data lexicon listing the variable associations. |
Selects a number of individual records from the panel dataset to represent the census description of an area.
list A list of results (#areaid, #selection, #error)
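A single selection can be sketched as one random draw of `population` records followed by a TAE evaluation, mirroring the random_panel_selection and calculate_error steps in the calculate_error example. `selection_for_area_sketch`, sampling with replacement and the lexicon layout are assumptions for illustration, not the sms source.

```r
# Hypothetical single selection for one area: one random draw plus its TAE.
selection_for_area_sketch <- function(inpanel, area_census, inlexicon) {
  # Assumption: `population` records are drawn with replacement.
  rows <- sample(nrow(inpanel), area_census$population, replace = TRUE)
  sel <- inpanel[rows, , drop = FALSE]
  err <- 0
  for (i in seq_len(nrow(inlexicon))) {
    err <- err + abs(area_census[[inlexicon[i, 1]]] - sum(sel[[inlexicon[i, 2]]]))
  }
  list(areaid = area_census$areaid, selection = sel, error = err)
}

set.seed(1900)
survey_toy <- data.frame(pid = 1:8, he = c(1, 1, 0, 0, 1, 0, 0, 0),
                         female = c(1, 0, 1, 0, 1, 0, 1, 0))
area_toy <- data.frame(areaid = 1, population = 4, he = 2, females = 2)
lexicon_toy <- data.frame(census = c("he", "females"), survey = c("he", "female"),
                          stringsAsFactors = FALSE)

one <- selection_for_area_sketch(survey_toy, area_toy, lexicon_toy)
one$error
```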
Dimitris Kavroudakis
library(sms)
data(survey) # load the data
data(census)
in.lexicon=createLexicon() # Create a data lexicon for holding the associated column names.
in.lexicon=addDataAssociation(in.lexicon, c("he","he"))
in.lexicon=addDataAssociation(in.lexicon, c("females","female"))
# Select the first area from the census table
this_area=as.data.frame(census[1,])
# Make a representation for this area.
sel=selection_for_area(survey, this_area, in.lexicon)
print(sel) # print the representation
A sample survey dataset containing binary (0 or 1) information about 200 individuals. Those individuals will be used to populate the simulated areas. The variables in the dataset are as follows:
pid: The unique identifier of the individual
female: Binary value of the sex of the individual. 1-Female, 0-Male
agemature: Binary value indicating if the individual belongs to the mature age group. 0-No, 1-Yes
car_owner: Binary value indicating if the individual owns a car. 0-No, 1-Yes
house_owner: Binary value indicating if the individual owns a house. 0-No, 1-Yes
working: Binary value indicating if the individual is working. 0-No, 1-Yes
he: Binary value indicating if the individual has at least a Higher Education degree. 0-No, 1-Yes
data(survey)
A data frame with 200 rows and 7 variables