A ROADMAP TO DETERMINE THE IMPORTANT FACTORS OF THE HOUSE VALUE: A CASE STUDY BY USING ACTUAL PRICE REGISTRATION DATA OF TAIPEI HOUSING TRANSACTIONS

 

Mingchin Chen

Fu Jen Catholic University, Taiwan

E-mail: 081438@mail.fju.edu.tw

 

Pei-De Wang

Fu Jen Catholic University, Taiwan

E-mail: kmpeterwang@gmail.com

 

Submission: 08/06/2017

Revision: 29/06/2017

Accept: 22/07/2017

 

ABSTRACT

While many studies have applied data mining techniques to estimate housing prices, few have identified the important attributes and prioritized them at the same time. This paper uses five data mining techniques to discover the important attributes for three major types of real estate in Taipei City. The datasets, comprising 22,480 transactions from July 2013 to August 2015, are publicly available from the Taiwan Actual Price Registration. The five models are decision trees, random forests, model trees, artificial neural networks and multiple regression. The criteria used to measure the forecasting accuracy are MAPE, R², RMSE, MAE and COR. The model with the best performance for all houses is the Model Tree, with a MAPE value of 27.59. For apartments, the best is Random Forests. Artificial Neural Networks perform best for suites and buildings with elevators. Different housing types thus need different models. Furthermore, the attribute importance values allow us to identify and rank the truly critical attributes, which include floor area, administrative district, parking area and land area. The variable ranking and selection procedure proposed in this research can also be adopted to improve prediction efficiency in big data applications other than housing transactions.

Keywords: data mining; housing pricing; forecasting accuracy; variables ranking; variables selection

1.     INTRODUCTION

            Buying a house in Taipei is relatively unaffordable, so evaluating housing prices has become an important issue, even though the Taiwanese authorities have acted to make transactions more transparent. Taipei remains one of the most expensive cities in the world in which to buy a house. Taipei’s house price-to-income ratio stood between 15 and 17 in 2015, higher than that of London (8.5x), New York (5.9x), or Sydney (12.2x) (DELMENDO, 2016).

            Housing affordability remains a major problem in Taipei City. Lower affordability reflects relatively higher housing prices, and some inherent factors must give rise to these high prices. Those factors determine housing prices and also reflect people’s preferences when they buy or sell a house in Taipei.

            Actual Price Registration (APR) refers to a national system for registering the actual prices of property transactions, an initiative created to boost transparency in Taiwan’s real estate market. This regulation came into effect on August 1, 2011. This study intends to determine what those factors are from the real transactions in that open system by using five data mining techniques.

            There are three major housing types to which this paper pays particular attention. According to statistics from the Department of Urban Development, Taipei City Government, for 2013 to 2015, condominiums in buildings of five storeys or fewer without an elevator (apartments, Type_APT) accounted on average for 21% of housing transactions, condominiums with elevators (buildings, Type_BLD) for 58% and suites (Type_SUT) for 19%, as shown in Figure 1.

            The curve corresponding to the right coordinate axis represents the volume of transactions in each season. Even though the volumes have changed over time, the percentages of those three types remain relatively stable. Therefore, those three types are the targets of this study.

Figure 1: Volumes of transactions from 2013 to 2015

 

            The hedonic-based regression approach has been utilized extensively to investigate the relationship between house prices and housing characteristics (FAN; ONG; KOH, 2006). For example, Goodman (1978) extended hedonic price analysis to the formation of housing price indices measuring variations within a metropolitan area (GOODMAN, 1978).

            Fik et al. (2003) developed several hedonic specifications that attempt to capture more fully the interactive components of location values (FIK; LING; MULLIGAN, 2003). Welch et al. (2016) estimated a hedonic spatial panel model to determine the long-term impact of improved network access to bike and public transit facilities on housing sales prices (WELCH; GEHRKE; WANG, 2016). However, this approach is subject to criticisms arising from potential problems related to fundamental model assumptions and estimation (FAN; ONG; KOH, 2006).

            More and more studies now focus on real estate by using data mining techniques. Acciani et al. (2011) adopted model trees and multivariate adaptive regression splines as predictors in real estate appraisal (ACCIANI; FUCILLI; SARDARO, 2011). Fong and Wah (2013) utilized feature selection techniques to screen important attributes and applied those attributes to build a predictive model by using different kinds of data mining techniques. Gan et al. (2015) built decision trees and neural networks and compared their results.

            While these authors all used different data mining techniques to estimate housing prices, few of them attempted to identify the important attributes and rank them by importance at the same time. Moreover, none of them identified the attributes according to the types of houses.

            This paper uses five models and evaluates them with five measures. The five models are decision trees, random forests, model trees, artificial neural networks and multiple regression. The criteria used to measure the forecasting accuracy are MAPE, R², RMSE, MAE and COR. The final result is a roadmap for evaluating housing prices more reasonably.

2.     RESEARCH METHODOLOGY

            The research flow is shown in Figure 2. All the data used in this paper are downloaded from APR. By applying five data mining techniques to three major housing types and comparing them by MAPE, R², RMSE, MAE and COR, this paper finds that different housing types need different data mining models.

            Each type has its own favored attributes with higher importance values. Therefore, this paper ranks the attributes according to the averages of these importance values and then counts the number of models that select the same attributes. This ranking and selection process helps us to identify the relatively important attributes for each housing type.

            Finally, according to the statistics on the rankings and votes of the attributes, this paper classifies the attributes and builds a roadmap to depict their diversity.

Figure 2: Research flow

3.     DATA MINING SKILLS

            This section introduces the data mining techniques used in this paper.

3.1.        Decision Tree (DT)

            A DT algorithm works by splitting a dataset in order to build a model that successfully classifies each record in terms of a target field or variable (WOODS; KYRAL, 1997). There are two types of DT, classification trees and regression trees, which can be implemented using the four most popular algorithms: the chi-squared automatic interaction detection (CHAID) (KASS, 1980; MAGIDSON, 1994), the iterative dichotomiser (ID3) (QUINLAN, 1986), the classification and regression trees (BREIMAN; FRIEDMAN; OLSHEN; STONE, 1984) and C4.5 (QUINLAN, 1992).

            CHAID and ID3 can only be used for classification trees, while the others can be used for both classification and regression trees. A classification tree is used when the response variable consists of classes or categories; otherwise, a regression tree is used for a numeric or continuous response.

            The two main processes used to construct a tree are tree growing and pruning. The tree growing process searches for independent variables to act as splitters. It starts from the root node, which holds all the instances, and keeps partitioning on the variables that produce the greatest differences until no significant differences can be identified. In this process, a purity or impurity criterion is used to split a node so that the instances within each resulting node are more alike. In the case of a classification tree, the data are split on the basis of homogeneity. A regression tree splits on the independent variable whose inclusion decreases the error measure the most. The best criterion should produce the greatest purity or reduce the impurity the most.
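
            The growing step for a regression tree can be illustrated with a minimal sketch. It assumes a single numeric predictor and uses the sum of squared errors as the error measure; the function names and the toy data are hypothetical and are not taken from the models fitted in this paper.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (the node's error measure)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, error_reduction) of the best binary split on one numeric feature."""
    parent_error = sse(y)
    best_threshold, best_gain = None, 0.0
    candidates = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        gain = parent_error - (sse(left) + sse(right))
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Hypothetical toy data: floor area (m²) and price (million NTD).
floor_area = [20, 25, 30, 80, 90, 100]
price = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0]
threshold, gain = best_split(floor_area, price)
print(threshold, gain)
```

            Growing a full tree would repeat this search recursively on each side of the chosen threshold until no split reduces the error appreciably.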

3.2.        Random Forests (RF)

            The pros and cons of DT are as follows (JAMES; WITTEN; HASTIE; TIBSHIRANI, 2013). The advantages are that they are easy to explain, more closely mirror human decision-making, may be displayed graphically and can easily handle qualitative predictors. Unfortunately, DT generally do not have precise predictive power. However, the performance of the predictive power can be substantially improved by RF.

            RF are an example of an ensemble method that combines a series of k base models (trees) with the aim of creating an improved composite model. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest (BREIMAN, 2001). After a large number of trees are generated, they are combined to yield a single consensus prediction by voting for classification trees or averaging for regression trees. In addition, RF are characterized by significant improvements in accuracy and greater robustness to errors and outliers.

            RF rest on two basic beliefs: most trees can provide correct predictions, and the trees make mistakes in different places. Breiman (2001) used the Strong Law of Large Numbers to show that RF always converge, so that overfitting is not a problem, and that the limiting value of the generalization error depends on how accurate the individual classifiers are (strength) and on the dependence between them (correlation) (BREIMAN, 2001). The idea is to maintain the strength of the individual trees without increasing the correlation between them.
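
            The averaging of many randomized trees can be sketched as follows. This is a simplified illustration under stated assumptions: each "tree" is a one-split stump on a single predictor, and the only randomness is the bootstrap sample, so the sketch is plain bagging; a full random forest additionally samples a random subset of predictors at each split. All names and data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(x, y):
    """Fit a one-split regression tree (stump) on a single numeric feature."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        err = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:
        # All x values in this bootstrap sample are identical: predict the mean.
        m = mean(y)
        return lambda xi: m
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def fit_forest(x, y, n_trees=50, seed=0):
    """Fit n_trees stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(x)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_forest(trees, xi):
    """Average the predictions of all trees (the regression consensus)."""
    return mean(tree(xi) for tree in trees)

# Hypothetical toy data: floor area (m²) and price (million NTD).
floor_area = [20, 25, 30, 80, 90, 100]
price = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0]
forest = fit_forest(floor_area, price)
small_flat = predict_forest(forest, 22)
large_flat = predict_forest(forest, 95)
```

            Because each stump sees a different bootstrap sample, the stumps place their thresholds differently, and averaging them smooths out the mistakes that any single stump makes.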

3.3.        Model Tree (MT)

            The MT is based on a divide-and-conquer approach through which it is possible to learn from a set of instances (WITTEN; FRANK, 2005). The output of an MT is represented by a tree-like structure in which it is possible to distinguish a root node, parent and child nodes, arcs (or branches) and leaves (ACCIANI; FUCILLI; SARDARO, 2011).

            The greatest difference from a decision tree lies in the content of the leaf node. In the model tree, each terminal node delivers more information: a linear regression model is fitted to the instances that fall into that node, instead of the single averaged value used in a regression tree. As a result, it may provide a more precise estimation. This paper uses a rule-based model that is an extension of Quinlan's M5 MT (KUHN; WESTON; KEEFER; COULTER, 2016).
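
            The difference between the two kinds of leaf can be shown with a small sketch. It assumes a single predictor and ordinary least squares inside the leaf; the helper names and the three instances are hypothetical, and the sketch does not reproduce the rule-based model used in this paper.

```python
from statistics import mean

def leaf_mean(y):
    """Regression-tree leaf: one averaged value for every instance in the leaf."""
    return mean(y)

def leaf_linear(x, y):
    """Model-tree leaf: a least-squares line fitted to the instances in the leaf."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda xi: intercept + slope * xi

# Hypothetical instances that ended up in the same leaf.
floor_area = [20, 25, 30]   # m²
price = [5.0, 6.0, 7.0]     # million NTD

constant_prediction = leaf_mean(price)
linear_model = leaf_linear(floor_area, price)
linear_prediction = linear_model(28)
```

            For a 28 m² unit the regression-tree leaf returns the leaf average, while the model-tree leaf follows the local trend within the leaf, which is why the latter may give a more precise estimate.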

3.4.        Artificial Neural Network (ANN)

            ANN is an artificial intelligence model originally designed to replicate the human nervous system (BAHIA, 2013). When the nervous system is alerted by outside stimulation, neurons work and react. Analogously, an ANN consists of three main layers: the input data layer (stimulation), the hidden layer(s) and the output layer. Each artificial neuron has a set of input connections that receive signals from other neurons and a bias adjustment, as well as a transfer function that transforms the sum of the weighted inputs and the bias to decide the value of the output (COAKLEY; BROWN, 2000).
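
            A forward pass through such a network can be sketched in a few lines. The sketch assumes a sigmoid transfer function in the hidden layer and a linear output neuron, which is a common choice for regression; the weights shown are hypothetical and are not the ones estimated in this paper.

```python
import math

def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, transfer=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the transfer function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(z)

def forward(inputs, hidden_layer, output_neuron):
    """One hidden layer followed by a single linear output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out, transfer=lambda z: z)

# Hypothetical network: two inputs, two hidden neurons, one output.
hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.5, 2.0], 0.5)
prediction = forward([1.0, 2.0], hidden_layer, output_neuron)
```

            Training consists of adjusting the weights and biases so that the predictions approach the observed prices; that step is omitted here.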

3.5.        Multiple Regression (MR)

            The hedonic-based regression approach is a form of MR. MR involves several independent variables and one dependent variable, and describes the relationship between them. For fixed values of the independent variables, MR yields the conditional expectation of the dependent variable, that is, an averaged value. Therefore, MR is widely used for prediction.
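
            In symbols, the conditional expectation can be written as below. The notation is an assumption introduced here for illustration: y is the price, x_1, ..., x_p are the independent attributes, and the coefficients are estimated by ordinary least squares over the n training observations.

```latex
\mathbb{E}[\,y \mid x_1, \dots, x_p\,] = \beta_0 + \sum_{j=1}^{p} \beta_j x_j ,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
```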

4.     DATA SOURCE AND PREPARATION

            The data used in this research are downloaded from APR. The raw data amount to 48,658 observations from July 2013 to August 2015. After deleting all records with empty column(s) or unreasonable values, 22,480 observations remain, covering the three most popular housing types, all for residential use only.

            To facilitate further inspections and comparisons, this paper also combines these three types into an overall group (Type_ALL). Generally speaking, Type_APT and Type_BLD are both suitable for a family and Type_SUT might be more suitable for singles.

            There are 20 attributes used in this paper, listed in Table 1. Because this research partitions the houses into three types, the housing type attribute (hs_tp) is constant within each type, and the total number of attributes used in Type_APT, Type_BLD and Type_SUT is therefore 19. The housing price is naturally chosen as the dependent variable while the other housing attributes are treated as independent variables.

            There are two types of attributes: C stands for categorical and N for numeric. The numbers of observations in Type_APT, Type_BLD and Type_SUT are 6,115, 13,039 and 3,326, respectively. Two-thirds of the sample data are used in building the model, and the remaining one-third is used as an external holdout for measurement purposes.
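
            A two-thirds holdout split can be sketched as follows. The random shuffle and the fixed seed are assumptions made for the illustration, since the exact sampling procedure is not described here; the function name is hypothetical.

```python
import random

def holdout_split(records, train_fraction=2 / 3, seed=42):
    """Shuffle the records and split them into a training part and a holdout part."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 6,115 is the number of Type_APT observations; the row ids are placeholders.
apt_rows = list(range(6115))
train_rows, holdout_rows = holdout_split(apt_rows)
print(len(train_rows), len(holdout_rows))
```

            The models are fitted on the training part only, and the accuracy measures are computed on the holdout part.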

 

Table 1: Data attributes

| No. | Attribute | Type | Description |
|---|---|---|---|
| 1 | target_dst | C | Administrative districts: Songshan (1), Sinyi (2), Da-an (3), Jhongshan (4), Jhongjheng (5), Datong (6), Wanhua (7), Wunshan (8), Nangang (9), Neihu (10), Shihlin (11) and Beitou (12) |
| 2 | target_tp | C | With (1) or without (2) parking place |
| 3 | lnd_area | N | Occupied land area of the house (m²) |
| 4 | lndusg_tp | C | Type of land usage: Residential (1), Commercial (2), Industrial (3), Others (4), Agricultural (5) |
| 5 | ym_sold | C | Year and month when the house was sold |
| 6 | prk_sold | N | Number of parking places sold |
| 7 | flat_type | C | Floor numbering |
| 8 | total_flat | N | Total number of floors in the building |
| 9 | hs_tp | C | Housing types: APT (1), BLD (2) and SUT (6) |
| 10 | cstrct_tp | C | Types of construction methods: Reinforced concrete (1), Reinforced brick structure (2), Referring to building occupation permit (3), Brick structure (4), Steel reinforced concrete (5), Referring to other registrations (6), Steel concrete (7), Precast reinforced concrete (8) |
| 11 | flr_area | N | Area of the house (m²) |
| 12 | room | N | Number of rooms |
| 13 | sit_room | N | Number of living and/or dining rooms |
| 14 | bathroom | N | Number of bathrooms |
| 15 | cmptmt | C | Compartment (1) or not (2) |
| 16 | mgt_cmt | C | Having (1) or not having (2) a management committee |
| 17 | pk_type | C | Parking type: No parking space (0), On the ground floor (1), Lifting plane (2), Lifting machinery (3), Ramp (4), Ramp machinery (5), Tower (6), Others (7) |
| 18 | pk_area | N | Parking area (m²) |
| 19 | flat_age | N | Housing age (years) |
| 20 | price | N | Total price (NTD) |

5.     RESULTS AND DISCUSSION

            The purpose of this section is to ascertain the predominant attributes of housing prices. Five models are utilized in the prediction. There are many criteria used to measure the forecasting accuracy (MUNUSAMY; MUTHUVEERAPPAN; BABA; ABDULLAH; ASMONI, 2015).

            In this paper, the measures used for comparison purposes are the MAPE (Mean Absolute Percentage Error), R² (Coefficient of determination), RMSE (Root mean squared error), MAE (Mean absolute error) and COR (Correlation).
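
            The five measures can be written out as a short sketch. Two points are assumptions: MAPE is expressed in percent, and R² is taken as the squared Pearson correlation, which is consistent with the R² and COR columns of Table 2 up to rounding. The function names and the three toy prices are hypothetical.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (lower is better)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error (lower is better)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error (lower is better); penalizes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def cor(actual, predicted):
    """Pearson correlation between actual and predicted values (higher is better)."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)

def r2(actual, predicted):
    """Coefficient of determination, assumed here to be the squared correlation."""
    return cor(actual, predicted) ** 2

# Hypothetical prices (ten thousand NTD).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
scores = {
    "MAPE": mape(actual, predicted),
    "R2": r2(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "COR": cor(actual, predicted),
}
```

            Because RMSE squares each error before averaging, a model with a few very large errors can have a worse RMSE than another model even when its MAE is better.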

            The results are derived from the ‘rminer’ package (CORTEZ, 2016) and displayed in Table 2. The notation "<" means that a lower value is better, and ">" means that a higher value is better. The superscript "¹" marks the best performance on a specific measure for each housing type.

            For all houses, the MT’s MAE is larger than the RF’s, whereas the MT’s RMSE is smaller than the RF’s. This means that more of the RF’s forecasts are close to the real prices, but the RF also produce more large errors (outliers) than the MT. In short, the MT has the best forecasting performance for all houses because it is best on four of the five measures.

            For apartments, RF win on all the measures: the smallest MAPE, RMSE and MAE, and the largest R² and COR. For buildings with elevators and for suites, ANN does better than the other models because it is best on over half of the measures. Evidently, owing to the distinct characteristics of the different housing types, different algorithms need to be adopted.

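Table 2: Measurement results for all types

| Measurement | Model | Type_ALL | Type_APT | Type_BLD | Type_SUT |
|---|---|---|---|---|---|
| MAPE "<" | DT | 54.73 | 62.11 | 48.75 | 31.78 |
| MAPE "<" | ANN | 34.38 | 46.83 | 32.00 | 20.81 |
| MAPE "<" | MR | 50.58 | 50.18 | 40.93 | 23.94 |
| MAPE "<" | MT | 27.59¹ | 41.23 | 27.97¹ | 16.90¹ |
| MAPE "<" | RF | 27.95 | 39.71¹ | 29.33 | 23.38 |
| R² ">" | DT | 0.73 | 0.49 | 0.68 | 0.61 |
| R² ">" | ANN | 0.81 | 0.53 | 0.87¹ | 0.80¹ |
| R² ">" | MR | 0.73 | 0.57 | 0.74 | 0.76 |
| R² ">" | MT | 0.84¹ | 0.56 | 0.84 | 0.78 |
| R² ">" | RF | 0.78 | 0.59¹ | 0.80 | 0.79 |
| RMSE "<" | DT | 12447320 | 5003089 | 15617442 | 2789666 |
| RMSE "<" | ANN | 10374326 | 4794231 | 9971407¹ | 1996698¹ |
| RMSE "<" | MR | 12349125 | 4565800 | 14159109 | 2196543 |
| RMSE "<" | MT | 9648036¹ | 4628516 | 11232151 | 2010710 |
| RMSE "<" | RF | 11181599 | 4479439¹ | 12480842 | 2032913 |
| MAE "<" | DT | 6638674 | 3642247 | 8678978 | 1986994 |
| MAE "<" | ANN | 4531948 | 3436007 | 5762095 | 1335743¹ |
| MAE "<" | MR | 6051855 | 3321844 | 7791333 | 1536106 |
| MAE "<" | MT | 4307236 | 3289040 | 5485538 | 1372310 |
| MAE "<" | RF | 4115843¹ | 3169619¹ | 5467660¹ | 1384466 |
| COR ">" | DT | 0.85 | 0.70 | 0.83 | 0.78 |
| COR ">" | ANN | 0.90 | 0.74 | 0.93¹ | 0.89 |
| COR ">" | MR | 0.86 | 0.76 | 0.86 | 0.87 |
| COR ">" | MT | 0.92¹ | 0.75 | 0.92 | 0.89 |
| COR ">" | RF | 0.90 | 0.78¹ | 0.91 | 0.90¹ |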

            Each type has its own ranking of attributes. Insights may be gained from the importance values of each attribute in a model, which can be derived from rminer. Different models assign different importance values to the same attribute. Inspired by ensemble models, this paper averages the importance values from the five models outlined above and ranks the attributes according to these averages.

            The attributes appearing in the bold frames in Table 3 account for 95 percent of the importance in each type. We then count the number of models (#) whose own bold frames contain the same attribute, with the top 10 attributes of Type_ALL serving as the reference set. The results can be seen as votes cast by the five models.

            For Type_BLD, for instance, nine attributes account for 95 percent of the importance with respect to housing prices. From the most important to the least important, these are floor area, land area, number of rooms, number of parking places sold, and so on. In Type_BLD, floor area receives votes from all five models, land area gets four, the number of rooms gets two, and so on.
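
            The averaging, 95 percent cut and voting steps can be sketched as follows. The importance values are hypothetical, only three models and four attributes are shown to keep the example short, and the voting rule (an attribute gets a vote from a model when it falls inside that model's own 95 percent cut) is one possible reading of the procedure described above.

```python
def cumulative_cut(weights, share=0.95):
    """Return attributes, most important first, until their cumulative share reaches `share`."""
    total = sum(weights.values())
    ranked = sorted(weights, key=weights.get, reverse=True)
    selected, running = [], 0.0
    for name in ranked:
        selected.append(name)
        running += weights[name] / total
        if running >= share:
            break
    return selected

def average_importance(per_model):
    """Average each attribute's importance over all models."""
    names = sorted({n for w in per_model.values() for n in w})
    return {n: sum(w.get(n, 0.0) for w in per_model.values()) / len(per_model) for n in names}

# Hypothetical importance values; each model's values sum to 1.
per_model = {
    "DT": {"flr_area": 0.80, "target_dst": 0.16, "flat_age": 0.03, "ym_sold": 0.01},
    "RF": {"flr_area": 0.50, "target_dst": 0.30, "flat_age": 0.17, "ym_sold": 0.03},
    "MR": {"flr_area": 0.60, "target_dst": 0.16, "flat_age": 0.20, "ym_sold": 0.04},
}

averaged = average_importance(per_model)
selected = cumulative_cut(averaged)  # the "bold frame" of the averaged ranking
votes = {n: sum(n in cumulative_cut(w) for w in per_model.values()) for n in selected}
print(selected, votes)
```

            In this toy case the housing age is inside the averaged 95 percent cut but receives only two votes, because one model already reaches 95 percent with the first two attributes.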

 

 

 

Table 3: Rankings and votes of attributes

| Ranking | Type_ALL Attributes | # | Type_APT Attributes | # | Type_BLD Attributes | # | Type_SUT Attributes | # |
|---|---|---|---|---|---|---|---|---|
| 1 | flr_area | 5 | flr_area | 5 | flr_area | 5 | flr_area | 5 |
| 2 | target_dst | 4 | target_dst | 5 | lnd_area | 4 | target_dst | 5 |
| 3 | pk_area | 3 | bathroom | 4 | room | 2 | flat_age | 5 |
| 4 | lnd_area | 2 | lnd_area | 4 | prk_sold | 3 | lnd_area | 4 |
| 5 | room | 2 | pk_area | 3 | target_dst | 5 | cstrct_tp | 3 |
| 6 | prk_sold | 3 | sit_room | 4 | pk_area | 3 | pk_type | 4 |
| 7 | total_flat | 3 | prk_sold | 2 | total_flat | 2 | total_flat | 2 |
| 8 | bathroom | 1 | flat_age | 3 | bathroom | 3 | pk_area | 4 |
| 9 | cstrct_tp | 3 | target_tp | 2 | cstrct_tp | 2 | prk_sold | 1 |
| 10 | cmptmt | 1 | room | 4 | flat_age | 2 | lndusg_tp | 2 |
| 11 | sit_room | 2 | cstrct_tp | 2 | pk_type | 1 | bathroom | 3 |
| 12 | flat_age | 2 | ym_sold | 1 | sit_room | 1 | room | 3 |
| 13 | target_tp | 3 | cmptmt | 3 | target_tp | 1 | cmptmt | 2 |
| 14 | pk_type | 1 | pk_type | 2 | cmptmt | 0 | flat_type | 2 |
| 15 | lndusg_tp | 2 | lndusg_tp | 2 | flat_type | 0 | target_tp | 1 |
| 16 | ym_sold | 1 | flat_type | 1 | lndusg_tp | 0 | sit_room | 0 |
| 17 | flat_type | 1 | mgt_cmt | 1 | mgt_cmt | 0 | ym_sold | 0 |
| 18 | hs_tp | 0 | total_flat | 0 | ym_sold | 0 | mgt_cmt | 0 |
| 19 | mgt_cmt | 0 | | | | | | |

            Floor area is the most important and most robust attribute: all five models agree on this for all three types. Many studies report findings in line with this view. Sirmans et al. (2006) stated that floor area is perhaps the most important structural attribute in determining house prices (SIRMANS; MACDONALD; MACPHERSON; ZIETZ, 2006). In addition, Bracke (2015) showed that the contribution of floor area to housing prices is positive. Xiao et al. (2016) also found that property prices increase as floor area increases.

            Moreover, to discover the characteristics of these attributes, this paper extracts the 10 most important attributes of Type_ALL from Table 3 and uses them as the baseline. For each attribute, we sum the models’ votes (#) across the three housing types, sum and average its rankings across the three types, and finally calculate the variance of the rankings, as shown in Table 4.
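
            The aggregation can be sketched for four of the attributes, using their rankings and votes in the Type_APT, Type_BLD and Type_SUT columns of Table 3. The variance line uses the sample variance over the three rankings, which is an assumption: the variance column of Table 4 may follow a different convention, so only the sums and averages are expected to match the published figures.

```python
from statistics import mean, variance

# (Type_APT, Type_BLD, Type_SUT) rankings and votes taken from Table 3.
table3 = {
    "flr_area":   {"rank": (1, 1, 1),   "votes": (5, 5, 5)},
    "target_dst": {"rank": (2, 5, 2),   "votes": (5, 5, 5)},
    "lnd_area":   {"rank": (4, 2, 4),   "votes": (4, 4, 4)},
    "room":       {"rank": (10, 3, 12), "votes": (4, 2, 3)},
}

def summarize(entry):
    """Aggregate one attribute's rankings and votes over the three housing types."""
    ranks, votes = entry["rank"], entry["votes"]
    return {
        "sum_votes": sum(votes),
        "sum_rank": sum(ranks),
        "avg_rank": round(mean(ranks), 2),
        # Assumed convention: sample variance (n - 1 in the denominator).
        "var_rank": round(variance(ranks), 2),
    }

stats = {name: summarize(entry) for name, entry in table3.items()}
for name, row in stats.items():
    print(name, row)
```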

 

Table 4: Statistics on rankings and votes

| Ranking | Attributes | Sum of Votes | Sum of Rankings | Averaged Rankings | Variances of Rankings |
|---|---|---|---|---|---|
| 1 | flr_area | 15 | 3 | 1.00 | 0.00 |
| 2 | target_dst | 15 | 9 | 3.00 | 4.50 |
| 3 | pk_area | 10 | 19 | 6.33 | 0.50 |
| 4 | lnd_area | 12 | 10 | 3.33 | 2.00 |
| 5 | room | 9 | 25 | 8.33 | 24.50 |
| 6 | prk_sold | 6 | 20 | 6.67 | 4.50 |
| 7 | total_flat | 4 | 32 | 10.67 | 60.50 |
| 8 | bathroom | 10 | 22 | 7.33 | 12.50 |
| 9 | cstrct_tp | 7 | 25 | 8.33 | 2.00 |
| 10 | cmptmt | 5 | 40 | 13.33 | 0.50 |

            Various other inferences can be drawn by classifying these attributes. Attributes in Table 4 that receive over 50 percent of the total possible votes (15) are referred to as major. Meanwhile, those with relatively small variances of rankings (less than 5) are referred to as stable.

            Thus the major-stable attributes, such as floor area, administrative district, parking area and land area, are identified with red shading, owing to their high importance and relatively small variances. Similarly, the major-unstable attributes appear with orange shading, the minor-stable ones with yellow and the minor-unstable ones with green.
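
            The two thresholds can be applied to the published Table 4 values with a short sketch. The votes and variances are copied from Table 4; the function and variable names are hypothetical.

```python
# attribute: (sum of votes, variance of rankings), as published in Table 4
table4 = {
    "flr_area": (15, 0.00), "target_dst": (15, 4.50), "pk_area": (10, 0.50),
    "lnd_area": (12, 2.00), "room": (9, 24.50), "prk_sold": (6, 4.50),
    "total_flat": (4, 60.50), "bathroom": (10, 12.50), "cstrct_tp": (7, 2.00),
    "cmptmt": (5, 0.50),
}

TOTAL_VOTES = 15        # 5 models x 3 housing types
VARIANCE_LIMIT = 5      # rankings with a smaller variance count as stable

def classify(votes, rank_variance):
    """Label an attribute as major/minor and stable/unstable."""
    importance = "major" if votes > 0.5 * TOTAL_VOTES else "minor"
    stability = "stable" if rank_variance < VARIANCE_LIMIT else "unstable"
    return f"{importance}-{stability}"

groups = {}
for name, (votes, rank_variance) in table4.items():
    groups.setdefault(classify(votes, rank_variance), []).append(name)

for label, names in groups.items():
    print(label, names)
```

            The resulting groups reproduce the classification described above: floor area, administrative district, parking area and land area are major-stable, while the numbers of rooms and bathrooms are major-unstable.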

            First, del Cacho (2010) stated that location is a factor of paramount importance when determining the pricing of a property. Second, in downtown areas and inner cities, parking requirements could profoundly alter the housing stock (MANVILLE, 2013). Therefore, parking requirements can increase the price of real estate (SHOUP, 2014). Finally, a larger land area leads to more floor area in each of those three types of housing. Therefore, land area is also an indicator.

            Furthermore, the type-dependent attributes are those that show up in the bold frames for Type_APT and Type_SUT but do not appear in the bold frame for Type_ALL in Table 3. This indicates that each type has its own favored attributes in addition to the common ones. Finally, the attributes outside the bold area for each housing type are referred to as others; these attributes are less important.

            By identifying the attributes, the roadmap of importance as shown in Figure 3 is constructed. This could serve as a reference when people appraise a house in Taipei. For example, when people want to buy a condominium with an elevator, the first considerations will be floor area, target district, parking area and land area, all of which are major-stable attributes. Next, major-unstable attributes, such as the numbers of rooms and bathrooms, followed by minor attributes, will be taken into account. Finally, other attributes will be considered.

            The roadmap depicts the diversity of the attributes. Major-unstable attributes, for example the numbers of rooms and bathrooms, appear in different ranking positions for different housing types. The numbers of rooms and bathrooms matter more for apartments and condominiums with elevators than for suites. This roadmap helps us to price houses.

 

Figure 3: Roadmap of important attributes

            To check whether the attributes in the bold frames are sufficient on their own, we kept only those attributes from Table 3 and reran the five models. The numbers of independent attributes used in Type_APT’, Type_BLD’ and Type_SUT’ thus became 13, 9 and 12, respectively.

            These attributes account for the most important 95 percent according to the appraisals of the five models for each housing type. The results are listed in Table 5. Yellow shading marks results that are better than in the previous experiment, which adopted all 19 attributes in the evaluation; green marks results that are worse and white marks results that are equal.

            Type_SUT’ performs better when only the 12 important attributes are used. This indicates that most of the important attributes for Type_SUT’ were found in this paper. However, the selected attributes for Type_APT’ and Type_BLD’ did not work as well as those for Type_SUT’. This suggests that some important attributes for these two types were not captured by this research.

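Table 5: Measurement results for 3 major types

| Measurement | Model | Type_APT’ | Type_BLD’ | Type_SUT’ |
|---|---|---|---|---|
| MAPE "<" | DT | 62.11 | 48.57 | 31.03 |
| MAPE "<" | ANN | 45.76 | 33.66 | 16.59 |
| MAPE "<" | MR | 50.18 | 44.07 | 23.76 |
| MAPE "<" | MT | 41.52 | 28.36 | 16.99 |
| MAPE "<" | RF | 40.05 | 27.13 | 16.90 |
| R² ">" | DT | 0.49 | 0.71 | 0.66 |
| R² ">" | ANN | 0.56 | 0.83 | 0.81 |
| R² ">" | MR | 0.56 | 0.73 | 0.76 |
| R² ">" | MT | 0.56 | 0.83 | 0.78 |
| R² ">" | RF | 0.58 | 0.83 | 0.83 |
| RMSE "<" | DT | 5003089 | 15051426 | 2581766 |
| RMSE "<" | ANN | 4621929 | 11397272 | 1918152 |
| RMSE "<" | MR | 4620246 | 14371934 | 2196793 |
| RMSE "<" | MT | 4612869 | 11513048 | 2090838 |
| RMSE "<" | RF | 4530046 | 11428551 | 1840169 |
| MAE "<" | DT | 3642247 | 8492579 | 189946 |
| MAE "<" | ANN | 3278997 | 6381590 | 1254615 |
| MAE "<" | MR | 3324592 | 7992023 | 1530882 |
| MAE "<" | MT | 3298646 | 5671468 | 1378868 |
| MAE "<" | RF | 3205514 | 5095230 | 1190809 |
| COR ">" | DT | 0.70 | 0.84 | 0.82 |
| COR ">" | ANN | 0.75 | 0.91 | 0.90 |
| COR ">" | MR | 0.75 | 0.86 | 0.87 |
| COR ">" | MT | 0.75 | 0.91 | 0.89 |
| COR ">" | RF | 0.77 | 0.92 | 0.91 |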

6.     CONCLUSION

            In this study, five data mining models were built on data from the Actual Price Registration of Taiwan to examine their prediction performance and to identify which attributes are relatively more important for each type of house. In a big data era with huge volumes of data, variables and methods, this paper delineates a roadmap for the selection of variables in relation to house prices.

            First, this paper used five measures, namely the MAPE, R², RMSE, MAE and COR, to evaluate the five models’ performance in terms of prediction. In general, no single model was best for all three types of houses. While random forests were more suitable for apartments, ANN were more reliable for condominiums with elevators and for suites. The reason is that the patterns of the housing types are not the same. Therefore, the model selected depends on the housing type.

            Second, Figure 3 helps to identify which attributes are important and how they rank. Through this identification, the influential factors are shown in sequence, which supports decisions to buy or to set prices.

            Future studies should take into account vicinity issues, such as the distances to schools, department stores and parks, which are lacking in this research. The models used here could be revalidated once such data are available, and more findings about the neighborhoods of the houses could then be obtained.

REFERENCES

ACCIANI, C.; FUCILLI, V.; SARDARO, R. (2011) Data Mining in Real Estate Appraisal: A Model Tree and Multivariate Adaptive Regression Spline Approach. Aestimum, v. 58, p. 27-45.

BAHIA, I. S. H. (2013) A Data Mining Model by Using ANN for Predicting Real Estate Market: Comparative Study. International Journal of Intelligence Science, v. 3, n. 4, p. 162-169.

BREIMAN, L.; FRIEDMAN, J. H.; OLSHEN, R. A.; STONE, C. J. (1984) Classification and Regression Trees, Belmont, CA: Wadsworth.

BREIMAN, L. (2001) Random Forests. Machine Learning, v. 45, n. 1, p. 5-32.

BRACKE, P. (2015) House Prices and Rents: Microevidence From A Matched Data Set in Central London. Real Estate Economics, v. 43, n. 2, p. 403-431.

COAKLEY, J. R.; BROWN, C. E. (2000) Artificial Neural Networks in Accounting and Finance: Modeling Issues. International Journal of Intelligent Systems in Accounting, Finance and Management, v. 9, n. 2, p. 119-144.

CORTEZ, P. (2016) Package ‘rminer’. Available: https://cran.r-project.org/web/packages/rminer/rminer.pdf . Access: 2nd September, 2016.

DEL CACHO, C. (2010) A Comparison of Data Mining Methods for Mass Real Estate Appraisal (No. 27378). Munich Personal RePEc Archive.

DELMENDO, L. C. (2016) Taiwanese House Prices Continue to Fall Due to Harsh Taxes. Available: http://www.globalpropertyguide.com/Asia/Taiwan/Price-History . Access: 16th September, 2016.

FAN, G. Z.; ONG, S. E.; KOH, H. C. (2006) Determinants of House Price: A Decision Tree Approach. Urban Studies, v. 43, n. 12, p. 2301-2315.

FIK, T. J.; LING, D. C.; MULLIGAN, G. F. (2003) Modeling Spatial Variation in Housing Prices: A Variable Interaction Approach. Real Estate Economics, v. 31, n. 4, p. 623-646.

FONG, S.; WAH, Y. B. (2013) A Prediction Model for Forecasting the Trend of Macau Property Price Movements and Understanding the Influential Factors. Journal of Emerging Technologies in Web Intelligence, v.5, n. 2, p. 122-131.

GAN, V.; AGARWAL, V.; KIM, B. (2015) Data Mining Analysis and Predictions of Real Estate Prices. Issues in Information Systems, v. 16, n. 4, p. 30-36.

GOODMAN, A. C. (1978) Hedonic Prices, Price Indices and Housing Markets. Journal of Urban Economics, v. 5, n. 4, p. 471-484.

JAMES, G.; WITTEN, D.; HASTIE, T.; TIBSHIRANI, R. (2013) An Introduction to Statistical Learning, New York: Springer.

KASS, G. V. (1980) An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, v. 29, n. 2, p. 119-127.

KUHN, M.; WESTON, S.; KEEFER, C.; COULTER, N. (2016) Cubist Models for Regression. Available: https://cran.r-project.org/web/packages/Cubist/vignettes/cubist.pdf . Access: 10th December, 2016.

MAGIDSON, J. (1994) The CHAID Approach to Segmentation Modeling: Chi-squared Automatic Interaction Detection, in: BAGOZZI, R. P. (Ed.), Advanced Methods of Marketing Research. Malden (Mass. US): Blackwell Business, p. 118-159.

MANVILLE, M. (2013) Parking Requirements and Housing Development: Regulation and Reform in Los Angeles. Journal of the American Planning Association, v. 79, n. 1, p. 49-66.

MUNUSAMY, M.; MUTHUVEERAPPAN, C.; BABA, M.; ABDULLAH, M. N.; ASMONI, M. (2015) An Overview of the Forecasting Methods Used in Real Estate Housing Price Modelling. Jurnal Teknologi, v. 73, n. 5, p. 189-193.

QUINLAN, J. R. (1986) Induction of Decision Trees. Machine Learning, v. 1, p. 81-106.

QUINLAN, J. R. (1992) C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann.

SHOUP, D. (2014) The High Cost of Minimum Parking Requirements, in: ISON, S.; MULLEY, C. (Eds.), Parking: Issues and Policies. United Kingdom: Emerald Publishing, p. 87-113.

SIRMANS, G. S.; MACDONALD, L.; MACPHERSON, D. A.; ZIETZ, E. N. (2006) The Value of Housing Characteristics: A Meta Analysis. The Journal of Real Estate Finance and Economics, v. 33, n. 3, p. 215-240.

WELCH, T. F.; GEHRKE, S. R.; WANG, F. (2016) Long-term Impact of Network Access to Bike Facilities and Public Transit Stations on Housing Sales Prices in Portland, Oregon. Journal of Transport Geography, v. 54, p. 264-272.

WITTEN, I. H.; FRANK, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques, 5 ed. Boston, MA: Morgan Kaufmann.

WOODS, E.; KYRAL, E. (1997) Ovum Evaluates Data Mining, London: Ovum.

XIAO, Y.; ORFORD, S.; WEBSTER, C. J. (2016) Urban Configuration, Accessibility, and Property Prices: A Case Study of Cardiff, Wales. Environment and Planning B: Planning and Design, v. 43, n. 1, p. 108-129.