
Traditional multi-credit reporting company (CRC) model developments have involved one of two scenarios:
The development of VantageScore featured a modeling data design superior to that of traditional tri-CRC model developments because it employed an equitable and consistent contribution of data from each CRC. Consequently, biases and variability inherent in the traditional tri-CRC data design were eliminated, resulting in the ability to create a single tri-CRC scoring algorithm that requires neither alignment nor translation among the three CRCs.
By facilitating a single tri-CRC scoring algorithm, the data design allows for a more seamless scoring strategy implementation for credit grantors and easier score interpretation for consumers. It also allows for true tri-CRC characteristic leveling (see related white paper). The remainder of this paper describes the VantageScore data design in detail and explains its advantages over traditional model development data designs.
As with the development of any product, the use of flawed inputs results in a flawed end product. As the saying goes, "garbage in, garbage out." This applies to the traditional data design for the development of tri-CRC credit models. The common practice by credit grantors of using three CRC scores to make credit decisions highlights the need for a CRC-based score that is as consistent among the three CRCs as possible. Ideally, the score should:
The traditional data design for the development of tri-CRC models does not meet these requirements. Consequently, tri-CRC scores developed using this data design do not carry as consistent a meaning across the three CRCs as they could. Credit grantors do not have a tool that can be used to gauge risk with consistency, and consumers do not have a score that they can interpret easily among the three CRCs.
As described previously, the data design for tri-CRC models has traditionally been done in one of two ways:
The first of the traditional data design methods involves the developer independently extracting data from each CRC, potentially from different time frames. The data are then used to create independent models, which will contain different characteristics and different point assignments among the three CRCs. The resulting models are then aligned to have the same score range and score-to-odds interpretation.
There are several problems with this data design method. First, the data extracted by each CRC may represent different points in time for each CRC, resulting in a bias where seasonality and different credit file compositions at different points in time are represented differently by each CRC. For example, one time period may have substantially more recently opened mortgages than another, resulting in a sub-optimal level of predictive power for that characteristic. Second, the characteristics and associated points that make up the three scores are not consistent. The differences in the data among the three CRCs will result in potentially different characteristics and different point values for the characteristics. This could result in a consumer receiving widely different adverse action reason codes among the three CRCs, even with scores that are close to each other. Third, score alignment is an exercise that requires estimation, thus introducing additional variability to the aligned score.
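The alignment step in the first traditional method can be sketched as follows. This is only an illustration under stated assumptions, not the procedure any developer actually used: it assumes that log-odds are roughly linear in each CRC's raw score, and the anchor values (700 points at 20:1 odds, 40 points to double the odds) and all function names are hypothetical. The least-squares fit is the estimation step that adds variability to the aligned score.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def make_aligner(band_scores, band_goods, band_bads,
                 base_score=700.0, base_odds=20.0, pdo=40.0):
    """Build a function mapping one CRC's raw score to a common scale.

    band_scores: representative raw score of each score band
    band_goods / band_bads: observed good and bad counts per band
    base_score, base_odds, pdo: hypothetical anchor values (assumptions)
    """
    # Observed good:bad log-odds in each band of this CRC's own model.
    log_odds = [math.log(g / b) for g, b in zip(band_goods, band_bads)]
    # Estimation step: approximate log-odds as a linear function of raw score.
    intercept, slope = fit_line(band_scores, log_odds)
    factor = pdo / math.log(2)  # points added each time the odds double
    offset = base_score - factor * math.log(base_odds)

    def align(raw_score):
        return offset + factor * (intercept + slope * raw_score)

    return align

# Two hypothetical CRC models with different raw score scales.
align_a = make_aligner([0, 1, 2], [1000, 2000, 4000], [100, 100, 100])
align_b = make_aligner([100, 200, 300], [1000, 2000, 4000], [100, 100, 100])
print(align_a(1), align_b(200))
```

Each CRC needs its own fitted line, so any sampling noise in the band counts carries directly into the aligned score.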
The second of the traditional data design methods involves the development of the model using a single CRC's data, then force-fitting the remaining CRCs' data into the developed model. As with the first method, this approach has problems. First, the model is biased toward the sampling routine used by the contributing CRC, as the other CRCs did not contribute to the development data. Second, the characteristics in the developed model are biased toward the contributing CRC's data. As such, equitable characteristic leveling is not attained because the non-contributing CRCs' data are forced to conform to the contributing CRC's data, when such conformity may not be possible given the data differences between CRCs.
The development of VantageScore introduced a new and superior data design for tri-CRC model development, which eliminates the drawbacks of the traditional data design. The data design is as follows:
The VantageScore data design is preferable to traditional data design for the following reasons:
The consistent effectiveness of VantageScore across all three CRCs is illustrated in Figures 1 through 4, which are based on the parallel validation sample. The lines or bars in each graph reflect the scores for one CRC and performance from each of the three CRCs, where each CRC contributes one-third of the performance to the population.
Figure 1 shows the consistent bad rate of new accounts across all three CRCs for any given score. The separation of the three CRCs at the lower scores can be attributed to the low population size of the new account population.
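The kind of comparison plotted in Figure 1 can be sketched as a bad rate per score band, computed separately for each CRC. The record layout, the 50-point band width, and the function name below are assumptions made for illustration; the paper does not specify how the bands were formed.

```python
from collections import defaultdict

def bad_rate_by_band(records, band_width=50):
    """Return {(crc, band_floor): bad_rate}.

    records: iterable of (crc, score, is_bad) tuples (hypothetical layout)
    band_width: width of each score band in points (an assumption)
    """
    totals = defaultdict(int)
    bads = defaultdict(int)
    for crc, score, is_bad in records:
        # Floor the score to the lower edge of its band.
        band = (score // band_width) * band_width
        totals[(crc, band)] += 1
        bads[(crc, band)] += int(is_bad)
    return {key: bads[key] / totals[key] for key in totals}

# Made-up records, for illustration only.
sample = [
    ("A", 710, False), ("A", 720, True),
    ("B", 715, False), ("B", 760, False),
]
rates = bad_rate_by_band(sample)
```

Bands with few accounts produce unstable rates, which is consistent with the separation the paper notes at the lower scores.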
Figure 2 shows the KS for new accounts for each CRC across various industries. Within any given industry, each CRC performs virtually identically to the others.
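For readers unfamiliar with the KS (Kolmogorov-Smirnov) statistic, the sketch below computes it as the maximum gap between the cumulative score distributions of bad and good accounts. The function name and input layout are assumptions, and the paper does not state whether its KS values are reported on a 0 to 1 or a 0 to 100 scale; this sketch uses 0 to 1.

```python
def ks_statistic(scores, is_bad):
    """Maximum gap between cumulative bad and good score distributions."""
    pairs = sorted(zip(scores, is_bad))
    n_bad = sum(1 for _, bad in pairs if bad)
    n_good = len(pairs) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad account")

    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every account tied at this score before comparing.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks
```

Running this once per CRC and industry on the same set of consumers would give the kind of side-by-side values the figure compares.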
Figure 3 shows the consistent bad rate of existing accounts across all three CRCs for any given score. Because of the large sample size, the three CRCs all converge to a common bad rate at a given score.
Figure 4 shows KS for existing accounts across all three CRCs for various industries. As with the new accounts, the CRCs are virtually identical in KS within any given industry.
The VantageScore data design allows for this type of CRC performance comparison because the consumers in the sample are present in all three CRC files. This type of comparison is not possible under the traditional tri-CRC design method because of the absence of the "three CRC files for the same consumer at the same time" feature present in the VantageScore design.
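The "three CRC files for the same consumer at the same time" requirement can be illustrated by keeping only the consumers who appear in every CRC's file for one snapshot date. This is a minimal sketch under assumed inputs: the row layout, identifiers, and function name are hypothetical, and it ignores the consumer matching that a real extract would require.

```python
def common_consumers(files_by_crc, snapshot_date):
    """Return the IDs present in every CRC's file on the snapshot date.

    files_by_crc: {crc_name: [(consumer_id, file_date, record), ...]}
                  (hypothetical layout)
    """
    id_sets = [
        {cid for cid, file_date, _ in rows if file_date == snapshot_date}
        for rows in files_by_crc.values()
    ]
    return set.intersection(*id_sets) if id_sets else set()

# Made-up data, for illustration only.
files = {
    "A": [(1, "d1", {}), (2, "d1", {}), (3, "d0", {})],
    "B": [(1, "d1", {}), (2, "d1", {}), (3, "d1", {})],
    "C": [(2, "d1", {}), (1, "d0", {})],
}
shared = common_consumers(files, "d1")
```

Only consumers in the shared set can be scored on all three files and compared directly, which is what makes the per-CRC comparisons in the figures possible.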
The data design for traditional tri-CRC model developments has resulted in scores that have elements of a tri-CRC design. However, they fall short of the label "tri-CRC" because they lack an equitable contribution or use of CRC data, or they introduce unnecessary variability to the development process. The development of VantageScore introduces a much more desirable tri-CRC data design, resulting in the first true tri-CRC score consisting of one scoring algorithm for all three CRCs derived using an equitable contribution of data from all three CRCs. This true tri-CRC data design sets the foundation for any other tri-CRC development or analysis efforts, maximizing the benefits of tri-CRC data for credit grantors and consumers.