Summary: I. Introduction. II. Precautionary Measures and Pretrial Risks. III. Methods. IV. Results. V. Discussion. VI. Conclusions. VII. Conflict of Interest Statement. VIII. Acknowledgements. IX. References.
I. Introduction
A systematic review of instruments aims to identify gaps in knowledge and assist in selecting the most suitable tool to measure the variable in question in a specific population.1 In this case, we focus on tools for pretrial risk assessment in Mexican juveniles. A review can also provide information about measurement properties, that is, aspects of the quality of an instrument, which can be divided into three main domains: 1) validity, 2) reliability, and 3) responsiveness. These properties are explained below.
Validity reflects the degree to which a tool measures the variable it is intended to measure, for instance, whether it is adequately based on a general review (face validity), expert opinion (content validity), statistical confirmation of the underlying theoretical elements that compose the variable (structural validity), consistency with empirical evidence (hypotheses testing), adaptation of the original version of the tool to a different population (cross-cultural validity), and comparison with an instrument considered a “gold standard” (criterion validity). Reliability indicates whether the measurement is free of error, i.e., that changes in the score reflect changes in the variable under different conditions, for example, the degree of interrelatedness among different items (internal consistency) or consistency across repeated applications (test-retest). Last, responsiveness refers to the ability to detect changes in the variable over time, such as a change in the score of the tool.2 Each of these properties requires a particular type of study to assess it, and this review describes the methodology used to analyze the studies of selected instruments for pretrial risk assessment.
II. Precautionary Measures and Pretrial Risks
In the 1960s, Pretrial Justice Services (PJS) were implemented in the United States of America (USA).3 Today, they operate in countries such as Canada, the United Kingdom, Australia, Chile, and Mexico to produce quality information for evaluating and supervising the conditions imposed by the Court.4 These conditions, called precautionary measures, seek to guarantee the effectiveness of the criminal procedure and reduce the likelihood of pretrial failure.5
The Inter-American Commission on Human Rights6 defines pretrial failure as 1) failure to appear (FTA) in court or flight and 2) hampering the criminal investigation. However, the definition of pretrial misconduct varies across countries and jurisdictions. For instance, in North America, this failure is characterized by failure to appear and/or the commission of another public offense before the end of the trial, a situation which is also known as public safety.7 In Latin America, pretrial failure consists of failure to appear; acting against personal integrity or putting the life of a victim, offended party, witnesses, or the community at risk; and/or interfering with the criminal investigation by altering or falsifying evidence, intimidating witnesses, and threatening or hampering the work of the actors involved.8
According to theoreticians9 and international Juvenile Justice standards,10 precautionary measures must comply with the principles of minimum intervention while promoting non-custodial measures, rationality according to the impact caused by behavior, suitability to a given objective, and necessity based on a selection of the measures that are the least restrictive of rights. Therefore, preventive detention must be used as a last resort, for the shortest possible time and when there is a need for caution due to pretrial risk. To this end, there is a diverse catalog of non-custodial measures, which include periodic appearances in court, prohibition from leaving a specific territory, and banning contact with certain persons.11
Pretrial Risk Assessment
In 1993, the Juvenile Detention Alternatives Initiative (JDAI) was created with the primary objectives to encourage non-custodial measures, avoid overcrowding facilities, improve conditions in detention facilities, and deter pretrial failure.12 To achieve this, one fundamental strategy is the implementation of evidence-based pretrial risk assessment instruments (RAI) that assist judicial decision-making regarding the best precautionary measures,13 while ensuring that personal characteristics of the accused and prior criminal charges do not bias decisions.14 This could be one reason why violence RAI are not suitable to assess pretrial failure in juvenile offenders because risk factors like criminal sentences, personality traits, substance use and friends with antisocial behaviors are mainly applicable for predicting violent recidivism,15 while pretrial risks focus on information relevant to procedural purposes as a means to lower the probability of pretrial failure while the ruling is being determined. Even though both take into account general principles of risk assessment (e.g., the intensity of intervention should be proportional to the level of risk obtained through evidence-based factors), violence and pretrial failure are different behaviors that require a distinctively different approach.
Guidelines establish that pretrial RAI should take a risk-protective approach through an evaluation of individual, contextual and situational factors based on empirical and normative criteria.16 Some minimum areas to be assessed17 include community ties, delinquent behavior, and collateral factors (Table 1). This information is then verified through interviews (face-to-face or by telephone) with informants such as family, teachers, or friends. Domiciliary visits, a review of legal files, and other types of documentation may also be considered.18 Once the information is verified, a risk assessment is made by calculating an overall and behavior-specific risk score that guides the release or detain decision.19
Table 1 Required minimum sections in juvenile pretrial risk assessment instruments
| Section | Content |
|---|---|
| Community Ties | Residential stability, cohabitants, economic dependents, employment stability, education, family and peer relationships, facilities to leave the country or remain hidden, and social context |
| Delinquent Behavior | Current offense, legal status, prior and pending cases or petitions, infractions, behavior in detention, severity of the foreseen sanction, weapon involvement, aggression against victim or witnesses, prior pretrial misconduct, and violations of prior judicial conditions |
| Collateral Factors | Aggravated or mitigated risk score in previous areas, including the age at intake, family environment safety and stability, escape or runaway history, school performance and attendance, first offense, degree of involvement in the offense, mental health condition, etc., not to be considered if not supported by the information system |
Note. These elements are illustrative and do not exclude other areas of evaluation. Developed by the author based on the guidelines established by Pretrial Justice Services.20
In the USA, the Pretrial Justice Institute (PJI) and the National Association of Pretrial Service Agencies (NAPSA) created guidelines and standards for pretrial release and diversion.21 Pioneer states like California, Florida, New Mexico, and Virginia have implemented and validated detention RAI for juvenile offenders. To date, more than 15 US states have implemented them.22
In Latin America, some efforts have been made since the implementation of the Accusatory Criminal Justice System. Mexico23 and Chile24 have developed pretrial justice service implementation manuals, comprising a comprehensive model of evaluation and supervision with pretrial risk assessment standards. In Mexico, pretrial justice services are commonly called Unidades de Medidas Cautelares (UMECAs).25 The UMECA of the State of Morelos in Mexico was one of the first to implement RAI,26 but the measurement properties of its instrument are unknown, as is its impact on pretrial release and detention rates,27 even though it is used to determine the rationality and suitability of precautionary measures.28
When designing a new tool, it is advisable to examine the instruments of different pretrial justice services in order to identify common variables, especially those that have been effective and validated in the referral population.29 This underlines the urgency of conducting pretrial risk assessments based on validated tools with adequate measurement properties that simultaneously meet theoretical and normative risk assessment criteria, especially for juveniles in conflict with the law, that is, persons between 12 and 17 years of age accused of criminal behavior.30
Hence, a standardized procedure is needed to select the most suitable instruments to assess pretrial risks along the lines of the protocol developed by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative, which seeks to reinforce a selection of outcome measurement instruments in clinical and research fields.31
This study aims to assess and summarize the quality of measurement properties of pretrial risk assessment instruments for Mexican juveniles in the Comprehensive Criminal Justice System for Adolescents (Sistema Integral de Justicia Penal para Adolescentes or SIJPA), through a systematic review using the COSMIN methodology.
III. Methods
This systematic review follows COSMIN guidelines for searching and evaluating measurement properties:32
1. Search Strategy
A literature review was performed in 1) the PubMed database; 2) metasearch engines:33 the UNAM General Office for Libraries and Digital Information Services (DGB-UNAM, in Spanish) and Google Scholar; and 3) libraries found on pretrial organization websites: the Criminal Procedure Justice Institute (IJPP, in Spanish), Juvenile Justice Advocates International (JJAI), and the Institute for Legal Research (IIJ, in Spanish).
The Peer Review of Electronic Search Strategies (PRESS) 2015 Checklist34 and the COSMIN search strategy with a sensitive filter for studies on measurement properties35 were taken into account for a more precise search (Table 2). Previously specified criteria were applied for article selection (Table 3). No language or time restrictions were applied, so that the search would be as extensive as possible; each source was searched from its inception to June 15, 2022.
Table 2 Search strategy used per database
| Database | Search terms |
|---|---|
| PubMed | (pretrial OR detention OR probation) AND (“Risk Assessment”[Mesh] NOT “violence risk”)† AND (“Adolescent”[Mesh] OR juvenile OR youth) AND measurement properties filter‡ |
| DGB-UNAM§ | (pretrial OR detention assessment OR detention risk OR public safety risk OR failure to appear OR FTA OR flight risk) AND (adolescent* OR juvenile OR youth) AND (validation OR psychometr* OR clinimetr* OR development) |
| Google Scholar | (“pretrial risk” OR “pretrial failure” OR “public safety risk” OR “failure to appear”) AND (adolescent* OR juvenile OR youth) AND (Mexico OR Latino) AND (“validation study” OR psychometr*) |
| Institute for Legal Research (IIJ) | “adolescente” AND “cautelar” |
| Other web sites ¶ | No search terms were used. A manual review of their resources was conducted. |
† Some violence risk instruments assess general recidivism which include violating probation or parole conditions36 that could be compatible with pretrial risk assessment. If this were the case, the search terms would include them in the results.
‡Filter developed by Terwee et al.37 to find studies on measurement properties
§ Filtered by type of resources: academic publications, electronic resources, and reports
¶ Includes Criminal Procedure Justice Institute38 and Juvenile Justice Advocates International39
Table 3 Inclusion and exclusion criteria for selection of studies
| Criteria | Description |
|---|---|
| Inclusion |
|
| Exclusion |
|
† For development studies of instruments not originally written in Spanish, other population groups were used for reasons of inclusion.
An additional strategy was proposed in the event that no pretrial RAI were found. Since it is an acceptable practice to consider instruments developed with similar population characteristics and theoretical models,40 an open database from the Mexican Government was consulted41 to search for instruments from National Surveys with juveniles assessing the recommended risk assessment variables (Table 1).
In this regard, an advanced search was used in the Gobierno section with the terms “encuesta nacional,” “Adolescentes,” and “Mujeres” as filters. The selection of surveys was made based on titles, objectives, and conceptual design. Potential resources for information on pretrial risk assessment underwent a general review of the questionnaire contents for face validity.
Once a potential instrument was identified, another search was performed using the search terms listed in Table 4 to find measurement property studies including Mexican or Latino juveniles. The selection of articles was made based on the title and the abstract. All articles were reviewed independently by two reviewers. The article was included for analysis if at least one reviewer considered it relevant. References were also checked for potentially relevant studies.42
Table 4 Search strategy used for development or validation studies
| Database | Search terms |
|---|---|
| PubMed † | “Measurement instrument” ‡ AND (“Adolescent”[Mesh] OR juvenile OR youth) AND measurement properties filter§ |
| DGB-UNAM¶ | “Measurement instrument” ‡ AND (adolescent* OR juvenile OR youth) AND (validation OR psychometr* OR clinimetr* OR development) |
| Google Scholar | English: “Measurement instrument” AND adolescent* AND (validac* OR psicometr*); Spanish #: “Measurement instrument” AND adolescent* AND (validac* OR psicometr*) |
Note. Searches were conducted separately for Spanish and English sources.
† A validation study filter was applied, except for the Parent-Child Conflict Tactics Scale.
‡ Replaced by each instrument name (Spanish and English) or abbreviation, if applicable, as text words. Some alternatives were used specifically for each language. In English, the terms “risk perception scale” OR “social insecurity perception scale” were used for the Social Insecurity Perception Scale. In Spanish, the terms “escala percepcion inseguridad social” OR “escala inseguridad percibida” were used for the Social Insecurity Perception Scale, and “parentalidad alabama” OR “practicas parentales alabama” OR “estilos parentalidad alabama” were used for the APQ. The preposition “de” in an instrument’s Spanish name was not included as a search term, except in the case of Google Scholar.
§ Filter developed by Terwee et al.43 to find studies on measurement properties
¶ Filtered by type of resources: academic publications and theses
# For the Alabama Parenting Questionnaire, instrument names (“parentalidad alabama” and “practicas parentales alabama”) were used in separate searches.
2. Evaluation of Measurement Properties
According to COSMIN methodology, the assessment was performed in three stages.44 Two reviewers conducted analyses independently and an additional reviewer resolved any discrepancies.
A. Evaluation of development and content validity45
Consideration was given to general design characteristics, such as theoretical framework, population characteristics, sample size, methodology relevancy, and statistical analyses for concept elicitation, identification of items, and pilot testing. Content validity includes the evaluation of relevance, comprehensiveness, and comprehensibility. The criteria of each study were scored on a four-point scale ranging from inadequate (I) to very good (V) using the COSMIN Risk of Bias Checklist46 to assess the methodological quality of studies. The total score of a study corresponds to its lowest rating and indicates whether its results can be considered reliable.
In this section, the design criteria47 requiring a study performed in a sample representing the target population and a qualitative methodology for concept elicitation were modified according to the literature,48 which considers the examination of instruments with compatible theoretical models a suitable methodology.
B. Evaluation of other measurement properties49
The methodological quality of studies on construct validity, criterion validity, reliability, and responsiveness was examined in terms of sample size, suitability of the method and statistical analysis, and description of bias. As in the previous stage, a four-point rating scale (inadequate to very good) was used for each set of criteria and the lowest rating was reported as the total score. In addition, values were compared against criteria for good measurement properties50 to determine if the measurement property was sufficient (+), insufficient (-) or indeterminate (?).51
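The "lowest rating counts" rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the COSMIN checklist itself: the function name `overall_quality` is hypothetical, the two intermediate labels ("doubtful" and "adequate") are assumed to follow the COSMIN checklist wording, and the example ratings are invented.

```python
# Ratings ordered from worst to best. The two intermediate labels are an
# assumption based on the COSMIN checklist wording.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items of one study."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # min() with the index in RATING_ORDER as key picks the worst rating.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for a single study (not taken from this review).
example = ["very good", "adequate", "doubtful", "very good"]
print(overall_quality(example))  # doubtful
```

Under this rule a single weak design element determines the total score of a study, which is why a poorly described procedure alone can lead to a doubtful rating.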
C. Evaluation of quality of evidence52
If more than one study assessing a measurement property was found, results would be qualitatively summarized per instrument and compared against the criteria for good measurement properties to determine whether the measurement property is sufficient (+), insufficient (-), inconsistent (±) or indeterminate (?). For development and content validity, ten criteria for good content validity were graded based on studies and the reviewer’s rating.53 Next, an overall rating per criteria was assigned, prioritizing the study results to reduce subjective judgment. Lastly, the quality of evidence was rated using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach, starting from high quality and progressively downgraded for risk of bias, inconsistency, imprecision, and indirectness, depending on whether it was serious, very serious or extremely serious.
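The summary and grading steps above could be sketched as follows. This is a simplified illustration, not the authors' procedure: the function names are hypothetical, the pooling rule (any mix of sufficient and insufficient results yields an inconsistent rating) is a simplifying assumption, and the mapping of serious, very serious, and extremely serious concerns to one, two, and three downgrade levels is assumed from the COSMIN adaptation of GRADE.

```python
# GRADE levels from best to worst.
GRADE_LEVELS = ["high", "moderate", "low", "very low"]

# Assumed number of levels to downgrade for each severity of concern.
DOWNGRADE_STEPS = {"none": 0, "serious": 1, "very serious": 2, "extremely serious": 3}


def summarize_results(ratings):
    """Combine per-study ratings ('+', '-', '?') into one summary rating."""
    determinate = {r for r in ratings if r in ("+", "-")}
    if not determinate:
        return "?"  # only indeterminate results (or none at all)
    if determinate == {"+", "-"}:
        return "±"  # studies disagree
    return determinate.pop()


def grade_quality(concerns):
    """Start at 'high' and move down one level per downgrade step."""
    steps = sum(DOWNGRADE_STEPS[severity] for severity in concerns.values())
    return GRADE_LEVELS[min(steps, len(GRADE_LEVELS) - 1)]


# Hypothetical inputs, not results from this review.
print(summarize_results(["+", "-", "?"]))         # ±
print(grade_quality({"risk of bias": "serious"}))  # moderate
```

The quality level is capped at "very low", so several serious concerns cannot push the rating below the bottom of the scale.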
IV. Results
1. Search Results
The first search strategy returned 1) PubMed: 34 articles, 2) DGB-UNAM: 66 articles, and 3) Google Scholar: 149 articles. No instruments were found specifically for Mexican or Latino populations. No relevant publications were found on the Criminal Procedure Justice Institute (28 reports), Juvenile Justice Advocates International (18 reports), or Institute for Legal Research (6 articles, 2 reports) websites either. Consequently, the second strategy was employed.
From the findings of the second strategy (Figure 1, Search 1), one survey was selected since it was the only report with previous validation: the Mexico National Survey of Drug Use Among Students (ENCODE, in Spanish).54 Surveys with juvenile offenders55 were excluded due to the lack of information on the scale design or validation in the methodology report. According to the variables in pretrial RAI, five scales were selected: 1) the Peer Scale (Escala de Grupo de Amigos),56 2) the Social Insecurity Perception Scale (Escala de Percepción de Inseguridad Social),57 3) the Alabama Parenting Questionnaire (APQ),58 4) the Family Environment Scale (FES) (Escala de Ambiente Familiar),59 and 5) the Parent-Child Conflict Tactics Scale (CTSPC) (Escala de Tácticas de Conflicto Padre-Hijo).60
† Includes Special report, Adolescents: Vulnerability and Violence64 and the National Survey on Adolescents in the Criminal Justice System65
‡ Includes the National Survey on the Dynamics of Household Relationships66
§ Includes the National Survey of Drugs, Alcohol and Tobacco Consumption 2016-201767 and the Diagnosis of adolescents who commit serious crimes in Mexico68
¶ Results with search terms in English are noted as nE, and those with search terms in Spanish are noted as nS.
# Refers to records of the Data Analysis and Survey Unit of the National Institute of Psychiatry69
The searches for these scales yielded 1,195 results (Figure 1, Search 2), although the Peer Scale was removed since no relevant studies were found for it. After a full-text screening, two additional scales were included: the Family Environment Scale for Adolescents (Escala de Ambiente Familiar para Adolescentes or EAFA)61 and the FES-Short Form (Escala de Ambiente Familiar-Versión abreviada).62
The main characteristics of the instruments are described in Table 5. All include a self-report format and most assess different aspects of the parent-child relationship, especially from the child’s point of view, using a 4-point ordinal scale or predefined frequency categories. The Social Insecurity Perception Scale assesses the social environment.
Table 5 Characteristics of the outcome measurement instruments included in the searches
| Instrument | Author(s) (year of publication) | Construct | Mode of administration† | Number of scales (number of total items); Range of score | (Sub)scale(s) (number of items) | Response options | Language |
|---|---|---|---|---|---|---|---|
| Alabama Parenting Questionnaire (APQ) | Frick (1991) | Parenting practices related to externalizing behaviors in children | Self-report and Interview (Child and Parent) | 5 (42); 42-168 | Involvement (10), positive parenting (6), poor monitoring/supervision (10), inconsistent discipline (6), corporal punishment (3), and other discipline practices (7) | 4-point frequency scale (1-4: never, sometimes, often, and always) | English. Transl. Spanish |
| Family Environment Scale | Villatoro et al. (1997) | Family environment: communication, support and cohesion | Self-report | 6 (42); 42-168 | Hostility and rejection (11), parent communication (9), child communication (9), parent support (7), daily child support (6), and significative child support (6) | 4-point frequency scale (1-4: hardly ever, sometimes, frequently, very frequently) | Spanish |
| Family Environment Scale - Short Form | Quiroz et al. (2007) | Family environment: communication, support, and cohesion | Self-report | 5 (18); 18-72 | Hostility and rejection (NR), parent communication (NR), child communication (NR), parent support (NR), and daily child support (NR) | 4-point ordinal scale (1-4: hardly ever, sometimes, frequently, very frequently) | Spanish |
| Family Environment Scale for Adolescents (EAFA) | Ruiz-Cárdenas et al. (2017) | Family environment: perception of family relationships regarding discipline, communication, problem solving, and affection | Self-report | 5 (25); 25-100 | Parent conflict (6), lack of family communication (5), lack of family habits and rules (6), hostility (5), and family acceptance (3) | 4-point frequency scale (1-4: hardly ever, sometimes, frequently, almost always) | Spanish |
| Parent-Child Conflict Tactics Scale (CTSPC) | Straus et al. (1995) | Psychological and physical maltreatment, neglect, and nonviolent modes of discipline | Self-report and Interview (Child and Parent) | 3 (22); 0-22+, depends on response category Suppl.=3(13) | Nonviolent discipline (4), psychological aggression (5), and physical assault: corporal punishment (5), physical maltreatment (4), severe physical maltreatment (4) Suppl. Weekly discipline (4), neglect (5), sexual maltreatment (4) | Overall 7 frequency categories: this has never happened, not in the past year, but it happened before, more than 20 times, 11-20 times, 6-10 times, 3-5 times, twice and once in the past year | English. Transl. Spanish |
| Social Insecurity Perception Scale | Villatoro et al. (1997) | Social risk: perception of neighborhood safety and danger | Self-report | 3 (15); 15-60 | Distant risk (9), social safety (3), and personal risk (3) | 4-point ordinal scale (1-4: completely agree, agree, disagree, completely disagree) | Spanish |
Note. Instruments are displayed in alphabetical order. Transl.= Translation, Suppl.= Supplemental, and NR= Not reported.
† If the information is collected directly from the respondent, the instrument is self-reported; if a professional is required to interpret or complete the assessment, the information is obtained through an interview.
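The score ranges listed in Table 5 for the instruments with 4-point response options are consistent with multiplying the number of items by the lowest and highest response values. The sketch below reproduces that arithmetic. It assumes simple unweighted sum scoring, which the source does not state explicitly, and the function name `score_range` is hypothetical. The CTSPC is left out because its range depends on the response category used.

```python
def score_range(n_items, min_option=1, max_option=4):
    """Return the (lowest, highest) possible total score for a summed scale."""
    return n_items * min_option, n_items * max_option


# Total number of items per instrument as reported in Table 5.
items_per_instrument = {
    "APQ": 42,
    "Family Environment Scale": 42,
    "FES - Short Form": 18,
    "EAFA": 25,
    "Social Insecurity Perception Scale": 15,
}

for name, n_items in items_per_instrument.items():
    low, high = score_range(n_items)
    print(f"{name}: {low}-{high}")
```

For example, 18 items scored from 1 to 4 give the range 18-72 reported for the FES-Short Form.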
Of these studies, 71% (n=5) were carried out with Mexicans. US studies (29%, n=2) consist of development studies. 86% (n=6) used a cross-sectional design with probabilistic sampling (57%, n= 4). The study design corresponds to secondary analyses of the surveys with general (29%, n=2) and student population (57%, n=4) ranging from 6 to 23 years of age. 43% (n=3) rely on informant reports provided by primary caregivers. Key characteristics of studies are displayed in Table 6.
Table 6 Characteristics of development and validation studies found over the course of the research
| Instrument | Author(s) (year) [Reference] | Purpose of study | Study design (Sampling) | Study population: N (% Male) | Study population: Age (Mean ± SD, Range years) | City (Country) |
|---|---|---|---|---|---|---|
| Alabama Parenting Questionnaire (APQ) | Shelton et al. (1996) | To compare the assessment of parenting practices across informants and methods using several indices of reliability and validity. | Longitudinal (Non-probabilistic) | N=160 children and their caregivers, n=124 (M=81%) clinic-referred, n=36 (M=73%) volunteer. | | Alabama (USA) |
| Alabama Parenting Questionnaire (APQ) | Robert (2009) | To examine parenting practices in Mexico and assess the usefulness of the APQ with Mexican caregivers. | Cross-sectional (Probabilistic) | N=829 female primary caregivers (mothers: n=829, grandmothers: n=24, missing: n=9) and their children (n=862, M=48%) | | Nuevo León (México) |
| Family Environment Scale | Villatoro et al. (1997) | To present the validity and reliability of a scale aimed to evaluate the adolescent’s perception of their family environment. | Cross-sectional (Probabilistic) | N=793 students (M=46.8%) | | Mexico City (México) |
| Family Environment Scale - Short Form | Quiroz et al. (2007) | To determine the relationship between past experiences of mistreatment or an inadequate family environment and the presence of antisocial behavior in adolescents. | Cross-sectional (Probabilistic) | N=3,603 students (M=NR) | | Mexico City (México) |
| Family Environment Scale for Adolescents (EAFA) | Ruiz-Cárdenas et al. (2017) | To obtain the construct validity of the Family Environment Scale for Adolescents. | Cross-sectional (Non-Probabilistic) | N=391 students (M=48.8%) | | Mexico City (México) |
| Parent-Child Conflict Tactics Scale (CTSPC) | Straus et al. (1998) | To create a parent-to-child version of the Conflict Tactics Scale. | Cross-sectional (Probabilistic) | N=1,000 parents (M=51%) and their children (n=1,000) | | USA |
| Social Insecurity Perception Scale | Villatoro et al. (1997) | To obtain validity and reliability of a measure of social insecurity and its relation to drug abuse in Mexican adolescents. | Cross-sectional (Non-probabilistic) | N=795 students (M=46.8%) | | Mexico City (México) |
Note: Instruments are displayed in alphabetical order. N = total sample size, n= subgroups, µ = mean, SD = standard deviation, M = male, NR= not reported and USA= United States of America.
2. Measurement Properties of the Instruments Selected
A. Evaluation of Development
Most of the scales (n=4) described a clear construct with a defined conceptual framework and context of use, especially with evaluation and epidemiological research applications. The scales are grounded on developmental theories of disruptive and antisocial behavior,70 and sociological theories.71 Notably, the APQ does not provide a clear construct definition, while the Social Insecurity Perception Scale and the FES only include a general description of the population with non-detailed or unspecified characteristics (e.g., parent, child, adolescent). These shortcomings result in an insufficient rating.
In terms of concept elicitation, all studies reported a methodology based on a literature review and previous versions of instruments but did not give detailed information on the methodology and subsequent analyses. Therefore, a doubtful rating was assigned in such cases. The EAFA and the FES-Short Form consider factor analysis for identifying relevant items. Quiroz et al.72 state that the FES-Short Form is the result of subsequent analysis of the FES, possibly a factor analysis, but no additional information is provided. Lastly, only the APQ and the CTSPC conducted a pilot study with an adequate sample size (n≥7) of parents for improved clarity, but the particulars of the procedure are not presented.
In summary, the content validity rating was insufficient for the FES, the EAFA, and the Social Insecurity Perception Scale due to inconsistent relevance and unassessed comprehensiveness and comprehensibility. As stated above, general design characteristics are not specified and no justification for either the selected response category or the recall period is provided. Similarly, the APQ relevance was insufficient while comprehensibility was sufficient. The CTSPC relevance was sufficient, but comprehensibility could not be rated because the only available validation was of the English version. Comprehensiveness was not assessed for any scale. A detailed evaluation of content validity is presented in Table 7.
Table 7 Content validity rating of potential instruments for pretrial risk assessment
| Instrument or Acronym | Alabama Parenting Questionnaire | Family Environment Scale | Family Environment Scale for Adolescents | Parent-Child Conflict Tactics Scale - Parent Form | Social Insecurity Perception Scale | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type of Study | DS (Shelton et al., 1996) | CV (Robert, 2009) | Authors | DS (Villatoro et al., 1997) | Authors | DS (Ruiz-Cárdenas et al., 2017) | Authors | DS (Straus et al., 1998) | Authors | DS (Villatoro et al., 1997) | Authors | |
| Criteria | ||||||||||||
| Relevance ‡ | ||||||||||||
| 1 | Are the items included relevant to the construct of interest? | - | - | +† | + | +† | + | +† | + | +† | + | +† |
| 2 | Are the items included relevant to the target population of interest? | - | - | +† | - | +† | + | +† | + | +† | - | +† |
| 3 | Are the items included relevant for the context of use of interest? | + | - | +† | - | +† | + | +† | + | +† | - | +† |
| 4 | Are the response options appropriate? | + | - | + | - | + | - | + | + | - | - | + |
| 5 | Is the recall period appropriate? | ? | - | ? | ? | ? | ? | ? | - | + | ? | ? |
| Rating | ± | - | + | ± | + | ± | + | + | + | ± | + | |
| Overall Rating§ | ± | ± | + | ± | |||||||
| Comprehensiveness‡ | ||||||||||||
| 6 | Are all key concepts included? | - | - | + | - | + | - | + | - | + | - | + |
| Rating | - | - | + | - | + | - | + | - | + | - | + | |
| Overall Rating§ | - | - | - | - | - | |||||||
| Comprehensibility‡ | ||||||||||||
| 7 | Does the population of interest understand the instructions as intended? | ? | ? | NA | - | NA | - | NA | ? | NA | - | NA |
| 8 | Does the population of interest understand the items and response options as intended? | ? | ? | NA | - | NA | - | NA | ? | NA | - | NA |
| 9 | Are the items appropriately worded? | NA | NA | + | NA | + | NA | + | NA | ? | NA | + |
| 10 | Do the response options match the question? | NA | NA | + | NA | + | NA | + | NA | ? | NA | + |
| Rating | ? | ? | + | - | + | - | + | ? | ? | - | + | |
| Overall Rating§ | + | - | - | - | - | |||||||
| Content Validity Rating§ | ± | - | - | ± | - | |||||||
Note. Instruments are displayed in alphabetical order. The rating was determined using COSMIN criteria for good content validity.73 The abbreviations used are DS: Development study and CV: Content validity study.
† Rating assigned by authors, taking into account construct, population, and context of use for pretrial risk assessment
‡ The rating system is +: Sufficient, -: Insufficient, ±: Inconsistent, ?: Indeterminate, and NA: Not applicable.
§ The rating system is +: Sufficient, -: Insufficient, and ±: Inconsistent.
B. Evaluation of Validity
This domain includes content (n=1), construct (n=6) and criterion validity (n=1). Content validity was assessed through the comprehensibility of the items in an adequately sized sample of mothers and children,74 but the quality was rated doubtful because the procedure was poorly described. For the Social Insecurity Perception Scale, a dissertation75 described a comprehensibility assessment for a 9-item scale, but the procedure was not clearly described. That dissertation was not included because it was not a study on measurement properties.
Regarding construct validity, 1) cross-cultural validity was not reported, not even for the Spanish versions of the APQ and the CTSPC. Robert76 reported back-translation and pilot testing for relevance and clarity of the APQ - Parent Form, but cultural equivalence was not examined. Its factorial structure was compared with that of the English version77 and differences were found in how the corresponding items sorted into each factor. A previous study78 stated that the validity and reliability of the APQ were suitable, and some items of the CTSPC were adapted from the Spanish version,79 but the methodology and results were not described. Good internal consistency was reported only for the CTSPC in a study with Mexican juveniles.80 No justification was given for the item reductions in the Spanish versions: APQ-33 items,81 APQ-18 items,82 CTSPC-51 items83 and CTSPC-61 items.84
2) Structural validity (n=4) was examined through exploratory (n=2) and confirmatory (n=2) factor analyses with a sufficient sample size, except for the EAFA, which obtained an insufficient rating. For the APQ,85 confirmatory factor analyses revealed theoretical inconsistency among the parental involvement, positive parenting, and poor monitoring/supervision factors. A confirmatory analysis was conducted for the FES to resolve conceptual inconsistencies in the parent communication factor, validating a two-factor model and adding a new factor on significant child support.86 For the Social Insecurity Perception Scale, a doubtful rating was assigned because of sampling bias; most participants were classified at a moderate risk level. For the other scales, the indeterminate rating was attributed to unspecified fit indices.
3) Hypotheses testing was assessed through the direct association of negative parenting (APQ and FES - Short Form) with the Child Behavior Checklist (CBCL)87 and the Antisocial Behavior Scale,88 and of criminogenic settings (Social Insecurity Perception Scale) with the High School Drug Use Questionnaire.89 In addition, discrimination between groups was evaluated by gender and antisocial behavior (FES - Short Form) and by disruptive behavior diagnoses (APQ). Most of the results support the authors’ hypotheses, but some (n=2) were indeterminate due to vague interpretation. The scales that proved problematic because of inconsistency were positive parenting (APQ), parent and daily child support (FES - Short Form), and social safety (Social Insecurity Perception Scale). Indicators of doubtful quality were related to the inclusion of scales with inadequate internal consistency (α<.70), undescribed fit indices for regression models, unequal group sizes, and unspecified description of subgroups. Inadequate quality was due to a lack of description of the measurement properties of the comparator. Finally, criterion validity was only reported for the FES. As a result of the analysis, a short form was obtained with suitable correlation values with the original subscales.90
C. Evaluation of Reliability
This domain comprises internal consistency (n=6) and test-retest reliability (n=1). Most of the scales (n=4) had good methodological quality; the doubtful rating relates to the sampling bias described earlier. An indeterminate rating is attributable to the prior determination of structural validity. According to Cronbach’s alpha values, the inadequate subscales are positive parenting (α=.545), poor monitoring/supervision (α=.623), inconsistent discipline (α=.557) and corporal punishment (α=.408) (APQ), similar to the English version;91 social safety (α=.688) and personal risk (α=.613) (Social Insecurity Perception Scale); and FES-daily child support (α=.680), EAFA-hostility (α=.681) and CTSPC-neglect (α=.220). Test-retest reliability was only reported for the APQ, using a wide-ranging time interval across interviews (“at least three days apart in a two-to-four-week period”), which also implies a possible training bias due to repeated administration over a short period. It was therefore assigned doubtful quality and, because it was estimated with coefficient alpha, the rating was indeterminate.
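As an illustration of how the coefficient behind these ratings is obtained, the following minimal Python sketch computes Cronbach's alpha from item-level scores and flags subscales that fall below the α<.70 cutoff used in this review. The function names (`cronbach_alpha`, `below_threshold`) and the toy item scores are hypothetical and do not come from the reviewed studies; only the alpha values in `reported` are taken from the text.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding one score per
    respondent (same respondent order in every list).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("every item needs the same number (>= 2) of scores")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score.
    totals = [sum(respondent) for respondent in zip(*items)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def below_threshold(alphas, threshold=0.70):
    """Return the names of subscales whose alpha is below the cutoff."""
    return sorted(name for name, alpha in alphas.items() if alpha < threshold)


# Alpha values as reported in the text (not recomputed here).
reported = {
    "APQ positive parenting": 0.545,
    "APQ poor monitoring/supervision": 0.623,
    "APQ inconsistent discipline": 0.557,
    "APQ corporal punishment": 0.408,
    "Social Insecurity social safety": 0.688,
    "Social Insecurity personal risk": 0.613,
    "FES daily child support": 0.680,
    "EAFA hostility": 0.681,
    "CTSPC neglect": 0.220,
}

if __name__ == "__main__":
    # Hypothetical toy data: three items answered by four respondents.
    toy_items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
    print(round(cronbach_alpha(toy_items), 3))
    print(below_threshold(reported))
```

The sketch uses sample variances throughout; a statistics package may apply a different convention or report a standardized alpha, so small numerical differences from published values are to be expected.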
3. Quality of Evidence of the Instruments
All development studies obtained very low quality scores, attributable to a very serious risk of bias, because the evidence comprises only one poor-quality study and no content validity study. The quality for the CTSPC also decreased due to serious inconsistency and serious indirectness in the results, as it includes a different population and administration format. As to the APQ, although a doubtful content validity study is available for the parent form, it also included a different population, making the results largely inconsistent. In the validation studies, low quality (n=8) is related to very serious indirectness and a very serious risk of bias due to the inclusion of a doubtful study; very low quality (n=3) is explained by very serious indirectness and a very serious to extremely serious risk of bias from doubtful and inadequate studies. Table 8 describes the quality of evidence per measurement property.
Table 8 Methodological quality and measurement properties of potential instruments for pretrial risk assessment
| Instrument or Acronym [Reference] | Instrument Development | Content Validity | Structural Validity | Internal Consistency | Test-retest reliability | Criterion Validity | Hypotheses Testing |
|---|---|---|---|---|---|---|---|
| Alabama Parenting Questionnaire (APQ) † (Shelton et al., 1996) | | | | | | | |
| Alabama Parenting Questionnaire (APQ) - Parent Form (Robert, 2009) | | | | | | | |
| Family Environment Scale (Villatoro et al., 1997) | | | | | | | |
| Family Environment Scale - Short Form (Quiroz et al., 2007) | | | | | | | |
| Family Environment Scale for Adolescents (EAFA) (Ruiz-Cárdenas et al., 2017) | | | | | | | |
| Parent-Child Conflict Tactics Scale (CTSPC) - Parent Form (Straus et al., 1998) | | | | | | | |
| Social Insecurity Perception Scale (Villatoro et al., 1997) | | | | | | | |
Note. Instruments are displayed in alphabetical order. Methodological quality (MQ) was determined using the COSMIN Risk of Bias checklist,92 while the measurement property rating (R) was based on the criteria for good measurement properties.93 The abbreviations used are V: Very good, A: Adequate, D: Doubtful, I: Inadequate, +: Sufficient, -: Insufficient, ?: Indeterminate, NR: Not reported, Q: Quality of evidence, DBD: Disruptive behavior diagnosis, CBCL: Child Behavior Checklist, AB: Antisocial Behavior, HSDUQ: High School Drug Use Questionnaire.
† Only the child report subscale was considered for measurement properties evaluation.
V. Discussion
The main objective of this review was the evaluation of the measurement properties of relevant pretrial RAI for Mexican juvenile offenders. Nevertheless, no development or validation studies were found. This is probably because of the lack of publication practices within the Criminal Justice System. Although some tools have been designed (e.g., UMECA of the State of Morelos), there is no available data, thus making it difficult to examine them.
Nonetheless, there is an enormous number of outcome measurement instruments which can be adapted to forensic settings. For instance, scales designed for epidemiological studies, like those included in the review, are compatible with community ties and family collateral factors in pretrial RAI.94 These scales are appropriate for the Juvenile Justice System because of their criminological and sociological framework of antisocial behavior, a comprehensive construct that encompasses substance use and criminal behavior.95
An unexpected finding emerged from the development studies. Most included an ambiguous description of the design, while others had none. The EAFA and the CTSPC were the only ones that clearly established the main characteristics, and only the latter specified response options and a recall period. A pilot test for a newly developed outcome measurement instrument is not a frequent procedure and, if conducted, it generally only gauges comprehensibility. Content validity, one of the most important measurement properties,96 was only reported for the APQ. According to Prinsen,97 98 instruments with poor content validity should not be selected, but when an estimation is unreliable because of a very low quality level, other properties like internal consistency must be examined. This last property obtained the highest methodological quality and quality level of evidence, which means it is one of the most reliable estimations, followed by structural and criterion validity.
From validation, the FES obtained the highest quality. Meanwhile, more measurement properties were examined in the APQ and the Social Insecurity Perception Scale, but these had low-quality levels reflecting substantial differences from a true estimation. The APQ was the only one with two studies published. No validation studies were found for the CTSPC.
After the analyses with COSMIN, the selection of instruments for forensic application should be determined by the level of evidence, highlighting the scientific evidence obtained from expert-consensus-standardized methodology as established by the Daubert Standard.99 This implies selecting the FES among parent-child relationship scales, as well as the CTSPC and the Social Insecurity Perception Scale, for evaluating a different domain of family environment and community settings. However, it could be worthwhile to select subscales with adequate methodological quality and sufficient measurement properties that are also supported by evidence of predictors of pretrial misconduct or antisocial behavior. For instance, some studies have reported that positive parenting practices100 like involvement and supervision, mainly through adolescent disclosure,101 are significant predictors. Therefore, the authors encourage adapting involvement from the APQ, lack of family communication from the EAFA and parent support and child communication from the shorter form of FES. As to the others, the CTSPC in its entirety and distant and personal risk from the Social Insecurity Perception Scale are recommended to provide information about safety in the family environment and additional characteristics of the neighborhood. In any case, it is necessary to revise the items, response scales, and recall period to ensure relevance, comprehensiveness, and comprehensibility, mainly because the transcultural adaptation of the Spanish versions of the APQ and the CTSPC is unknown.
These subscales could improve the evaluation of contextual factors with the lowest number of items possible since pretrial RAI should be brief to be used as a screening device and easier to fill out,102 while also adhering to the law.103 In the Mexican Comprehensive Criminal Justice System for Adolescents, information about collateral factors, like mitigating factors, must be considered a benefit104 to ensure that judicial decisions comply with the principles of precautionary measures105 and protect the best interests of the child throughout the criminal process. This is why mental health status cannot be considered an aggravating factor, but an opportunity to detect mental health needs from a public health perspective.106
Studies with juvenile offenders107 report mental health problems, such as substance use disorders and disruptive behavior, which may increase as the criminal case progresses; juveniles are more likely to meet the diagnostic criteria in the later stages than at the onset.108 This highlights the relevance of the Juvenile Justice System’s prompt detection of needs to guarantee the protection of the right to enjoy the highest attainable standard of health.109 110 In the end, the point of pretrial risk assessment is to balance individual rights with the need for caution, and to accomplish this, validated tools are essential.
In the future, it will be fundamental to consider some challenges in the implementation of these instruments so as to enhance their effectiveness. First, before testing, justice system operators could be invited to take part in the development process to record suggestions and obtain their approval.111 This would improve instrument feasibility and promote a multidisciplinary approach. Second, for validation purposes, the sample must represent the referral population by having similar characteristics (e.g., sex, age, schooling, etc.), with different charge types and risk levels.112 If the statistical analysis is adjusted to these conditions and results are monitored, potential bias in judicial decisions when establishing conditions and detention lengths could be diminished. Last, it may be appropriate to implement structured guidelines for judicial operators regarding scope and limitations in practice, so as to standardize judicial discretion when possible and increase awareness of possible biased outcomes as a result of discretion.113 To sum up, pretrial risk assessment could be a double-edged sword if not supported by data and reliable methodology, but especially if it does not adhere to decision-making guidelines.114
As to the limitations of this study, despite following a well-established protocol, search results were restricted because of database scope and publishing practices about measurement properties. First, the instruments assessed were selected from national surveys, so it is possible that several instruments with a smaller sample size were excluded. Second, the selection of studies was initially made based on the title and the abstract, but it was found that development and validation studies are commonly reported without being clearly identified as such in these sections. The authors recommend searching by domains (e.g., attachment, delinquent behavior) or particular variables to broaden the scope of results for Mexican and Latino populations. This strategy should include a manual search in government databases and institutional repositories that are not included in this review.
VI. Conclusions
This is the first systematic review conducted to identify pretrial RAI for Mexican juvenile offenders, using well-documented criteria like COSMIN. However, no pretrial RAI were found. The authors proposed five self-report instruments, selected from surveys, for evaluating parenting practices and social context. No validated tools were found for delinquent behavior or for most variables of community ties, such as residential, employment and school stability. Last, because of the quality level of evidence, the selection of subscales merely lays the groundwork. More research is needed on the validity and reliability of instruments in order to reach a more solid conclusion.
This review highlights the urgent need for the Mexican Comprehensive Criminal Justice System for Adolescents to use proven and reliable tools that have an impact on detention decisions without infringing on legal principles. Future research should be directed at developing and validating pretrial evidence-based tools with a risk-need approach to encourage the implementation of precautionary measures suited to the Mexican context, the best interests of the child, and legal standards.














