Research Synthesis · 2025
Measuring the Mind
A Psychometric and Theoretical Foundation for Cognitive Function Assessment
Abstract
This synthesis addresses 42 research questions across seven domains — from Jungian theoretical origins to factor analysis, item-writing bias, scoring algorithms, and ethical obligations of test publication.
Key finding: full four-letter MBTI type stability falls to approximately 39–50% within five weeks. This is the direct, predictable result of forcing normally distributed traits into bimodal categories — not primarily a reliability problem.
The widely used Grant/Brownsword function stack lacks independent factorial validation and must be treated as an unconfirmed theoretical heuristic. Most critically: no published, peer-reviewed factor-analytic study has confirmed that the eight cognitive functions recover eight distinct, independently replicable factors.
Keywords: cognitive functions · Jungian typology · psychometric validity · personality assessment · MBTI · factor analysis
Preface and Scope
This document is a research synthesis — an honest accounting of what the empirical and theoretical literature does and does not support before a single assessment item is written. It is not a manual, a results guide, or a practitioner handbook.
The global personality testing market is dominated by instruments — most visibly the MBTI and its derivative 16Personalities — that are psychometrically weaker than their ubiquity would suggest. The vast ecosystem of free online cognitive function tests has reproduced and amplified the MBTI's weaknesses without adding scientific accountability.
Evidence Quality System
Part I
Theoretical Foundations
Jung's original framework (1921) is richer, more ambiguous, and more developmental than the mechanistic stack-and-type models that have proliferated in popular culture. Each section below traces the intellectual genealogy of the theory and evaluates the empirical status of its key claims.
Figure 1.1
The Eight Cognitive Functions
As theorised by Jung (1921). Abbreviations like Ti and Te are practitioner shorthand — not found in Jung's original text.
Extraverted Thinking
Organises the external world using objective logic and systems.
Introverted Thinking
Builds precise internal frameworks; values logical consistency.
Extraverted Feeling
Attunes to group emotional tone; maintains social harmony.
Introverted Feeling
Evaluates based on deeply held personal values and authenticity.
Extraverted Sensing
Engages fully with immediate sensory experience and action.
Introverted Sensing
Compares present experience to stored impressions and traditions.
Extraverted Intuition
Perceives multiple possibilities and connections in the outer world.
Introverted Intuition
Synthesises unconscious patterns into singular future-focused insights.
No published study has confirmed these eight as empirically distinct, measurable constructs.
Jung's Original Theory of Psychological Types (1921)
Carl Jung's Psychologische Typen (1921) is a work of clinical phenomenology, not experimental psychology. Its authority rests on two decades of clinical observation — not population-level survey data. Before any test item is written, the developer must confront the gap between what Jung actually wrote and what the online MBTI community presents as Jungian theory.
Jung described four functions — Thinking (T), Feeling (F), Sensation (S), and Intuition (N) — each operable in an extraverted or introverted attitude, yielding eight functional modes. The abbreviations Te, Ti, Fe, Fi are practitioner shorthand not in Jung's text. They imply more discrete, monolithic constructs than Jung described.
Rational vs. Irrational functions: Thinking and Feeling are rational (involving judgment and evaluation). Sensation and Intuition are irrational — immediate, pre-judgmental modes of perception. Items for rational functions should probe the criteria by which judgments are made; items for irrational functions should probe the mode in which information is received.
Is type a trait, a stage, or a state? Jung's writing is consistent with all three. The most defensible reading: the dominant function orientation is biologically predisposed but requires differentiation through experience, with the overall type picture shifting substantially across the lifespan.
Evidence: Weak / Contested
The specific claim that Jung's 8 functions are empirically distinct, measurable constructs is not yet established by independent psychometric research.
The Myers-Briggs Extension: Innovations and Known Biases
Isabel Briggs Myers translated Jung's clinical phenomenology into population-level self-report items from approximately 1943 onward. This involved genuine innovation — but also introduced specific, documented biases that persist in nearly all derivative instruments.
The J/P dimension does not appear in Jung. Myers derived it from function stack logic: if a respondent's extraverted function is a judging function (T or F), they present as more organised (J); if it is a perceiving function (S or N), they present as more flexible (P). Its validity as a marker of function dynamics therefore depends entirely on the validity of the function stack model.
Documented Intuitive Bias
The Grant/Brownsword Function Stack: Origins, Evidence, and Critique
The function stack model — dominant, auxiliary, tertiary, inferior in an alternating E/I pattern — is the structural backbone of contemporary cognitive function typology. It is also the most empirically challenged element of the framework.
Reynierse (2009) published a direct challenge in the field's own journal: the IEIE alternating pattern was not confirmed when actual MBTI response data were examined. People's expressed attitudes on individual function dimensions did not conform to the predicted alternating rule.
The divide between academic and practitioner communities is stark. In mainstream personality psychology, the function stack is effectively ignored. In online communities, it is treated as established fact, with entire interpretive systems (Beebe's archetype model, loop/grip theory) built upon it.
Figure 1.2
Function Stack: INTJ and ENFP (IEIE Alternating Pattern)
The stack is derived from Myers' J/P framework, not directly from Jung. It remains empirically unconfirmed. Note how attitude alternates I→E→I→E (and E→I→E→I) at every position.
INTJ: Ni · Te · Fi · Se (attitudes I, E, I, E)
ENFP: Ne · Fi · Te · Si (attitudes E, I, E, I)
Caution: stack ordering is theoretically derived, not independently empirically validated (Reynierse, 2009).
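The Grant/Brownsword convention can be stated as a small derivation rule. The sketch below is a hypothetical helper (`function_stack` is not taken from any published instrument) that encodes Myers' J/P convention and the alternating-attitude pattern shown in Figure 1.2. It reproduces the model's logic; it does not test it, and any scoring algorithm built on it inherits the model's unconfirmed status.

```python
# Hypothetical helper: encodes the Grant/Brownsword stack convention.
# It reproduces the theory's derivation rule; it does not validate it.

OPPOSITE = {"T": "F", "F": "T", "S": "N", "N": "S"}
FLIP = {"e": "i", "i": "e"}


def function_stack(code: str) -> list[str]:
    """Return [dominant, auxiliary, tertiary, inferior] for a four-letter code."""
    code = code.upper()
    if (len(code) != 4 or code[0] not in "EI" or code[1] not in "SN"
            or code[2] not in "TF" or code[3] not in "JP"):
        raise ValueError(f"not a four-letter type code: {code!r}")
    attitude, perceiving, judging, jp = code

    # Myers' convention: J/P names the kind of function used in the extraverted
    # attitude (J -> the judging T/F function, P -> the perceiving S/N function).
    if jp == "J":
        extraverted, introverted = judging, perceiving
    else:
        extraverted, introverted = perceiving, judging

    # Extraverts lead with the extraverted function, introverts with the introverted one.
    if attitude == "E":
        dominant, auxiliary = extraverted + "e", introverted + "i"
    else:
        dominant, auxiliary = introverted + "i", extraverted + "e"

    # Tertiary: opposite of the auxiliary, in the opposite attitude (strict alternation).
    tertiary = OPPOSITE[auxiliary[0]] + FLIP[auxiliary[1]]
    # Inferior: opposite of the dominant, in the opposite attitude.
    inferior = OPPOSITE[dominant[0]] + FLIP[dominant[1]]
    return [dominant, auxiliary, tertiary, inferior]


print(function_stack("INTJ"))  # ['Ni', 'Te', 'Fi', 'Se']
print(function_stack("ENFP"))  # ['Ne', 'Fi', 'Te', 'Si']
```

Because the whole stack is a deterministic function of four letters, a stack-based scoring algorithm can never yield more than 16 distinct stacks; an instrument that scores all eight functions independently can.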
Beebe, Quenk, Berens, and von Franz: Extended Frameworks
John Beebe (2016)
Extends the function stack to eight positions using archetypal characters. Shadow functions are, by definition, inaccessible to conscious self-report — providing theoretical grounding for why any self-report instrument has a structural accuracy ceiling.
Naomi Quenk (1993/2002)
Documents 'grip states' — stress-induced emergence of the inferior function in a primitive, poorly controlled form. A respondent currently in the grip will produce function scores reflecting a temporary distressed state, not their baseline type.
Linda Berens (2000)
Adds temperament theory and interaction styles alongside cognitive functions. Her surface-level constructs (interaction styles) may be more psychometrically robust than deep-level functions — closer to observable behaviour.
Marie-Louise von Franz (1971)
Documented the developmental arc of type across the lifespan. A test taken at age 25 and again at 55 may legitimately produce different results — not because the instrument is unreliable, but because type development is real.
These frameworks are clinically rich but lack controlled empirical validation. They are most useful as interpretive frameworks and design constraints.
Part II
Psychometric Research and Validity
The psychometric literature on the MBTI is large but methodologically uneven. This section synthesises the most relevant findings, with explicit attention to conflicts of interest in the existing publication record and to the critical question of whether eight cognitive functions are empirically separable constructs.
MBTI Reliability and Validity: A Critical Assessment
Pittenger (2005) — the most-cited independent assessment — concluded the MBTI falls short of acceptable standards for individual-level use in professional and educational contexts. This represents the broader academic consensus. The instrument's commercial dominance reflects the gap between psychometric standards and market forces.
Figure 2.1
Test-Retest Reliability: MBTI vs. Big Five
Values shown: ~47% full four-letter type agreement after 5 weeks (0.83⁴) · ~83% agreement per scale · ~75–80% over 20 years
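The ~47% figure follows from simple arithmetic: if each of the four scales reproduces the same letter about 83% of the time, the chance that all four match is 0.83⁴. Treating the scales as independent is a simplifying assumption made here for illustration; the sketch also shows how the same assumption distributes respondents by the number of letters they retain on retest.

```python
from math import comb


def full_type_agreement(p_scale: float, n_scales: int = 4) -> float:
    """Probability that every letter matches on retest, assuming independent scales."""
    return p_scale ** n_scales


def letters_retained(p_scale: float, n_scales: int = 4) -> dict[int, float]:
    """Binomial distribution of how many letters match on retest (independence assumed)."""
    return {
        k: comb(n_scales, k) * p_scale ** k * (1 - p_scale) ** (n_scales - k)
        for k in range(n_scales + 1)
    }


print(round(full_type_agreement(0.83), 3))  # 0.475
for k, p in letters_retained(0.83).items():
    print(f"{k} letters retained: {p:.3f}")
```

Under these assumptions about 39% of respondents would change exactly one letter, which is the typical form that full-type instability takes.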
The Conflict-of-Interest Problem
Factor-Analytic Evidence for the Eight Functions
The most honest answer to the most foundational validity question: there is no published, peer-reviewed, independent factor-analytic study that has confirmed the eight cognitive functions as distinct, replicable constructs using independently developed items.
Studies produce mixed results: Ni and Ne tend to show more consistent empirical separation than the sensation functions; Si and Se are particularly difficult to distinguish. No study has cleanly recovered eight factors from independently developed cognitive function items.
A Genuine Scientific Opportunity
The Bimodal Distribution Problem
Personality traits are approximately normally distributed in the population. Type systems require bimodal distributions: most people clearly at one pole or the other. These two facts are incompatible. Forcing a normally distributed trait into two categories has three consequences: it discards information, it is least precise for the most common respondents (those near the midpoint), and it makes test-retest instability structurally inevitable. The 35–65% full-type retest inconsistency rates reported across MBTI studies are the direct, predictable result, not a measurement quality problem per se.
Figure 2.2
Reality vs. Assumption: How Traits Actually Distribute
Personality traits distribute normally. Type systems assume bimodal clustering. This mismatch makes instability structurally inevitable for ~35–40% of respondents who score near the centre.
Reality: Normal Distribution
Assumption: Bimodal
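A short simulation makes the instability argument concrete. The sketch below assumes classical test theory with a standardised observed score (true-score variance equal to the reliability, error variance equal to one minus the reliability), independent errors on each administration, and a cut at the population midpoint. The reliability and band width are illustrative values, not estimates from any MBTI dataset. For a bivariate normal pair with correlation r, the analytic probability of falling on opposite sides of the midpoint is arccos(r)/π, which the simulation should reproduce.

```python
import math
import random


def simulate_flip_rates(reliability: float, band: float = 0.5,
                        n: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (overall flip rate, flip rate for respondents within `band` SD of the midpoint)."""
    rng = random.Random(seed)
    true_sd = math.sqrt(reliability)
    error_sd = math.sqrt(1.0 - reliability)
    flips = near = near_flips = 0
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        first = true_score + rng.gauss(0.0, error_sd)   # test
        second = true_score + rng.gauss(0.0, error_sd)  # retest, independent error
        flipped = (first >= 0) != (second >= 0)          # letter changed
        flips += flipped
        if abs(first) < band:                            # near the midpoint at first test
            near += 1
            near_flips += flipped
    return flips / n, near_flips / near


overall, near_mid = simulate_flip_rates(0.80)
print(f"simulated overall flip rate: {overall:.3f}")
print(f"analytic arccos(r)/pi:       {math.acos(0.80) / math.pi:.3f}")
print(f"flip rate near the midpoint: {near_mid:.3f}")
```

Even with a respectable reliability of .80, about one respondent in five changes letter on a single scale, and the rate is markedly higher among respondents who started near the midpoint.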
MBTI and the Big Five: Empirical Overlap
McCrae and Costa's (1989) landmark reanalysis showed that MBTI dimensions are substantially redundant with four of the five Big Five factors. Critically, no MBTI dimension captures Neuroticism, the dimension most strongly predictive of depression, anxiety disorders, and psychological suffering. An anxious INFP and a psychologically stable INFP receive identical type descriptions despite having quite different functional patterns.
Figure 2.3
MBTI–Big Five Correlation Matrix
Four MBTI scales map onto four Big Five factors: E–I onto Extraversion, S–N onto Openness, T–F onto Agreeableness, and J–P onto Conscientiousness. Neuroticism, the strongest predictor of mental health outcomes, is entirely absent.
Demographic Biases: Gender, Age, Culture, and Sample Skew
South Korea is now arguably the world's most MBTI-engaged society — the four-letter code features in job applications and dating profiles. This creates a paradox: the largest and most culturally engaged user populations are those whose results are most likely contaminated by prior theory exposure.
The T/F dimension shows the most robust gender difference of any MBTI scale — partially attributable to item content bias activating gender-socialised self-concepts. Differential Item Functioning (DIF) analyses consistently show some T/F items function differently across genders.
Figure 2.4
Type Frequency: General Population vs. Online Communities
INTP appears at ~19% in online typology communities vs. ~3% in general population — a 6× overrepresentation. Recruiting from forums produces unrepresentative, theory-contaminated normative samples.
Types shown: INTP · INTJ · INFJ · ENFP · ISFJ · ESTJ
Part III
Respondent Factors That Distort Results
The accuracy of any self-report instrument is bounded not only by its psychometric quality but by the accuracy of the self-knowledge respondents bring to it. Four major distortion sources are particularly relevant to cognitive function testing.
The Self-Knowledge Problem
Self-reports are most accurate for highly internal, low-observability traits. Cognitive function preferences are theoretically well-suited to self-report — but the self-insight problem is most acute precisely for internal states requiring metacognitive observation: watching how you think, not just what you think.
The correlation between self-report personality scores and ratings from informants who know the respondent well is approximately r = .40–.50. Squaring these values, self- and informant descriptions share only 16–25% of their variance; the remaining 75–84% of variance in self-description is not captured by how well-acquainted others describe the same person.
Evidence: Moderate
Theory Contamination: The Literate-Respondent Problem
A respondent who has spent time in online typology communities will approach any cognitive function test with a theory-shaped lens. Seeing "I prefer to develop a coherent internal logical framework before sharing conclusions," they can identify this as Ti-diagnostic and respond based on their self-concept — not their actual experience. In South Korea, where MBTI is a social identity, this contamination may affect the majority of respondents.
Contamination-Resistant Design Strategies
- • Behavioural specificity: "When I disagreed with a group decision in the last year, I typically..." requires autobiographical retrieval that is harder to bias.
- • Non-obvious forced choice: Neither option should obviously map to a specific function.
- • Situational vignettes: Brief scenarios bypass self-concept consultation.
- • Open-ended writing prompts: Most resistant — respondents don't know which linguistic features will be analysed.
Neurodivergent Conditions and Systematic Distortions
Neurodivergent distortions operate in predictable directions, producing predictable mistyping patterns rather than random noise.
Autism: Autistic individuals who mask may consistently endorse Fe-consistent items (social harmony orientation) while their actual processing is more Fi or Ti, because responses reflect the persona performed, not the processing experienced. This is most pronounced in autistic women, who mask more extensively (Hull et al., 2017, 2020).
ADHD: Introduces within-person response inconsistency that inflates apparent test-retest unreliability. Self-monitoring items may systematically underperform, because self-monitoring is itself an executive function characteristically impaired in ADHD.
Depression: Produces reliable shifts toward Introversion, Intuition, Feeling, and Perceiving. A depressed ESTJ may produce a profile consistent with INFP, not because they are INFP but because depression has altered which cognitive modes are accessible.
Anxiety: Mimics Introversion (social avoidance), produces future-scanning cognition that scores as Intuitive, and creates difficulty with closure that appears as Perceiving.
Mental Health State as a Systematic Confounder
Bleidorn et al.'s (2019) meta-analysis found major life events produce personality score changes of .3–.5 standard deviations — large enough to shift type classification for near-midpoint respondents. Bereavement, job loss, and relationship dissolution produce reliable, directional personality score changes.
Design Response: Pre-Test State Screener
Part IV
Question Design Methodology
Item design is the most direct translation of theoretical commitments into empirical measurement. The difference between an instrument that confirms assumptions and one that genuinely tests them lies almost entirely in the quality of individual items.
Psychometric Item Construction: Core Methodology
Format choice matters. Likert scales produce normative data (conventionally analysed as interval-level) but are susceptible to acquiescence bias and to culturally patterned response styles, such as extreme responding or the midpoint responding that is particularly common in East Asian samples. Forced-choice formats reduce these biases and reflect relative preference, which is more consistent with type theory's comparative logic, but they generate ipsative data.
The Ipsative Data Problem
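The ipsative constraint can be demonstrated directly. In the sketch below, a hypothetical design presents every pair of the eight functions once as a forced-choice item. Responses are purely random, yet every respondent's scores sum to the same total and every row of the covariance matrix sums to zero. The dependence among scale scores is produced by the format itself, which is why factor analyses and conventional correlational validity checks are distorted when run on raw ipsative scores.

```python
import itertools
import random

# The eight functions from Figure 1.1; each unordered pair becomes one forced-choice item.
FUNCTIONS = ["Te", "Ti", "Fe", "Fi", "Se", "Si", "Ne", "Ni"]
PAIRS = list(itertools.combinations(FUNCTIONS, 2))  # 28 hypothetical items


def random_ipsative_scores(rng: random.Random) -> list[int]:
    """Score one simulated respondent who picks at random on every item."""
    points = dict.fromkeys(FUNCTIONS, 0)
    for a, b in PAIRS:
        points[rng.choice((a, b))] += 1  # the chosen function earns one point
    return [points[f] for f in FUNCTIONS]


def covariance_matrix(rows: list[list[int]]) -> list[list[float]]:
    """Sample covariance matrix of the columns (rows = respondents)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


rng = random.Random(42)
data = [random_ipsative_scores(rng) for _ in range(1000)]

print({sum(row) for row in data})           # {28}: every respondent has the same total
cov = covariance_matrix(data)
print([round(sum(row), 9) for row in cov])  # each row sums to zero (up to rounding)
```

With equal scale variances, a zero row sum forces the average correlation between scales down to −1/(k − 1), about −.14 for eight functions, regardless of what respondents actually prefer.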
Preference vs. performance: "I analyse problems systematically and efficiently" conflates Thinking preference with self-perceived competence. Reframe around what is natural and effortless: "When approaching a complex problem, what feels most comfortable and spontaneous to me is..." captures preference; competence claims capture self-concept.
Figure 4.1
Item Format Comparison
Forced choice is preferred for cognitive preference measurement despite lower statistical flexibility.
Figure 4.2
Perceived Intellectual Prestige: N vs. S Item Vocabulary
N-associated vocabulary consistently carries higher perceived prestige in educated Western populations. Sensing items must be rewritten using high-status vocabulary (precision, mastery, craftsmanship, empirical rigour) to achieve balance.
Intuition vocabulary
Sensing vocabulary
Prestige ratings are illustrative approximations based on item content analyses (Reynierse, 2012). Scale: 0–100.
Validity and Faking Detection Scales
The MBTI contains no validity scales — no mechanism for detecting careless responding, strategic self-presentation, or random response patterns. The MMPI-2's validity scale architecture provides the reference model.
Validity Scale Components to Include
- • Consistency check pairs: Same content, different wording, separated by 20+ items
- • Infrequency items: Statements virtually no genuine respondent would select
- • Response time monitoring: Unusually fast or slow completion flags strategic or inattentive responding
- • Likert SD threshold: Unusually low variability across responses (for example, every answer at or near the midpoint) flags disengaged responding
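The four components above can be combined into a single screening function. This is a minimal sketch: the item IDs, the 1–5 Likert coding, and every threshold are hypothetical placeholders that would need calibration on pilot data, and consistency pairs are assumed to be keyed in the same direction (a reverse-keyed pair would need one item reversed first).

```python
from statistics import median, pstdev

# Hypothetical thresholds; real cut-offs would be calibrated on pilot data.
MAX_PAIR_GAP = 2              # max Likert-point difference within a consistency pair
MAX_INCONSISTENT_PAIRS = 1    # more discrepant pairs than this raises a flag
ENDORSED = {4, 5}             # "agree" responses on a 1-5 scale
MIN_SECONDS, MAX_SECONDS = 1.5, 60.0  # plausible median seconds per item
MIN_RESPONSE_SD = 0.6


def validity_flags(responses: dict[str, int], seconds: dict[str, float],
                   consistency_pairs: list[tuple[str, str]],
                   infrequency_items: list[str]) -> list[str]:
    flags = []
    discrepant = sum(abs(responses[a] - responses[b]) > MAX_PAIR_GAP
                     for a, b in consistency_pairs)
    if discrepant > MAX_INCONSISTENT_PAIRS:
        flags.append("inconsistent")
    if any(responses[item] in ENDORSED for item in infrequency_items):
        flags.append("infrequency")
    typical_time = median(list(seconds.values()))
    if typical_time < MIN_SECONDS:
        flags.append("too_fast")
    elif typical_time > MAX_SECONDS:
        flags.append("too_slow")
    if pstdev(list(responses.values())) < MIN_RESPONSE_SD:
        flags.append("low_variability")
    return flags


# Usage: a respondent who answers "3" to everything in under a second per item.
items = [f"q{i}" for i in range(1, 41)]
straight_liner = {item: 3 for item in items}
rushed = {item: 0.8 for item in items}
print(validity_flags(straight_liner, rushed, [("q1", "q25")], ["q40"]))
# ['too_fast', 'low_variability']
```

Returning named flags rather than a single pass/fail keeps the decision about whether to withhold, annotate, or discard a result separate from detection.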
NLP and Text-Based Personality Prediction
A growing body of research uses NLP to predict MBTI type from text, achieving 67–82% per-dimension accuracy from social media posts (Gjurkovic & Snajder, 2018; Plank & Hovy, 2015). Writing style — independently of content — carries personality-predictive signal: vocabulary diversity, sentence length, hedging language, abstractness, first-person pronoun frequency.
Open-ended prompts offer a critical advantage: the respondent does not know which features of their writing will be analysed. Strategic bias is substantially harder to maintain across multiple natural language prompts than across self-report items.
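The style features listed above are straightforward to extract. The sketch below computes four of them from a single response using only the standard library. The hedge-word list and the tokenisation rule are hypothetical simplifications; abstractness is omitted because it needs a concreteness lexicon; and a real pipeline would feed features like these into a classifier trained on labelled data rather than interpret them directly.

```python
import re

# Hypothetical hedge lexicon; a validated word list would replace this.
HEDGES = {"maybe", "perhaps", "possibly", "probably", "somewhat", "might", "seems"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def style_features(text: str) -> dict[str, float]:
    """Content-independent style features for one open-ended response."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {"type_token_ratio": 0.0, "mean_sentence_length": 0.0,
                "hedge_rate": 0.0, "first_person_rate": 0.0}
    n = len(words)
    return {
        "type_token_ratio": len(set(words)) / n,     # vocabulary diversity
        "mean_sentence_length": n / len(sentences),  # words per sentence
        "hedge_rate": sum(w in HEDGES for w in words) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
    }


print(style_features("I think maybe I like it. It is fine."))
```

Type-token ratio falls as texts get longer, so responses would need a fixed length (or a length-corrected diversity measure) before such features are compared across respondents.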
Evidence: Moderate
Part V
Scoring and Algorithm Design
The scoring algorithm is the bridge between item responses and the type profile the respondent receives. Choices here determine what information is preserved, what is discarded, and how uncertainty is communicated.
Communicating Type Probabilistically Rather Than Categorically
The research recommendation is unambiguous: continuous scores with uncertainty indicators are superior to categorical assignments. A single four-letter type result implies a precision the instrument cannot support and creates the foundation for over-identification problems.
Practical Approaches to Probabilistic Communication
- • Probability distributions over types — ranked list with percentages
- • Confidence-banded dimension scores — each axis with a 95% CI; where the CI crosses the midpoint, display as undifferentiated
- • Spectrum displays — eight function scores as a visual profile without forcing type labels
- • Conditional type assignment — assign a label only when scores exceed a meaningful threshold; present multiple candidate types for near-midpoint respondents
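The first two approaches can be sketched with classical test theory. The standard error of measurement is SEM = SD × √(1 − reliability); a 95% band is the score ± 1.96 × SEM, and a dimension whose band contains the midpoint is reported as undifferentiated. Pole probabilities then come from the normal distribution, and multiplying them across the four dimensions gives a ranked distribution over types. The scale metric (midpoint 50, SD 10), the reliability of .85, and independence across dimensions are illustrative assumptions, and centring the band on the observed score ignores regression toward the mean.

```python
from itertools import product
from math import erf, sqrt

# (low pole, high pole) per dimension, in four-letter-code order.
# The scoring direction is a hypothetical convention for this sketch.
DIMENSIONS = [("I", "E"), ("S", "N"), ("F", "T"), ("P", "J")]


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement."""
    return sd * sqrt(1.0 - reliability)


def banded_letter(score, low, high, midpoint=50.0, sd=10.0, reliability=0.85, z=1.96):
    """Return (lower bound, upper bound, label) for one dimension score."""
    e = sem(sd, reliability)
    lower, upper = score - z * e, score + z * e
    if lower <= midpoint <= upper:
        label = "undifferentiated"
    else:
        label = high if score > midpoint else low
    return lower, upper, label


def type_distribution(scores, midpoint=50.0, sd=10.0, reliability=0.85):
    """Rank all 16 types by probability, treating dimensions as independent."""
    e = sem(sd, reliability)
    p_high = [normal_cdf((s - midpoint) / e) for s in scores]
    ranked = []
    for picks in product((0, 1), repeat=len(DIMENSIONS)):
        prob, letters = 1.0, ""
        for (low, high), p, pick in zip(DIMENSIONS, p_high, picks):
            letters += high if pick else low
            prob *= p if pick else 1.0 - p
        ranked.append((letters, prob))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(banded_letter(52, "I", "E"))  # band crosses 50 -> 'undifferentiated'
for letters, prob in type_distribution([30, 70, 68, 40])[:3]:
    print(letters, round(prob, 3))
```

The same output supports conditional type assignment: show a single label only when the top-ranked type clears a chosen probability threshold, and otherwise list the leading candidates.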
User experience research shows respondents want a type label for simplicity and shareability. The design challenge is providing this without misrepresenting the instrument's certainty.
Norming, Reference Populations, and Cutpoint Setting
Online-test-taking norms — the most convenient option for a new instrument — will systematically misrepresent the general population due to the Intuitive/Introvert skew documented in Part II. Recruiting a quota-balanced normative sample is essential: setting explicit upper limits on the proportion of Intuitive, Introverted, and theory-literate respondents, and actively recruiting Sensing types, Extraverts, and typology-naive respondents through general population panels (Prolific.co or equivalent).
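Explicit upper limits of this kind are simple to enforce at intake. The sketch below is a hypothetical quota gate: it assumes each panel respondent has already been classified on a short screener (that intake classification is itself an assumption), and the cap proportions are placeholders, not recommended values.

```python
from dataclasses import dataclass, field


@dataclass
class Respondent:
    intuitive: bool
    introverted: bool
    theory_literate: bool


@dataclass
class QuotaGate:
    target_n: int
    caps: dict[str, float]  # maximum share of the final sample per attribute
    accepted: list[Respondent] = field(default_factory=list)

    def offer(self, person: Respondent) -> bool:
        """Accept the respondent unless the sample is full or a cap would be exceeded."""
        if len(self.accepted) >= self.target_n:
            return False
        for attribute, cap in self.caps.items():
            if getattr(person, attribute):
                already = sum(getattr(r, attribute) for r in self.accepted)
                if already + 1 > cap * self.target_n:
                    return False
        self.accepted.append(person)
        return True


# Placeholder caps for illustration only.
gate = QuotaGate(target_n=1000,
                 caps={"intuitive": 0.35, "introverted": 0.5, "theory_literate": 0.1})
```

Because caps are checked against the final target, capped groups fill first and later arrivals from those groups are turned away; spreading recruitment over the fielding period keeps the sample from reflecting only early responders.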
Figure 5.1
Recommended Test Development Pipeline
From theoretical framework through normative database establishment. Each phase has specified minimum samples and deliverables.
Resolve 7 foundational design decisions before writing any item
Write 80–120 candidate items across 8 function scales + validity items
Theoretical accuracy, S/N prestige audit, DIF pre-check
Think-aloud protocols to identify ambiguous or alienating items
Exploratory Factor Analysis — determine empirically supported factor count
Remove/revise items below r = .30 item-total; check endorsement rates
Confirmatory Factor Analysis — test 8-factor model vs. alternatives
Test-retest reliability (6-week), convergent validity, DIF by group
Quota-balanced reference population; establish cutpoints
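The item-analysis step (remove or revise items below r = .30 item-total) normally uses the corrected item-total correlation: each item is correlated with the sum of the remaining items on its scale, so the item does not inflate its own correlation. A minimal standard-library sketch, using a tiny hypothetical data set in which the fourth item is unrelated to the others:

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant item; treat as uninformative
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: list[list[float]]) -> list[float]:
    """Correlation of each item with the sum of the other items (rows = respondents)."""
    result = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


def items_to_review(rows: list[list[float]], threshold: float = 0.30) -> list[int]:
    return [j for j, r in enumerate(corrected_item_total(rows)) if r < threshold]


# Hypothetical pilot responses: 5 respondents x 4 items on one scale.
pilot = [
    [1, 1, 1, 3],
    [2, 2, 2, 1],
    [3, 3, 4, 2],
    [4, 4, 4, 4],
    [5, 5, 5, 1],
]
print(items_to_review(pilot))  # [3]
```

A real pilot would run this per function scale with several hundred respondents; with five respondents the correlations are far too unstable to act on.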
Part VI
Gaps, Failure Modes, and Opportunities
Understanding the specific weaknesses of existing instruments and the known failure modes of previous attempts is a prerequisite for building something genuinely better.
Weaknesses of Major Existing Free Tests
Figure 6.1
Evidence Quality Assessment for Key Claims
Red indicators mark areas where theoretical commitments must substitute for empirical grounding before a single item can be written.
| Test | Principal Weakness |
|---|---|
| 16Personalities | Measures Big Five but labels results in MBTI type language. Not a cognitive function test. No published independent validation. |
| Sakinorva | No EFA/CFA validation; theory-transparent items; typology-community sample; multiple algorithms reveal inconsistency but confuse typology-naive users. |
| Michael Caloz CFT | Highly theory-transparent items; no published psychometric validation; very long (100+ items) with no validity scales. |
| Keys2Cognition | No published psychometric data; scoring algorithm undocumented; sample exclusively from typology communities. |
| IDRlabs cognitive tests | No methodological transparency; item derivation undisclosed; no published validation; corporate presentation without scientific accountability. |
Seven Shared Failure Modes
What the Most Rigorous Instrument Would Include
Independently validated 8-function item bank confirmed via EFA/CFA to recover 8 distinct factors — not yet demonstrated in any published study.
Pre-test state screener integrated into scoring algorithm, flagging results when respondent appears in significant distress.
Explicit neurodivergent accommodation: masking-aware item framing, extended time, results disclaimer.
DIF validation across gender, age, and cultural group — mandatory before public deployment.
Validity scales: inconsistency detection, infrequency items, response time monitoring.
Probabilistic output — probability distribution over types or confidence-banded dimension scores.
Non-theory-transparent items: behavioural vignettes, autobiographical recall, open-ended prompts.
Published validation study in a mainstream personality psychology journal with independent peer review.
Demographically diverse normative sample with quota sampling for Sensing types and Extraverts.
Pre-registered analysis plan testing the 8-function hypothesis rather than assuming it.
Part VII
Ethics, Communication, and Legal Obligations
Publishing a personality assessment instrument carries obligations that extend well beyond psychometric technical standards. This section addresses responsible communication, the psychological risks of personality labelling, and legal obligations for an Australian-based developer.
Responsible Results Page Design
The Barnum/Forer Effect. Forer's 1949 study showed that people accept vague, generally applicable personality descriptions as specifically accurate when told the description was generated for them personally. His students rated the description's accuracy at an average of 4.26 on a 0–5 scale, even though every student received the same text, assembled largely from a newsstand astrology book. MBTI-style results pages are fully exposed to the same effect.
Counteracting the Barnum Effect
Results pages should consistently use conditional language: "your responses suggest a current preference for..." rather than identity language: "you are an INTJ." Type labels can become fixed and limiting — "I can't do X, I'm an introvert" — with no empirical foundation.
Evidence: Strong
Ethical and Legal Obligations of Test Publication
GDPR (EU)
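GDPR applies to any web application that offers services to, or monitors the behaviour of, people in the EU; a public online test with EU users will usually fall within its scope. Lawful basis for a free test is typically consent. State screener data may constitute special category (health) data under Article 9, requiring explicit, separate consent and additional security obligations.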
Australian Privacy Act 1988
The Australian Privacy Principles (APPs) require notification of what information is collected, how it will be used, and to whom it will be disclosed. Mental health state information may be sensitive information under the Act, requiring explicit consent.
Myers-Briggs IP
"MBTI" and "Myers-Briggs" are registered trademarks. The four-letter type codes (INTJ, ENFP, etc.) are generally unprotected. Describe the instrument as a "Jungian-informed cognitive preference assessment."
High-Stakes Contexts
Terms of service should explicitly prohibit employment or clinical use until independent validation for those applications has been conducted. Adverse impact analysis is expected, and in many jurisdictions legally required, in employment contexts.
Appendix A
Seven Foundational Design Decisions
These questions have no definitive empirical answer. A test developer must make an explicit theoretical commitment on each before writing items. Failure to do so results in an instrument whose design is internally inconsistent.
Function Stack Validity
Does the dominant-auxiliary-tertiary-inferior ordered hierarchy exist as an empirical structural reality, or is it a theoretically useful but empirically unconfirmed heuristic? This determines whether your scoring algorithm should derive stack positions or treat all 8 functions as independent dimensions.
Design Decision Required
J/P Dimension Inclusion
The J/P dimension is not in Jung — it is Myers' derivation. Should your test produce 16 types (requiring J/P) or a different output based purely on 8 function scores? Each choice has significant implications for type labelling and user expectation management.
Design Decision Required
Categorical vs. Continuous Output
Should the primary output be a categorical type label or a continuous function profile? Psychometric evidence strongly favours continuous; user experience and cultural expectation favour categorical. A hybrid is possible but complex to design well.
Design Decision Required
Independent Functions vs. Bipolar Axes
Should the 8 functions be treated as 8 independent constructs or as 4 bipolar axes (Ti↔Fe, Te↔Fi, Si↔Ne, Se↔Ni)? The axis approach is more psychometrically parsimonious; independent function measurement preserves more theoretical nuance.
Design Decision Required
Near-Midpoint Handling
How should the instrument handle respondents who score near the midpoint on multiple dimensions? A principled decision about confidence thresholds and multiple-type presentation must be made explicitly — there is no empirically determined answer.
Design Decision Required
Theoretical Frame for Item Content
Should items follow Thomson's phenomenological framework, Myers' behavioural framework, or Socionics' information-metabolism framework? These generate fundamentally different items and potentially different empirical structures.
Design Decision Required
Primary Use Case
Is the instrument primarily for self-understanding by general users or for theoretical research? This shapes item transparency, results communication depth, normative sampling strategy, and validation priorities.
Design Decision Required
Priority Reading List
Twenty Primary Sources
The following twenty sources are ranked by priority for a researcher or developer who has not yet written a single item. Each represents a foundational text that directly shapes design decisions. Sources are grouped by phase of development relevance.
Jung, C.G. (1971). Psychological Types (Collected Works, Vol. 6). Princeton University Press.
The primary source. Must be read before any commentary. Chapter 10 (General Description of the Types) and Chapter 11 (Definitions) are essential. Most practitioners have never read Jung directly.
Pittenger, D.J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.
The most concise authoritative academic critique. Essential for understanding exactly what psychometric weaknesses you are improving upon.
DeVellis, R.F. (2016). Scale Development: Theory and Applications (4th ed.). SAGE.
The best practical guide to building a psychometric scale from scratch. Read before writing items. Covers item writing through factor analysis and reliability assessment.
McCrae, R.R., & Costa, P.T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model. Journal of Personality, 57(1), 17–40.
Establishes the Big Five–MBTI relationship. Essential for positioning your instrument relative to validated prior literature and designing convergent validity analyses.
Myers, I.B., McCaulley, M.H., Quenk, N.L., & Hammer, A.L. (1998). MBTI Manual (3rd ed.). CPP.
The canonical source for MBTI methodology, reliability, and normative data. Read critically — it is both informative and self-serving.
Reynierse, J.H. (2009). The questionable theoretical basis of the MBTI's J-P scale and type dynamics. Journal of Psychological Type, 69(12), 105–122.
The most important empirical challenge to the function stack model. Read before deciding whether to build with or without the stack.
Quenk, N.L. (1993). Beside Ourselves: Our Hidden Personality in Everyday Life. CPP.
The best treatment of inferior function and grip states. Essential for understanding how state and trait interact in cognitive function expression.
Thomson, L. (1998). Personality Type: An Owner's Manual. Shambhala Publications.
The most sophisticated re-reading of Jungian cognitive function theory in practical terms. Will expand your understanding of what functions actually describe beyond Myers' operationalisation.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.
The foundational paper on construct validity. Everything in psychometrics about what it means for a test to measure what it claims to measure builds on this work.
Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. Lawrence Erlbaum.
The accessible introduction to IRT. Essential even if you use CTT initially, to design items that will support adaptive testing in later versions.
Berens, L.V. (2000). Understanding Yourself and Others: Introduction to the 4 Temperaments. Telos.
Berens' multilevel model helps identify which constructs are most psychometrically robust and which level of abstraction items should target.
Beebe, J. (2016). Energies and Patterns in Psychological Type. Routledge.
Beebe's archetypal model explains why shadow functions are inaccessible to self-report — directly implying a ceiling on instrument accuracy.
Hull, L. et al. (2019). Development and validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q). Journal of Autism and Developmental Disorders, 49(3), 819–833.
Foundational for understanding autistic masking and its implications for personality self-report accuracy. Essential for neurodivergent accommodation design.
Stein, R., & Swan, A.B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434.
Recent independent review providing the most current picture of the empirical literature on MBTI validity.
Gjurković, M., & Šnajder, J. (2018). Reddit: A gold mine for personality prediction. Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES 2018).
Most relevant empirical study on NLP-based MBTI prediction from naturalistic text. Essential context for deciding whether to include open-ended writing prompts.
Vazire, S., & Carlson, E.N. (2010). Self-knowledge of personality: Do people know themselves? Social and Personality Psychology Compass, 4(8), 605–620.
The definitive review of self-knowledge accuracy. Establishes the theoretical ceiling on self-report validity and identifies which constructs are most accurately self-reported.
Von Franz, M.-L., & Hillman, J. (1971). Lectures on Jung's Typology. Spring Publications.
Von Franz's lectures on the inferior function are the most clinically detailed treatment of type development. Essential for lifespan interpretation and results communication design.
Comrey, A.L., & Lee, H.B. (1992). A First Course in Factor Analysis (2nd ed.). Lawrence Erlbaum.
Practical guide to EFA and CFA. You will run factor analyses on pilot data — understanding the methodology correctly is essential.
Boyle, G.J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74.
Particularly relevant given the Australian context: a concise, hard-hitting psychometric critique by an independent researcher.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. AERA.
The benchmark standard for published psychological tests. Read it before building, not after, so that the development process is designed to meet these standards from the start.
References
Full Bibliography
All citations used across this synthesis, formatted for academic reference.
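American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.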
Barrick, M.R., & Mount, M.K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.
Beebe, J. (2016). Energies and patterns in psychological type. Routledge.
Berens, L.V. (2000). Understanding yourself and others: An introduction to the 4 temperaments. Telos Publications.
Bleidorn, W., Hopwood, C.J., Back, M.D., Denissen, J.J.A., Hennecke, M., Orth, U., ... & Roberts, B.W. (2020). Personality trait stability and change. Personality Science, 1, 1–41.
Bouchard, T.J., & Loehlin, J.C. (2001). Genes, evolution, and personality. Behavior Genetics, 31(3), 243–273.
Boyle, G.J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74.
Carskadon, T.G. (1982). Clinical and counseling aspects of the Myers-Briggs Type Indicator. Research in Psychological Type, 4(1), 2–31.
Comrey, A.L., & Lee, H.B. (1992). A first course in factor analysis (2nd ed.). Lawrence Erlbaum.
Costa, P.T., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources.
DeVellis, R.F. (2016). Scale development: Theory and applications (4th ed.). SAGE.
Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Lawrence Erlbaum.
European Data Protection Board. (2020). Guidelines 05/2020 on consent under Regulation 2016/679. EDPB.
Forer, B.R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44(1), 118–123.
Furnham, A. (1996). The big five versus the big four: The relationship between the Myers-Briggs Type Indicators (MBTI) and NEO-PI five-factor model of personality. Personality and Individual Differences, 21(2), 303–307.
Gjurkovic, M., & Snajder, J. (2018). Reddit and MBTI: Mining personality type-related discourse. Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 18–25.
Hull, L., Levy, L., Lai, M.-C., Petrides, K.V., Baron-Cohen, S., Allison, C., ... & Mandy, W. (2021). Is social camouflaging associated with anxiety and depression in autistic adults? Molecular Autism, 12(1), 1–15.
Hull, L., Mandy, W., & Lai, M.-C. (2017). Behavioural and cognitive sex/gender differences in autism spectrum condition and typically developing males and females. Autism, 21(6), 706–727.
Hull, L., Mandy, W., Lai, M.-C., Baron-Cohen, S., Allison, C., Smith, P., & Petrides, K.V. (2019). Development and validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q). Journal of Autism and Developmental Disorders, 49(3), 819–833.
Hull, L., Petrides, K.V., Allison, C., Smith, P., Baron-Cohen, S., Lai, M.-C., & Mandy, W. (2017). "Putting on My Best Normal": Social camouflaging in adults with autism spectrum conditions. Journal of Autism and Developmental Disorders, 47(8), 2519–2534.
Jang, K.L., Livesley, W.J., & Vernon, P.A. (1996). Heritability of the Big Five personality dimensions and their facets: A twin study. Journal of Personality, 64(3), 577–591.
Jung, C.G. (1971). Psychological types (R.F.C. Hull, Trans.; Collected Works, Vol. 6). Princeton University Press. (Original work published 1921)
Loehlin, J.C. (1992). Genes and environment in personality development. SAGE.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.
McCrae, R.R., & Costa, P.T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40.
Myers, I.B., & Myers, P.B. (1980). Gifts differing: Understanding personality type. CPP.
Myers, I.B., McCaulley, M.H., Quenk, N.L., & Hammer, A.L. (1998). MBTI manual: A guide to the development and use of the Myers-Briggs Type Indicator (3rd ed.). CPP.
Nardi, D. (2011). Neuroscience of personality: Brain savvy insights for all types of people. Radiance House.
Office of the Australian Information Commissioner. (2019). Australian Privacy Principles guidelines. OAIC.
Pittenger, D.J. (1993). Measuring the MBTI...and coming up short. Journal of Career Planning and Employment, 54(1), 48–52.
Pittenger, D.J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.
Plank, B., & Hovy, D. (2015). Personality traits on Twitter—or—how to get 1,500 personality tests in a week. Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 1–5.
Quenk, N.L. (1993). Beside ourselves: Our hidden personality in everyday life. CPP.
Quenk, N.L. (2002). Was that really me? How everyday stress brings out our hidden personality. Davies-Black.
Reynierse, J.H. (2009). The questionable theoretical basis of the MBTI's J-P scale and type dynamics. Journal of Psychological Type, 69(12), 105–122.
Reynierse, J.H. (2012). Toward a science of type: Moving beyond preference measurement. Journal of Psychological Type, 72(5), 1–36.
Soto, C.J., & John, O.P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143.
Stein, R., & Swan, A.B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434.
Thomson, L. (1998). Personality type: An owner's manual. Shambhala Publications.
Vazire, S., & Carlson, E.N. (2010). Self-knowledge of personality: Do people know themselves? Social and Personality Psychology Compass, 4(8), 605–620.
Von Franz, M.-L., & Hillman, J. (1971). Lectures on Jung's typology. Spring Publications.
Submitted for expert commentary. All citations should be verified against primary texts.
© 2025 · Hana Pham · Research Synthesis