Research Synthesis · 2025

Measuring the Mind

A Psychometric and Theoretical Foundation for Cognitive Function Assessment

Abstract

This synthesis addresses 42 research questions across seven domains — from Jungian theoretical origins to factor analysis, item-writing bias, scoring algorithms, and ethical obligations of test publication.

Key finding: full four-letter MBTI type stability falls to approximately 39–50% within five weeks. This is the direct, predictable result of forcing normally distributed traits into dichotomous categories, and is not primarily a reliability problem.

The widely used Grant/Brownsword function stack lacks independent factorial validation and must be treated as an unconfirmed theoretical heuristic. Most critically: no published, peer-reviewed factor-analytic study has confirmed that the eight cognitive functions recover eight distinct, independently replicable factors.

Keywords: cognitive functions · Jungian typology · psychometric validity · personality assessment · MBTI · factor analysis

Preface and Scope

This document is a research synthesis — an honest accounting of what the empirical and theoretical literature does and does not support before a single assessment item is written. It is not a manual, a results guide, or a practitioner handbook.

The global personality testing market is dominated by instruments — most visibly the MBTI and its derivative 16Personalities — that are psychometrically weaker than their ubiquity would suggest. The vast ecosystem of free online cognitive function tests has reproduced and amplified the MBTI's weaknesses without adding scientific accountability.

Evidence Quality System

Throughout this document, claims carry one of four evidence ratings: Strong (multiple independent replications) · Moderate (replicated with caveats) · Weak/Contested (limited or disputed evidence) · Design Decision Required (empirically unresolved — requires a principled theoretical choice).

Part I

Theoretical Foundations

Jung's original framework (1921) is richer, more ambiguous, and more developmental than the mechanistic stack-and-type models that have proliferated in popular culture. Each section below traces the intellectual genealogy of the theory and evaluates the empirical status of its key claims.

Figure 1.1

The Eight Cognitive Functions

As theorised by Jung (1921). Abbreviations like Ti and Te are practitioner shorthand — not found in Jung's original text.

Code	Function	Class	Description
Te	Extraverted Thinking	Rational	Organises the external world using objective logic and systems.
Ti	Introverted Thinking	Rational	Builds precise internal frameworks; values logical consistency.
Fe	Extraverted Feeling	Rational	Attunes to group emotional tone; maintains social harmony.
Fi	Introverted Feeling	Rational	Evaluates based on deeply held personal values and authenticity.
Se	Extraverted Sensing	Irrational	Engages fully with immediate sensory experience and action.
Si	Introverted Sensing	Irrational	Compares present experience to stored impressions and traditions.
Ne	Extraverted Intuition	Irrational	Perceives multiple possibilities and connections in the outer world.
Ni	Introverted Intuition	Irrational	Synthesises unconscious patterns into singular future-focused insights.

No published study has confirmed these eight as empirically distinct, measurable constructs.

Jung's Original Theory of Psychological Types (1921)

Carl Jung's Psychologische Typen (1921) is a work of clinical phenomenology, not experimental psychology. Its authority rests on two decades of clinical observation — not population-level survey data. Before any test item is written, the developer must confront the gap between what Jung actually wrote and what the online MBTI community presents as Jungian theory.

Jung described four functions — Thinking (T), Feeling (F), Sensation (S), and Intuition (N) — each operable in an extraverted or introverted attitude, yielding eight functional modes. The abbreviations Te, Ti, Fe, Fi are practitioner shorthand not in Jung's text. They imply more discrete, monolithic constructs than Jung described.

Rational vs. Irrational functions: Thinking and Feeling are rational (involving judgment and evaluation). Sensation and Intuition are irrational — immediate, pre-judgmental modes of perception. Items for rational functions should probe the criteria by which judgments are made; items for irrational functions should probe the mode in which information is received.

Is type a trait, a stage, or a state? Jung's writing is consistent with all three. The most defensible reading: the dominant function orientation is biologically predisposed but requires differentiation through experience, with the overall type picture shifting substantially across the lifespan.

Evidence: Weak / Contested

The specific claim that Jung's 8 functions are empirically distinct, measurable constructs is not yet established by independent psychometric research.

The Myers-Briggs Extension: Innovations and Known Biases

Isabel Briggs Myers translated Jung's clinical phenomenology into population-level self-report items from approximately 1943 onward. This involved genuine innovation — but also introduced specific, documented biases that persist in nearly all derivative instruments.

The J/P dimension does not appear in Jung. Myers derived it from function stack logic: if a respondent's extraverted function is a judging function (T or F), they present as more organised (J); if a perceiving function (S or N), more flexible (P). Its validity depends entirely on the validity of the function stack model.

Documented Intuitive Bias

Multiple independent researchers confirmed a systematic bias: Intuition is described using vocabulary associated with intellect, creativity, and abstract thought — qualities with high social desirability in educated Western populations. Item content analyses (Reynierse, 2012) confirm that N items carry higher perceived intellectual prestige than S items. Myers herself reported as INFP.

Evidence: Strong

The Grant/Brownsword Function Stack: Origins, Evidence, and Critique

The function stack model — dominant, auxiliary, tertiary, inferior in an alternating E/I pattern — is the structural backbone of contemporary cognitive function typology. It is also the most empirically challenged element of the framework.

Reynierse (2009) published a direct challenge in the field's own journal: the IEIE alternating pattern was not confirmed when actual MBTI response data were examined. People's expressed attitudes on individual function dimensions did not conform to the predicted alternating rule.

The divide between academic and practitioner communities is stark. In mainstream personality psychology, the function stack is effectively ignored. In online communities, it is treated as established fact, with entire interpretive systems (Beebe's archetype model, loop/grip theory) built upon it.

Figure 1.2

Function Stack: INTJ and ENFP (IEIE Alternating Pattern)

The stack is derived from Myers' J/P framework, not directly from Jung. It remains empirically unconfirmed. Note how attitude alternates I→E→I→E (and E→I→E→I) at every position.

Position	INTJ	ENFP
Dominant	Ni (I)	Ne (E)
Auxiliary	Te (E)	Fi (I)
Tertiary	Fi (I)	Te (E)
Inferior	Se (E)	Si (I)

Caution: stack ordering is theoretically derived, not independently empirically validated (Reynierse, 2009).
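The derivation rules described above (J/P selects which preferred function is extraverted, E/I selects which of the two leads, and attitude alternates down the stack) can be sketched in a few lines. This is an illustration of the theoretical rule set only, not evidence for it; the function and constant names are hypothetical.

```python
# Sketch of the Grant/Brownsword derivation rules as described in the text.
# All names here are illustrative, not part of any published instrument.
OPPOSITE = {"T": "F", "F": "T", "S": "N", "N": "S"}
FLIP = {"e": "i", "i": "e"}


def function_stack(type_code: str) -> list[str]:
    """Return [dominant, auxiliary, tertiary, inferior] for a four-letter code."""
    ei, perceiving, judging, jp = type_code.upper()
    if ei not in "EI" or perceiving not in "SN" or judging not in "TF" or jp not in "JP":
        raise ValueError(f"not a valid four-letter code: {type_code!r}")
    # J/P names which preferred function is shown to the outer world.
    extraverted = judging if jp == "J" else perceiving
    introverted = perceiving if jp == "J" else judging
    # Extraverts lead with the extraverted function, introverts with the other.
    if ei == "E":
        dominant, auxiliary = extraverted + "e", introverted + "i"
    else:
        dominant, auxiliary = introverted + "i", extraverted + "e"
    # Tertiary opposes the auxiliary and inferior opposes the dominant,
    # each with the attitude flipped (the IEIE / EIEI pattern).
    tertiary = OPPOSITE[auxiliary[0]] + FLIP[auxiliary[1]]
    inferior = OPPOSITE[dominant[0]] + FLIP[dominant[1]]
    return [dominant, auxiliary, tertiary, inferior]


print(function_stack("INTJ"))  # ['Ni', 'Te', 'Fi', 'Se']
print(function_stack("ENFP"))  # ['Ne', 'Fi', 'Te', 'Si']
```

Because the stack is fully determined by the four letters, it adds no measured information beyond them; any validity it has must come from the rules themselves, which is exactly what Reynierse (2009) disputes.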

Evidence: Weak / Contested

Beebe, Quenk, Berens, and von Franz: Extended Frameworks

John Beebe (2016)

Extends the function stack to eight positions using archetypal characters. Shadow functions are, by definition, inaccessible to conscious self-report — providing theoretical grounding for why any self-report instrument has a structural accuracy ceiling.

Naomi Quenk (1993/2002)

Documents 'grip states' — stress-induced emergence of the inferior function in a primitive, poorly controlled form. A respondent currently in the grip will produce function scores reflecting a temporary distressed state, not their baseline type.

Linda Berens (2000)

Adds temperament theory and interaction styles alongside cognitive functions. Her surface-level constructs (interaction styles) may be more psychometrically robust than deep-level functions — closer to observable behaviour.

Marie-Louise von Franz (1971)

Documented the developmental arc of type across the lifespan. A test taken at age 25 and again at 55 may legitimately produce different results — not because the instrument is unreliable, but because type development is real.

Evidence: Weak / Contested

These frameworks are clinically rich but lack controlled empirical validation. Most useful as interpretive frameworks and design constraints.

Part II

Psychometric Research and Validity

The psychometric literature on the MBTI is large but methodologically uneven. This section synthesises the most relevant findings, with explicit attention to conflicts of interest in the existing publication record and to the critical question of whether eight cognitive functions are empirically separable constructs.

10.

MBTI Reliability and Validity: A Critical Assessment

Pittenger (2005) — the most-cited independent assessment — concluded the MBTI falls short of acceptable standards for individual-level use in professional and educational contexts. This represents the broader academic consensus. The instrument's commercial dominance reflects the gap between psychometric standards and market forces.

Figure 2.1

Test-Retest Reliability: MBTI vs. Big Five

Measure	Stability	Note
MBTI full 4-letter type	47%	~47% after 5 weeks (0.83⁴)
MBTI individual scale	83%	~83% per scale
Big Five traits (20 yr)	78%	~75–80% over 20 years
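The 0.83⁴ figure is simple compounding: if each of the four scales repeats with probability 0.83, all four repeat together far less often. A minimal sketch, assuming the four scales flip independently of one another (an assumption made here for the arithmetic; the function name is hypothetical):

```python
def full_type_stability(per_scale_agreement: float, n_scales: int = 4) -> float:
    """Chance that every scale repeats, assuming scales flip independently."""
    return per_scale_agreement ** n_scales


print(f"{full_type_stability(0.83):.1%}")  # 47.5%
```

If the scales' retest changes were positively correlated, the joint figure would be somewhat higher than this product.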

The Conflict-of-Interest Problem

The majority of MBTI validity research has been published in the Journal of Psychological Type, affiliated with the Center for Applications of Psychological Type — an organisation that trains MBTI practitioners. Independent researchers (McCrae & Costa, 1989; Boyle, 1995; Pittenger, 2005) consistently reach more critical conclusions. Any new instrument should publish validation data in mainstream personality journals (Psychological Assessment, Journal of Research in Personality) where independent peer review operates without this conflict.

Evidence: Strong

11.

Factor-Analytic Evidence for the Eight Functions

The most honest answer to the most foundational validity question: there is no published, peer-reviewed, independent factor-analytic study that has confirmed the eight cognitive functions as distinct, replicable constructs using independently developed items.

Studies produce mixed results: Ni and Ne tend to show more consistent empirical separation than the sensation functions; Si and Se are particularly difficult to distinguish. No study has cleanly recovered eight factors from independently developed cognitive function items.

A Genuine Scientific Opportunity

The 8-function model is a theoretical hypothesis awaiting empirical confirmation. A new instrument designed to test — rather than assume — this structure would be the first of its kind. Pre-registering this analysis before data collection would maximise scientific credibility.

Evidence: Weak / Contested

12.

The Bimodal Distribution Problem

Personality traits are distributed normally in the population. Type systems require bimodal distributions, with most people clearly at one pole or the other. These facts are incompatible. Splitting a normally distributed score into two categories discards information, is least precise for the most common respondents (those near the midpoint), and makes test-retest instability structurally inevitable. The 35–65% full-type retest inconsistency rates reported across MBTI studies are the direct, predictable result, not a measurement quality problem per se.
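The structural point can be made concrete with a standard result for the bivariate normal distribution: two scores with correlation r fall on the same side of the mean with probability 1/2 + arcsin(r)/π. The sketch below assumes normally distributed scores, a cut exactly at the population mean, and four independent scales; the value r = 0.83 is used purely for illustration.

```python
import math


def same_side_probability(retest_r: float) -> float:
    """P(test and retest scores fall on the same side of the mean).

    Assumes the two scores are bivariate normal with correlation retest_r
    and that the category cut sits at the population mean.
    """
    return 0.5 + math.asin(retest_r) / math.pi


def full_type_agreement(retest_r: float, n_scales: int = 4) -> float:
    """Same category on every scale, assuming independent scales."""
    return same_side_probability(retest_r) ** n_scales


r = 0.83  # illustrative continuous-score retest correlation
print(f"per scale: {same_side_probability(r):.1%}")
print(f"full type: {full_type_agreement(r):.1%}")
```

Under these assumptions a continuous score that retests at r = 0.83 still changes category for nearly one respondent in five per scale, which is the sense in which the instability comes from the cut rather than from the measurement.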

Figure 2.2

Reality vs. Assumption: How Traits Actually Distribute

Personality traits distribute normally. Type systems assume bimodal clustering. This mismatch makes instability structurally inevitable for ~35–40% of respondents who score near the centre.

Reality: Normal Distribution

Assumption: Bimodal

Evidence: Strong

13.

MBTI and the Big Five: Empirical Overlap

McCrae and Costa's landmark (1989) reanalysis showed MBTI dimensions are substantially redundant with four of the five Big Five factors. Critically, no MBTI dimension captures Neuroticism — the dimension most strongly predictive of depression, anxiety disorders, and psychological suffering. An anxious INFP and a psychologically stable INFP receive identical type descriptions, despite having quite different functional patterns.

Figure 2.3

MBTI–Big Five Correlation Matrix

Four MBTI scales map onto four Big Five factors. Neuroticism — the strongest predictor of mental health outcomes — is entirely absent.

MBTI scale	Big Five factor	Correlation	Neuroticism
E / I	Extraversion	+0.72	weak / none
S / N	Openness	+0.66	weak / none
T / F	Agreeableness	-0.45	weak / none
J / P	Conscientiousness	+0.45	weak / none

Evidence: Strong

15.

Demographic Biases: Gender, Age, Culture, and Sample Skew

South Korea is now arguably the world's most MBTI-engaged society — the four-letter code features in job applications and dating profiles. This creates a paradox: the largest and most culturally engaged user populations are those whose results are most likely contaminated by prior theory exposure.

The T/F dimension shows the most robust gender difference of any MBTI scale — partially attributable to item content bias activating gender-socialised self-concepts. Differential Item Functioning (DIF) analyses consistently show some T/F items function differently across genders.

Figure 2.4

Type Frequency: General Population vs. Online Communities

INTP appears at ~19% in online typology communities vs. ~3% in general population — a 6× overrepresentation. Recruiting from forums produces unrepresentative, theory-contaminated normative samples.

Series: general population, then online typology communities. Values shown in the chart:
INTP	19% online (~3% general)
INTJ	14%
INFJ	1.5% general, 12% online
ENFP	10%
ISFJ	14% general, 2.5% online
ESTJ	1.5%

Evidence: Strong

Part III

Respondent Factors That Distort Results

The accuracy of any self-report instrument is bounded not only by its psychometric quality but by the accuracy of the self-knowledge respondents bring to it. Four major distortion sources are particularly relevant to cognitive function testing.

17.

The Self-Knowledge Problem

Self-reports are most accurate for highly internal, low-observability traits. Cognitive function preferences are theoretically well-suited to self-report — but the self-insight problem is most acute precisely for internal states requiring metacognitive observation: watching how you think, not just what you think.

The correlation between self-report personality scores and ratings from informants who know the respondent well is approximately r = .40–.50. Because shared variance is r², around 75–84% of the variance in self-description is not shared with how well-acquainted others describe the same person.
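The 75–84% range follows from squaring the correlation: r² is the proportion of variance two measures share, and the remainder is unshared. A short check of that arithmetic (the function name is hypothetical):

```python
def unshared_variance(r: float) -> float:
    """Proportion of variance not shared between two measures correlated at r."""
    return 1.0 - r ** 2


for r in (0.40, 0.50):
    print(f"r = {r:.2f}: shared {r ** 2:.0%}, unshared {unshared_variance(r):.0%}")
```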

Evidence: Moderate

18.

Theory Contamination: The Literate-Respondent Problem

A respondent who has spent time in online typology communities will approach any cognitive function test with a theory-shaped lens. Seeing "I prefer to develop a coherent internal logical framework before sharing conclusions," they can identify this as Ti-diagnostic and respond based on their self-concept — not their actual experience. In South Korea, where MBTI is a social identity, this contamination may affect the majority of respondents.

Contamination-Resistant Design Strategies

• Behavioural specificity: "When I disagreed with a group decision in the last year, I typically..." requires autobiographical retrieval that is harder to bias.
• Non-obvious forced choice: Neither option should obviously map to a specific function.
• Situational vignettes: Brief scenarios bypass self-concept consultation.
• Open-ended writing prompts: Most resistant — respondents don't know which linguistic features will be analysed.

Evidence: Moderate

19.

Neurodivergent Conditions and Systematic Distortions

Neurodivergent distortions operate in predictable directions, producing predictable mistyping patterns rather than random noise.

Autism (masking)

Autistic individuals who mask may consistently endorse Fe-consistent items (social harmony orientation) while their actual processing is more Fi or Ti — because responses reflect the persona performed, not the processing experienced. Most pronounced in autistic women, who mask more extensively (Hull et al., 2017, 2020).

ADHD

Introduces within-person response inconsistency that inflates apparent test-retest unreliability. Self-monitoring items may systematically underperform as self-monitoring is itself an executive function characteristically impaired in ADHD.

Depression

Produces reliable shifts toward Introversion, Intuition, Feeling, and Perceiving. A depressed ESTJ may produce a profile consistent with INFP — not because they are INFP but because depression has altered which cognitive modes are accessible.

Anxiety

Mimics Introversion (social avoidance), produces future-scanning cognition scoring as Intuitive, and creates difficulty with closure appearing as Perceiving.

Evidence: Moderate

20.

Mental Health State as a Systematic Confounder

Bleidorn et al.'s (2019) meta-analysis found major life events produce personality score changes of .3–.5 standard deviations — large enough to shift type classification for near-midpoint respondents. Bereavement, job loss, and relationship dissolution produce reliable, directional personality score changes.

Design Response: Pre-Test State Screener

Administer a brief state screener (PHQ-2 + GAD-2 equivalent, 4–8 items total) before the main test. Automatically flag results if scores exceed thresholds. Recommend retesting after 4–6 weeks for flagged respondents. Do not use screener data for research purposes without explicit, separate informed consent.
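One way the flagging step might look is sketched below. It assumes two low-mood items and two anxiety items, each scored 0–3, and flags a respondent when either two-item sum reaches 3, the cut-off commonly cited for the PHQ-2 and GAD-2. The item wording, thresholds, and all names are assumptions to be settled during development, not a specification.

```python
from dataclasses import dataclass

# Assumed cut-off: a two-item sum of 3 or more raises a flag.
SUBSCALE_CUTOFF = 3


@dataclass(frozen=True)
class ScreenerResult:
    low_mood_score: int
    anxiety_score: int
    flagged: bool
    advice: str


def screen_state(low_mood_items: list[int], anxiety_items: list[int]) -> ScreenerResult:
    """Score a brief state screener and decide whether to flag the main result."""
    for value in [*low_mood_items, *anxiety_items]:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item scores must be 0-3, got {value!r}")
    low_mood = sum(low_mood_items)
    anxiety = sum(anxiety_items)
    flagged = low_mood >= SUBSCALE_CUTOFF or anxiety >= SUBSCALE_CUTOFF
    advice = (
        "Results may reflect a temporary state; consider retesting in 4-6 weeks."
        if flagged
        else "No state flag raised."
    )
    return ScreenerResult(low_mood, anxiety, flagged, advice)


print(screen_state([2, 2], [0, 1]).advice)
```

The result object deliberately carries only scores and a flag; storing or analysing it for research would fall under the separate-consent requirement stated above.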

Evidence: Strong

Part IV

Question Design Methodology

Item design is the most direct translation of theoretical commitments into empirical measurement. The difference between an instrument that confirms assumptions and one that genuinely tests them lies almost entirely in the quality of individual items.

21.

Psychometric Item Construction: Core Methodology

Format choice matters. Likert scales produce interval-level normative data but are susceptible to acquiescence bias and extreme response style (particularly common in East Asian samples). Forced-choice formats reduce these biases and reflect relative preference — more consistent with type theory's comparative logic — but generate ipsative data.

The Ipsative Data Problem

Forced-choice items generate ipsative data: scores are interdependent because choosing one option implies not choosing the other. Standard statistical assumptions of independence are violated; correlations with external normative variables are mathematically distorted. Recommended approach: use normative item-level scoring internally, but present results ipsatively as a relative function profile.
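The recommended split between internal normative scores and an ipsative presentation can be illustrated by centring each respondent's eight scores on their own mean. The sketch assumes eight normative scale scores per respondent (for example, Likert scale means); the names and sample values are hypothetical.

```python
FUNCTIONS = ("Te", "Ti", "Fe", "Fi", "Se", "Si", "Ne", "Ni")


def relative_profile(normative_scores: dict[str, float]) -> dict[str, float]:
    """Centre a respondent's scores on their own mean (an ipsative view).

    The result always sums to zero, so it shows relative preference only;
    the overall elevation of the profile is discarded.
    """
    mean = sum(normative_scores.values()) / len(normative_scores)
    return {name: round(score - mean, 3) for name, score in normative_scores.items()}


respondent_a = dict(zip(FUNCTIONS, (4, 2, 3, 3, 2, 4, 5, 1)))
# Same shape, but every scale endorsed one point higher.
respondent_b = {name: score + 1 for name, score in respondent_a.items()}

print(relative_profile(respondent_a))
print(relative_profile(respondent_a) == relative_profile(respondent_b))  # True
```

The two respondents differ on every normative score yet receive identical relative profiles, which is why the normative scores should be retained internally for any correlation with external variables.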

Preference vs. performance: "I analyse problems systematically and efficiently" conflates Thinking preference with self-perceived competence. Reframe around what is natural and effortless: "When approaching a complex problem, what feels most comfortable and spontaneous to me is..." captures preference; competence claims capture self-concept.

Figure 4.1

Item Format Comparison

Forced choice is preferred for cognitive preference measurement despite lower statistical flexibility.

Formats compared: Likert Scale, Forced Choice, Slider.
Criteria rated: reduces acquiescence bias; reduces extreme response style; resistance to strategic gaming; statistical flexibility; theory alignment (relative preference); cross-cultural consistency.
Rating scale: None, Low, Moderate, High.

Figure 4.2

Perceived Intellectual Prestige: N vs. S Item Vocabulary

N-associated vocabulary consistently carries higher perceived prestige in educated Western populations. Sensing items must be rewritten using high-status vocabulary (precision, mastery, craftsmanship, empirical rigour) to achieve balance.

Vocabulary	Term	Prestige (0–100)
Intuition	Abstract	85
Intuition	Theoretical	82
Intuition	Creative	88
Intuition	Innovative	90
Intuition	Pattern recognition	78
Sensing	Practical	52
Sensing	Detail-oriented	48
Sensing	Sequential	42
Sensing	Reliable	58
Sensing	Concrete	45

Prestige ratings are illustrative approximations based on item content analyses (Reynierse, 2012). Scale: 0–100.

Evidence: Strong

23.

Validity and Faking Detection Scales

The MBTI contains no validity scales — no mechanism for detecting careless responding, strategic self-presentation, or random response patterns. The MMPI-2's validity scale architecture provides the reference model.

Validity Scale Components to Include

• Consistency check pairs: Same content, different wording, separated by 20+ items
• Infrequency items: Statements virtually no genuine respondent would select
• Response time monitoring: Unusually fast or slow completion flags strategic or inattentive responding
• Likert SD threshold: Near-zero variation across responses (for example, every answer at or near the midpoint) flags disengaged responding
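The four components above could be combined into a single screening pass along the following lines. This is a sketch under stated assumptions: responses are 1–5 Likert values keyed by item id, and every threshold (pair gap, endorsement level, time limits, minimum SD) is a placeholder that would need to be set empirically from pilot data.

```python
from statistics import pstdev


def validity_flags(
    responses: dict[str, int],
    consistency_pairs: list[tuple[str, str]],
    infrequency_items: set[str],
    seconds_taken: float,
    *,
    max_pair_gap: int = 1,        # placeholder threshold
    min_sd: float = 0.5,          # placeholder threshold
    min_seconds: float = 120.0,   # placeholder threshold
    max_seconds: float = 3600.0,  # placeholder threshold
) -> list[str]:
    """Return the names of any validity checks a response set fails."""
    flags = []
    # Reworded pairs of the same content should receive similar answers.
    if any(abs(responses[a] - responses[b]) > max_pair_gap for a, b in consistency_pairs):
        flags.append("inconsistent_pair")
    # Agreeing (4 or 5) with a statalmost nobody endorses is suspicious.
    if any(responses[item] >= 4 for item in infrequency_items):
        flags.append("infrequency_endorsed")
    if not min_seconds <= seconds_taken <= max_seconds:
        flags.append("response_time")
    # Almost no spread across answers suggests disengaged responding.
    if pstdev(list(responses.values())) < min_sd:
        flags.append("low_variance")
    return flags


straight_liner = {"q1": 3, "q21": 3, "q2": 3, "q30": 3, "trap": 3}
print(validity_flags(straight_liner, [("q1", "q21")], {"trap"}, seconds_taken=40))
```

Returning a list of named flags rather than a single pass/fail keeps the reason visible, so a results page can distinguish a rushed attempt from an inconsistent one.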

Evidence: Strong

24.

NLP and Text-Based Personality Prediction

A growing body of research uses NLP to predict MBTI type from text, achieving 67–82% per-dimension accuracy from social media posts (Gjurkovic & Snajder, 2018; Plank & Hovy, 2015). Writing style — independently of content — carries personality-predictive signal: vocabulary diversity, sentence length, hedging language, abstractness, first-person pronoun frequency.

Open-ended prompts offer a critical advantage: the respondent does not know which features of their writing will be analysed. Strategic bias is substantially harder to maintain across multiple natural language prompts than across self-report items.
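The stylistic features named above are simple to compute from raw text. The sketch below extracts four of them with the standard library only; the word lists are small illustrative placeholders rather than validated lexicons, and nothing here predicts type. It shows only what such features look like before they are passed to a model.

```python
import re

# Illustrative placeholder word lists, not validated lexicons.
HEDGES = {"maybe", "perhaps", "possibly", "probably", "somewhat", "might", "seems"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def style_features(text: str) -> dict[str, float]:
    """Compute a few content-independent style features from free text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    n = len(words)
    return {
        "type_token_ratio": len(set(words)) / n,      # vocabulary diversity
        "mean_sentence_length": n / len(sentences),   # words per sentence
        "hedge_rate": sum(w in HEDGES for w in words) / n,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
    }


print(style_features("I think it might work. Maybe I am wrong."))
```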

Evidence: Moderate

Part V

Scoring and Algorithm Design

The scoring algorithm is the bridge between item responses and the type profile the respondent receives. Choices here determine what information is preserved, what is discarded, and how uncertainty is communicated.

30.

Communicating Type Probabilistically Rather Than Categorically

The research recommendation is unambiguous: continuous scores with uncertainty indicators are superior to categorical assignments. A single four-letter type result implies a precision the instrument cannot support and creates the foundation for over-identification problems.

Practical Approaches to Probabilistic Communication

• Probability distributions over types — ranked list with percentages
• Confidence-banded dimension scores — each axis with a 95% CI; where the CI crosses the midpoint, display as undifferentiated
• Spectrum displays — eight function scores as a visual profile without forcing type labels
• Conditional type assignment — assign a label only when scores exceed a meaningful threshold; present multiple candidate types for near-midpoint respondents
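The confidence-banded approach in the list above can be sketched with the standard error of measurement, SEM = SD × √(1 − reliability). The example assumes dimension scores centred on a midpoint of 0 and uses an illustrative scale SD and reliability; the function name and the returned labels are hypothetical.

```python
import math


def banded_score(score: float, scale_sd: float, reliability: float,
                 midpoint: float = 0.0, z: float = 1.96) -> dict:
    """Attach a 95% band to a dimension score and label it conservatively."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    # Standard error of measurement from scale SD and reliability.
    sem = scale_sd * math.sqrt(1.0 - reliability)
    low, high = score - z * sem, score + z * sem
    if low <= midpoint <= high:
        label = "undifferentiated"  # band crosses the midpoint
    else:
        label = "upper pole" if score > midpoint else "lower pole"
    return {"score": score, "ci_low": round(low, 2), "ci_high": round(high, 2), "label": label}


print(banded_score(4, scale_sd=10, reliability=0.83))
print(banded_score(15, scale_sd=10, reliability=0.83))
```

With these illustrative values a respondent must score roughly eight points from the midpoint before a pole label is shown, which makes the cost of modest reliability visible to the respondent instead of hiding it behind a letter.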

User experience research shows respondents want a type label for simplicity and shareability. The design challenge is providing this without misrepresenting the instrument's certainty.

31.

Norming, Reference Populations, and Cutpoint Setting

Online-test-taking norms — the most convenient option for a new instrument — will systematically misrepresent the general population due to the Intuitive/Introvert skew documented in Part II. Recruiting a quota-balanced normative sample is essential: setting explicit upper limits on the proportion of Intuitive, Introverted, and theory-literate respondents, and actively recruiting Sensing types, Extraverts, and typology-naive respondents through general population panels (Prolific.co or equivalent).

Figure 5.1

Recommended Test Development Pipeline

From theoretical framework through normative database establishment. Each phase has specified minimum samples and deliverables.

Phase	Sample	Deliverable
Theoretical Framework	-	Resolve 7 foundational design decisions before writing any item
Item Pool Generation	-	Write 80–120 candidate items across 8 function scales + validity items
Expert Review	5–10 experts	Theoretical accuracy, S/N prestige audit, DIF pre-check
Cognitive Interviewing	N = 20–30	Think-aloud protocols to identify ambiguous or alienating items
Pilot Study 1 (EFA)	N ≥ 300	Exploratory Factor Analysis: determine empirically supported factor count
Item Refinement	-	Remove/revise items below r = .30 item-total; check endorsement rates
Pilot Study 2 (CFA)	N ≥ 500	Confirmatory Factor Analysis: test 8-factor model vs. alternatives
Validation Study	N ≥ 1,000	Test-retest reliability (6-week), convergent validity, DIF by group
Normative Database	N ≥ 1,000	Quota-balanced reference population; establish cutpoints
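The item refinement step in the pipeline relies on item-total correlations. A common variant is the corrected item-total correlation, where each item is correlated with the sum of the other items on its scale so that it does not correlate with itself. The sketch below assumes that variant (the source does not say which is intended) and uses a tiny made-up data set; the names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses: list[list[float]]) -> list[float]:
    """responses[r][i] is respondent r's answer to item i on one scale."""
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total without item i
        correlations.append(pearson(item, rest))
    return correlations


def items_to_review(responses: list[list[float]], threshold: float = 0.30) -> list[int]:
    """Indices of items whose corrected item-total correlation is below threshold."""
    return [i for i, r in enumerate(corrected_item_total(responses)) if r < threshold]


# Made-up pilot data: items 0-2 move together, item 3 runs the other way.
pilot = [[1, 1, 2, 5], [2, 2, 2, 4], [3, 3, 3, 3], [4, 4, 5, 2], [5, 5, 5, 1]]
print(items_to_review(pilot))  # [3]
```

An item flagged this way is a candidate for revision or reverse-keying, not automatic deletion; the endorsement-rate check in the same pipeline phase is a separate test.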

Evidence: Strong

Part VI

Gaps, Failure Modes, and Opportunities

Understanding the specific weaknesses of existing instruments and the known failure modes of previous attempts is a prerequisite for building something genuinely better.

35.

Weaknesses of Major Existing Free Tests

Figure 6.1

Evidence Quality Assessment for Key Claims

Red indicators mark areas where theoretical commitments must substitute for empirical grounding before a single item can be written.

Claim	Evidence rating
MBTI test-retest instability (full type ~47%)	Strong
Normal distribution of personality traits in population	Strong
MBTI–Big Five correlation structure	Strong
Intuitive bias in MBTI item content	Strong
Online typology community sample skew (6× INTP)	Strong
State effects (depression, anxiety) on personality scores	Strong
Barnum/Forer effect in type result acceptance	Strong
Big Five heritability (~40–60%)	Moderate
Type stability across the adult lifespan	Moderate
Self-knowledge accuracy limits (r ≈ .40–.50 with informant)	Moderate
Autistic masking effects on personality self-report	Moderate
Theory contamination by typology-literate respondents	Moderate
NLP personality prediction from natural text (~67–82%)	Moderate
Grant/Brownsword stack IEIE alternating pattern	Weak/Contested
8 cognitive functions as empirically distinct constructs	Weak/Contested
Nardi EEG evidence for type-specific brainwave patterns	Weak/Contested
Socionics function definitions (independent validation)	Weak/Contested
Stack dominant/auxiliary/tertiary ordering	Weak/Contested
Function stack validity (theoretical commitment required)	Design Decision
Categorical vs. continuous output (no empirical resolution)	Design Decision
J/P dimension inclusion (theoretical commitment required)	Design Decision

Test	Principal Weakness
16Personalities	Measures Big Five but labels results in MBTI type language. Not a cognitive function test. No published independent validation.
Sakinorva	No EFA/CFA validation; theory-transparent items; typology-community sample; multiple algorithms reveal inconsistency but confuse non-literate users.
Michael Caloz CFT	Highly theory-transparent items; no published psychometric validation; very long (100+ items) with no validity scales.
Keys2Cognition	No published psychometric data; scoring algorithm undocumented; sample exclusively from typology communities.
IDRlabs cognitive tests	No methodological transparency; item derivation undisclosed; no published validation; corporate presentation without scientific accountability.

Seven Shared Failure Modes

All existing free instruments share: (1) no published EFA/CFA validation; (2) items too theory-transparent to resist gaming; (3) systematic S/N bias toward Intuition; (4) no validity or faking detection scales; (5) forced categorical output without uncertainty quantification; (6) normative samples from typology communities; (7) no state screener or neurodivergent accommodation.

39.

What the Most Rigorous Instrument Would Include

1. Independently validated 8-function item bank confirmed via EFA/CFA to recover 8 distinct factors (not yet demonstrated in any published study).

2. Pre-test state screener integrated into scoring algorithm, flagging results when respondent appears in significant distress.

3. Explicit neurodivergent accommodation: masking-aware item framing, extended time, results disclaimer.

4. DIF validation across gender, age, and cultural group, mandatory before public deployment.

5. Validity scales: inconsistency detection, infrequency items, response time monitoring.

6. Probabilistic output: probability distribution over types or confidence-banded dimension scores.

7. Non-theory-transparent items: behavioural vignettes, autobiographical recall, open-ended prompts.

8. Published validation study in a mainstream personality psychology journal with independent peer review.

9. Demographically diverse normative sample with quota sampling for Sensing types and Extraverts.

10. Pre-registered analysis plan testing the 8-function hypothesis rather than assuming it.

Part VII

Ethics, Communication, and Legal Obligations

Publishing a personality assessment instrument carries obligations that extend well beyond psychometric technical standards. This section addresses responsible communication, the psychological risks of personality labelling, and legal obligations for an Australian-based developer.

41.

Responsible Results Page Design

The Barnum/Forer Effect. Forer's 1949 study showed that people accept vague, generally applicable personality descriptions as specifically accurate when told the description was generated for them personally. His class gave 4.26/5 accuracy to text copied verbatim from a horoscope. This effect is fully present in MBTI results.

Counteracting the Barnum Effect

Results pages must actively counteract this through: (1) presenting discriminant information — what the result type is typically not; (2) including disconfirming information; (3) showing the two or three closest alternative types; (4) encouraging active self-verification: "Actively identify ways this description does not accurately describe you."

Results pages should consistently use conditional language: "your responses suggest a current preference for..." rather than identity language: "you are an INTJ." Type labels can become fixed and limiting — "I can't do X, I'm an introvert" — with no empirical foundation.

Evidence: Strong

42.

Ethical and Legal Obligations of Test Publication

GDPR (EU)

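GDPR can apply to a publicly accessible web application when it processes personal data of people in the EU, for example by offering the test to them or monitoring their behaviour. Lawful basis for a free test is typically consent. State screener data may constitute special category (health) data under GDPR, requiring explicit separate consent and additional security obligations.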

Australian Privacy Act 1988

APPs require notification of what information is collected, how it will be used, and to whom disclosed. Mental health state information may be sensitive information under the Act, requiring explicit consent.

Myers-Briggs IP

"MBTI" and "Myers-Briggs" are registered trademarks. The four-letter type codes (INTJ, ENFP, etc.) are generally unprotected. Describe the instrument as a "Jungian-informed cognitive preference assessment."

High-Stakes Contexts

Terms of service should explicitly prohibit employment or clinical use until independent validation for those applications has been conducted. Adverse impact analysis is legally required in employment contexts in most jurisdictions.

Evidence: Strong

Appendix A

Seven Foundational Design Decisions

These questions have no definitive empirical answer. A test developer must make an explicit theoretical commitment on each before writing items. Failure to do so results in an instrument whose design is internally inconsistent.

Function Stack Validity

Does the dominant-auxiliary-tertiary-inferior ordered hierarchy exist as an empirical structural reality, or is it a theoretically useful but empirically unconfirmed heuristic? This determines whether your scoring algorithm should derive stack positions or treat all 8 functions as independent dimensions.

Design Decision Required

J/P Dimension Inclusion

The J/P dimension is not in Jung — it is Myers' derivation. Should your test produce 16 types (requiring J/P) or a different output based purely on 8 function scores? Each choice has significant implications for type labelling and user expectation management.

Design Decision Required

Categorical vs. Continuous Output

Should the primary output be a categorical type label or a continuous function profile? Psychometric evidence strongly favours continuous; user experience and cultural expectation favour categorical. A hybrid is possible but complex to design well.

Design Decision Required

Independent Functions vs. Bipolar Axes

Should the 8 functions be treated as 8 independent constructs or as 4 bipolar axes (Ti↔Fe, Te↔Fi, Si↔Ne, Se↔Ni)? The axis approach is more psychometrically parsimonious; independent function measurement preserves more theoretical nuance.

Design Decision Required
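To make the trade-off concrete, the sketch below collapses eight function scores into the four axes listed above. It assumes each function has already been scored on a common scale. The scale, the dictionary layout, and the choice of a simple difference score are illustrative assumptions, not a recommended scoring method.

```python
# Axis pairings as listed in the text above.
AXES = (("Ti", "Fe"), ("Te", "Fi"), ("Si", "Ne"), ("Se", "Ni"))


def axis_scores(function_scores: dict[str, float]) -> dict[str, float]:
    """Collapse eight function scores into four signed axis scores.

    A positive value leans towards the first pole of the axis and a
    negative value towards the second.
    """
    return {
        f"{first}-{second}": function_scores[first] - function_scores[second]
        for first, second in AXES
    }


# Two hypothetical respondents with different absolute levels.
moderate = {"Ti": 0.75, "Fe": 0.25, "Te": 0.75, "Fi": 0.25,
            "Si": 0.75, "Ne": 0.25, "Se": 0.75, "Ni": 0.25}
elevated = {"Ti": 1.0, "Fe": 0.5, "Te": 1.0, "Fi": 0.5,
            "Si": 1.0, "Ne": 0.5, "Se": 1.0, "Ni": 0.5}

print(axis_scores(moderate) == axis_scores(elevated))  # True
```

The two respondents receive identical axis scores even though one endorses every function more strongly. That lost information is what independent measurement preserves, at the cost of estimating twice as many constructs.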

Near-Midpoint Handling

How should the instrument handle respondents who score near the midpoint on multiple dimensions? A principled decision about confidence thresholds and multiple-type presentation must be made explicitly — there is no empirically determined answer.

Design Decision Required
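One possible rule, shown here only as a sketch, treats any score inside a band around the midpoint as undetermined and presents every type compatible with the remaining scores. The four-dimension layout, the -1 to 1 scale, and the 0.2 band width are all assumptions made for illustration. The layout in particular would change under a different decision on J/P inclusion.

```python
from itertools import product

# Hypothetical layout: (letter if score < 0, letter if score > 0).
DIMENSIONS = (("I", "E"), ("S", "N"), ("T", "F"), ("J", "P"))


def candidate_types(scores: list[float], band: float = 0.2) -> list[str]:
    """Return every type code compatible with the scores.

    `scores` holds one value per dimension on an assumed -1..1 scale.
    A score whose absolute value is below `band` is treated as
    undetermined, so both letters stay in play for that dimension.
    """
    if len(scores) != len(DIMENSIONS):
        raise ValueError("expected one score per dimension")
    options = []
    for (low, high), score in zip(DIMENSIONS, scores):
        if abs(score) < band:
            options.append((low, high))  # too close to the midpoint to call
        else:
            options.append((high,) if score > 0 else (low,))
    return ["".join(letters) for letters in product(*options)]


print(candidate_types([-0.6, 0.5, -0.7, -0.05]))  # ['INTJ', 'INTP']
```

With two dimensions inside the band the function returns four candidates, and with all four it returns sixteen, which makes the cost of a wide band visible: the wider the band, the less the result says.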

Theoretical Frame for Item Content

Should items follow Thomson's phenomenological framework, Myers' behavioural framework, or Socionics' information-metabolism framework? These generate fundamentally different items and potentially different empirical structures.

Design Decision Required

Primary Use Case

Is the instrument primarily for self-understanding by general users or for theoretical research? This shapes item transparency, results communication depth, normative sampling strategy, and validation priorities.

Design Decision Required

Priority Reading List

Twenty Primary Sources

The following twenty sources are ranked by priority for a researcher or developer who has not yet written a single item. Each represents a foundational text that directly shapes design decisions. Sources are grouped by phase of development relevance.

Jung, C.G. (1971). Psychological Types (Collected Works, Vol. 6). Princeton University Press.

The primary source. Must be read before any commentary. Chapter 10 (the general description of the types) and Chapter 11 (definitions) are essential. Most practitioners have never read Jung directly.

Pittenger, D.J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.

The most concise authoritative academic critique. Essential for understanding exactly what psychometric weaknesses you are improving upon.

DeVellis, R.F. (2016). Scale Development: Theory and Applications (4th ed.). SAGE.

The best practical guide to building a psychometric scale from scratch. Read before writing items. Covers item writing through factor analysis and reliability assessment.

McCrae, R.R., & Costa, P.T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40.

Establishes the Big Five–MBTI relationship. Essential for positioning your instrument relative to validated prior literature and designing convergent validity analyses.

Myers, I.B., McCaulley, M.H., Quenk, N.L., & Hammer, A.L. (1998). MBTI Manual (3rd ed.). CPP.

The canonical source for MBTI methodology, reliability, and normative data. Read critically — it is both informative and self-serving.

Reynierse, J.H. (2009). The questionable theoretical basis of the MBTI's J-P scale and type dynamics. Journal of Psychological Type, 69(12), 105–122.

The most important empirical challenge to the function stack model. Read before deciding whether to build with or without the stack.

Quenk, N.L. (1993). Beside Ourselves: Our Hidden Personality in Everyday Life. CPP.

The best treatment of inferior function and grip states. Essential for understanding how state and trait interact in cognitive function expression.

Thomson, L. (1998). Personality Type: An Owner's Manual. Shambhala Publications.

The most sophisticated re-reading of Jungian cognitive function theory in practical terms. Will expand your understanding of what functions actually describe beyond Myers' operationalisation.

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.

The foundational paper on construct validity. Everything in psychometrics about what it means for a test to measure what it claims to measure builds on this work.

Embretson, S.E., & Reise, S.P. (2000). Item Response Theory for Psychologists. Lawrence Erlbaum.

The accessible introduction to item response theory (IRT). Essential even if you start with classical test theory (CTT), so that items are designed to support adaptive testing in later versions.

Berens, L.V. (2000). Understanding Yourself and Others: An Introduction to the 4 Temperaments. Telos Publications.

Berens' multilevel model helps identify which constructs are most psychometrically robust and which level of abstraction items should target.

Beebe, J. (2016). Energies and Patterns in Psychological Type. Routledge.

Beebe's archetypal model explains why shadow functions are inaccessible to self-report — directly implying a ceiling on instrument accuracy.

Hull, L. et al. (2019). Development and validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q). Journal of Autism and Developmental Disorders, 49(3), 819–833.

Foundational for understanding autistic masking and its implications for personality self-report accuracy. Essential for neurodivergent accommodation design.

Stein, R., & Swan, A.B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434.

Recent independent review providing the most current picture of the empirical literature on MBTI validity.

Gjurković, M., & Šnajder, J. (2018). Reddit: A gold mine for personality prediction. Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media.

Most relevant empirical study on NLP-based MBTI prediction from naturalistic text. Essential context for deciding whether to include open-ended writing prompts.

Vazire, S., & Carlson, E.N. (2010). Self-knowledge of personality: Do people know themselves? Social and Personality Psychology Compass, 4(8), 605–620.

The definitive review of self-knowledge accuracy. Establishes the theoretical ceiling on self-report validity and identifies which constructs are most accurately self-reported.

Von Franz, M-L., & Hillman, J. (1971). Lectures on Jung's Typology. Spring Publications.

Von Franz's lectures on the inferior function are the most clinically detailed treatment of type development. Essential for lifespan interpretation and results communication design.

Comrey, A.L., & Lee, H.B. (1992). A First Course in Factor Analysis (2nd ed.). Lawrence Erlbaum.

Practical guide to exploratory and confirmatory factor analysis (EFA and CFA). You will run factor analyses on pilot data, so understanding the methodology correctly is essential.

Boyle, G.J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74.

Particularly relevant given Australian context. A concise, hard-hitting psychometric critique by an independent researcher.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for Educational and Psychological Testing. AERA.

The industry standard for published psychological tests. Read before building to understand the target, not after, so the development process meets these standards from the start.

References

Full Bibliography

All citations used across this synthesis, formatted for academic reference.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. AERA.

Barrick, M.R., & Mount, M.K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.

Beebe, J. (2016). Energies and patterns in psychological type. Routledge.

Berens, L.V. (2000). Understanding yourself and others: An introduction to the 4 temperaments. Telos Publications.

Bleidorn, W., Hopwood, C.J., Back, M.D., Denissen, J.J.A., Hennecke, M., Orth, U., ... & Roberts, B.W. (2020). Personality trait stability and change. Personality Science, 1, 1–41.

Bouchard, T.J., & Loehlin, J.C. (2001). Genes, evolution, and personality. Behavior Genetics, 31(3), 243–273.

Boyle, G.J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74.

Carskadon, T.G. (1982). Clinical and counseling aspects of the Myers-Briggs Type Indicator. Research in Psychological Type, 4(1), 2–31.

Comrey, A.L., & Lee, H.B. (1992). A first course in factor analysis (2nd ed.). Lawrence Erlbaum.

Costa, P.T., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources.

DeVellis, R.F. (2016). Scale development: Theory and applications (4th ed.). SAGE.

Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Lawrence Erlbaum.

European Data Protection Board. (2020). Guidelines 05/2020 on consent under Regulation 2016/679. EDPB.

Forer, B.R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44(1), 118–123.

Furnham, A. (1996). The big five versus the big four: The relationship between the Myers-Briggs Type Indicators (MBTI) and NEO-PI five-factor model of personality. Personality and Individual Differences, 21(2), 303–307.

Hull, L., Levy, L., Lai, M.-C., Petrides, K.V., Baron-Cohen, S., Allison, C., ... & Mandy, W. (2020). Is social camouflaging associated with anxiety and depression in autistic adults? Molecular Autism, 12(1), 1–15.

Hull, L., Mandy, W., & Lai, M.-C. (2017). Behavioural and cognitive sex/gender differences in autism spectrum condition and typically developing males and females. Autism, 21(6), 706–727.

Hull, L., Mandy, W., Lai, M.-C., Baron-Cohen, S., Allison, C., Smith, P., & Petrides, K.V. (2019). Development and validation of the Camouflaging Autistic Traits Questionnaire (CAT-Q). Journal of Autism and Developmental Disorders, 49(3), 819–833.

Hull, L., Petrides, K.V., Allison, C., Smith, P., Baron-Cohen, S., Lai, M.-C., & Mandy, W. (2017). "Putting on My Best Normal": Social camouflaging in adults with autism spectrum conditions. Journal of Autism and Developmental Disorders, 47(8), 2519–2534.

Jang, K.L., Livesley, W.J., & Vernon, P.A. (1996). Heritability of the Big Five personality dimensions and their facets: A twin study. Journal of Personality, 64(3), 577–591.

Jung, C.G. (1971). Psychological types (R.F.C. Hull, Trans.; Collected Works, Vol. 6). Princeton University Press. (Original work published 1921)

Loehlin, J.C. (1992). Genes and environment in personality development. SAGE.

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.

McCrae, R.R., & Costa, P.T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40.

Myers, I.B., & Myers, P.B. (1980). Gifts differing: Understanding personality type. CPP.

Myers, I.B., McCaulley, M.H., Quenk, N.L., & Hammer, A.L. (1998). MBTI manual: A guide to the development and use of the Myers-Briggs Type Indicator (3rd ed.). CPP.

Nardi, D. (2011). Neuroscience of personality: Brain savvy insights for all types of people. Radiance House.

Office of the Australian Information Commissioner. (2019). Australian Privacy Principles guidelines. OAIC.

Pittenger, D.J. (1993). Measuring the MBTI...and coming up short. Journal of Career Planning and Employment, 54(1), 48–52.

Pittenger, D.J. (2005). Cautionary comments regarding the Myers-Briggs Type Indicator. Consulting Psychology Journal: Practice and Research, 57(3), 210–221.

Plank, B., & Hovy, D. (2015). Personality traits on Twitter—or—how to get 1,500 personality tests in a week. Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 1–5.

Quenk, N.L. (1993). Beside ourselves: Our hidden personality in everyday life. CPP.

Quenk, N.L. (2002). Was that really me? How everyday stress brings out our hidden personality. Davies-Black.

Reynierse, J.H. (2009). The questionable theoretical basis of the MBTI's J-P scale and type dynamics. Journal of Psychological Type, 69(12), 105–122.

Reynierse, J.H. (2012). Toward a science of type: Moving beyond preference measurement. Journal of Psychological Type, 72(5), 1–36.

Soto, C.J., & John, O.P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143.

Stein, R., & Swan, A.B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434.

Thomson, L. (1998). Personality type: An owner's manual. Shambhala Publications.

Vazire, S., & Carlson, E.N. (2010). Self-knowledge of personality: Do people know themselves? Social and Personality Psychology Compass, 4(8), 605–620.

Von Franz, M.-L., & Hillman, J. (1971). Lectures on Jung's typology. Spring Publications.

Submitted for expert commentary. All citations should be verified against primary texts.