The P value debate has revealed that hypothesis testing is in crisis – also in our discipline! But what should we do now? Nature recently asked influential statisticians to recommend one change to improve science. Here are five answers: (1) Adjust for human cognition: Data analysis is not purely computational – it is a human behavior. So, we need to prevent cognitive mistakes. (2) Abandon statistical significance: Academia seems to like “statistical significance”, but P value thresholds are too often abused to decide between “effect” (favored hypothesis) and “no effect” (null hypothesis). (3) State false-positive risk, too: What matters is the probability that a significant result turns out to be a false positive. (4) Share analysis plans and results: Techniques to avoid false positives are to pre-register analysis plans, and to share all data and results of all analyses as well as any relevant syntax or code. (5) Change norms from within: Funders, journal editors and leading researchers need to act. Otherwise, researchers will continue to re-use outdated methods, and reviewers will demand what has been demanded of them.
Leek, J., McShane, B.B., Gelman, A., Colquhoun, D., Nuijten, M.B. & Goodman, S.N. (2017). Five Ways to Fix Statistics. Nature, 551, 557-559. https://doi.org/10.1038/d41586-017-07522-z
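Point (3) can be illustrated with a back-of-the-envelope calculation: the false-positive risk depends not only on the significance level but also on the test's power and on the prior probability that a real effect exists. A minimal sketch, where the power (0.80) and prior (0.10) are illustrative assumptions, not values from the article:

```python
def false_positive_risk(alpha, power, prior):
    """Probability that a 'significant' result is a false positive,
    given the significance level, the test's power, and the prior
    probability that a real effect exists."""
    false_positives = alpha * (1 - prior)  # true nulls that pass the threshold
    true_positives = power * prior         # real effects that are detected
    return false_positives / (false_positives + true_positives)

# Illustrative assumptions: alpha = 0.05, power = 0.80, and only
# 1 in 10 tested hypotheses reflects a real effect.
risk = false_positive_risk(alpha=0.05, power=0.80, prior=0.10)
print(round(risk, 2))  # -> 0.36
```

Under these assumptions, more than a third of "significant" results are false positives, even though the threshold was 5% – which is exactly why stating the false-positive risk alongside the p-value matters.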
We should not ignore that researchers – in general, but also in supply chain management – are not always as well trained in data analysis as they should be. A highly visible discussion is currently going on regarding prevalent misuses of p-values. For example, too often research has been considered “good” research just because the p-value passed a specific threshold – also in the SCM discipline. But the p-value is not an interpretation; it rather needs interpretation! Some statisticians now even prefer to replace p-values with other approaches, and some journals have decided to ban p-values. Based on this ongoing discussion, the influential American Statistical Association has now issued a Statement on Statistical Significance and p-values. It contains six principles underlying the proper use and interpretation of the p-value. As a discipline, we should take these principles seriously: in our own research, but also when we review the manuscripts of our colleagues.
Wasserstein, R., & Lazar, N. (2016). The ASA’s Statement on p-values: Context, Process, and Purpose. The American Statistician, 70 (2), 129-133. https://doi.org/10.1080/00031305.2016.1154108
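One reason the p-value needs interpretation can be shown with a minimal simulation: even when the null hypothesis is true, about 5% of tests fall below the 0.05 threshold, so a single p < .05 is not proof of an effect. The sketch below uses a simple two-sample z-test, assuming normal data with known unit variance (an assumption made purely to keep the example standard-library only):

```python
import math
import random

def z_test_p(sample_a, sample_b):
    """Two-sided p-value for a difference in means, assuming both
    samples come from normal distributions with known variance 1."""
    n = len(sample_a)
    z = (sum(sample_a) / n - sum(sample_b) / n) / math.sqrt(2.0 / n)
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

random.seed(1)
trials, n, hits = 4000, 30, 0
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]  # same population: null is true
    if z_test_p(a, b) < 0.05:
        hits += 1

# Roughly 5% of tests are 'significant' although there is no effect at all.
print(hits / trials)
```

The simulated rate hovers around 0.05 – exactly what a p-value threshold guarantees under the null, and nothing more.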
The AVE–SV comparison (Fornell & Larcker, 1981) is certainly the most common technique for detecting discriminant validity violations on the construct level. An alternative technique, proposed by Henseler et al. (2015), is the heterotrait–monotrait (HTMT) ratio of correlations (see the video below). Based on simulation data, these authors show for variance-based structural equation modeling (SEM), e.g. PLS, that AVE–SV does not reliably detect discriminant validity violations, whereas HTMT identifies a lack of discriminant validity effectively. Results of a related study by Voorhees et al. (2016) suggest that both AVE–SV and HTMT are recommended for detecting discriminant validity violations if covariance-based SEM, e.g. AMOS, is used. They show that the HTMT technique with a cutoff value of 0.85 – abbreviated as HTMT.85 – performs best overall. In other words, HTMT should be used in both variance-based and covariance-based SEM, whereas AVE–SV should be used only in covariance-based SEM. One might be tempted to prefer inferential tests over such heuristics; however, the constrained ϕ approach did not perform well in Voorhees et al.’s study.
Fornell, C., & Larcker, D. (1981). Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research, 18 (1), 39-50 https://doi.org/10.2307/3151312
Henseler, J., Ringle, C., & Sarstedt, M. (2015). A New Criterion for Assessing Discriminant Validity in Variance-based Structural Equation Modeling. Journal of the Academy of Marketing Science, 43 (1), 115-135 https://doi.org/10.1007/s11747-014-0403-8
Voorhees, C., Brady, M., Calantone, R., & Ramirez, E. (2016). Discriminant Validity Testing in Marketing: An Analysis, Causes for Concern, and Proposed Remedies. Journal of the Academy of Marketing Science, 44 (1), 119-134 https://doi.org/10.1007/s11747-015-0455-4
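For readers who want to see the mechanics, both criteria can be sketched in a few lines. The functions below follow the standard definitions – AVE as the mean squared standardized loading, and HTMT as the mean between-block item correlation over the geometric mean of the mean within-block correlations. The loadings and the item correlation matrix are made-up illustrative values, not data from the cited studies:

```python
import math

def ave(loadings):
    """Average variance extracted from standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def fornell_larcker_ok(loadings_a, loadings_b, construct_corr):
    """AVE-SV criterion: each construct's AVE must exceed the
    shared variance (squared correlation) between the constructs."""
    sv = construct_corr ** 2
    return ave(loadings_a) > sv and ave(loadings_b) > sv

def htmt(corr, block_a, block_b):
    """HTMT ratio: mean absolute between-block item correlation over
    the geometric mean of the mean within-block item correlations."""
    def mean(xs):
        return sum(xs) / len(xs)
    hetero = [abs(corr[i][j]) for i in block_a for j in block_b]
    mono_a = [abs(corr[i][j]) for i in block_a for j in block_a if i < j]
    mono_b = [abs(corr[i][j]) for i in block_b for j in block_b if i < j]
    return mean(hetero) / math.sqrt(mean(mono_a) * mean(mono_b))

# Hypothetical 4-item correlation matrix: items 0-1 load on construct A,
# items 2-3 on construct B; within-block r = .70, between-block r = .60.
corr = [
    [1.0, 0.7, 0.6, 0.6],
    [0.7, 1.0, 0.6, 0.6],
    [0.6, 0.6, 1.0, 0.7],
    [0.6, 0.6, 0.7, 1.0],
]
value = htmt(corr, block_a=[0, 1], block_b=[2, 3])
print(round(value, 3))  # -> 0.857, above the HTMT.85 cutoff: a problem

# With loadings of .80 and a construct correlation of .60, AVE-SV
# passes (.64 > .36) and would not flag anything here.
print(fornell_larcker_ok([0.8, 0.8], [0.8, 0.8], 0.6))  # -> True
```

The illustration mirrors the posts' point: constellations exist in which HTMT.85 flags a discriminant validity problem that the AVE–SV comparison misses.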
Supply chain research typically investigates phenomena that occur in vertical relationships, e.g., between suppliers and buyers. In our new article, The Interplay of Different Types of Governance in Horizontal Cooperations: A View on Logistics Service Providers, we take a look at horizontal relationships. Such relationships occur, for example, when two LSPs collaborate to complement their service portfolios. Specifically, our research analyzes the influence of contractual governance on the effectiveness of two types of operational governance (a formal and a relational type). It relates contractual governance and operational governance to two major outcome dimensions of horizontal cooperations between LSPs (cooperation-based firm performance and cooperation-based learning). The results reveal that contractual safeguarding can partly replace process formalization when aiming for better cooperation-based firm performance and complement process formalization when aiming for cooperation-based learning. At the same time, relational capital is always complemented by contractual safeguarding, independent of the desired cooperation outcome.
Raue, J.S., & Wieland, A. (2015). The Interplay of Different Types of Governance in Horizontal Cooperations: A View on Logistics Service Providers. International Journal of Logistics Management, 26 (2) https://doi.org/10.1108/IJLM-08-2012-0083
Theory-building empirical research needs formal conceptual definitions. Particularly, such definitions are necessary conditions for construct validity. But what is a “good” formal conceptual definition? In his seminal JOM paper, A Theory of Formal Conceptual Definitions: Developing Theory-building Measurement Instruments, Wacker (2004) presents eight rules for formal conceptual definitions: (1) “Definitions should be formally defined using primitive and derived terms.” (2) “Each concept should be uniquely defined.” (3) “Definitions should include only unambiguous and clear terms.” (4) “Definitions should have as few as possible terms in the conceptual definition to avoid violating the parsimony virtue of ‘good’ theory.” (5) “Definitions should be consistent within the [general academic] field.” (6) “Definitions should not make any term broader.” (7) “New hypotheses cannot be introduced in the definitions.” (8) “Statistical tests for content validity must be performed after the terms are formally defined.” These rules are explained in detail in Wacker’s article. I am convinced that Wacker’s rules lead to better measurement instruments.
Wacker, J.G. (2004). A Theory of Formal Conceptual Definitions: Developing Theory-building Measurement Instruments. Journal of Operations Management, 22 (6), 629-650 https://doi.org/10.1016/j.jom.2004.08.002
A few months ago, I presented the Handbook of Management Scales, an online collection of previously used multi-item measurement scales (see post). In a similar vein, the Journal of Business Logistics has now published a compendium of multi-item scales utilized in logistics research – a good complement to my collection. The authors, Keller et al. (2013), found that no fewer than 980 scales were used in four journals related to logistics (IJLM, IJPDLM, JBL, TJ) between 2001 and 2010. The authors deserve credit for identifying and documenting these scales in an electronic Appendix, which contains “a categorical listing of multi-item scales and the available information concerning the scales’ validity and reliability”. The Appendix is available as a Word document. One can only guess how tedious it was to prepare the compendium. In addition, the authors offer a comparison of scale categories, a comparison with previous results, and a comparison between JBL and the Journal of Marketing.
Keller, S.B., Hochard, K., Rudolph, T., & Boden, M. (2013). A Compendium of Multi-Item Scales Utilized in Logistics Research (2001–10): Progress Achieved Since Publication of the 1973–2000 Compendium. Journal of Business Logistics, 34 (2) https://doi.org/10.1111/jbl.12011
From time to time, I present insightful methodological articles on this blog. Today’s post is dedicated to an article by Edwards (2011): The fallacy of formative measurement (ORM, Vol. 14, No. 2). The article critically compares reflective and formative measurement, i.e., the two possible directions of the relationship between constructs and measures in empirical research. Reflective measurement treats “constructs as causes of measures, such that measures are reflective manifestations of underlying constructs”, whereas formative measurement specifies “measures as causes of constructs, such that measures form or induce an underlying latent variable”. The article “compares reflective and formative measurement on the basis of dimensionality, internal consistency, identification, measurement error, construct validity, and causality”. It turns out that Edwards takes a negative stance towards formative measurement. Particularly, Edwards argues that “formative measurement is not a viable alternative to reflective measurement”. Edwards’s article was among the best paper winners of Organizational Research Methods in 2011.
Edwards, J.R. (2011). The Fallacy of Formative Measurement. Organizational Research Methods, 14 (2), 370-388 https://doi.org/10.1177/1094428110378369
In their interesting article, A Tale of Three Perspectives: Examining Post Hoc Statistical Techniques for Detection and Correction of Common Method Variance, Richardson et al. (2009) define common-method variance (CMV) as “systematic error variance shared among variables measured with and introduced as a function of the same method and/or source”. Post-hoc techniques promise help when the research design does not allow data for the independent and dependent variables to be collected with different methods or from different sources. Richardson et al. evaluate (1) the correlational marker, (2) the confirmatory factor analysis (CFA) marker, and (3) the unmeasured latent method construct (ULMC) techniques. Interestingly, they find that only the CFA marker technique appears to have some practical value, but they “recommend the CFA marker technique be used only as a means for providing evidence about the presence of CMV and only when researchers can be reasonably confident they have used an ideal marker”. A good description of the CFA marker technique can be found in an article by Williams et al. (2010).
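To give a feel for the mechanics, the simplest of the three approaches – the correlational marker technique, going back to Lindell & Whitney (2001) – partials a marker correlation, taken as a proxy for CMV, out of an observed correlation. The numbers below are illustrative assumptions, and recall that Richardson et al. found this particular technique to be of little practical value; the sketch only shows how the adjustment works:

```python
def marker_adjusted_corr(r_observed, r_marker):
    """Correlational marker technique (Lindell & Whitney 2001):
    partial the marker correlation r_marker, taken as a proxy for
    common method variance, out of an observed correlation."""
    return (r_observed - r_marker) / (1.0 - r_marker)

# Illustrative values: an observed correlation of .40 between two
# self-reported constructs, and a .15 correlation of the substantive
# variables with a theoretically unrelated marker variable.
print(round(marker_adjusted_corr(0.40, 0.15), 3))  # -> 0.294
```

If the adjusted correlation remains substantial (and significant), the observed relationship is unlikely to be a pure method artifact – which is the (contested) logic of the technique.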
In 2010, I launched the Handbook of Management Scales, a collection of multi-item measurement scales previously used in the empirical management research literature; it contains numerous scales related to SCM research. The Handbook covers scales from high-ranked journals that were developed in a systematic scale development process and tested for specification, dimensionality, reliability, and validity. For each scale, at least the items and the source are listed, as well as reliability information (e.g., Cronbach’s alpha) where available. Structural equation modeling, in particular, might benefit from the Handbook of Management Scales. It is a wikibook and can be edited by anyone. Feel free to expand it by adding good scales; this can help to further develop the Handbook as a useful resource for empirical management research. Related handbooks are the Handbook of Metrics for Research in Operations Management and the Handbook of Marketing Scales.
Wieland, A. et al. (2010 ff.). Handbook of Management Scales. Wikibooks. Online: http://en.wikibooks.org/wiki/Handbook_of_Management_Scales
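The reliability coefficient mentioned above, Cronbach’s alpha, is straightforward to compute from raw item scores: it compares the sum of the item variances with the variance of the total score. A standard-library sketch, where the respondent ratings are hypothetical values:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale; each inner list
    holds one item's scores across all respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 5-point ratings from six respondents on a three-item scale.
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 2))  # -> 0.89
```

Values around .70 or higher are conventionally read as acceptable internal consistency, which is why alpha is the reliability figure most often reported alongside the scales in the Handbook.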