Recently, the bifactor model was suggested for the latent structure of the Rosenberg Self-Esteem Scale (RSES). The present paper investigates (i) the differences among the bifactor, bifactor negative, and other models; (ii) the effects of treating the data as categorical versus continuous; (iii) whether a problematic item in the Chinese RSES should be removed; and (iv) whether the final scoring would be affected. With a sample of 1,734 grade 4–6 school pupils in Hong Kong, we used BIC differences in addition to the usual model fit indices, and found strong evidence for using the bifactor model (RMSEA = .052, 90% CI [.043, .062], CFI = .992, TLI = .984 for the 9-item RSES treated as categorical). Treating the data as categorical or continuous made little difference to the fit indices, but the factor loading patterns were better in the categorical case. Keeping the problematic item had little effect on the fit indices, but led to an unexpected negative loading. The ranking of loadings within the positive and within the negative items is the same across conditions, which has important implications for scoring. Loadings on the method effects in the bifactor models are all positive (p < .001), which differs from previous research. All models give similar results for scoring, and support the use of the usual simple sum score in most practical applications.