
Linguistic typology stands to gain significantly from advances in the use of extremely large datasets. However, our ability to secure these gains will depend on the availability of machine-readable data that is precise and comparable. Here we identify the challenges and opportunities ahead relating to the quality, longevity, and (re-)usability of linguistic data in typology. In response, we introduce the DeAR principles (Decentralized, Automatically verified, Revisable), designed to guide and assist researchers in creating diverse, high-resolution, and robust datasets. We demonstrate the DeAR principles in action through the example of Paralex, a data standard (i.e., a set of scientific conventions) developed collaboratively for lexicons of morphologically inflected forms. Our proposals aim to foster a more resilient and equitable infrastructure for the future of linguistic research.
