Dylan Glynn, Université Paris 8

Statistics is arguably the cornerstone of all empirical science. Contemporary science could not exist without it. It lets us visualise complex and often subtle interactions in our data, and it lets us estimate how likely it is that our observations and results reflect the reality of the world we seek to explain. Especially important is the ability to model our data in ways that let us make predictions, so that we can test our hypotheses and measure the accuracy of our descriptions.

The program R is the standard tool for performing statistics in corpus linguistics. Open source and cross-platform, it is ideally suited to the kind of work we do as social and cognitive scientists. However, the afternoon session will not be a course in R, but rather a course in what you can use R for. No knowledge of mathematics or programming is required.

1. Introduction - Categorical data and R
             a. Fundamentals
                          Population and sample - why we can't use percentages
                          Significance and confidence intervals - how to test and generalise
                          Patterns and predictions - why use statistics
             b. R - open source, cross-platform and cool :)
                          Data - clean and ordered
                          Visualising your counts - beyond pie charts
                          Chi-squared test - a first step in significance testing

2. Correspondence Analysis
             a. Associations and identifying patterns in complex data
             b. (coming soon)

3. Cluster Analysis
             a. Sorting and identifying structures in complex data
             b. (coming soon)

4. Binary logistic regression
             a. Fixed Effects
             b. Mixed Effects