Many notions in linguistics … will appear in a different light if they are re-established within the framework of discourse, which is language as taken up by the man who speaks, and within the condition of intersubjectivity, which alone makes linguistic communication possible.
(Benveniste, 1966, p. 266)
1. Introduction
Engagement refers to a grammatical system for encoding the relative accessibility of an entity or state of affairs to the speaker and addressee. While many linguistic elements can be deployed to express intersubjective meanings of this kind (e.g., asserting that I know something and you don’t), the possibility that grammatical systems can be built around such values – themselves fundamental to social cognition – has barely been explored and remains an open question. In Part I we introduced the notion of engagement with an initial example from Andoke, where a four-way auxiliary choice, which is a core part of the grammar and has clause-level scope, encodes the speaker’s assumptions about the accessibility of the represented proposition to speaker and/or hearer across all four logical permutations (speaker only, hearer only, both, or neither). From there we passed to a discussion of the broader question of intersubjectivity in language – not necessarily grammaticalised – and then back to the ‘primal scene’ of attentional coordination as it is played out through the use of deictics to coordinate attention to objects. We placed special emphasis on systems like Turkish or Jahai, in which attentional coordination appears to be the primary function of at least one demonstrative.
In this second part of the paper, we return to systems where the scope is the proposition or clause rather than the entity or NP. We also broaden our typological base to show that systems of engagement with clausal scope are found in several geographical hotspots – particularly the Colombian Andes and Western Amazonia, and several parts of New Guinea. In §2 we examine two systems from South America, Kogi and Kakataibo, which resemble Andoke in taking the event as a whole, rather than an individual object, as the level at which grammaticalised engagement coordinates mutual attention. In §3, we examine how engagement can interact with other knowledge-related categories, for example, by taking not just the proposition itself but the evidence for it within its scope. Having worked our way upwards in terms of scope levels, from demonstratives (entities) through basic propositions (states of affairs) to meta-propositions (certain evidentials), we show the interconnections between them in §4, by examining a language (Abui) which has coerced the rich set of speaker- vs. addressee-based contrasts in its demonstrative system into use at different grammatical levels (interclausal marker, clause-final marker); in the process it has developed a set of engagement markers from more basic deictic contrasts. We conclude in §5 by drawing together the threads of these various systems, suggesting some directions in which a more comprehensive typology of engagement can be developed in future research.
2. Engagement and states of affairs
We have already introduced one example of a language of Colombia, Andoke, with grammatical marking indicating the presumed degree of speaker and/or addressee knowledge or attention (broadly, accessibility) regarding an event, drawing on the seminal study by Landaburu (2007). We now examine in detail two further languages where engagement has scope over clauses / states of affairs. In §2.1, we turn to Kogi, an unrelated Colombian language, which organises the four-way choice of engagement values into two sets of two, defined by a contrast between speaker-perspective and addressee-perspective. In §2.2 we look at Kakataibo, which also clearly manifests contrasts between speaker-focused and addressee-focused evidence, but in a way that is structurally less neat than either Andoke or Kogi.
2.1. Epistemic marking in Kogi
Kogi (Arwako-Chibchan) has a tightly structured, paradigmatic set of epistemic markers, prefixed to an auxiliary verb, whose function is to signal the speaker’s assumptions regarding epistemic (a)symmetries between the speech participants with respect to an event (see Bergqvist, 2011, 2016). ‘Symmetry’ denotes a situation where speech participants have shared access to an event, whereas ‘asymmetry’ indicates that access is exclusive to one party. Accessibility is subject to epistemic authority, which may reside with the speaker, or the addressee (see directly below). The set of epistemic markers consists of five prefixes: na-, ni-, sha-, shi-, and ska(n)-. Footnote 1
Na- and ni- both signal that the epistemic authority rests with the speaker. Na- denotes the speaker’s exclusive access to an event, while ni- denotes shared access between the speaker and the addressee. Consider the examples in (1):
(1) a. kwisa-té na-nuk-kú
dance-impf spkr.asym-be.loc-1sg
‘I am/was dancing.’ {I am informing you} (JM_130613)
b. kwisa-té ni-nuk-kú
dance-impf spkr.sym-be.loc-1sg
‘I am/was dancing.’ {as you know / are aware} (BUN_090824)
The verb form nanukkú in (1a) is appropriate in a situation where the speaker claims epistemic authority (in this case related to performing the action in question) without assuming that the addressee is aware of or knows the event referred to. For example, it could be uttered in a situation where the addressee has just asked the speaker what they are doing in another room. Access in (1a) is thus asymmetrical. The form ninukkú, on the other hand, is appropriate when the speaker claims epistemic authority while at the same time assuming that the addressee already knows, or is aware of, the event. Thus (1b) could be uttered in a situation where the speaker is asked to do something else and replies that they can’t do this right now because of their current activity, namely dancing. Access is in this case presented as symmetrical.
The forms shi- and sha-, in contrast, pass the epistemic authority to the addressee. Sha- denotes the addressee’s exclusive access (2a), while shi- denotes shared access between the addressee and the speaker (2b):
(2) a. nas hanchibé sha-kwísa=tuk-(k)u
1sg.ind good adr.asym-dance=be.loc-1sg
‘I am dancing well.’ {don’t you think?} (BUN_090824)
b. kwisa-té shi-ba-lox
dance-impf adr.sym-2sg-be.loc
‘You are/were dancing.’ {right?} (BUN_090824)
As would be expected from ‘territory of knowledge’ considerations, vesting of the epistemic authority with the addressee frequently correlates with second person subject markers, as shown in (2b), but the distribution of the addressee-authority forms sha- and shi- is by no means restricted by the person of the subject, as shown in (2a) where the event concerns the actions of the speaker. Example (2a) could be uttered in a situation where someone learning how to dance seeks an evaluation from the instructor. By uttering the sentence in (2a), the speaker indicates that they think they are dancing well, but leaves it up to the addressee to agree or disagree. Example (2b) could be uttered in a situation where the speaker comments on the obvious activity of the addressee, but invites agreement from the addressee, who is offered the ultimate authority for the assertion. The paradigm of forms is shown in Table 1.
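The two-by-two organisation of the four prefixes described above can be made explicit as a small lookup structure. The following Python sketch is purely illustrative: the names `Authority`, `Access`, `KOGI_PREFIXES`, and `kogi_prefix` are hypothetical labels introduced here, not part of any published analysis, and the fifth prefix ska(n)- is left out because its value is not discussed in this section. The gloss strings follow the abbreviations used in examples (1) and (2).

```python
from enum import Enum
from typing import NamedTuple


class Authority(Enum):
    """Which speech participant holds epistemic authority."""
    SPEAKER = "spkr"
    ADDRESSEE = "adr"


class Access(Enum):
    """Whether access to the event is exclusive or shared."""
    ASYMMETRIC = "asym"  # exclusive to the authority holder
    SYMMETRIC = "sym"    # shared between speaker and addressee


class EngagementPrefix(NamedTuple):
    form: str
    gloss: str


# Forms as described in the text for na-, ni-, sha-, and shi-.
KOGI_PREFIXES = {
    (Authority.SPEAKER, Access.ASYMMETRIC): "na-",
    (Authority.SPEAKER, Access.SYMMETRIC): "ni-",
    (Authority.ADDRESSEE, Access.ASYMMETRIC): "sha-",
    (Authority.ADDRESSEE, Access.SYMMETRIC): "shi-",
}


def kogi_prefix(authority: Authority, access: Access) -> EngagementPrefix:
    """Return the prefix and its gloss for one cell of the paradigm."""
    form = KOGI_PREFIXES[(authority, access)]
    gloss = f"{authority.value}.{access.value}"
    return EngagementPrefix(form, gloss)


if __name__ == "__main__":
    for authority in Authority:
        for access in Access:
            print(kogi_prefix(authority, access))
```

Representing the paradigm as two independent dimensions, rather than as four unrelated forms, mirrors the claim that the system consists of two sets of two.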
There is a functional overlap between the notions of speaker- vs. addressee-authority and of sentence-type. While na-/ni- clearly occur in declarative clauses, the addressee-authority forms shi- and sha- might appear prima facie to be interrogative markers, as is suggested by the paraphrases in curly brackets (i.e., don’t you think? / right?). However, there are both grammatical and distributional reasons to analyse these as occurring in declarative clauses as well.
First, interrogative constructions can be formed without sha-/shi-, for example with a content interrogative (3a) or the interrogative marker -é (3b):
(3) a. sakí mi-k-zéi-shi Footnote 2
what 2o-dat-feel-ptcp
‘How are you?’ (DAM_090819)
b. néi ma-gu-ngu-é
go 2sg-do-pst-int
‘Did you go?’ (DAM_090820)
Second, the interrogative marker -e and the engagement prefix sha- are in complementary distribution (4): it is ungrammatical to combine the shi-/sha- prefixes with the interrogative -e. The semantic difference between -e and sha- is suggested by the translation of example (4) where ‘thinking about something’ (e.g., what to eat, or where to go) differs from ‘having an opinion about something’ (cf. (2a) above). The key difference in meaning is whether the speaker expresses his/her assumptions regarding the addressee’s thoughts and opinions, or not. In (4a), the speaker avoids making such assumptions by using -e. In (4b), on the other hand, the speaker assumes that the addressee has an opinion/thought about something and signals, at the same time, that the addressee has epistemic authority concerning what this opinion consists of. Given an otherwise identical construction, this difference in meaning must be attributed to the semantics of the individual forms, which in the case of sha- aligns with its proposed exclusive meaning (asymmetry).
(4) a. sakí hangwa-ba-lóx-e
what think-2sg-prog-int
‘What are you thinking about?’
b. sakí sha-hangwa-ba-lóx
what adr.asym-think-2sg-prog
‘What do you think (about something)?’
(BUN_090826)
The presence of the speaker’s assertion in the shi-/sha- forms is also apparent from their use in narratives. Depending on the specific setting for a narrative, an addressee-oriented stance may be adopted by marking monologic stretches of speech with either shi- or sha-. Consider the extract in (5), taken from a first person account of what life was like in the region of the Sierra Nevada de Santa Marta before the colonisers came and claimed much of the Kogi’s traditional lands.
(5) hate-kwe-ha~ Ø-izhi-hĩ dzaldzí-chi hixa aró hixa
father-pl-agt 3sg-bring-prtc non.indigenous-abl nor rice nor
aka-té Ø-to-a-kí hei-ni zeldázã
eat-prog 3sg-see-perf-neg this-loc food
‘The elders were not bringing (food) from the outsiders; not rice, nor had they seen eating (of this kind), only traditional food.’
[…]
hei-kí hei-kí shi-tu-lo-ku-ã
this-foc this-foc adr.sym-see-prog-1sg-perf
‘This, this is what I saw.’
(JM_130613)
The use of shi- in the final utterance of a longer stretch of speech serves to invite the (potentially) overlapping points of view of the speaker’s peers, who are present during the performance of the narrative. Notably, in other parts of the narrative, sha- is used interchangeably with shi- (see Bergqvist, 2016). Comparable narratives that are told to foreigners, or persons unfamiliar with the Kogi way of life, do not feature the shi-/sha- forms. Instead, they usually feature the na-/ni- forms, which, as stated, focus on the epistemic authority of the speaker.
While Kogi epistemic prefixes are frequent in discourse, they are not obligatory. Their grammatical status is also restricted in that the na-/ni-/sha-/shi- forms are mainly found in auxiliary constructions where they attach to the auxiliary head. Non-auxiliary (synthetic) verb phrases cannot directly take the epistemic prefixes. A way around this restriction is available, however, by using periphrastic auxiliaries (6):
(6) nas kwisa-nuk-ku-gé na-kla
1sg.ind dance-prog-1sg-hab spkr.asym-be
‘(Can’t you see) I am dancing!’ (ARR_120520)
Nakla is arguably not part of the verbal core, which is limited to the synthetic verb phrase (kwisanukkugé). Exactly what the functional and/or semantic difference between examples (1a) and (6) consists of remains to be explained.
The semantic scope of the prefixes includes tense, aspect, mood, and polarity. An example of how epistemic asymmetry scopes over modality is given in (7a, b). In these examples the impossibility of sleeping is modified by the ni-/na- contrast, which targets differences in epistemic symmetry:
(7) a. kaba-gasã ni-ba-kú
sleep-neg.pot spkr.sym-2sg-do
‘(Now) you can’t sleep anymore.’ (e.g., because it’s morning)
b. kaba-gasã na-ba-kú
sleep-neg.pot spkr.asym-2sg-do
‘You can’t sleep anymore.’ (e.g., because I say so, or for reasons unknown to you)
(ARR_120520)
Pragmatic interpretation effects that cannot be attributed to the encoded meaning of the forms, but which may result from their combination with certain contextual cues, include temporal displacement and attitudinal shades of meaning, such as ‘familiarity’ and ‘affection’. Both of these effects interact with time reference (see Bergqvist, 2011, 2016).
Given the non-obligatory status of the discussed forms, what motivates the use of ni-/na-/shi-/sha- and when are they omitted? While the pragmatic considerations relevant to predicting the use of these prefixes have not yet been exhaustively explored, there are some initial indications. An important determinant of the (a)symmetry markers’ distribution is purely interactional: if there is an opposing claim to the one held by the speaker, then this may be contradicted by asserting (asymmetric) epistemic authority (cf. I do like the Eagles’ first album!). Conversely, the speaker may be forced to defer authority to the addressee in order to be able to talk about certain topics at all, such as the opinions of the addressee. Drawing on a model for stance-taking that aligns the speaker’s evaluation/positioning of an event with the addressee’s evaluation/positioning of the same event (Du Bois, 2007), we see that the notion of epistemic asymmetry in Kogi is most likely to be used when an event has direct relevance for the speaker and/or the addressee. This pertains especially to events within the speech participants’ presumed ‘territory of information’ (Kamio, 1997), including ones that involve family members, expert knowledge, and personal experience. In contrast, engagement prefixes will be omitted where the speaker judges an event as inconsequential to him/herself and the addressee, for example, events involving third persons that do not require an evaluation.
2.2. Speaker- vs. addressee-perspective in Kakataibo
While there is a solid tradition for the study of speaker’s perspective (in modality and evidentiality systems, for instance), the cross-linguistic apparatus for the study of the encoding of the perspective of the addressee is currently being built.
(Zariquiey, 2015, p. 161)
Kakataibo is a Panoan language of Peru that is of special interest for the number of markers it devotes to encoding “the expectations of the speaker about the perspective of the addressee in relation to the information presented in an utterance” (Zariquiey, 2015, p. 143). Footnote 3 These markers are found both in the final affix slot on verbs, and in special slots at the end of clitic strings in clause-second position.
A primary category distinction that affects the set of addressee-sensitive grammatical choices in Kakataibo is the difference between narrative and conversational genres, reflecting differences in the accessibility of information between recounted events and the here-and-now.
In the narrative genre, verbal suffix morphology opposes -a ‘unmarked’ to -ín ‘(unexpectedly) proximal / accessible to the addressee’. The default is to use the unmarked form, since normally one talks about things not known to the addressee, but Zariquiey discusses some revealing cases where the narrative passes from information (correctly) assumed by the speaker not to be known to the addressee, to information with which the addressee is familiar. For example, in (8a) the speaker begins a text with clan information unknown to the addressee, and uses the unmarked suffix -a, but somewhat later in the text (8b) he passes to the mention of a particular man (the son of one of the three brothers referred to in (8a)). This man was a close friend of the addressee, triggering a shift to -ín. Note that, while the key addressee-accessible information is the NP este Nicolás Aguilar ‘this Nicolás Aguilar’, the addressee-proximity is marked on the head of the clause as a whole, namely the verb. This resembles the location of engagement marking in Andoke and Kogi that we discussed above. Footnote 4
(8) a. A kimisha uni i-akë-x-a tres hermanos
That three man.abs be-rem.pst-3-unm three brothers
‘Those three men were three brothers.’
b. Este Nicolás Aguilar a-x i-akë-x-ín
This Nicolás Aguilar 3pl-s be-rem.pst-3pl-prox
‘that (man) … His son was this Nicolás Aguilar’ (perhaps better rendered in English as ‘was your Nicolás Aguilar’)
As well as -ín, there is what Zariquiey (2015, pp. 154–155) calls a special second person malefactive suffix -ié. Footnote 5 This is used when reporting an event that will impact negatively on the addressee, but only when “the event is assumed by the speaker to be non-proximal from the perspective of the addressee in the sense that the information is not perceptually accessible for him or her”. An example:
(9) Goliath=n kamënë´ mi=n
Goliath=erg nar.3pl.mir you=gen
kuriki mëkamat-ié:
money.abs steal-3pl.2mal.non.prox
‘Goliath took your money.’
Within the conversational genre, addressee perspective is manifested in a different grammatical site – at the end of a string of second position clitics. As with -ié: but in opposition to -ín, the assumption of addressee ignorance attaches to these clitics, but in contradistinction to both cases there is a focus on the speaker’s (cognitive integration of) knowledge: certainty, previously established, in the case of the ‘certitudinal’, and surprise in the case of the mirative. More specifically, the =pa ‘certitudinal’ clitic is used in recounting events which the addressee wasn’t present to witness, while the =pënë ‘mirative’ “indicates that the addressee and the speaker have different perspectives or are in different places at the moment of the speech act” (Zariquiey, 2015, p. 158). For example, if the speaker discovers something about the addressee’s son, and reports it, he would use one of two forms depending on the time of the discovery. He would use pa (in the sequence riapa) if he discovered it earlier and then went to tell the addressee it is true, but pënë´ (in the sequence riapënë´) if he is seeing it at the moment of reporting, but the addressee can’t, e.g., because he is too far from where the event takes place.
While it is clear that there are a number of categories in Kakataibo relevant to the monitoring of the addressee’s presumed knowledge or access to information, the organisation of the grammar differs from Andoke or Kogi in not presenting a single organised paradigm detached from other categories. There are different grammatical strategies depending on whether the genre is narrative or conversational, leading to different locations for the addressee-oriented marker (verbal suffix vs. second position clitic). The encoding of presumed addressee non-knowledge gets melded in with second person malefactive in the case of -ié:, and within the conversational genre it is mixed up with degrees of speaker integration and ratification of knowledge. Finally, there are differences in whether the relevant markers emphasise accessibility to the addressee (-ín), against the presumed background of inaccessibility in narratives, or inaccessibility (=pa and =pënë), against the background of presumed accessibility in face-to-face interaction.
3. Engagement, evidence, and other epistemic categories
In the preceding section we have focused on the expression of accessibility and knowledge as either present or absent across speech act participants, with this mental directedness portrayed as either particular to speaker or hearer, or shared between them. However, we cannot stop there, as additional qualities of knowledge (for example, source and certainty) may also be incorporated with engagement-type values. Here we discuss some examples of how the more classic knowledge-related category of evidentiality, and to some extent those of epistemic modality and mirativity, can combine with the grammaticalised marking of engagement. In certain cases we can view these systems as metapropositional operators, where attention is coordinated not necessarily towards an event itself, but rather to the evidence for it. This represents a shift in level similar to that from entity (typically, the province of demonstratives in the noun phrase) to state of affairs (typically, the province of verbal operators in the clause), as discussed previously.
Evidentiality is conservatively defined as ‘grammaticised information source’ (Aikhenvald, 2004). Typically, evidential morphemes specify the kind of evidence that an assertion is based on, for example, whether the event was seen to happen, or is being reported from hearsay. More rarely, evidentials may take scope over a referent (e.g., stating that an entity is known about through hearsay) rather than a state of affairs (see, e.g., Aikhenvald, 2015; Gutiérrez & Matthewson, 2012; Hanks, 1990; Jacques & Lahaussois, 2014; San Roque, 2008).
For some constructions in a language that marks an event like ‘peccaries crossed the path here’ with a perceptual evidential, there is a metapropositional operator, representing the epistemic commitment of perception, which takes the basic proposition in its scope. Exactly how this epistemic commitment is best represented for individual morphemes and languages is an interesting problem – at one extreme (e.g., Fleck, 2007; Speas, 2004) are analyses that treat the epistemic commitment as a (fully tensed) proposition with an identifiable perceiver-argument (I saw that …), at the other extreme (see, e.g., San Roque, 2015) are underspecified representations that do not anchor the information source to any particular deictic centre (e.g., ‘through visual evidence’). For present purposes, our main goal is to show that these metapropositions of evidence can themselves be modulated according to the same categories of engagement that apply to propositions.
Studies of evidentiality have usually focused on the speaker as an experiencer of evidence, and it certainly seems to be the case that evidential markers tend to be used to express the speaker’s perspective. As a general rule, we make claims about our own evidence for the things we say. However, it has long been known that evidential morphology can also represent non-speaker perspectives. For example, questions typically take the evidential perspective of the addressee (Aikhenvald, 2004; San Roque, Floyd, & Norcliffe, 2017), while third person narratives may be at least partly told from the evidential perspective of a central protagonist (see examples in Brugman & Macaulay, 2015). Certain languages appear to have taken this ability to represent the evidence of others a step further, and encode not one but two evidential perspectives simultaneously: that of both the speaker and the hearer. While such systems have been described (or at least sketched) for several different languages, our understanding of them is still in its infancy, and, with some exceptions, little material is available on how such distinctions are operationalised in discourse. We limit ourselves here to outlining a few of the known contrasts, looking first at several languages that appear to make specific claims about the nature of an addressee’s evidence.
Several languages of New Guinea, including Foe (Rule, 1977), Wola (Sillitoe, 2010), and Pole (Rule, 1977), are described as encoding whether an information source is shared between speaker and hearer, or exclusive to one of them (see also San Roque & Loughnane, 2012a, 2012b). Foe (Rule, 1977) has a rich evidential system in independent clauses that distinguishes up to five information source categories (participatory, visual, non-visual sensory, inferred, assumed) across four tenses (present, near past, far past, future), three moods (indicative, customary, abilitative) and two sentence types (declarative and interrogative). These evidentials reflect a single perspective, typically that of the speaker in statements and the addressee in questions.
However, in nominalised clauses, an additional distinction is introduced into the participatory/visual evidential paradigm: whether or not the addressee witnessed the event or situation in question. Thus, Rule (1977, p. 97) describes a set of nominalisers used for a “fact known to speaker but unseen by person spoken to” as opposed to events “seen by both speaker and person spoken to” (see Table 2). Nominalisers also have special forms to indicate non-visual sensory and inferred evidence, but for these suffixes the addressee’s (presumed) perspective is not specified.
a Note the recurrent formal opposition between -ra in the first row and -bo/ba in the second. It is tempting to relate the -ba formative to the distal demonstrative free word ba in Foe; a -ba formative also occurs in other nominalisations, namely those making statements determined on grounds of present evidence. The corresponding proximal demonstrative is -to (Rule, 1977, p. 19), and the only way to relate this to -ra would be by means of some change like -to > -ro > -ra.
Examples of the contrastive far past nominalisers -ira and -bo’owa (as used in the formation of a relative clause) are shown in (10a) and (10b), respectively. While Rule does not provide details of context, we can assume that in (10a) only the speaker witnessed or was otherwise involved in the killing of the men long ago, whereas in (10b), both speaker and addressee saw the pig being killed:
(10) a. amena gahaye hü-ira bi hüyoga-bi’ae
?men previously hit/strike-fp.kts.nmz ?here bury-fp.ptcy
‘The men who were killed a long time ago, we buried here.’
(Rule, 1977, p. 97, gloss added)
b. nami davi hü-bo’owa to’ae
pig 2.days.away hit/strike-fp.ssa.nmz ?this
‘This is the pig which was killed two days ago.’
(Rule, 1977, p. 97, gloss added)
The Engan language Pole uses a special marker on main verbs when referring to past events that both the speaker and addressee saw (Rule, 1977). Another Engan language, Wola, has a more complex system of evidential contrasts in independent clauses. According to Sillitoe’s (2010) analysis, in the near and far past tenses Wola contrasts five kinds of speaker/addressee evidence:
i. both speaker and hearer witness [or participate in] Footnote 6
ii. either speaker or hearer witnesses [or participates in]
iii. hearer did not witness but heard of previously
iv. speaker did not witness
v. neither speaker nor hearer witness
Sillitoe (2010) outlines how persuasion in Wola society is only regarded as effective if the status of propositions can be epistemically upgraded, through conversation and praxis, from the ‘witnessed by speaker’ to the ‘witnessed by speaker and addressee’ categories. Understanding how this epistemic distribution interacts with evidentials, he argues, is crucial for development workers in countries like Papua New Guinea: only by understanding the operation of grammatical markers of who knows what can they establish plausibility and trust in the message they wish to convey:
[I]n parts of the Papua New Guinea highlands where the authority of the nation-state is weak to non-existent … participation (featuring bisumindis ‘we do, both parties witness’ knowledge) will be necessary if development initiatives are to have any hope. Agencies will have not only to involve people but also to demonstrate the effectiveness of their views and proposals. People will not heed what others direct as best unless they can ‘see’ – i.e. think or know – that it will work for them. They are suspicious of experts (with, at best, their biso, ‘s/he did, speaker only witnessed’, knowledge) given a propensity to question the necessary validity of others’ experience and only fully to trust in their own, paying heed to what they ‘see’ themselves. (Sillitoe, 2010, p. 26)
In the evidential systems found in the New Guinea Highlands, markers that indicate awareness of the addressee’s visual experience, or lack of it, thus appear to be especially prominent. This suggests the comparative ease of assessing whether or not an addressee was an eyewitness of some event, as opposed to more ‘hidden’ mental processes such as inference and assumption (see also San Roque et al., 2017). The Papuan language Duna (which neighbours the Engan language family) shows a spin on this tendency by including an inflection (-noko ~ -naoko) that does not make a definitive claim about the addressee’s visual experience, but suggests that he or she could have seen something that the speaker already knows about. An example is shown in (11). The (hypothetical) context is that Speaker A has asked B if they went to the market, and B has said they did, in company with another person (Mary). Speaker A finds this surprising, as she saw Mary but not Speaker B. Speaker B asserts that nevertheless they were there in plain view.
(11) A: ko no na-ke-ya, Mari no ke-o.
2sg 1sg neg-see-neg psn 1sg see-pfv
‘You I didn’t see, Mary I saw.’
B: neya=nia, no ngo-naoko.
not=assert 1sg go-pot.obs
‘No, I went (you could have seen me).’
Utterances marked with this inflection are often functionally interpreted as questions concerning what the addressee has seen (San Roque, 2008). In (12) this implicit question (‘did you see?’) is made explicit. In this hypothetical context, the speaker is relatively certain that the addressee would have walked past the burned school building in order to reach the place where they are now talking.
(12) skul-anda khira-noko, ke-o=pe.
school-encl burn-pot.obs see-pfv=q
‘The school burned, did you see it?’
In some instances the Duna ‘potential observation’ inflection thus appears to instruct the addressee to reflect on and perhaps to talk about their visual experience (see also San Roque, 2015). It may be that this is one of the important pragmatic functions of addressee-oriented visual evidentials more generally.
Outside of New Guinea, evidential systems that include a contrast between exclusive speaker knowledge as opposed to inclusive, shared knowledge have been briefly described for several languages of South America, such as Jaqaru (Hardman, 1986) and Southern Nambikuara (Kroeker, 2001). For example, according to existing analyses Southern Nambikuara distinguishes between ‘individual’ (speaker-based) and ‘collective’ (speaker + hearer-based) observation, using the suffix -na² in the first case and -ti²tu³ in the second (subject to different tense distinctions). Compare: wa³kon³na²la² ‘He worked today (I saw it, but you didn’t)’ (Kroeker, 2001, p. 63) vs. wa³kon³tait²ti²tu³wa² ‘He worked (we both saw it)’ (Lowe, 1999, p. 276). More recently, a related contrast has been discussed for the Tibeto-Burman language Kurtöp (Hyslop, 2014).
The evidential markers discussed so far are described as encoding a specific kind of information source (e.g., direct observation) that (the speaker claims) an addressee has for an event. Contrasts relevant to engagement can also be embedded within what have been analysed as evidential systems in other ways, without identifying the exact nature of the addressee’s evidence. For example, according to Willett (1991, pp. 162–165), evidentials of Southeastern Tepehuan mark (i) the information source of the speaker and, in the reported category only, (ii) whether (the speaker claims that) the proposition is old or new knowledge for the addressee. The particle sap is used for “reported evidence previously unknown to the hearer” (13), whereas sac is used for reported evidence where “the speaker reminds the hearer of information he already knows the hearer is aware of” (14). Willett notes that sac is much less frequent than sap in both conversation and folklore narratives, suggesting that it may be a situationally and pragmatically marked choice.
(13) Oidya-’-ap gu-m tat. Jimi-a’ sap para
go.with-fut-2sg art-2sg father, go-fut reu to
Vódamtam cavuimuc.
Mezquital tomorrow
‘(You should) accompany your father. He says he’s going to Mezquital tomorrow.’
(Willett, 1991, example (465))
(14) Va-jɨ́pir gu-m bí na-p sac tu-jugui-a’.
rlz-get.cold art-2sg food sub-2sg rek exp-eat-fut
‘Your food is already cold. (You said) you were going to eat.’
(Willett, 1991, example (471))
An important thing to note in the Tepehuan case is that, whereas the epistemic channel by which the speaker gained their knowledge is explicitly identified as reported, that of the addressee is unspecified. In this respect, the assessment of evidential source as between speaker and addressee is less clearly symmetric than in such examples as Foe. Rather, the assessment of addressee knowledge seems to be straying into the (embattled) territory of mirativity, the marking of knowledge as new or unexpected, as already mentioned in relation to Kakataibo, above. We are yet to note a fully-fledged grammatical system that paradigmatically distinguishes (a)symmetric combinations of mirativity and engagement (e.g., with such specifications as ‘this is news to both of us’ versus ‘this is old news for you, but news to me’). However, the potential for a language to have dedicated addressee-oriented mirative markers (‘this is news for you!’) has received more attention of late (e.g., Hengeveld & Olbertz, 2012; Mexas, 2016; see also Gossner, 1994), and an interest in the general problem goes back at least to discussions of the ‘hot news’ use of the English perfect by McCoard (1978) and McCawley (1981), of the type Malcolm X has just been assassinated. This suggests that the newness of knowledge of some state of affairs may be coded independently for speaker and hearer in some grammars.
To take a different approach again, Hintz and Hintz (2017) describe how in South Conchucos Quechua the category of ‘mutual knowledge’ between speaker and addressee actually has a dedicated marker (the morpheme -cha:) within the evidential system. The exact nature of the source for this mutual knowledge can be quite varied, so there is a focus on the end state of shared awareness, rather than on the way this knowledge was acquired. (This could even be interpreted as a non-mirative marker in relation to speaker and addressee.) They also describe the evidential system of another variety, Sihuas Quechua, where an ‘individual’ vs. ‘mutual’ contrast is available for all evidential contrasts, symmetrically organised so that -i indicates ‘individual’ and -a ‘mutual’. Summarising the interaction of evidence type and its epistemic distribution, they conclude:
[I]nformation sources for the evidential category of mutual knowledge include the contributions of conversational participants, the beliefs and assumptions of the participants when interpreting shared experiences, and what members of the speech community can be expected to know about the world. Speakers use individual knowledge evidentials to introduce information and then use mutual knowledge evidentials once the fact has been established by consensus. (Hintz & Hintz, 2017, p. 107)
The South Conchucos Quechua case shows similarities to Kogi, but in the Quechua variety this category is marked in contrast to evidential values such as ‘reported’, rather than being part of a paradigm that deals primarily with epistemic (a)symmetry.
Like Andoke and Kogi, all of the languages discussed above have developed morphemes that encode a range of epistemic configurations between speaker and hearer, but intertwined with the evidence for a proposition rather than simply for the proposition itself. Communicatively they can be used for such functions as to remind the addressee of shared knowledge and experience, to highlight the speaker’s more exclusive access to a particular event, to acknowledge or direct the addressee’s attention to relevant evidence, or to confirm the status of information as mutually known and agreed upon.
As has been extensively discussed and disputed in the literature, there is a close relationship between the semantic domains of information source and certainty, and thus between the grammatical categories of evidentiality and epistemic modality (e.g., Aikhenvald, 2004; Chafe & Nichols, 1986; Palmer, 2001). As with evidentials in Foe and other languages, languages may offer options for a speaker to encode whether an epistemic modal value (e.g., certain, probable) is assumed to be shared by the addressee.
One example of this is found in the language Yurakaré (Gipper, 2011, 2015). Yurakaré has two morphemes, =ya and =laba, both of which indicate that “the speaker considers the proposition to be possibly or probably true” (Gipper, 2015). The difference between them is that the ‘intersubjective’ =ya is used with assertions where the speaker expects the addressee to share his or her belief, whereas the ‘subjective’ =laba does not express any assumptions concerning the addressee’s state of mind. Gipper (2015) describes how this difference in meaning has consequences for the distribution of the two markers: intersubjective =ya is typically found in situations of ‘symmetric’ knowledge, where both speaker and addressee have equal access to the information upon which the judgement is based. Her findings are based on quantitative and qualitative analyses of a video corpus of approximately 5.25 hours of (mostly dyadic) conversation. An example is shown in (15), where two speakers discuss the state of the lagoon in their village.
(15) Yurakaré [160906_conv]
M: ((turns his head, chin-points to the lagoon outside))
ujmanaj tishi kadyimta (.) tajudawa=
ujwa-ma=naja tishilë
look-imp.sg=new.situation now
ka-dyimta-ø ta-kudawa
3sg.obj-subside-3sg.sbj 1pl.poss-lagoon
‘Look, the water in our lagoon has subsided.’
P: =të bij:[binta dyimta kompadre yosse]
të bij~binta dyimta-ø kompadre yosse
intj ints~strong subside-3sg.sbj compadre(sp) again
‘Yes, it has subsided very much again.’
M: [të::j] (0.7)
të
intj
‘Yes.’
P: namashtay tajudawa yosse
nama-shta-ø=ya ta-kudawa yosse
dry-fut-3sg.sbj=intsubj 1pl.poss-lagoon again
‘Probably our lagoon will dry out again.’
By contrast, the subjective marker =laba is commonly used in both symmetric and asymmetric contexts, as the addressee’s knowledge state is not at issue. An example with an asymmetric context is shown in (16), where the addressee has superior access to the information in question: the epistemic perspectives of speaker and addressee are disparate, not shared, and the intersubjective marker =ya would not be appropriate:
(16) Yurakaré [290906_convI]
A: batamlab tishil na loma alta(chi) ((gaze to addressee)) (.)
bata-m=laba tishilë naa loma alta=chi
go.fut-2sg.sbj=subj now dem Loma Alta=dir
‘You are going to Loma Alta today, I think?’
E: nijtala
nijta=la
neg=comm
‘No.’
Gipper (2015) further notes (among other findings) that =ya is used comparatively more frequently than =laba in ‘agreeing responses’, where the speaker agrees with what has just been said, and (unlike =laba) is never used in disagreeing responses. She argues that, as an intersubjective marker, =ya is highly compatible with agreeing responses because these are situations where “a shared epistemic perspective is explicitly expressed”. By the same token, =ya is not appropriate to disagreeing responses, where the epistemic perspectives of speaker and addressee are explicitly at odds.
A further example of engagement combined with epistemic modality is found in the Tibeto-Burman language Kinnauri (Saxena, 2000). In this case, the copula ni expresses contrastive values of speaker and addressee certainty, being used where the speaker is confident about what they are asserting, against the addressee’s perceived doubts. In (17), to would be used “when Sonam is either a family member of the speaker, or is presently with the speaker. Du is used when Sonam is not a family member of the speaker, nor is … in physical proximity to the speaker. Ni is used if the hearer has some doubts about Sonam being a good person and the speaker knows that she is a good person” (Saxena, 2000, p. 473). While the first two copula forms contrast different degrees of authority / epistemic access on the part of the speaker, the third form combines an authoritative positive modal assessment by the speaker with an assumption that the addressee does not share this assessment.
(17) Sonam dam to / du / ni
[proper.name] good be1:pres / be2:pres / be3:pres
‘Sonam is good.’
Overall, then, various additional qualities of knowledge (evidence, oldness/newness, certainty) can be expressed not only in regard to a single perspective, but also in regard to both speaker and hearer, and/or as a relation between them. There is no reason to assume that the expression of engagement is limited to these specific qualities; rather, we can expect that many other aspects of the mental directedness of interlocutors can be grammaticalised (§5). At the same time, however, we note that it is very unusual to find a comprehensive grammatical system of engagement and evidential (etc.) contrasts. That is, the full range of logical possibilities (e.g., speaker saw the event, hearer saw the event, both saw it, neither saw it; speaker inferred the event, hearer inferred … etc.) is rarely, if ever, morphologically differentiated within a single paradigm. This rarity of bidimensional systems may reflect the regular correlation, in most situations, between accessibility and evidence: direct access allows a direct evidential reading, whereas lack of direct access means that an assertion is founded on some form of evidence other than current mutual accessibility.
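The size of the logical space at issue can be made concrete with a minimal sketch. The Python below simply enumerates the cells of a hypothetical bidimensional paradigm; the labels in ACCESS and EVIDENCE are illustrative assumptions chosen for exposition, not an analysis of any language discussed here.

```python
from itertools import product

# Hypothetical labels, assumed only for illustration.
# Who is presented as holding the evidence:
ACCESS = ("speaker only", "addressee only", "both", "neither")
# A deliberately small, assumed inventory of evidence types:
EVIDENCE = ("direct", "inferred", "reported")


def paradigm_cells(access=ACCESS, evidence=EVIDENCE):
    """Return every (access, evidence) pairing as a list of tuples."""
    return list(product(access, evidence))


cells = paradigm_cells()
for holder, source in cells:
    print(f"{source} evidence, held by: {holder}")
print(len(cells))  # 4 access values x 3 evidence types = 12 cells
```

Even with these small assumed inventories the cross-product yields twelve cells, which gives a sense of how much a fully differentiated paradigm would have to distinguish morphologically.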
4. Engagement, level-shifting, and diachrony
Part of our rationale in progressing from demonstratives through engagement markers operating at clausal level, and on to markers with evidence/certainty in their scope, has been that the same processes of mutual coordination are at work, whatever the level in terms of syntactic or semantic structures. Up to this point, however, we have not examined languages where this connection is made clear. But we now pass to a Papuan language, Abui (Kratochvil, 2011a, 2011b), which illustrates the connections remarkably clearly thanks to the way it deploys its demonstratives with various levels of syntactic scope – a way somewhat reminiscent of how some Australian languages deploy case-suffixes at various syntactic levels (embedded NP, NP at clause level, embedded clause) with differential semantic effects; see also Schapper and San Roque (2011) concerning clause-level demonstratives in other Timor-Alor-Pantar languages.
We have already surveyed, in §5 of Part I, an interesting system of basic demonstratives in Abui, which recombines the proximal vs. medial distinction with both speaker and addressee anchor-points. In doing so, the language draws on two sets, a ‘basic’ set which most commonly functions adnominally and situates individual entities, and an ‘adverbial’ set which situates states of affairs more generally and has meanings like ‘be here’, ‘be there near you’, etc. (though they are not true verbs in the sense of being able to be used alone).
We will now see that, by applying these demonstratives with sentential scope, a range of engagement-type meanings can be coerced. Note that the engagement-related meanings are only a subset of the very rich range of metaphorical extensions found with the Abui demonstrative system – others, which we do not discuss, include their use to indicate tense and various kinds of modality.
Both basic and adverbial Abui demonstratives can be used in ways that are relevant to engagement. From the adverbial set (shown in the right half of Table 3), “the addressee-based forms are used when the speaker wants to evaluate or interact with addressee’s perspective” (Kratochvil, 2011a, p. 8). For example, say the addressee and the speaker are sitting in a traditional house with a leaking thatched roof. The speaker inquires whether the addressee is affected by the rain (there are no windows and it’s dark inside). Since they are together, he may simply say (18a). However, it is also possible to say (18b), using the addressee-proximal form ta to specifically invite the addressee’s assessment of the quality of the thatched roof above where the addressee is seated.
(18) a. anui ma o-pa=ng sei?
rain be.sp.prx 2sg.recipi-touch.ipf=see come.down.cont
‘Is it raining on you here?’
b. anui ta o-pa=ng sei?
rain be.ad.prx 2sg.recipi-touch.ipf=see come.down.cont
‘Is it raining on you here (where you are)?’
The addressee-based medial fa is used to indicate non-proximate location with respect to the addressee. Typically, this occurs when “the speaker wants to stress that the addressee is in another place or not aware of the location of an event or participant” (Kratochvil, 2011a, p. 9). For example, in performing a ‘matching task’, the speaker may be describing a picture to the addressee, so that he can match the description to a picture in the set he was given. Here “the speaker uses fa to locate two balls on the picture that the addressee is unable to see”:
(19) kaan-r-i, bal do fa ayoku
good.cpl=reach-pfv ball sp.prx be.ad.med be.two
‘right, there are two balls there’
[perhaps a closer translation would be ‘right, these balls (i.e., “these” on my picture) there’s (a picture) there (on your side) (where) there are two (of them)’]
At a higher syntactic and semantic level, members from the basic set can be placed in sentence-final position to index the distribution and extent of knowledge among speech-act participants. Speaker-proximal do can stress the speaker’s foundation for his assertion in immediate experience (20):
(20) na nala nee-ti beeka do
1sg.a something.eat-phsl bad cannot sp.prx
‘I couldn’t eat up (swallow) anything.’
In questions, the addressee-based medial form can be used to appeal to what they may know of a situation, while the addressee-proximate form, if used with exclamatory force, can indicate that the question is redundant and that the information should be available to the addressee, thus functioning as a reproach – invoking both a type of evidence (perception) and a judgment about what the addressee could vs. did perceive. This is reminiscent of the Duna -noko suffix discussed above.
(21) A: mangmat, ma e-ya yo ?
foster.child sp.prx 2sg-mother ad.med
‘Child, what about your mother?’
B: ni-ya ha-rik to !
1pl.exc-mother 3patient-hurt ad.prx
‘My mother is sick (as you could see).’
The addressee-medial form, likewise, may be used in sentence-final position in a reproachful way – in this context, “the speaker stresses that the addressee knew about the funeral and yet failed to attend” (Kratochvil, 2011b, p. 773).
(22) pi yaar-i ni-ya do nabuk yo
1pl.inc go-pfv 1pl.exc-mother sp.prx bury ad.med
‘We went to bury our mother (as you could have known).’
The essence of the Abui system of recycling demonstratives is thus to shift their function upward, from coordinating attention to objects, in their basic use, to coordinating attention to states of affairs and their epistemic status, in the extended uses we have discussed (examples (20)–(22)). It is not unreasonable to see the unusual starting point of the basic system – which separates the proximal vs. medial contrast from that between speaker and addressee anchor-point – as providing an ideal semantic affordance for the extension into the more general management of epistemic gradients between speaker and hearer. Footnote 7
In the case of Abui, the demonstratives remain as separate words even as their function and syntactic position shifts to higher scopes. However, an interesting case where original demonstratives turn into verbal prefixes encoding semantic values of engagement is found in Marind, a language of Southern New Guinea (Olsson, 2016). In the present tense, Marind features two sets of verbal subject prefix complexes, encoding person, number, gender, and a category Olsson terms ‘absconditive’ (< Latin absconditus ‘hidden, concealed’), which are “used to establish joint attention, by instructing the Adr to ‘align’ her attention with Spr’s, and thereby get access to previously unavailable information” (p. 3). Summarising Olsson (2016), the two main circumstances in which absconditive-series prefixes are used are when the speaker:
(i) “wants to draw attention to something outside Adr’s visual focus” (p. 1), e.g. when a speaker tells a child’s mother that the child’s nose is snotty, something the mother cannot see because the child is sitting on her lap, and
(ii) wants to “‘update common ground’ by denying Adr’s presuppositions” (p. 1), e.g. when someone tells an old woman that she should be talking Marind to the linguist so that he can learn, and the woman retorts that she is indeed doing that, using the absconditive in a way that would be translated into English as ‘I AM talking to him’ or ‘BUT I AM SO talking to him’.
What is relevant to our argument here is that the forms of absconditive prefixes can be broken down into two parts: an initial gender element, and a second deictic element identical in form to demonstratives. Interestingly, the use of the absconditive can be triggered either by the addressee’s (non-)attention to an entity, or to a state of affairs; it appears that the proximate vs. distal semantics of the deictic element is primarily exploited when the location of an entity is involved. Where the focus is on a state of affairs, the one example given by Olsson employs the form derived from the distal form.
The Marind absconditive is thus intriguingly parallel to the level-shifting trajectory we saw for Abui, but in a way takes it further by grammaticalising the deictic elements into actual prefixes on the verb. In doing this, it illustrates one grammaticalisation path by which verbs can evolve engagement morphology. What these two languages clearly demonstrate is the logical link between achieving mutual attention to objects in the here-and-now, and the more abstract job of producing convergence of epistemic positioning between speaker and addressee.
5. Conclusion
We have tried to shatter the illusion that definite reference is simple and self-evident by demonstrating how it requires mutual knowledge, which complicates matters enormously. But virtually every other aspect of meaning and reference also requires mutual knowledge, which also is at the very heart of the notion of linguistic convention and speaker meaning. Mutual knowledge is an issue we cannot avoid. It is likely to complicate matters for some time to come.
(Clark & Marshall, 1981, p. 58)
The languages we have surveyed illustrate the proposition with which we began this paper: that it is possible for languages to place epistemic coordination systems right in the heart of their grammars. Languages like Andoke and Kogi have paradigmatically structured categories that show the speaker’s epistemic access, and their assessment of that of the addressee, as potentially independent variables to be monitored and appealed to as conversation unfolds. Such languages thus place, at the core of the grammatical system, the central role of dialogue as an ongoing transaction in which mutual attention and knowledge is closely monitored and repeatedly recalibrated.
The grammaticalisation of epistemic assessment is not virgin territory to linguistic investigation. There are long-standing traditions for investigating the modelling and updating of mutual knowledge that is needed to successfully use a system of definite articles (see, e.g., Clark & Marshall, 1981; Epstein, 1997; Verhagen, 1986) and discourse particles of an epistemic nature (e.g., Enfield, Brown, & de Ruiter, 2013; Hayano, 2012; Simon-Vandenbergen & Aijmer, 2007; Verhagen, 2005, ch. 4). There has also been a growing number of studies illustrating the ways in which speaker assessment of addressee attention can be built into demonstrative systems, as we illustrated in §5 of Part I. What has remained unclear, however, has been the way that comparable intersubjective assessment can be integrated into grammatical paradigms with scope over clauses or propositions, or even potentially evidential qualification (§3), depending on whether we characterise scope syntactically or semantically.
As in many other areas of typology, it is useful to set up canonical cases as clear conceptual reference points (cf. Brown, Chumakina, & Corbett, 2013). The systems found in Andoke (Part I, §2) and Kogi (this Part, §2.1) demonstrate with particular clarity what a canonical system of engagement with clausal scope looks like, because of the symmetry with which they independently assign positive and negative epistemic values to speaker and addressee.
On the other hand, we also find languages that exhibit only some of the characteristics of canonical engagement paradigms – just as we find departures from semantic purity in virtually every grammatical category, e.g., the much better-known dimension of tense, with its cross-linguistically variable differences in degree of structuration, from neat paradigms to relatively unintegrated free words, strung out along a grammaticalisation trajectory including more heterogeneous options such as systems that mix in periphrasis. Kakataibo (§2.2) was presented as an example in which engagement is grammaticalised in a less canonical way: it includes a number of values, on verbal inflections and second position clitics, that correspond to key values in canonical engagement systems, but compared to Andoke and Kogi they are less integrated into a single, symmetric paradigm.
The same point about variability in canonicity may be made in terms of grammaticalisation, since the emergence of one category (here: engagement) from another (e.g., demonstratives) is typically marked by phenomena exhibiting transitional or mixed status. An interesting example of this is the grammaticalisation of engagement examined for Abui in §4, which lifts the speaker vs. addressee × proximal vs. medial contrast found in its basic demonstratives and reapplies it at clausal level to produce an engagement system with propositional scope, though one in which the relevant markers (demonstratives in sentence-final position) remain transparently multifunctional without becoming a specialised grammatical system as they are in Andoke and Kogi. While the Abui case provides a good example of engagement categories appearing to have been recruited from demonstratives, it is unlikely that this is the only diachronic source: other candidates include time adverbials in Lakandon Maya Footnote 8 (Bergqvist, 2008, in press), pronominal clitics in Jaminjung/Ngaliwurru (Schultze-Berndt, 2017), Footnote 9 and “ethical datives”, also called ‘non-selected arguments’ (Bergqvist & Kittilä, 2017; cf. Bosse, Bruening, & Masahiro, 2012).
We have taken the canonical case of engagement as a grammatical system for encoding the relative mental directedness of speaker and addressee towards an entity or state of affairs – thus allowing knowledge and attention (etc.) to be tracked and dynamically updated in discourse. This leads naturally to the question of what the full set of typological dimensions is. In this paper we have focused on two: the set of permutations of epistemic authority across the speaker and addressee, and the semantic and syntactic level at which this applies, namely (i) deictically indicated entity (demonstratives), (ii) state-of-affairs/proposition/clause, and (iii) metaproposition in the case of certain evidentials. But other syntactic levels and semantic dimensions may also prove relevant.
One promising dimension for future investigation concerns the interaction of engagement values with tense/time. In other words, is the monitoring of relative epistemic authority/directedness confined to the here-and-now, or can it be displaced? For example, work by Fleck (2007) on the Peruvian language Matses has shown that the psychological event of inferring an action from evidence can be located in time independently of the speech event or the reported event (e.g., recently or long ago, I may have seen the tracks of a peccary that crossed the path; and that path-crossing may have been immediately prior to or a long time before I saw the tracks, generating a four-way system of tensed evidentials in Matses).
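The four-way arrangement just described can be sketched as the cross-product of two independent temporal parameters. The Python below is only an expository sketch: the value labels paraphrase the peccary example above and are assumptions, not Matses morphemes or Fleck's own terminology.

```python
from itertools import product

# Illustrative labels paraphrasing the example in the text (assumed for
# exposition; they are not Matses forms or Fleck's category names).
# When the speaker detected the evidence, relative to the speech event:
INFERENCE_TIME = ("recently", "long ago")
# How long before that detection the reported event took place:
EVENT_BEFORE_INFERENCE = ("immediately prior", "a long time before")


def tensed_evidential_cells():
    """Pair each inference time with each event-to-inference interval."""
    return [
        {"inference": inference, "event": event}
        for inference, event in product(INFERENCE_TIME, EVENT_BEFORE_INFERENCE)
    ]


for cell in tensed_evidential_cells():
    print(f"evidence seen {cell['inference']}; event {cell['event']} the sighting")
```

Treating the two intervals as separate parameters is what yields four combinations from two binary choices; a system that fixed the inference at the moment of speech would collapse this to a single dimension.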
A priori, we may expect the object of presumed attention or knowledge to be likewise locatable in time. We have already seen a hint of this in the contrast between Kakataibo -riapa and -riapënë, where in both cases the addressee is presumed to be unaware of what the speaker reports, but the speaker has discovered the fact at different times – earlier in the case of -riapa, vs. at the moment of speech in the case of -riapënë.
A second dimension for future investigation concerns the type of mental disposition involved. In our discussion throughout this paper we have focused on attention and knowledge. But the very rich literature on epistemic clitics and discourse markers focuses on other cognitive dispositions – particularly belief and expectation – and there is a long tradition of investigating their use as key argumentative resources to manage and overcome divergences in the belief states of speaker and addressee in the unfolding discourse, such as Foolen (2003) on Dutch toch, Hayano (2011, 2012) for Japanese yo, Schwenter (1996) on Spanish independent si clauses, Wilkins (1986) for several Mparntwe Arrernte particles, Sekiguchi (1977 [1939]) for German doch and Leiss (2012) for the German particle ja, and Matthews and Yip (2011) on a variety of Cantonese particles. Evaluative attitude – like and dislike with respect to the event – as well as emotional disposition such as fear (in categories like the apprehensive) are also relevant parameters worth exploring.
In many well-known cases, such as German doch and Italian mica (Cinque, 1991; Visconti, 2009), there is a statistical bias towards an interpretation where the speaker asserts a state of affairs to hold, against a contrary belief imputed to the addressee (Er ist doch hier! ‘But he IS here!’, Non è mica freddo! ‘But it’s not cold at all’). But this alignment is not a necessary one, and it is also possible to use these particles in circumstances where the particle signals that it was the prior expectations of the speaker him or herself which turn out to be incorrect. It will now be interesting to revisit the study of particles from the perspective of more tightly structured systems of grammaticalised engagement marking, focusing on the extent to which they form tightly integrated systems patterning with the dimensions we have presented here.
Determiners of noun phrases are a third obvious dimension for the investigation of engagement, and indeed, as seen in our quote from Hawkins in Part I, §4, Footnote 10 to use the determiner system in English or similar languages the speaker “must constantly take into consideration knowledge of various kinds which he assumes his hearer to have”. We also know that determiners can “escape from the noun phrase” (Epps, 2009, p. 87) to take scope over a clause as stance markers, like the Abui demonstratives (see also Yap, Grunow-Hårsta, & Wrona, 2011). Now that we know about the sort of four-way set of epistemic contrasts we saw earlier for Andoke and Kogi, we can ask whether such rich systems are also found in determiner systems. We can see that the English indefinite article is ambiguous between readings where the referent is not identifiable to the speaker or the addressee, vs. identifiable to the speaker but not the addressee – as with the ambiguity of ‘A man was waiting outside your door at 6 am’ between a situation where I know who the man is (following with ‘It was your brother.’), and one where I don’t know either. Some of this ambiguity can be removed with less grammaticalised means, such as ‘a certain’, as in ‘a certain colleague of mine always reacts that way’. In languages like Russian one can draw on words like nekto, to give expressions like nekto čelovek ‘a certain man, whose identity I know, but who I assume you don’t’ (cf. explication in Wierzbicka, 1980, p. 326).
We close this paper with a final unanswered question. The sorts of epistemic management mechanisms we have illustrated, in the pointedly grammaticalised forms we have been calling engagement, have been widely investigated in the conversation analysis literature, but in the languages examined there the formal coding is much more diffuse – involving prosody, gesture, tactical restatement, or the use of epistemic particles or adverbials like well or actually. What difference does this semiotic investment make? For example, are speakers of languages with engagement markers dragooned into monitoring relative epistemic state much more frequently, even obligatorily? (A related issue, which current descriptions don’t fully resolve, is how far the marking of engagement is obligatory as opposed to strongly encouraged.) Alternatively, could the effects go the other way, with the smaller palette of a grammaticalised system offering fewer alternate ways of organising the task of epistemic management? Or could it simply make no difference – the epistemic management tasks go on being handled just the same, whether there is a grammaticalised system of engagement or not? As a next step in the research, we need studies of naturalistic conversation, closely analysed for the attentional states of both parties, across a sample of languages that includes some with canonical engagement systems. Only then can we understand the full import of these fascinating linguistic systems for the interface between grammar, intersubjectivity, and the management of interaction.
Abbreviations
1: first person, 2: second person, 3: third person, A: Actor, ABL: ablative, ABS: absolutive, AD.PRX: addressee proximal, ADR: addressee, AGT: agent, ART: article, ASSERT: assertion, ASYM: asymmetric, BENA: benefactive type A = unknown to beneficiary, BENB: benefactive type B = known to beneficiary, COMM: commitment, CONT: continuative, DAT: dative, DEF: definitive, DEM: demonstrative, DIR: directional, ENCL: enclosure, ERG: ergative, EXC: exclusive, EXP: expected knowledge, FOC: focus, FP: far past, FUT: future, GEN: genitive, HAB: habitual, IMP: imperative, IMPF: imperfective, INC: inclusive, IND: independent, INT: interrogative, INTJ: interjection, INTS: intensifier, INTSUBJ: intersubjective, IPF: imperfective, KTS: known to speaker, LOC: locative, MAL: malefactive, MIR: mirative, NAR: negative for addressee, NEG: negative, NMZ: nominaliser, O: object, OBJ: object, PERF: perfective, PFV: perfective, PHSL: phasal (roughly: ‘after’), PL: plural, POSS: possessor, POT: potential, POT.OBS: potential observation, PRES: present, PROG: progressive, PROX: proximal/proximate, PSN: personal name, PST: past, PTCP: participial, PTCY: participatory evidential, Q: question, RECIPI: recipient, REK: reported evidence known, REM: remote, REU: reported evidence previously unknown, RLZ: realization, S: intransitive subject, SBJ: subject, SG: singular, SPKR: speaker, SP.PRX: speaker proximal, SSA: seen by speaker and addressee, SUB: subjunctive, SUBJ: subjective, SYM: symmetric, UNM: unmarked.