Machine learning has exhibited substantial success in the field of natural language processing (NLP). For example, large language models have empirically proven capable of producing text of high complexity and cohesion. However, at the same time, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety-critical contexts where such models must be robust to variability or attack and give guarantees over their output. Computer vision pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms in their own right, these papers have not yet crystallised into a common methodology. They are often light on the pragmatic issues of NLP verification, and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerges from the progress in the field to date. Our contributions are twofold. First, we propose a general methodology to analyse the effect of the embedding gap – the discrepancy between the verification of geometric subspaces and the semantic meaning of the sentences those subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Second, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.
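To make the embedding-gap analysis concrete, here is a minimal Python sketch of one way such a gap could be probed; it is our illustration rather than the paper's pipeline, and the sentence-transformers model name, the example sentences and the radius eps are all hypothetical choices.

```python
# A minimal sketch (not the paper's pipeline) of probing the embedding gap:
# embed sentence perturbations and check whether semantically close variants
# also fall inside the geometric ball a verifier would certify around the
# original embedding. Model name, sentences and eps are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

original = "The patient should take two tablets daily."
perturbations = [
    "The patient should take 2 tablets every day.",   # paraphrase
    "The patient should take two tablets hourly.",    # meaning change
]

emb_orig = model.encode(original, normalize_embeddings=True)
emb_pert = model.encode(perturbations, normalize_embeddings=True)

eps = 0.3  # hypothetical radius of a verified ball around emb_orig
for sent, emb in zip(perturbations, emb_pert):
    l2_dist = np.linalg.norm(emb - emb_orig)   # geometric proximity
    cos_sim = float(np.dot(emb, emb_orig))     # semantic proximity proxy
    print(f"{sent!r}: L2={l2_dist:.3f}, cos={cos_sim:.3f}, inside ball={l2_dist <= eps}")

# A large embedding gap shows up as paraphrases falling outside the ball
# (missed coverage) or meaning-changing edits falling inside it (unsound coverage).
```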
We present PCFTL (Probabilistic CounterFactual Temporal Logic), a new probabilistic temporal logic for the verification of Markov Decision Processes (MDP). PCFTL introduces operators for causal inference, allowing us to express interventional and counterfactual queries. Given a path formula φ, an interventional property is concerned with the satisfaction probability of φ if we apply a particular change I to the MDP (e.g., switching to a different policy); a counterfactual formula allows us to compute, given an observed MDP path τ, what the outcome of φ would have been had we applied I in the past and under the same random factors that led to observing τ. Our approach represents a departure from existing probabilistic temporal logics that do not support such counterfactual reasoning. From a syntactic viewpoint, we introduce a counterfactual operator that subsumes both interventional and counterfactual probabilities as well as the traditional probabilistic operator. This makes our logic strictly more expressive than PCTL⋆. The semantics of PCFTL rely on a structural causal model translation of the MDP, which provides a representation amenable to counterfactual inference. We evaluate PCFTL in the context of safe reinforcement learning using a benchmark of grid-world models.
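For readers unfamiliar with causal queries, the following sketch in standard causal-inference notation (not the paper's concrete PCFTL syntax) distinguishes the three kinds of probabilities mentioned above, for a path formula φ, a change I and an observed path τ.

```latex
% Standard causal-inference notation, used here only as an illustration of
% the three query types; it is not PCFTL's own syntax.
\begin{align*}
  &\Pr(\varphi)
    && \text{observational: the usual PCTL}^{\star}\text{-style probability,}\\
  &\Pr\bigl(\varphi \mid \mathrm{do}(I)\bigr)
    && \text{interventional: probability of } \varphi \text{ after applying the change } I,\\
  &\Pr\bigl(\varphi_{I} \mid \tau\bigr)
    && \text{counterfactual: } \varphi \text{ under } I\text{, holding fixed the random factors behind } \tau.
\end{align*}
```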
The ‘linguistic turn’ in twentieth-century philosophy is reflected through Neurath’s writings of his British period. He responded to serious criticism that Bertrand Russell made in his book An Inquiry into Meaning and Truth, developing the physicalism of the Vienna Circle into a cautious approach to ‘terminology’. Neurath revealed details of his index verborum prohibitorum, a list of ‘dangerous’ words to be avoided due to their misleading and metaphysical connotations. However, Neurath was resistant to the formalist tendencies evident in the work of Vienna Circle associates, in particular Carnap’s development of semantics. Their disagreement on the matter is examined through their prolific correspondence of the 1940s. While Neurath is often portrayed as losing this battle, we discuss how his own approach to the philosophy of language (including his ‘terminology’ project) prefigured the later development of ‘ordinary language philosophy’ to a certain extent.
We present a practical verification method for safety analysis of the autonomous driving system (ADS). The main idea is to build a surrogate model that quantitatively depicts the behavior of an ADS in the specified traffic scenario. The safety properties proved in the resulting surrogate model apply to the original ADS with a probabilistic guarantee. Given the complexity of a traffic scenario in autonomous driving, our approach further partitions the parameter space of a traffic scenario for the ADS into safe sub-spaces with varying levels of guarantees and unsafe sub-spaces with confirmed counter-examples. Innovatively, the partitioning is based on a branching algorithm that features explainable AI methods. We demonstrate the utility of the proposed approach by evaluating safety properties on the state-of-the-art ADS Interfuser, with a variety of simulated traffic scenarios, and we show that our approach and existing ADS testing work complement each other. We certify five safe scenarios from the verification results and uncover three subtle behavioral discrepancies in Interfuser that can hardly be detected by safety testing approaches.
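As a rough illustration of surrogate-based partitioning (not the paper's algorithm, which uses explainable-AI-guided branching on a real ADS), the following Python sketch fits a cheap surrogate of a safety metric over a two-dimensional scenario space and labels parameter boxes as safe, unsafe, or in need of further branching; the simulator, metric and thresholds are hypothetical.

```python
# A toy sketch of surrogate-based scenario partitioning. The "simulator",
# the safety metric and all thresholds are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def simulate_min_distance(speed, gap):
    # stand-in for an expensive ADS simulation: minimum distance to the lead vehicle
    return gap - 0.08 * speed**1.5 + np.random.normal(0, 0.1)

rng = np.random.default_rng(0)
X = rng.uniform([0, 5], [30, 60], size=(500, 2))            # (ego speed, initial gap)
y = np.array([simulate_min_distance(s, g) for s, g in X])
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def label_box(lo, hi, threshold=2.0, n=256):
    # sample the box, query the surrogate, and re-simulate the worst candidates
    pts = rng.uniform(lo, hi, size=(n, 2))
    preds = surrogate.predict(pts)
    if preds.min() > threshold:
        return "safe (up to surrogate error)"
    if any(simulate_min_distance(s, g) <= threshold for s, g in pts[preds.argsort()[:5]]):
        return "unsafe (counterexample found)"
    return "undecided: branch further"

print(label_box(np.array([0, 40]), np.array([10, 60])))     # slow ego, large gap
print(label_box(np.array([25, 5]), np.array([30, 15])))     # fast ego, small gap
```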
Climate finance remains a relatively small part of the global finance market but is becoming increasingly prominent; it is expected that all global finance will take on climate characteristics, causing climate standards and verification to become common. This chapter explores the current climate finance markets and the standards and other market infrastructure that have developed.
Delegated computation is a two-party task where there is a large asymmetry between the two parties: on the one hand, Alice would like to execute a quantum computation, but she does not have a powerful enough quantum computer to execute it. On the other hand, Bob has a quantum computer, but he is not trusted by Alice. Can Alice make sure that Bob executes her computation correctly for her? In this chapter we present three very different approaches to this problem. Each of the approaches is based on a different model for quantum computation, and the chapter also serves as an introduction to these models.
In previous work, summarized in this paper, we proposed an operation of parallel composition for rewriting-logic theories, allowing compositional specification of systems and reusability of components. The present paper focuses on compositional verification. We show how the assume/guarantee technique can be transposed to our setting by giving appropriate definitions of satisfaction based on transition structures and path semantics. We also show that simulation and equational abstraction can be done componentwise. Appropriate concepts of fairness and deadlock for our composition operation are discussed, as they affect satisfaction of temporal formulas. We maintain, in parallel, a distributed and a global view of composed systems. We show that these views are equivalent and interchangeable, which may aid intuition and also has practical uses: for example, it allows global-style verification of a modularly specified system. Under consideration in Theory and Practice of Logic Programming (TPLP).
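For reference, the classical assume/guarantee proof rule reads as follows in its standard transition-system form; the paper transposes this idea to rewriting-logic theories, so the notation below is illustrative rather than the paper's own.

```latex
% The classical assume/guarantee rule (standard transition-system form).
% \langle A \rangle M \langle G \rangle reads "whenever its environment
% satisfies the assumption A, component M satisfies the guarantee G".
\[
  \frac{\langle A \rangle\, M_1\, \langle G \rangle
        \qquad
        \langle \mathit{true} \rangle\, M_2\, \langle A \rangle}
       {\langle \mathit{true} \rangle\, M_1 \parallel M_2\, \langle G \rangle}
\]
```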
Here is a brief introduction to Ayer's radical criticism of religious belief. According to Ayer, a sentence like ‘God exists’ doesn't assert something false; rather, it fails to assert anything at all.
Verification and falsification are standard techniques for the evaluation of truth claims. Both are problematic because they rest on shifting understandings of these concepts and their operationalization. Science as a practice offers an alternative and more sophisticated approach to assessment.
From the earliest days of the nuclear age, the issue of verification has plagued efforts to restrain the development, testing, and deployment of nuclear weapons and to ensure their destruction. It continues to do so. Especially sensitive are on-site inspections, but they have proved their worth in disarmament treaties since the 1980s and the last years of the Cold War. This chapter looks at verification thematically, by reference to testing, non-proliferation, disarmament, and deployment of nuclear weapons.
Agent-based social simulations have historically been evaluated using two criteria: verification and validation. This article questions the adequacy of this dual evaluation scheme. It claims that the scheme does not conform to everyday practices of evaluation, and has, over time, fostered a theory-practice gap in the assessment of social simulations. This gap originates because the dual evaluation scheme, inherited from computer science and software engineering, on one hand, overemphasizes the technical and formal aspects of the implementation process and, on the other hand, misrepresents the connection between the conceptual and the computational model. The mismatch between evaluation theory and practice, it is suggested, might be overcome if practitioners of agent-based social simulation adopt a single criterion evaluation scheme in which: i) the technical/formal issues of the implementation process are tackled as a matter of debugging or instrument calibration, and ii) the epistemological issues surrounding the connection between conceptual and computational models are addressed as a matter of validation.
The success of any arms control treaty generally depends on its ability to achieve its primary objectives and intended outcomes. At the heart of measuring such success are effective compliance criteria and verification mechanisms. This includes the ability to apply metrics to assess tangible outcomes and measurable outputs and benchmarks of achievement, including on-site visits. In relation to nuclear issues, this also means verification of both the non-diversion of nuclear material from declared peaceful activities (i.e., correctness of conduct) and the absence of undeclared or clandestine nuclear activities in a particular state (i.e., completeness in following treaty terms).
Nothing about developing and implementing a treaty on the prohibition of nuclear weapons is easy. While supporters of the TPNW undoubtedly claim a victory in its coming into being, its opponents note its shortcomings, warning of adverse and dire consequences. The degree to which such concerns will materialize remains to be seen. What is certain, however, is that the adoption of the TPNW has marked the beginning of a new schism in the international community. The word schism is appropriate in this context, loosely defined as “a split or division between strongly opposed sections or parties, caused by differences in opinion or belief.”
The Treaty on the Prohibition of Nuclear Weapons 2017 marks an important development in nuclear arms control law, diplomacy and relations between states. Adopted by the UN General Assembly on July 7, 2017, it was supported by 122 nations, representing a potential disruptor to the nuclear status quo. It is the first treaty to ban nuclear weapons outright, taking a clear humanitarian approach to disarmament. Despite its success in coming to fruition, however, it is not celebrated by all nations. The permanent members of the UN Security Council neither participated in its negotiations, nor adopted the final text. No state with nuclear weapons endorses the Treaty, and indeed the nuclear-armed states openly oppose its very existence.
Concolic testing is a popular software verification technique based on a combination of concrete and symbolic execution. Its main focus is finding bugs and generating test cases with the aim of maximizing code coverage. A previous approach to concolic testing in logic programming was not sound because it only dealt with positive constraints (by means of substitutions) but could not represent negative constraints. In this paper, we present a novel framework for concolic testing of CLP programs that generalizes the previous technique. In the CLP setting, one can represent both positive and negative constraints in a natural way, thus giving rise to a sound and (potentially) more efficient technique. Defining verification and testing techniques for CLP programs is increasingly relevant since this framework is becoming popular as an intermediate representation to analyze programs written in other programming paradigms.
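The core concolic loop, and the role of negative constraints in it, can be illustrated with a small Python sketch using the z3 solver; this is a generic illustration of concolic testing, not the paper's CLP framework, and the program under test is hypothetical.

```python
# A minimal concolic-testing sketch: run on a concrete input, collect the
# symbolic path constraint, negate it, and ask the solver for an input that
# drives the other branch. Negated constraints such as Not(x > 10) are
# exactly the "negative constraints" a substitution-only approach cannot
# represent.
from z3 import Int, Solver, Not, sat

def program(x):
    # hypothetical program under test: two branches we would like to cover
    if x > 10:
        return "big"
    return "small"

x = Int("x")
concrete = 0                                  # initial concrete input
path_constraint = (x > 10) if concrete > 10 else Not(x > 10)
print(program(concrete), "covered with x =", concrete)

s = Solver()
s.add(Not(path_constraint))                   # flip the branch condition
if s.check() == sat:
    new_input = s.model()[x].as_long()
    print(program(new_input), "covered with x =", new_input)
```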
Chapter 3 explores the promises and contradictions inherent in the information drawn from these local and global knowledge networks. There were tensions that were never quite resolved between the production of locally relevant knowledge that rejected theoretical approaches and a global intellectual movement that praised universal knowledge. The Economic Society responded to this by carefully negotiating the sources of knowledge which it received from its networks, especially on the topics of natural history and medical botany, and building up its own epistemologies and definitions of practical Enlightenment that made the local applicability of any information the ultimate test of its value. Frameworks of knowledge with universal aspirations, such as Linnaean taxonomy, were not welcome when local descriptions would be more translatable within Central America. I argue that these stubbornly local conceptualisations of knowledge became problematic when a comparison with other places was required, for instance in the context of attempting to export plants from Guatemala to other places, and in debating the merits of plantain trees with scholars in other parts of the empire.
This paper clarifies, revises, and extends the account of the transmission of truthmakers by core proofs that was set out in chap. 9 of Tennant (2017). Brauer provided two kinds of example making clear the need for this. Unlike Brouwer’s counterexamples to excluded middle, the examples of Brauer that we are dealing with here establish the need for appeals to excluded middle when applying, to the problem of truthmaker-transmission, the already classical metalinguistic theory of model-relative evaluations.
Arms control and disarmament are among the unfulfilled promises of the UN Charter. Alongside the establishment of an International Peace Force, a Standing Committee on Disarmament should oversee a binding and staged process of universal disarmament, leaving only those arms needed for ensuring internal security. This would require global monitoring and verification, an avoidance of destabilizing forces and building trust among countries. There are both positive recent developments in arms control, and a regression towards arms build-ups eroding the accomplishment of past disarmament proposals. The prevention and abolition of war should be a central focus of renewed global governance, required by fundamental changes in the nature of armed conflict, threats from new technologies and the involvement of new actors beyond states. There are also new capacities for collective good. Proposals for modern comprehensive disarmament must go beyond the simple destruction of weapons to include the adaptation and reconversion of all the economic resources, infrastructure and human resources presently devoted to military forces and the arms industry. Many obstacles are acknowledged and will have to be overcome, but eliminating the anachronism of war will free enormous resources for other, more constructive uses.
This study examined the activation of first language (L1) translations in second language (L2) word recognition in a lexical decision task. Test materials included English words that differed in the frequency of their Chinese translations or in their surface lexical frequency while other lexical properties were controlled. Chinese speakers of English as a second language at different proficiency levels and native speakers of English were tested. Native speakers produced a reliable lexical frequency effect but no translation frequency effect. English as a second language speakers of lower English proficiency showed both a translation frequency effect and a lexical frequency effect, but those of higher English proficiency showed a lexical frequency effect only. The results are discussed in terms of a verification model of L2 word recognition. According to the model, L2 word recognition entails a checking procedure in which activated L2 words are checked against their L1 translations. The two frequency effects are seen to have two different loci: the lexical frequency effect is associated with the initial activation of L2 lemmas, and the translation frequency effect arises in the verification process. Existing evidence for verification in L2 word recognition and new issues this model raises are discussed.
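As a purely illustrative toy (not the authors' model, and with hypothetical constants), the two loci of the frequency effects could be sketched as follows in Python.

```python
# Toy illustration of the two loci: lexical frequency speeds the initial
# activation of the L2 word, while translation frequency speeds the
# verification check against the L1 translation, which higher-proficiency
# readers are assumed to skip. All constants are hypothetical.
def recognition_time(l2_frequency, l1_translation_frequency, proficiency):
    activation_time = 600 - 40 * l2_frequency              # lexical frequency effect
    if proficiency == "low":                                # verify against L1 translation
        verification_time = 250 - 30 * l1_translation_frequency
    else:                                                   # verification skipped
        verification_time = 0
    return activation_time + verification_time

# A low-proficiency reader shows both effects; a high-proficiency reader only the lexical one.
print(recognition_time(l2_frequency=5, l1_translation_frequency=2, proficiency="low"))
print(recognition_time(l2_frequency=5, l1_translation_frequency=2, proficiency="high"))
```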