paper-celeriac.tex


\subsection{Observable Contracts}
\label{sec:celeriac}

This subsection reports on an analysis of the observable behaviors
of the subject programs.
%
In the absence of written documentation and contracts, developers
often infer the ``contract'' for a program by reasonably generalizing
the program's behavior.

%To guide our analysis we posed the following three questions which
%build on the results for written specification in the previous section:

%\begin{research-question}
%  To what extent are observable behaviors domain- or
%  application-independent?
%\end{research-question}

%From previous work on pluggable types~\cite{DietlDEMS2011} (and
%corroborated in Section~\ref{sec:survey}), we expect the distribution
%of behaviors to be tied to the application-domain.

%\begin{research-question}
%  Are Code Contracts able to concisely express commonly observable
%  behaviors?
%\end{research-question}

%Since Code Contracts are written as method calls at the beginning of
%a method, they can be verbose for short properties (e.g.,
%nullness). Additionally, the limits of C\#'s type inference mean that
%expressing contracts involving the result of a method are verbose
%since the return expression must be parameterized by the method's
%return type.
%
%The Code Contracts team recently introduced ``Argument Validator'' and
%``Contract Abbreviator'' methods to make expressing common patterns
%more concise; we wanted validate their impact on expressing common
%program behaviors.
\  \\
From the data in Section~\ref{sec:survey} (and our own experience), we
hypothesized that developers disproportionately write generic, or
simple, generic contracts. Additionally, Polikarpova et al. note that
developers are typically worse at writing postconditions than
preconditions~\cite{Polikarpova2012}.

\subsubsection{Methodology}

We exercised the behavior of Quick Graph using its unit tests; Sando
using its integration test suite; Mishra Reader by using the
application; and Labs by running the Labs for the Rxx
project~\footnote{https://rxx.codeplex.com/}.
\\ \\
%As a proxy for the behaviors that a developer could observe by calling
%an API, we used the Daikon invariant detector~\cite{daikon}
%to infer invariants. Daikon takes as input one or more execution
%traces, and reports data properties that were true over the observed
%executions.
\\ \\
Daikon employs statistical methods (e.g., minimum support and
confidence heursistics) to infer likely method preconditions, method
postconditions, and object invariants.
\\ \\
The contracts that Daikon infers are sound with respect to the
observed executions --- i.e., it does not infer any properties that
are falsified by one of the observed traces.
%
%We address the shortcomings of Daikon as they relate to this study in
%Section~\ref{sec:celeriac-threats}.

\subsubsection{Abstract Type Inference}
\label{sec:comparability}

By default, Daikon attempts to compare all values that have the same
internal representation in Daikon. Since reference types are all
represented in Daikon using their hashcode, Daikon will report
comparisons between every reference value (including those in
violation of the language's typing rules).
\\ \\
To reduce the number of spurious contracts reported, Daikon supports
comparability sets for the expressions at each program point (object,
method entry, and method exit).
\\ \\
A comparability set defines a group of related variables. For example,
in a program with \verb|int|s, some variables may represent (have an
abstract type of) months, days, or years. Given comparability
information, Daikon would only output invariants relating expressions
with the same abstract type.
\\ \\
Existing Daikon comparability analyses for Java and C/C++ programs are
dynamic, recording variable interactions at runtime
(e.g., DynComp\cite{Guo2006}). For Celeriac, we opted to implement a
conservative static comparability analysis using the Common Compiler
Infrastructure: Code Model and AST
API\footnote{https://cciast.codeplex.com/}. The analysis works in
three steps:

\begin{enumerate}
\item For each method, calculate which expressions (fields,
  parameters, and return values) are used together in a binary
  operation or assignment statement.
\item Until a fix point is reached, for each callsite, update the
  caller's comparability information using information from the method
  being called (the callee).
\item For each type, calculate the comparability set by merging the
  comparability sets of its methods.
\end{enumerate}

For calls to external assemblies (i.e., methods that don't have a
comparability summary), the analysis assumes that all method arguments
with compatible types are in the same comparability set.
\\ \\
Two types are considered to be compatible if either either type is
assignable to the other.

\subsubsection{Results}
Results from the analysis are presented in Table 3 below, using the same programs described in Section \ref{sec:survey}.
%\subsubsection{Domain-Independence}
%
%Table~\ref{table:celeriac} shows the inferred behavioral properties
%across three projects using scheme similar to that used for
%Table~\ref{table:cc-usage-table}.
%
%The results indicate a higher ratio of application-specific behaviors
%(e.g., expression comparison, membership, universal quantification,
%and implication) than as recorded by developers.
%
%However, the absolute number of general behavioral properties still
%represents a sizeable amount of annotation effort. For example, there
%are a large number of equalities that a develop could potentially
%specify as a postcondition for a method.

\begin{center}\begin{table}{\small \input{celeriac-table}}
  \caption{Behavioral properties dynamically inferred by Daikon; the
    contents of each category are mutually exclusive.  The Mishra
    Reader View Models component and Sando Indexer component are the subjects of the developer case study.     
 %  in Section~\ref{sec:observe}.
%    %
%    \todo{The NullOrBlank, Non-Empty, Bounds Check, and Indicator
%      categories need to be fixed.}
    }
\end{table}
\end{center}
\label{table:celeriac}

%\subsubsection{Expressivity}
%To get an intuition for how often different methods exhibit the same
%behavior, we calculated the frequency at which
%properties were found to hold for multiple methods within a class
%(Figure~\ref{abbrev-histogram}). Properties appearing for every method
%in a class are not shown, as they would be inferred to be an object
%invariant.
%
%This data does not directly validate the benefit of abbreviator
%methods, however. The value of an abbreviator is related to both the
%number of contracts specified in the grouping, and the number of
%methods for which the grouping applies.
%
%Finding the optimal set of method abbreviators is not straight-forward
%since an abbreviator method can refer to another abbreviator method.
%To approximate the expected benefit, calculated the savings (in terms
%of number of contracts and abbreviator calls written) when greedily
%introducing abbreviators. \todo{Results}

%\begin{figure}
%    \includegraphics[width=0.5\textwidth]{study/celeriac/data/abbrev-histogram}
%\caption{A histogram showing the frequency at which instance-expression properties are
%  inferred for one or more methods in a class (for AutoDiff).
%  Properties appearing for every method are not included, as they are
%  reported as object invariants.
%  %
%  If two or more properties co-occur for multiple methods, the
%  developer can group the properties using a Contract Abbreviator
%  method.}
%\label{fig:abbrev-histogram}
%\end{figure}
%\begin{figure}
%  \includegraphics[width=0.5\textwidth]{study/celeriac/data/impact-histogram}
%  \caption{A histogram showing the ratio of data behavior properties
%    that can be eliminated for a project by greedily creating Contract
%    Abbreviator methods (for AutoDiff). The data indicates that for
%    most classes abbreviator methods provide no benefit, but for some
%    classes abbreviator methods can greatly reduce the of contracts
%    needed to express inferred behaviors.}
%\label{fig:impact-histogram}
%\end{figure}

\subsubsection{Difference Between Contracts and Observed Behavior}
Comparing Tables 2 and 3 shows some stark differences between the types of contracts developers included and the types of contracts they could include based on the observed behavior. There are almost no implication contracts included in the subject programs, but there are many possible for each type as possible conditions for each program. These post-conditions implications are useful because they can describe precisely the function of the method. \texttt{return > -1 => x is in this.elements[]} is an example of such a post-condition that describes the return condition of the binary search method. As well, there ForAll invariants were detected for all programs, especially MishraReader and SandoIndexer, but weren't included frequently (or at all for the two previous mentioned projects). These present another opportunity for developers to add contracts. From our contact with the MishraReader developer, he did not even know that such contracts were possible.
\\ \\
We believe there is a significant opportunity for developers to improve their use of Code Contracts, and the programs overall, by utilizing the Celeriac tool to help identify contracts they didn't realize themselves.
%\subsubsection{Threats to Validity}
%\label{sec:celeriac-threats}
%
%\paragraph{Contract Precision}
%
%In \cite{NimmerE02:ISSTA}, Nimmer and Ernst reported that contracts
%inferred from even small test suites were relatively precise, with
%less than 10\% on average being incorrect (from a verification
%perspective). Polikarpova et al. later reported in
%~\cite{Polikarpova2009} that one third of contracts inferred by Daikon
%were incorrect or irrelevant for larger software written in Eiffel.
%
%The Eiffel trace-generator, however, does not perform comparability
%analysis.

%Daikon's object-oriented support is limited to object invariants -- it
%does not incorporate inheritance information, and therefore does not
%consider the implication of contracts on behavioral subtyping. As,
%Csallner and Smaragdakis point out inheritance can cause Daikon to
%output internally inconsistent contracts~\cite{Csallner06}.

%\paragraph{Contract Recall}

%Since Daikon works by instantiating contract templates and
%invalidating the templates against the trace, it's output is directly
%tied to the ``grammar'' of contracts provided.

%Additionally, since Daikon only calls methods that are known to be
%pure, missing pure methods would result in contracts involving those
%methods being missed. Celeriac's purity assumptions are conservative,
%only including auto-generated property getters and methods annotated
%with the \verb|Pure| annotation (only libraries annotated with Code
%Contracts include the \verb|Pure| annotation.