# Approximation algorithm

In computer science and operations research, approximation algorithms are an approach to attacking difficult optimization problems. Approximation algorithms are often associated with NP-hard problems. Since it is unlikely that there can ever be efficient (Polynomial Time) exact algorithms solving NP-hard problems, one settles for non-optimal solutions, but requires them to be found in polynomial time. Unlike heuristics, which usually only find reasonably good solutions reasonably fast, one wants provable solution quality and provable run time bounds. Ideally, the approximation is optimal up to a small constant factor (say within 5% of the optimal solution). It should be noted that approximation algorithms are increasingly being used for problems where polynomial algorithms are known but are too expensive due to the sizes of the data sets.

A typical example for an approximation algorithm is the one for vertex cover in graphs: Find an uncovered edge and add both endpoints to the vertex cover, until none remain. It is clear that the resulting cover is at most twice as large as the optimal one. This is in fact a constant factor approximation algorithm with a factor of 2.

NP-hard problems vary greatly in their approximability; some, such as the bin packing problem, can be approximated within any factor greater than 1 (such approximation algorithms are often called polynomial time approximation schemes or PTAS). Others are impossible to approximate within any constant (or even polynomial) factor (unless P = NP), such as the maximum clique problem.

NP-hard problems can often be expressed as integer programs (IP) and solved exactly in exponential time. Many approximation algorithms are from the LP-relaxation of the integer program.

Not all approximation algorithms are suitable for all practical applications. They often use IP/LP/Semidefinite solvers, complex data structures or sophisticated algorithmic techniques which lead to difficult implementation problems. Also, for some approximation algorithms, the polynomial running time can be quite bad, say O(n2000). Yet study of even very expensive algorithms is not a completely theoretical pursuit as they can yield valuable insight. A classic example is the initial PTAS for Euclidean TSP due to Sanjeev Arora which had prohibitive running time, yet within a year, Arora refined the ideas into a linear time algorithm. Such algorithms are also worthwhile in some applications where the running times and cost can be justified e.g. computational biology, financial engineering, transportation planning, and inventory management. In such scenarios, they must compete with the corresponding direct IP formulations.

Another limitation of the approach is that it applies only to optimization problems and not to "pure" decision problems like satisfiability (although it's often possible to conceive optimization versions of such problems, such as the maximum satisfiability problem).

## Performance guarantees

For some approximation algorithms it is possible to prove certain properties about the approximation of the optimum result. For example, in the case of a ρ-approximation algorithm it has been proven that the approximation a will not be more (or less, depending on the situation) than a factor ρ times the optimum solution s.

$\begin{cases}s \leq a \leq \rho s,\qquad\mbox{if } \rho > 1; \\ \rho s \leq a \leq s,\qquad\mbox{if } \rho < 1.\end{cases}$

The factor ρ is called the relative performance guarantee. An approximation algorithm has an absolute performance guarantee or bounded error ε, if it has been proven that

$(s - \epsilon) \leq a \leq (s + \epsilon).$

Similarly, the absolute performance ratio $\Rho_A$ of some approximation algorithm $A$, where $I$ refers to an instance of a problem, and where $R_A(I)$ is the performance guarantee of $A$ on $I$ (i.e. $\rho$ for problem instance $I$) is:

$\Rho_A = \inf \{ r \geq 1 | R_A(I) \leq r, \forall I \}$

That is to say that $\Rho_A$ is the largest bound on the approximation ratio, $r$, that one sees over all possible instances of the problem. Likewise, the asymptotic performance ratio $R_A^\infty$ is:

$R_A^\infty = \inf \{ r \geq 1 | \exists n \in \mathbb{Z}^+, R_A(I) \leq r, \forall I, I \geq n\}$

That is to say that it is the same as the absolute performance ratio, with a lower bound $n$ on the size of problem instances. These two types of ratios are used because there exist algorithms where the difference between these two is significant.

Domination analysis provides an alternative way to analyze the quality of an approximation algorithm in terms of the rank of the computed solution in the sorted sequence of all possible solutions.

## References

• Vijay Vazirani Approximation Algorithms. ISBN 3-540-65367-8.
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 35: Approximation Algorithms, pp.1022–1056.
• Dorit H. Hochbaum, ed. Approximation Algorithms for NP-Hard problems, PWS Publishing Company, 1997. ISBN 0-534-94968-1. Chapter 9: Various Notions of Approximations: Good, Better, Best, and More