Using Interpolation to Find Missing Values in a Sequence

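In some applications it is important that a range of sequential values stays contiguous. When a new value is added to such a sequence, the application may prefer to fill a "hole" (a missing value) rather than take the next value after the current maximum or use a sequence generator. Telephone numbers, Social Security numbers, and IP addresses, for example, all have strictly limited ranges, so it is better to reuse numbers that are not in use.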
　　
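This problem tends to come up when the database holds a large number of values. Missing values are, by definition, not in the index, so the most obvious way to locate a missing number is to read all the values in order and stop as soon as the value read differs from the one expected: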
　　
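    REM - table in question is "massive" with a sequence "id"
    set serveroutput on;
    set timing on;
    declare
        l_id     integer := null;
        l_found  boolean := false;
    begin
        -- walk the ids in ascending order; l_id is the value expected next
        for row in (select id from massive order by id) loop
            if l_id is null then
                l_id := row.id;
            else
                l_id := l_id + 1;
            end if;
            -- the first id that differs from the expected value marks a gap
            if l_id != row.id then
                l_found := true;
                exit;
            end if;
        end loop;
        if l_found then
            dbms_output.put_line('lowest missing value = '||l_id);
        else
            dbms_output.put_line('there are no missing values');
        end if;
    end;
    /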
　　
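Another approach is a "divide and conquer" strategy. If we know how many values a given range should contain, and the column is indexed, we can get the actual number of rows in that range fairly quickly. If a value is missing, the actual count will be lower than the expected count.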
　　
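We can apply the same test to the lower half of the range to see whether it contains a missing value. If it does, we split that half again. If it does not, the missing value must be in the other half, so we examine that one instead. Eventually the range under examination shrinks to a single value, which is the missing one.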
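The halving logic can be checked without a database. The following is a minimal sketch in Python rather than PL/SQL: a sorted in-memory list stands in for the indexed id column, and the hypothetical helper count_between plays the role of an indexed count(*) over a range of ids. All names here are illustrative, and the sketch assumes the ids are unique and the list is not empty.

```python
from bisect import bisect_left, bisect_right


def count_between(sorted_ids, lo, hi):
    """Count ids with lo <= id <= hi (stand-in for an indexed COUNT(*) ... BETWEEN)."""
    return bisect_right(sorted_ids, hi) - bisect_left(sorted_ids, lo)


def lowest_missing(sorted_ids):
    """Return the lowest value missing between min and max, or None if there is no gap.

    Assumes sorted_ids is sorted, non-empty, and free of duplicates.
    """
    lo, hi = sorted_ids[0], sorted_ids[-1]
    if count_between(sorted_ids, lo, hi) == hi - lo + 1:
        return None  # the whole range is contiguous
    # Invariant: [lo, hi] contains a gap, and every value below lo is present.
    while lo < hi:
        mid = (lo + hi) // 2
        expected = mid - lo + 1
        if count_between(sorted_ids, lo, mid) == expected:
            lo = mid + 1  # lower half is complete, so the gap lies above mid
        else:
            hi = mid      # lower half is short, so it holds the lowest gap
    return lo


print(lowest_missing([1, 2, 3, 5, 6]))  # 4
```

Because the lower half is always tested first, the sketch returns the lowest gap even when several values are missing.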
　　
The following is a PL/SQL example of this technique:
　　
    set serveroutput on
    declare
        l_min           integer;
        l_max           integer;
        actual_count    integer;
        expected_count  integer;
        half            integer;
    begin
        -- assumes "id" is unique and the table is not empty
        select max(id),min(id),count(*)
          into l_max,l_min,actual_count
          from massive;
        expected_count := l_max - l_min + 1;
        if expected_count = actual_count then
            dbms_output.put_line('there are no missing values');
            return;
        end if;
        -- invariant: l_min..l_max spans expected_count values and contains a gap
        while l_max - l_min >= 1 loop
            -- try lower half of range
            half := trunc(expected_count/2);
            expected_count := expected_count - half;
            select count(*)
              into actual_count
              from massive
             where id between l_min and l_max - half;
            -- lower half is entirely empty, so l_min itself is missing
            exit when actual_count = 0;
            if actual_count = expected_count then
                -- missing value must be in upper half
                l_min := l_min + half;
            else
                l_max := l_max - half;
            end if;
        end loop;
        dbms_output.put_line('lowest missing value = '||l_min);
    end;
    /
　　
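For a table with one million rows and a regular index, and with exactly one missing value, informal results suggest that the second script runs about five times faster than the first.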
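A rough cost estimate may help put that figure in context. It rests on assumptions the article does not state: that each count(*) over a range of id is answered by scanning the index entries in that range, and that n is the number of values between min(id) and max(id). Under those assumptions, halving needs only about log2 n count queries, but the ranges those queries scan still add up to roughly n index entries, on top of the initial count(*) over the whole table:

```latex
\[
\text{queries} \approx \lceil \log_2 n \rceil,
\qquad
\text{entries scanned} \approx \sum_{k=1}^{\lceil \log_2 n \rceil} \frac{n}{2^k} \approx n .
\]
```

For n = 10^6 that is about 20 count queries. On this reading, the gain would come mainly from replacing up to n row-by-row fetches with a handful of aggregate queries, not from touching far fewer index entries, which would be consistent with a speedup of a small constant factor.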
