Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion GroupsFormsForms ProgrammingQueriesModules / DAO / VBAReports / PrintingMacrosDatabase DesignSecurityConversionImporting / LinkingSQL Server / ADPMultiuser / NetworkingReplicationSetup / ConfigurationDeveloper ToolkitsActiveX ControlsNew UsersGeneral 1General 2
Access DirectoryToolsTutorialsUser Groups
Related Topics
SQL ServerOther DB ProductsMS OfficeMore Topics ...

MS Access Forum / SQL Server / ADP / November 2005

Tip: Looking for answers? Try searching our database.

Coalescing date ranges

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Robert Morley - 28 Nov 2005 17:39 GMT
Hi all,

First, sorry for cross-posting, but as I have flexibility on how I do this,
I thought I'd post it to the two most-relevant groups.  I can do this either
server-side, using a stored procedure, or I can do it client-side using VBA.
The recordset is only a few thousand records, and we all have at least 3M
down/800k up DSL if not 10/100 LAN connections, so client-side isn't really
a huge deal.  Anyway, here's what I'm trying to do:

I'm working in an Access ADP, with SQL Server 2000 as the back-end.  I have
a recordset with the following structure (simplified for the sake of keeping
this straight-forward):

AccountID - char (4)
PersonID - int
BeginDate - smalldatetime - date only, always populated (i.e., never NULL)
EndDate - smalldatetime - date only, but NULL if person is currently
assigned to this account (i.e., NULL is treated as higher than any non-NULL
date in a range)

These records are coming to me from a different server, which tracks based
on sub-accounts, and I need to track only by the main account.  So for every
record that has the same AccountID/PersonID, I need to coalesce overlapping
date ranges into a single date range using the largest possible range--even
if it mixes the BeginDate from one record with the EndDate from
another--while not coalescing date ranges that are completely separate.
(There's an unrelated unique identifier in the final table, so multiple
AccountID/PersonID with different date ranges are fine in the final
product.)

Note:  the source records can either be dealt with as unmodifiable, or I can
dump them into a temp table to allow for modification/deletion of
records...whatever method is easiest.

So to give you an example:

A001    1    2001-Dec-02    2002-Jul-07
A001    1    2003-Jan-01    2005-Jan-31
A001    1    2004-Feb-05    NULL
A001    2    2002-Aug-01    2003-Jul-31
A001    2    2001-Jan-16    2005-Feb-28

should "coalesce" to:

A001    1    2001-Dec-02    2002-Jul-07
A001    1    2003-Jan-01    NULL    (combining two records to return largest
range, while not deleting unrelated range)
A001    2    2001-Jan-16    2005-Feb-28    (using only record with largest
range, ignoring the one with embedded range)

Ultimately, I know I can do this in VBA, though I might not use the best
method to get from point A to point B; so if anybody can provide a better
method, great...if you can provide insight into doing it entirely
server-side, even better!

Rob
Sylvain Lafontaine - 28 Nov 2005 23:31 GMT
For doing this on the SQL-Server side, the most obvious solution would be to
use a cursor with a temporary table.  If you want a SELECT only solution,
the following should fill the bill but it is a little more complicated to
understand: Table4 is the source table and IdK is its primary key:

-- All records with no overlap:
SELECT t1.IdK, t1.AccountId, T1.PersonId, 0 as Overlap, BeginDate as
MinBeginDate, EndDate as MaxEndDate
FROM Table4 t1
WHERE Not Exists (
select * from table4 t2
where (t1.idk != t2.idk)
 And (t1.AccountId = t2.AccountId)
 And (t1.PersonId = t2.PersonId)
 And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
   OR
       (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))
 )
)

-- Last record of each group of records with overlap:
UNION
SELECT t1.IdK, t1.AccountId, T1.PersonId, 1 as Overlap,
 (Select Min (BeginDate) from Table4 t2 where t1.idk >= t2.idk
 And (t1.AccountId = t2.AccountId)
 And (t1.PersonId = t2.PersonId)
 And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
   OR
       (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))
 )
) as MinBeginDate,

 (Select Case When Max(IsNull (EndDate, '12/31/9999')) = '12/31/9999' then
Null Else Max (EndDate) End

 From Table4 t2
 Where t1.idk >= t2.idk
  And (t1.AccountId = t2.AccountId)
  And (t1.PersonId = t2.PersonId)
  And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
    OR
        (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))
  )
) as MaxEndDate

FROM Table4 t1
WHERE Exists (
select * from table4 t2
where (t1.idk > t2.idk)
 And (t1.AccountId = t2.AccountId)
 And (t1.PersonId = t2.PersonId)
 And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
   OR
       (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))
 )
)

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

> Hi all,
>
[quoted text clipped - 53 lines]
>
> Rob
Robert Morley - 29 Nov 2005 16:49 GMT
I posted last night saying I'd look at this this morning, but it seems my
post never showed up.  In any event, I've looked over the SQL query, and it
makes a lot of sense.  I've used UNION queries to pull off any number of
amusing queries before, but the joins you used simply never occurred to me.
Nice bit of coding, thanks!

Rob

> For doing this on the SQL-Server side, the most obvious solution would be
> to use a cursor with a temporary table.  If you want a SELECT only
[quoted text clipped - 120 lines]
>>
>> Rob
Robert Morley - 29 Nov 2005 18:58 GMT
(Note:  I have discontinued this thread in the Access Modules/Coding group,
as it no longer applies to that group.)

Okay, I just tried it out, and it appears to work correctly at first glance,
but if I understand the algorithm correctly, the assumption is being made
that Table4 will be ordered by AccountID, PersonID, and BeginDate, and that
IdK will also increment according to that order, correct?  (I've tried not
sorting in that order, and the resulting set completely left out the records
that violated that rule.)

This isn't a problem, as I have to create a temporary table to provide an
identity field anyway (the original database I'm getting this from doesn't
seem to believe in identity values, or at least isn't exposing them to me),
I just want to be certain I'm understanding it properly.

Oh, one other oddity I came across in this query:  when I used '12/31/9999',
it rejected the date as being out-of-range for a smalldatetime (which, of
course, is incorrect), but when I used '12/31/2006', it was fine.  On
further exploration, and switching to an unambiguous format, I found that
'Jun 6, 2079' is the cut-off...'Jun 7, 2079' and beyond will return an
out-of-range error.  I don't have rights on the physical server to check the
international settings in its control panel, but I DO have full rights to
the SQL Server instance, so I can tell you that this doesn't in any way
correspond to the two-digit date cutoff there (which is 1950-2049).  Any
thoughts?

Thanks again,
Rob

> For doing this on the SQL-Server side, the most obvious solution would be
> to use a cursor with a temporary table.  If you want a SELECT only
[quoted text clipped - 120 lines]
>>
>> Rob
Robert Morley - 29 Nov 2005 19:09 GMT
Never mind the bit in the previous message about the smalldatetime.  I'm
still a little new to SQL Server and failed to notice that the
"smalldatetime data type, described" also included data on the datetime data
type, so mis-read the limits of the type as being '12/31/9999'.

Thanks,
Rob
Sylvain Lafontaine - 29 Nov 2005 19:44 GMT
Hum, no, I've made no assumption about the data beeind ordered by Account
ID, PersonID and BeginDate.  The only assumption was that the IdK column is
a primary key.

However, after taking a second look at the design, the second part of the
UNION is obviously wrong: the idea was to return only a single record (any
one of them) from a group of two or more overlaping records and it is
obvious to me now that adding the condition « (t1.idk > t2.idk)  » is not
sufficient to achieve that: it will work only when there is only two
records, not more.  By the same token, the condition (t1.idk >= t2.idk) must
also be removed from the two subqueries that are computing the MinDate and
the MaxDate

To correct this, you should add a Not Exist clause to the whole thing or use
a Max(IdK) (or a Min()) clause and make an equality comparaison with the
IdK:

First solution:

WHERE Exists (
       ....
 )

AND Not Exists (
   select * from table4 t2
   where (t1.idk < t2.idk)
   And (t1.AccountId = t2.AccountId)
   And (t1.PersonId = t2.PersonId)
   And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
       OR
      (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))

)

Or better, the second solution using the Max() or the Min() function:

WHERE t1.IdK =  (
   select Max (IdK) from table4 t2
   where (t1.AccountId = t2.AccountId)
   And (t1.PersonId = t2.PersonId)
   And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR
t1.EndDate IS NULL))
       OR
      (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR
t2.EndDate IS NULL))

)

Sorry for the bugs but I was working late and I was simply using the test
data that you provided without thinking any further.  Also, I didn't take
the time to rebuild the test table and test this solution, so maybe there is
still a bug lurking there  but feel free to post again if you have still
some trouble with this.

And don't forget to remove the (t1.idk >= t2.idk) from the two subqueries.

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

> (Note:  I have discontinued this thread in the Access Modules/Coding
> group, as it no longer applies to that group.)
[quoted text clipped - 149 lines]
>>>
>>> Rob
Sylvain Lafontaine - 29 Nov 2005 19:48 GMT
Well, I'm really tired: in the second solution with the Max() function, you
must also add the condition (t1.idk != t2.idk) to retrieve only records with
a least one overlap with another record.

Sorry again, I think that I will go to sleep early this evening.

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

> (Note:  I have discontinued this thread in the Access Modules/Coding
> group, as it no longer applies to that group.)
[quoted text clipped - 149 lines]
>>>
>>> Rob
Robert Morley - 29 Nov 2005 20:06 GMT
> Sorry again, I think that I will go to sleep early this evening.

Hehehe...as you can see from my other posts, I think that should apply to
both of us! :)

Rob
Robert Morley - 29 Nov 2005 21:38 GMT
I tried that adding t1.IdK != t2.IdK, but that just returns an empty
recordset every time, so maybe I'm misunderstanding something?  I've tried
playing with the logic, but just can't seem to get it to work no matter what
I try.

Just to check a worst-case scenario, here's the "ultimate" test recordset.
It contains three non-overlapping ranges, where one range (the last three
records) has three separate intersecting sets, but the first & last sets do
not overlap with each other, they only overlap by virtue of the second set
joining them.

A001    1    Jan 1, 2001    Dec 31, 2001
A001    1    Jan 1, 2002    Jul 31, 2002
A001    1    Jun 1, 2002    Dec 31, 2002
A001    1    Jan 1, 2003    Jan 31, 2004
A001    1    Jan 1, 2004    Jan 31, 2005
A001    1    Jan 1, 2005    NULL

...which should collapse to...

A001    1    Jan 1, 2001    Dec 31, 2001
A001    1    Jan 1, 2002    Dec 31, 2002
A001    1    Jan 1, 2003    NULL
Sylvain Lafontaine - 30 Nov 2005 00:48 GMT
I've just take a deep look at your data and I'm very sorry to say but if the
first part (non-overlaping records) is easy, the second one is not because
of a hierarchical relation problem: the 4th and the 5th record are
overlaping and the 5th and the 6th are overlaping too but the 4th and the
5th are not; so it's much more difficult to group them then I tought at
first.

SQL-Server 2000 doesn't offer native support for recursivity (SQL-2005
does), so I've decided to give up and use my first idea of using a cursor:

CREATE PROCEDURE [dbo].[table4_test] AS

Set NoCount ON
create table #t (IdK int primary key, AccountId nchar (4), PersonId int,
BeginDate DateTime, EndDate DateTime)

declare @IdK int
declare @IdK2 int
declare @AccountId nchar(4)
declare @PersonId int
declare @BeginDate DateTime
declare @EndDate DateTime

Declare Curseur cursor for
Select IdK, AccountId, PersonId, BeginDate, EndDate
From Table4
Order By BeginDate

Open Curseur
Fetch Next From Curseur into @IdK, @AccountId, @PersonId, @BeginDate,
@EndDate

While @@Fetch_Status = 0
Begin
Select @IdK2 = (
 Select Idk
 From #T
 Where (#T.AccountId = @AccountId)
  And (#T.PersonId = @PersonId)
    And ( (#T.BeginDate <= @BeginDate AND (#T.EndDate >= @BeginDate OR
#T.EndDate IS NULL))
       OR
           (@BeginDate <= #T.BeginDate AND (@EndDate >= #T.BeginDate OR
@EndDate IS NULL))
    )
  )

if (@IdK2 is Not Null)
begin
 Update #T
 Set BeginDate = Case When BeginDate < @BeginDate Then BeginDate Else
@BeginDate End,
  EndDate = Case When EndDate is Null or @EndDate is Null Then Null When
EndDate >= @EndDate Then EndDate Else @EndDate End
 From #T
 Where #T.IdK = @IdK2
End Else
Begin
 Insert into #T
 Select @IdK, @AccountId, @PersonId, @BeginDate, @EndDate
End

Fetch Next From Curseur into @IdK, @AccountId, @PersonId, @BeginDate,
@EndDate
End

Close Curseur
Deallocate Curseur

Select  IdK, AccountId, PersonId, BeginDate, EndDate from #T
GO

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

>I tried that adding t1.IdK != t2.IdK, but that just returns an empty
>recordset every time, so maybe I'm misunderstanding something?  I've tried
[quoted text clipped - 19 lines]
> A001    1    Jan 1, 2002    Dec 31, 2002
> A001    1    Jan 1, 2003    NULL
Sylvain Lafontaine - 30 Nov 2005 00:59 GMT
Oups, I wanted to write: « the 4th and the 5th record are overlaping and the
5th and the 6th are overlaping too but the 4th and the 6th are not; »

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

> I've just take a deep look at your data and I'm very sorry to say but if
> the first part (non-overlaping records) is easy, the second one is not
[quoted text clipped - 91 lines]
>> A001    1    Jan 1, 2002    Dec 31, 2002
>> A001    1    Jan 1, 2003    NULL
Robert Morley - 30 Nov 2005 04:46 GMT
I just wrote the code in VBA earlier this afternoon to work out the kinks,
since that's what I'm more familiar with, and was going to convert it to a
Stored Procedure tomorrow...the code below will undoubtedly save me a lot of
time. :)

Je te remerci,
Rob

> I've just take a deep look at your data and I'm very sorry to say but if
> the first part (non-overlaping records) is easy, the second one is not
[quoted text clipped - 91 lines]
>> A001    1    Jan 1, 2002    Dec 31, 2002
>> A001    1    Jan 1, 2003    NULL
Robert Morley - 30 Nov 2005 19:27 GMT
Hi Sylvain (and anybody else following this thread),

I just came up with my final code, which takes a different approach from
yours, but seems to work very well.  It parses the original 5023-record set
into the final set of 4643 records in about 2-3 seconds.

The code below is what I ultimately used.  Note that this code includes
additional fields, since I had simplified my original recordset for the sake
of example.  If you spot any bugs, or have any suggested improvements to the
function below, I'd love to hear them.

The basic logic is to initialize one record, move to the next record, and
compare the current record in the set with the previous one.  If they're
within the same subset (AccountID, TypeID, and ContactID are equal to the
previous record), compare the start date of the current date with the end
date of the previous one (we don't need to compare start dates, since
they're already sorted in order and the current one is guaranteed to be >=
the previous one).  If the dates intersect, update the end date to be the
later of the two dates.  If the dates don't intersect, or we've moved into a
new subset, write the saved values out to the table, update the saved values
appropriately and continue.

A final note, for anybody wondering, at this point GetsInterviewReports and
GetsAccountReports can be looked up based on TypeID, so we know we only need
to track them at the subset level...it was simply convenient to include them
in this recordset rather than doing lookups/joins later on.

CREATE FUNCTION dbo.DDSTRepTable ()
RETURNS @AcctTeam table (
   AccountID char(4),
   TypeID tinyint,
   ContactID smallint,
   GetsInterviewReports bit,
   GetsAccountReports bit,
   StartDate smalldatetime,
   EndDate smalldatetime
)
BEGIN
   DECLARE @AccountID char(4)
   DECLARE @AccountIDSave char(4)
   DECLARE @TypeID tinyint
   DECLARE @TypeIDSave tinyint
   DECLARE @ContactID smallint
   DECLARE @ContactIDSave smallint
   DECLARE @IntRpts bit
   DECLARE @IntRptsSave bit
   DECLARE @AcctRpts bit
   DECLARE @AcctRptsSave bit
   DECLARE @StartDate smalldatetime
   DECLARE @StartDateSave smalldatetime
   DECLARE @EndDate smalldatetime
   DECLARE @EndDateSave smalldatetime
   DECLARE DDSTRepCursor cursor for
       SELECT AccountID, TypeID, ContactID, InterviewReports,
AccountReports, StartDate, EndDate
       FROM DDSTRepTeam
       ORDER BY AccountID, TypeID, ContactID, StartDate

   OPEN DDSTRepCursor
   FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID,
@IntRpts, @AcctRpts, @StartDate, @EndDate
   IF @@FETCH_STATUS = 0
   BEGIN
       SET @AccountIDSave = @AccountID
       SET @TypeIDSave = @TypeID
       SET @ContactIDSave = @ContactID
       SET @IntRptsSave = @IntRpts
       SET @AcctRptsSave = @AcctRpts
       SET @StartDateSave = @StartDate
       SET @EndDateSave = @EndDate
       FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID,
@IntRpts, @AcctRpts, @StartDate, @EndDate
   END

   WHILE @@FETCH_STATUS = 0
   BEGIN
       IF (@AccountID = @AccountIDSave) AND (@TypeID = @TypeIDSave) AND
(@ContactID = @ContactIDSave)
           IF (@StartDate < ISNULL(@EndDateSave, 'Jun 6, 2079'))
           BEGIN
               IF ISNULL(@EndDate, 'Jun 6, 2079) > @EndDateSave SET
@EndDateSave = @EndDate
           END --Necessary only so the above 'IF' isn't associated with the
'ELSE' below.
           ELSE
           BEGIN
               INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave,
@ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave
               SET @StartDateSave = @StartDate
               SET @EndDateSave = @EndDate
           END
       ELSE
       BEGIN
           INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave,
@ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave
           SET @AccountIDSave = @AccountID
           SET @TypeIDSave = @TypeID
           SET @ContactIDSave = @ContactID
           SET @IntRptsSave = @IntRpts
           SET @AcctRptsSave = @AcctRpts
           SET @StartDateSave = @StartDate
           SET @EndDateSave = @EndDate
       END
       FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID,
@IntRpts, @AcctRpts, @StartDate, @EndDate
   END
   INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave,
@ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave

   CLOSE DDSTRepCursor
   DEALLOCATE DDSTRepCursor
   RETURN
END
Sylvain Lafontaine - 30 Nov 2005 20:02 GMT
Yes, you're right: ordering the values by time date make it unecessary to
look at more than a single previous record.

The only (small) bug that I've found in your code is the possibility of a
source with 0 records: in this case, the last INSERT will insert a record
with all null values and this record will be returned as the result instead
of returning an empty result.

Signature

Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF

> Hi Sylvain (and anybody else following this thread),
>
[quoted text clipped - 109 lines]
>    RETURN
> END
Robert Morley - 30 Nov 2005 20:34 GMT
Good catch, thanks!

In theory, that'll never be a problem (I sure as !$#% HOPE we never have 0
employees, anyway), but I'll make the adjustment just to be safe.

Thanks yet again,
Rob

> Yes, you're right: ordering the values by time date make it unecessary to
> look at more than a single previous record.
[quoted text clipped - 117 lines]
>>    RETURN
>> END
Robert Morley - 29 Nov 2005 18:59 GMT
Note:  I have discontinued replying to this group, as it clearly no longer
applies to Access Modules/Coding.

Rob
Robert Morley - 29 Nov 2005 19:07 GMT
Oh crap, I'm an idiot.  Please ignore the above message for the SQL Server
newsgroup. <blush>

*MODERATORS*:  If you happen to notice this message, please delete this post
and the one above it for the SQL Server newsgroup only!

Rob
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.