MS Access Forum / SQL Server / ADP / November 2005
Coalescing date ranges
|
|
Thread rating:  |
Robert Morley - 28 Nov 2005 17:39 GMT Hi all,
First, sorry for cross-posting, but as I have flexibility on how I do this, I thought I'd post it to the two most-relevant groups. I can do this either server-side, using a stored procedure, or I can do it client-side using VBA. The recordset is only a few thousand records, and we all have at least 3M down/800k up DSL if not 10/100 LAN connections, so client-side isn't really a huge deal. Anyway, here's what I'm trying to do:
I'm working in an Access ADP, with SQL Server 2000 as the back-end. I have a recordset with the following structure (simplified for the sake of keeping this straight-forward):
AccountID - char (4) PersonID - int BeginDate - smalldatetime - date only, always populated (i.e., never NULL) EndDate - smalldatetime - date only, but NULL if person is currently assigned to this account (i.e., NULL is treated as higher than any non-NULL date in a range)
These records are coming to me from a different server, which tracks based on sub-accounts, and I need to track only by the main account. So for every record that has the same AccountID/PersonID, I need to coalesce overlapping date ranges into a single date range using the largest possible range--even if it mixes the BeginDate from one record with the EndDate from another--while not coalescing date ranges that are completely separate. (There's an unrelated unique identifier in the final table, so multiple AccountID/PersonID with different date ranges are fine in the final product.)
Note: the source records can either be dealt with as unmodifiable, or I can dump them into a temp table to allow for modification/deletion of records...whatever method is easiest.
So to give you an example:
A001 1 2001-Dec-02 2002-Jul-07 A001 1 2003-Jan-01 2005-Jan-31 A001 1 2004-Feb-05 NULL A001 2 2002-Aug-01 2003-Jul-31 A001 2 2001-Jan-16 2005-Feb-28
should "coalesce" to:
A001 1 2001-Dec-02 2002-Jul-07 A001 1 2003-Jan-01 NULL (combining two records to return largest range, while not deleting unrelated range) A001 2 2001-Jan-16 2005-Feb-28 (using only record with largest range, ignoring the one with embedded range)
Ultimately, I know I can do this in VBA, though I might not use the best method to get from point A to point B; so if anybody can provide a better method, great...if you can provide insight into doing it entirely server-side, even better!
Rob
Sylvain Lafontaine - 28 Nov 2005 23:31 GMT For doing this on the SQL-Server side, the most obvious solution would be to use a cursor with a temporary table. If you want a SELECT only solution, the following should fill the bill but it is a little more complicated to understand: Table4 is the source table and IdK is its primary key:
-- All records with no overlap: SELECT t1.IdK, t1.AccountId, T1.PersonId, 0 as Overlap, BeginDate as MinBeginDate, EndDate as MaxEndDate FROM Table4 t1 WHERE Not Exists ( select * from table4 t2 where (t1.idk != t2.idk) And (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL)) ) )
-- Last record of each group of records with overlap: UNION SELECT t1.IdK, t1.AccountId, T1.PersonId, 1 as Overlap, (Select Min (BeginDate) from Table4 t2 where t1.idk >= t2.idk And (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL)) ) ) as MinBeginDate,
(Select Case When Max(IsNull (EndDate, '12/31/9999')) = '12/31/9999' then Null Else Max (EndDate) End
From Table4 t2 Where t1.idk >= t2.idk And (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND (t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL)) ) ) as MaxEndDate
FROM Table4 t1 WHERE Exists ( select * from table4 t2 where (t1.idk > t2.idk) And (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL)) ) )
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
> Hi all, > [quoted text clipped - 53 lines] > > Rob Robert Morley - 29 Nov 2005 16:49 GMT I posted last night saying I'd look at this this morning, but it seems my post never showed up. In any event, I've looked over the SQL query, and it makes a lot of sense. I've used UNION queries to pull off any number of amusing queries before, but the joins you used simply never occurred to me. Nice bit of coding, thanks!
Rob
> For doing this on the SQL-Server side, the most obvious solution would be > to use a cursor with a temporary table. If you want a SELECT only [quoted text clipped - 120 lines] >> >> Rob Robert Morley - 29 Nov 2005 18:58 GMT (Note: I have discontinued this thread in the Access Modules/Coding group, as it no longer applies to that group.)
Okay, I just tried it out, and it appears to work correctly at first glance, but if I understand the algorithm correctly, the assumption is being made that Table4 will be ordered by AccountID, PersonID, and BeginDate, and that IdK will also increment according to that order, correct? (I've tried not sorting in that order, and the resulting set completely left out the records that violated that rule.)
This isn't a problem, as I have to create a temporary table to provide an identity field anyway (the original database I'm getting this from doesn't seem to believe in identity values, or at least isn't exposing them to me), I just want to be certain I'm understanding it properly.
Oh, one other oddity I came across in this query: when I used '12/31/9999', it rejected the date as being out-of-range for a smalldatetime (which, of course, is incorrect), but when I used '12/31/2006', it was fine. On further exploration, and switching to an unambiguous format, I found that 'Jun 6, 2079' is the cut-off...'Jun 7, 2079' and beyond will return an out-of-range error. I don't have rights on the physical server to check the international settings in its control panel, but I DO have full rights to the SQL Server instance, so I can tell you that this doesn't in any way correspond to the two-digit date cutoff there (which is 1950-2049). Any thoughts?
Thanks again, Rob
> For doing this on the SQL-Server side, the most obvious solution would be > to use a cursor with a temporary table. If you want a SELECT only [quoted text clipped - 120 lines] >> >> Rob Robert Morley - 29 Nov 2005 19:09 GMT Never mind the bit in the previous message about the smalldatetime. I'm still a little new to SQL Server and failed to notice that the "smalldatetime data type, described" also included data on the datetime data type, so mis-read the limits of the type as being '12/31/9999'.
Thanks, Rob
Sylvain Lafontaine - 29 Nov 2005 19:44 GMT Hum, no, I've made no assumption about the data beeind ordered by Account ID, PersonID and BeginDate. The only assumption was that the IdK column is a primary key.
However, after taking a second look at the design, the second part of the UNION is obviously wrong: the idea was to return only a single record (any one of them) from a group of two or more overlaping records and it is obvious to me now that adding the condition « (t1.idk > t2.idk) » is not sufficient to achieve that: it will work only when there is only two records, not more. By the same token, the condition (t1.idk >= t2.idk) must also be removed from the two subqueries that are computing the MinDate and the MaxDate
To correct this, you should add a Not Exist clause to the whole thing or use a Max(IdK) (or a Min()) clause and make an equality comparaison with the IdK:
First solution:
WHERE Exists ( .... )
AND Not Exists ( select * from table4 t2 where (t1.idk < t2.idk) And (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL))
)
Or better, the second solution using the Max() or the Min() function:
WHERE t1.IdK = ( select Max (IdK) from table4 t2 where (t1.AccountId = t2.AccountId) And (t1.PersonId = t2.PersonId) And ( (t1.BeginDate <= t2.BeginDate AND(t1.EndDate >= t2.BeginDate OR t1.EndDate IS NULL)) OR (t2.BeginDate <= t1.BeginDate AND (t2.EndDate >= t1.BeginDate OR t2.EndDate IS NULL))
)
Sorry for the bugs but I was working late and I was simply using the test data that you provided without thinking any further. Also, I didn't take the time to rebuild the test table and test this solution, so maybe there is still a bug lurking there but feel free to post again if you have still some trouble with this.
And don't forget to remove the (t1.idk >= t2.idk) from the two subqueries.
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
> (Note: I have discontinued this thread in the Access Modules/Coding > group, as it no longer applies to that group.) [quoted text clipped - 149 lines] >>> >>> Rob Sylvain Lafontaine - 29 Nov 2005 19:48 GMT Well, I'm really tired: in the second solution with the Max() function, you must also add the condition (t1.idk != t2.idk) to retrieve only records with a least one overlap with another record.
Sorry again, I think that I will go to sleep early this evening.
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
> (Note: I have discontinued this thread in the Access Modules/Coding > group, as it no longer applies to that group.) [quoted text clipped - 149 lines] >>> >>> Rob Robert Morley - 29 Nov 2005 20:06 GMT > Sorry again, I think that I will go to sleep early this evening. Hehehe...as you can see from my other posts, I think that should apply to both of us! :)
Rob
Robert Morley - 29 Nov 2005 21:38 GMT I tried that adding t1.IdK != t2.IdK, but that just returns an empty recordset every time, so maybe I'm misunderstanding something? I've tried playing with the logic, but just can't seem to get it to work no matter what I try.
Just to check a worst-case scenario, here's the "ultimate" test recordset. It contains three non-overlapping ranges, where one range (the last three records) has three separate intersecting sets, but the first & last sets do not overlap with each other, they only overlap by virtue of the second set joining them.
A001 1 Jan 1, 2001 Dec 31, 2001 A001 1 Jan 1, 2002 Jul 31, 2002 A001 1 Jun 1, 2002 Dec 31, 2002 A001 1 Jan 1, 2003 Jan 31, 2004 A001 1 Jan 1, 2004 Jan 31, 2005 A001 1 Jan 1, 2005 NULL
...which should collapse to...
A001 1 Jan 1, 2001 Dec 31, 2001 A001 1 Jan 1, 2002 Dec 31, 2002 A001 1 Jan 1, 2003 NULL
Sylvain Lafontaine - 30 Nov 2005 00:48 GMT I've just take a deep look at your data and I'm very sorry to say but if the first part (non-overlaping records) is easy, the second one is not because of a hierarchical relation problem: the 4th and the 5th record are overlaping and the 5th and the 6th are overlaping too but the 4th and the 5th are not; so it's much more difficult to group them then I tought at first.
SQL-Server 2000 doesn't offer native support for recursivity (SQL-2005 does), so I've decided to give up and use my first idea of using a cursor:
CREATE PROCEDURE [dbo].[table4_test] AS
Set NoCount ON create table #t (IdK int primary key, AccountId nchar (4), PersonId int, BeginDate DateTime, EndDate DateTime)
declare @IdK int declare @IdK2 int declare @AccountId nchar(4) declare @PersonId int declare @BeginDate DateTime declare @EndDate DateTime
Declare Curseur cursor for Select IdK, AccountId, PersonId, BeginDate, EndDate From Table4 Order By BeginDate
Open Curseur Fetch Next From Curseur into @IdK, @AccountId, @PersonId, @BeginDate, @EndDate
While @@Fetch_Status = 0 Begin Select @IdK2 = ( Select Idk From #T Where (#T.AccountId = @AccountId) And (#T.PersonId = @PersonId) And ( (#T.BeginDate <= @BeginDate AND (#T.EndDate >= @BeginDate OR #T.EndDate IS NULL)) OR (@BeginDate <= #T.BeginDate AND (@EndDate >= #T.BeginDate OR @EndDate IS NULL)) ) )
if (@IdK2 is Not Null) begin Update #T Set BeginDate = Case When BeginDate < @BeginDate Then BeginDate Else @BeginDate End, EndDate = Case When EndDate is Null or @EndDate is Null Then Null When EndDate >= @EndDate Then EndDate Else @EndDate End From #T Where #T.IdK = @IdK2 End Else Begin Insert into #T Select @IdK, @AccountId, @PersonId, @BeginDate, @EndDate End
Fetch Next From Curseur into @IdK, @AccountId, @PersonId, @BeginDate, @EndDate End
Close Curseur Deallocate Curseur
Select IdK, AccountId, PersonId, BeginDate, EndDate from #T GO
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
>I tried that adding t1.IdK != t2.IdK, but that just returns an empty >recordset every time, so maybe I'm misunderstanding something? I've tried [quoted text clipped - 19 lines] > A001 1 Jan 1, 2002 Dec 31, 2002 > A001 1 Jan 1, 2003 NULL Sylvain Lafontaine - 30 Nov 2005 00:59 GMT Oups, I wanted to write: « the 4th and the 5th record are overlaping and the 5th and the 6th are overlaping too but the 4th and the 6th are not; »
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
> I've just take a deep look at your data and I'm very sorry to say but if > the first part (non-overlaping records) is easy, the second one is not [quoted text clipped - 91 lines] >> A001 1 Jan 1, 2002 Dec 31, 2002 >> A001 1 Jan 1, 2003 NULL Robert Morley - 30 Nov 2005 04:46 GMT I just wrote the code in VBA earlier this afternoon to work out the kinks, since that's what I'm more familiar with, and was going to convert it to a Stored Procedure tomorrow...the code below will undoubtedly save me a lot of time. :)
Je te remerci, Rob
> I've just take a deep look at your data and I'm very sorry to say but if > the first part (non-overlaping records) is easy, the second one is not [quoted text clipped - 91 lines] >> A001 1 Jan 1, 2002 Dec 31, 2002 >> A001 1 Jan 1, 2003 NULL Robert Morley - 30 Nov 2005 19:27 GMT Hi Sylvain (and anybody else following this thread),
I just came up with my final code, which takes a different approach from yours, but seems to work very well. It parses the original 5023-record set into the final set of 4643 records in about 2-3 seconds.
The code below is what I ultimately used. Note that this code includes additional fields, since I had simplified my original recordset for the sake of example. If you spot any bugs, or have any suggested improvements to the function below, I'd love to hear them.
The basic logic is to initialize one record, move to the next record, and compare the current record in the set with the previous one. If they're within the same subset (AccountID, TypeID, and ContactID are equal to the previous record), compare the start date of the current date with the end date of the previous one (we don't need to compare start dates, since they're already sorted in order and the current one is guaranteed to be >= the previous one). If the dates intersect, update the end date to be the later of the two dates. If the dates don't intersect, or we've moved into a new subset, write the saved values out to the table, update the saved values appropriately and continue.
A final note, for anybody wondering, at this point GetsInterviewReports and GetsAccountReports can be looked up based on TypeID, so we know we only need to track them at the subset level...it was simply convenient to include them in this recordset rather than doing lookups/joins later on.
CREATE FUNCTION dbo.DDSTRepTable () RETURNS @AcctTeam table ( AccountID char(4), TypeID tinyint, ContactID smallint, GetsInterviewReports bit, GetsAccountReports bit, StartDate smalldatetime, EndDate smalldatetime ) BEGIN DECLARE @AccountID char(4) DECLARE @AccountIDSave char(4) DECLARE @TypeID tinyint DECLARE @TypeIDSave tinyint DECLARE @ContactID smallint DECLARE @ContactIDSave smallint DECLARE @IntRpts bit DECLARE @IntRptsSave bit DECLARE @AcctRpts bit DECLARE @AcctRptsSave bit DECLARE @StartDate smalldatetime DECLARE @StartDateSave smalldatetime DECLARE @EndDate smalldatetime DECLARE @EndDateSave smalldatetime DECLARE DDSTRepCursor cursor for SELECT AccountID, TypeID, ContactID, InterviewReports, AccountReports, StartDate, EndDate FROM DDSTRepTeam ORDER BY AccountID, TypeID, ContactID, StartDate
OPEN DDSTRepCursor FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID, @IntRpts, @AcctRpts, @StartDate, @EndDate IF @@FETCH_STATUS = 0 BEGIN SET @AccountIDSave = @AccountID SET @TypeIDSave = @TypeID SET @ContactIDSave = @ContactID SET @IntRptsSave = @IntRpts SET @AcctRptsSave = @AcctRpts SET @StartDateSave = @StartDate SET @EndDateSave = @EndDate FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID, @IntRpts, @AcctRpts, @StartDate, @EndDate END
WHILE @@FETCH_STATUS = 0 BEGIN IF (@AccountID = @AccountIDSave) AND (@TypeID = @TypeIDSave) AND (@ContactID = @ContactIDSave) IF (@StartDate < ISNULL(@EndDateSave, 'Jun 6, 2079')) BEGIN IF ISNULL(@EndDate, 'Jun 6, 2079) > @EndDateSave SET @EndDateSave = @EndDate END --Necessary only so the above 'IF' isn't associated with the 'ELSE' below. ELSE BEGIN INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave, @ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave SET @StartDateSave = @StartDate SET @EndDateSave = @EndDate END ELSE BEGIN INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave, @ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave SET @AccountIDSave = @AccountID SET @TypeIDSave = @TypeID SET @ContactIDSave = @ContactID SET @IntRptsSave = @IntRpts SET @AcctRptsSave = @AcctRpts SET @StartDateSave = @StartDate SET @EndDateSave = @EndDate END FETCH NEXT FROM DDSTRepCursor INTO @AccountID, @TypeID, @ContactID, @IntRpts, @AcctRpts, @StartDate, @EndDate END INSERT INTO @AcctTeam SELECT @AccountIDSave, @TypeIDSave, @ContactIDSave, @IntRptsSave, @AcctRptsSave, @StartDateSave, @EndDateSave
CLOSE DDSTRepCursor DEALLOCATE DDSTRepCursor RETURN END
Sylvain Lafontaine - 30 Nov 2005 20:02 GMT Yes, you're right: ordering the values by time date make it unecessary to look at more than a single previous record.
The only (small) bug that I've found in your code is the possibility of a source with 0 records: in this case, the last INSERT will insert a record with all null values and this record will be returned as the result instead of returning an empty result.
 Signature Sylvain Lafontaine, ing. MVP - Technologies Virtual-PC E-mail: http://cerbermail.com/?QugbLEWINF
> Hi Sylvain (and anybody else following this thread), > [quoted text clipped - 109 lines] > RETURN > END Robert Morley - 30 Nov 2005 20:34 GMT Good catch, thanks!
In theory, that'll never be a problem (I sure as !$#% HOPE we never have 0 employees, anyway), but I'll make the adjustment just to be safe.
Thanks yet again, Rob
> Yes, you're right: ordering the values by time date make it unecessary to > look at more than a single previous record. [quoted text clipped - 117 lines] >> RETURN >> END Robert Morley - 29 Nov 2005 18:59 GMT Note: I have discontinued replying to this group, as it clearly no longer applies to Access Modules/Coding.
Rob
Robert Morley - 29 Nov 2005 19:07 GMT Oh crap, I'm an idiot. Please ignore the above message for the SQL Server newsgroup. <blush>
*MODERATORS*: If you happen to notice this message, please delete this post and the one above it for the SQL Server newsgroup only!
Rob
|
|
|