sql server - Should I partition by tenant or group of tenants for offline data store? -


i'm in process of designing offline data store used improve performance of lists , reports in asp.net application. considering adopting sql server partitioning , looking guidance around number of partitions use.

the offline data store use multi-tenant using shared database, shared schema approach , sql server 2014. contains tenantid column uses uniqueidentifier datatype.

on regular schedule schedule throughout day, etl process consisting of several ssis packages pull data source databases offline data store. single instance of offline data store need pull 500-1,000 single-tenant sql server source databases. process takes between 1-2 minutes.

ideally, we'd able run etl process multiple tenants @ time minimize latency of data store. in preliminary testing handful of tenants, encountered number of deadlocks writing destination tables in offline data store. around this, tried implementing partitioning on tenantid column using below partition function creates 16 partitions based on tenantid:

    create partition function tenant_range (uniqueidentifier)      range right values      (      '00000000-0000-0000-0000-000000000000',      '00000000-0000-0000-0000-100000000000',      '00000000-0000-0000-0000-200000000000',      '00000000-0000-0000-0000-300000000000',      '00000000-0000-0000-0000-400000000000',      '00000000-0000-0000-0000-500000000000',      '00000000-0000-0000-0000-600000000000',      '00000000-0000-0000-0000-700000000000',      '00000000-0000-0000-0000-800000000000',      '00000000-0000-0000-0000-900000000000',      '00000000-0000-0000-0000-a00000000000',      '00000000-0000-0000-0000-b00000000000',      '00000000-0000-0000-0000-c00000000000',      '00000000-0000-0000-0000-d00000000000',      '00000000-0000-0000-0000-e00000000000',      '00000000-0000-0000-0000-f00000000000'  ) 

this worked , eliminated blocking during etl process. have ability in our scheduling service ensure no 2 tenants in same partition run etl process @ same time. since of our tables contain tenantid , queries database filter on tenantid, i've been able verify queries enjoying performance benefits partition elimination.

we're debating whether continue using above partition function or create separate partition each tenant. require alter partition function , call split range when onboard new tenant, should feasible. eliminate randomness of current partitioning function, since have no control on how many tenants fall given partition since tenant ids assigned randomly. however, i'm concerned amount of overhead introduce creating 1,000 partitions. tenants small while others larger. i'm expecting footprint in offline data store range 50 mb 5 gb.

should go ahead , partition individual tenant ids or keep current strategy , expand number of partitions needed? guidance appreciated.


Popular posts from this blog