performance - Randomize timestamp column in large MySQL table -


i have test database table ~100m rows generated cloning original 3k rows multiple times. let's table describes events have timestamps. due cloning have ~10m events per day far real cases. i'd randomize date column , scatter records several days. here procedure i've come with:

drop procedure if exists `randomizedates`; delimiter // create procedure `randomizedates`(in `daterange` int) begin   declare id int unsigned;   declare buf timestamp;   declare done int default false;   declare cur1 cursor select event_id events;   declare continue handler not found set done = true;    open cur1;   the_loop: loop     fetch cur1 id;     if done       leave the_loop;     end if;     set buf = (select now() - interval floor(rand() * daterange) day);     update events set starttime = buf event_id = id;   end loop the_loop;   close cur1; end // delimiter ; 

on 3k table executes ~6 seconds assuming linear сomplexity take ~50 hours applied on 100m table. there way speed up? or maybe procedure incorrect @ all?

just do:

set @datarange = 7; update `events` set starttime = now() - interval floor(rand()) * @datarange day; 

databases not @ fetching , processing single rows in lopp, used in procedural languages (iterators, each loops, arrays etc), best at, , optimized processing sql, essetially declarative language - declare want without specyfying how it, in contrast procedural languages, used specify steps program must do.

remember - row row = slow slow.

@ simple example simulates table , compares procedure update:

drop table `events`; create table `events`  select * information_schema.tables 1=0;   alter table `events` add column event_id int primary key auto_increment first;  alter table `events` change column create_time starttime timestamp;  insert `events` select null, t.* information_schema.tables t cross join (   select 1 information_schema.tables   limit 100 ) xx  mysql> select count(*) `events`; +----------+ | count(*) | +----------+ |    17200 | +----------+ 

we created table 17 thousand rows. call procedure:

mysql> call `randomizedates`(7); query ok, 0 rows affected (34.26 sec) 

and update command:

mysql>     set @datarange = 7; query ok, 0 rows affected (0.00 sec)  mysql>     update `events`     ->     set starttime = now() - interval floor(rand()) * @datarange day; query ok, 17200 rows affected (0.23 sec) rows matched: 17200  changed: 17200  warnings: 0 

as see - 34 seconds / 0.23 second = 14782 % faster - it's huge difference !!!


Comments

Popular posts from this blog

c++ - Creating new partition disk winapi -

Android Prevent Bluetooth Pairing Dialog -

VBA function to include CDATA -