performance - PostgreSQL Simple JOIN very slow -
performance - PostgreSQL Simple JOIN very slow -
i have simple query, , 2 tables:
drilldown
create sequence drilldown_id_seq; create table drilldown ( transactionid bigint not null default nextval('drilldown_id_seq'), userid bigint not null default 0 references users(id), pathid bigint not null default 0, reqms bigint not null default 0, quems bigint not null default 0, clicktime timestamp default current_timestamp, primary key(transactionid) ); alter sequence drilldown_id_seq owned drilldown.transactionid; create index drilldown_idx1 on drilldown (clicktime);
querystats
create sequence querystats_id_seq; create table querystats ( id bigint not null default nextval('querystats_id_seq'), transactionid bigint not null default 0 references drilldown(transactionid), querynameid bigint not null default 0 references queryname(id), queryms bigint not null default 0, primary key(id) ); alter sequence querystats_id_seq owned querystats.id; create index querystats_idx1 on querystats (transactionid); create index querystats_idx2 on querystats (querynameid);
drilldown
has 1.5 1000000 records, , querystats
has 10 1000000 records; problem happens when bring together between two.
query
explain analyse select avg(qs.queryms) querystats qs bring together drilldown d on (qs.transactionid=d.transactionid) querynameid=1;
query plan
aggregate (cost=528596.96..528596.97 rows=1 width=8) (actual time=5213.154..5213.154 rows=1 loops=1) -> hash bring together (cost=274072.53..518367.59 rows=4091746 width=8) (actual time=844.087..3528.788 rows=4117717 loops=1) hash cond: (qs.transactionid = d.transactionid) -> bitmap heap scan on querystats qs (cost=88732.62..210990.44 rows=4091746 width=16) (actual time=309.502..1321.029 rows=4117717 loops=1) recheck cond: (querynameid = 1) -> bitmap index scan on querystats_idx2 (cost=0.00..87709.68 rows=4091746 width=0) (actual time=307.916..307.916 rows=4117718 loops=1) index cond: (querynameid = 1) -> hash (cost=162842.29..162842.29 rows=1371250 width=8) (actual time=534.065..534.065 rows=1372574 loops=1) buckets: 4096 batches: 64 memory usage: 850kb -> index scan using drilldown_pkey on drilldown d (cost=0.00..162842.29 rows=1371250 width=8) (actual time=0.015..364.657 rows=1372574 loops=1) total runtime: 5213.205 ms (11 rows)i know there tuning parameters can adjust postgresql, want know query doing optimal way of joing 2 tables?
or maybe sort of inner join? i'm not sure.
any pointers appreciated!
editdatabase#\d drilldown table "public.drilldown" column | type | modifiers ---------------+-----------------------------+-------------------------------------------------------- transactionid | bigint | not null default nextval('drilldown_id_seq'::regclass) userid | bigint | not null default 0 pathid | bigint | not null default 0 reqms | bigint | not null default 0 quems | bigint | not null default 0 clicktime | timestamp without time zone | default now() indexes: "drilldown_pkey" primary key, btree (transactionid) "drilldown_idx1" btree (clicktime) foreign-key constraints: "drilldown_userid_fkey" foreign key (userid) references users(id) referenced by: table "querystats" constraint "querystats_transactionid_fkey" foreign key (transactionid) references drilldown(transactionid) database=# \d querystats table "public.querystats" column | type | modifiers ---------------+--------+--------------------------------------------------------- id | bigint | not null default nextval('querystats_id_seq'::regclass) transactionid | bigint | not null default 0 querynameid | bigint | not null default 0 queryms | bigint | not null default 0 indexes: "querystats_pkey" primary key, btree (id) "querystats_idx1" btree (transactionid) "querystats_idx2" btree (querynameid) foreign-key constraints: "querystats_querynameid_fkey" foreign key (querynameid) references queryname(id) "querystats_transactionid_fkey" foreign key (transactionid) references drilldown(transactionid)
so here 2 tables requested , version
postgresql 9.1.7 on x86_64-unknown-linux-gnu, compiled gcc (ubuntu/linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
so query doing getting average rows values of queryms each query type (querynameid)
name | current_setting | source ----------------------------+----------------------------------+---------------------- application_name | psql | client client_encoding | utf8 | client datestyle | iso, mdy | configuration file default_text_search_config | pg_catalog.english | configuration file enable_seqscan | off | session external_pid_file | /var/run/postgresql/9.1-main.pid | configuration file lc_messages | en_us.utf-8 | configuration file lc_monetary | en_us.utf-8 | configuration file lc_numeric | en_us.utf-8 | configuration file lc_time | en_us.utf-8 | configuration file log_line_prefix | %t | configuration file log_timezone | localtime | environment variable max_connections | 100 | configuration file max_stack_depth | 2mb | environment variable port | 5432 | configuration file shared_buffers | 24mb | configuration file ssl | on | configuration file timezone | localtime | environment variable unix_socket_directory | /var/run/postgresql | configuration file (19 rows)
i see enable_seqscan=off, have not touched settings, default install.
update
i made changes below comments , here results.
explain analyse select (select avg(queryms) total querystats querynameid=3) total querystats qs bring together drilldown d on (qs.transactionid=d.transactionid) qs.querynameid=3 limit 1; query plan --------------------------------------------------------------------------------------------------------------------------------------------------------- limit (cost=196775.99..196776.37 rows=1 width=0) (actual time=2320.876..2320.876 rows=1 loops=1) initplan 1 (returns $0) -> aggregate (cost=196775.94..196775.99 rows=1 width=8) (actual time=2320.815..2320.815 rows=1 loops=1) -> bitmap heap scan on querystats (cost=24354.25..189291.69 rows=2993698 width=8) (actual time=226.516..1144.690 rows=2999798 loops=1) recheck cond: (querynameid = 3) -> bitmap index scan on querystats_idx (cost=0.00..23605.83 rows=2993698 width=0) (actual time=225.119..225.119 rows=2999798 loops=1) index cond: (querynameid = 3) -> nested loop (cost=0.00..1127817.12 rows=2993698 width=0) (actual time=2320.876..2320.876 rows=1 loops=1) -> seq scan on drilldown d (cost=0.00..76745.10 rows=1498798 width=8) (actual time=0.009..0.009 rows=1 loops=1) -> index scan using querystats_idx on querystats qs (cost=0.00..0.60 rows=2 width=8) (actual time=0.045..0.045 rows=1 loops=1) index cond: ((querynameid = 3) , (transactionid = d.transactionid)) total runtime: 2320.940 ms (12 rows)
it's behaving though have set enable_seqscan = off
, because using index scan populate hash table. never set of planner options off except diagnostic step, , if showing plan, please show options used. can run show lot of useful information:
select version(); select name, current_setting(name), source pg_settings source not in ('default', 'override');
it helps if tell runtime environment, amount of ram on machine, storage scheme looks like, , size of database (or better, active info set of referenced info in database).
as rough breakdown, 5.2 seconds breaks downwards to:
1.3 seconds find 4,117,717querystats
rows match selection criterion. 2.3 seconds randomly match against drilldown
records. 1.6 seconds pass 4,117,717 rows , calculate average. so, though seem have crippled ability utilize fastest plan, taking 1.26 microseconds (millionths of second) locate each row, bring together another, , work calculation of average. that's not bad on absolute basis, can faster plan.
first off, if using 9.2.x x less 3, upgrade 9.2.3 immediately. there performance regression types of plans fixed in recent release might impact query. in general, try remain up-to-date on minor releases (where version number changes past sec dot).
you can test different plans in single session setting planning factors on connection , running query (or explain
on it). seek this:
set seq_page_cost = 0.1; set random_page_cost = 0.1; set cpu_tuple_cost = 0.05; set effective_cache_size = '3gb'; -- utilize shared_buffers plus os cache
make sure enable_
settings on
.
performance postgresql join
Comments
Post a Comment