“Why is DEFAULT_FDW_TUPLE_COST so insanely low?”
That was the subject of the email thread initiated by David Rowley to discuss this topic. I found the subject line amusing, accurate, and fully descriptive of the problem at hand. The discussion resulted in a commit that changed the default value from 0.01 to 0.2. Because I had previously written about postgres_fdw and also mentioned fdw_tuple_cost, I figured it would be good to go deeper into what this parameter is about and why it makes sense to default it to 0.2.
The Origin of ‘fdw_tuple_cost’
Background on FDWs
Foreign Data Wrappers (FDWs) in PostgreSQL allow database administrators and users to access data from external sources as if they were regular tables within a PostgreSQL database. This capability facilitates seamless data integration and aggregation from diverse sources.
Introduction to ‘fdw_tuple_cost’
Born from the need to account for the additional overhead of transferring data across servers, fdw_tuple_cost emerged in PostgreSQL version 9.1. This floating-point value represents the estimated cost per row retrieved from the foreign server. It factors in aspects like network latency, serialization, deserialization, and potential data transformations. By incorporating this cost into query planning, PostgreSQL attempts to choose the most efficient execution strategy, considering both local and remote processing options.
Uses of ‘fdw_tuple_cost’
Query Optimization
‘fdw_tuple_cost’ is pivotal in optimizing queries that involve foreign tables. By adjusting this parameter, database administrators can influence the query planner’s choice, favoring execution plans that are more cost-effective based on the actual performance characteristics of the foreign data source.
Cost-Based Decision Making
The PostgreSQL query planner uses ‘fdw_tuple_cost’ in conjunction with other cost parameters (like ‘seq_page_cost’ and ‘cpu_tuple_cost’) to determine the most efficient way to execute a query. This cost-based approach ensures that the planner can adapt to the specific characteristics of each data source.
Impact on Foreign Data Retrieval
The value set for ‘fdw_tuple_cost’ directly impacts the decision on whether to execute a query remotely on the foreign server or to fetch the data and process it locally. A lower cost may encourage more aggressive use of foreign tables, while a higher cost may lead the planner to favor local execution to avoid perceived inefficiencies.
The Change from 0.01 to 0.2
Rationale for Change
Here is an excerpt from the email from David Rowley:
For comparison, cpu_tuple_cost, something we probably expect to be in a CPU cache is also 0.01. We've defined DEFAULT_PARALLEL_TUPLE_COST to be 0.1, which is 10x cpu_tuple_cost. That's coming from a shared memory segment. So why do we think DEFAULT_FDW_TUPLE_COST should be the same as cpu_tuple_cost when that's probably pulling a tuple from some remote server over some (possibly slow) network?
I think this is righting a previous wrong. It makes sense to have the default cost of retrieving data from a foreign server appropriately set to incorporate the time lag introduced due to network latency. David ran a few experiments to prove that 0.2 is an appropriate default value to start with.
Adjustment Recommendations
Because this is a change in the default out-of-the-box value, existing systems are not impacted by it. I would, however, encourage DBAs to consider continuous evaluation of the performance of queries involving foreign tables. Adjusting ‘fdw_tuple_cost’ may involve benchmarking with different values to find the optimal setting that reflects the true cost of foreign data access in your specific environment.
High vs. Low ‘fdw_tuple_cost’ – Implications
High ‘fdw_tuple_cost’
Setting a high ‘fdw_tuple_cost’ reflects an assumption that accessing foreign data is expensive. This could be due to network latency, unreliable connections, or the processing overhead on the foreign server. A high cost discourages the planner from using foreign tables extensively, which could be beneficial if foreign data access is indeed costly or slow.
Low ‘fdw_tuple_cost’
Conversely, a low ‘fdw_tuple_cost’ suggests that accessing foreign data is relatively inexpensive, comparable to local data access. This setting can make foreign tables more attractive for query planning, encouraging their use even for operations that are sensitive to performance.
Balancing Performance and Cost
Finding the right balance for ‘fdw_tuple_cost’ involves understanding the performance characteristics of your foreign data sources and the network infrastructure connecting them to your PostgreSQL database. Adjusting this parameter requires careful consideration and potential benchmarking to determine the optimal setting for your specific environment.
Benefits of Accurate Estimation
Precise costing through fdw_tuple_cost offers several advantages:
Improved Query Performance: Choosing the right execution strategy reduces unnecessary data transfer and minimizes overall query execution time.
Efficient Resource Utilization: By accurately predicting resource consumption, PostgreSQL can allocate resources effectively, leading to better overall system performance.
Optimized Data Processing: In some scenarios, pushing down filtering or aggregation operations to the foreign server might prove more efficient. Accurate cost estimation helps identify such opportunities.
Beyond the Default
Remember, the default value of 0.2 serves as a starting point. Consider these factors for fine-tuning:
Network bandwidth and latency: High latency demands a higher cost to discourage excessive data transfer.
Foreign server capabilities: Adjust the cost if the foreign server has limited processing power compared to the local instance.
Data transformations: Factor in any significant data manipulation occurring on the foreign server.
Advanced Cost Estimation Techniques:
use_remote_estimate: Setting this parameter to true instructs PostgreSQL to obtain cost estimates directly from the foreign server, potentially providing more accurate information.
Analyze statistics: Regularly refresh statistics on the foreign tables to ensure local cost estimates are up-to-date.
Conclusion
Understanding fdw_tuple_cost and its nuances empowers you to optimize data access using PostgreSQL FDWs. By accurately accounting for data transfer costs, you can achieve better query performance, efficient resource utilization, and informed decisions about processing strategies. While the default value change reflects the fixing of a previously impractical value, remember to tailor it to your specific environment for optimal results.
As your data landscape grows and evolves, continuous evaluation and adjustment of fdw_tuple_cost will ensure your queries navigate the path of least resistance, extracting maximum value from your distributed data ecosystem.