Professional Data Engineer
A Professional Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems.
The Data Engineer also analyzes data to gain insight into business outcomes, builds statistical models to support decision-making, and creates machine learning models to automate and simplify key business processes.
The Google Cloud Certified - Professional Data Engineer exam assesses your ability to:
- Build and maintain data structures and databases
- Design data processing systems
- Analyze data and enable machine learning
- Model business processes for analysis and optimization
- Design for reliability
- Visualize data and advocate policy
- Design for security and compliance
Detailed information: https://www.testsimulate.com/Professional-Data-Engineer-study-materials.html
About this certification exam
This exam objectively measures an individual's ability to demonstrate the critical job skills for the role. To earn this certification you must pass the Professional Data Engineer exam. The format is multiple choice and multiple select. The exam has no prerequisites. It must be taken in person at one of our testing center locations. Locate a test center near you.
- Length: 2 hours
- Registration fee: USD $200
- Languages: English, Japanese, Spanish, Portuguese
Path to Success
Review the exam guide
View an outline of the topics that may appear on the exam and that you are expected to know in order to demonstrate proficiency. Some of the questions on the exam may refer you to a case study that describes a fictitious business and solution concept. Review the sample case studies that may appear on your exam.
Take the training courses
Get yourself up to speed and learn best practices. Find classes focusing on each of the technical skills covered in the exam.
Practice with Qwiklabs
Get hands-on practice working with Google Cloud technologies. Learn at your own pace with a series of labs that are available on demand. Start with GCP Essentials, followed by the Data Engineering quest.
Documentation
Visit the documentation page with overviews and in-depth discussions on the concepts and critical components of GCP.
Draw on your own experience
We can’t stress enough the value of your own work experience. Use the provided resources along with your work experience to prepare for this exam.
Assess your knowledge
Familiarize yourself with the type of questions that will be on the exam. Check your readiness to take this exam.
Schedule your exam
Register and find a location near you.
We offer a free demo:
NEW QUESTION: 1
You are working on a sensitive project involving private user data. You have set up a project on
Google Cloud Platform to house your work internally. An external consultant is going to assist with
coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should
you maintain users' privacy?
A. Create a service account and allow the consultant to log on with it.
B. Grant the consultant the Viewer role on the project.
C. Grant the consultant the Cloud Dataflow Developer role on the project.
D. Create an anonymized sample of the data for the consultant to work with in a different project.
Answer: C
NEW QUESTION: 2
Your software uses a simple JSON format for all messages. These messages are published to
Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard
for the CFO. During testing, you notice that some messages are missing in the dashboard. You check
the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do
next?
A. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
B. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing
messages to Cloud Dataflow.
C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
D. Check the dashboard application to see if it is not displaying correctly.
Answer: A
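A minimal sketch of option A, runnable locally with the Beam DirectRunner: feed a fixed set of messages through the parse step and inspect the output. The parse_message function and the sample messages are hypothetical stand-ins for the pipeline's real transforms; a malformed message that is silently dropped in production shows up immediately in a test like this.

```python
# Run a fixed dataset through the pipeline locally and inspect the output.
# The parse logic below is a hypothetical example, not the real pipeline.
import json
import apache_beam as beam

def parse_message(raw):
    # Messages that fail to parse are a common cause of "missing" records.
    try:
        return [json.loads(raw)]
    except ValueError:
        return []  # dropped silently in production; visible in this test

fixed_messages = [
    '{"amount": 10, "user": "a"}',
    '{"amount": 20, "user": "b"}',
    'not-valid-json',  # a deliberately malformed message to probe the failure mode
]

with beam.Pipeline() as p:  # DirectRunner by default
    (p
     | beam.Create(fixed_messages)
     | beam.FlatMap(parse_message)
     | beam.Map(print))
```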
NEW QUESTION: 3
You create an important report for your large team in Google Data Studio 360. The report uses
Google BigQuery as its data source. You notice that visualizations are not showing data that is less
than 1 hour old.
What should you do?
A. Refresh your browser tab showing the visualizations.
B. Disable caching in BigQuery by editing table details.
C. Clear your browser history for the past hour then reload the tab showing the visualizations.
D. Disable caching by editing the report settings.
Answer: D
Explanation
Reference https://support.google.com/datastudio/answer/7020039?hl=en
NEW QUESTION: 4
You want to process payment transactions in a point-of-sale application that will run on Google
Cloud Platform. Your user base could grow exponentially, but you do not want to manage
infrastructure scaling.
Which Google database service should you use?
A. BigQuery
B. Cloud Datastore
C. Cloud Bigtable
D. Cloud SQL
Answer: B
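For context on option B, here is a hedged sketch of writing a payment record inside a Datastore transaction with the Python client; the kind and property names are illustrative assumptions. Datastore (now Firestore in Datastore mode) scales automatically, so there is no capacity to manage as the user base grows.

```python
# A sketch of recording a payment in Cloud Datastore. The kind and
# property names are hypothetical examples, not part of the question.
from google.cloud import datastore

client = datastore.Client()

with client.transaction():  # ACID transaction, no capacity planning needed
    key = client.key("PaymentTransaction")
    entity = datastore.Entity(key=key)
    entity.update({"amount_usd": 42.50, "terminal_id": "pos-001"})
    client.put(entity)
```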
NEW QUESTION: 5
You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub
subscription as the source. You need to make an update to the code that will make the new Cloud
Dataflow pipeline incompatible with the current version. You do not want to lose any data when
making this update. What should you do?
A. Update the current pipeline and use the drain flag.
B. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
C. Update the current pipeline and provide the transform mapping JSON object.
D. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
Answer: A
NEW QUESTION: 6
Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has
decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster
would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using
that much block storage. You want to minimize the storage cost of the migration. What should you
do?
A. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
C. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent
Disk.
D. Put the data into Google Cloud Storage.
Answer: D
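As a minimal illustration of option D, the sketch below lands an exported file in Cloud Storage with the Python client; the bucket and object names are hypothetical. At cluster scale the usual mechanism would be Hadoop DistCp with the Cloud Storage connector rather than per-file uploads.

```python
# Land migrated data in Cloud Storage instead of Persistent Disk HDFS.
# Bucket name and paths are hypothetical examples.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-migrated-data")          # hypothetical bucket
blob = bucket.blob("warehouse/part-00000.parquet")  # hypothetical object
blob.upload_from_filename("/hdfs-export/part-00000.parquet")
```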
NEW QUESTION: 7
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to reuse
Hadoop jobs they have already created and minimize the management of the cluster as much as
possible. They also want to be able to persist data beyond the life of the cluster. What should you
do?
A. Create a Google Cloud Dataflow job to process the data.
B. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
C. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
D. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.
E. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
Answer: B
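A sketch, under assumed project, bucket, and cluster names, of option B: submit an existing Hadoop jar to a Dataproc cluster with input and output on Cloud Storage (gs:// paths via the GCS connector), so the data persists after the cluster is deleted.

```python
# Reuse an existing Hadoop jar on Dataproc with data on Cloud Storage.
# Project, region, cluster, bucket, and jar names are hypothetical.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "migrated-hadoop-cluster"},
    "hadoop_job": {
        "main_jar_file_uri": "gs://my-bucket/jobs/existing-job.jar",
        # gs:// paths replace hdfs:// paths; the job itself is unchanged
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}
operation = client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
print(operation.result().reference.job_id)
```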
NEW QUESTION: 8
Your company built a TensorFlow neural-network model with a large number of neurons and
layers. The model fits well for the training data. However, when tested against new data, it performs
poorly. What method can you employ to address this?
A. Serialization
B. Dropout Methods
C. Dimensionality Reduction
D. Threading
Answer: B
Explanation
Reference https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505
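A minimal Keras illustration of option B; the layer sizes and dropout rate are arbitrary examples, not values implied by the question.

```python
# Dropout layers regularize an overfitting network by randomly zeroing
# a fraction of activations during training.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # drops 50% of activations each training step
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```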
NEW QUESTION: 9
You are building a model to predict whether or not it will rain on a given day. You have
thousands of input features and want to see if you can improve training speed by removing some
features while having a minimum effect on model accuracy. What can you do?
A. Eliminate features that are highly correlated to the output labels.
B. Remove the features that have null values for more than 50% of the training records.
C. Instead of feeding in each feature individually, average their values in batches of 3.
D. Combine highly co-dependent features into one representative feature.
Answer: D
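One common way to implement option D is principal component analysis over a group of correlated columns; the sketch below uses synthetic data to show three co-dependent features collapsing into one representative component with almost no information loss.

```python
# Collapse a group of highly correlated columns into one representative
# feature with PCA. The data here is synthetic for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))
# three nearly identical (co-dependent) features
X_correlated = np.hstack([base + 0.01 * rng.normal(size=(1000, 1)) for _ in range(3)])

pca = PCA(n_components=1)              # three co-dependent features -> one
X_reduced = pca.fit_transform(X_correlated)
print(pca.explained_variance_ratio_)   # ~1.0: almost no information lost
```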
NEW QUESTION: 10
You are designing a basket abandonment system for an ecommerce company. The system will
send a message to a user based on these rules:
* No interaction by the user on the site for 1 hour
* Has added more than $30 worth of products to the basket
* Has not completed a transaction
You use Google Cloud Dataflow to process the data and decide if a message should be sent. How
should you design the pipeline?
A. Use a sliding time window with a duration of 60 minutes.
B. Use a session window with a gap time duration of 60 minutes.
C. Use a fixed-time window with a duration of 60 minutes.
D. Use a global window with a time-based trigger with a delay of 60 minutes.
Answer: B
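A minimal bounded-data illustration of option B using the Beam Python SDK; the event tuples are hypothetical. A session window with a 60-minute gap groups each user's events until an hour of inactivity closes the window, which is exactly when the abandonment rules can be evaluated.

```python
# Session windows with a 60-minute gap, shown on a small fixed dataset.
import apache_beam as beam
from apache_beam.transforms import window

# (user, event-time seconds) pairs standing in for clickstream events
events = [
    ("user1", 0),
    ("user1", 600),    # 10 minutes later: same session
    ("user1", 7200),   # 2 hours later: 60-minute gap exceeded, new session
]

with beam.Pipeline() as p:
    (p
     | beam.Create(events)
     | beam.Map(lambda e: window.TimestampedValue((e[0], 1), e[1]))
     | beam.WindowInto(window.Sessions(60 * 60))  # gap duration: 60 minutes
     | beam.CombinePerKey(sum)  # events per user per session
     | beam.Map(print))
```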