1. 2-link robotic arm

  2. Implementing DDPG for robotic arm control

  • Training results


  • TensorFlow
  • NumPy
  • Pyglet (rendering)
  • TensorBoard (charts)
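For orientation, the end-effector position of a planar 2-link arm follows from standard forward kinematics. The sketch below is not the project's environment code; the function name, the unit link lengths and the angle convention (theta1 from the x-axis, theta2 relative to link 1) are assumptions for illustration.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Return the elbow and end-effector (x, y) of a planar 2-link arm.

    Assumed convention: theta1 is measured from the x-axis, theta2 relative
    to link 1, both in radians; l1 and l2 are illustrative link lengths.
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    # Link 2's absolute angle is the sum of the two joint angles.
    end = (elbow[0] + l2 * math.cos(theta1 + theta2),
           elbow[1] + l2 * math.sin(theta1 + theta2))
    return elbow, end

elbow, end = forward_kinematics(math.pi / 2, -math.pi / 2)
print(elbow, end)  # elbow near (0, 1), end effector near (1, 1), up to rounding
```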


Building the Actor and Critic networks with TensorFlow

def _build_a(self, s, scope, trainable):
    # Actor: maps a state to a deterministic action scaled to the action bound
    with tf.variable_scope(scope):
        net = tf.layers.dense(s, 100, activation=tf.nn.relu, name='l1', trainable=trainable)
        a = tf.layers.dense(net, self.a_dim, activation=tf.nn.tanh, name='a', trainable=trainable)  # in [-1, 1]
        return tf.multiply(a, self.a_bound, name='scaled_a')
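The actor above is one ReLU hidden layer, a `tanh` output layer in [-1, 1], and a scaling by `a_bound`. The framework-free sketch below mirrors that forward pass with plain Python lists so the shapes and value ranges are easy to follow. The weights are arbitrary illustrative values, not trained parameters, and `a_bound` is assumed here to be a per-dimension list (in the TensorFlow code it may equally be a scalar that broadcasts).

```python
import math

def dense(x, W, b):
    """Fully connected layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def actor_forward(s, W1, b1, W2, b2, a_bound):
    hidden = [max(0.0, v) for v in dense(s, W1, b1)]   # ReLU hidden layer ('l1')
    a = [math.tanh(v) for v in dense(hidden, W2, b2)]  # tanh output in [-1, 1] ('a')
    return [ai * bi for ai, bi in zip(a, a_bound)]     # element-wise scaling ('scaled_a')

# Illustrative sizes: 2-D state, 3 hidden units, 2-D action.
s = [0.5, -1.0]
W1 = [[1.0, -1.0, 0.5], [0.5, 1.0, -2.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [2.0, -3.0]]
b2 = [0.0, 0.0]
a_bound = [2.0, 0.5]
out = actor_forward(s, W1, b1, W2, b2, a_bound)
print(out)
```

Because `tanh` saturates at ±1, every output component stays within its bound no matter how large the pre-activation gets.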

def _build_c(self, s, a, scope, trainable):
    # Critic: maps a (state, action) pair to a scalar Q(s, a)
    with tf.variable_scope(scope):
        n_l1 = 100
        # Separate first-layer weights for state and action, summed before the ReLU
        w1_s = tf.get_variable('w1_s', [self.s_dim, n_l1], trainable=trainable)
        w1_a = tf.get_variable('w1_a', [self.a_dim, n_l1], trainable=trainable)
        b1 = tf.get_variable('b1', [1, n_l1], trainable=trainable)
        net = tf.nn.relu(tf.matmul(s, w1_s) + tf.matmul(a, w1_a) + b1)
        return tf.layers.dense(net, 1, trainable=trainable)  # Q(s,a)
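In `_build_c`, state and action each get their own first-layer weights (`w1_s`, `w1_a`) and the two products are summed before the ReLU. That is the same computation as a single dense layer over the concatenated input `[s, a]` with the two weight matrices stacked vertically. The plain-Python check below illustrates the equivalence; the weights are arbitrary assumed values and `b1` is flattened to a list instead of shape `[1, n_l1]`.

```python
def matvec(x, W):
    """Row vector x times matrix W (given as a list of rows)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def critic_layer1_split(s, a, w1_s, w1_a, b1):
    # As in _build_c: relu(s @ w1_s + a @ w1_a + b1)
    zs, za = matvec(s, w1_s), matvec(a, w1_a)
    return [max(0.0, zs[j] + za[j] + b1[j]) for j in range(len(b1))]

def critic_layer1_concat(s, a, w1_s, w1_a, b1):
    # Equivalent form: relu([s, a] @ [[w1_s], [w1_a]] + b1)
    z = matvec(s + a, w1_s + w1_a)  # list '+' concatenates inputs and stacks weight rows
    return [max(0.0, z[j] + b1[j]) for j in range(len(b1))]

s, a = [1.0, 2.0], [0.5]
w1_s = [[0.1, -0.3, 0.2], [0.4, 0.0, -0.1]]
w1_a = [[1.0, 2.0, -1.0]]
b1 = [0.05, 0.0, 0.1]
split = critic_layer1_split(s, a, w1_s, w1_a, b1)
concat = critic_layer1_concat(s, a, w1_s, w1_a, b1)
print(split, concat)
```

Keeping the weights separate, as the original code does, simply avoids an explicit concatenation op; the learned function class is the same.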

Q-target and TD error

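q_target = self.R + GAMMA * q_  # q_: target critic's estimate for the next state
td_error = tf.losses.mean_squared_error(labels=q_target, predictions=q)  # critic loss: mean squared TD error
self.ctrain = tf.train.AdamOptimizer(LR_C).minimize(td_error, var_list=self.ce_params)  # update only the evaluated critic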
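Numerically, the critic's target for each sampled transition is `r + GAMMA * q_`, where `q_` is presumably the target critic's value Q'(s', μ'(s')), and the loss averages the squared differences against the current `Q(s, a)` (with default weights, `tf.losses.mean_squared_error` returns that mean). The sketch below reproduces the arithmetic for a small batch; `GAMMA = 0.9` and all batch values are assumptions, and, as in the snippet above, there is no terminal-state mask.

```python
GAMMA = 0.9  # assumed discount factor

def td_targets(rewards, q_next):
    """q_target = r + GAMMA * q_ for each transition in the batch."""
    return [r + GAMMA * qn for r, qn in zip(rewards, q_next)]

def mse(labels, predictions):
    """Mean of squared errors over the batch."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)

rewards = [1.0, 0.0, -1.0]
q_next = [2.0, 1.0, 0.0]   # assumed Q'(s', mu'(s')) values from the target critic
q_pred = [2.5, 1.0, -1.0]  # assumed Q(s, a) values from the evaluated critic
targets = td_targets(rewards, q_next)
loss = mse(targets, q_pred)
print(targets, loss)
```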

  • Learning

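def learn(self):
    # soft target replacement (assumes self.soft_replace is built in __init__, not shown here)
    self.sess.run(self.soft_replace)

    # sample a random mini-batch; each memory row is laid out as [s, a, r, s_]
    indices = np.random.choice(MEMORY_CAPACITY, size=BATCH_SIZE)
    bt = self.memory[indices, :]
    bs = bt[:, :self.s_dim]
    ba = bt[:, self.s_dim: self.s_dim + self.a_dim]
    br = bt[:, -self.s_dim - 1: -self.s_dim]
    bs_ = bt[:, -self.s_dim:]

    self.sess.run(self.atrain, {self.S: bs})  # actor update
    self.sess.run(self.ctrain, {self.S: bs, self.a: ba, self.R: br, self.S_: bs_})  # critic update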
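The `# soft target replacement` step in `learn` refers to DDPG's target-network update: each target parameter moves a small fraction τ toward its evaluated-network counterpart, θ' ← (1 − τ)θ' + τθ. A minimal sketch on flat parameter lists, with τ = 0.01 assumed for illustration (in TensorFlow 1.x this is commonly built once as a list of `tf.assign` ops and run every learning step):

```python
TAU = 0.01  # assumed soft-replacement rate

def soft_update(target, source, tau=TAU):
    """Move target parameters a fraction tau toward source: (1 - tau) * t + tau * s."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]

target = [0.0, 1.0]
source = [1.0, 1.0]
for _ in range(100):
    target = soft_update(target, source)
print(target)  # first entry has closed 1 - 0.99**100 of the gap (about 63%)
```

The slow tracking keeps `q_target` from shifting as fast as the critic being trained, which is the usual motivation for using soft updates rather than copying weights every step.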

  • Storing transitions (replay memory)

def store_transition(self, s, a, r, s_):
    transition = np.hstack((s, a, [r], s_))
    index = self.pointer % MEMORY_CAPACITY  # replace the old memory with new memory
    self.memory[index, :] = transition
    self.pointer += 1
    if self.pointer > MEMORY_CAPACITY:      # indicator for learning
        self.memory_full = True
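`store_transition` writes each transition as one flat row `[s, a, r, s_]`, overwriting the oldest row once `pointer` passes `MEMORY_CAPACITY`, and `learn` recovers the pieces by column slicing. The round trip below checks that layout with plain Python lists; the class name `ReplayMemory`, the helper `split_row`, and the tiny capacity and dimensions are assumptions for illustration.

```python
S_DIM, A_DIM, MEMORY_CAPACITY = 2, 1, 3  # assumed toy sizes

class ReplayMemory:
    def __init__(self):
        self.memory = [[0.0] * (2 * S_DIM + A_DIM + 1) for _ in range(MEMORY_CAPACITY)]
        self.pointer = 0
        self.memory_full = False

    def store_transition(self, s, a, r, s_):
        row = list(s) + list(a) + [r] + list(s_)           # same layout as np.hstack((s, a, [r], s_))
        self.memory[self.pointer % MEMORY_CAPACITY] = row  # ring buffer: overwrite the oldest row
        self.pointer += 1
        if self.pointer > MEMORY_CAPACITY:
            self.memory_full = True

def split_row(row):
    """Apply the same column slices that learn() uses: s, a, r, s_."""
    s = row[:S_DIM]
    a = row[S_DIM:S_DIM + A_DIM]
    r = row[-S_DIM - 1:-S_DIM]
    s_ = row[-S_DIM:]
    return s, a, r, s_

mem = ReplayMemory()
for k in range(4):  # the 4th write wraps around to index 0
    mem.store_transition([k, k + 0.5], [0.1 * k], float(k), [k + 1, k + 1.5])
print(split_row(mem.memory[0]), mem.memory_full)
```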


  • Non-convergence issues


  • Using V-REP

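This part will probably be discussed with the advisor after the semester starts. I have considered several possible approaches; collision handling and related issues in V-REP are not yet resolved.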

  • Extending the 2D problem to 3D


  • Velocity planning


  • Path planning

Attempt path planning for the current 2D robotic arm and output the result as G-code.
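One possible starting point, sketched under several assumptions: sample waypoints along a straight segment in the arm's workspace, check each one with an analytic 2-link inverse kinematics solution (rejecting unreachable points), and emit one `G1` linear move per waypoint. The 100 mm link lengths, the feed rate, and the choice to output Cartesian end-effector coordinates rather than joint angles are all assumptions; the function names are hypothetical.

```python
import math

L1, L2 = 100.0, 100.0  # assumed link lengths in mm

def inverse_kinematics(x, y, l1=L1, l2=L2):
    """Analytic IK for a planar 2-link arm (one of the two solutions), in radians."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError(f"point ({x}, {y}) is out of reach")
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2), l1 + l2 * math.cos(theta2))
    return theta1, theta2

def line_path(start, goal, n):
    """n + 1 evenly spaced waypoints on the straight segment start -> goal."""
    return [(start[0] + (goal[0] - start[0]) * i / n,
             start[1] + (goal[1] - start[1]) * i / n) for i in range(n + 1)]

def to_gcode(waypoints, feed=100):
    lines = ["G21", "G90"]  # millimetre units, absolute positioning
    for x, y in waypoints:
        inverse_kinematics(x, y)  # raises ValueError for unreachable waypoints
        lines.append(f"G1 X{x:.3f} Y{y:.3f} F{feed}")
    return lines

print("\n".join(to_gcode(line_path((150.0, 0.0), (0.0, 150.0), 4))))
```

Whether the downstream controller expects Cartesian `X`/`Y` words or joint values is not specified; if joint angles are needed, the `(theta1, theta2)` pair returned by `inverse_kinematics` could be written out instead.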


